Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STORM-2826: Set key/value deserializer fields when using the convenience builder methods in KafkaSpoutConfig #2428

Merged
merged 1 commit into from Dec 16, 2017

Conversation

srdo
Copy link
Contributor

@srdo srdo commented Nov 19, 2017

Builds on STORM-2825.

}
if (valueDesClazz != null) {
this.kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDesClazz);
if (!this.kafkaProps.containsKey(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srdo can you please clarify what you are trying to do? What happens if this if statement is false? Won't it cause kafkaProps to keep whatever value they have set and the fields keyClassDeserializer something else? What are the implications of that ?

The two ifs bellow, on lines 296 and 299, I think they can possibly be both true and with different values, if you are dealing with subtypes. If so, what happens in that case?

Copy link
Contributor Author

@srdo srdo Nov 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll explain. In #2155 I changed the KafkaSpoutConfig API a bunch to try to avoid having custom methods for properties that users should just set via kafkaProps. Part of the change removes keyDes and keyDesClass, and tells users to set the corresponding properties in kafkaProps instead.

When the changes were backported, I deprecated all constructors or methods referring to those fields, and switched the KafkaSpoutConfig.builder convenience methods to use a constructor that just sets the right properties in kafkaProps. It turns out this is a breaking change for users that build a KafkaSpoutConfig and use getKeyDeserializer or getValueDeserializer for anything, because they are now null when using the convenience builders, where they defaulted to StringDeserializers before.

In order to retain backwards compatibility, the builders have to set the key/value deserializer fields to StringDeserializer again. I still want to get rid of the fields though, so to allow users to switch to using kafkaProps instead, we'll only use the fields if the corresponding properties in kafkaProps are not set. If we set the properties based on the fields unconditionally, we would overwrite the deserializer settings for users that set the properties in kafkaProps.

What happens if this if statement is false

If the expression is false, the field settings are ignored. This is the behavior I think we want, since it means the user must have set the right property in kafkaProps. The consequence of this is a mismatch between what is in kafkaProps, and what is set in the key deserializer field. I suppose we could overwrite the key deserializer field with whatever is in kafkaProps to resolve the conflict?

The two ifs bellow, on lines 296 and 299, I think they can possibly be both true and with different values, if you are dealing with subtypes. If so, what happens in that case?

We would use the deserializer from the keyDes field. I think this kind of ambiguity is pretty nasty, but it's consistent with previous behavior.

@srdo srdo force-pushed the STORM-2826 branch 2 times, most recently from dc383c4 to baf9e7a Compare November 21, 2017 19:08
this.kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDesClazz);
}
if (valueDes != null) {
} else if (keyDesClazz != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this change is necessary, and if change is necessary, why we don't change above constructor as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the changes in #2215 is to make users set most Kafka properties through setProp instead of duplicating configuration parameters in KafkaSpoutConfig. These properties include key and value deserializers.

In order to provide backward compatibility that PR tries to ensure that if the keyDes/keyDesClazz field is set, then the corresponding kafkaProps property is also set. The spout doesn't use the fields anymore, it only reads from the kafkaProps map. The setKey/setValue methods were also deprecated, and users were directed to use setProp instead. Additionally the convenience builder methods were changed so they didn't set the keyDes/keyDesClazz fields, but just set properties in kafkaProps instead.

The issue Alexandre hit during testing was that he was doing something like kafkaSpoutConfig.getKeyDeserializer().getClass() on a KafkaSpoutConfig built from one of the convenience builders. Since keyDes/keyDesClazz aren't being set by the default builders anymore, this causes an NPE.

I think we still want to be able to provide the simplified KafkaSpoutConfig API to 1.x, and we still want to encourage users to use setProp instead of setKey/setValue. In order to fix the NPE we also have to set keyDes/keyDesClazz in the convenience builder methods again.

Without this change the code behaves in a very confusing way when mixing use of setKey/setValue and setProp. The added unit tests demonstrate cases where the current code would act counterintuitively. Try pasting the current code into the modified Builder constructor, and you'll see test failures.

The modified constructor is used when setKey/setValue is called. With the current code, calling e.g. setKey will actually overwrite both the key and value properties in kafkaProps if keyDesClazz and valueDesClazz were set in the old Builder.

For example, if you did KafkaSpoutConfig.builder(topic).setProp(MyValueDeserializer.class).setKey(MyKeyDeserializer.class), you end up with a KafkaSpoutConfig where the value deserializer is StringDeserializer. This is because the builder method now sets both keyDesClazz and valueDesClazz by default. When setKey calls the copy constructor, it overwrites both the key and value deserializer properties in kafkaProps with the values of the key/valueDesClazz fields. The updated code makes sure that if you call setKey, then only the key deserializer property will be set.

The change isn't necessary for the constructor further up, because it isn't a copy constructor. The problem with the copy constructor is that it takes in a kafkaProps that already might contain settings for key/value deserializers and overwrites them. The unmodified constructor has an empty kafkaProps, so there's no issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. That doesn't looks like easy to understand, but I understand that's unavoidable for backward compatibility.

@HeartSaVioR
Copy link
Contributor

+1
@hmcl It would be great if you could finish the review, sure, in several days after Thanksgiving week.

@hmcl
Copy link
Contributor

hmcl commented Dec 2, 2017

@HeartSaVioR @srdo I will finalize my review by Monday PST

@arunmahadevan
Copy link
Contributor

@HeartSaVioR can we look at merging this? @hmcl if you have any further comments you can put it here asap.

@hmcl
Copy link
Contributor

hmcl commented Dec 7, 2017

@arunmahadevan I am looking into this now. Thanks.

@@ -359,7 +356,7 @@ private Builder(Builder<?, ?> builder, SerializableDeserializer<K> keyDes, Class
*/
@Deprecated
public <NK> Builder<NK, V> setKey(Class<? extends Deserializer<NK>> clazz) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srdo do you know why this method returns a new builder object? I can't figure a reason for it to so. I suspect that the only reason for that to happen is because the fields of the builder class are final (e.g keyDesClassClazz), and to make the generics work. There is no benefit in having fields inside the builder class to be final. The code snippet bellow also fixes the generics problem. Any reason not to get rid of the builder (with copy constructor) class completely and make this method like this:

public Builder<K,V> setKey(Class<? extends Deserializer<K>> clazz) {
            this.keyDesClazz = clazz;
            if (keyDesClazz != null) {
                this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDesClazz);
            }
            return this;
        }

We should do something similar to the other 3 methods. In my opinion has become a bit confusing, and I believe this is one of the last few opportunities we have to make it better. Please let me know your thoughts. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the reasons you mention are the reasons these methods return new builders.

I agree that there is no reason for the fields to be final.

The snippet breaks the ability to change the key/value deserializer types. The existing code allows you to do e.g.

@Test
    public void test() {
       
        KafkaSpoutConfig<String, byte[]> conf = 
            //Use default <String, String> Builder
            KafkaSpoutConfig.builder("localhost:1234", "topic") 
            //Change to byte array value deserializer
            .setValue(ByteArrayDeserializer.class)
            .build();
    }

which now fails to compile because the new type bound on setKey/setValue prevents changing from Deserializer<String> to Deserializer<byte[]>.

You're right that the existing API is somewhat confusing, but it's getting removed in 2.0.0 and is deprecated here. I'd rather not add breaking changes to it if we can avoid it, because if we were going to break the API in a minor version I think we should just have removed the deprecated methods entirely.

@hmcl
Copy link
Contributor

hmcl commented Dec 9, 2017

@srdo aiming to getting this PR merged more quickly I created a PR with a suggested fix off your branch. If you agree with the fix, can you please incorporate it, squash the commits, and push it again here. I will then review it right away. Thanks.

@hmcl
Copy link
Contributor

hmcl commented Dec 12, 2017

@srdo although some of these methods have been deprecated for 2.0, customers that are currently in a 1.x.y version will likely use this version for a few years. We will have to maintain this codebase for quite a long time, and therefore I am in favor of making at least the code a bit more readable. I had quite a hard time to understand what the existing code is doing. I have another suggestion, which I also shared with you on a PR.

I will leave it up to you which one to pick and I am +1 after that such that we can move forward. Thanks.

@srdo
Copy link
Contributor Author

srdo commented Dec 13, 2017

@hmcl Thanks, merged your PR and squashed to two commits. The only change since your PR is a whitespace change, and removing an unnecessary else (there was a throw in the if block)

@hmcl
Copy link
Contributor

hmcl commented Dec 13, 2017

+1. Can you please squash. Thanks.

@srdo
Copy link
Contributor Author

srdo commented Dec 13, 2017

Squashed to one commit. Not entirely sure if this is okay, since authorship is lost for your changes, but maybe it's not a big deal?

@hmcl
Copy link
Contributor

hmcl commented Dec 14, 2017

+1. @srdo don't worry about the authorship. It was just a code review for which I created a PR to make it easier to discuss.

@asfgit asfgit merged commit 1811351 into apache:1.x-branch Dec 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants