Skip to content

AVRO-1932: Java: Allow setting the SchemaStore on generated classes.#141

Closed
nielsbasjes wants to merge 4 commits intoapache:masterfrom
nielsbasjes:AVRO-1932
Closed

AVRO-1932: Java: Allow setting the SchemaStore on generated classes.#141
nielsbasjes wants to merge 4 commits intoapache:masterfrom
nielsbasjes:AVRO-1932

Conversation

@nielsbasjes
Copy link
Contributor

This change is needed to allow setting a different SchemaStore in the IDL generated classes.

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch.
I've added some comments. Please, check them out.

public void setSchemaStore(SchemaStore newResolver) {
if (newResolver != null) {
this.resolver = newResolver;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you think the user of this method should be informed about the fact that the resolver was not set? I would suggest throwing an exception here (e.g. IllegalStateException).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easier: Remove that check and thus allow 'removing' the SchemaStore again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would be simpler if it fits the logic. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. The resolver is an optional thing that is common to be null.

*
* @param newResolver a {@link SchemaStore} to use when decoding buffers
*/
public void setSchemaStore(SchemaStore newResolver) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class supposed to be thread-safe. I would suggest having this one synchronized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the modification and the description.
LGTM.

@rdblue
Copy link
Contributor

rdblue commented Oct 11, 2016

It looks like the only motivation for adding setSchemaStore is to change the store for the decoder that backs the fromByteBuffer convenience method. Since that's all static and shared for the whole JVM, I don't think it's a good idea to expose the decoder for that purpose, or to add the setSchemaStore method unless there's a separate use. (I don't think there is because it's easy to create a decoder with a different one. This only helps if you have to change an existing deocder.)

I originally thought that the best path for setting up a reader for evolving schemas would be to create a new MessageDecoder rather than using the one backing the convenience methods. But this seems pretty reasonable use case. What about instead of changing the schema store, adding a static method addPreviousVersion(Class) and addPreviousVersion(Schema)? I think that would be easier for users, while not making significant changes to the static decoder.

What do you think?

@nielsbasjes
Copy link
Contributor Author

The core of what I need is to be able to set a custom backend to retrieve the schemas from when using the IDL generated classes and transparently deserialize both older and newer schemas. I want to be able to add them manually (i.e. addSchema(Schema)) and be able to store them in an external system like for example http://docs.confluent.io/3.0.1/schema-registry/docs/index.html

If the producing application puts messages with a new schema into Kafka and the downstream application has not yet been updated I want it (without any code modification) be able to pickup the schema from the registry and simply read the provided record and continue. Thus simply having the option to explicitly add new schema's is not enough for the project I'm working on.

When trying to solve this I found a lot of the related variables are either 'final' and/or 'private'. This makes it impossible to add an extra schema to the generated classes at all. Effectively blocking the very nice schema evolution features from the application developer.

With the change I presented here the developer can set a single instance of the SchemaStore per (generated) class. I consider that to be enough. I do not expect people to have a need for multiple 'databases' for their schema's for a single record type (or even in a single application). If they do then they can always create a SchemaStore that retrieves the data from multiple backends.

I tried to minimize the impact on the generated classes so I chose to only add the getDecoder method to open it up and add a method to set the SchemaStore which in case of the generated classes was a final variable set to null.

The addPreviousVersion(class) idea can only be implemented as a method generated into the class as it needs to refer to the SCHEMA$ or the getClassSchema() which are both static and thus defy polymorphism. Also it would probably be useless as in a specific project I do not expect people to duplicate the schema definition in their source tree and change the name or package. So in a specific code base you (in general) do not have a second version as a java class of the same schema.
In cases where this does happen people can call the getClassSchema and simply pass that to addSchema(Schema)
So I think we should leave this out.
The
addPreviousVersion(Schema) is now possible yet it looks like this using this version of my code: MyClass.getDecoder.addSchema(Schema)

In fact today I wrote (as an experiment) a script that loops over the last 20 git commits, generate the source, do a creative fgrep/sed combination and generate a helper class that does a Foo.getDecoder.addSchema for each of the older versions into a class that is then part of my project.

Effectively I generate a static method that does something like this:

MyClass.getDecoder.addSchema(new org.apache.avro.Schema.Parser().parse(" ... JSon 1...");)
MyClass.getDecoder.addSchema(new org.apache.avro.Schema.Parser().parse(" ... JSon 2...");)
MyClass.getDecoder.addSchema(new org.apache.avro.Schema.Parser().parse(" ... JSon 3...");)

I'm still testing that experiment in combination with Flink and Kafka and a custom DeserializationSchema instance (sofar it looks good)
https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/kafka.html#the-deserializationschema

So from my perspective I think the current code in my pull request are (close to) optimal.

@rdblue
Copy link
Contributor

rdblue commented Oct 13, 2016

I don't think that setSchemaStore should be added because its primary purpose is to change the behavior of static methods. Those are convenience methods and I think the right way to use an alternate schema store is to create a message decoder and pass that store to its constructor.

I think it's fine to add the getDecoder and getEncoder methods if you want to use addSchema, I just though that it would be better to add just the one method to add another version (maybe addWriteSchema instead of addPreviousVersion). I'm happy either way.

@nielsbasjes
Copy link
Contributor Author

@rdblue Please have a look at this change. I made the SchemaStore final again yet I added a convenience function to more easily create a new decoder for the specific class. What do you think?

@rdblue
Copy link
Contributor

rdblue commented Oct 13, 2016

+1

Thanks, @nielsbasjes! I think this is a better way forward.

@nielsbasjes
Copy link
Contributor Author

Committed

@nielsbasjes nielsbasjes deleted the AVRO-1932 branch October 16, 2016 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants