Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transport: allow to de-serialize arbitrary objects given their name #12571

Merged

Conversation

javanna
Copy link
Member

@javanna javanna commented Jul 31, 2015

This commit makes it possible to serialize arbitrary objects by having them extend Writeable. When reading them though, we need to be able to identify which object we have to create, based on its name. This is useful for queries once we move to parsing on the coordinating node, as well as with aggregations and so on.

Introduced a new abstraction called NamedWriteable, which is supported by StreamOutput and StreamInput through writeNamedWriteable and readNamedWriteable methods. A new NamedWriteableRegistry is introduced also where named writeable prototypes need to be registered so that we are able to retrieve the proper instance of the writeable given its name and then de-serialize it calling readFrom against it.

We decided to streamline the support for NamedWriteables and make related methods available across the board in StreamInput and StreamOutput. That said the new write* and read* methods are package private so they can be tested but won't be made public. The idea is to add specific methods once we have named writeable to be streamed, e.g.:

public QueryBuilder readQuery() {
    return readNamedWriteable("query");
}

and

public void writeQuery(QueryBuilder queryBuilder) {
    writeNamedWriteable("query", queryBuilder);
}

The above methods cannot be added yet as neither queries nor aggs are streamable yet.

@colings86
Copy link
Contributor

Not sure how it could be done without producing a massive PR but its a shame that anything using the NamedWriteableAwareStreamInput needs to cast the StreamInput it receives.

Also I wonder if we should have a NamedWriteableAwareStreamOutput so we are sure that classes using NamedWritable are sure to write the object properly. So it would have writeNamedWritable(NamedWritable), writeOptionalNamedWritable(NamedWritable) and write NamedWritableArray(NamedWritable[])?

It seems like this change is going to get complicated due to the mechanism for determining when to wrap the StreamInput so I agree that maybe we should explore how big the context argument option is?

@javanna
Copy link
Member Author

javanna commented Jul 31, 2015

Also I wonder if we should have a NamedWriteableAwareStreamOutput so we are sure that classes using NamedWritable are sure to write the object properly. So it would have writeNamedWritable(NamedWritable), writeOptionalNamedWritable(NamedWritable) and write NamedWritableArray(NamedWritable[])?

That is why we have the new serializer object that exposes the methods to read and write named writeables, you have to effectively go through it so you can read and write named writeables. That said I agree with you the casting is not great, and the current wrapping of the stream is even worse :)

I am all for adding a context argument to all readFrom methods at this point. The only condition is that the context needs to expose final objects only and must not change its state while reading.

Let's see what @jpountz and @s1monw think about this.

@javanna javanna force-pushed the enhancement/named_writeable_serializer branch from b8597a8 to f97cba6 Compare August 4, 2015 15:41
@javanna
Copy link
Member Author

javanna commented Aug 4, 2015

After talking to @s1monw and @jpountz we decided to go back to something closer to the original implementation (what we have in the query-refactoring branch). We wrap the stream (only in case of request) and named writeables are supported across the board. The default registry is empty if the stream is not wrapped with one that has a non empty registry. Also we went for exposing specific readQuery and readAggregation method in the future rather than the generic readNamedWriteable and writeNamedWriteable methods. I updated the description of the PR accordingly and removed the WIP label, this is ready for review now.

@javanna javanna removed the WIP label Aug 5, 2015

@Override
public StreamInput setVersion(Version version) {
return delegate.setVersion(version);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the builder pattern is trappy here: I think this should call delegate.setVersion() and then return this? I quickly looked and I don't see many call sites relying on the fact that StreamInput.setVersion returns itself, so maybe we can make it void to remove the trap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed good catch...

@javanna javanna force-pushed the enhancement/named_writeable_serializer branch from fd6e6b0 to 566d64c Compare August 5, 2015 12:09
/**
* Wraps a {@link StreamInput} and associates it with a {@link NamedWriteableRegistry}
*/
public class FilterStreamInput extends StreamInput {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now wondering if naming is still fine, maybe NamedWriteableAwareStreamInput or something along those lines (and shorter!) would be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could keep this FilterStreamInput that just delegates everything, and add a new NamedWriteableAwareStreamInput (or just WriteableStreamInput) that extends it and adds the logic to look into the registry to deserialize?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will do

@javanna
Copy link
Member Author

javanna commented Aug 6, 2015

@jpountz I updated the PR according to your comments, it's ready for another round of review

}

@Override
<C> C readNamedWriteable(@SuppressWarnings("unused") Class<C> categoryClass) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be:
<C> C readNamedWriteable(@SuppressWarnings("unused") Class<? extends C> categoryClass) throws IOException {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also what is categoryClass tagged as unused? I see it used to look up the registry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the annotation is a bug, it is needed only in the base class, will fix. I am not sure about the Class<? extends C>. If we are deserializing an arbitrary query, the base class QueryBuilder implements NamedWriteable, but when we read an arbitrary query we cannot expect a subclass of it, we just do QueryBuilder query = readQuery(); which will call readNamedWriteable(QueryBuilder.class). That is why I kept Class as a type. Does it make sense?

* Default implementation throws {@link UnsupportedOperationException} as StreamInput doesn't hold a registry.
* Use {@link FilterInputStream} instead which wraps a stream and supports a {@link NamedWriteableRegistry} too.
*/
<C> C readNamedWriteable(@SuppressWarnings("unused") Class<C> categoryClass) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Class<C>/Class<? extends C>/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense as-is for the reason stated above

@javanna
Copy link
Member Author

javanna commented Aug 6, 2015

I pushed another commit that should address the last review

@jpountz
Copy link
Contributor

jpountz commented Aug 6, 2015

LGTM

This commit makes it possible to serialize arbitrary objects by having them extend Writeable. When reading them though, we need to be able to identify which object we have to create, based on its name. This is useful for queries once we move to parsing on the coordinating node, as well as with aggregations and so on.

Introduced a new abstraction called NamedWriteable, which is supported by StreamOutput and StreamInput through writeNamedWriteable and readNamedWriteable methods. A new NamedWriteableRegistry is introduced also where named writeable prototypes need to be registered so that we are able to retrieve the proper instance of the writeable given its name and then de-serialize it calling readFrom against it.

Closes elastic#12393
@javanna javanna force-pushed the enhancement/named_writeable_serializer branch from a005b50 to e1e9e1a Compare August 6, 2015 10:26
@javanna javanna removed the review label Aug 6, 2015
@javanna javanna merged commit e1e9e1a into elastic:master Aug 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants