Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Message.getReaderSchema() in Pulsar IO Sinks when possible #10557

Merged
merged 5 commits into from
May 14, 2021

Conversation

eolivelli
Copy link
Contributor

@eolivelli eolivelli commented May 12, 2021

Motivation

When you run a Sink and you call record.getSchema(), it does not return an accurate representation of the schema in case of a schema update.

For instance:

  • the topic starts with AVRO Schema schema1
  • the AutoConsumeSchema starts by populating the internal SchemaInfo with the definition of schema1
  • the topic advances to AVRO Schema schema2
  • the AutoConsumeSchema still reports SchemaInfo for schema1
  • record.getSchema().getNativeSchema() reports the wrong schema definition

Modifications

With this change we leverage PIP-85 Message.getReaderSchema() API that returns the exact schema used for the Message, that is the schema that these pieces for information updated to the same "schemaVersion" of the Message represented by the Record:

  • reports the correct getSchemaInfo()
  • reports the correct getNativeSchema()

I also fixing a problem in KeyValueSchema for atSchemaVersion(), the constructor of KeyValueSchema did not fill in the SchemaInfo data structure, resulting in NPEs.

Verifying this change

This change adds a new integration test and a unit test

Does this pull request potentially affect one of the following parts:

It affects Pulsar Sinks that implement Sink, in fact now Record.getSchema() will return accurate schema information.

Documentation

No need for docs, previous behaviour was unexpected, the new behaviour is what you expect.

@eolivelli
Copy link
Contributor Author

I am working on the integration test, I have marked this PR as "draft" for early review

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eolivelli eolivelli marked this pull request as ready for review May 12, 2021 14:56
@congbobo184
Copy link
Contributor

LGTM!

@eolivelli eolivelli merged commit 90117b2 into apache:master May 14, 2021
@eolivelli eolivelli deleted the impl/pulsario-getreaderschema branch May 14, 2021 05:29
eolivelli added a commit to datastax/pulsar that referenced this pull request May 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants