-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue 11007] Schema.NATIVE_AVRO: add a version of AUTO_PRODUCE_BYTES that doesn't validate the message in encode
#11238
Conversation
encode
encode
encode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for sharing this work.
I left one comment
pulsar-client-api/src/main/java/org/apache/pulsar/client/api/Schema.java
Outdated
Show resolved
Hide resolved
@Zhen-hao thanks for your contribution. For this PR, do we need to update docs? |
I think so. Maybe next to that of I need to figure out why the build is failing and where to add tests. |
.../src/main/java/org/apache/pulsar/client/impl/schema/AutoProduceValidatedAvroBytesSchema.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pulsar/client/impl/schema/AutoProduceValidatedAvroBytesSchema.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pulsar/client/impl/schema/AutoProduceValidatedAvroBytesSchema.java
Outdated
Show resolved
Hide resolved
pulsar-client-api/src/main/java/org/apache/pulsar/client/api/Schema.java
Show resolved
Hide resolved
@Anonymitaet I see Should I only change |
how often does the CI run? I find the feedback loop too long. |
They are updated manually. If the doc changes apply to several versions, we need to update them all. Would you like to add docs accordingly? Then you can ping me to review, thanks |
I don't understand why the build is failing with
In @Override
public SchemaInfo getSchemaInfo() {
ensureSchemaInitialized();
return schema.getSchemaInfo();
} |
AutoProduceBytesSchema already has a "schema" field, and you are adding a new field that hides the inherited one |
OK. I think the easiest to do is copy the same code to the new class. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion we are on our way.
We have to add tests.
You can find examples like SimpleSchemaTest or SchemaTest....
@sijie is current approach following your expectations?
Tagging @codelipenghui @congbobo184 @gaoran10
.../src/main/java/org/apache/pulsar/client/impl/schema/AutoProduceValidatedAvroBytesSchema.java
Outdated
Show resolved
Hide resolved
I will find time over the weekend to add the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good!
I left one last comment about getNativeSchema
|
||
@Override | ||
public Optional<Object> getNativeSchema() { | ||
return Optional.ofNullable(schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you have to return the original org.apache.avro.Schema object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment looks not addressed yet
there is one test error I can't figure out:
from this code what am I missing here? @eolivelli |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try to use the debugger with a break point in that point ? probably you have not overridden every method in the Schema interface.
by the way I believe that your new NativeAvroBytesSchema
is useful only for the Producer.
if you want to read raw bytes you simply do not set a Schema in the consumer or you use Schema.BYTES.
NativeAvroBytesSchema
does not perform validation and it does not decode the payload, so it is not worth to be used on the Consumer side.
I suggest you to:
- throw UnsupportedOperationException in all "decode" methods of NativeAvroBytesSchema
- write explicitly in the Javadoc that this is a Schema to be used on the Producer side
- use Schema.BYTES (or no Schema) in the Consumer in your test
|
||
@Override | ||
public Optional<Object> getNativeSchema() { | ||
return Optional.ofNullable(this.nativeSchema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it cannot be null
|
||
@Override | ||
public Schema<byte[]> clone() { | ||
return new AutoProduceBytesSchema<>(schema.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return new NativeAvroBytesSchema(nativeSchema)
You are right. I've made the changes as you suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@sijie can you take another look please ?
@Zhen-hao I believe you can remove the "draft" flag from this PR |
Cool. I still need to update some documentation page before we can merge. |
where do you want to add docs apart from the JavaDocs ? |
I see maybe @Anonymitaet knows better if this is necessary. |
it is a good idea, btw we can separate code changes from documentation changes. you can follow up with a separate PR. |
If other reviewers agree, I don't mind doing that;) |
@Zhen-hao looks good to me! |
…te the message in `encode` (apache#11238) Fixes apache#11007 ### Motivation When ingesting event/message data from external systems such as Kafka and Cassandra, the events very often are already serialized with Avro with the schemas also available. In such cases, a Pulsar producer doesn't need to perform the validation step again when sending the events to a topic. ### Modifications Introduce a new class `AutoProduceValidatedAvroBytesSchema` that ~~extends `AutoProduceBytesSchema`~~ implements `Schema<byte[]>`. ~~TODO: make the `public AutoProduceValidatedAvroBytesSchema(org.apache.avro.Schema schema)` constructor accessible to the client.~~ Add `NATIVE_AVRO` method to `org.apache.pulsar.client.api.Schema` which calls `AutoProduceValidatedAvroBytesSchema`'s constructor via reflection.
@Zhen-hao thanks for your great work. Would you like to add docs accordingly? Then you can ping me to review, thanks |
That's the plan. I was overwhelmed by work. I will find time later this week or next week to make a new PR. |
@Anonymitaet unfortunately, I haven't got to it yet. but it is on my list... |
@Zhen-hao thanks for your feedback. Please do not forget to add docs to let users know the changes. |
encode
encode
…te the message in `encode` (apache#11238) Fixes apache#11007 ### Motivation When ingesting event/message data from external systems such as Kafka and Cassandra, the events very often are already serialized with Avro with the schemas also available. In such cases, a Pulsar producer doesn't need to perform the validation step again when sending the events to a topic. ### Modifications Introduce a new class `AutoProduceValidatedAvroBytesSchema` that ~~extends `AutoProduceBytesSchema`~~ implements `Schema<byte[]>`. ~~TODO: make the `public AutoProduceValidatedAvroBytesSchema(org.apache.avro.Schema schema)` constructor accessible to the client.~~ Add `NATIVE_AVRO` method to `org.apache.pulsar.client.api.Schema` which calls `AutoProduceValidatedAvroBytesSchema`'s constructor via reflection. (cherry picked from commit a78b029)
…te the message in `encode` (apache#11238) Fixes apache#11007 When ingesting event/message data from external systems such as Kafka and Cassandra, the events very often are already serialized with Avro with the schemas also available. In such cases, a Pulsar producer doesn't need to perform the validation step again when sending the events to a topic. Introduce a new class `AutoProduceValidatedAvroBytesSchema` that ~~extends `AutoProduceBytesSchema`~~ implements `Schema<byte[]>`. ~~TODO: make the `public AutoProduceValidatedAvroBytesSchema(org.apache.avro.Schema schema)` constructor accessible to the client.~~ Add `NATIVE_AVRO` method to `org.apache.pulsar.client.api.Schema` which calls `AutoProduceValidatedAvroBytesSchema`'s constructor via reflection. (cherry picked from commit a78b029)
…te the message in `encode` (apache#11238) Fixes apache#11007 ### Motivation When ingesting event/message data from external systems such as Kafka and Cassandra, the events very often are already serialized with Avro with the schemas also available. In such cases, a Pulsar producer doesn't need to perform the validation step again when sending the events to a topic. ### Modifications Introduce a new class `AutoProduceValidatedAvroBytesSchema` that ~~extends `AutoProduceBytesSchema`~~ implements `Schema<byte[]>`. ~~TODO: make the `public AutoProduceValidatedAvroBytesSchema(org.apache.avro.Schema schema)` constructor accessible to the client.~~ Add `NATIVE_AVRO` method to `org.apache.pulsar.client.api.Schema` which calls `AutoProduceValidatedAvroBytesSchema`'s constructor via reflection. (cherry picked from commit a78b029) (cherry picked from commit f4ef4f3)
…te the message in `encode` (apache#11238) Fixes apache#11007 ### Motivation When ingesting event/message data from external systems such as Kafka and Cassandra, the events very often are already serialized with Avro with the schemas also available. In such cases, a Pulsar producer doesn't need to perform the validation step again when sending the events to a topic. ### Modifications Introduce a new class `AutoProduceValidatedAvroBytesSchema` that ~~extends `AutoProduceBytesSchema`~~ implements `Schema<byte[]>`. ~~TODO: make the `public AutoProduceValidatedAvroBytesSchema(org.apache.avro.Schema schema)` constructor accessible to the client.~~ Add `NATIVE_AVRO` method to `org.apache.pulsar.client.api.Schema` which calls `AutoProduceValidatedAvroBytesSchema`'s constructor via reflection.
(If this PR fixes a github issue, please add
Fixes #<xyz>
.)Fixes #11007
Motivation
When ingesting event/message data from external systems such as Kafka and Cassandra, the events very often are already serialized with Avro with the schemas also available. In such cases, a Pulsar producer doesn't need to perform the validation step again when sending the events to a topic.
Modifications
Introduce a new class
AutoProduceValidatedAvroBytesSchema
thatextendsimplementsAutoProduceBytesSchema
Schema<byte[]>
.TODO: make thepublic AutoProduceValidatedAvroBytesSchema(org.apache.avro.Schema schema)
constructor accessible to the client.Add
NATIVE_AVRO
method toorg.apache.pulsar.client.api.Schema
which callsAutoProduceValidatedAvroBytesSchema
's constructor via reflection.Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If
yes
was chosen, please highlight the changesDocumentation