Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Schema] Support consume multiple schema types messages by AutoConsumeSchema #10604

Merged
merged 16 commits into from
May 23, 2021

Conversation

gaoran10
Copy link
Contributor

Based on the PR #10573

Motivation

Support consuming multiple schema types messages by AutoConsumeSchema.

Modifications

Describe the modifications you've done.

Verifying this change

Add a new unit test to verify consuming multiple schema type messages by AutoConsumeSchema.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work
it is a good complement for all of the schema related enhancements we delivered in 2.8.0

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have merged #10573
please rebase.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codelipenghui
Copy link
Contributor

Error:  Tests run: 6, Failures: 1, Errors: 0, Skipped: 5, Time elapsed: 0.273 s <<< FAILURE! - in org.apache.pulsar.client.impl.schema.generic.GenericSchemaTest
Error:  testAutoAvroSchema(org.apache.pulsar.client.impl.schema.generic.GenericSchemaTest)  Time elapsed: 0.034 s  <<< FAILURE!
java.lang.NullPointerException
	at org.apache.pulsar.client.impl.schema.generic.GenericSchemaTest.verifyFooRecord(GenericSchemaTest.java:191)
	at org.apache.pulsar.client.impl.schema.generic.GenericSchemaTest.testEncodeAndDecodeGenericRecord(GenericSchemaTest.java:118)
	at org.apache.pulsar.client.impl.schema.generic.GenericSchemaTest.testAutoAvroSchema(GenericSchemaTest.java:81)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

@@ -61,7 +62,7 @@ public void testGetSchema() throws Exception {
any(TopicName.class),
any(byte[].class)))
.thenReturn(completableFuture);
SchemaInfo schemaInfoByVersion = schemaProvider.getSchemaByVersion(new byte[0]).get();
SchemaInfo schemaInfoByVersion = schemaProvider.getSchemaByVersion(new LongSchemaVersion(0).bytes()).get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like we are changing the behaviour.
can this be a problem ?
do we need to add some compatibility layer ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change of the class MultiVersionSchemaInfoProvider is not necessary and I revert this test. Thanks.

@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I left one suggestion for the test case, PTAL

Assert.assertEquals(genericRecord.get("id"), 10);
break;
case "k_one_v_three_schema_separate":
kv = (KeyValue<?, ?>) nativeObject;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a KeyValue<GenericRecord, GenericRecord>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll fix.

kv = (KeyValue<?, ?>) nativeObject;
jsonNode = ((GenericJsonRecord) kv.getKey()).getJsonNode();
Assert.assertEquals(jsonNode.get("id").intValue(), 1);
jsonNode = ((GenericJsonRecord) kv.getValue()).getJsonNode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about kv.getValue().getNativeObject() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that using the getNativeObject still needs to constraint convert to JsonNode.

@gaoran10 gaoran10 force-pushed the consume-multiple-schema-messages branch from 864a188 to 3a801e0 Compare May 20, 2021 02:20
@gaoran10
Copy link
Contributor Author

gaoran10 commented May 21, 2021

There are still two problems that need to be resolved when using AutoConsumeSchema to consume key-value schema messages.

  1. The keyValue schema data encoding does not consider the KeyValueEncodingType, so if the key and value schema are some but not some KeyValueEncodingType, this will cause an error when decoding message data.

    For example, these two schemas belong to one topic, there will be only one schema version due to they have the same schema data.

Schema.KeyValue(Schema.JSON(Schemas.PersonOne.class),
                        Schema.JSON(Schemas.PersonFour.class), KeyValueEncodingType.SEPARATED)
Schema.KeyValue(Schema.JSON(Schemas.PersonOne.class),
                        Schema.JSON(Schemas.PersonFour.class), KeyValueEncodingType.INLINE)
  1. Primitive schemas have empty schema data, this will cause the KeyValue schema data doesn't distinguish different primitive schemas.

    For example.

Schema.KeyValue(Schema.INT32, Schema.JSON(Schemas.PersonThree.class), KeyValueEncodingType.SEPARATED)
Schema.KeyValue(Schema.BOOL, Schema.JSON(Schemas.PersonThree.class), KeyValueEncodingType.SEPARATED)

Maybe we could adjust the key-value schema encoding logic.

Fix these two problems in the next PR.

@eolivelli
Copy link
Contributor

@gaoran10 I would postpone fixing those problems to a follow up patch
this patch is already quite big.

@@ -357,6 +358,11 @@ public void handleGetSchema(CommandGetSchema commandGetSchema) {
final long clientRequestId = commandGetSchema.getRequestId();
String serviceUrl = getBrokerServiceUrl(clientRequestId);
String topic = commandGetSchema.getTopic();
Optional<SchemaVersion> tempVersion = Optional.empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about:

 final Optional<SchemaVersion> schemaVersion;
 if (commandGetSchema.hasSchemaVersion()) {
     schemaVersion = Optional.of(commandGetSchema.getSchemaVersion()).map(BytesSchemaVersion::of);
} else {
    schemaVersion = Optional.empty()
 }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

SchemaVersion sv = BytesSchemaVersion.of(schemaVersion);
fetchSchemaIfNeeded(sv);
ensureSchemaInitialized(sv);
return adapt(schemaMap.get(sv).decode(bytes, schemaVersion), schemaVersion);
Copy link
Contributor

@congbobo184 congbobo184 May 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may don't need decode(bytes, schemaVersion), only use decode(bytes) best?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea!

Assert.assertEquals(jsonNode.get("age").intValue(), 18);
break;
case "avro_one_schema":
org.apache.avro.generic.GenericRecord genericRecord =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this type is PersonOne not GenericAvroRecord is right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that if using the Schema.AUTO_CONSUME(), the reader doesn't know the Pojo PersonOne so the decoding result is org.apache.avro.generic.GenericRecord.

@codelipenghui codelipenghui merged commit f5e10a9 into apache:master May 23, 2021
eolivelli pushed a commit to datastax/pulsar that referenced this pull request May 26, 2021
…eSchema (apache#10604)

Based on the PR apache#10573

Support consuming multiple schema types messages by AutoConsumeSchema.

(cherry picked from commit f5e10a9)
eolivelli pushed a commit to datastax/pulsar that referenced this pull request Jun 7, 2021
…eSchema (apache#10604)

Based on the PR apache#10573

Support consuming multiple schema types messages by AutoConsumeSchema.

(cherry picked from commit f5e10a9)
eolivelli added a commit to datastax/pulsar that referenced this pull request Jun 7, 2021
yangl pushed a commit to yangl/pulsar that referenced this pull request Jun 23, 2021
…eSchema (apache#10604)

Based on the PR apache#10573

### Motivation

Support consuming multiple schema types messages by AutoConsumeSchema.
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
…eSchema (apache#10604)

Based on the PR apache#10573

### Motivation

Support consuming multiple schema types messages by AutoConsumeSchema.
@gaoran10 gaoran10 deleted the consume-multiple-schema-messages branch April 8, 2022 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants