-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Create stream fails when multiple Protobuf schema definitions exist #8933
Conversation
Thanks for this @spena! A bit more general question: multiple schema definitions can exist and users can specify what they want to pick -- however I noticed that multiple schema versions can also exist? Are we assuming users always pick the latest? |
@pgaref Yes, multiple schema versions can exist. There is another property to select the schema version or
Notice that I am not using a version in |
1f5aa27
to
975caa7
Compare
9b7bca1
to
de82ba2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aweseome! LGTM overall. Some small nits.
/** | ||
* Returns a list of schema names found in the {@code ParsedSchema}. | ||
*/ | ||
default List<String> getSchemaDefinitions(final ParsedSchema schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not applicable to avro or json_sr? They should have at least one definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. I just added the method to ConnectFormat
and return a single name as default there. I still don't know how Json and Avro have multiple definitions, so they will return one single name. For others formats, such as delimited, it still returns Zero (the default from Format.java). I also changed the method name to schemaFullNames
@@ -80,4 +85,18 @@ protected ConnectSchemaTranslator getConnectSchemaTranslator( | |||
return new ProtobufSerdeFactory(new ProtobufProperties(formatProps)) | |||
.createSerde(connectSchema, config, srFactory, targetType, isKey); | |||
} | |||
|
|||
@Override | |||
public List<String> getSchemaDefinitions(final ParsedSchema schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests
return new ProtobufData(new ProtobufDataConfig(updatedConfigs)) | ||
.fromConnectSchema(injectSchemaFullName(schema)); | ||
} | ||
|
||
private ProtobufSchema withSchemaFullName(final ProtobufSchema origSchema) { | ||
return fullNameSchema.map(origSchema::copy).orElse(origSchema); | ||
} | ||
|
||
private Schema injectSchemaFullName(final Schema origSchema) { | ||
return fullNameSchema | ||
.map(fullName -> SchemaFullNameAppender.appendSchemaFullName(origSchema, fullName)) | ||
.orElse(origSchema); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add some tests for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests
dab7996
to
45c2aec
Compare
@@ -106,6 +106,15 @@ public static boolean subjectExists( | |||
return getLatestSchema(srClient, subject).isPresent(); | |||
} | |||
|
|||
public static Optional<Integer> getSchemaId( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: getLatestSchemaId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (supportedProperties.contains(ConnectProperties.SCHEMA_ID)) { | ||
if (!formatInfo.getProperties().containsKey(ConnectProperties.SCHEMA_ID)) { | ||
SchemaRegistryUtil.getSchemaId(srClient, topicName, isKey) | ||
.ifPresent(schemaId -> | ||
propertiesBuilder.put(ConnectProperties.SCHEMA_ID, String.valueOf(schemaId))); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? This seems to change existing logic of data serialization drastically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we need the Serde factory to get the Schema and extract the specified schema definition from SR. This happens in ProtobufSerdeFactory.createSerde()
which calls SerdeUtils.getAndTranslateSchema()
when the schemaId is present.
Also, by setting the SCHEMA_ID, the ProtobufSerdeFactory.getConverter() disables the AUTO_REGISTER_SCHEMAS
to prevent failing because multiple schema definitions on SR.
The same happens for Avro and JsonSR factories. I could refactor the interfaces and classes to accept the physical schema extracted, which is what I need. But seems schemaId does the same, so I reuse it.
What other things does the schemaId do that it's for concern?
45c2aec
to
f76e136
Compare
f76e136
to
c3063ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. We will only inject schema id during insertion time when it's protobuf and has multiple messages.
One caveat is that if schema in SR has multiple messages, CREATE or replace
with more columns and then insertion
will always fail because old schema in SR has less columns and we don't do autoregistration to SR.
Approve the PR to unblock but @spena will merge after fixing the schema id injection logic and do more testing :)
50b92ba
to
b69920e
Compare
Description
Fixes #5265
Ignore 1st commit 58e4703 which is being reviewed here -> #8984
This PR supports the
KEY_SCHEMA_FULL_NAME
andVALUE_SCHEMA_FULL_NAME
properties in theWITH
clause when creating a stream. These properties allows users to specify the full name of a protobuf schema instead of using a default name (current default isConnectDefault1
).When registering a new schema, if the properties are given, then the schema will be registered in SR with the new name, i.e.
The KEY/VALUE properties can also be used to specify the name of the Protobuf message schema if multiple schema definitions are available in SR. i.e.
The KEY/VALUE properties are already supported for AVRO. This PR assumes those properties were meant for other formats too, but when Protobuf was implemented then these props weren't fixed.
Testing done
Manually. See the above examples.
I have some limitations with QTT that does not allow me to test the KEY/VALUE properties.
Reason:
Reviewer checklist