New schema version registered on INSERT VALUES #6091
Comments
tl;dr: This is a real bug, but I'm going to downgrade it.

To add some background on the issue: we eagerly register schemas when we create streams/tables that don't already have one, to make sure that compatibility will be maintained going forward. The default behavior for Connect-based serializers is to register the schema when data is produced. Unfortunately, it seems that the serializers use a different mechanism for building the schema (namely, we don't construct the schema using the basic translator). TBH, I'm not entirely sure why we don't just use the vanilla converter (as we do for both the JSON_SR and PROTOBUF formats). The javadoc mentions:
I don't know if this is something legacy or if it is actually needed. We need to dig more into that, but long story short, this shouldn't cause us any true issues: the schema registry would not let us register the new schema if it were incompatible with the old one, and it will only register the new version on the first insert.
@agavra - would it be correct to say that this issue applies to any serialization that happens after stream creation, not just `INSERT VALUES`? Also, would the loss of the Connect metadata attributes have any effect on schema compatibility/validation downstream for data types like `DECIMAL`?
What other types of serialization were you thinking of? This might also be a problem for
I believe we correctly account for precision/scale for decimals specifically, but I think there might be compatibility issues with certain schema registry configurations (e.g. see #7174, which has a very similar root cause), not so much on the Connect side of things. That being said, this is one of those situations where we'd need to test every combination of things to check whether a problem could exist - but since we still use the Connect serializers as well as schema registry, we should never register a truly incompatible schema.
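For context on the decimal question above: in Avro, precision and scale are carried in the schema body itself via the `decimal` logical type, not in Connect metadata attributes like `connect.name`, so they do participate in schema comparison. A minimal illustration (the schema fragments are hypothetical, not taken from the issue):

```python
import json

# Two hypothetical Avro decimal schema fragments that differ only in scale.
# Precision/scale live in the schema body itself, unlike Connect metadata
# such as "connect.name", so a scale change produces a different schema.
a = {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 2}
b = {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 3}

print(json.dumps(a) == json.dumps(b))  # False: scale is part of the schema
```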
Why cosmetic? It seems this actually breaks things. Bumping to P0.
Below is another test. The following SQLs create two internal topics, `_confluent-ksql-default_query_CTAS_MATERIALIZED_DATA_21-Aggregate-GroupBy-repartition` and `_confluent-ksql-default_query_CTAS_MATERIALIZED_DATA_21-Aggregate-Aggregate-Materialize-changelog`:

```sql
create stream data (col1 int key, col2 string key, col3 string)
  with (kafka_topic='data', format='avro', partitions=1);
```

(The two registered schema bodies are not reproduced here.) Should the names of the two schemas be "MaterializedDataKey", not "DataKey"?
How we register the schema during insertion for Avro: https://github.com/confluentinc/ksql/blob/master/ksqldb-serde/src/main/java/io/confluent/ksql/serde/avro/KsqlAvroSerdeFactory.java#L171

How we register the schema during creation for Avro: https://github.com/confluentinc/ksql/blob/master/ksqldb-serde/src/main/java/io/confluent/ksql/serde/avro/AvroSchemaTranslator.java#L34

Notice that at creation time, that configuration is not applied. We should unify the config in one place for schema registration during creation and insertion.
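The unification proposed above amounts to having the creation path and the insertion path call one registration helper, so that translator options cannot drift apart and produce two schema versions. A rough sketch of the idea (all names are invented; `connect.meta.data` is the Connect AvroData config that controls metadata attributes such as `connect.name`, used here only as an example option):

```python
# Sketch of "one place for schema registration": both the CREATE STREAM
# path and the INSERT VALUES path go through register_subject(), so the
# translated schema string is identical and the registry dedupes it.
class FakeRegistry:
    """Toy stand-in for Schema Registry: identical schemas get one version."""
    def __init__(self):
        self.versions = {}  # subject -> list of schema strings

    def register(self, subject, schema):
        versions = self.versions.setdefault(subject, [])
        if schema not in versions:  # registry dedupes identical schemas
            versions.append(schema)
        return versions.index(schema) + 1  # 1-based version number

def translate(logical_schema, options):
    # Stand-in for Connect/Avro translation; if the two paths pass
    # different options here, the emitted schema strings differ and a
    # second version gets registered (the bug in this issue).
    return f"{logical_schema}|opts={sorted(options.items())}"

def register_subject(registry, subject, logical_schema, options):
    return registry.register(subject, translate(logical_schema, options))

registry = FakeRegistry()
opts = {"connect.meta.data": True}  # example option; shared by both paths
v_create = register_subject(registry, "data-value", "COL1 STRING", opts)  # CREATE STREAM
v_insert = register_subject(registry, "data-value", "COL1 STRING", opts)  # INSERT VALUES
print(v_create == v_insert == 1)  # True: one schema version, not two
```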
Thanks for the analysis and details, @lihaosky. Do you happen to have a sense of the LOE for unifying the schema-registration config?
Hi @colinhicks, I think it would take about 1.5 weeks to design and implement, plus one more week for proper testing and fixing broken QTT tests, etc. Since the change may not be backward compatible, we would need to find ways to fix ~1000 broken QTT tests and do whatever other verification is necessary to make sure it doesn't break anything, which could be time-consuming. So 2.5 to 3 weeks in total, maybe.
Describe the bug
When using Avro with Schema Registry in ksqlDB, a schema is registered for the `CREATE STREAM` statement (if one doesn't already exist). Then, when records are inserted with `INSERT VALUES`, a second version of the schema (with one minor metadata difference: no `connect.name` attribute) is registered. The second version seems unnecessary.

To Reproduce
Version: 0.11.0
Create a new Avro-based stream:
View the registered schema:
Insert a record:
View the new schema version:
Note that the only difference in the second schema version is the missing `connect.name` metadata attribute.

Expected behavior
Only one schema version is created when the stream is defined
Actual behaviour
See above
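For illustration, the "metadata-only" nature of the difference can be checked directly. The sketch below uses hypothetical schema bodies modeled on this issue (the real registry output was not captured above), and the stripping step is only a rough approximation of Avro's Parsing Canonical Form, which drops non-reserved attributes such as `connect.name`:

```python
import json

# Hypothetical schemas modeled on the issue: identical except that the
# second version lacks the "connect.name" metadata attribute.
v1 = json.loads("""
{"type": "record", "name": "KsqlDataSourceSchema",
 "namespace": "io.confluent.ksql.avro_schemas",
 "connect.name": "io.confluent.ksql.avro_schemas.KsqlDataSourceSchema",
 "fields": [{"name": "COL1", "type": ["null", "string"], "default": null}]}
""")
v2 = json.loads("""
{"type": "record", "name": "KsqlDataSourceSchema",
 "namespace": "io.confluent.ksql.avro_schemas",
 "fields": [{"name": "COL1", "type": ["null", "string"], "default": null}]}
""")

# Keep only structural attributes (a simplification: real canonical form
# also drops "default", but defaults do matter for compatibility checks).
RESERVED = {"type", "name", "namespace", "fields", "items", "values",
            "symbols", "size", "default"}

def strip_metadata(node):
    if isinstance(node, dict):
        return {k: strip_metadata(v) for k, v in node.items() if k in RESERVED}
    if isinstance(node, list):
        return [strip_metadata(v) for v in node]
    return node

print(v1 == v2)                                  # False: raw JSON differs
print(strip_metadata(v1) == strip_metadata(v2))  # True: the diff is metadata-only
```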