New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: introduce new api to control schema translation #8309
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lihaosky ! This refactor LGTM but I have a question about how it fits into the larger KLIP-56 project: once this PR is merged, it will be the case that if key_schema_id
/value_schema_id
is provided, then ksqlDB will no longer uppercase fields in the schema read from Schema Registry, right? Doesn't that mean this PR should be accompanied by changes to the QTTs which explicitly set key_schema_id
or value_schema_id
to expect lowercase field names, instead of uppercase? How come I don't see those changes?
Also one more question: I know there are more PRs planned for KLIP-56. Are we OK with being in a partial state for this project when the 0.24 release is cut? It doesn't look like development on this feature is happening behind a feature flag. |
Good question! I'm updating the tests now. What's general way of put this behind feature flag? Adding |
For larger features, we usually introduce a feature flag into KsqlConfig at the start of feature development and hide all functional changes behind the feature flag so it's safe to release at any point during feature development (even if the feature is only half-baked). In your particular case, I agree with your assessment: I think it's fine that we're in a half-baked state with this feature for 0.24 since the WITH clause properties already existed and were undocumented; I'm not aware of any users who are relying on the feature (though if that understanding is wrong then we might want to rethink this). I just wanted to double-check that we're not in a state where development of this feature impacts other ksql behaviors (e.g., if the previous version of this PR had been merged, then existing schema inference would've resulted in field case being preserved, which would have been a visible change). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lihaosky -- LGTM besides the minor comment I left inline, but I notice that the build is also failing. The current failures seem to be checkstyle related. It'd be good to unblock those in case it's hiding test failures.
Also, it'd be good to merge this PR soon in order to block the 0.24 release cut. I'd rather not cut the release without this PR being merged since then we'd be writing key_schema_id
and value_schema_id
into the command topic as part of a released version, and we'd have to maintain backwards compatibility for such command topic entries. Unless you think this PR will be ready to merge by the end of tomorrow (Friday), can you open a separate PR just to back out the changes that are causing key_schema_id
and value_schema_id
to be written into the command topic, in order to unblock the release? Thanks!
{ | ||
"name": "validate schema id without elements OK - PROTOBUF", | ||
"statements": [ | ||
"CREATE STREAM INPUT WITH (kafka_topic='input', value_format='PROTOBUF');", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see schema id in this statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vcrfxia ! Should be able to merge this one by end of tomorrow.
{ | ||
"name": "validate schema id without elements OK - PROTOBUF", | ||
"statements": [ | ||
"CREATE STREAM INPUT WITH (kafka_topic='input', value_format='PROTOBUF');", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
315c4a6
to
9455c58
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lihaosky -- LGTM besides the comments inline (and the failing build).
@@ -57,7 +57,7 @@ ASSERT VALUES bar (rowtime, id, col1) VALUES (1, 1, 1); | |||
--@test: basic test with key_schema_id (format AVRO) | |||
---------------------------------------------------------------------------------------------------- | |||
CREATE STREAM foo (id INT KEY, col1 INT) WITH (kafka_topic='foo', key_format='AVRO', value_format='AVRO', key_schema_id=123); | |||
ASSERT STREAM foo (id INT KEY, col1 INT) WITH (kafka_topic='foo', key_format='AVRO', value_format='AVRO', key_schema_id=123); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being removed? The test is called basic test with key_schema_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was added by my previos PR when putting schema id to format properties: 84e1765. Not we don't put it there, it's deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we should delete the entire test, right? The test currently no longer tests what the name of the test suggests.
ASSERT STREAM bar (id INT KEY, col1 INT) WITH (kafka_topic='BAR', value_format='JSON', wrap_single_value=false); | ||
|
||
---------------------------------------------------------------------------------------------------- | ||
--@test: assert format doesn't support schema_id property should fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test being removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we previously put schema id to format properties. We expect KAFKA not support it and throw exception, now we don't put it there, it will be ignored. It was added in 84e1765. I have next PR to do validation so that we throw exceptions when schema id is provided and format doesn't support schema inference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me take a look at build again...
@@ -57,7 +57,7 @@ ASSERT VALUES bar (rowtime, id, col1) VALUES (1, 1, 1); | |||
--@test: basic test with key_schema_id (format AVRO) | |||
---------------------------------------------------------------------------------------------------- | |||
CREATE STREAM foo (id INT KEY, col1 INT) WITH (kafka_topic='foo', key_format='AVRO', value_format='AVRO', key_schema_id=123); | |||
ASSERT STREAM foo (id INT KEY, col1 INT) WITH (kafka_topic='foo', key_format='AVRO', value_format='AVRO', key_schema_id=123); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was added by my previos PR when putting schema id to format properties: 84e1765. Not we don't put it there, it's deleted
ASSERT STREAM bar (id INT KEY, col1 INT) WITH (kafka_topic='BAR', value_format='JSON', wrap_single_value=false); | ||
|
||
---------------------------------------------------------------------------------------------------- | ||
--@test: assert format doesn't support schema_id property should fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we previously put schema id to format properties. We expect KAFKA not support it and throw exception, now we don't put it there, it will be ignored. It was added in 84e1765. I have next PR to do validation so that we throw exceptions when schema id is provided and format doesn't support schema inference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the build passes. Thanks @lihaosky !
Description
key_schema_id
andvalue_schema_id
fromConnectFormat
.key_schema_id
andvalue_schema_id
from format properties.Testing done
Update unit test
Reviewer checklist