Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: introduce new api to control schema translation #8309

Merged
merged 9 commits into from Nov 5, 2021

Conversation

lihaosky
Copy link
Member

@lihaosky lihaosky commented Nov 2, 2021

Description

  • Remove key_schema_id and value_schema_id from ConnectFormat.
  • Introduce new api in Format to control how schema should be translated.
  • Keep current behavior (uppercase field names) for Schema Inference without providing schema id.
  • Remove key_schema_id and value_schema_id from format properties.
  • Remove historical_plans previously added as a result of schema_id in format properties.

Testing done

Update unit test

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@lihaosky lihaosky requested a review from vcrfxia November 2, 2021 20:46
@lihaosky lihaosky requested a review from a team as a code owner November 2, 2021 20:46
Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lihaosky ! This refactor LGTM but I have a question about how it fits into the larger KLIP-56 project: once this PR is merged, it will be the case that if key_schema_id/value_schema_id is provided, then ksqlDB will no longer uppercase fields in the schema read from Schema Registry, right? Doesn't that mean this PR should be accompanied by changes to the QTTs which explicitly set key_schema_id or value_schema_id to expect lowercase field names, instead of uppercase? How come I don't see those changes?

@vcrfxia
Copy link
Contributor

vcrfxia commented Nov 3, 2021

Also one more question: I know there are more PRs planned for KLIP-56. Are we OK with being in a partial state for this project when the 0.24 release is cut? It doesn't look like development on this feature is happening behind a feature flag.

@lihaosky
Copy link
Member Author

lihaosky commented Nov 3, 2021

Good question! I'm updating the tests now.

What's general way of put this behind feature flag? Adding key_schema_id and value_schema_id at last in property to make it available? BTW, key_schema_id and value_schema_id is already available for CS/CT before KLIP-56. KLIP-56 introduce this case fix and other validations. I was under the impression that updating documentation makes this available to use publicly.

@vcrfxia
Copy link
Contributor

vcrfxia commented Nov 3, 2021

For larger features, we usually introduce a feature flag into KsqlConfig at the start of feature development and hide all functional changes behind the feature flag so it's safe to release at any point during feature development (even if the feature is only half-baked). In your particular case, I agree with your assessment: I think it's fine that we're in a half-baked state with this feature for 0.24 since the WITH clause properties already existed and were undocumented; I'm not aware of any users who are relying on the feature (though if that understanding is wrong then we might want to rethink this). I just wanted to double-check that we're not in a state where development of this feature impacts other ksql behaviors (e.g., if the previous version of this PR had been merged, then existing schema inference would've resulted in field case being preserved, which would have been a visible change).

Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lihaosky -- LGTM besides the minor comment I left inline, but I notice that the build is also failing. The current failures seem to be checkstyle related. It'd be good to unblock those in case it's hiding test failures.

Also, it'd be good to merge this PR soon in order to block the 0.24 release cut. I'd rather not cut the release without this PR being merged since then we'd be writing key_schema_id and value_schema_id into the command topic as part of a released version, and we'd have to maintain backwards compatibility for such command topic entries. Unless you think this PR will be ready to merge by the end of tomorrow (Friday), can you open a separate PR just to back out the changes that are causing key_schema_id and value_schema_id to be written into the command topic, in order to unblock the release? Thanks!

{
"name": "validate schema id without elements OK - PROTOBUF",
"statements": [
"CREATE STREAM INPUT WITH (kafka_topic='input', value_format='PROTOBUF');",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see schema id in this statement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Member Author

@lihaosky lihaosky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vcrfxia ! Should be able to merge this one by end of tomorrow.

{
"name": "validate schema id without elements OK - PROTOBUF",
"statements": [
"CREATE STREAM INPUT WITH (kafka_topic='input', value_format='PROTOBUF');",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lihaosky -- LGTM besides the comments inline (and the failing build).

@@ -57,7 +57,7 @@ ASSERT VALUES bar (rowtime, id, col1) VALUES (1, 1, 1);
--@test: basic test with key_schema_id (format AVRO)
----------------------------------------------------------------------------------------------------
CREATE STREAM foo (id INT KEY, col1 INT) WITH (kafka_topic='foo', key_format='AVRO', value_format='AVRO', key_schema_id=123);
ASSERT STREAM foo (id INT KEY, col1 INT) WITH (kafka_topic='foo', key_format='AVRO', value_format='AVRO', key_schema_id=123);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being removed? The test is called basic test with key_schema_id.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added by my previos PR when putting schema id to format properties: 84e1765. Not we don't put it there, it's deleted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should delete the entire test, right? The test currently no longer tests what the name of the test suggests.

ASSERT STREAM bar (id INT KEY, col1 INT) WITH (kafka_topic='BAR', value_format='JSON', wrap_single_value=false);

----------------------------------------------------------------------------------------------------
--@test: assert format doesn't support schema_id property should fail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this test being removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we previously put schema id to format properties. We expect KAFKA not support it and throw exception, now we don't put it there, it will be ignored. It was added in 84e1765. I have next PR to do validation so that we throw exceptions when schema id is provided and format doesn't support schema inference

Copy link
Member Author

@lihaosky lihaosky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a look at build again...

@@ -57,7 +57,7 @@ ASSERT VALUES bar (rowtime, id, col1) VALUES (1, 1, 1);
--@test: basic test with key_schema_id (format AVRO)
----------------------------------------------------------------------------------------------------
CREATE STREAM foo (id INT KEY, col1 INT) WITH (kafka_topic='foo', key_format='AVRO', value_format='AVRO', key_schema_id=123);
ASSERT STREAM foo (id INT KEY, col1 INT) WITH (kafka_topic='foo', key_format='AVRO', value_format='AVRO', key_schema_id=123);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added by my previos PR when putting schema id to format properties: 84e1765. Not we don't put it there, it's deleted

ASSERT STREAM bar (id INT KEY, col1 INT) WITH (kafka_topic='BAR', value_format='JSON', wrap_single_value=false);

----------------------------------------------------------------------------------------------------
--@test: assert format doesn't support schema_id property should fail
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we previously put schema id to format properties. We expect KAFKA not support it and throw exception, now we don't put it there, it will be ignored. It was added in 84e1765. I have next PR to do validation so that we throw exceptions when schema id is provided and format doesn't support schema inference

Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the build passes. Thanks @lihaosky !

@lihaosky lihaosky merged commit 6913cf6 into confluentinc:master Nov 5, 2021
@lihaosky lihaosky deleted the refactor-translation branch November 5, 2021 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants