UX: CSAS returns control to UI before completing, subsequent statement fails with Unable to verify the AVRO schema is compatible with KSQL. Subject not found. #1394

Closed
rmoff opened this issue Jun 8, 2018 · 14 comments · Fixed by #8614
Assignees
Labels
bug streaming-engine Tickets owned by the ksqlDB Streaming Team user-experience
Projects
Milestone

Comments

@rmoff
Contributor

rmoff commented Jun 8, 2018

I want to be able to run a set of KSQL statements, either cut & paste as part of a demo, or as a script passed to KSQL.

I've hit a problem with this:

ksql> CREATE STREAM CUSTOMERS_SRC_REKEY WITH (PARTITIONS=1) AS SELECT * FROM CUSTOMERS_SRC PARTITION BY ID;

 Message
----------------------------
 Stream created and running
----------------------------
ksql> CREATE TABLE CUSTOMERS WITH (KAFKA_TOPIC='CUSTOMERS_SRC_REKEY', VALUE_FORMAT ='AVRO', KEY='ID');
 Unable to verify the AVRO schema is compatible with KSQL. Subject not found.; error code: 40401
ksql> CREATE TABLE CUSTOMERS WITH (KAFKA_TOPIC='CUSTOMERS_SRC_REKEY', VALUE_FORMAT ='AVRO', KEY='ID');
 Unable to verify the AVRO schema is compatible with KSQL. Subject not found.; error code: 40401
ksql> CREATE TABLE CUSTOMERS WITH (KAFKA_TOPIC='CUSTOMERS_SRC_REKEY', VALUE_FORMAT ='AVRO', KEY='ID');

 Message
---------------
 Table created
---------------
ksql>

What I think is happening is that the CSAS results in a new Avro schema, which is not registered in time before the subsequent statement (which requires it) is executed. The second statement has to be manually retried until it succeeds (a few seconds later).

KSQL needs to either (a) wait until the first statement has fully executed or (b) before throwing the error in the second statement, check if there is some pending change that would allow it to complete without error.
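Until one of those options exists, the manual retry described above could be scripted along these lines. This is only a minimal Python sketch under stated assumptions, not part of KSQL: `submit` is a hypothetical callable standing in for whatever sends one statement to the server and returns its response text, `make_fake_submit` is a stand-in used purely for illustration, and the attempt count and delay are arbitrary. The check keys off the `40401` code shown in the output above.

```python
import time

# Error code that accompanies "Subject not found" in the output above.
SUBJECT_NOT_FOUND = "40401"


def run_with_retry(submit, statement, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Submit `statement`, retrying while the schema subject is still missing.

    `submit` is a hypothetical callable (an assumption of this sketch) that
    sends one statement and returns the response text.
    """
    response = ""
    for attempt in range(1, attempts + 1):
        response = submit(statement)
        if SUBJECT_NOT_FOUND not in response:
            # Either success or an unrelated error; the caller decides which.
            return response
        if attempt < attempts:
            sleep(delay_s)
    # Still "subject not found" after the final attempt.
    return response


def make_fake_submit(failures):
    """Illustrative stand-in for the server: fail `failures` times, then succeed."""
    calls = {"n": 0}

    def submit(statement):
        calls["n"] += 1
        if calls["n"] <= failures:
            return ("Unable to verify the AVRO schema is compatible with KSQL. "
                    "Subject not found.; error code: 40401")
        return "Table created"

    return submit
```

The retry is deliberately bounded: if the schema never gets registered, an unbounded loop would hang the script instead of surfacing the error.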

@apurvam
Contributor

apurvam commented Nov 30, 2018

I think the fundamental issue is going to be fixed by #2159

Once we update the CLI to use those changes, it should address this bug.

cc @rodesai @vcrfxia: please cross check if the above is true!

@vcrfxia
Contributor

vcrfxia commented Nov 30, 2018

@apurvam yes, this is exactly the sort of bug #2159 is intended to address -- thanks for connecting the dots! Our plan, in a follow-up PR, is to have the CLI wait by default until all previous statements have executed before validating new ones, which should indeed fix this bug.

EDIT: Turns out I misunderstood the bug called out in this issue, and #2159 will not fix this bug. See new response below.

@rmoff
Contributor Author

rmoff commented Dec 7, 2018

@vcrfxia how will this work in the situation where there are no new records arriving in the source (in the above example, CUSTOMERS_SRC_REKEY)? From what I've observed (and I could be wrong), the topic won't get created and the schema won't be registered until at least one record has flowed through.

@vcrfxia
Contributor

vcrfxia commented Dec 11, 2018

Consulted with @rodesai and it turns out #2159 will not fix this bug, for the reason brought up by @rmoff . Currently, KSQL does not register Avro schemas for a topic created via a CSAS statement until the first record flows through. #2159 does not change this behavior, and will not fix the bug as a result. To fix the bug, KSQL should create the Avro schema as part of acknowledging the CSAS statement, rather than waiting for the first record to flow through and having the serializer create the schema.

cc @apurvam
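For scripts that cannot wait for a fix, one way to detect when the serializer has registered the schema is to poll Schema Registry directly. The Python sketch below assumes the default subject naming (`<topic>-value`) and a registry reachable at a URL you supply; the function names, timeout, and poll interval are illustrative choices, not anything KSQL provides. Given the behavior described above, the wait will time out if no record ever flows through the new topic.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request


def subject_exists(registry_url, subject, opener=urllib.request.urlopen):
    """Return True if Schema Registry lists at least one version for `subject`."""
    quoted = urllib.parse.quote(subject, safe="")
    url = f"{registry_url}/subjects/{quoted}/versions"
    try:
        with opener(url) as resp:
            return len(json.load(resp)) > 0
    except urllib.error.HTTPError as err:
        if err.code == 404:  # subject not registered (yet)
            return False
        raise


def wait_for_value_schema(registry_url, topic, timeout_s=30.0, poll_s=0.5,
                          exists=subject_exists, clock=time.monotonic,
                          sleep=time.sleep):
    """Poll until the value schema for `topic` exists; False on timeout."""
    subject = f"{topic}-value"  # assumption: default subject naming strategy
    deadline = clock() + timeout_s
    while True:
        if exists(registry_url, subject):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A script would call `wait_for_value_schema(registry_url, "CUSTOMERS_SRC_REKEY")` between the CSAS and the dependent statement, and only proceed if it returns `True`.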

@apurvam
Contributor

apurvam commented Dec 11, 2018

ok. Thanks for the follow up @vcrfxia . We can leave this open then.

@rmoff
Contributor Author

rmoff commented Dec 11, 2018

Thanks @vcrfxia , @apurvam.

For anyone encountering this issue, a workaround I use for my demos is to include a SELECT … LIMIT 1 after the CSAS and before the CT, which makes KSQL wait until it gets a record before continuing, e.g.:

SET 'auto.offset.reset' = 'earliest';
CREATE STREAM ACCOUNTS_STREAM WITH (KAFKA_TOPIC='asgard.demo.accounts', VALUE_FORMAT='AVRO');
CREATE STREAM ACCOUNTS_REKEYED WITH (PARTITIONS=1) AS SELECT * FROM ACCOUNTS_STREAM PARTITION BY ACCOUNT_ID;
-- This select statement is simply to make sure that we have time for the ACCOUNTS_REKEYED topic
-- to be created before we define a table against it
SELECT * FROM ACCOUNTS_REKEYED LIMIT 1;
CREATE TABLE ACCOUNTS WITH (KAFKA_TOPIC='ACCOUNTS_REKEYED',VALUE_FORMAT='AVRO',KEY='ACCOUNT_ID');

@rmoff
Contributor Author

rmoff commented Jun 14, 2019

> To fix the bug, KSQL should create the Avro schema as part of acknowledging the CSAS statement, rather than waiting for the first record to flow through and having the serializer create the schema.

@vcrfxia is this on the roadmap? Is the only workaround, for now, to explicitly specify the Avro schema in the dependent CT statement?

@vcrfxia
Contributor

vcrfxia commented Jun 14, 2019

I'm not aware of it being on the immediate roadmap, but @MichaelDrogalis or @apurvam can answer definitively.

Yes, as you mentioned, one workaround is to specify the schema in the CT statement, rather than rely on schema inference. Another workaround (for the CLI) is to follow the CSAS with a SELECT ... LIMIT 1 on the stream as you mentioned in #2880. I'm not aware of any other workarounds at the moment.

@MichaelDrogalis
Contributor

We don't have this one on the roadmap right now, unfortunately.

@big-andy-coates big-andy-coates added this to Needs triage in Bugs Oct 25, 2019
@ivangfr

ivangfr commented Jul 10, 2020

I am facing the same problem since I migrated from version 5.4.1 to 5.5.1. Here is the project (https://github.com/ivangfr/springboot-kafka-debezium-ksql/tree/project-update-migration-ksqldb). The migration changes and some adjustments are in the branch project-update-migration-ksqldb, not in master yet.

Besides, the workaround proposed by @rmoff doesn't work for me. Here is the script: https://github.com/ivangfr/springboot-kafka-debezium-ksql/blob/project-update-migration-ksqldb/docker/ksql/researchers-institutes.ksql

For now, the only way it works is by running each CREATE STREAM and CREATE TABLE statement individually in the ksqlDB CLI terminal.


Update

Btw, I am no longer facing the problem described above about running a script inside ksqlDB. Everything looks OK now in my project.

@rmoff
Contributor Author

rmoff commented Nov 25, 2021

@vcrfxia should this now be fixed, based on #4219 ?
@krisajenkins hit this error again today (running Confluent Platform 7.0.0), and I knew I'd seen it somewhere before :)

@vcrfxia
Contributor

vcrfxia commented Nov 29, 2021

Hey @rmoff @krisajenkins , are you submitting the two statements (where the latter depends on the former) as two separate requests through the UI or as a single request? I would not expect them to work together as a single request for a reason similar to #4800. If you're submitting them separately and it's still not working I'd be curious to know:

  • is ksql registering a schema with Schema Registry when the first CREATE STREAM command is issued?
  • are there any logs (most likely in Schema Registry, potentially also forwarded to ksql) suggesting that ksql tried to register a schema but encountered errors?
  • does the second statement work after waiting a while, i.e., is it an issue of async propagation?
  • can you share your exact statements for reproducing the issue for further investigation?

@rmoff
Contributor Author

rmoff commented Nov 30, 2021

@vcrfxia Running it as a single request via RUN

@colinhicks colinhicks added needs-triage streaming-engine Tickets owned by the ksqlDB Streaming Team labels Nov 30, 2021
@colinhicks
Member

Adding workflow labels while we assess whether this can be addressed at the same time as #4800

@jzaralim jzaralim added this to the 0.24.0 milestone Dec 14, 2021
@vcrfxia vcrfxia modified the milestones: 0.24.0, 0.25.0 Jan 4, 2022
Bugs automation moved this from Needs triage to Closed Feb 3, 2022
mjsax added a commit that referenced this issue Feb 3, 2022
If multiple statements are executed "at once", i.e., submitted via an input file, earlier statements may create new schemas that later statements depend on. During the sandbox execution, we don't register new schemas in SR, and thus dependent statements fail inside the sandbox because those schemas are not available to them.

This fix adds a schema cache inside the sandbox to capture new schemas and make them available to dependent statements inside the sandbox.

Co-authored-by: Victoria Xia <victoria.f.xia281@gmail.com>

Closes #1394
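The idea in the commit message above can be illustrated roughly as follows. ksqlDB itself is written in Java, so this Python sketch is not the actual change; the class and method names are hypothetical, and `real_lookup` is an assumed callable that returns a schema for a subject or `None` if the real registry has none.

```python
class SubjectNotFound(Exception):
    """Raised when neither the sandbox cache nor the real registry has the subject."""


class SandboxSchemaRegistry:
    """Validation-only view of a schema registry (illustrative sketch).

    Registrations are kept in an in-memory cache and never written through;
    lookups check the cache first and then fall back to the real registry.
    """

    def __init__(self, real_lookup):
        # Hypothetical read-only accessor for the real registry.
        self._real_lookup = real_lookup
        self._cache = {}

    def register(self, subject, schema):
        # Capture the schema for later statements without touching the real registry.
        self._cache[subject] = schema

    def latest_schema(self, subject):
        if subject in self._cache:
            return self._cache[subject]
        schema = self._real_lookup(subject)
        if schema is None:
            raise SubjectNotFound(subject)
        return schema
```

With a cache like this, validating a CSAS inside the sandbox can record the schema it would create, so a later statement in the same file that reads that subject no longer fails with "Subject not found" during validation.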
@mjsax mjsax modified the milestones: 0.25.0, 0.24.0 Feb 4, 2022
8 participants