Headless KSQL to support schema inference (Avro) #1530

Closed
miguno opened this issue Jul 4, 2018 · 8 comments
Labels
avro Issues involving Avro serialisation bug
Projects

Comments

@miguno
Contributor

miguno commented Jul 4, 2018

Schema inference doesn't work in headless KSQL. For example, even when your input data is in Avro format, you must still specify column names and types manually when KSQL runs in headless mode.

The problem with headless (non-interactive) mode is that the engine parses all the queries in the set up front, and fails if any of the referenced fields are not found in the metastore. Schema inference is done during execution, which happens after parsing. So if a subsequent query refers to an inferred field, it fails with a parse exception.

  • The workaround is to manually specify the columns.
  • The solution is to parse and execute queries one by one in all modes.
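The ordering problem described above can be sketched with a toy model. This is not KSQL's code: `parse`, `execute`, `SCHEMA_REGISTRY`, and the statement dictionaries are hypothetical stand-ins, and the only assumption taken from the issue is that parsing checks columns against the metastore while inference happens at execution time.

```python
# Toy model only: every name here is hypothetical, not part of KSQL.
SCHEMA_REGISTRY = {"pageviews": ["viewtime", "userid", "pageid"]}  # stand-in Avro schema


class ParseError(Exception):
    """Raised when a statement references a column the metastore lacks."""


def parse(stmt, metastore):
    """Validate referenced columns, then register the declared columns."""
    source = stmt.get("source")
    if source is not None:
        known = metastore.get(source, [])
        missing = [c for c in stmt["columns"] if c not in known]
        if missing:
            raise ParseError(f"{stmt['name']}: unknown columns {missing} in {source}")
    metastore[stmt["name"]] = list(stmt["columns"])


def execute(stmt, metastore):
    """Infer columns from the registry when a source declared none."""
    if stmt.get("source") is None and not stmt["columns"]:
        metastore[stmt["name"]] = list(SCHEMA_REGISTRY[stmt["topic"]])


def run_upfront(statements):
    """Headless behaviour as described: parse everything, then execute."""
    metastore = {}
    for stmt in statements:
        parse(stmt, metastore)
    for stmt in statements:
        execute(stmt, metastore)
    return metastore


def run_one_by_one(statements):
    """Proposed behaviour: parse and execute each statement in turn."""
    metastore = {}
    for stmt in statements:
        parse(stmt, metastore)
        execute(stmt, metastore)
    return metastore


# First statement relies on inference; the second reads inferred columns.
INFERRED = [
    {"name": "pageviews", "topic": "pageviews", "columns": []},
    {"name": "pageviews_by_user", "source": "pageviews", "columns": ["userid", "pageid"]},
]

# Workaround: the same query set with columns declared manually.
MANUAL = [
    {"name": "pageviews", "topic": "pageviews", "columns": ["viewtime", "userid", "pageid"]},
    INFERRED[1],
]
```

In this model `run_upfront(INFERRED)` raises `ParseError` on the second statement, while `run_one_by_one(INFERRED)` and `run_upfront(MANUAL)` both succeed, which mirrors the workaround and the proposed solution.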

Related Issues

We also have other issues because of this dichotomy between the way our interactive and non-interactive modes operate.

@miguno miguno added the bug label Jul 4, 2018
@miguno
Contributor Author

miguno commented Jul 4, 2018

cc @ybyzek @rmoff

@rmoff
Contributor

rmoff commented Jul 4, 2018

This is an important thing for us to fix to enable people to painlessly move from dev through to production.

@miguno miguno added the avro Issues involving Avro serialisation label Jul 19, 2018
@big-andy-coates big-andy-coates self-assigned this Jul 25, 2018
@big-andy-coates
Contributor

+1 from me - this is a biggie for production.

@yunhappy

yunhappy commented Oct 23, 2018

  1. +1 from me.
  2. Does setting the offset to earliest not work?

@miguno
Contributor Author

miguno commented Oct 29, 2018

@yunhappy: Please upvote (+1 / thumbs-up) the first comment in this thread; that way we can track the number of upvotes more easily.

@big-andy-coates
Contributor

big-andy-coates commented Dec 4, 2018

This is in direct conflict with issue #2205.

The issue with supporting Avro field inference in headless mode is that the schema could change over time. So we might start KSQL only to find the schema has changed, with columns added or removed. How should KSQL handle such a change?

Currently, in interactive mode, the schema is queried once and then baked into the command topic. We could support something similar for headless. I know @rodesai is looking to store other system metadata in a topic in headless mode. Maybe this could be too.
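A minimal sketch of the "query once, then bake in" idea, assuming a plain list stands in for the command topic and a dictionary for the schema registry. `bake` and `replay` are hypothetical helpers for illustration, not KSQL APIs.

```python
# Toy model only: the list and dict below stand in for the command topic
# and the schema registry; none of these names exist in KSQL.

def bake(statement, registry, command_log):
    """Resolve inferred columns once and append the fully specified statement."""
    baked = dict(statement)
    if not baked.get("columns"):
        # Copy the columns so later registry changes cannot leak in.
        baked["columns"] = list(registry[baked["topic"]])
    command_log.append(baked)
    return baked


def replay(command_log):
    """Rebuild the metastore from the log alone, never consulting the registry."""
    return {cmd["name"]: list(cmd["columns"]) for cmd in command_log}


registry = {"pageviews": ["viewtime", "userid"]}
command_log = []

bake({"name": "pageviews", "topic": "pageviews", "columns": []}, registry, command_log)

# The schema evolves after the statement was baked.
registry["pageviews"].append("pageid")

# A restart replays the log and still sees the originally resolved columns.
metastore = replay(command_log)
```

Because the resolved columns live in the log, a restart sees the same schema regardless of what the registry holds later. The trade-off raised in the comment remains: the new `pageid` column is invisible until schema evolution is handled explicitly.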

Alternatively, we can look at supporting schema evolution in both headless and interactive modes. But this is a larger piece of work and probably a longer-term goal / feature.

As for #2205, I see that as just a quick 'make KSQL fail hard and fast' until we get around to adding true support for this, i.e. fix the fact it currently parses OK, but doesn't actually work.

@dhanasgit

+1

@miguno
Contributor Author

miguno commented Apr 7, 2022

Closing this issue. If there are future requests similar to this, please reopen or create a new issue.

@miguno miguno closed this as completed Apr 7, 2022
Bugs automation moved this from Needs triage to Closed Apr 7, 2022

5 participants