Spark streaming schema evolution + ABRiS #176
Hi @grantatspothero
@kevinwallimann Makes sense. There appears to be a way to invalidate a structured streaming plan, so you could potentially detect schema changes and rebuild the plan, but it is complicated. You can close this. Thanks for the help!
Hi @kevinwallimann. Using this code I am able to update the schema for each message, at least at the source; the problem is that the change is never reflected at the sink. You said Spark can't handle this out of the box, which I can confirm, but do you have any idea how to implement it?
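One way to act on a detected schema change, as hinted at in the comments above, is to drive the streaming query in short intervals and rebuild the plan when the registry reports a new version. The sketch below is hypothetical and not part of ABRiS: it models the control loop only, with a plain list of observed versions standing in for registry polls and comments marking where the real Spark calls would go.

```python
def restart_on_schema_change(version_feed):
    """Count how many times the streaming plan would be (re)built.

    version_feed: the latest schema version observed at each poll tick,
    e.g. from SchemaRegistryClient.get_latest_version(subject) in a
    real job (hypothetical stand-in here).
    """
    active_version = None
    restarts = 0
    for version in version_feed:
        if version != active_version:
            # In real Spark code this is where you would stop the query,
            # rebuild from_avro / ABRiS config with the new reader schema,
            # and start a new query from the same checkpoint location.
            active_version = version
            restarts += 1
        # Placeholder for query.awaitTermination(poll_interval),
        # which lets the stream run until the next poll tick.
    return restarts
```

For example, a feed of versions `[1, 1, 2, 2, 3]` triggers three plan builds: the initial start plus one restart for each schema evolution.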
This thread discusses a change to ABRiS to avoid hitting the schema registry repeatedly during spark execution by only fetching the latest schema once on the driver:
#105 (comment)
However, doesn't this mean that if a new version of an event is registered in the schema registry after the Spark streaming job is kicked off, the job will never pick up the new schema version until it is restarted?
The above thread discussed potential workarounds, one of them being a parameter to tune how often the schema registry should be polled for the latest schema version: #105 (comment)
Is this something reasonable to ask? It would be nice, since you could have long-running Spark streaming jobs that dynamically ingest new versions of events as they are produced.
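The poll-interval workaround mentioned above could be sketched as a driver-side cache that refetches the latest schema only after a configurable interval, rather than once per request or exactly once at job start. Everything below is an assumption for illustration: `CachedSchemaFetcher` and its parameters are hypothetical and not part of ABRiS or the schema registry client.

```python
import time


class CachedSchemaFetcher:
    """Hypothetical driver-side cache for the latest reader schema.

    fetch: a callable that returns the latest schema from the registry
    (e.g. wrapping a real schema registry client call).
    poll_interval_s: how long to serve the cached schema before
    refetching, the tunable parameter discussed in the thread.
    """

    def __init__(self, fetch, poll_interval_s):
        self.fetch = fetch
        self.poll_interval_s = poll_interval_s
        self._schema = None
        self._fetched_at = float("-inf")

    def latest(self, now=None):
        # `now` is injectable for testing; real code uses the clock.
        now = time.monotonic() if now is None else now
        if now - self._fetched_at >= self.poll_interval_s:
            # Interval elapsed: hit the registry again.
            self._schema = self.fetch()
            self._fetched_at = now
        return self._schema
```

With a 60-second interval, repeated calls within the window reuse the cached schema and only the first call after the window expires goes back to the registry, which bounds registry traffic while still picking up new versions eventually.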