Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-46906][SS] Add a check for stateful operator change for streaming #44927
[SPARK-46906][SS] Add a check for stateful operator change for streaming #44927
Changes from 5 commits
eec170c
d2d79a4
98adaaa
89f67ad
37ef7fd
7ecb493
5241dbb
20121cd
e427824
0618e9e
a6619cb
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: when user adds/removes/changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we inline all the logic e.g. building opMapInMetadata and opMapInPhysicalPlan to here? I don't see we use these fields other than here. Let's scope fields and methods be narrower whenever possible.
That said, You don't need to use rule to build opMapInPhysicalPlan. Let's just use foreach to traverse the plan and build opMapInPhysicalPlan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we print out association between opId and opName in error message? It may be uneasy to understand what is mismatching only with opNames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if you are using more than two lines to define the method, param should start at second line of definition. (In other words, all params should appear at the same indentation.)
https://github.com/databricks/scala-style-guide?tab=readme-ov-file#indent
Also, param should start with lowercase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any automation tool for checking this other than
./dev/scalafmt
? This command is listed on the spark developer tool wiki, and is actually quite messy - it will touch all existing files other than only formatting my code change.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe community has to run the formatter at once for the whole codebase. I'm not sure scalafmt can deal with the whole styles though. It is still good to familiarize Scala style guide for Databricks; it doesn't only contain styles automation can handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Thanks Jungtaek!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please split down to separate test case if this is fully isolated with other check.
Btw, this actually brings up food for thought. Do we disallow stateful query to be stateless? E.g. could you simply test the removal of stateful operator with checkpointDir rather than spinning up another checkpoint?
It's OK if we have been supporting the case (although undocumented) and we keep supporting the case. If not, we could just test the case via using checkpointDir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't allow even before adding the operator check. Streaming will throw error with message as "state path not found".
Done. Restarting a stateless query from a stateful query will now trigger error with message as: