[SPARK-49630][SS] Add flatten option to process collection types with state data source reader #48110
cc @HeartSaVioR @jingz-db - PTAL, thx!
...ore/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSource.scala
...rc/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StatePartitionReader.scala
If you are working on this file anyway, could you also fix the typo in the comment here? Line 152 in 931ab06
the the state -> the state
Sure, done.
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala
jingz-db
left a comment
Small nits, otherwise LGTM! Thanks for making the change! Shall we also document in the PR API section that the default value for the flatten option is True?
HeartSaVioR
left a comment
First pass. Mostly minors.
...rc/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StatePartitionReader.scala
...re/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/utils/SchemaUtil.scala
...apache/spark/sql/execution/datasources/v2/state/StateDataSourceTransformWithStateSuite.scala
HeartSaVioR
left a comment
+1 pending CI
The build failed only on a test suite from Spark Connect, which seems to be flaky and unrelated to this change.
Thanks! Merging to master.
What changes were proposed in this pull request?
Add flatten option to process collection types with state data source reader
Why are the changes needed?
The changes are needed so that entries can be processed row by row when there is not enough memory to fit an entire collection inside a single row.
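To illustrate the difference in row shape, here is a minimal sketch in plain Python. It is not the Spark implementation and does not use the state data source reader API; the `ListState` alias, the `read_unflattened` and `read_flattened` helpers, and the sample data are all hypothetical stand-ins chosen for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for list state: grouping key -> list of values.
ListState = Dict[str, List[int]]


def read_unflattened(state: ListState) -> Iterator[Tuple[str, List[int]]]:
    """Yield one row per grouping key; the whole collection sits in that row."""
    for key, values in state.items():
        yield key, list(values)


def read_flattened(state: ListState) -> Iterator[Tuple[str, int]]:
    """Yield one row per collection element, so no row holds a full collection."""
    for key, values in state.items():
        for value in values:
            yield key, value


# Hypothetical sample state, used only to show the two row shapes.
state: ListState = {"user-1": [10, 20, 30], "user-2": [40]}

unflattened_rows = list(read_unflattened(state))
flattened_rows = list(read_flattened(state))

print(unflattened_rows)
print(flattened_rows)
```

In the flattened form each output row carries a single element, so a consumer that iterates over the rows never has to hold a whole collection in one row.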
Does this PR introduce any user-facing change?
Yes
Users can provide the following query option:
How was this patch tested?
Added unit tests
Was this patch authored or co-authored using generative AI tooling?
No