[SPARK-45747][SS] Use prefix key information in state metadata to handle reading state for session window aggregation#43788
Conversation
...rc/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StatePartitionReader.scala
Outdated
Show resolved
Hide resolved
...rc/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StatePartitionReader.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Is this change necessary?
There was a problem hiding this comment.
Yes, I use this class to get prefix key number but don't want to access the internal row because it is untyped. So I need to exposed the api to access the metadata as case class
There was a problem hiding this comment.
Shall we follow the pattern we do for StateDataSourceReadSuite vs StateDataSourceTestBase? The main reason I put the part of query execution to StateDataSourceTestBase is that we'd probably be likely to reuse the query between test for read vs test for write.
There was a problem hiding this comment.
Moved the query running code to StateDataSourceTestBase.
...est/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSourceReadSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Shall we validate the full schema of the state, including the part of session window?
There was a problem hiding this comment.
Changed the query to validate all the field.
|
Thanks! Merging to master. |
|
@chaoqin-li1123 Could you please rebase your change with latest master branch? merge script is confusing that I'm the main author due to my commits listed here. |
eba2737 to
6ffe9c2
Compare
What changes were proposed in this pull request?
Currently reading state for session window aggregation operator is not supported because the numColPrefixKey is unknown. We can read the operator state metadata introduced in SPARK-45558 to determine the number of prefix columns and load the state of session window correctly.
Why are the changes needed?
To support reading state for session window aggregation.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Add integration test.
Was this patch authored or co-authored using generative AI tooling?
No.