Skip to content

[SPARK-45747][SS] Use prefix key information in state metadata to handle reading state for session window aggregation#43788

Closed
chaoqin-li1123 wants to merge 5 commits intoapache:masterfrom
chaoqin-li1123:session_window_agg
Closed

[SPARK-45747][SS] Use prefix key information in state metadata to handle reading state for session window aggregation#43788
chaoqin-li1123 wants to merge 5 commits intoapache:masterfrom
chaoqin-li1123:session_window_agg

Conversation

@chaoqin-li1123
Copy link
Contributor

What changes were proposed in this pull request?

Currently reading state for session window aggregation operator is not supported because the numColPrefixKey is unknown. We can read the operator state metadata introduced in SPARK-45558 to determine the number of prefix columns and load the state of session window correctly.

Why are the changes needed?

To support reading state for session window aggregation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Add integration test.

Was this patch authored or co-authored using generative AI tooling?

No.

@chaoqin-li1123
Copy link
Contributor Author

@HeartSaVioR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I use this class to get prefix key number but don't want to access the internal row because it is untyped. So I need to exposed the api to access the metadata as case class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we follow the pattern we do for StateDataSourceReadSuite vs StateDataSourceTestBase? The main reason I put the part of query execution to StateDataSourceTestBase is that we'd probably be likely to reuse the query between test for read vs test for write.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the query running code to StateDataSourceTestBase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we validate the full schema of the state, including the part of session window?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the query to validate all the field.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 pending CI.

@HeartSaVioR
Copy link
Contributor

Thanks! Merging to master.

@HeartSaVioR
Copy link
Contributor

@chaoqin-li1123 Could you please rebase your change with latest master branch? merge script is confusing that I'm the main author due to my commits listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants