New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-10189] Add ReadModifyWriteState user state to python sdk #11916
Conversation
Last time someone started adding this feature, we decided to call it ReadModifyWrite state. |
Java SDK still calls this ValueState, ReadModifyWriteState is only used in beam_runner_api, wondering if we should change Java SDK as well if we want to call it ReadModifyWriteState in python? |
Yes, the plan was to consider changing Java too, though that's harder due to backwards compatibility issues. |
Renamed to ReadModifyWriteState. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, modulo minor comments.
Retest this please |
@committer This is approved. Could someone commit this please? |
retest this please |
Run Python2_PVR_Flink PreCommit |
@mxm Seems like flink runner is sending cache token per state, instead of one user state token per bundle. This will cause python sdk to fail
|
The Flink Runner only sends one cache token per worker and application attempt. Please see Line 400 in f3d5fc4
|
it looks like the fn-execution core module is trying to add a cache token for each state key? Line 272 in 2fd785d
|
It is true that the fn protocol supports one cache token per handler (e.g. user state or side input handler). Those handler do not change for the lifetime of the application. I'm still trying to understand what the problem is. Cache tokens have been working fine so far. Could you provide some logs or test cases which show that there is a problem? |
Last time I checked with @lukecwik he mentioned the SDK is expecting one global cache token per-bundle for all user states, and one cache token per side-input. The problem is that it seems we can't declare more than one user state in Stateful Dofn, otherwise the SDK fails. Such as the test_pardo_state_only_test I'm trying to update in this PR.
|
Thanks for the stacktrace, that helped to figure out what's going on here. The issue is only present in batch mode where Flink does not use its own memory backend but uses Beam's We have to adapt the implementation to return the same cache token for all Line 60 in 63d51f5
Line 116 in 63d51f5
|
I've created a PR which fixes the problem: #12062 |
Run Python2_PVR_Flink PreCommit |
Run Python PreCommit |
Glad to see we were able to unblock the changes here! |
Thanks for the fix! |
My pleasure! |
…e#11916) * [BEAM-10189] Add ValueState user state to python sdk * Rename to ReadModifyWriteState * Rename _ValueStateTag to _ReadModifyWriteStateTag * Use deterministic teststream in tests and minor fix to direct_userstate * lint
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.