Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-7739] Implement ReadModifyWriteState in Python SDK #9067

Closed

Conversation

rakeshcusat
Copy link
Contributor

ValueState is missing from python sdk though Java sdk has it. In order to have the feature parity we should have this implemented for Python sdk.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@robertwb
Copy link
Contributor

Thanks for looking at this. IIRC, the consensus last time this came up was to call it ReadModifyWriteState.

@robertwb robertwb self-requested a review July 15, 2019 12:06
@rakeshcusat
Copy link
Contributor Author

rakeshcusat commented Jul 15, 2019

@robertwb Can you give me more context to that discussion or email chain that I can follow.
I was also talking to @pabloem offline and he mentioned that ValueState is controversial and java sdk doesn't allow it specifically when you are merging windows. I was just thinking of closing this PR.

If ReadModifyWriteState is not implemented yet then I will be more than happy to implement that state.

@rakeshcusat
Copy link
Contributor Author

@robertwb never mind, I found the discussion here

@robertwb
Copy link
Contributor

To follow up on this, it's not yet implemented but that would be a welcome contribution (likely just some renaming in this PR).

@rakeshcusat rakeshcusat changed the title [BEAM-7739] Implement ValueState in Python SDK [BEAM-7739] Implement ReadModifyWriteState in Python SDK Jul 24, 2019
@rakeshcusat
Copy link
Contributor Author

rakeshcusat commented Jul 24, 2019

@robertwb can you confirm that we also need to introduce ReadModifyWriteState in the proto definition?

message StateSpec {
oneof spec {
ValueStateSpec value_spec = 1;
BagStateSpec bag_spec = 2;
CombiningStateSpec combining_spec = 3;
MapStateSpec map_spec = 4;
SetStateSpec set_spec = 5;
}
}
message ValueStateSpec {
string coder_id = 1;
}

@robertwb
Copy link
Contributor

@robertwb can you confirm that we also need to introduce ReadModifyWriteState in the proto definition?

Or simply rename ValueStateSpec in the proto. (These protos are still in a state they can be modified.)

@rakeshcusat
Copy link
Contributor Author

rakeshcusat commented Jul 24, 2019

@robertwb can you confirm that we also need to introduce ReadModifyWriteState in the proto definition?

Or simply rename ValueStateSpec in the proto. (These protos are still in a state they can be modified.)

I just want to make sure I understand it correctly, In case I rename this, I also need to rename in Java SDK as well because it is used there as well, right?
I didn't see any reference in Go SDK other then the auto-generated code so I am assuming it is not yet implemented in Go SDK

I will rename the spec to be consistent and avoid any confusion in future.

Edit:
on second thought I am concerned about this line

ValueStateSpec value_spec = 1;
if I would rename this proto. Wouldn't it cause proto to complain that we have changed the type and cause some incompatibility problem?

@pabloem
Copy link
Member

pabloem commented Jul 24, 2019

@rakeshcusat I think the name for Java would still be the same (changing the name would be a breaking change).

As for the proto, I think you can change it, because it's not externally exposed to users of Beam - it's only meant to be used by Beam itself (in this repository).

@rakeshcusat rakeshcusat force-pushed the BEAM-7739-implement-valuestate branch from aaadc02 to fdfd029 Compare August 12, 2019 22:23
@robertwb
Copy link
Contributor

Ping?

@rakeshcusat
Copy link
Contributor Author

Ping?

@robertwb I was busy but I will try to finish it this week.

@rakeshcusat
Copy link
Contributor Author

@robertwb / @pabloem Here is the current status:

  1. Updated proto beam_runner_api.proto
  2. Python SDK side is ready.
  3. Made most of the changes in Java SDK and I need feedback to not break the existing ValueState but deprecate it with minimal changes. I would really appreciate some pointers and ideas.
  4. I have not looked into Go SDK and not sure whether it is implemented or not but would appreciate some pointers and ideas.

Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Just some minor thoughts. As for Go, the protos are still manually copied and generated there, so I don't think we have to change anything now.

public static final class InMemoryValue<T>
implements ValueState<T>, InMemoryState<InMemoryValue<T>> {
implements ValueState<T>, InMemoryState<InMemoryValue<T>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this could be mostly merged with InMemoryReadModifyWrite (possibly requiring a common baseclass).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I wonder if InMemoryReadModifyWrite was a subclass of InMemoryValue one could get rid of some of the parallel code here (e.g. bindValue vs. bindReadModifyWrite). Haven't pursued this to its conclusion though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, I need to look more closely and consolidate duplicate code.

@@ -273,14 +280,73 @@ void clearInternal() {
}

private class FlinkBroadcastValueState<T> extends AbstractBroadcastState<T>
implements ValueState<T> {
implements ValueState<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, perhaps ReadModifyWriteState could extend ValueState?

@rakeshcusat
Copy link
Contributor Author

R: @kennknowles

@rakeshcusat
Copy link
Contributor Author

@kennknowles and I discuss it offline.
This change is getting bigger and it is also not moving fast. The bigger changes are also hard to review and debug. I am going to break this down into multiple PRs which will be easier to review and we can move faster.

  1. Add ReadModifyWrite state in Python Sdk without proto changes but will use valueState from proto.
  2. Rename the proto and update the Python and Java sdk accordingly.
  3. Add ReadModifyWrite State in Java sdk.

Let me know if you have any concerns with this approach.
@robertwb

@robertwb
Copy link
Contributor

Sound good to me. It'd be less work to swap steps (1) and (2).

@rakeshcusat
Copy link
Contributor Author

Sound good to me. It'd be less work to swap steps (1) and (2).

Sounds good. I will do as the suggested way.

@robertwb
Copy link
Contributor

Any update on this? I see #9687, were there other PRs?

@stale
Copy link

stale bot commented Jan 14, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jan 14, 2020
@stale
Copy link

stale bot commented Jan 22, 2020

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Jan 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants