Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-991] Comply with byte limit for Datastore Commit in python SDK #3043

Closed
wants to merge 3 commits into from

Conversation

cph6
Copy link
Contributor

@cph6 cph6 commented May 10, 2017

This is the equivalent of #2948 for the python SDK. RPCs are limited both by overall size and by the number of entities contained, to fit within the Datastore API limits https://cloud.google.com/datastore/docs/concepts/limits .

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.007%) to 70.473% when pulling a5b2ce0 on cph6:datastore_request_size_limit_py into 1fafaa8 on apache:master.

@aaltay
Copy link
Member

aaltay commented May 16, 2017

R: @vikkyrk

@aaltay
Copy link
Member

aaltay commented May 16, 2017

This seems to be a duplicate of #2948. @cph6 could you close this one if this is a duplicate, if not could you change the title?

@cph6
Copy link
Contributor Author

cph6 commented May 17, 2017

#2948 is for the Java SDK, this is for the Python SDK — and is titled accordingly :-)

@aaltay
Copy link
Member

aaltay commented May 17, 2017

@cph6 thanks for the clarification :) @vikkyrk please review this one as well.

@vikkyrk
Copy link
Contributor

vikkyrk commented May 17, 2017

LGTM, thanks.

@vikkyrk
Copy link
Contributor

vikkyrk commented May 17, 2017

R: @aaltay ready to be merged.

datastore_write_fn.finish_bundle()

self.assertEqual(
math.ceil(sum(e.ByteSize() for e in entities) /
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use 2 here, test description says this will be split over two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that makes more sense - if someone increases the size per RPC later, the test might end up testing for 1 RPC, which would miss the point of the test.

@@ -313,8 +313,11 @@ class _Mutate(PTransform):
supported, as the commits are retried when failures occur.
"""

# Max allowed Datastore write batch size.
# Max allowed Datastore writes per batch, and max bytes per batch.
# Note that the max bytes per batch is lower than the actual limit enforced
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the actual limit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10MB, added to the comment (I had included it in the Java version, but not here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is not 5MB to restrictive, if 10MB is the limit? Would not this create unnecessary RPCs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does result in more RPCs, but 10MB is a limit, not a target. Most workloads will not hit 5MB either I guess.

The 10MB limit includes the whole RPC, not just the mutations, so some space needs to be left over for other fields in the CommitRequest. That said, halving the limit is very conservative; at the time I was thinking that this went through an external library (datastorehelper) as well, but I see that the commit path does not. I could make it 9MB if you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, let's make it 9MB if you think that it is OK. (I am not an expert on Datastore, I would rather rely on your opinion).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increased to 9MB and tested — that seems fine.

- state RPC size limit, consistent with the Java version.
- clean up unit test.
@aaltay
Copy link
Member

aaltay commented May 19, 2017

Thank you @cph6. This change is good except for one question I asked above.

This is closer to the enforced limit, while still leaving plenty of space for
the elements of the CommitRequest apart from the mutations themselves.
@asfgit asfgit closed this in 597b07e May 23, 2017
@aaltay
Copy link
Member

aaltay commented May 23, 2017

Thank you @cph6!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants