[BEAM-991] Comply with byte limit for Datastore Commit in python SDK #3043
Conversation
R: @vikkyrk
#2948 is for the Java SDK; this one is for the Python SDK, and is titled accordingly :-)
LGTM, thanks.
R: @aaltay ready to be merged.
datastore_write_fn.finish_bundle()

self.assertEqual(
    math.ceil(sum(e.ByteSize() for e in entities) /
Why not just use 2 here? The test description says this will be split over two.
Actually that makes more sense - if someone increases the size per RPC later, the test might end up testing for 1 RPC, which would miss the point of the test.
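To illustrate the point being made: the test derives the expected number of commit RPCs from the total entity byte size and the per-RPC budget, so it keeps working if the size limit is later changed. A minimal sketch of that computation (the function name and parameters here are hypothetical, not from the SDK):

```python
import math


def expected_rpc_count(entity_sizes, bytes_per_rpc):
    # Lower bound on the number of commit RPCs needed when entities
    # are packed into batches by a total-bytes budget. This mirrors
    # math.ceil(sum(e.ByteSize() for e in entities) / limit) in the test.
    return int(math.ceil(sum(entity_sizes) / float(bytes_per_rpc)))
```

Hardcoding 2 instead would silently weaken the test to a single RPC if someone later raised the per-RPC size limit.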
@@ -313,8 +313,11 @@ class _Mutate(PTransform):
   supported, as the commits are retried when failures occur.
   """

-  # Max allowed Datastore write batch size.
+  # Max allowed Datastore writes per batch, and max bytes per batch.
+  # Note that the max bytes per batch is lower than the actual limit enforced
What is the actual limit?
10MB, added to the comment (I had included it in the Java version, but not here).
Isn't 5MB too restrictive, if 10MB is the limit? Wouldn't this create unnecessary RPCs?
It does result in more RPCs, but 10MB is a limit, not a target. Most workloads will not hit 5MB either, I suspect.
The 10MB limit includes the whole RPC, not just the mutations, so some space needs to be left over for other fields in the CommitRequest. That said, halving the limit is very conservative; at the time I was thinking that this went through an external library (datastorehelper) as well, but I see that the commit path does not. I could make it 9MB if you prefer.
Sounds good to me, let's make it 9MB if you think that is OK. (I am not an expert on Datastore, so I would rather rely on your opinion.)
Increased to 9MB and tested — that seems fine.
- state RPC size limit, consistent with the Java version.
- clean up unit test.
Thank you @cph6. This change is good except for one question I asked above.
This is closer to the enforced limit, while still leaving plenty of space for the elements of the CommitRequest apart from the mutations themselves.
Thank you @cph6!
This is the equivalent of #2948 for the Python SDK. RPCs are limited both by overall size and by the number of entities contained, to fit within the Datastore API limits (https://cloud.google.com/datastore/docs/concepts/limits).
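The dual-limit batching described here (flush when either the mutation count or the byte budget would be exceeded) can be sketched as follows. This is a simplified illustration, not the SDK's actual implementation; the function name, the `sizer` callback, and the default constants are assumptions, with the 9MB byte budget taken from the discussion above as headroom under the 10MB RPC limit:

```python
def batch_mutations(mutations, sizer, max_count=500,
                    max_bytes=9 * 1024 * 1024):
    """Yield lists of mutations, each within both count and byte budgets.

    sizer(m) returns the serialized size of a mutation, e.g. a protobuf
    message's ByteSize(). A batch is flushed before adding a mutation that
    would push it over either limit, so a single oversized mutation still
    goes out in its own batch rather than being dropped.
    """
    batch, batch_bytes = [], 0
    for m in mutations:
        size = sizer(m)
        if batch and (len(batch) >= max_count or
                      batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(m)
        batch_bytes += size
    if batch:
        yield batch
```

Checking the byte budget before appending (rather than after) is what keeps each commit under the limit, at the cost of the occasional slightly-underfull RPC.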