-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[#22051]: Add read_time support to Google Cloud Datastore connector #22052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
4 similar comments
|
Can one of the admins verify this patch? |
|
Can one of the admins verify this patch? |
|
Can one of the admins verify this patch? |
|
Can one of the admins verify this patch? |
Codecov Report
@@ Coverage Diff @@
## master #22052 +/- ##
==========================================
- Coverage 74.19% 74.17% -0.02%
==========================================
Files 706 706
Lines 93229 93229
==========================================
- Hits 69168 69154 -14
- Misses 22793 22807 +14
Partials 1268 1268
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. |
|
R: @pcostell |
|
LGTM |
|
R: @chamikaramj |
chamikaramj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
| Timestamp readTimeProto = Timestamps.fromMillis(readTime.getMillis()); | ||
| return querySplitter.getSplits(query, partitionId, numSplits, datastore, readTimeProto); | ||
| } | ||
| return querySplitter.getSplits(query, partitionId, numSplits, datastore); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, do sub-queries maintain the same read time ?
Also, I wonder if we can implement a better splitter function here by using different read times (but that can be separate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sub-queries all maintain the same read time, in that way we can guarantee a consistent read across the entire database from the beam workload.
Split function queries Datastore using the split points (a special index), having the split function use the same read time as all other queries will make sure the split points are accurate as of that read time, this can impact how well and balanced the beam workload can be partitioned.
| .read() | ||
| .withProjectId(PROJECT_ID) | ||
| .withQuery(QUERY) | ||
| .withReadTime(TIMESTAMP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do existing tests fail without setting the new parameter ?
If so, could customers run into similar issues ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the added readTime is optional and does not need to be specified if customers don't care about read consistency across sub-queries. e.g. just above this test (testBuildReadWithReadTime) there's another test (testBuildRead) that builds a DatastoreIO read with exactly the same arguments except that it doesn't specify readTime.
Also in V1ReadIT test, it does a read without specifying read time, and does another read with read time, both can work. So customers existing workload don't need to be changed and it won't break.
|
Run Java PreCommit |
|
Run Java_PVR_Flink_Batch PreCommit |
|
Retest this please |
|
Run SQL_Java11 PreCommit |
|
This broke the Java Dataflow Postcommit: |
|
attempting to fix with #22484 |
Allows DatastoreIO to be configured with a read_time, which will make the Beam workload issue all its Datastore reads using the same read_time and get the consistent read result from Datastore across all its split reads.
addresses #22051
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.