New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-13806] Add x-lang BigQuery IO integration test to Go SDK. #16818
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The big comment is about the output PCollections for Writes, but I don't expect that change in this PR. I beleive It's a best practice to always have those for write transforms so it's possible to block on them with side inputs. I clearly missed that in reviews last week.
bigqueryio.Write(s, table, rows, | ||
bigqueryio.CreateDisposition(bigqueryio.CreateNever), | ||
bigqueryio.WriteExpansionAddr(*integration.GCPIoExpansionAddr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the bigqueryIO Write not return a pcollection we can have the read block on? As it stands, I think this will have both happen at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but I split the pipeline into two separate pipelines that will be run in sequence, which should avoid this issue.
checkFlags(t) | ||
|
||
// Create a table before running the pipeline | ||
table, err := newTempTable(*integration.BigQueryDataset, "go_bqio_it", ddlTestRowSchema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider printing the table name using t.Log
, to make it clearer which resources map to this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
bigqueryio.ReadExpansionAddr(*integration.GCPIoExpansionAddr)) | ||
passert.Equals(s, readRows, rows) | ||
|
||
ptest.RunAndValidate(t, p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the test is successful could we please also delete the table manually?
I see that we have limited it's lifespan, but I'd prefer that we don't accrue a rolling pile of these over the course of the 24 hour window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
@youngoli - What is the next step on this PR? |
Could this be merged? |
Sorry I haven't been checking this. Don't merge this yet, I need to test it on Jenkins (which requires getting in #16819 first). I'll try to find time to work on these and get them in. |
Run XVR_GoUsingJava_Dataflow PostCommit |
I rebased to fix merge errors and account for the changes in our integration testing script, but the functionality of the first commit should be unchanged, only the test framework and expansion service setup should be any different. The second commit contains all the fixup addressing the comments. |
Run XVR_GoUsingJava_Dataflow PostCommit |
1 similar comment
Run XVR_GoUsingJava_Dataflow PostCommit |
What is the next step on this PR? |
rows := beam.ParDo(s, createFn, beam.Impulse(s)) | ||
inType := reflect.TypeOf((*TestRow)(nil)).Elem() | ||
readRows := bigqueryio.Read(s, inType, | ||
bigqueryio.FromTable(table), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also test reading from a query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did. As an update to what we talked about online, I'm not getting any permissions errors testing locally. (I am getting a different unrelated error, but it's because of a bug in how I wrote the test. Will commit once I fix that.)
Also includes piping in flags for BigQuery IO through integration test script, and a small file for creating bigquery tables that expire after a day.
Splits the integration test into two pipelines to run sequentially. Also drops table after a successful test and logs table names.
CreateRows wasn't creating the same elements in both read and write pipelines after splitting the two pipelines. Adjusted it to use a consistent seed in all pipelines.
Codecov Report
@@ Coverage Diff @@
## master #16818 +/- ##
=======================================
Coverage 74.10% 74.10%
=======================================
Files 698 698
Lines 92486 92486
=======================================
Hits 68541 68541
Misses 22690 22690
Partials 1255 1255
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
I added a test that reads from a Query, and while testing in my own environment I'm getting weird results. The elements output from a query read have all fields as pointers because SQL queries don't match the table's schema. That's pretty normal. But when I made an equivalent struct in Go, I get errors when decoding into that struct, even though all the fields seem to be matching exactly:
|
Run XVR_GoUsingJava_Dataflow PostCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we document the type alias work around for "lost" types, I'm very comfortable getting this in. We should file an issue for whatever is happening with Query. We likely need to simply hard replace the coder java thinks it has in the CrossLanguage handler, or figure out why a new one is derived in the first place.
Run XVR_GoUsingJava_Dataflow PostCommit |
Is this the error @robertwb ran into? Missing ProjectID?
|
Some of the "default" tests are also failing (prefix, multi, partition), but they're hard dying after apparently sending a bundle to the Java side (I see successful Go side messages). Definitely unrelated, but separately worth investigating. |
Yes, that looks like the same issue Robert ran into. It seems to be failing because the expansion service is unaware of the GCP project from pipeline options. But this was discussed earlier this week and it should be a relatively easy fix. I think for now though I might split the query and non-query reads into separate tests and sickbay the query read temporarily. |
SGTM. I'll give it a once over after and then we can merge.
…On Fri, Jun 10, 2022, 3:59 PM Daniel Oliveira ***@***.***> wrote:
Yes, that looks like the same issue Robert ran into. It seems to be
failing because the expansion service is unaware of the GCP project from
pipeline options. But this was discussed earlier this week and it should be
a relatively easy fix. I think for now though I might split the query and
non-query reads into separate tests and sickbay the query read temporarily.
—
Reply to this email directly, view it on GitHub
<#16818 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADKDOFKPU73WTHQS5ONDVXTVOPCE3ANCNFSM5OC2L5JA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Run XVR_GoUsingJava_Dataflow PostCommit |
Prior to the merge conflicts (that I'll manually fix in a bit) I kicked off which should be one final run of these (less the filtered out Query test). Assuming this passes, a quick fix to merge the integration filters later, this will be good to merge. |
The run did pass, modulo the 3 unrelated tests that are currently failing. I'm guessing something changed to make those "built in" transforms to no longer be built in, so they're failing. They're working for python however, so there's some mechanism that's failing with how Go specifies them. |
Nice, merged in. Thanks for doing that last bit of checking, I appreciate it. |
…he#16818) * [BEAM-13806] Add x-lang BigQuery IO integration test to Go SDK. Also includes piping in flags for BigQuery IO through integration test script, and a small file for creating bigquery tables that expire after a day. * [BEAM-13806] Splitting BigQuery IT into read and write pipelines. Splits the integration test into two pipelines to run sequentially. Also drops table after a successful test and logs table names. * Fixup: Fix gradle build and undo VR script changes. * Fixup: Add Query test and fix deterministic random element generation. CreateRows wasn't creating the same elements in both read and write pipelines after splitting the two pipelines. Adjusted it to use a consistent seed in all pipelines. * Fixup: Avoiding inline functions * Workaround to coder issue, plus some debugging code * Polishing workaround with documentation and removing debug prints. * Move pipeline code to test file * Split Query test from non-query test Co-authored-by: Robert Burke <lostluck@users.noreply.github.com>
Also includes piping in flags for BigQuery IO through integration test script, and a small file for creating bigquery tables that expire after a day.
This was manually tested, but can't be tested on Jenkins at the moment because it only works on Dataflow and there currently isn't a test suite that runs Go xlang transforms on Dataflow. That should be fixed shortly and then this can be tested on Jenkins. It shouldn't be submitted until then.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
ValidatesRunner
compliance status (on master branch)Examples testing status on various runners
Post-Commit SDK/Transform Integration Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.