[BEAM-1267] Adds ignoreUnknownValues option to BigQuery.Write #1778
Refer to this link for build results (access rights to CI server needed):
I've given you a few comments marked as "Nit", meaning they are stylistic or otherwise consistency fixes. I am happy to clean up nits during the git merge process -- so if you do not need/want to submit more changes, please comment and I will fix the nits while merging.
}

/**
* Returns a copy of this write transformation, but with ignoreUnknownValues set to true.
but ignoring unknown values (see BigQuery Jobs Documentation).
Maybe something like that, for clearer documentation if a reader is not super familiar with the BigQuery API?
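To illustrate the suggested wording, here is a minimal sketch of a "returns a copy" setter with the clearer Javadoc. `WriteSketch` and its `to` factory are hypothetical stand-ins invented for this example, not the real `BigQueryIO.Write.Bound`; only the option name comes from the PR.

```java
/** Hypothetical, simplified stand-in for the write transform; not the real Beam class. */
final class WriteSketch {
  private final String table;
  private final boolean ignoreUnknownValues;

  private WriteSketch(String table, boolean ignoreUnknownValues) {
    this.table = table;
    this.ignoreUnknownValues = ignoreUnknownValues;
  }

  /** Creates a write to the given table with ignoreUnknownValues disabled. */
  static WriteSketch to(String table) {
    return new WriteSketch(table, false);
  }

  /**
   * Returns a copy of this write transformation, but ignoring unknown values
   * (see the BigQuery Jobs documentation for the ignoreUnknownValues option).
   */
  WriteSketch ignoreUnknownValues() {
    // The receiver is left untouched; a new instance carries the changed flag.
    return new WriteSketch(table, true);
  }

  /** Returns {@code true} if ignoreUnknownValues is enabled. */
  boolean getIgnoreUnknownValues() {
    return ignoreUnknownValues;
  }
}
```

Because the setter returns a new instance, an existing transform can be shared safely while a variant with the option enabled is derived from it.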
@@ -2138,6 +2153,11 @@ public boolean getValidate() {
return validate;
}

/** Returns {@code true} if ignoreUnknownValues is enabled. */
public Boolean getIgnoreUnknownValues() {
Nit: Mark the return value as @Nullable. Throughout, either @Nullable Boolean or, if it cannot be null, boolean.
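The distinction in this nit can be sketched as follows. This is only an illustration: the `Nullable` annotation is declared locally so the example compiles without dependencies, and `IgnoreUnknownValuesOption` is a hypothetical holder class, not code from the PR.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Local stand-in annotation so the sketch compiles without external dependencies. */
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER})
@interface Nullable {}

/** Hypothetical holder showing the two consistent ways to type the option. */
final class IgnoreUnknownValuesOption {
  // Boxed and annotated: null means "not set by the user".
  @Nullable private final Boolean value;

  IgnoreUnknownValuesOption(@Nullable Boolean value) {
    this.value = value;
  }

  /** Returns the raw setting, which may be null when the option was never set. */
  @Nullable
  Boolean get() {
    return value;
  }

  /** Returns a primitive that can never be null; an unset option is treated as false. */
  boolean getOrDefault() {
    return value != null && value;
  }
}
```

An unannotated `Boolean` is the ambiguous middle ground: callers cannot tell whether null is a legal value, and unboxing a null `Boolean` throws a `NullPointerException`.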
ValueProvider<String> jsonSchema,
WriteDisposition writeDisposition,
CreateDisposition createDisposition,
Boolean ignoreUnknownValues) {
same nit w.r.t. @Nullable Boolean vs boolean.
@@ -2304,14 +2349,16 @@ private void load(
@Nullable TableSchema schema,
List<String> gcsUris,
WriteDisposition writeDisposition,
CreateDisposition createDisposition) throws InterruptedException, IOException {
CreateDisposition createDisposition,
Boolean ignoreUnknownValues) throws InterruptedException, IOException {
and nit here.
@@ -2564,6 +2611,8 @@ static void clearCreatedTables() {

private final BigQueryServices bqServices;

private final Boolean ignoreUnknownValues;
same good 'ole nit :).
@@ -166,7 +166,18 @@ void deleteDataset(String projectId, String datasetId)
*
* <p>Returns the total bytes count of {@link TableRow TableRows}.
*/
long insertAll(TableReference ref, List<TableRow> rowList, @Nullable List<String> insertIdList)
long insertAll(
just delete this function in favor of the one below?
private void checkWriteObjectWithIgnoreUnknownValues(
BigQueryIO.Write.Bound bound,
boolean ignoreUnknownValues) {
assertEquals(ignoreUnknownValues, bound.ignoreUnknownValues);
it looks like this function is only used once -- inline it?
@@ -498,7 +498,7 @@ public void testInsertRetry() throws Exception {
This is the file I'd expect to see a test that the ignoreUnknownValues field actually makes it to the TableDataInsertAll request. Maybe @peihe can suggest how such a test could be written?
I think changing insertAll() to the following will help testing and support additional configurations (the handling of ignoreUnknownValues will then stay in BigQueryIO):
void insertAll(TableReference ref, Collection request);
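A rough sketch of how a request-based insertAll() could make the option testable. Every type here (`InsertAllRequestSketch`, `DatasetServiceSketch`, `RecordingDatasetService`, `StreamingWriterSketch`) is a hypothetical stand-in written for this illustration; the real Beam interfaces and BigQuery client model classes are not used, and a plain String stands in for TableReference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an insert-all request; not the real client model class. */
final class InsertAllRequestSketch {
  final List<Map<String, Object>> rows;
  final boolean ignoreUnknownValues;

  InsertAllRequestSketch(List<Map<String, Object>> rows, boolean ignoreUnknownValues) {
    this.rows = rows;
    this.ignoreUnknownValues = ignoreUnknownValues;
  }
}

/** Hypothetical service interface that accepts a fully built request. */
interface DatasetServiceSketch {
  void insertAll(String tableRef, InsertAllRequestSketch request);
}

/** Test double that records every request instead of calling BigQuery. */
final class RecordingDatasetService implements DatasetServiceSketch {
  final List<InsertAllRequestSketch> requests = new ArrayList<>();

  @Override
  public void insertAll(String tableRef, InsertAllRequestSketch request) {
    requests.add(request);
  }
}

/** Caller side: the option is handled here, so the service needs no extra parameter. */
final class StreamingWriterSketch {
  private final DatasetServiceSketch service;
  private final boolean ignoreUnknownValues;

  StreamingWriterSketch(DatasetServiceSketch service, boolean ignoreUnknownValues) {
    this.service = service;
    this.ignoreUnknownValues = ignoreUnknownValues;
  }

  void write(String tableRef, List<Map<String, Object>> rows) {
    service.insertAll(tableRef, new InsertAllRequestSketch(rows, ignoreUnknownValues));
  }
}
```

With this shape, a test can pass in the recording service, call `write`, and assert on the captured request's flag, which is the check asked for above.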
Yes, @g-eorge, please do take another pass if you get the chance. Thanks!
Hi @g-eorge, just checking in. Any interest in picking this back up, or something we can do to help?
Hi @dhalperi, yes, sorry about the delay; I have been swamped. Will try to look at it soon.
@g-eorge any update on this one?
@peihe I have opened a PR here for BEAM-1306. When you are happy with that I will look at finishing this up.
Refer to this link for build results (access rights to CI server needed):
R: @reuvenlax -@dhalperi
Is this still needed? Unfortunately the underlying BigQuery transform has changed quite a lot, so some merging is needed.
I would definitely appreciate it if this made its way into Beam!
Sure, though BigQueryIO has changed quite a bit so you might have to do a bit of conflict resolution.
@reuvenlax this is currently blocked by this refactor requested by @peihe. If it is now possible to create pipelines with options like ignoreUnknownValues, then it's not necessary. Otherwise, I'm happy to find time to bring it up to date with the latest code, so long as I can get timely feedback.
This PR hasn't been updated in a long time - do you plan to continue it?
I will close this PR based on the stale PR policy, please reopen if you would like to continue working on it.
Hi @dhalperi, please could you take a look and let me know if you have any feedback?
I've not tested it fully yet. Do you have any suggestions for the best way to test it?
Perhaps I should also update an example to include this option?