Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-1267] Adds ignoreUnknownValues option to BigQuery.Write #1778

Closed
wants to merge 1 commit into from

Conversation

g-eorge
Copy link

@g-eorge g-eorge commented Jan 14, 2017

Hi @dhalperi, please could you take a look and let me know if you have any feedback?

I've not tested it fully yet. Do you have any suggestions for the best way to test it?

Perhaps I should also update an example to include this option?

@asfbot
Copy link

asfbot commented Jan 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/6558/
--none--

Copy link
Contributor

@dhalperi dhalperi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given you a few comments marked as "Nit" meaning they are stylistic or otherwise consistency fixes. I am happy to cleanup nits during the git merge process -- so if you do not need/want to submit more changes, please comment and I will fix the nits while merging.

}

/**
* Returns a copy of this write transformation, but with ignoreUnknownValues set to true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but ignoring unknown values (see BigQuery Jobs Documentation`).

Maybe something like that, for clearer documentation if a reader is not super familiar with the BigQuery API?

@@ -2138,6 +2153,11 @@ public boolean getValidate() {
return validate;
}

/** Returns {@code true} if ignoreUnknownValues is enabled. */
public Boolean getIgnoreUnknownValues() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Mark the return value as @Nullable.

Throughout, either @Nullable Boolean or, if it cannot be null, boolean.

ValueProvider<String> jsonSchema,
WriteDisposition writeDisposition,
CreateDisposition createDisposition,
Boolean ignoreUnknownValues) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nit w.r.t. @Nullable Boolean vs boolean.

@@ -2304,14 +2349,16 @@ private void load(
@Nullable TableSchema schema,
List<String> gcsUris,
WriteDisposition writeDisposition,
CreateDisposition createDisposition) throws InterruptedException, IOException {
CreateDisposition createDisposition,
Boolean ignoreUnknownValues) throws InterruptedException, IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and nit here.

@@ -2564,6 +2611,8 @@ static void clearCreatedTables() {

private final BigQueryServices bqServices;

private final Boolean ignoreUnknownValues;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same good 'ole nit :).

@@ -166,7 +166,18 @@ void deleteDataset(String projectId, String datasetId)
*
* <p>Returns the total bytes count of {@link TableRow TableRows}.
*/
long insertAll(TableReference ref, List<TableRow> rowList, @Nullable List<String> insertIdList)
long insertAll(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just delete this function in favor of the one below?

private void checkWriteObjectWithIgnoreUnknownValues(
BigQueryIO.Write.Bound bound,
boolean ignoreUnknownValues) {
assertEquals(ignoreUnknownValues, bound.ignoreUnknownValues);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this function is only used once -- inline it?

@@ -498,7 +498,7 @@ public void testInsertRetry() throws Exception {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the file I'd expect to see a test that the ignoreUnknownValues field actually makes it to the TableDataInsertAll request. Maybe @peihe can suggest how such a test could be written?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think change the insertAll() to the following will help testing and support additional configurations. (the handling of ignoreUnknownValues will then stay in BigQueryIO.)
void insertAll(TableReference ref, Collection request);

https://issues.apache.org/jira/browse/BEAM-1306

@dhalperi
Copy link
Contributor

Thanks @g-eorge for this great feature. I've added @peihe to the review, hope that he can provide guidance on testing. Overall, this looks great.

@g-eorge
Copy link
Author

g-eorge commented Jan 24, 2017

Thanks @dhalperi @peihe for the feedback, I'm happy to take another look at it. Unless someone beats me to it.

@dhalperi
Copy link
Contributor

Yes, @g-eorge, please do take another pass if you get the chance. Thanks!

@dhalperi
Copy link
Contributor

Hi @g-eorge, just checking in. Any interest in picking this back up, or something we can do to help?

@g-eorge
Copy link
Author

g-eorge commented Feb 21, 2017

Hi @dhalperi, yes, sorry about the delay have been swamped. Will try to look at it soon.

@aaltay
Copy link
Member

aaltay commented Mar 16, 2017

@g-eorge any update on this one?

@g-eorge
Copy link
Author

g-eorge commented Mar 21, 2017

@peihe I have opened a PR here for BEAM-1306. When you are happy with that I will look at finishing this up.

@reuvenlax
Copy link
Contributor

Since I'm busy refactoring BigQueryIO (#2415), I can help review these PRs. @dhalperi or another committer will still have to be the one to merge.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.5%) to 69.348% when pulling 953d7c5 on g-eorge:bq-option-ignoreunknownvalues into 810db7f on apache:master.

@asfbot
Copy link

asfbot commented Apr 8, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/9320/
--none--

@davorbonaci
Copy link
Member

R: @reuvenlax -@dhalperi

@reuvenlax
Copy link
Contributor

Is this still needed? Unfortunately the underlying BigQuery transform has changed quite a lot, so some merging is needed.

@mborst
Copy link

mborst commented Jun 1, 2017

I would definitely appreciate if this made it's way into beam!

@reuvenlax
Copy link
Contributor

Sure, though BigQueryIO has changed quite a bit so you might have to a bit of conflict resolution.

@g-eorge
Copy link
Author

g-eorge commented Jun 12, 2017

@reuvenlax this is currently blocked by this refactor requested by @peihe . If it is now possible to create pipelines with options like ignoreUnknownValues, then it's not necessary. Otherwise, I'm happy to find time to bring it up to date with the latest code, so long as I can get timely feedback.

@jkff
Copy link
Contributor

jkff commented Dec 16, 2017

This PR hasn't been updated in a long time - do you plan to continue it?

@aaltay
Copy link
Member

aaltay commented Jan 24, 2018

I will close this PR based on the stale PR policy, please reopen if you would like to continue working on it.

@aaltay aaltay closed this Jan 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants