Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-480] Move insertAll() from BigQueryTableInserter to BigQueryServices #717

Closed
wants to merge 1 commit into from

Conversation

peihe
Copy link
Contributor

@peihe peihe commented Jul 22, 2016

No new code in this PR, just a refactoring.

@peihe
Copy link
Contributor Author

peihe commented Jul 22, 2016

R: @dhalperi

@dhalperi
Copy link
Contributor

LGTM

@asfgit asfgit closed this in 122cd04 Jul 22, 2016
@peihe peihe deleted the log-request branch July 22, 2016 23:54
inserter.insertAll(tableReference, tableRows, uniqueIds, byteCountAggregator);
long totalBytes = bqServices.getDatasetService(options).insertAll(
tableReference, tableRows, uniqueIds);
byteCountAggregator.addValue(totalBytes);
Copy link

@gadaldo gadaldo Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move BigQueryTableInserter in the BigqueryServices, how can I insert the whole List<TableRow> into a Bigquery table? I could use com.google.api.services.bigquery.Bigquery service rather then the old BigQueryTableInserte, but I can't access getDatasetService(options) from that object so I can't use insertAll() method, so how can I replace the following code now?

Bigquery bigqueryService = ... // building bigQueryService with google credential
BigQueryTableInserter bigQueryTableInserter = new BigQueryTableInserter(bigqueryService);
   bigQueryTableInserter.getOrCreateTable(tableReference,
   BigQueryIO.Write.WriteDisposition.WRITE_APPEND,
   BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED,
   tableSchema);
bigQueryTableInserter.insertAll(tableReference, rows);

I am struggling with it.

Thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain to me the context of your usage of BigQueryTableInserter?

Could you use BigQueryIO.Write?
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L185

BigqueryServices is the implementation details of the BigQueryIO (similarly we have BigtableServices). If we make the visibility public, we have to guarantee the backward compatibility from now on. And, we need to review the exposed methods signatures, and their use cases.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of backward compatibility, I think that that feature (BigQueryTableInserter) was there and was working in production and now we can't see any alternative to it, so... not that great.

I can't use BigQueryIO.Write otherwise I would have done it. Mainly because of the nature of our dataflows; I'll try to explain the scenario:
We are receiving batch of Json events from pub sub in base64 algorithm. We have to store those into Bigquey tables, so we translate these records into GenericRecord and then into TableRows in a processElement() method. In this method we retrieve the table schema and the name of the table at runtime (from the header of the message) so we can't output() a List<TableRow> otherwise we lose the table name externally and the schema as well, that's why we can't use BigQueryIO.Write and we have to store these records within the processElement() method itself using the object BigQueryTableInserter.

If you have any other idea to apply so, it's welcomed.
Thank you

pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023
* feat: Multi db test parametrization

The changes are:

- adding test parametrization wherever the client is used. It will run the test against the default db, and a named db
named db can only be passed to the Client, and the underlying queries and aggregation queries will retrieve the database id from the Client.

* Fix unit test import

* Fix unit test import

* fix: change non-default database name

* fix: added client test without any database set

---------

Co-authored-by: Vishwaraj Anand <vishwaraj.anand00@gmail.com>
Co-authored-by: kolea2 <45548808+kolea2@users.noreply.github.com>
Co-authored-by: Anthonios Partheniou <partheniou@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants