New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-480] Move insertAll() from BigQueryTableInserter to BigQueryServices #717
Conversation
R: @dhalperi |
LGTM |
inserter.insertAll(tableReference, tableRows, uniqueIds, byteCountAggregator); | ||
long totalBytes = bqServices.getDatasetService(options).insertAll( | ||
tableReference, tableRows, uniqueIds); | ||
byteCountAggregator.addValue(totalBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you move BigQueryTableInserter
in the BigqueryServices, how can I insert the whole List<TableRow>
into a Bigquery table? I could use com.google.api.services.bigquery.Bigquery
service rather then the old BigQueryTableInserte, but I can't access getDatasetService(options)
from that object so I can't use insertAll() method, so how can I replace the following code now?
Bigquery bigqueryService = ... // building bigQueryService with google credential
BigQueryTableInserter bigQueryTableInserter = new BigQueryTableInserter(bigqueryService);
bigQueryTableInserter.getOrCreateTable(tableReference,
BigQueryIO.Write.WriteDisposition.WRITE_APPEND,
BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED,
tableSchema);
bigQueryTableInserter.insertAll(tableReference, rows);
I am struggling with it.
Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain to me the context of your usage of BigQueryTableInserter?
Could you use BigQueryIO.Write?
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L185
BigqueryServices is the implementation details of the BigQueryIO (similarly we have BigtableServices). If we make the visibility public, we have to guarantee the backward compatibility from now on. And, we need to review the exposed methods signatures, and their use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of backward compatibility, I think that that feature (BigQueryTableInserter) was there and was working in production and now we can't see any alternative to it, so... not that great.
I can't use BigQueryIO.Write
otherwise I would have done it. Mainly because of the nature of our dataflows; I'll try to explain the scenario:
We are receiving batch of Json events from pub sub in base64 algorithm. We have to store those into Bigquey tables, so we translate these records into GenericRecord
and then into TableRows
in a processElement()
method. In this method we retrieve the table schema and the name of the table at runtime (from the header of the message) so we can't output()
a List<TableRow>
otherwise we lose the table name externally and the schema as well, that's why we can't use BigQueryIO.Write
and we have to store these records within the processElement()
method itself using the object BigQueryTableInserter
.
If you have any other idea to apply so, it's welcomed.
Thank you
* feat: Multi db test parametrization The changes are: - adding test parametrization wherever the client is used. It will run the test against the default db, and a named db named db can only be passed to the Client, and the underlying queries and aggregation queries will retrieve the database id from the Client. * Fix unit test import * Fix unit test import * fix: change non-default database name * fix: added client test without any database set --------- Co-authored-by: Vishwaraj Anand <vishwaraj.anand00@gmail.com> Co-authored-by: kolea2 <45548808+kolea2@users.noreply.github.com> Co-authored-by: Anthonios Partheniou <partheniou@google.com>
No new code in this PR, just a refactoring.