Description
What would you like to happen?
Hi, Beam community,
I really like that WriteToBigQuery can decide the destination table dynamically:
beam/sdks/python/apache_beam/io/gcp/bigquery.py
Line 1788 in 6adecd4
| table (str, callable, ValueProvider): The ID of the table, or a callable |
I think there is room for improvement here. WriteToBigQuery currently accepts elements that are either a dict or a TableRow, and it assumes that the table reference can be inferred from the content of the dict or TableRow being written to BQ. We should probably remove that assumption. Pipelines that do not fit it are forced to add the table name to each row as a column (and optionally enable ignore_unknown_columns).
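For illustration, the workaround described above might look like the following minimal sketch. All names here (`DEST_COLUMN`, `attach_destination`, `table_from_row`, the project and dataset) are hypothetical, and the Beam wiring is only indicated in comments; the point is that the table name has to travel inside the row itself.

```python
# Hypothetical sketch of the current workaround: the table name must be
# carried inside the row so that the `table` callable can read it back.
DEST_COLUMN = "_table"  # illustrative column name, not a Beam convention


def attach_destination(kv):
    """Copy the key of a (table_name, row) pair into the row as an extra column."""
    table_name, row = kv
    return {**row, DEST_COLUMN: table_name}


def table_from_row(row):
    """Table callable: receives the row to be written and returns its table spec."""
    # "my-project" and "my_dataset" are placeholder names.
    return "my-project:my_dataset." + row[DEST_COLUMN]


# With Beam, this would be wired up roughly as:
#   rows = keyed_rows | beam.Map(attach_destination)
#   rows | WriteToBigQuery(table=table_from_row, ...)
```

The extra column then either has to exist in every destination table's schema or be ignored on write, which is the cost this request is trying to avoid.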
This is the case I would like WriteToBigQuery to support:
Given a PCollection of K, V pairs of type <str, dict>, I would like K to be used to infer the table name while only V is written to the table (possibly after some further processing of V).
I can see a few ways to do that. The simplest one I can think of is to allow the table arg to be an instance of DoFn or AppendDestinationsFn.
| class AppendDestinationsFn(DoFn): |
If the SDK allowed us to subclass AppendDestinationsFn and pass it into WriteToBigQuery, we would have more freedom to customise the behaviour.
I am happy to contribute and make this feature happen, but I would be keen to get the community's feedback first :)
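As a rough sketch of the behaviour being requested (not an existing Beam API), a custom destination step could split each keyed element into a destination and the row to write. The function name `destination_from_key`, its parameters, and the assumption that the sink works with (destination, row) pairs internally are all illustrative.

```python
# Hypothetical sketch of the requested behaviour: derive the destination from
# the key and keep the value as the only thing that gets written.
def destination_from_key(kv, project="my-project", dataset="my_dataset"):
    """Turn a (table_name, row) pair into a (table_spec, row) pair.

    `project` and `dataset` are placeholder defaults. The row is returned
    unchanged, so no destination column is added to it.
    """
    table_name, row = kv
    table_spec = f"{project}:{dataset}.{table_name}"
    return table_spec, row
```

With something like this pluggable into the sink, the rows would reach BigQuery without a helper column, and any further processing of V could happen in the same step.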
Issue Priority
Priority: 2
Issue Component
Component: io-py-gcp