[Feature Request]: WriteToBigquery should offer a callback to deeply customise destination & element to BQ #23746

@ruodingt

What would you like to happen?

Hi, Beam community,

I really like that WriteToBigQuery can decide the destination table dynamically:

table (str, callable, ValueProvider): The ID of the table, or a callable

I see an opportunity for improvement. Currently WriteToBigQuery accepts either a dict or a TableRow, and it assumes that the table reference can be inferred from the content of the dict or TableRow being written to BQ. We should probably remove that assumption. Systems that do not fit it are forced to add the table name as a column (and optionally enable ignore_unknown_fields).
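To make the current workaround concrete, here is a minimal sketch of a table callable that reads a routing column from each row. The column name `target_table` and the project and dataset names are hypothetical, chosen only for illustration; the logic is shown as plain Python so it can be run without a pipeline.

```python
# Hypothetical column name used only for routing; not taken from the issue.
ROUTING_FIELD = "target_table"


def table_from_row(row: dict) -> str:
    """Return a table spec derived from a routing column stored in the row."""
    # Assumed project and dataset names, for illustration only.
    return f"my-project:my_dataset.{row[ROUTING_FIELD]}"


rows = [
    {"target_table": "clicks", "user": "a", "count": 3},
    {"target_table": "views", "user": "b", "count": 7},
]
destinations = [table_from_row(r) for r in rows]

# In a pipeline the callable would be passed as the `table` argument, e.g.
#   rows_pcoll | beam.io.WriteToBigQuery(table=table_from_row, ...)
# The routing column still travels with every row, which is the
# limitation this issue describes.
```

The point of the sketch is that the destination has to be recoverable from the row itself, so data that is naturally keyed by table must first be flattened into the row.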

This is a case I would like WriteToBigQuery to support:

Given a PCollection of K, V pairs of <str, dict>, I would like K to be used to infer the table name while V is written to the table as the row (possibly after some further processing on V).

I can see a few ways to do that. The simplest I can think of is to allow the table arg to be an instance of DoFn or AppendDestinationsFn.

class AppendDestinationsFn(DoFn):

If the SDK allowed us to override AppendDestinationsFn and pass it into WriteToBigQuery, we would have more freedom to customise how the destination and the element are derived.
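A rough sketch of what such an override could do for the <str, dict> case is below. This is a plain-Python stand-in, not Beam code: the class name `KeyedAppendDestinations` is hypothetical, it does not subclass DoFn, and WriteToBigQuery does not accept an object like this today. It only assumes that the step's job is to emit (destination, row) pairs.

```python
from typing import Callable, Dict, Iterator, Tuple

Row = Dict[str, object]


class KeyedAppendDestinations:
    """Stand-in for a custom AppendDestinationsFn-style step (hypothetical)."""

    def __init__(self, table_for_key: Callable[[str], str]):
        # Maps the key of each (key, row) pair to a table spec.
        self._table_for_key = table_for_key

    def process(self, element: Tuple[str, Row]) -> Iterator[Tuple[str, Row]]:
        key, row = element
        # The key picks the destination; only the value is emitted as the row,
        # so no routing column has to be added to the data.
        yield self._table_for_key(key), row


# Assumed project and dataset names, for illustration only.
fn = KeyedAppendDestinations(lambda key: f"my-project:my_dataset.{key}")
routed = list(fn.process(("clicks", {"user": "a", "count": 3})))
```

Any further processing of V could happen inside `process` before the row is yielded.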

I am happy to contribute and make this feature happen, but I would be keen to get the community's feedback first :)

Issue Priority

Priority: 2

Issue Component

Component: io-py-gcp
