Skip to content

[BEAM-6480] Adds AvroIO sink for generic records.#9005

Merged
aromanenko-dev merged 6 commits intoapache:masterfrom
RyanSkraba:BEAM-6480-avroio-sink
Jul 9, 2019
Merged

[BEAM-6480] Adds AvroIO sink for generic records.#9005
aromanenko-dev merged 6 commits intoapache:masterfrom
RyanSkraba:BEAM-6480-avroio-sink

Conversation

@RyanSkraba
Copy link
Contributor

Add a sink to AvroIO that takes generic data (usually GenericRecords). The current method sinkViaGenericRecords() has been deprecated.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@RyanSkraba
Copy link
Contributor Author

R: @iemejia Can you take a look?

@iemejia iemejia self-requested a review July 5, 2019 12:54
Copy link
Member

@iemejia iemejia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty ok, please apply the suggestions on naming and minor fixes and it will be ready to go.

private enum WriteMethod {
AVROIO_WRITE,
AVROIO_SINK
AVROIO_SINK,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AVROIO_SINK_WITH_SCHEMA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooof -- this is used in different places to distinguish between write and sink techniques (with and without Schema).

I think I'll add one more to distinguish between the three types of sinks: AVROIO_SINK_CLASS with a specific/generated class, AVROIO_SINK_WITH_SCHEMA and AVROIO_SINK_WITH_FORMATTER. What do you think?

Modify unit test to show that the sink can work directly with a PCollection<GenericRecord> without casting.
@RyanSkraba
Copy link
Contributor Author

Oops -- there's a NeedsRunner issue with this PR. It turns out that the unit test relies on heterogeneous GenericRecords being sent to dynamic files. It's unlikely that a PCollection<GenericRecord> with different schemas will be correctly serialized with a single AvroCoder (and probably undesirable).

I can either rewrite the test to use a single schema and different criteria to select dynamic destinations (which exercises the code equally well) or write a custom Coder to correctly serialize the different GenericRecords in a single collection (which is a good example of how to use this technique). I'm taking a look.

@RyanSkraba
Copy link
Contributor Author

Run Java PreCommit

@RyanSkraba
Copy link
Contributor Author

R: @aromanenko-dev

Copy link
Contributor

@aromanenko-dev aromanenko-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It almost LGTM, I just left several comments.

Copy link
Contributor

@aromanenko-dev aromanenko-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aromanenko-dev aromanenko-dev merged commit 142b581 into apache:master Jul 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants