[BEAM-6201] Data insertion pipeline #7238

Merged
merged 2 commits into from Dec 11, 2018

@lgajowy lgajowy commented Dec 10, 2018

This PR adds a simple pipeline that publishes synthetic data to PubSub. With it, we can fill a topic ahead of time and then read the data stream from a PubSub subscription using any pipeline in any SDK, provided that the SDK has a PubSub source implemented.
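For readers who want a feel for the fill-then-read flow, here is a minimal, standard-library-only sketch. It is not the code from this PR: `synthetic_records`, `InMemoryTopic`, and `fill_topic` are hypothetical names, the key/value sizes and seed are assumptions, and the in-memory queue only stands in for a real PubSub topic and subscription.

```python
import random
from collections import deque


def synthetic_records(num_records, key_size=4, value_size=16, seed=0):
    """Yield (key, value) byte pairs. Sizes and seed are illustrative assumptions."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    for _ in range(num_records):
        key = bytes(rng.getrandbits(8) for _ in range(key_size))
        value = bytes(rng.getrandbits(8) for _ in range(value_size))
        yield key, value


class InMemoryTopic:
    """Hypothetical stand-in for a PubSub topic with a single subscription."""

    def __init__(self):
        self._messages = deque()

    def publish(self, data):
        self._messages.append(data)

    def pull(self, max_messages):
        """Return up to max_messages in publish order, removing them from the queue."""
        batch = []
        while self._messages and len(batch) < max_messages:
            batch.append(self._messages.popleft())
        return batch


def fill_topic(topic, num_records):
    """Publish num_records synthetic messages and return how many were sent."""
    count = 0
    for key, value in synthetic_records(num_records):
        topic.publish(key + value)
        count += 1
    return count
```

In the setup described above, the fill step and the read step would be two separate pipelines: the first publishes to the topic, and the second, written in any SDK with a PubSub source, reads from the subscription.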

@aaltay @pabloem could either of you take a look?

CC: @kkucharc



lgajowy commented Dec 10, 2018

I'm aware of #6637, which adds a periodic, streaming impulse source. However, that is a Flink-only solution (at least for now).

Until we have a generic solution for streaming sources that works on portable pipelines, we can use the Data Insertion Pipeline in load tests for runners other than Flink. WDYT?

@aaltay aaltay requested a review from pabloem December 10, 2018 21:06

pabloem commented Dec 11, 2018

This looks good. Thanks a lot!


pabloem commented Dec 11, 2018

An interesting feature for this would be to write the same values to a Kafka topic. This way we can support a Flink test that reads from a Kafka source. Perhaps create a JIRA for that?


lgajowy commented Dec 11, 2018

Good idea. It is also fairly easy to do. JIRA for this - here

Thanks!

@lgajowy lgajowy merged commit b0069eb into apache:master Dec 11, 2018
@lgajowy lgajowy deleted the data-insertion-pipeline branch December 11, 2018 10:09