New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-5] Add Flink Runner #12

Closed
wants to merge 149 commits into
base: master
from

Conversation

Projects
None yet
8 participants
@mxm
Contributor

mxm commented Mar 2, 2016

As of BEAM-5 and the previous mailing list discussion, I would like to contribute the Flink Runner to Beam.

According to the repository layout document, I have created a Beam Runners project which will host the runners. Therein, I have added the Flink runner with a few modifications mostly related to code formatting and licensing. These changes you can see in the individual commits. Note, that everything builds and the tests run fine.

The Runner is in sync with the latest Beam SDK version which is currently 1.5.0-SNAPSHOT. I would like to point out that the runner also supports the stable 1.0.0 version (in the 1.0.0 branch at https://github.com/dataArtisans/flink-dataflow) but I've added the latest version here to keep up with the latest changes in Beam.

Please let me know what you think about the pull request and when you would like to merge it.

@davorbonaci

This comment has been minimized.

Member

davorbonaci commented Mar 3, 2016

Awesome, Max! This sounds great.

Let me take this code review. I'm also looping in Mark, who may be able to provide additional insight. I'd expect we'd be able to provide a few thoughts and get it merged quickly.

R: @davorbonaci
CC: @mshields822

@mxm

This comment has been minimized.

Contributor

mxm commented Mar 3, 2016

Hi @davorbonaci! Thanks for the feedback. I would like to merge this pull request tomorrow if you don't have any objections. Let me know if you need anything more. Surely, we will continue to work on the Flink Runner once it is merged. We can then also transfer any open issues to the Apache Beam Jira.

@mshields822

This comment has been minimized.

Contributor

mshields822 commented Mar 3, 2016

Agree - plenty yet to be done, but at least we are buildable with unit tests passing.

@davorbonaci

This comment has been minimized.

Member

davorbonaci commented Mar 4, 2016

I have just 2 ideas for consideration, none of them is blocking:

  • What do you think about changing the directory name from runners/flink-runner to runners/flink? Sounds simpler, and perhaps less unnecessary duplication. I'm sorry if I'm now giving a different recommendation than what the document initially said.
  • When we did the Dataflow code drop, we preserved all commit history of the old repository under the pretense that it may help other better understand why certain things were done. Perhaps you can consider doing the same. An extra benefit is that it gives an appropriate credit to all authors of this codebase and makes the graphs more accurate.

Let me know what you think. Either way, I think it is great. (And I'm looking forward in working with you going forward!)

@mxm

This comment has been minimized.

Contributor

mxm commented Mar 4, 2016

  • Sure we can change the directory name from runners/flink-runner to runners/flink. Looks much nicer.
  • I didn't want to import all commits into this pull request because I thought it would discourage merging. However, if we can preserve commit history, I'm all for it. I'll push a new version with the entire history.

I'm also looking forward to working with you @davorbonaci!

@mxm

This comment has been minimized.

Contributor

mxm commented Mar 4, 2016

Two options for preserving the commits:

  1. Make a merge commit. This merges and interleaves the whole history of the Dataflow SDK. Probably not something we want.

  2. Rebase. Unfortunately, only possible if we rewrite the Flink Runner history to be relative to the runners/flink directory. Otherwise, we run into conflicts with shared files like /pom.xml and we would loose proper history there if we changed these files during rebasing.

So I would go for option 2).

@mxm mxm force-pushed the mxm:master branch 2 times, most recently from aa4601d to 85a6568 Mar 4, 2016

@mxm mxm force-pushed the mxm:master branch from 85a6568 to 460b58f Mar 4, 2016

@mxm

This comment has been minimized.

Contributor

mxm commented Mar 4, 2016

There you go. I feel like a real Git magician now. For your information, I rewrote the history using

git filter-branch -f --index-filter 'git ls-files -s | perl -pe "s/\t/\trunners\/flink/g" |
   GIT_INDEX_FILE=${GIT_INDEX_FILE}.new git update-index --index-info &&
   mv "${GIT_INDEX_FILE}.new" "$GIT_INDEX_FILE"' HEAD

And then put the other changes on top again. Everything integrates nicely and should be good to merge now :)

@asfgit asfgit closed this in 4da935b Mar 4, 2016

@davorbonaci

This comment has been minimized.

Member

davorbonaci commented Mar 4, 2016

This looks great. An awesome milestone for the entire Beam project!

( I went ahead and merged this for you while we sort out your permissions issues. )

@mxm

This comment has been minimized.

Contributor

mxm commented Mar 4, 2016

Thanks for the merge! Let's take Beam to the next level :)

@tomwhite tomwhite referenced this pull request Mar 10, 2016

Merged

Import Spark Runner code #37

aljoscha pushed a commit to aljoscha/beam that referenced this pull request Mar 14, 2018

Merge pull request apache#12 from bsidhom/output-receiver-empty
Only access output coder if available

tvalentyn pushed a commit to tvalentyn/beam that referenced this pull request May 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment