[BEAM-79] Merge branch 'master' into gearpump_runner #750

Closed
wants to merge 214 commits into from


manuzhang
Contributor

@manuzhang manuzhang commented Jul 28, 2016

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

tgroh and others added 30 commits July 15, 2016 10:52
Transform Evaluator Factories must be reused for the entire execution of
a Pipeline and must not be reused across pipelines.

Remove EvaluatorKey, and key explicitly by the transform application.
This makes the static constructors for withAllowedLateness symmetric to
the PTransform builder methods. It also allows references to
Window#withAllowedLateness(Duration, ClosingBehavior).
A DoFn application is the scope of reuse.

Factor CloningThreadLocal as the top-level class instead of
SerializableCloningThreadLocalCacheLoader, and extract the Fn from the
AppliedPTransform when loading an absent element.
* Move package from io to io.gcp.bigquery
* Move from SDK core into GCP-IO module
* Fixup references and import orders
* Separate AvroUtils into generic AvroUtils and BigQueryAvroUtils
* Rewrite a unit test in sdk core to not depend on BigQueryIO
* Fixup Javadoc in SDK core that need not depend on BigQueryIO
* Make utility classes package-private
* Use the uber jar
* Remove OS classifier mumbo jumbo
* Move common dependency versioning to root pom
This removes the duplication of "DirectRunner" and "DirectOptions"
classes.
* Register FileIOChannelFactory for file scheme
* Modify FileIOChannelFactory to dynamically remove the file:// scheme string.
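The scheme handling described in the two bullets above can be illustrated roughly as follows. This is a minimal sketch, not the actual FileIOChannelFactory code: the class name `FileSchemeSketch` and the method `stripFileScheme` are hypothetical, and the sketch assumes the scheme is removed with a simple prefix check.

```java
/** Hypothetical helper; not part of Beam. Illustrates removing a leading file:// scheme. */
final class FileSchemeSketch {
    private static final String FILE_SCHEME = "file://";

    private FileSchemeSketch() {}

    /** Returns the spec without a leading "file://", or the spec unchanged if it has none. */
    static String stripFileScheme(String spec) {
        return spec.startsWith(FILE_SCHEME)
            ? spec.substring(FILE_SCHEME.length())
            : spec;
    }
}
```

With this sketch, `stripFileScheme("file:///tmp/out.txt")` yields `/tmp/out.txt`, while a plain path such as `/tmp/out.txt` passes through untouched.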
Previously, the situation was this:

 - All runners inherit a RunnableOnService integration-test
   execution referencing runnableOnServicePipelineOptions
   whether or not the variable was set. Basically an unbound
   variable reference.
 - The Dataflow runner had a profile disabling it if
   runnableOnServicePipelineOptions was not set.
 - Before they got configured, Flink and Spark had to
   do extra work to explicitly prevent the invalid
   configuration from being used.

After this change:

 - All runners inherit the same integration-test execution
   but only if the variable it requires is present.
 - Dataflow doesn't have any special profile.
 - Flink and Spark are unchanged, since they do set
   up the variable themselves. When they move to running
   only as postcommit, like Dataflow does, the hardcoding
   is expected to either move to a profile or move to
   the Jenkins invocation.
Duplicate WordCount into spark examples package.

Duplicate parts of TfIdf from beam examples.

Better reuse of WordCount and its parts.

Remove dependency on beam-examples-java
- updates the README
- repairs broken exec configuration
@jbonofre
Member

A new rebase is needed for new version. I plan to do it asap.

…an interpretation of the Pipeline's windows

This closes apache#808
@kennknowles
Member

I forked it and updated the version. It looks like it is passing: https://travis-ci.org/kennknowles/incubator-beam/builds/151282293.

I've opened manuzhang#1 which bumps the version on this branch, or you could just do it yourself.

kennknowles and others added 8 commits August 10, 2016 11:34
Timers are equal if the domain, timestamp, and namespace are equal.
Compare these values in compareTo. The ordering of TimerData that are
not in the same namespace or domain is arbitrary.
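The equality and ordering contract in the commit message above might look roughly like this. It is a sketch only: `SketchTimer` is a hypothetical stand-in, not Beam's TimerData, and the tie-breaking order across domains and namespaces is an arbitrary choice made here, as the message allows.

```java
import java.util.Comparator;
import java.util.Objects;

/** Hypothetical stand-in for a timer record; not Beam's actual TimerData class. */
final class SketchTimer implements Comparable<SketchTimer> {
    enum Domain { EVENT_TIME, PROCESSING_TIME }

    private final String namespace;
    private final long timestampMillis;
    private final Domain domain;

    SketchTimer(String namespace, long timestampMillis, Domain domain) {
        this.namespace = Objects.requireNonNull(namespace);
        this.timestampMillis = timestampMillis;
        this.domain = Objects.requireNonNull(domain);
    }

    /** Timers are equal when domain, timestamp, and namespace are all equal. */
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SketchTimer)) {
            return false;
        }
        SketchTimer that = (SketchTimer) other;
        return timestampMillis == that.timestampMillis
            && domain == that.domain
            && namespace.equals(that.namespace);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespace, timestampMillis, domain);
    }

    // Compares the same three fields as equals(), so compareTo() == 0 exactly when
    // equals() is true. The order of the tie-breakers is an arbitrary assumption.
    private static final Comparator<SketchTimer> ORDER =
        Comparator.comparingLong((SketchTimer t) -> t.timestampMillis)
            .thenComparing(t -> t.domain)
            .thenComparing(t -> t.namespace);

    @Override
    public int compareTo(SketchTimer that) {
        return ORDER.compare(this, that);
    }
}
```

Keeping `compareTo` consistent with `equals` matters if timers are stored in sorted collections such as `TreeSet`, which rely on the comparison rather than `equals` to detect duplicates.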
Update Gearpump runner version to 0.3.0-incubating
Add a field that is modified per output, which should occur twice.
@manuzhang
Contributor Author

@kennknowles updated, but hit by a download timeout again 😞

@manuzhang
Contributor Author

@kennknowles @jbonofre @dhalperi merged with recent master. The previous DoFn has been replaced with OldDoFn, and support for the new DoFn will be added later.

@manuzhang
Contributor Author

Guys, can we advance with this PR ?

@jbonofre
Member

I will be back from vacation Wednesday. I will work on it then.

@manuzhang
Contributor Author

guys, any updates ?

@jbonofre
Member

Yes, I rebased and tested. I noted some stuff to fix. I will comment today.

@jbonofre
Member

@manuzhang would you be available on Slack or Hangout ? I would like to discuss a couple of topics with you. Thanks !

@kennknowles
Member

The tests are green. Are there any outstanding issues, or are you ready to merge it in? Since it is a feature branch, presumably you can just roll forwards addressing any issues.

@jbonofre
Member

I plan to discuss some topics with Manu tomorrow. I will do the merge. Thanks!

@jbonofre
Member

I discussed a couple of topics with Manu:

  • support for some IO features (not yet implemented; it's on the TODO list)
  • the README.md could be enhanced
  • the translation of DoFn is based on Gearpump FlatMapFunction which matches well
  • examples package and code directly in the runner (I would keep it like this for now and remove later)
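To illustrate why a per-element function fits a flat-map operation, here is a minimal sketch in plain Java. It uses neither Beam nor Gearpump APIs: `ElementFn` and `asFlatMap` are hypothetical names, and the only assumption is that a DoFn-like function emits zero or more outputs per input element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical illustration; does not use Beam or Gearpump classes. */
final class FlatMapSketch {
    private FlatMapSketch() {}

    /** A per-element function that may emit zero or more outputs for each input. */
    interface ElementFn<I, O> {
        void process(I input, Consumer<O> output);
    }

    /** Adapts an ElementFn to a flat-map style function returning all outputs for one input. */
    static <I, O> Function<I, List<O>> asFlatMap(ElementFn<I, O> fn) {
        return input -> {
            List<O> collected = new ArrayList<>();
            // Every value the function emits is gathered into the result list.
            fn.process(input, collected::add);
            return collected;
        };
    }
}
```

A word-splitting function, for example, emits several outputs for one input line, and the adapter returns them as a single list that a flat-map operator could then flatten into the output stream.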

I also tested the Gearpump runner against the latest Gearpump incubating release with simple pipelines.

I also checked:

  • checkstyle is OK
  • headers are OK

@kennknowles I'm proposing to merge this runner into master. It would give more visibility and would allow people to experiment and provide feedback. If you agree, I will be more than happy to do the merge ;)

@kennknowles
Member

That's a different discussion for the mailing list, where we have discussed criteria a bit. This PR is merging the other way, so that the gearpump-runner branch is caught up to master a bit.

asfgit pushed a commit that referenced this pull request Aug 25, 2016
@kennknowles
Member

I have shuffled the commits for readability and pushed the changes from this PR into gearpump-runner. I think you'll need to close this PR manually due to the way the asfgit bot operates.

I recommend merging master into gearpump-runner as frequently as possible so that it is almost always a pretty small update.

@jbonofre
Member

Fully agree. That's the plan, and it's what I did. I think we have to move forward quickly to merge the Gearpump runner into master. It would give more visibility.

@manuzhang
Contributor Author

Thanks to both of you

@manuzhang manuzhang closed this Aug 26, 2016
@manuzhang manuzhang deleted the gearpump_runner_sync branch February 14, 2017 01:43