[BEAM-3397] Fix the failure in the dataflow integration test by removing the spark and flink pipeline options #4388

Closed
wants to merge 5 commits into from

alanmyrvold

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

@kennknowles

I suspect you are going to have to explicitly remove each dependency, as I doubt this config is transitive. This is already fixed in the Gradle build, but as long as we stick with Maven, I would suggest this course of action:

  • Delete the jenkins-precommit profile and all of its executions. They confuse Jenkins anyhow.
  • Make a separate Groovy file for the example ITs (this adds value on its own, since unit tests will be more reliable).
  • In that file, use shell commands so you can run more than one (I think mavenJob only supports one). That lets you do a "no frills" Maven install of the examples and all runners, followed by independent calls to run the ITs. Whatever the Groovy version of this is:
mvn -DskipTests -Dmdeps.analyze.skip -Drat.skip -Dfindbugs.skip -Djavadoc.skip \
    clean install \
    -P apex-runner,spark-runner,dataflow-runner,direct-runner,flink-runner \
    -pl examples/java -am

# Flink
mvn --errors failsafe:integration-test@flink-runner-integration-tests \
    -pl examples/java \
    -P flink-runner \
    -D beamTestPipelineOptions=... flink options from the jenkins-precommit profile ...

# Spark
mvn --errors failsafe:integration-test@spark-runner-integration-tests \
    -pl examples/java \
    -P spark-runner \
    -D beamTestPipelineOptions=... spark options from the jenkins-precommit profile ...

etc
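
The per-runner calls above all follow one pattern, so a small helper can generate them. The following is only a sketch of that idea: `it_command` is a hypothetical name, the pipeline options are left as placeholders because the thread does not spell them out, and the script prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: it_command is a hypothetical helper, and the option strings
# are placeholders for the values in the jenkins-precommit profile.
set -eu

# Print the failsafe invocation for one runner.
# $1 = runner name (e.g. flink, spark)
# $2 = value to pass as beamTestPipelineOptions
it_command() {
    printf 'mvn --errors failsafe:integration-test@%s-runner-integration-tests -pl examples/java -P %s-runner -D beamTestPipelineOptions=%s\n' \
        "$1" "$1" "$2"
}

# Dry run: print one command per runner instead of executing it.
for runner in flink spark; do
    it_command "$runner" "<options for $runner from the jenkins-precommit profile>"
done
```

Printing first makes it easy to check the generated commands before wiring them into a Jenkins shell step.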

I don't know how to make that parallel, unfortunately. This was meant to (and should) eventually become a pipeline job that parallelizes the IT runs, I guess.
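
One plain-shell possibility, short of a pipeline job, is to start each IT command in the background and wait for all of them. This is a sketch under assumptions: `run_parallel` is a hypothetical helper, the echo commands stand in for the real mvn calls, and whether concurrent failsafe runs in the same module and local repository are safe has not been checked here.

```shell
#!/bin/sh
# Sketch only: run_parallel is a hypothetical helper, not part of Beam or Jenkins.

# Run each argument as a separate shell command in the background,
# wait for all of them, and return the number of commands that failed.
run_parallel() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    failures=0
    for pid in $pids; do
        wait "$pid" || failures=$((failures + 1))
    done
    return "$failures"
}

# Placeholder commands standing in for the per-runner mvn invocations.
run_parallel "echo flink ITs" "echo spark ITs"
```

A nonzero return value means at least one command failed, which is enough for a Jenkins shell step to mark the build as failed.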

This will have a smidgen of overhead due to recompiling the SDK and runners, but you should be able to skip a lot. Optionally, you could migrate only the Dataflow execution from the existing config.

@lukecwik

There is an internal change that @charlesccychen is working on that will make this PR moot.

@alanmyrvold
Member Author

Closing without merge
