[BEAM-479] Name local Spark RunnableOnService profile more precisely #711

kennknowles · 2016-07-22T04:29:49Z

Be sure to do all of the following to help us incorporate your contribution quickly and easily:

Make sure the PR title is formatted like: [BEAM-<Jira issue #>] Description of pull request
Make sure tests pass via mvn clean verify. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes.)
Replace <Jira issue #> in the title with the actual Jira issue number, if there is one.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

Settling on the name local-runnable-on-service-tests for all profiles with a local endpoint. That way, this profile plus the desired module will suffice to run against a local endpoint if possible.

kennknowles · 2016-07-22T04:30:29Z

R: @amitsela

Worked with @dhalperi to more directly set up and invoke the RunnableOnService tests as a postcommit. Trying to get build times down a bit and make all runners share similar configs.

kennknowles · 2016-07-22T16:18:47Z

Test failure is Travis timeout on Mac infrastructure, which is happening broadly. Not related to this PR.

amitsela · 2016-07-22T16:39:17Z

What does a local endpoint mean? How do other runners execute? How are they different, apart from supporting different capabilities of the model?

kennknowles · 2016-07-22T16:54:54Z

That's a really good question, worth nailing down. Intuitively it means "no cluster needs to be set up". I guess the important things would be:

No setup
No network needed (so a hermetic test that starts a real non-local cluster won't work)
No credentials

It applies not just to runners but also to tests. For example, KafkaIO can run against an embedded Kafka, and that will work with local Flink & Spark. Dataflow, on the other hand, has no local version since it is a hosted service.
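A minimal sketch of the three criteria above as a predicate, assuming each test setup can be described by three booleans. The `TestEndpoint` type and `is_local` function are hypothetical illustrations, not part of Beam, and the flag values chosen for the two examples are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestEndpoint:
    """Hypothetical description of where a RunnableOnService run executes."""
    name: str
    needs_setup: bool        # someone must provision something first
    needs_network: bool      # talks to hosts other than the test machine
    needs_credentials: bool  # result depends on who is running the test


def is_local(endpoint: TestEndpoint) -> bool:
    """An endpoint is 'local' only if it needs no setup, network, or credentials."""
    return not (endpoint.needs_setup
                or endpoint.needs_network
                or endpoint.needs_credentials)


# Examples drawn from the discussion; the boolean values are assumptions.
embedded_kafka_on_local_spark = TestEndpoint(
    "KafkaIO with embedded Kafka on local Spark", False, False, False)
dataflow_service = TestEndpoint(
    "Dataflow (hosted service)", True, True, True)

print(is_local(embedded_kafka_on_local_spark))  # True
print(is_local(dataflow_service))               # False
```

Any single requirement (for instance, network access to a real non-local cluster) is enough to disqualify an endpoint from the local profile.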

Does this seem reasonable? If not, is there somewhere we can go with this idea?

kennknowles · 2016-07-22T21:00:53Z

BTW I think the Jenkins failure is a hardcoded port that is being allocated by two jobs running in parallel on the same box.

amitsela · 2016-07-23T09:26:41Z

Generally makes sense. Moreover, most tests on runners will probably be local because of the setup/teardown overhead and the cost of maintaining clusters for constant testing. Also, most functionality can be tested this way, except perhaps serialization.
But having said that, doesn't that make Dataflow the exception here? AFAIK Spark, Flink and even Gearpump run tests on a local instance.
So unless I'm missing something, there should probably be a cluster-runnable-on-service-tests profile instead of (or in addition to) local-runnable-on-service-tests.
WDYT?

dhalperi · 2016-07-25T16:52:54Z

Local testing is excellent for unit tests or "fast postcommits". But I think we want to make it easier and easier to test Beam on remote infra of all sorts -- this is our primary intended use case.

E.g., I'd like there to be a post-commit that runs on a permanent Flink cluster, a permanent Spark cluster, etc. These are the things we need to make sure work well!

kennknowles · 2016-07-25T19:04:05Z

I'm for local-runnable-on-service-tests and cluster-runnable-on-service-tests. They should probably share the same base reconfiguration of the runnable-on-service-tests execution, which sets up the dependencies to scan and the exclusions.

The cluster-runnable-on-service-tests run has the complication that it requires credentials that depend on who is running it, so you'll likely have to provide many options on the command line anyway, and the savings from presetting --runner=XYZ are minimal. But I generally like the idea of both profiles setting up some useful defaults so the mvn user only has to provide the necessary bits.
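To make the trade-off concrete, here is a hedged sketch of how the two invocations might differ. The `runnable_on_service_command` helper is hypothetical, the `runners/spark` module path and the `--sparkMaster` value are illustrative assumptions, and it assumes user-specific options are passed as a JSON array through the `beamTestPipelineOptions` system property that Beam's `TestPipeline` reads.

```python
import json
import shlex

# Profile names from the discussion; the cluster profile is only proposed here.
LOCAL_PROFILE = "local-runnable-on-service-tests"
CLUSTER_PROFILE = "cluster-runnable-on-service-tests"


def runnable_on_service_command(module, cluster_options=None):
    """Return an argv list for `mvn verify` on a local or cluster endpoint.

    Hypothetical helper. With no cluster_options, only the local profile and
    the module are needed. With cluster_options (user-specific values such as
    a master URL or credentials), the cluster profile is selected and the
    options are passed as a JSON array via -DbeamTestPipelineOptions.
    """
    argv = ["mvn", "verify", "-pl", module]
    if not cluster_options:
        argv += ["-P", LOCAL_PROFILE]
        return argv
    argv += ["-P", CLUSTER_PROFILE]
    argv.append("-DbeamTestPipelineOptions=" + json.dumps(cluster_options))
    return argv


print(shlex.join(runnable_on_service_command("runners/spark")))
# mvn verify -pl runners/spark -P local-runnable-on-service-tests
```

The local case needs nothing beyond profile plus module, while the cluster case still carries every user-specific option, which is why presetting only the runner saves little there.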

amitsela · 2016-07-26T17:41:59Z

+1 for local-* and cluster-*
One thing to consider: which cluster? Or, to be accurate, which resource manager? Spark can run its own (Standalone mode), or use YARN or Mesos. According to the latest Databricks survey, Standalone is in the lead (48%), followed by YARN (40%), while Mesos is not very popular.
I'd vote for Standalone to test the most popular use case while avoiding the additional complexity of maintaining YARN on this cluster.

kennknowles · 2016-07-27T19:19:06Z

I agree that we should test the most common use case. I don't have much more to say than that as far as how and where it might be provisioned in the future. Maybe it is a good discussion for the dev list?

amitsela · 2016-07-27T19:49:59Z

LGTM. I'll post to the dev list tomorrow.


kennknowles force-pushed the remove-spark-local branch from 18e0648 to 4230312 on July 27, 2016 19:20

kennknowles force-pushed the remove-spark-local branch from 4230312 to b9543b9 on August 2, 2016 16:59

Name local Spark RunnableOnService profile more precisely

3f828b7

Settling on the name "local-runnable-on-service-tests" for all profiles with a local endpoint. That way, this profile plus the desired module will suffice to run against a local endpoint if possible.

kennknowles force-pushed the remove-spark-local branch from b9543b9 to 3f828b7 on August 4, 2016 19:48

asfgit closed this in fcf6b1d Aug 4, 2016

kennknowles deleted the remove-spark-local branch November 10, 2016 03:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BEAM-479] Name local Spark RunnableOnService profile more precisely #711

[BEAM-479] Name local Spark RunnableOnService profile more precisely #711

kennknowles commented Jul 22, 2016 •

edited

kennknowles commented Jul 22, 2016

kennknowles commented Jul 22, 2016

amitsela commented Jul 22, 2016 •

edited

kennknowles commented Jul 22, 2016

kennknowles commented Jul 22, 2016

amitsela commented Jul 23, 2016

dhalperi commented Jul 25, 2016

kennknowles commented Jul 25, 2016

amitsela commented Jul 26, 2016

kennknowles commented Jul 27, 2016

amitsela commented Jul 27, 2016

[BEAM-479] Name local Spark RunnableOnService profile more precisely #711

[BEAM-479] Name local Spark RunnableOnService profile more precisely #711

Conversation

kennknowles commented Jul 22, 2016 • edited

kennknowles commented Jul 22, 2016

kennknowles commented Jul 22, 2016

amitsela commented Jul 22, 2016 • edited

kennknowles commented Jul 22, 2016

kennknowles commented Jul 22, 2016

amitsela commented Jul 23, 2016

dhalperi commented Jul 25, 2016

kennknowles commented Jul 25, 2016

amitsela commented Jul 26, 2016

kennknowles commented Jul 27, 2016

amitsela commented Jul 27, 2016

kennknowles commented Jul 22, 2016 •

edited

amitsela commented Jul 22, 2016 •

edited