Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove global shared test cluster #8854

Closed
wants to merge 1 commit into from

Conversation

rmuir
Copy link
Contributor

@rmuir rmuir commented Dec 9, 2014

In my experiments, this doesnt really buy much, but it causes reproducibility grief, and prevents things like leak detection from working (I would like to do more e.g. with the filesystem).

master:
[INFO] Execution time total: 5 minutes 51 seconds
[INFO] Tests summary: 672 suites, 4412 tests, 478 errors, 51 ignored (49 assumptions)

patch:
[INFO] Execution time total: 6 minutes 40 seconds
[INFO] Tests summary: 672 suites, 4412 tests, 479 errors, 51 ignored (49 assumptions)

So the savings is really not much in the scheme of things: I think reproducibility is better.

@s1monw
Copy link
Contributor

s1monw commented Dec 9, 2014

+1 lets remove this beast

* <li>{@link Scope#TEST} - uses a new cluster for each individual test method.</li>
* <li>{@link Scope#SUITE} - uses a cluster shared across all test method in the same suite</li>
* </ul>
* <p/>
* The most common test scope it {@link Scope#GLOBAL} which shares a cluster per JVM. This cluster is only set-up once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it -> is (existing typo)

@rjernst
Copy link
Member

rjernst commented Dec 9, 2014

LGTM!

//ports can be reused as suite or test clusters are never run concurrently
//we don't reuse the same port immediately though but leave some time to make sure ports are freed
//reserve 0 to global cluster, prevent conflicts between jvms by never going above 9
return 1 + portCounter.incrementAndGet() % 9;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove the 1 + , now that 0 is not reserved for the global scope. This should be portCounter.incrementAndGet() % 10;

@javanna
Copy link
Member

javanna commented Dec 19, 2014

I think this requires some more work... the global cluster has the property to be an external one too, and REST tests can run against this external cluster. We use this before releasing, to run REST tests against the distribution that we are going to ship. We might want to move that logic to SUITE clusters, but this needs be done before merging otherwise the release process breaks.

@javanna
Copy link
Member

javanna commented Dec 19, 2014

One more thing, how did you run your experiments? I tend to agree that the global cluster doesn't buy us much in local mode (default), but when running tests with network, I'd expect rebuilding the cluster for each suite would have a cost. Any measurements around this?

@rmuir
Copy link
Contributor Author

rmuir commented Dec 20, 2014

{quote}
but when running tests with network, I'd expect rebuilding the cluster for each suite would have a cost. Any measurements around this?
{quote}

Running tests with the network is useless. It just means test failures. We should not even support it. I'm certainly not going to waste 1 millisecond of my time on it.

@rmuir
Copy link
Contributor Author

rmuir commented Dec 20, 2014

I'm closing this: I dont want to fight for it, and I dont want to be blamed for test failures in flaky tests.

I will just say for the record: having randomized tests that do not reproduce is useless.

@rmuir rmuir closed this Dec 20, 2014
@javanna
Copy link
Member

javanna commented Dec 22, 2014

I think removing the global cluster is a good idea, since repeatibility of tests is super important. We should go ahead with this, make the external cluster work too and understand what the implications are with tests that use network as long as we have them. If tests with network are a bad idea let's discuss on a separate issue why we support them and why we shouldn't.

rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Feb 23, 2015
This was previously attempted in elastic#8854. I revived that branch and did
some performance testing as was suggested in the comments there.

I fixed all the errors, mostly just the rest tests, which
needed to have http enabled on the node settings (the global cluster
previously had this always enabled). I also addressed the comments from
that issue.

My performance tests involved running the entire test suite on my
desktop which has 6 cores, 16GB of ram, and nothing else was being
run on the box at the time. I ran each set of settings 3 times and
took the average time.

| mode    | master | patch | diff |
| ------- | ------ | ----- | ---- |
| local   | 409s   | 417s  | +2%  |
| network | 368s   | 380s  | +3%  |

This increase in average time is clearly worthwhile to pay to achieve
isolation of tests. One caveat is the way I fixed the rest tests
is still to have one cluster for the entire suite, so all the rest
tests can still potentially affect each other, but this is an
issue for another day.

There were some oddities that I noticed while running these tests
that I would like to point out, as they probably deserve some
investigation (but orthogonal to this PR):
* The total test run times are highly variable (more than a minute between the min and max)
* Running in network mode is on average actually *faster* than local mode. How is this possible!?
rmuir added a commit that referenced this pull request Feb 23, 2015
This was previously attempted in #8854. I revived that branch and did
some performance testing as was suggested in the comments there.

I fixed all the errors, mostly just the rest tests, which
needed to have http enabled on the node settings (the global cluster
previously had this always enabled). I also addressed the comments from
that issue.

My performance tests involved running the entire test suite on my
desktop which has 6 cores, 16GB of ram, and nothing else was being
run on the box at the time. I ran each set of settings 3 times and
took the average time.

| mode    | master | patch | diff |
| ------- | ------ | ----- | ---- |
| local   | 409s   | 417s  | +2%  |
| network | 368s   | 380s  | +3%  |

This increase in average time is clearly worthwhile to pay to achieve
isolation of tests. One caveat is the way I fixed the rest tests
is still to have one cluster for the entire suite, so all the rest
tests can still potentially affect each other, but this is an
issue for another day.

There were some oddities that I noticed while running these tests
that I would like to point out, as they probably deserve some
investigation (but orthogonal to this PR):
* The total test run times are highly variable (more than a minute between the min and max)
* Running in network mode is on average actually *faster* than local mode. How is this possible!?

closes #9781
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants