Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests: Revamp static bwc test framework to use dangling indexes #10247

Merged
merged 1 commit into from Apr 4, 2015

Conversation

rjernst
Copy link
Member

@rjernst rjernst commented Mar 25, 2015

The static old index tests currently take a long time to run because
each index version essentially recreates the cluster, and spins up
new nodes. This PR instead loads each old version into the existing
cluster as a dangling index. It also removes the intermediate
"StaticIndexBackwardCompatibilityTest" which was an extra layer
with no purpose, and moves a shared version of a commonly found
function to get an http client.

The test now takes between 40 and 60 seconds for me. I also ran it
"under stress" by running all ES tests in one shell, while
simultaneously running 10 iterations of the old index tests. Each
iteration took on average about 90 seconds, which is much better
than the 20+ minutes we see in master on jenkins.

@rjernst rjernst added >test Issues or PRs that are addressing/adding tests v2.0.0-beta1 v1.6.0 v1.5.1 labels Mar 25, 2015

@TimeoutSuite(millis = 40 * TimeUnits.MINUTE)
public class OldIndexBackwardsCompatibilityTests extends StaticIndexBackwardCompatibilityTest {
@LuceneTestCase.SuppressCodecs({"Lucene3x", "MockFixedIntBlock", "MockVariableIntBlock", "MockSep", "MockRandom", "Lucene40", "Lucene41", "Appending", "Lucene42", "Lucene45", "Lucene46", "Lucene49"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity - why are all these suppresses needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, but they were copied from the static index superclass. I believe @s1monw added them originally on the first static bwc test, so maybe he can comment.

@bleskes
Copy link
Contributor

bleskes commented Mar 25, 2015

This is much cleaner! thx . Left some comments.

@rjernst
Copy link
Member Author

rjernst commented Mar 26, 2015

@bleskes I pushed another commit with some changes based on your feedback.

@bleskes
Copy link
Contributor

bleskes commented Mar 26, 2015

Thx @rjernst . I replied to the comments.

@rjernst
Copy link
Member Author

rjernst commented Mar 31, 2015

@bleskes I pushed a new commit. I believe this addresses your concern over using hard coded paths.

@bleskes
Copy link
Contributor

bleskes commented Mar 31, 2015

Awesome. LGTM. I will work on the dangling request issue today, so we can get this in.

@bleskes
Copy link
Contributor

bleskes commented Mar 31, 2015

scratch the waiting for the dangling indices request issue. missed the master:false on the loading node. No need to wait indeed.

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Mar 31, 2015
In several places in the code we need to notify a node it needs to do something (typically the master). When that node is the local node, we have an optimization in serveral places that runs the execution code immediately instead of sending the request through the wire to itself. This is a shame as we need to implement the same pattern again and again. On top of that we may forget (see note bellow) to do so and we might have to write some craft if the code need to run under another thread pool.

This commit folds the optimization in the TrasnportService, shortcutting wire serliazition if the target node is local.

Note: this was discovered by elastic#10247 which tries to import a dangling index quickly after the cluster forms. When sending an import dangling request to master, the code didn't take into account that fact that the local node may master. If this happens quickly enough, one would get a NodeNotConnected exception causing the dangling indices not to be imported. This will succeed after 10s where InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.
The static old index tests currently take a long time to run because
each index version essentially recreates the cluster, and spins up
new nodes.  This PR instead loads each old version into the existing
cluster as a dangling index. It also removes the intermediate
"StaticIndexBackwardCompatibilityTest" which was an extra layer
with no purpose, and moves a shared version of a commonly found
function to get an http client.

The test now takes between 40 and 60 seconds for me. I also ran it
"under stress" by running all ES tests in one shell, while
simultaneously running 10 iterations of the old index tests. Each
iteration took on average about 90 seconds, which is much better
than the 20+ minutes we see in master on jenkins.

closes elastic#10247
@rjernst rjernst merged commit c3011ce into elastic:master Apr 4, 2015
rjernst added a commit that referenced this pull request Apr 7, 2015
The static old index tests currently take a long time to run because
each index version essentially recreates the cluster, and spins up
new nodes.  This PR instead loads each old version into the existing
cluster as a dangling index. It also removes the intermediate
"StaticIndexBackwardCompatibilityTest" which was an extra layer
with no purpose, and moves a shared version of a commonly found
function to get an http client.

The test now takes between 40 and 60 seconds for me. I also ran it
"under stress" by running all ES tests in one shell, while
simultaneously running 10 iterations of the old index tests. Each
iteration took on average about 90 seconds, which is much better
than the 20+ minutes we see in master on jenkins.

closes #10247
rjernst added a commit that referenced this pull request Apr 7, 2015
The static old index tests currently take a long time to run because
each index version essentially recreates the cluster, and spins up
new nodes.  This PR instead loads each old version into the existing
cluster as a dangling index. It also removes the intermediate
"StaticIndexBackwardCompatibilityTest" which was an extra layer
with no purpose, and moves a shared version of a commonly found
function to get an http client.

The test now takes between 40 and 60 seconds for me. I also ran it
"under stress" by running all ES tests in one shell, while
simultaneously running 10 iterations of the old index tests. Each
iteration took on average about 90 seconds, which is much better
than the 20+ minutes we see in master on jenkins.

closes #10247
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Apr 8, 2015
In several places in the code we need to notify a node it needs to do something (typically the master). When that node is the local node, we have an optimization in serveral places that runs the execution code immediately instead of sending the request through the wire to itself. This is a shame as we need to implement the same pattern again and again. On top of that we may forget (see note bellow) to do so and we might have to write some craft if the code need to run under another thread pool.

This commit folds the optimization in the TrasnportService, shortcutting wire serliazition if the target node is local.

Note: this was discovered by elastic#10247 which tries to import a dangling index quickly after the cluster forms. When sending an import dangling request to master, the code didn't take into account that fact that the local node may master. If this happens quickly enough, one would get a NodeNotConnected exception causing the dangling indices not to be imported. This will succeed after 10s where InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.

Closes elastic#10350
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Apr 8, 2015
In several places in the code we need to notify a node it needs to do something (typically the master). When that node is the local node, we have an optimization in serveral places that runs the execution code immediately instead of sending the request through the wire to itself. This is a shame as we need to implement the same pattern again and again. On top of that we may forget (see note bellow) to do so and we might have to write some craft if the code need to run under another thread pool.

This commit folds the optimization in the TrasnportService, shortcutting wire serliazition if the target node is local.

Note: this was discovered by elastic#10247 which tries to import a dangling index quickly after the cluster forms. When sending an import dangling request to master, the code didn't take into account that fact that the local node may master. If this happens quickly enough, one would get a NodeNotConnected exception causing the dangling indices not to be imported. This will succeed after 10s where InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.

Closes elastic#10350
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
The static old index tests currently take a long time to run because
each index version essentially recreates the cluster, and spins up
new nodes.  This PR instead loads each old version into the existing
cluster as a dangling index. It also removes the intermediate
"StaticIndexBackwardCompatibilityTest" which was an extra layer
with no purpose, and moves a shared version of a commonly found
function to get an http client.

The test now takes between 40 and 60 seconds for me. I also ran it
"under stress" by running all ES tests in one shell, while
simultaneously running 10 iterations of the old index tests. Each
iteration took on average about 90 seconds, which is much better
than the 20+ minutes we see in master on jenkins.

closes elastic#10247
@rjernst rjernst deleted the fix/slow-static-bwc branch September 18, 2020 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>test Issues or PRs that are addressing/adding tests v1.5.1 v1.6.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants