Tests: Revamp static bwc test framework to use dangling indexes #10247
@TimeoutSuite(millis = 40 * TimeUnits.MINUTE)
public class OldIndexBackwardsCompatibilityTests extends StaticIndexBackwardCompatibilityTest {
@LuceneTestCase.SuppressCodecs({"Lucene3x", "MockFixedIntBlock", "MockVariableIntBlock", "MockSep", "MockRandom", "Lucene40", "Lucene41", "Appending", "Lucene42", "Lucene45", "Lucene46", "Lucene49"})
out of curiosity - why are all these suppresses needed?
I don't remember, but they were copied from the static index superclass. I believe @s1monw added them originally on the first static bwc test, so maybe he can comment.
This is much cleaner! thx. Left some comments.
@bleskes I pushed another commit with some changes based on your feedback.
Thx @rjernst. I replied to the comments.
@bleskes I pushed a new commit. I believe this addresses your concern over using hard coded paths.
Awesome. LGTM. I will work on the dangling request issue today, so we can get this in.
Scratch the waiting for the dangling indices request issue. Missed the master:false on the loading node. No need to wait indeed.
The static old index tests currently take a long time to run because each index version essentially recreates the cluster and spins up new nodes. This PR instead loads each old version into the existing cluster as a dangling index. It also removes the intermediate "StaticIndexBackwardCompatibilityTest", which was an extra layer with no purpose, and consolidates a commonly duplicated function for getting an HTTP client into a shared version. The test now takes between 40 and 60 seconds for me. I also ran it "under stress" by running all ES tests in one shell while simultaneously running 10 iterations of the old index tests. Each iteration took about 90 seconds on average, which is much better than the 20+ minutes we see in master on Jenkins. closes elastic#10247
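To illustrate the dangling-index approach described above, here is a minimal sketch in plain JDK code. It assumes that an old index has already been unpacked to a directory and that copying it under a node's indices directory is what lets the cluster pick it up as dangling. The class name `DanglingIndexLoader`, the method `copyIndex`, and the directory layout are hypothetical and are not taken from the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper; not part of the Elasticsearch test framework. */
public class DanglingIndexLoader {

    /**
     * Copies an unpacked index directory tree to indicesDir/indexName.
     * The assumption is that a running node finds the files on disk and
     * offers the index to the cluster as a dangling index.
     */
    public static Path copyIndex(Path unpackedIndex, Path indicesDir, String indexName)
            throws IOException {
        Path target = indicesDir.resolve(indexName);
        if (Files.exists(target)) {
            throw new IllegalStateException("index directory already exists: " + target);
        }
        // Files.walk visits a directory before its children, so parent
        // directories are always created before the files inside them.
        try (Stream<Path> paths = Files.walk(unpackedIndex)) {
            for (Path src : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(unpackedIndex.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    Files.copy(src, dest);
                }
            }
        }
        return target;
    }
}
```

Because the existing cluster stays up, a test loop could call such a helper once per old index version instead of restarting nodes each time, which is where the time saving would come from.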
In several places in the code we need to notify a node that it needs to do something (typically the master). When that node is the local node, we have an optimization in several places that runs the execution code immediately instead of sending the request through the wire to itself. This is a shame, as we need to implement the same pattern again and again. On top of that, we may forget to do so (see note below), and we might have to write some cruft if the code needs to run under another thread pool. This commit folds the optimization into the TransportService, shortcutting wire serialization if the target node is local. Note: this was discovered by elastic#10247, which tries to import a dangling index quickly after the cluster forms. When sending an import dangling request to the master, the code didn't take into account the fact that the local node may be the master. If this happens quickly enough, one would get a NodeNotConnected exception, causing the dangling indices not to be imported. The import will succeed after 10s, when InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update. Closes elastic#10350
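The local-node shortcut described in this commit message can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual TransportService code: the class `LocalShortcutTransport`, its methods, and the string payloads are all hypothetical stand-ins, and only the JDK is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical, simplified transport layer; not the Elasticsearch API. */
public class LocalShortcutTransport {

    /** Thrown when a remote target has no open connection. */
    public static class NodeNotConnectedException extends RuntimeException {
        public NodeNotConnectedException(String nodeId) {
            super("node not connected: " + nodeId);
        }
    }

    private final String localNodeId;
    private final Executor executor;
    private final Map<String, Consumer<String>> handlers = new ConcurrentHashMap<>();
    private final Map<String, Consumer<byte[]>> connections = new ConcurrentHashMap<>();

    public LocalShortcutTransport(String localNodeId, Executor executor) {
        this.localNodeId = localNodeId;
        this.executor = executor;
    }

    public void registerHandler(String action, Consumer<String> handler) {
        handlers.put(action, handler);
    }

    /** Registers a "wire" to a remote node; the local node never needs one. */
    public void connect(String nodeId, Consumer<byte[]> wire) {
        connections.put(nodeId, wire);
    }

    public CompletableFuture<Void> sendRequest(String targetNodeId, String action, String payload) {
        if (localNodeId.equals(targetNodeId)) {
            // Local target: skip serialization and the connection lookup,
            // but still run the handler on the configured executor.
            Consumer<String> handler = handlers.get(action);
            if (handler == null) {
                throw new IllegalArgumentException("no handler for action: " + action);
            }
            return CompletableFuture.runAsync(() -> handler.accept(payload), executor);
        }
        Consumer<byte[]> wire = connections.get(targetNodeId);
        if (wire == null) {
            throw new NodeNotConnectedException(targetNodeId);
        }
        // Remote target: serialize the request and hand it to the wire.
        wire.accept((action + "\n" + payload).getBytes(StandardCharsets.UTF_8));
        return CompletableFuture.completedFuture(null);
    }
}
```

With the check living in one place, callers no longer need their own "is the target me?" branch, and a request to the local node cannot fail just because the node has not connected to itself yet.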