-
Notifications
You must be signed in to change notification settings - Fork 24.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mappings: Update mapping on master in async manner #6648
Conversation
* and sent to master for heavy single index requests that each introduce a new mapping, and when | ||
* multiple shards exists on the same nodes, allowing to work on the index level in this case. | ||
*/ | ||
private class MasterMappingUpdater extends Thread { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we implement Runnable instead of extending from Thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, but then we need to have the runnable around, with the thread as variables. I like this encapsulation, I don't think extending Thread is such a bad idea for this case. Will switch if there is strong sentiment about it
Left two comments, this change looks good to me. Maybe someone else can also take a look? |
@@ -323,6 +324,8 @@ public void close() { | |||
stopWatch.stop().start("rivers"); | |||
injector.getInstance(RiversManager.class).close(); | |||
|
|||
stopWatch.stop().start("mapping"); | |||
injector.getInstance(MappingUpdatedAction.class).close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's safer to introduce a stop() method and call it in the stop phase as the masterMappingUpdater thread might try to speak to the cluster service which has been stopped, leading to errors & warning logs
I went through the change. Bulk of it looks good. Left some minor comments. I also wonder if we should mark it as breaking because we removed the |
@bleskes used the support method, added a note on breaking, also bit the bullet and cleaned all calls to update mapping to include doc mapper and UUID actually used |
sortedMappers.put(cursor.key, cursor.value); | ||
} | ||
Mapper[] sortedMappers = mappers.values().toArray(Mapper.class); | ||
Arrays.sort(sortedMappers, new Comparator<Mapper>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool. did this come out of profiling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
+1 |
Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keeps on being introduced, we can move this process to another thread that will be responsible to refresh the source and then send the update mapping to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow). When doing so, the thread can also try and batch as much updates as possible, this is handy especially when multiple shards for the same index exists on the same node. An internal setting that can control the time to wait for batches is also added (defaults to 0). Testing wise, a new support method on ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll to allow to wait for the concrete manifestation of mappings on all relevant nodes is added. Some tests mistakenly rely on the fact that there are no more pending tasks to mean mappings have been updated, so if we see, timing related, failures down later (all tests pass), then those will need to be fixed to wither awaitBusy on the master for the new mapping, or in the rare case, wait for the concrete mapping on all the nodes using the new method. closes elastic#6648
also, no need to call nodes info in test, we already have the node names
also use the internal cluster support method to get the list of nodes an index is on
Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keeps on being introduced, we can move this process to another thread that will be responsible to refresh the source and then send the update mapping to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow). When doing so, the thread can also try and batch as much updates as possible, this is handy especially when multiple shards for the same index exists on the same node. An internal setting that can control the time to wait for batches is also added (defaults to 0). Testing wise, a new support method on ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll to allow to wait for the concrete manifestation of mappings on all relevant nodes is added. Some tests mistakenly rely on the fact that there are no more pending tasks to mean mappings have been updated, so if we see, timing related, failures down later (all tests pass), then those will need to be fixed to wither awaitBusy on the master for the new mapping, or in the rare case, wait for the concrete mapping on all the nodes using the new method. closes #6648
During phase1 we copy over all lucene segments. These make refer to mapping updates that are still queued up to be sent to master. We must make sure those pending updates are sent before completing the relocation. Relates to elastic#6648
Config option "action.wait_on_mapping_change" was removed since ES 1.3.0 elastic/elasticsearch#6648
Config option "action.wait_on_mapping_change" was removed since ES 1.3.0 elastic/elasticsearch#6648
Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keeps on being introduced, we can move this process to another thread that will be responsible to refresh the source and then send the update mapping to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow).
When doing so, the thread can also try and batch as much updates as possible, this is handy especially when multiple shards for the same index exists on the same node. An internal setting that can control the time to wait for batches is also added (defaults to 0).
Testing wise, a new support method on ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll to allow to wait for the concrete manifestation of mappings on all relevant nodes is added. Some tests mistakenly rely on the fact that there are no more pending tasks to mean mappings have been updated, so if we see, timing related, failures down later (all tests pass), then those will need to be fixed to wither awaitBusy on the master for the new mapping, or in the rare case, wait for the concrete mapping on all the nodes using the new method.
Note, this change also removes
action.wait_on_mapping_change
, this is an internal setting, and is not recommended to set it. It was used using the old test infrastructure to validate if the problem was due to mapping propagation, but we have a much better infra for this now.