Validate dynamic mappings updates on the master node. #10634
Conversation
This is still a work-in-progress as I have test failures on RiverTests. I suspect this is because this test triggers lots of concurrent dynamic mappings updates and there is a deadlock somehow because of my ignorance of how our transport works (e.g. is it ok to update the mappings like I do in MapperUtils.validateDynamicMappingsUpdateOnMaster?). Any help would be greatly appreciated.
try {
    MapperUtils.validateDynamicMappingsUpdateOnMaster(client, logger, indexService.index().name(), mappingUpdate.getKey(), mappingUpdate.getValue(), waitForMappingUpdatePostRecovery);
} catch (Throwable t) {
    logger.debug("failed to send mapping update post recovery to master for [{}]", t, mappingUpdate.getKey());
This seems like a bad case to get into? At least should it be warn or error level? This means we reapplied some updates from the translog, but the master rejected those updates... but we don't seem to do anything with the validation, so does that mean the mappings have already been updated in place? Also, could we limit the catch to just whatever exception would mean failed validation? Otherwise a bug as simple as an NPE in validate would get caught and just logged?
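The concern about the broad `catch (Throwable t)` can be illustrated with a minimal, self-contained sketch (the class, method, and exception choices here are illustrative stand-ins, not the real `MapperUtils` API): catching only the exception type that signals a failed validation lets a programming error such as an NPE propagate instead of being silently logged away.

```java
// Sketch only: ValidationSketch is a stand-in, not the real MapperUtils API.
class ValidationSketch {

    // Stand-in for a validation call that can fail in two distinct ways:
    // a genuine validation failure, or a programming bug (NPE).
    static void validate(String mapping) {
        if (mapping == null) {
            throw new NullPointerException("bug: mapping must not be null");
        }
        if (mapping.contains("bad")) {
            throw new IllegalArgumentException("validation failed for [" + mapping + "]");
        }
    }

    // Catch only the exception that means "validation failed".
    // A NullPointerException is a bug and propagates to the caller
    // instead of being swallowed by a catch-all.
    static boolean tryValidate(String mapping) {
        try {
            validate(mapping);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```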
The mappings side looks great! My main question is about what to do on translog replay if sending mapping updates to the master fails.
Good question. This should hopefully not happen once mapping updates are synchronous, but I believe it could happen when upgrading from 1.x. In that case I don't think we can do better than writing a scary warning to the logs?
@jpountz Ok, let's put it at warn level then?
I think we should fail the shard in this case. It means that some data was indexed in a way that is inconsistent with the “master” mapping, in which case you probably have rogue Lucene indices. Picking up another copy is a better alternative. If we end up having no copies, we are at least clearly communicating that things are bad (the master has a mapping that is inconsistent with any shard copy). If people still want to recover, they can intervene and delete the translog (which may lose some data, but I think it’s an acceptable solution for this bad place to be in). Also note that we flush on close during node shutdown, so the chance this will happen is small...
I added (and beasted) more tests about concurrent indexing requests. The rivers issue seems to be specific to rivers, so I left mapping updates handled like today when it comes to rivers (i.e. updated locally first and then propagated to the master). Also, failing to apply a mapping update on recovery now fails the shard.
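The "fail the shard instead of logging" decision can be sketched in a self-contained way (class and method names here are illustrative, not the actual Elasticsearch recovery code): when the master rejects a mapping update replayed from the translog, the shard is marked failed and the exception rethrown, so recovery can move to another copy rather than carry on with a local Lucene index that may be inconsistent with the cluster-wide mapping.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: names are illustrative, not the actual Elasticsearch recovery code.
class ShardRecoverySketch {

    final AtomicBoolean shardFailed = new AtomicBoolean(false);

    // Stand-in for the master rejecting a dynamic mapping update
    // replayed from the translog.
    void validateMappingUpdateOnMaster(String type) {
        throw new IllegalStateException("mapping for [" + type + "] rejected by master");
    }

    // Instead of logging and carrying on, mark the shard failed and rethrow:
    // recovering from another copy is safer than keeping a shard whose data
    // disagrees with the master's mapping.
    void replayTranslogOperation(String type) {
        try {
            validateMappingUpdateOnMaster(type);
        } catch (RuntimeException e) {
            shardFailed.set(true);
            throw e;
        }
    }
}
```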
logger.debug("waited for mapping update on master for [{}], yet timed out", type);

private void validateMappingUpdate(final String type, Mapping update) {
    final CountDownLatch latch = new CountDownLatch(1);
    final Throwable[] error = new Throwable[1];
can this be an AtomicReference?
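The suggested `AtomicReference` variant of the `Throwable[] error = new Throwable[1]` plus `CountDownLatch` idiom can be sketched as follows (self-contained; the callback thread is a stand-in for the real mapping-update listener, and the names are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: the callback thread stands in for the actual listener.
class LatchErrorSketch {

    // AtomicReference makes the cross-thread error hand-off explicit,
    // replacing the one-element Throwable[] array idiom.
    static Throwable runAndWait(final boolean shouldFail) {
        final CountDownLatch latch = new CountDownLatch(1);
        final AtomicReference<Throwable> error = new AtomicReference<>();
        Thread callback = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    if (shouldFail) {
                        error.set(new IllegalStateException("mapping update rejected"));
                    }
                } finally {
                    latch.countDown(); // always release the waiting thread
                }
            }
        });
        callback.start();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return error.get();
    }
}
```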
…ization. As requested on elastic#10399
This commit changes dynamic mappings updates so that they are synchronous on the entire cluster and their validity is checked by the master node. There are some important consequences of this commit:
- a failing index request on a non-existing type does not implicitly create the type anymore
- dynamic mappings updates cannot create inconsistent mappings on different shards
- indexing requests that introduce new fields might induce latency spikes because of the overhead to update the mappings on the master node

Close elastic#8688
Mappings: Validate dynamic mappings updates on the master node.
+1 this is happening every day for me during an index rollover.
This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:
- a failing index request on a non-existing type does not implicitly create the type anymore
- dynamic mappings updates cannot create inconsistent mappings on different shards
- indexing requests that introduce new fields might induce latency spikes because of the overhead to update the mappings on the master node
Closes #8688
Closes #8650