Validate dynamic mappings updates on the master node. #10634
Conversation
This is still a work-in-progress as I have test failures on RiverTests. I suspect this is because this test triggers lots of concurrent dynamic mappings updates and there is a deadlock somehow because of my ignorance of how our transport works (e.g. is it ok to update the mappings like I do in MapperUtils.validateDynamicMappingsUpdateOnMaster?). Any help would be greatly appreciated.
try {
    MapperUtils.validateDynamicMappingsUpdateOnMaster(client, logger, indexService.index().name(), mappingUpdate.getKey(), mappingUpdate.getValue(), waitForMappingUpdatePostRecovery);
} catch (Throwable t) {
    logger.debug("failed to send mapping update post recovery to master for [{}]", t, mappingUpdate.getKey());
This seems like a bad case to get into? At least should it be warn or error level? This means we reapplied some updates from the translog, but the master rejected those updates... but we don't seem to do anything with the validation, so does that mean the mappings have already been updated in place? Also, could we limit the catch to just whatever exception would mean failed validation? Otherwise a bug as simple as an NPE in validate would get caught and just logged?
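The concern about the broad `catch (Throwable t)` can be illustrated with a minimal, self-contained sketch (the class, method, and exception choices here are illustrative stand-ins, not the real `MapperUtils` API): catching only the exception type that signals a failed validation lets a programming error such as an NPE propagate instead of being silently logged away.

```java
// Sketch only: ValidationSketch is a stand-in, not the real MapperUtils API.
class ValidationSketch {

    // Stand-in for a validation call that can fail in two distinct ways:
    // a genuine validation failure, or a programming bug (NPE).
    static void validate(String mapping) {
        if (mapping == null) {
            throw new NullPointerException("bug: mapping must not be null");
        }
        if (mapping.contains("bad")) {
            throw new IllegalArgumentException("validation failed for [" + mapping + "]");
        }
    }

    // Catch only the exception that means "validation failed".
    // A NullPointerException is a bug and propagates to the caller
    // instead of being swallowed by a catch-all.
    static boolean tryValidate(String mapping) {
        try {
            validate(mapping);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```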
The mappings side looks great! My main question is about what to do on translog replay if sending mapping updates to the master fails.
Good question. This should hopefully not happen once mapping updates are synchronous, but I believe it could happen when upgrading from 1.x. In that case I don't think we can do better than writing a scary warning to the logs?
@jpountz Ok, let's put it at warn level then?
I think we should fail the shard in this case. It means that some data was indexed in a way that is inconsistent with the “master” mapping, in which case you probably have rogue Lucene indices. Picking up another copy is a better alternative. If we end up having no copies, we are at least clearly communicating that things are bad (the master has a mapping that is inconsistent with any shard copy). If people still want to recover, they can intervene and delete the translog (which may lose some data, but I think it’s an acceptable solution for this bad place to be in). Also note that we flush on close during node shutdown, so the chance this will happen is small...
I added (and beasted) more tests about concurrent indexing requests. The rivers issue seems to be specific to rivers, so I left mapping updates handled like today when it comes to rivers (i.e. updated locally first and then propagated to the master). Also, failing to apply a mapping update on recovery now fails the shard.
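The "fail the shard instead of logging" decision can be sketched in a self-contained way (class and method names here are illustrative, not the actual Elasticsearch recovery code): when the master rejects a mapping update replayed from the translog, the shard is marked failed and the exception rethrown, so recovery can move to another copy rather than carry on with a local Lucene index that may be inconsistent with the cluster-wide mapping.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: names are illustrative, not the actual Elasticsearch recovery code.
class ShardRecoverySketch {

    final AtomicBoolean shardFailed = new AtomicBoolean(false);

    // Stand-in for the master rejecting a dynamic mapping update
    // replayed from the translog.
    void validateMappingUpdateOnMaster(String type) {
        throw new IllegalStateException("mapping for [" + type + "] rejected by master");
    }

    // Instead of logging and carrying on, mark the shard failed and rethrow:
    // recovering from another copy is safer than keeping a shard whose data
    // disagrees with the master's mapping.
    void replayTranslogOperation(String type) {
        try {
            validateMappingUpdateOnMaster(type);
        } catch (RuntimeException e) {
            shardFailed.set(true);
            throw e;
        }
    }
}
```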
logger.debug("waited for mapping update on master for [{}], yet timed out", type);

private void validateMappingUpdate(final String type, Mapping update) {
    final CountDownLatch latch = new CountDownLatch(1);
    final Throwable[] error = new Throwable[1];
can this be an AtomicReference?
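The suggested `AtomicReference` variant of the `Throwable[] error = new Throwable[1]` plus `CountDownLatch` idiom can be sketched as follows (self-contained; the callback thread is a stand-in for the real mapping-update listener, and the names are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: the callback thread stands in for the actual listener.
class LatchErrorSketch {

    // AtomicReference makes the cross-thread error hand-off explicit,
    // replacing the one-element Throwable[] array idiom.
    static Throwable runAndWait(final boolean shouldFail) {
        final CountDownLatch latch = new CountDownLatch(1);
        final AtomicReference<Throwable> error = new AtomicReference<>();
        Thread callback = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    if (shouldFail) {
                        error.set(new IllegalStateException("mapping update rejected"));
                    }
                } finally {
                    latch.countDown(); // always release the waiting thread
                }
            }
        });
        callback.start();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return error.get();
    }
}
```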
…ization. As requested on elastic#10399
This commit changes dynamic mappings updates so that they are synchronous on the entire cluster and their validity is checked by the master node. There are some important consequences of this commit:
- a failing index request on a non-existing type does not implicitly create the type anymore
- dynamic mappings updates cannot create inconsistent mappings on different shards
- indexing requests that introduce new fields might induce latency spikes because of the overhead to update the mappings on the master node

Close elastic#8688
Mappings: Validate dynamic mappings updates on the master node.
+1 this is happening every day for me during an index rollover.
This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:
- a failing index request on a non-existing type does not implicitly create the type anymore
- dynamic mappings updates cannot create inconsistent mappings on different shards
- indexing requests that introduce new fields might induce latency spikes because of the overhead to update the mappings on the master node
Closes #8688
Closes #8650