
Write state also on data nodes if not master eligible #9952

Closed
34 commits

Conversation

brwe
Contributor

@brwe brwe commented Mar 2, 2015

When a node was a data-only node, the index state was not written. If such a node
then connected to a master that did not have the index in its cluster state, for
example because the master was restarted and its data folder was lost, the indices
were not imported as dangling but instead deleted.
This commit makes sure that the index state is also written on data nodes if they
have at least one shard of the index allocated.
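
A minimal sketch of the intended behavior, using made-up stand-in types (StateWriter, plain maps) rather than the real GatewayMetaState/IndexMetaData machinery:

import java.util.List;
import java.util.Map;

class DataNodeStateWriteSketch {

    interface StateWriter {
        void writeIndexState(String indexName, Object indexMetaData);
    }

    // Write the index state for every index that has at least one shard
    // allocated on this (data-only) node, so that a later dangling-index
    // import can find it even if the master lost its cluster state.
    static void writeStateForLocalShards(Map<String, Object> indexMetaDataByName,
                                         Map<String, List<Integer>> localShardsByIndex,
                                         StateWriter writer) {
        for (Map.Entry<String, List<Integer>> entry : localShardsByIndex.entrySet()) {
            boolean hasLocalShard = !entry.getValue().isEmpty();
            Object metaData = indexMetaDataByName.get(entry.getKey());
            if (hasLocalShard && metaData != null) {
                writer.writeIndexState(entry.getKey(), metaData);
            }
        }
    }
}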

I am a little lost with this. I found that the index can still be deleted from a
data node even if the state was written, when the node gets a new cluster state from
a master that does not have the index, for example because the master was restarted
without its data folder. This happens if the data node does not receive the initial
cluster state from the new master but only a later one, and state persistence is not
disabled.
I avoid this now with this change: https://github.com/elasticsearch/elasticsearch/pull/9952/files#diff-f0f71bedb3d7e6f1cec54e8dddf5c3d3R109
but I am worried about side effects this might have. Any feedback appreciated.

closes #8823

// remove the index state for this index if this node is a data-only node
// only delete if the last shard was removed
if (shardsAllocatedOnThisNodeInLastClusterState) {
    removeIndexState(indexMetaData);
}
Member

I am not sure we need this removal logic, we already have that in IndicesClusterStateService#applyCleanedIndices, where an index is removed if no shards are around any more. It would be nice to keep this in a single place?

Member

Ohh, I think I see what happens: in IndicesClusterStateService we just remove the index and do not delete it from the file system, which is good. Only in IndicesStore do we actually delete the shard content once the shard is allocated on all the other nodes. I think we need to add logic there so that if there are no more shards around, we also delete the index itself (which will delete its metadata).
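
Roughly the logic being suggested, sketched with placeholder names (maybeDeleteIndexDirectory, localShardCount) rather than the actual IndicesStore API:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class IndexDirectoryCleanupSketch {

    // Once the last local shard of an index is gone (its data already deleted
    // because the shard is allocated elsewhere), remove the whole index
    // directory, which also removes its on-disk metadata.
    static void maybeDeleteIndexDirectory(Path indexDir, int localShardCount) throws IOException {
        if (localShardCount > 0 || Files.notExists(indexDir)) {
            return; // shards still present locally, keep the index metadata
        }
        try (Stream<Path> paths = Files.walk(indexDir)) {
            paths.sorted(Comparator.reverseOrder()) // delete children before parents
                 .forEach(path -> {
                     try {
                         Files.delete(path);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }
}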

@s1monw
Contributor

s1monw commented Mar 4, 2015

I left some comments...

@s1monw s1monw self-assigned this Mar 4, 2015
@s1monw s1monw added the review label Mar 4, 2015
@brwe
Contributor Author

brwe commented Mar 4, 2015

Made a PR for the deletion of index folders here: #9985. It should be easy to remove all the additional deletion code from this PR.

@brwe brwe force-pushed the data-node-state-write-pr branch 2 times, most recently from fcbeaed to 011f424 Compare March 5, 2015 16:18
@brwe
Contributor Author

brwe commented Mar 5, 2015

#9985 is merged, I rebased on the latest master and changed the code accordingly. I wanted to remove the change in ClusterStateEvent as well, because I was unable to reproduce the failures I'd seen before without it. But now I found that without the change the tests only pass on my Linux machine and fail every 10 iterations or so on my Mac, so something is still fishy. I'll try to come up with a detailed failure analysis tomorrow.

@brwe
Contributor Author

brwe commented Mar 6, 2015

I think I know what is going on now: the fresh master with the empty cluster state (rarely) does not send the first cluster state, due to a race condition in the lifecycles of DiscoveryService and its member Discovery. In DiscoveryService.doStart() the Discovery is started, but the lifecycle of DiscoveryService itself is only started after that. This is why, when the first cluster state reaches DiscoveryService.publish, the lifecycle might or might not have started yet.
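
A toy reproduction of that ordering, with made-up names rather than the actual DiscoveryService/Discovery code:

class LifecycleRaceSketch {

    private volatile boolean lifecycleStarted = false;

    void doStart() {
        startInnerDiscovery();     // the member Discovery starts here and may already
                                   // publish the first cluster state on another thread
        lifecycleStarted = true;   // the outer service only becomes STARTED afterwards
    }

    void publish(Object clusterState) {
        if (!lifecycleStarted) {
            // the first cluster state can be dropped here,
            // so the data node never sees it
            return;
        }
        // ... forward the cluster state ...
    }

    private void startInnerDiscovery() {
        // placeholder for starting the member Discovery
    }
}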

I added a commit d69f2cf where I removed the ClusterStateEvent change and added an artificial delay to DiscoveryService.doStart() so that the tests fail reliably, just so you can check if you want.

I would suggest we remove the ClusterStateEvent workaround and open another issue for this because this behavior is not a result of this pull request.

@s1monw
Contributor

s1monw commented Mar 6, 2015

I agree with your idea of opening a new issue for the ClusterStateEvent problem.

@brwe
Contributor Author

brwe commented Mar 10, 2015

Chatted with @s1monw and have now rewritten it so that the selection of what to write is no longer done in GatewayMetaState. I tried to do it similarly to #10016. It is still a little raw, but it would be great if you could let me know whether this is the right direction.

@@ -90,6 +94,26 @@ public MetaData loadMetaState() throws Exception {
        return metaStateService.loadFullState();
    }

    public static class IndexMetaWriteInfo {
        IndexMetaData newMetaData;
Contributor

can they all be package private and final please? just like a struct

Contributor

Also, you might want to move this to the end of the file.
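
Roughly the shape these review comments ask for, sketched with Object as a stand-in for IndexMetaData and illustrative extra fields: a plain struct-like holder with package-private final fields.

class IndexMetaWriteInfoSketch {

    final Object newMetaData;       // stand-in for IndexMetaData
    final Object previousMetaData;  // illustrative second field
    final String reason;            // illustrative: why the state is written

    IndexMetaWriteInfoSketch(Object newMetaData, Object previousMetaData, String reason) {
        this.newMetaData = newMetaData;
        this.previousMetaData = previousMetaData;
        this.reason = reason;
    }
}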

brwe added a commit that referenced this pull request Apr 29, 2015
When a node was a data node only then the index state was not written.
In case this node connected to a master that did not have the index
in the cluster state, for example because a master was restarted and
the data folder was lost, then the indices were not imported as dangling
but instead deleted.
This commit makes sure that index state for data nodes is also written
if they have at least one shard of this index allocated.

closes #8823
closes #9952
@brwe brwe closed this in 4088dd3 Apr 29, 2015
@kevinkluge kevinkluge removed the review label Apr 29, 2015
@brwe
Contributor Author

brwe commented Apr 29, 2015

need to investigate #10017 before we can push

@brwe brwe reopened this Apr 29, 2015
@brwe
Contributor Author

brwe commented Apr 30, 2015

The reason why the tests failed on CI is the same one I described in the beginning #9952 (comment): a data node receives a new cluster state from a master that does not have the index in its state, but the data node missed the earlier state with a no-master block, so state persistence was not disabled. The fact that an index is not in the cluster state is then interpreted as a delete command. This can happen here for the reasons described in #10017, but there might be other reasons as well. I now think we should not delete indices at all if the cluster state that would cause a deletion comes from a new master.
I added a new commit for this but need someone to confirm that this is actually the right solution.
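
A hedged sketch of that guard, with made-up parameter names rather than the real cluster state listener API: only treat a missing index as a delete when the same master published both the previous and the current state.

class NewMasterDeleteGuardSketch {

    // previousMasterNodeId may be null if the previous state carried a no-master block
    static boolean shouldApplyIndexDeletes(String previousMasterNodeId, String currentMasterNodeId) {
        boolean masterChanged = previousMasterNodeId == null
                || !previousMasterNodeId.equals(currentMasterNodeId);
        // if the master changed, keep the local index data and metadata and let the
        // dangling-index import reconcile things later instead of deleting
        return !masterChanged;
    }
}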

@s1monw
Contributor

s1monw commented May 4, 2015

I added a new commit for this but need someone to confirm that this is actually the right solution.

+1 to the solution

@brwe
Contributor Author

brwe commented May 4, 2015

Chatted with @kimchy and we decided to push as-is, add a //norelease comment, and open an issue, because the short-term fix for the problem (#9952 (comment)) is not very elegant.
Added another commit to address the latest comments.

@s1monw
Contributor

s1monw commented May 5, 2015

LGTM

Labels
blocker, >bug, :Distributed/Distributed, resiliency, v2.0.0-beta1
Development

Successfully merging this pull request may close these issues.

Write index metadata on data nodes where shards allocated
6 participants