Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow rebalancing primary shards on shared filesystems #10585

Closed

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Apr 14, 2015

Instead of failing the Engine for a shared filesystem, this change
allows a "soft close" of the Engine, where only the IndexWriter is
closed so that the replica can open an IndexWriter using the same
filesystem directory/mount.

Fixes #10469

@dakrone
Copy link
Member Author

dakrone commented Apr 14, 2015

@s1monw this PR is still missing a test where replication failure is simulated (like you asked for), but I wanted to vet the idea of a "soft-close" of the Engine, I had to make some changes after #10452 was merged to allow an Engine to be closed without closing the Translog, because closing the translog caused all future recovery to hang when translog.snapshot() is called.

Let me know what you think and I will work on adding more tests tomorrow.

@s1monw
Copy link
Contributor

s1monw commented Apr 14, 2015

@dakrone I don't think we should do the sep close methods an all that kind of stuff. I'd rather just call sync instead of close on the translog. Especially given the fact that @bleskes works on larger refactorings on that end so all the changes here are likely only needed on 1.x

@dakrone
Copy link
Member Author

dakrone commented Apr 14, 2015

@s1monw wouldn't that leave the translog in a never-closed state then? Or is the translog closed somewhere else? Does that just rely on the injector being closed to close it?

@s1monw
Copy link
Contributor

s1monw commented Apr 14, 2015

@dakrone the translog is close later - the sync is the important part..

@dakrone dakrone force-pushed the allow-shadow-primary-relocation branch 3 times, most recently from a76d62c to 3fc625a Compare April 16, 2015 15:25
@s1monw s1monw self-assigned this Apr 16, 2015
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Apr 21, 2015
@s1monw
Copy link
Contributor

s1monw commented Apr 21, 2015

@dakrone I think this is good functionality wise but I think the implementation needs to be less intrusive ie. I think we should implement this as a subclass of the engine instead of adding all these settings and changing how the recovery handler works and calling back into it. I took a quick step copying your test and adding this quick and dirty to the engien factory and I think it's cleaner what do you think about this s1monw@5fd56da

@dakrone
Copy link
Member Author

dakrone commented Apr 21, 2015

@s1monw I updated this to use the method that you came up with

@@ -310,6 +322,176 @@ public void testPrimaryRelocation() throws Exception {
}

@Test
@TestLogging("_root:DEBUG,index:TRACE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need that?

@s1monw
Copy link
Contributor

s1monw commented Apr 21, 2015

left minor comments LGTM otherwise

@bleskes
Copy link
Contributor

bleskes commented Apr 22, 2015

@dakrone this is the on that only goes to 1.x, right? asking because it still has the 2.0.0 label on it...

@s1monw
Copy link
Contributor

s1monw commented Apr 22, 2015

@bleskes the plan is to push this impl to 1.x and another impl to master based on your refactoring in #10624 makes sense? I removed the 2.0 label for now

@bleskes
Copy link
Contributor

bleskes commented Apr 22, 2015

yeah, makes total sense, just checking because of the label. Thx,

/**
* TODO: document me!
*/
public class MockSharedFSEngine extends MockInternalEngine {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a subclass of SharedFSEngine and use the new MockEngineSupport from #10700

@dakrone dakrone force-pushed the allow-shadow-primary-relocation branch from 5fc9fd3 to 24bf3de Compare April 22, 2015 15:15
@s1monw
Copy link
Contributor

s1monw commented Apr 22, 2015

sweet! LGTM

@dakrone dakrone force-pushed the allow-shadow-primary-relocation branch from 84e8a70 to cd57ed7 Compare April 22, 2015 16:21
Instead of failing the Engine for a shared filesystem, this change
allows a "soft close" of the Engine, where only the IndexWriter is
closed so that the replica can open an IndexWriter using the same
filesystem directory/mount.

Fixes elastic#10469
@dakrone
Copy link
Member Author

dakrone commented Apr 22, 2015

This has been merged to 1.x only, I will rewrite and open a new PR once #10624 is merged into master, since it refactors much of the recovery process.

@dakrone dakrone closed this Apr 22, 2015
@clintongormley clintongormley changed the title [CORE] Allow rebalancing primary shards on shared filesystems Allow rebalancing primary shards on shared filesystems May 29, 2015
@dakrone dakrone deleted the allow-shadow-primary-relocation branch June 1, 2015 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow relocating primary shards on shared filesystems without failing the shard
5 participants