Move to one data.path per shard #10461

Merged
merged 1 commit into from Apr 20, 2015

s1monw
Contributor

@s1monw s1monw commented Apr 7, 2015

This commit moves away from simulating striped RAID-0 across multiple
data paths towards using a single path per shard. Multiple data paths are still
supported, but a shard and its data are no longer striped across multiple paths / disks.
This prevents, for instance, losing all shards when a single disk is corrupted.

Indices that already use this feature will automatically be upgraded to a single
data path based on a simple disk-space-based heuristic. In general, there must be enough
disk space to move a single shard at any time; otherwise the upgrade will fail.
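
One possible reading of the disk-space-based heuristic is "pick the data path whose file store reports the most usable space, and fail if even that path cannot hold the shard". The sketch below illustrates that reading only; `DataPathChooser`, `usableSpace`, and `pickMostUsable` are hypothetical names and are not taken from this PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not code from this PR.
final class DataPathChooser {

    /** Reads the usable space of each data path from its file store. */
    static Map<Path, Long> usableSpace(Path... dataPaths) throws IOException {
        Map<Path, Long> usable = new LinkedHashMap<>();
        for (Path path : dataPaths) {
            usable.put(path, Files.getFileStore(path).getUsableSpace());
        }
        return usable;
    }

    /**
     * Picks the path with the most usable space (the first one in iteration
     * order wins on ties) and fails if even that path cannot hold a shard of
     * the given size.
     */
    static Path pickMostUsable(Map<Path, Long> usableBytesByPath, long shardSizeBytes) {
        Path best = null;
        long bestBytes = -1L;
        for (Map.Entry<Path, Long> entry : usableBytesByPath.entrySet()) {
            if (entry.getValue() > bestBytes) {
                best = entry.getKey();
                bestBytes = entry.getValue();
            }
        }
        if (best == null || bestBytes < shardSizeBytes) {
            throw new IllegalStateException(
                "not enough disk space to move a shard of " + shardSizeBytes + " bytes");
        }
        return best;
    }
}
```

Keeping the selection separate from the file-store lookup makes the choice easy to test with made-up sizes, without touching real disks.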

Closes #9498

@s1monw added the >enhancement, v2.0.0-beta1, blocker, and :Distributed/Store labels Apr 7, 2015
@s1monw
Contributor Author

s1monw commented Apr 7, 2015

FYI, this is a first cut at this feature, so it's not perfect, but it mirrors the direction pretty well.

@dakrone changed the title from "[STORE] Move to on data.path per shard" to "[STORE] Move to one data.path per shard" Apr 7, 2015
import java.util.ArrayList;
import java.util.List;

/**
Member

remove empty javadoc?

if (Files.exists(path.resolve(MetaDataStateFormat.STATE_DIR_NAME))) {
numPathsExist++;
}
if (numPathsExist > 1) {
Contributor

This if can be moved inside the one above it.

@mikemccand
Contributor

If ES crashes/is killed while a shard is being upgraded that shard is now corrupt right? Because we have moved some but not all files? Maybe in the upgrade release notes we strongly suggest taking snapshot before (maybe we do this already)?

return dataPaths[i];
}
}
Path maxUseablePath = dataPaths[0];
Contributor

extra 'e' here too

@s1monw
Contributor Author

s1monw commented Apr 15, 2015

> If ES crashes/is killed while a shard is being upgraded that shard is now corrupt right? Because we have moved some but not all files? Maybe in the upgrade release notes we strongly suggest taking snapshot before (maybe we do this already)?

That is a great question; the answer is no, IMO. Today, if you have multiple data paths and a distributor directory, you can stop upgrading in the middle and still be able to open it with 1.x and your distributor dir. The distributor doesn't care where the files are, and we don't delete them before we have successfully moved them, so I think we are OK here?

@@ -58,7 +57,7 @@ public DanglingIndicesState(Settings settings, NodeEnvironment nodeEnv, MetaStat
* Process dangling indices based on the provided meta data, handling cleanup, finding
* new dangling indices, and allocating outstanding ones.
*/
- public void processDanglingIndices(MetaData metaData) {
+ public void processDanglingIndices(MetaData metaData){
Contributor

Lost a space.

@mikemccand
Contributor

I love this change, I love all the stuff that's removed! I left some minor comments...

@mikemccand
Contributor

> that is a great question, the answer is no IMO.

Oh I see, because on restart, the distributor will just look around and find where the file resides and just open it there? Good!

Hmm but what about the non-atomic move case? Does Files.move behave well if JVM crashes while it's running? E.g. copy to a temp file on the dest file store and then do an atomic rename in the end?

@s1monw
Contributor Author

s1monw commented Apr 16, 2015

> Hmm but what about the non-atomic move case? Does Files.move behave well if JVM crashes while it's running? E.g. copy to a temp file on the dest file store and then do an atomic rename in the end?

Yeah, but when do you remove the source file? There is always a window here, I guess.
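
The pattern discussed in this exchange (copy to a temp file on the destination file store, atomically rename it, and only then delete the source) can be sketched with plain `java.nio.file` calls. This is a minimal illustration under those assumptions, not the code that was pushed to this PR; `SafeMove` and the `.tmp` suffix are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not code from this PR.
final class SafeMove {

    /**
     * Moves source to target so that target is either absent or complete:
     * 1. copy the bytes to a temp file next to the target,
     * 2. atomically rename the temp file to the target name,
     * 3. delete the source only after the rename succeeded.
     */
    static void move(Path source, Path target) throws IOException {
        // The temp file lives in the target directory, so the rename below
        // stays within one file store and can be atomic.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        Files.delete(source);
    }
}
```

A crash before the rename leaves the source untouched plus a stray `.tmp` file; a crash between the rename and the delete leaves a complete copy in both places. That second case is the window mentioned above, and it only duplicates data rather than losing it. The sketch does not fsync, so durability across power loss is out of scope here.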

@s1monw
Contributor Author

s1monw commented Apr 16, 2015

@mikemccand I pushed changes according to your comments, and merged with master

}

// nocommit - we need something more extensible but this does the job for now...
public static ShardPath findShardPath(NodeEnvironment env, ShardId shardId, @IndexSettings Settings indexSettings) throws IOException {
Contributor

Can we name this selectNewPathForShard, to contrast it with loadShardPath? (findShardPath sounds to me like "go and find an existing shard path".)

@bleskes
Contributor

bleskes commented Apr 17, 2015

I think this is good. I left a few small concerns/comments/suggestions here and there.

@s1monw
Contributor Author

s1monw commented Apr 20, 2015

@bleskes I pushed another commit addressing your comments. @mikemccand I moved to a more pessimistic model given your comments too, by using atomic move etc. Can you take another look?

@s1monw
Contributor Author

s1monw commented Apr 20, 2015

@mikemccand pushed a new commit

@mikemccand
Contributor

LGTM, thanks @s1monw!

logger.info("{} upgrading multi data dir to {}", shard, targetPath.getDataPath());
final ShardStateMetaData loaded = ShardStateMetaData.FORMAT.loadLatestState(logger, paths);
if (loaded == null) {
throw new IllegalStateException(shard + " no shard state found in any of: [" + Arrays.toString(paths) + "] please check and remove them if possible");
Contributor

Arrays.toString(paths) already adds [], so there is no need to add them.
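
A quick way to check this point: `Arrays.toString` wraps its output in square brackets, so surrounding it with a second pair doubles them in the message. The class, method names, and path values below are illustrative only.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical demo class; the message text mirrors the reviewed line.
final class BracketDemo {

    /** Message built as in the reviewed line, with explicit brackets added. */
    static String withExtraBrackets(Path[] paths) {
        return "no shard state found in any of: [" + Arrays.toString(paths) + "]";
    }

    /** Message relying on the brackets Arrays.toString already produces. */
    static String withoutExtraBrackets(Path[] paths) {
        return "no shard state found in any of: " + Arrays.toString(paths);
    }

    public static void main(String[] args) {
        Path[] paths = { Paths.get("a"), Paths.get("b") };
        System.out.println(withExtraBrackets(paths));    // ends with [[a, b]]
        System.out.println(withoutExtraBrackets(paths)); // ends with [a, b]
    }
}
```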

@bleskes
Contributor

bleskes commented Apr 20, 2015

Thx @s1monw, looks great. I left some very minor comments and replied to your question about #10461 (comment)

@s1monw
Contributor Author

s1monw commented Apr 20, 2015

@bleskes pushed one more commit - would you mind taking a look?

Path[] paths = env.availableShardPaths(shardId);
Path path = randomFrom(paths);
assumeTrue(paths.length > 1);
int id = randomIntBetween(1, 10);
Contributor

We should remove path from paths; otherwise we write only one shard state, no?

Contributor Author

pardon? ;)

Contributor

Never mind, I misread the test. It actually writes a shard copy to all paths.

Contributor Author

I removed the second state creation

@bleskes
Contributor

bleskes commented Apr 20, 2015

LGTM. I left one comment that we agreed to solve in a follow-up issue.

@s1monw
Contributor Author

s1monw commented Apr 20, 2015

@bleskes I opened #10677 for this

@s1monw force-pushed the fix_multiple_data_path branch 2 times, most recently from 91488c5 to 3c76333 April 20, 2015 14:25
@s1monw merged commit 5730c06 into elastic:master Apr 20, 2015
@s1monw deleted the fix_multiple_data_path branch April 20, 2015 19:02
@clintongormley changed the title from "[STORE] Move to one data.path per shard" to "Move to one data.path per shard" Jun 7, 2015
@clintongormley added the :Distributed/Engine label and removed the :Distributed/Store label Feb 13, 2018
Labels
blocker, :Distributed/Engine, >enhancement, v2.0.0-beta1
Development

Successfully merging this pull request may close these issues.

[STORE] Move to one datapath per shard
7 participants