Add path.shared_data #12729

Merged
merged 1 commit into from Aug 12, 2015

Member

@dakrone dakrone commented Aug 7, 2015

This allows path.shared_data to be added to the security manager while
still allowing a custom data_path for indices using shadow replicas.

For example, configure path.shared_data: /tmp/foo, then create an
index with:

POST /myindex
{
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "data_path": "/tmp/foo/bar/baz",
    "shadow_replicas": true
  }
}

The index will then reside in /tmp/foo/bar/baz.

path.shared_data defaults to ${path.home}/data if not specified.

Resolves #12714
Relates to #11065
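
The relationship between the two settings in the example above can be sketched as a containment check. This is an illustration only, assuming the intent is that a custom data_path must sit beneath path.shared_data. The class and method names are hypothetical and are not taken from the Elasticsearch code base; only JDK calls are used.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: a hypothetical helper, not the Elasticsearch implementation. */
final class SharedDataPathCheck {

    /**
     * Returns true if dataPath lies inside sharedDataPath.
     * Both paths are made absolute and normalized first, so that
     * "/tmp/foo/../other" is not mistaken for a child of "/tmp/foo".
     */
    static boolean isUnderSharedData(Path sharedDataPath, Path dataPath) {
        Path root = sharedDataPath.toAbsolutePath().normalize();
        Path candidate = dataPath.toAbsolutePath().normalize();
        // Path.startsWith compares whole name elements, so "/tmp/foobar"
        // does not count as being under "/tmp/foo".
        return candidate.startsWith(root);
    }

    public static void main(String[] args) {
        Path shared = Paths.get("/tmp/foo");
        // Inside the shared data directory
        System.out.println(isUnderSharedData(shared, Paths.get("/tmp/foo/bar/baz")));
        // Outside the shared data directory
        System.out.println(isUnderSharedData(shared, Paths.get("/tmp/other")));
    }
}
```

Normalizing before comparing is the design choice worth noting here: without it, a path containing ".." segments could appear to be inside the shared directory while actually pointing elsewhere.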

.put(InternalSettingsPreparer.IGNORE_SYSTEM_PROPERTIES_SETTING, true) // make sure we get what we set :)
.put(ClusterName.SETTING, InternalTestCluster.clusterName("single-node-cluster", randomLong()))
.put("path.home", createTempDir())
.put("path.shared_data", createTempDir().getParent())
Contributor

why the .getParent?

Member Author

I wanted it to be a level higher than the temp directory, to capture the directory that other temp dirs are created in.

Member

Why not create one temp dir, and then a subdir off of that? There is nothing that guarantees tempdirs are all created side by side.

Member Author

Because the directory set for path.shared_data and the custom directories set during tests when indices are created are uncoupled, if I set it to a specific directory in advance, every test (including where we randomly add a custom data path) would be required to know what the shared data path was already set to.

Contributor

I would like it better if custom data paths were explicitly created in a specific directory. However, it's an existing issue that your PR does not introduce, so could you just add a comment explaining why you set the custom data path this way and add a TODO to explicitly set custom index paths to be subdirectories?

Member Author

Will do
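
The alternative raised in this thread (one temp directory as the shared root, with custom data paths created as subdirectories of it) could look roughly like the sketch below. It uses only the JDK rather than the test framework's createTempDir(), and the class, method, and directory names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: plain-JDK stand-in for the test setup discussed above. */
final class TempSharedDataLayout {

    /** Creates a fresh temp directory to act as path.shared_data. */
    static Path newSharedDataRoot() {
        try {
            return Files.createTempDirectory("shared_data");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a custom index data path as a subdirectory of the shared root. */
    static Path createCustomDataPath(Path sharedData, String indexDirName) {
        try {
            return Files.createDirectories(sharedData.resolve(indexDirName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path sharedData = newSharedDataRoot();
        Path custom = createCustomDataPath(sharedData, "custom_index_path");
        // The custom path is under the shared root by construction,
        // with no assumption about where other temp dirs are created.
        System.out.println(custom.startsWith(sharedData));
    }
}
```

The trade-off described in the thread still applies: every test that sets a custom data path would need access to the shared root, which is why the PR kept the parent-directory approach and added a TODO instead.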

Member Author

dakrone commented Aug 10, 2015

@jpountz pushed new commits for your comments

if (settings.get("path.shared_data") != null) {
    sharedDataFile = PathUtils.get(cleanPath(settings.get("path.shared_data")));
} else {
    sharedDataFile = homeFile.resolve("data");
Contributor

Should it resolve to the same as path.data instead?

Member Author

This is the same as path.data? Or do you mean it should append the cluster name onto it?

Contributor

I mean if path.data is set, e.g. in config/elasticsearch.yml, to another location.

Contributor

Or maybe sharedDataFile should just be null if not set?

Member Author

Ahh, part of the reason is that path.data can be an array, and so I didn't want to just randomly pick one of the paths

Member Author

I don't want to set it as null because if we remove custom_paths_enabled we'll now get a NullPointerException if someone tries to use it. I think if we remove custom_paths_enabled we can set this to a better default if you're concerned about it using the data path. Maybe just ${es.path.home}/data/custom or something?

Contributor

Then maybe it should be null when not set? Otherwise we request an unnecessary permission from the security manager, when we should be requesting as few permissions as possible.

Member Author

I can make it null and add better messaging in the validation I think.

Member Author

dakrone commented Aug 10, 2015

@jpountz pushed a change to make path.shared_data null if unset and add better validation to avoid NPEs
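
A rough sketch of the "null if unset, with validation" approach described in this thread follows. It is not the code from the PR: a plain Map stands in for the Elasticsearch Settings object, and the class name, method names, and error messages are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Illustrative only: a Map stands in for the real Settings class. */
final class SharedDataSetting {

    /** Returns the configured path.shared_data, or null when it is not set. */
    static Path resolveSharedData(Map<String, String> settings) {
        String value = settings.get("path.shared_data");
        return value == null ? null : Paths.get(value).toAbsolutePath().normalize();
    }

    /**
     * Rejects a custom data path with a descriptive message instead of
     * letting a null shared data path surface later as a NullPointerException.
     */
    static void validateCustomDataPath(Path sharedData, String customDataPath) {
        if (sharedData == null) {
            throw new IllegalArgumentException(
                "path.shared_data must be set in order to use custom data path ["
                    + customDataPath + "]");
        }
        Path custom = Paths.get(customDataPath).toAbsolutePath().normalize();
        if (!custom.startsWith(sharedData)) {
            throw new IllegalArgumentException(
                "custom data path [" + customDataPath
                    + "] is not a sub-path of path.shared_data [" + sharedData + "]");
        }
    }
}
```

Leaving the value null when unset means no extra directory has to be granted to the security manager, and the explicit check covers the NullPointerException concern raised earlier in the thread.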

@@ -300,7 +300,7 @@ public void run() {
 @Test
 public void testCustomDataPaths() throws Exception {
 String[] dataPaths = tmpPaths();
-NodeEnvironment env = newNodeEnvironment(dataPaths, Settings.EMPTY);
+NodeEnvironment env = newNodeEnvironment(dataPaths, "/tmp", Settings.EMPTY);
Contributor

Just checking that this test does not create files in this dir; otherwise we should use java.io.tmpdir instead?

Member Author

This doesn't create any files here, it just checks and resolves paths (unit test only)

Contributor

👍
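
The point made in this thread, that resolving paths does not touch the filesystem, can be illustrated with plain java.nio. The sketch below makes no claim about how NodeEnvironment resolves paths internally; the class name, method name, and directory names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: shows that Path.resolve is a purely syntactic operation. */
final class ResolveOnlyDemo {

    /** Builds the path for an index under a shared data root without creating it. */
    static Path resolveIndexPath(Path sharedData, String indexName) {
        return sharedData.resolve(indexName).normalize();
    }

    public static void main(String[] args) {
        // A name that is very unlikely to exist already
        String name = "no-such-index-dir-" + System.nanoTime();
        Path resolved = resolveIndexPath(Paths.get("/tmp"), name);
        System.out.println(resolved);
        // Resolving did not create anything on disk
        System.out.println(Files.exists(resolved));
    }
}
```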

Contributor

jpountz commented Aug 11, 2015

This looks good to me, but I'd like someone else to take a look too. @rjernst maybe you?

Member

rjernst commented Aug 12, 2015

LGTM too. Do you think #12776 is quick enough to get in today, so we can remove the old setting altogether for 2.x (since users will be forced to add path.shared_data anyways?)

Member Author

dakrone commented Aug 12, 2015

> Do you think #12776 is quick enough to get in today, so we can remove the old setting altogether for 2.x (since users will be forced to add path.shared_data anyways?)

Yeah, this should be a really quick change.

@dakrone dakrone force-pushed the shared-data-path branch 2 times, most recently from 0b9ac50 to 6a9c86f Compare August 12, 2015 16:49
This allows `path.shared_data` to be added to the security manager while
still allowing a custom `data_path` for indices using shadow replicas.

For example, configure `path.shared_data: /tmp/foo`, then create an
index with:

```
POST /myindex
{
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "data_path": "/tmp/foo/bar/baz",
    "shadow_replicas": true
  }
}
```

The index will then reside in `/tmp/foo/bar/baz`.

`path.shared_data` defaults to `null` if not specified.

Resolves elastic#12714
Relates to elastic#11065
@dakrone dakrone merged commit ff5ad39 into elastic:master Aug 12, 2015
@dakrone dakrone added the v2.0.0 label Aug 12, 2015
@clintongormley clintongormley added the :Core/Infra/Settings Settings infrastructure and APIs label Aug 13, 2015
@colings86 colings86 removed the v2.0.0 label Aug 21, 2015
@dakrone dakrone deleted the shared-data-path branch March 3, 2016 19:12