
Allow to stripe the data location over multiple locations #1356

Closed
kimchy opened this issue Sep 22, 2011 · 7 comments

Comments

@kimchy
Member

kimchy commented Sep 22, 2011

Allow striping the data location over multiple locations. The striping is simple: whole files are placed in one of the locations, and the location is chosen based on which one has the greatest free space. Note that there are no multiple copies of the same data; in that sense it is similar to RAID 0. Though simple, it should provide a good solution for people who don't want to mess with RAID setups and the like. Here is how it is configured:

   path.data: /mnt/first,/mnt/second

Or in an array format:

   path.data: ["/mnt/first", "/mnt/second"]
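
For illustration, here is a minimal Python sketch of the selection rule described above (whole files land on whichever configured location currently has the most free space). The function name and paths are made up for the example; this is not the actual Elasticsearch implementation:

   import shutil

   def pick_data_location(locations):
       # Illustrative only: return the configured location with the most free bytes.
       # shutil.disk_usage(path).free reports free space on that path's filesystem.
       return max(locations, key=lambda path: shutil.disk_usage(path).free)

   # Mirrors path.data: ["/mnt/first", "/mnt/second"] (example mount points)
   target = pick_data_location(["/mnt/first", "/mnt/second"])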
kimchy closed this as completed in 8d7aaa7 Sep 22, 2011
@medcl
Contributor

medcl commented Sep 23, 2011

Hi @kimchy, I was wondering: after setting the data location to multiple locations, can I change them later? And do these locations contain the same copy of the data?

@kimchy
Member Author

kimchy commented Sep 23, 2011

You can change them later, but it requires restarting the node. The locations do not share the same copy; the data is striped à la RAID 0.

@deinspanjer

What is the expected failure mode if a disk dies or otherwise becomes inaccessible? Will ES continue to write to the remaining volumes? Will the data on the failed node be recognized and recovered by the cluster?

@arsonak47

I configured multiple folders in my elasticsearch.yaml as follows:

   path.data: /home/esdata/part1, /home/esdata/part2, /home/esdata/part3, /home/esdata/part4, /home/esdata/part5, /home/esdata/part6, /home/esdata/part7, /home/esdata/part8, /home/esdata/part9, /home/esdata/part10, /home/esdata/part11, /home/esdata/part12, /home/esdata/part13, /home/esdata/part14, /home/esdata/part15, /home/esdata/part16, /home/esdata/part17, /home/esdata/part18, /home/esdata/part19, /home/esdata/part20, /home/esdata/part21, /home/esdata/part22, /home/esdata/part23, /home/esdata/part24, /home/esdata/part25

After inserting a huge amount of data (around 7.4 GB), I checked my data directories to see the distribution pattern. I got the following output:
[screenshot: per-directory sizes showing the data unevenly distributed across the configured paths]

I am using Elasticsearch-0.90.3. My Elasticsearch cluster has a single node and my index has a single shard. It is clear from the screenshot that my data is unevenly distributed among the directories. Is there any configuration option by which I can ensure even data distribution among all the configured directories?
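
For reference, the per-directory sizes above can be reproduced with a short Python sketch like the following (the paths are the ones configured above; this is purely illustrative):

   import os

   def dir_size(path):
       # Sum the sizes of all regular files under path, in bytes.
       total = 0
       for root, _dirs, files in os.walk(path):
           for name in files:
               total += os.path.getsize(os.path.join(root, name))
       return total

   for i in range(1, 26):
       path = "/home/esdata/part%d" % i
       print(path, dir_size(path))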

@aholbreich

This functionality is quite interesting because it can potentially improve the I/O throughput of ES on machines with several disks, but there is a lack of documentation on it. What is the pattern of distribution between the locations? Is one shard split over them? Or can one shard only go to one data.path?

@antonbormotov

antonbormotov commented Oct 20, 2017

According to the v2.0 breaking changes, a specific shard goes to a certain data path.
Check this issue as well: #9498

@dakrone
Member

dakrone commented Oct 20, 2017

Or can one shard only go to one data.path?

Yes, that's correct. A shard will be entirely on one data path; multiple shards are distributed across different data paths.
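
So with a single-shard index, all of the data ends up under one path; spreading data over the configured paths requires more shards. As an illustration only (host, index name, and shard count are example values, not anything prescribed in this thread), an index with several primary shards can be created through the standard create-index API, for example with Python's standard library:

   import json
   import urllib.request

   # Illustrative sketch: create an index with 4 primary shards so they can be
   # spread across the configured data paths.
   body = json.dumps({"settings": {"index": {"number_of_shards": 4}}}).encode("utf-8")
   req = urllib.request.Request(
       "http://localhost:9200/my_index",
       data=body,
       method="PUT",
       headers={"Content-Type": "application/json"},
   )
   print(urllib.request.urlopen(req).read().decode("utf-8"))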

williamrandolph pushed a commit to williamrandolph/elasticsearch that referenced this issue Jun 4, 2020