Auto open/close index functionality #10869
I'm one of the people who requested it. For information: I have 3.4TB of data stored per node for 1 week, which requires 8.6 GB of Lucene memory. But I have to store, and provide on-demand queries over, data going back up to 1 year, and that would require 447.2GB of heap, which looks unreasonable even with the G1GC garbage collector (which we use now).
Hi @markwalkom and @ashpynov We talked about this on our FixItFriday call today. While the feature sounds appealing, I think there are lots of gotchas that are not immediately apparent, and it would result in adding a huge amount of complexity in order to try to support widely differing policies, e.g.:
Today we provide a simple open and close API, which allows the administrator to decide on policies, and to build an interface to implement those policies. I think this is the correct approach: complexity is handled on a case-by-case basis, rather than Elasticsearch trying to cater for every need. @ashpynov you mention that each of your indices requires about 9GB of heap. Presumably you're using fielddata rather than doc values. You can switch to using doc values today, and in v2 we'll be switching them on by default.
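For reference, the open/close API mentioned above can be driven by a simple script. A minimal sketch, assuming a cluster on `localhost:9200` and a daily `logs-YYYY.MM.DD` naming convention (both are assumptions, not anything Elasticsearch mandates); it only prints the `curl` commands rather than executing them:

```shell
# Print a close command for every index whose embedded date is older than
# the cutoff. Lexicographic comparison is valid for the YYYY.MM.DD format.
close_cmds() {
  cutoff="$1"; shift
  for idx in "$@"; do
    day=${idx#logs-}                  # strip the "logs-" prefix
    if [ "$day" \< "$cutoff" ]; then
      echo "curl -XPOST 'localhost:9200/${idx}/_close'"
    fi
  done
}

# Anything older than 2015.04.01 gets a close command; pipe to sh to apply.
close_cmds 2015.04.01 logs-2015.03.30 logs-2015.03.31 logs-2015.04.02
```

Reopening is symmetrical (`_open` instead of `_close`); a closed index keeps its data on disk but no longer consumes heap.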
@clintongormley no, we already use doc values; otherwise it would take up to 2% of the index size (10x more).
Hi, @clintongormley @markwalkom
The open/close functionality is very hard to implement automatically, due to the cluster architecture:
But most of the complexity is
Why is it closed? Have you guys already made up your minds?
Yes
We did it on top of version 1.7.3. In our use case (data stored on a daily basis for 2 years, but with a common search interval of about 1 month), it reduced memory usage from 256GB to 32GB per server. An SSD cache on the RAID storage also reduces the penalty of loading/unloading data during queries.
For time-series users, most queries happen within the relatively recent past: 24-48 hours, up to a week or a month. However, retention requirements mean that this data may need to be kept around for months to years, and currently we recommend that people use a hot/cold setup with shard allocation filtering.
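The hot/cold setup works by tagging nodes with a custom attribute and pinning indices to a tier via shard allocation filtering. A rough sketch; the attribute name `box_type`, the tier names, the index name, and the `localhost:9200` endpoint are conventions assumed here, not anything Elasticsearch requires:

```shell
# In elasticsearch.yml, tag each node with a custom attribute, e.g.:
#   node.box_type: hot    (fast SSD nodes that hold recent indices)
#   node.box_type: cold   (large, slower nodes for old indices)

# Pin a fresh daily index to the hot tier:
curl -XPUT 'localhost:9200/logs-2015.04.22/_settings' -d '
{ "index.routing.allocation.require.box_type": "hot" }'

# Once it ages out of the common query window, relocate it to the cold tier;
# the cluster moves the shards automatically when the setting changes:
curl -XPUT 'localhost:9200/logs-2015.04.22/_settings' -d '
{ "index.routing.allocation.require.box_type": "cold" }'
```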
Other options include closing indices but persisting them to disk, to reduce the impact on the heap.
The last choice is definitely viable, but Elasticsearch is relatively ignorant of it from a user's standpoint: an admin needs to open an index, allow the end-user to query it, and then close it again when they're done, to ensure the best resource use for their infrastructure. None of that is automatic.
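The manual sequence described above, as an admin would have to script it today (the index name and query are placeholders, and a cluster on `localhost:9200` is assumed):

```shell
# 1. Reopen the cold index so it can be searched (loads it back into heap):
curl -XPOST 'localhost:9200/logs-2014.06.01/_open'

# 2. Let the end-user run their query:
curl -XGET 'localhost:9200/logs-2014.06.01/_search?q=status:500'

# 3. Close it again to release the heap:
curl -XPOST 'localhost:9200/logs-2014.06.01/_close'
```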
It'd be handy for these use cases if we had configurable functionality that allows an admin to set (e.g.):
There are a few things to be careful of here, particularly around stability:
There's going to be more to it than this, but I've seen this sort of thing mentioned a few times in the community, and I think it'd be a good feature to have for larger installs using time-series data.