
Auto open/close index functionality #10869

Closed
markwalkom opened this issue Apr 29, 2015 · 7 comments

Comments

@markwalkom
Contributor

For time series users, most of the queries happen within the relatively close past, 24-48 hours, up to a week or a month. However, retention requirements mean that this data may need to be kept around for months to years, and currently we recommend that people use a hot/cold setup with shard allocation filtering.
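For readers unfamiliar with the hot/cold setup, a minimal sketch, assuming each node's config tags it with a custom box_type attribute (hot or cold) and Elasticsearch on localhost:9200; the index name is illustrative:

    # Move an aging index onto "cold" nodes via shard allocation filtering.
    # Assumes each node's config sets a custom attribute, eg box_type: hot|cold.
    import requests

    def move_to_cold(index, base="http://localhost:9200"):
        # Require every shard of `index` to be allocated on box_type=cold
        # nodes; Elasticsearch relocates the shards accordingly.
        resp = requests.put(
            f"{base}/{index}/_settings",
            json={"index.routing.allocation.require.box_type": "cold"},
        )
        resp.raise_for_status()

    move_to_cold("logstash-2015.03.01")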

Other (additional) options include closing indices, which keeps them persisted on disk while reducing the impact on heap.

The last choice is definitely viable, but Elasticsearch is relatively ignorant around this from a user standpoint; an admin will need to open an index, allow the end user to query it, and then close it again when they're done, to ensure the best resource use for their infrastructure. None of that is automatic.

It'd be handy for these use cases if we had configurable functionality that allows an admin to set, eg (sketched after this list):

  • Allow closed indices to be opened automatically
  • Set a limit on the number of previously closed indices that can be open at any one time, eg only allow Y re-opened indices at once, to keep resource use under a limit.
  • Time frame from when a closed index was opened to when it will be re-closed, as either a:
    • Hard time. eg 1 hour from open.
    • Time from last use. eg 1 hour from last query seen.
  • Automatically close the index after the period above.
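
To make that concrete, a purely hypothetical settings sketch; none of these keys exist in Elasticsearch, the names are invented only for illustration:

    # Hypothetical settings sketch -- none of these keys exist in
    # Elasticsearch; the names are invented to make the proposal concrete.
    hypothetical_auto_open_settings = {
        "indices.auto_open.enabled": True,      # allow closed indices to auto-open
        "indices.auto_open.max_open": 5,        # at most Y re-opened indices at once
        "indices.auto_open.close_after": "1h",  # hard time from open, or
        "indices.auto_open.idle_after": "1h",   # time from last query seen
    }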

There are a few things to be careful of here, particularly around stability:

  • Setting the re-open limit too high
  • Opening a large number of indices at a single time

There's going to be more to it than this, but I've seen this sort of thing mentioned a few times in the community and I think it'd be a good feature to have for larger installs using time-series data.

@ashpynov

ashpynov commented May 5, 2015

I'm one of the people who requested it.
One more suggestion: open indices only on a direct query, and only for the query time (the time needed to search and fetch data from the indices). Maybe keep some kind of LRU cache of "semi-opened" indices, as sketched below.

For information: I have 3.4TB of stored data on each node for 1 week, which requires 8.6GB of Lucene memory. But I have to store, and serve on-demand queries against, data up to 1 year old, and that would require 447.2GB of heap (8.6GB × 52 weeks), which looks unreasonable even with the G1GC garbage collector (which we use now).
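
A rough sketch of such an LRU of "semi-opened" indices; open_index and close_index here are hypothetical callbacks standing in for the real open/close index API calls:

    # Rough sketch of an LRU cache of temporarily re-opened indices.
    # `open_index` and `close_index` are hypothetical callbacks standing in
    # for the real open/close index API calls.
    from collections import OrderedDict

    class SemiOpenCache:
        def __init__(self, max_open, open_index, close_index):
            self.max_open = max_open
            self.open_index = open_index
            self.close_index = close_index
            self.lru = OrderedDict()  # index name -> most recently used last

        def touch(self, index):
            """Ensure `index` is open, evicting the least recently used one."""
            if index in self.lru:
                self.lru.move_to_end(index)  # already open: mark as fresh
                return
            if len(self.lru) >= self.max_open:
                victim, _ = self.lru.popitem(last=False)  # evict the oldest
                self.close_index(victim)
            self.open_index(index)
            self.lru[index] = True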

@clintongormley

Hi @markwalkom and @ashpynov

We talked about this on our FixItFriday call today. While the feature sounds appealing, I think there are lots of gotchas that are not immediately apparent, and would result in adding a huge amount of complexity in order to try to support widely differing policies, eg:

  • opening an index can be a heavy task, especially if the translog needs replaying
  • the query might result in lots of fielddata being loaded into memory, which could also take a significant amount of time and heap
  • what happens if you query 10 closed indices, but your auto-open limit is 5?
  • what happens if you query the first 5, then query the next 5? Does the user have to wait for (eg) 1 hour before they can run the second query?
  • what happens if the 5 indices you open use up more than the available heap?
  • what happens when multiple people query different closed indices at the same time?

Today we provide a simple open and close API, which allows the administrator to decide on policies and to build an interface to implement those policies. I think this is the correct approach: complexity is handled on a case-by-case basis, rather than Elasticsearch trying to cater for every need.
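
For example, the kind of policy script an administrator can build on top of that API today; a sketch assuming daily logstash-YYYY.MM.DD indices and Elasticsearch on localhost:9200:

    # Sketch of an admin-side policy: close daily indices older than N days.
    # Assumes daily indices named logstash-YYYY.MM.DD on localhost:9200.
    import requests
    from datetime import datetime, timedelta

    def close_older_than(days, base="http://localhost:9200"):
        cutoff = datetime.utcnow() - timedelta(days=days)
        names = requests.get(f"{base}/_cat/indices?h=index").text.split()
        for name in names:
            if not name.startswith("logstash-"):
                continue
            try:
                day = datetime.strptime(name[len("logstash-"):], "%Y.%m.%d")
            except ValueError:
                continue  # skip names that don't match the daily pattern
            if day < cutoff:
                requests.post(f"{base}/{name}/_close").raise_for_status()

    close_older_than(days=7)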

@ashpynov you mention that each of your indices requires about 9GB of heap. Presumably you're using fielddata rather than doc values. You can switch to using doc values today, and in v2 we'll be switching them on by default.
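
For reference, a sketch of enabling doc values per field when creating a 1.x index; an existing field's mapping can't be switched in place, so this applies to newly created indices, and the index, type, and field names are illustrative:

    # Create an index whose fields store doc values instead of using fielddata.
    # Index, type, and field names are illustrative; existing fields can't be
    # switched in place, so this applies to newly created indices.
    import requests

    mapping = {
        "mappings": {
            "event": {
                "properties": {
                    "timestamp": {"type": "date", "doc_values": True},
                    "bytes": {"type": "long", "doc_values": True},
                }
            }
        }
    }
    requests.put("http://localhost:9200/logstash-2015.05.05",
                 json=mapping).raise_for_status()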

@ashpynov

ashpynov commented May 8, 2015

@clintongormley no, we already use doc values; otherwise it would take up to 2% of the index size (about 10x more).

@ashpynov

ashpynov commented Oct 7, 2015

Hi, @clintongormley @markwalkom
We started prototyping such a feature based on version 1.7, with these policy decisions:

  • frozen indices are READ ONLY, so there is no translog. Opening an index means reading about 0.5% of the index size from HDD, roughly 12 sec for a 500GB index (16 HDDs, 7200rpm, RAID6), which is very acceptable.
  • of course, fielddata loading is a problem even on "hot" indices, and it is more or less protected by the circuit breaker.
  • if the open-index count hits the limit, the breaker can be the solution; the number of loaded cold indices is limited by the search thread pool size and the number of indices per query.

The open/close functionality is very hard to implement due to the cluster architecture (a sketch of the proxy step follows below):

  1. we have to proxy the query and decide whether an index needs to be opened
  2. we have to proxy the query through a single node
  3. we need to know where the involved index shards are located, and how many indices on that node are already open

But most of the complexity is:

  1. Opening an index makes the cluster RED while it opens, so in our case the cluster would be red half the time, or even always.
  2. When searching across several daily indices, we need to place the result-reducing logic on the client side.
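
A simplified sketch of that proxy step, using the real _cat/indices, _open, _cluster/health, and _search endpoints; error handling and the re-close policy are omitted:

    # Simplified sketch of the proxy step: open any closed indices named in
    # the query, wait for them to recover, then search. The re-close policy
    # and error handling are omitted.
    import requests

    BASE = "http://localhost:9200"

    def search_with_auto_open(indices, body):
        lines = requests.get(f"{BASE}/_cat/indices?h=index,status").text.splitlines()
        status = dict(line.split() for line in lines if line.strip())
        closed = [name for name in indices if status.get(name) == "close"]
        for name in closed:
            requests.post(f"{BASE}/{name}/_open").raise_for_status()
        if closed:
            # Opening makes the recovering shards unavailable; wait for them
            # before searching so the query doesn't fail on missing shards.
            requests.get(f"{BASE}/_cluster/health/{','.join(closed)}"
                         "?wait_for_status=yellow&timeout=120s")
        return requests.post(f"{BASE}/{','.join(indices)}/_search", json=body).json()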

@makeyang
Contributor

Why was it closed? Have you guys already made the decision?

@clintongormley

Yes

@ashpynov

We did it on top of version 1.7.3. In our use case (data stored on a daily basis for 2 years, but a common search interval of about 1 month) it reduced memory usage from 256GB to 32GB per server. An SSD cache on the RAID storage also reduces the penalty of loading/unloading data during queries.
But we did not use it in production, because we changed the search engine to our own custom one and migrated to ES 2.3 for specific search cases.
Also, the code was developed by C++ coders, so we are ashamed to publish it even here.
