Description
Today, a Rollup job stores its results in a single rollup index. There is currently no provision for jobs that generate so much data that even the rolled-up results need multiple indices to scale.
There are a couple of routes we could take... it's not clear to me which is best. Current Rollup limitations make it tricky, too.
Wait for ILM
Easiest option... wait for ILM (#29823) to be merged and then revisit this conversation. Integrating with ILM will likely provide a better experience than baking smaller pieces of the functionality into Rollup.
Support external Rollover
Rollup doesn't play nicely with Rollover today because we try to create the destination rollup index (and if it exists, update the metadata). So if the user points their config at a Rollover alias, we throw an exception.
We could allow Rollup to point at aliases, which I think would let the user manually roll over indices. There are some tricky bits to this, though. Because Rollup uses deterministic IDs for disaster recovery after a checkpoint, the user would have to make sure a checkpoint has been fully committed before rolling over:
- Stop the job, wait for it to checkpoint and finish
- Rollover the index
- Re-enable the job
It's not terrible, but not super user-friendly either.
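For illustration, the manual sequence might look something like this (the job name `sensor-job` and write alias `rollup-write` are hypothetical, and `wait_for_completion` on `_stop` assumes a version where that parameter exists; otherwise the user would poll the job status):

```
# 1. Stop the job and wait for the in-flight checkpoint to commit
POST _rollup/job/sensor-job/_stop?wait_for_completion=true&timeout=60s

# 2. Roll the write alias over to a fresh rollup index
POST /rollup-write/_rollover
{
  "conditions": {
    "max_docs": 50000000
  }
}

# 3. Re-enable the job so it resumes against the new write index
POST _rollup/job/sensor-job/_start
```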
Internally support Rollover
We could instead implement the Rollover functionality inside Rollup. It'd be essentially the same procedure, just handled by Rollup itself: probably another config option, with the Rollover criteria checked whenever the job checkpoints, or something along those lines.
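Purely as a sketch, the job config might grow a `rollover` section that the indexer evaluates at each checkpoint. Nothing here exists today: the `rollover` block and its criteria are hypothetical, while the rest mirrors the current job config:

```
PUT _rollup/job/sensor-job
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "interval": "1h"
    }
  },
  "rollover": {
    "max_docs": 50000000,
    "max_age": "30d"
  }
}
```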
Destination date math/patterns
We could implement something like:
"index_pattern": "logstash-*"
"rollup_index": "logstash-%{+YYYY.MM.dd}",
This would dynamically create destination indices according to the timestamp of each rollup document. Unlike Rollover, we don't have to worry about backtracking and replaying documents, because docs deterministically land in their destination index too.
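For example (hypothetical behavior, assuming the date_histogram bucket timestamp drives the pattern):

```
rollup doc @ 2018-10-03T14:00:00Z  ->  logstash-2018.10.03
rollup doc @ 2018-10-04T01:00:00Z  ->  logstash-2018.10.04
```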
This does complicate job creation a little, since indices are generated on demand instead of up front, meaning we'd need a way to enrich those indices with the rollup metadata after each one is created dynamically.
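Today that metadata lives under `_meta._rollup` in the rollup index's mappings, so each dynamically created index would need something like the following written to it after creation (job id hypothetical, config heavily abbreviated; the real entry holds the full job config):

```
PUT logstash-2018.10.03/_mapping/_doc
{
  "_meta": {
    "_rollup": {
      "my-rollup-job": {
        "id": "my-rollup-job",
        "index_pattern": "logstash-*",
        "rollup_index": "logstash-%{+YYYY.MM.dd}"
      }
    }
  }
}
```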
Big issue related to all approaches
The major problem with all of these approaches is that Rollup doesn't allow more than one rollup index in a RollupSearch request. This was mainly to limit internal complexity rather than a hard technical limit, and I think the restriction is less important now that missing_bucket is implemented.
I think we could loosen this restriction as long as all indices involved in the search share the exact same set of rollup jobs; that way we don't have to worry about weird mixed-job incompatibilities.
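If that holds, a multi-index RollupSearch could be as simple as listing the rolled-over indices in the request path (index names hypothetical; today this request would be rejected):

```
GET /rollup-000001,rollup-000002/_rollup_search
{
  "size": 0,
  "aggregations": {
    "max_temperature": {
      "max": {
        "field": "temperature"
      }
    }
  }
}
```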