Describe the solution you'd like
The typical use case of an ingester shutdown is during a rolling update. We currently close the TSDB and, at the subsequent startup, we replay the WAL before the ingester is ready. Replaying the WAL is slow, and we recently found out that compacting the head and shipping it to the storage on /shutdown is actually faster than replaying the WAL.
Idea: what if we always compact TSDB head and ship it to the storage at shutdown?
Question:
- If we compact the TSDB head (up until the head max time) on shutdown, what's the last checkpoint created, and what's actually replayed from the WAL at startup?
Pros:
- The ingesters rollout may be faster
- The scale down wouldn't be a snowflake operation anymore (currently it requires calling the /shutdown API beforehand)
Cons (potential blockers):
- At ingester startup, can the ingester ingest samples with a timestamp earlier than the last samples ingested before shutting down?
Let's discuss it.