Compacting TSDB head on every ingester shutdown to speed up startup #3723

@pracucci

Describe the solution you'd like
The typical reason for an ingester shutdown is a rolling update. We currently close the TSDB and, at the next startup, replay the WAL before the ingester becomes ready. Replaying the WAL is slow, and we recently found that compacting the head and shipping it to storage via /shutdown is actually faster than replaying the WAL.

Idea: what if we always compact the TSDB head and ship it to storage at shutdown?

Question:

  • If we compact the TSDB head (up to the head max time) on shutdown, what is the last checkpoint created, and what is actually replayed from the WAL at startup?

Pros:

  • Ingester rollouts may be faster
  • Scale-down would no longer be a snowflake operation (currently it requires calling the /shutdown API beforehand)

Cons (potential blockers):

  • At startup, can the ingester ingest samples with timestamps earlier than the last samples ingested before shutdown?

Let's discuss it.
