Description
Is your proposal related to a problem?
In the Prometheus Operator project, we have implemented the ability to autoscale Prometheus shards when Prometheus runs in agent mode. This can be achieved with the native Horizontal Pod Autoscaler (HPA) or with tools such as KEDA.
For Prometheus Server, we are still working to make this feature available, and a lot of discussion is still ongoing.
For users running Prometheus Server with the Thanos Sidecar, I'd like to provide a graceful shutdown that flushes and uploads blocks before the Prometheus Server shuts down.
I've added the ability to upload blocks to blob storage using Thanos Tools. We can leverage this in the Prometheus Operator with some hacks, such as a preStop hook that flushes the TSDB and uploads the blocks. Some users are already implementing this on their own and have opened feature requests for it, as you can see here.
This works as expected, but it's a bit hacky, so I'd like to propose a new feature for the Thanos Sidecar.
Describe the solution you'd like
A new endpoint on the Thanos Sidecar, e.g. /flush, /shutdown, or /snapshot (naming is hard 😅).
When invoked, this new endpoint should:
- Invoke the Prometheus TSDB snapshot endpoint
- Upload the snapshot blocks to blob storage
- Delete the snapshot directory after the upload (I'm not sure about this step)
This new endpoint only provides an easier way to gracefully shut down Prometheus Server when any of the following operations is required:
- Scale-down events
- Users decreasing the disk size
- Migrating Prometheus from one place to another (e.g. a namespace migration)
With this endpoint in place, I envision the Prometheus Operator calling it before scaling down an instance; once the endpoint returns success, we can simply delete the StatefulSet and its PersistentVolume.
Describe alternatives you've considered
N/A