As an operator, it would be useful to see how much time/storage/bandwidth my users are saving by enabling stargz-snapshotter. I'm excited to see the benchmarks in the README, but that might not be representative of my own images. As future improvements to the snapshotter are released and adopted, I'd also like to see whether time/storage/bandwidth savings are improving or regressing over time.
I see the state directory tracks some of this information, and that's a reasonable start. It'd be great to scrape this and emit metrics that can be more easily digested by monitoring, to make pretty graphs 📉
Real-world usage information could also be collected and fed into future optimizations, like prioritizing files that actually get fetched in production. Per-file fetch data could help users identify unnecessary bloat in their images and help even non-stargz-snapshotter users.
Thanks for the suggestion.
+1 for having metrics that can be fed into other tools. Do you have any suggestions for monitoring tools and/or data formats?
Maybe we can start from the data currently exposed in the state directory and extend it with other information (accessed files, etc.).
I don't know enough about containerd's existing metrics to say whether there's anything you can reuse or piggyback on.
I saw the [metrics] section of the containerd config docs:
[metrics] : Section to enable and configure a metrics listener. Contains two properties:
address (Default: "") Metrics endpoint does not listen by default
grpc_histogram (Default: false) Turn on or off gRPC histogram metrics
This seems to cover container-level metrics like CPU and memory usage, but maybe it can be extended with snapshotter metrics. If not, the snapshotter binary could at least expose its own metrics in a similar way, for Prometheus etc. to scrape.