feat(observability): add LFS phase metrics and snapshot serve bandwidth histogram by worstell · Pull Request #318 · block/cachew

worstell · 2026-05-19T22:18:11Z

Adds two pieces of visibility motivated by staging benchmark findings:

cachew.git.snapshot_serve_bandwidth_mbps: per-request bandwidth in MiB/s, keyed by source and repository. Aggregate bytes/duration averages are ambiguous: cash-server cached serves averaged ~325 MiB/s while a raw curl from the same workstation saw ~588 MiB/s. A per-request distribution lets us tell whether slow clients are pulling the tail down or the server itself is slow. The same value is also stamped onto the active snapshot span as cachew.snapshot.bandwidth_mbps.
cachew.git.lfs_phase_duration_seconds and cachew.git.lfs_phase_bytes: broken out by phase (discover, clone, fetch, archive_upload). LFS-snapshot generation is the biggest server-side cost in staging (~8.7 min average), and today we only see the total duration; a per-phase breakdown is needed to know whether to target clone, LFS fetch, or pack/upload.

Both histograms use explicit buckets sized for the values we expect.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c8fc7857a1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 265fee5f2f


…th histogram

Adds two pieces of visibility motivated by staging benchmark findings:

1. cachew.git.snapshot_serve_bandwidth_mbps (per-request MiB/s, by source and repository). Aggregate bytes/duration averages are ambiguous: e.g. cash-server cached serves averaged ~325 MiB/s while a raw curl from the same workstation saw ~588 MiB/s. A per-request distribution lets us see whether slow clients pull the tail down vs the server itself being slow.
2. cachew.git.lfs_phase_duration_seconds and cachew.git.lfs_phase_bytes for LFS-snapshot generation phases (discover, clone, fetch, archive_upload). LFS snapshot generation is the biggest server-side cost in staging (~8.7 min average), and today we only see total duration; per-phase breakdown is needed to know whether to target clone, LFS fetch, or pack/upload.

Also stamps cachew.snapshot.bandwidth_mbps onto the active snapshot span so trace samples carry the same value.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e41af-0a15-718d-a9d8-e26df6071f9b

The bandwidth histogram added in #318 topped out at 5000 MiB/s (~5.2 GB/s), so any serve from ~5 GiB/s through the cachew server NIC ceiling collapses into the +Inf bucket and we lose the signal where we most need it. Add 10000 and 15000 MiB/s buckets so cachew.git.snapshot_serve_bandwidth_mbps can distinguish 'saturating a 10 GbE workstation' from 'approaching the server NIC limit', with some headroom past the theoretical max to spot misattribution.

Amp-Thread-ID: https://ampcode.com/threads/T-019e41af-0a15-718d-a9d8-e26df6071f9b
Co-authored-by: Amp <amp@ampcode.com>
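
The effect of the extra buckets can be sketched with a small bucket lookup. Only the 5000 MiB/s ceiling and the added 10000 and 15000 boundaries come from the text above; the lower boundaries, the variable names, and `bucketLabel` are assumptions made for illustration. The lookup assumes the common convention that a value belongs to the first bucket whose upper bound is greater than or equal to it.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// original is a hypothetical boundary list in MiB/s; only the 5000 ceiling
// is taken from the text, the lower values are placeholders.
var original = []float64{100, 500, 1000, 2500, 5000}

// extended adds the 10000 and 15000 MiB/s boundaries described above.
var extended = append(append([]float64{}, original...), 10000, 15000)

// bucketLabel returns the upper bound of the bucket that v falls into,
// or "+Inf" when v exceeds every explicit boundary.
func bucketLabel(bounds []float64, v float64) string {
	// SearchFloat64s returns the smallest index i with bounds[i] >= v.
	i := sort.SearchFloat64s(bounds, v)
	if i == len(bounds) {
		return "+Inf"
	}
	return "le=" + strconv.FormatFloat(bounds[i], 'f', -1, 64)
}

func main() {
	for _, v := range []float64{4000, 7000, 12000, 20000} {
		fmt.Printf("%6.0f MiB/s: original %-9s extended %s\n",
			v, bucketLabel(original, v), bucketLabel(extended, v))
	}
}
```

Under the original list, 7000 and 12000 MiB/s both land in +Inf and cannot be told apart; under the extended list they fall into separate buckets, and +Inf is left for values past 15000.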

worstell changed the title ~~observability: add LFS phase metrics and snapshot serve bandwidth histogram~~ feat(observability): add LFS phase metrics and snapshot serve bandwidth histogram May 19, 2026

worstell force-pushed the feat/lfs-and-bandwidth-metrics branch from 078b88e to c8fc785 Compare May 19, 2026 22:19

worstell marked this pull request as ready for review May 19, 2026 22:21

worstell requested a review from a team as a code owner May 19, 2026 22:21

worstell requested review from inez and removed request for a team May 19, 2026 22:21

chatgpt-codex-connector Bot reviewed May 19, 2026

Comment thread internal/strategy/git/snapshot.go Outdated

inez approved these changes May 19, 2026

worstell force-pushed the feat/lfs-and-bandwidth-metrics branch from c8fc785 to 265fee5 Compare May 19, 2026 22:30

chatgpt-codex-connector Bot reviewed May 19, 2026

Comment thread internal/strategy/git/snapshot.go Outdated

worstell force-pushed the feat/lfs-and-bandwidth-metrics branch from 265fee5 to 77b4ac2 Compare May 19, 2026 22:40

worstell merged commit 6b1f756 into main May 19, 2026
8 checks passed

worstell deleted the feat/lfs-and-bandwidth-metrics branch May 19, 2026 22:43

worstell mentioned this pull request May 19, 2026

feat(observability): extend bandwidth buckets past 10 GbE saturation #319

Open

worstell merged 1 commit into main from feat/lfs-and-bandwidth-metrics
