
feat(git): per-repo clone histogram in metadata DB#292

Merged
worstell merged 1 commit into main from feat-git-repo-clone-histogram
May 7, 2026

worstell (Contributor) commented May 5, 2026

Adds RepoCounts: a daily-bucketed metadatadb.IntMap[string] keyed by <upstream-url>|<YYYY-MM-DD> that tracks per-repo clone counts. Intended to feed a future readiness gate that pre-warms popular repos before a pod starts serving traffic.

  • Counts clones only: POST /git-upload-pack whose body has no have <oid> lines (v1 or v2). Fetches, GET /info/refs, and v2 command=ls-refs are excluded — fetch volume is dominated by automation cadence (CI bots polling) rather than user demand. RequestIsClone buffers/replays the body, decodes gzip, and scans up to 64 KiB so multi-want fetches aren't misclassified.
  • Validation order: increment runs after cloneManager.GetOrCreate accepts the URL, so unauthenticated callers can't bloat the keyspace.
  • Reaper: daily, deletes buckets older than 90 days, short-circuits when empty, only logs when something was pruned.
  • Wiring: new metadatadb.NamespaceProvider, git.New/Register accept it (nil-safe), config.Load gains a setMetadataStore callback, and cmd/cachewd holds the store in an atomic.Pointer so the provider closure can resolve the "git" namespace at strategy-construction time.
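The clone rule in the first bullet can be sketched as a scan over git pkt-lines: each packet starts with a 4-hex-digit length that includes those 4 bytes, and the lengths 0000, 0001 and 0002 are flush, delimiter and response-end markers with no payload. The function below is hypothetical and is not the PR's RequestIsClone. It assumes the caller has already buffered a bounded prefix of the body and decompressed it if gzip-encoded. It classifies a request as a clone when it sees at least one "want" line and no "have" line, and it rejects protocol v2 command=ls-refs.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// looksLikeClone is a hypothetical sketch that classifies an upload-pack
// request prefix. It returns true only if the pkt-line stream carries a
// "want" line and no "have" line, and it is not a v2 ls-refs command.
func looksLikeClone(prefix []byte) bool {
	sawWant := false
	for len(prefix) >= 4 {
		n, err := strconv.ParseUint(string(prefix[:4]), 16, 16)
		if err != nil {
			return false // not pkt-line framed; do not count it as a clone
		}
		if n < 4 { // flush (0000), delim (0001), response-end (0002): no payload
			prefix = prefix[4:]
			continue
		}
		if int(n) > len(prefix) {
			break // packet cut off by the inspection cap; stop scanning
		}
		payload := prefix[4:n]
		switch {
		case bytes.HasPrefix(payload, []byte("have ")):
			return false // the client already has objects: this is a fetch
		case bytes.HasPrefix(payload, []byte("command=ls-refs")):
			return false // v2 ref advertisement, not a clone
		case bytes.HasPrefix(payload, []byte("want ")):
			sawWant = true
		}
		prefix = prefix[n:]
	}
	return sawWant
}

// pkt encodes s as a single pkt-line (length prefix includes its own 4 bytes).
func pkt(s string) string { return fmt.Sprintf("%04x%s", len(s)+4, s) }

func main() {
	want := pkt("want 1111111111111111111111111111111111111111\n")
	clone := want + "0000" + pkt("done\n")
	fetch := want + "0000" + pkt("have 2222222222222222222222222222222222222222\n") + pkt("done\n")
	fmt.Println(looksLikeClone([]byte(clone)), looksLikeClone([]byte(fetch)))
}
```

Because "have" lines follow every "want" line, a request with many wants can push its first "have" far into the body. This is why the inspection window needs to be generous (the PR uses 64 KiB) to avoid misclassifying large fetches as clones.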

worstell (Contributor, Author) commented May 5, 2026

@codex review

worstell force-pushed the feat-git-repo-clone-histogram branch from 4e73b62 to 1005703 (May 5, 2026 22:20)

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12e84a01ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/strategy/git/git.go Outdated
worstell force-pushed the feat-git-repo-clone-histogram branch 2 times, most recently from a3d932b to ef5f4ef (May 5, 2026 22:53)
worstell marked this pull request as ready for review (May 6, 2026 16:25)
worstell requested a review from a team as a code owner (May 6, 2026 16:25)
worstell requested a review from joshfriend and removed the request for a team (May 6, 2026 16:25)
worstell force-pushed the feat-git-repo-clone-histogram branch 2 times, most recently from ef80fc4 to 877f001 (May 6, 2026 16:32)
worstell changed the title from "feat(git): persistent histogram of clone/fetch traffic per repository" to "feat(git): per-repo clone histogram in metadata DB" (May 6, 2026)
worstell force-pushed the feat-git-repo-clone-histogram branch 3 times, most recently from 60bdd0b to af695f6 (May 6, 2026 16:50)
worstell (Contributor, Author) commented May 6, 2026

@codex review

chatgpt-codex-connector (bot) commented

💡 Codex Review

body, err := io.ReadAll(r.Body)

P1: Limit upload-pack body buffering before clone detection

RequestIsClone reads the entire POST /git-upload-pack body with io.ReadAll before applying the 64 KiB inspection cap, so a large or malicious request can force unbounded memory allocation on this hot path and potentially OOM the proxy. This regression affects every upload-pack request because handleGitRequest now always calls RequestIsClone; the inspection should read only a bounded prefix (or enforce a hard max body size) while still replaying the stream.


worstell force-pushed the feat-git-repo-clone-histogram branch 3 times, most recently from 0d885c6 to 4c6d1ca (May 6, 2026 17:24)

worstell (Contributor, Author) commented May 6, 2026

@codex review

chatgpt-codex-connector (bot) commented

Codex Review: Didn't find any major issues. More of your lovely PRs please.


Comment thread cmd/cachewd/main.go Outdated
Comment on lines +87 to +89
var metaStore atomic.Pointer[metadatadb.Store]
metaStoreProvider := metadatadb.StoreProvider(sync.OnceValues(func() (*metadatadb.Store, error) {
s := metaStore.Load()
Review comment (Collaborator):

Can you refactor this to use the existing "provider function" pattern, that uses sync.OnceValues()? There are a few examples.

Review comment (Collaborator):

It also shouldn't be in main; it should be in the metadatadb package.

worstell force-pushed the feat-git-repo-clone-histogram branch from 4c6d1ca to 99d1af3 (May 6, 2026 19:48)
worstell requested a review from alecthomas (May 6, 2026 22:17)
worstell force-pushed the feat-git-repo-clone-histogram branch from 99d1af3 to ec7ec3c (May 6, 2026 23:18)
Adds RepoCounts: a daily-bucketed metadatadb.IntMap keyed by
"<upstream-url>|<YYYY-MM-DD>" tracking clones (POST /git-upload-pack
with no "have" lines, v1 or v2; excludes fetches, ls-refs, info/refs).
Increments after upstream URL validation. Daily reaper drops buckets
older than 90 days.

Strategies opt in via the new strategy.MetadataConsumer interface;
config.Load installs the *metadatadb.Store on each consumer after the
backend is built. Intended to feed a future warmup readiness gate.

Amp-Thread-ID: https://ampcode.com/threads/T-019dfa1e-79d3-710b-8e22-2097597de66c
Co-authored-by: Amp <amp@ampcode.com>
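The opt-in described in the commit message (strategies implement strategy.MetadataConsumer, and config.Load installs the store once the backend is built) can be sketched with an optional-interface type assertion. The commit names the interface but not its method. SetMetadataStore, the Strategy interface, the two strategy types, and installMetadataStore are all hypothetical stand-ins.

```go
package main

import "fmt"

// Store stands in for *metadatadb.Store.
type Store struct{ name string }

// MetadataConsumer is an assumed shape for strategy.MetadataConsumer.
type MetadataConsumer interface {
	SetMetadataStore(*Store)
}

// Strategy is a minimal stand-in for a caching strategy.
type Strategy interface{ Name() string }

// gitStrategy opts in by implementing MetadataConsumer.
type gitStrategy struct{ store *Store }

func (g *gitStrategy) Name() string              { return "git" }
func (g *gitStrategy) SetMetadataStore(s *Store) { g.store = s }

// otherStrategy does not opt in.
type otherStrategy struct{}

func (otherStrategy) Name() string { return "other" }

// installMetadataStore hands the store to every strategy that opts in and
// reports how many did; strategies without the method are left untouched.
func installMetadataStore(strategies []Strategy, store *Store) int {
	n := 0
	for _, s := range strategies {
		if c, ok := s.(MetadataConsumer); ok {
			c.SetMetadataStore(store)
			n++
		}
	}
	return n
}

func main() {
	g := &gitStrategy{}
	n := installMetadataStore([]Strategy{g, otherStrategy{}}, &Store{name: "git"})
	fmt.Println(n, g.store != nil)
}
```

Compared with threading a provider through every constructor, an optional interface keeps strategies that don't use metadata unchanged.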
worstell force-pushed the feat-git-repo-clone-histogram branch from ec7ec3c to dbed637 (May 6, 2026 23:37)
worstell merged commit b43e3a1 into main (May 7, 2026); 8 checks passed
worstell deleted the feat-git-repo-clone-histogram branch (May 7, 2026 18:30)
3 participants