feat(node): iCaptcha-aware repo propagation gate with quarantine#125
Conversation
|
Thanks for the contribution. A couple of things will help us review this faster:
See CONTRIBUTING.md. Update the PR and these notes will clear automatically. |
📝 WalkthroughWalkthroughAdds an iCaptcha propagation quarantine system for mirrored repos. Proofs are now persisted at repo creation and fork, exposed via a new ChangesiCaptcha Propagation Quarantine
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 1543-1558: The get_icaptcha_proof endpoint is returning a raw repo
creation proof that can be reused by any whole-repo reader, so mirror admission
should not rely on this unbound token. Update get_icaptcha_proof and the
downstream admission flow in crate::icaptcha::admit_mirror so the propagated
proof is bound to repo identity/content or made peer/target-specific before
acceptance, and ensure the returned value cannot be replayed across origins.
- Around line 500-504: The `git_receive_pack` POST route is missing the
quarantine gate that already exists in the clone/fetch path, so quarantined
mirrors can still accept pushes. Add the same
`state.db.is_repo_quarantined(&record.id).await?` check in `git_receive_pack`
before any receive-pack handling, and return `AppError::RepoNotFound` for
quarantined repos just like the existing repository lookup flow in `repos.rs`.
In `@crates/gitlawb-node/src/db/mod.rs`:
- Around line 864-871: The repo admission logic in the insert/upsert flow
currently keys `quarantined` off disk existence, which can wrongly admit a repo
when the Git directory exists but the DB row is still missing. Update the
sync/admission path around the `INSERT INTO repos ... ON CONFLICT` flow so
mirror admission is driven by absence of the exact mirror DB row, not by
`local_path.exists()`, and ensure the `quarantined` decision is recomputed from
DB presence before calling the repo insert/update logic.
- Around line 803-806: The mirror replay JTI is being stored with the original
propagation proof expiry, so already-expired proofs can be swept immediately and
replayed after cleanup. Update the mirror admission flow around `admit_mirror`
and the proof ledger in `db::mod` to persist replay JTIs with a durable
retention expiry (or a separate propagation replay ledger) instead of
`claims.exp`, and make `sweep_expired_proofs` only remove entries once that
retention window truly expires.
In `@crates/gitlawb-node/src/icaptcha.rs`:
- Line 421: The replay-guard entry for consumed proofs is being removed too
early, allowing the same proof to be reused after expiration. Update the proof
lifecycle in `admit_mirror`, `consume_proof_jti`, and `sweep_expired_proofs()`
so consumed JTI records remain in the replay ledger longer than `claims.exp` and
cannot be admitted again on the same node after cleanup. Make the cleanup
retention window explicit and ensure the sweep logic only removes entries once
the replay-protection period has safely passed.
In `@crates/gitlawb-node/src/sync.rs`:
- Around line 284-299: The admission check in sync logic is using the local
clone existence as a proxy for mirror approval, which can bypass quarantine on
retries. Update the admission flow in the sync path around the quarantined
calculation to base the decision on DB state or a persisted mirror record rather
than the on-disk clone, so `crate::icaptcha::admit_mirror` is still consulted
when a repo exists locally but was never successfully upserted. Preserve the
existing `warn!` quarantine path and ensure the `upsert_mirror_repo`/mirror
insert path records the correct quarantined state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3ebef415-8506-41f2-9e5b-4a24c9dd991f
📒 Files selected for processing (7)
crates/gitlawb-node/src/api/mod.rscrates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/db/mod.rscrates/gitlawb-node/src/icaptcha.rscrates/gitlawb-node/src/server.rscrates/gitlawb-node/src/sync.rscrates/gitlawb-node/src/test_support.rs
| // A quarantined mirror is served to no one (clone or push advertisement) — | ||
| // hidden as repo-not-found until an operator releases it. | ||
| if state.db.is_repo_quarantined(&record.id).await? { | ||
| return Err(AppError::RepoNotFound(format!("{owner}/{name}"))); | ||
| } |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -nP 'pub async fn git_receive_pack|is_repo_quarantined|git-receive-pack' crates/gitlawb-node/src/api/repos.rs crates/gitlawb-node/src/server.rsRepository: Gitlawb/node
Length of output: 1055
🏁 Script executed:
sed -n '540,840p' crates/gitlawb-node/src/api/repos.rsRepository: Gitlawb/node
Length of output: 11740
🏁 Script executed:
sed -n '150,220p' crates/gitlawb-node/src/server.rsRepository: Gitlawb/node
Length of output: 2953
🏁 Script executed:
sed -n '556,576p' crates/gitlawb-node/src/api/repos.rsRepository: Gitlawb/node
Length of output: 1017
Add the quarantine gate to git_receive_pack. The direct /{owner}/{repo}/git-receive-pack POST route still lacks an is_repo_quarantined check, so quarantined mirrors can accept pushes even though they’re hidden from clone/fetch.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/api/repos.rs` around lines 500 - 504, The
`git_receive_pack` POST route is missing the quarantine gate that already exists
in the clone/fetch path, so quarantined mirrors can still accept pushes. Add the
same `state.db.is_repo_quarantined(&record.id).await?` check in
`git_receive_pack` before any receive-pack handling, and return
`AppError::RepoNotFound` for quarantined repos just like the existing repository
lookup flow in `repos.rs`.
| /// GET /api/v1/repos/{owner}/{repo}/icaptcha-proof | ||
| /// | ||
| /// Returns the iCaptcha proof token this repo was created with (`null` if none). | ||
| /// A peer mirroring this repo fetches it and re-verifies it offline before | ||
| /// admitting the mirror (see [`crate::icaptcha::admit_mirror`]). Not owner-gated, | ||
| /// but gated on whole-repo `"/"` read like the other replication endpoints, so a | ||
| /// private repo's proof is never disclosed. | ||
| pub async fn get_icaptcha_proof( | ||
| State(state): State<AppState>, | ||
| auth: Option<Extension<AuthenticatedDid>>, | ||
| Path((owner, repo)): Path<(String, String)>, | ||
| ) -> Result<Json<serde_json::Value>> { | ||
| let caller = auth.as_ref().map(|e| e.0 .0.as_str()); | ||
| let (record, _rules) = | ||
| crate::api::authorize_repo_read(&state, &owner, &repo, caller, "/").await?; | ||
| let proof = state.db.get_repo_proof_token(&record.id).await?; |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift
Don’t expose an unbound bearer proof for mirror admission.
This returns the raw creation proof to any whole-repo reader, while mirror admission validates the token against the owner binding only. For public repos, any reader can reuse that token from another origin or spend it first on a target node. Bind the propagated proof to repo identity/content or make it peer/target-specific before accepting it for admission.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/api/repos.rs` around lines 1543 - 1558, The
get_icaptcha_proof endpoint is returning a raw repo creation proof that can be
reused by any whole-repo reader, so mirror admission should not rely on this
unbound token. Update get_icaptcha_proof and the downstream admission flow in
crate::icaptcha::admit_mirror so the propagated proof is bound to repo
identity/content or made peer/target-specific before acceptance, and ensure the
returned value cannot be replayed across origins.
| // The iCaptcha proof presented at repo creation, kept so it can travel | ||
| // with the repo when it propagates to peers. A mirroring node that | ||
| // enforces iCaptcha re-verifies this token offline before admitting the | ||
| // mirror (see `icaptcha::admit_mirror`). One row per repo (its creation |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift
Persist mirror replay JTIs beyond the proof exp.
admit_mirror accepts expired proofs, but it records claims.exp in the same ledger swept by sweep_expired_proofs. For already-expired propagation proofs, the consumed JTI is immediately sweepable, so the same proof can admit another mirror on the same node after cleanup. Use a separate propagation replay ledger or store a durable mirror-retention expiry instead of the original proof expiry.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/db/mod.rs` around lines 803 - 806, The mirror replay
JTI is being stored with the original propagation proof expiry, so
already-expired proofs can be swept immediately and replayed after cleanup.
Update the mirror admission flow around `admit_mirror` and the proof ledger in
`db::mod` to persist replay JTIs with a durable retention expiry (or a separate
propagation replay ledger) instead of `claims.exp`, and make
`sweep_expired_proofs` only remove entries once that retention window truly
expires.
| // `quarantined` is set only on first insert (the admission decision). | ||
| // A re-sync (ON CONFLICT) preserves the existing flag — admission runs | ||
| // once, and an operator's later release must not be silently reverted. | ||
| sqlx::query( | ||
| "INSERT INTO repos (id, name, owner_did, description, is_public, default_branch, | ||
| created_at, updated_at, disk_path, machine_id) | ||
| VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) | ||
| created_at, updated_at, disk_path, machine_id, quarantined) | ||
| VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) | ||
| ON CONFLICT (id) DO UPDATE SET updated_at = $8, disk_path = $9, machine_id = $10", |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Base first-insert admission on DB presence, not disk presence.
The sync caller derives quarantined=false whenever local_path.exists(). If a prior clone left the repo on disk but failed before inserting the DB row, this INSERT path admits it unquarantined because there is no conflict to preserve. Run mirror admission whenever the exact mirror DB row is absent, separately from whether the Git directory already exists.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/db/mod.rs` around lines 864 - 871, The repo admission
logic in the insert/upsert flow currently keys `quarantined` off disk existence,
which can wrongly admit a repo when the Git directory exists but the DB row is
still missing. Update the sync/admission path around the `INSERT INTO repos ...
ON CONFLICT` flow so mirror admission is driven by absence of the exact mirror
DB row, not by `local_path.exists()`, and ensure the `quarantined` decision is
recomputed from DB presence before calling the repo insert/update logic.
| // later switch to enforce still sees the jti as fresh. | ||
| return MirrorAdmission::Admit; | ||
| } | ||
| match db.consume_proof_jti(&claims.jti, claims.exp).await { |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect whether consume_proof_jti reserves already-expired JTIs or treats exp as an immediate TTL.
rg -n -C 10 'consume_proof_jti\s*\(' crates/gitlawb-node/src
# Look for tests covering expired propagation proof replay behavior.
rg -n -C 6 'admit_mirror|expired proof|proof_jti|icaptcha.*jti|already used' crates/gitlawb-node/srcRepository: Gitlawb/node
Length of output: 28343
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the sweep implementation and any scheduled callers/cleanup references.
rg -n -C 4 'sweep_expired_proofs|icaptcha_consumed_proofs|consume_proof_jti\(|cleanup.*proof|sweep.*proof' crates/gitlawb-node/src
# Read the sweep implementation and nearby docs/comments.
sed -n '1128,1180p' crates/gitlawb-node/src/db/mod.rs
sed -n '3736,3765p' crates/gitlawb-node/src/db/mod.rsRepository: Gitlawb/node
Length of output: 13893
Keep replay ledger entries longer than exp
admit_mirror intentionally accepts expired proofs, but sweep_expired_proofs() deletes the consumed jti as soon as exp passes. After cleanup, the same proof token can be replayed again on this node and admit another mirror.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/icaptcha.rs` at line 421, The replay-guard entry for
consumed proofs is being removed too early, allowing the same proof to be reused
after expiration. Update the proof lifecycle in `admit_mirror`,
`consume_proof_jti`, and `sweep_expired_proofs()` so consumed JTI records remain
in the replay ledger longer than `claims.exp` and cannot be admitted again on
the same node after cleanup. Make the cleanup retention window explicit and
ensure the sweep logic only removes entries once the replay-protection period
has safely passed.
| let quarantined = if !exists { | ||
| let proof = | ||
| fetch_icaptcha_proof(client, &origin_url, owner_short, repo_name).await; | ||
| match crate::icaptcha::admit_mirror(db, proof.as_deref(), owner_short).await { | ||
| crate::icaptcha::MirrorAdmission::Admit => false, | ||
| crate::icaptcha::MirrorAdmission::Quarantine(reason) => { | ||
| warn!( | ||
| repo = %item.repo, origin = %origin_url, reason, | ||
| "quarantining mirrored repo: failed iCaptcha propagation gate" | ||
| ); | ||
| true | ||
| } | ||
| } | ||
| } else { | ||
| false | ||
| }; |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Base admission on DB state, not the on-disk clone.
Line 284 treats any existing local path as an already-admitted mirror. If a prior sync cloned the repo but crashed or failed before upsert_mirror_repo, the retry will skip admit_mirror and insert the repo with quarantined = false, bypassing enforce mode.
Suggested direction
+ let known_repo = match db.get_repo(owner_short, repo_name).await {
+ Ok(Some(_)) => true,
+ Ok(None) => false,
+ Err(e) => {
+ warn!(repo = %item.repo, err = %e, "repo lookup failed before admission gate");
+ let _ = db.mark_sync_failed(&item.id).await;
+ continue;
+ }
+ };
+
- let quarantined = if !exists {
+ let quarantined = if !known_repo {
let proof =
fetch_icaptcha_proof(client, &origin_url, owner_short, repo_name).await;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let quarantined = if !exists { | |
| let proof = | |
| fetch_icaptcha_proof(client, &origin_url, owner_short, repo_name).await; | |
| match crate::icaptcha::admit_mirror(db, proof.as_deref(), owner_short).await { | |
| crate::icaptcha::MirrorAdmission::Admit => false, | |
| crate::icaptcha::MirrorAdmission::Quarantine(reason) => { | |
| warn!( | |
| repo = %item.repo, origin = %origin_url, reason, | |
| "quarantining mirrored repo: failed iCaptcha propagation gate" | |
| ); | |
| true | |
| } | |
| } | |
| } else { | |
| false | |
| }; | |
| let known_repo = match db.get_repo(owner_short, repo_name).await { | |
| Ok(Some(_)) => true, | |
| Ok(None) => false, | |
| Err(e) => { | |
| warn!(repo = %item.repo, err = %e, "repo lookup failed before admission gate"); | |
| let _ = db.mark_sync_failed(&item.id).await; | |
| continue; | |
| } | |
| }; | |
| let quarantined = if !known_repo { | |
| let proof = | |
| fetch_icaptcha_proof(client, &origin_url, owner_short, repo_name).await; | |
| match crate::icaptcha::admit_mirror(db, proof.as_deref(), owner_short).await { | |
| crate::icaptcha::MirrorAdmission::Admit => false, | |
| crate::icaptcha::MirrorAdmission::Quarantine(reason) => { | |
| warn!( | |
| repo = %item.repo, origin = %origin_url, reason, | |
| "quarantining mirrored repo: failed iCaptcha propagation gate" | |
| ); | |
| true | |
| } | |
| } | |
| } else { | |
| false | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/sync.rs` around lines 284 - 299, The admission check
in sync logic is using the local clone existence as a proxy for mirror approval,
which can bypass quarantine on retries. Update the admission flow in the sync
path around the quarantined calculation to base the decision on DB state or a
persisted mirror record rather than the on-disk clone, so
`crate::icaptcha::admit_mirror` is still consulted when a repo exists locally
but was never successfully upserted. Preserve the existing `warn!` quarantine
path and ensure the `upsert_mirror_repo`/mirror insert path records the correct
quarantined state.
Phase 2 of the iCaptcha work: extend the proof-of-intelligence gate from
direct writes to federated propagation, so a node that enforces iCaptcha
can defend against spam repos that originate on a node which does not.
The proof now travels with the repo and is re-verified at mirror time:
- Capture & persist: create_repo/fork_repo store the verified proof
(raw token + claims) against the new repo. New repo_icaptcha_proofs table.
- Serve: GET /api/v1/repos/{owner}/{repo}/icaptcha-proof returns the stored
token, gated on whole-repo "/" read like the other replication endpoints.
- Admit at mirror time: the sync worker fetches the origin's proof and
icaptcha::admit_mirror re-verifies it offline — signature + level +
owner-DID binding, but NOT expiry — and consumes the jti in the local
ledger so one solved challenge admits at most one mirror per node.
- Quarantine: in enforce mode a mirror that fails the gate is kept on disk
but hidden from clone/serve/push (authorize_repo_read + the git serve and
receive-pack handlers) and from listings (dedup CTE + federated list), via
a new repos.quarantined column. Shadow logs only; off admits normally.
Fail-safe: off/shadow/no-key always admit.
Hardening from review:
- git-receive-pack also rejects quarantined mirrors (no writes while hidden).
- mirror admission keys "first-time" on DB-row absence, not on-disk presence,
so a clone that crashed before the DB insert is still gated on retry.
- mirror jti is spent against a fixed forward retention window, not the
proof's already-past exp, so the replay record isn't swept early.
Known limitation: an iCaptcha proof binds to the owner DID, not the repo, so
a malicious origin can get one spam repo per harvested public proof admitted
(per-node jti single-use bounds the amplification). Fully binding a proof to
a repo needs a repo-scoped iCaptcha challenge (protocol change, future work).
The operator release HTTP endpoint is deferred (no admin-auth surface yet);
DB methods are in place.
Tested: verify_token expiry-skip + signature/level/DID enforcement on the
propagation path; proof round-trip; quarantine set/release, preservation
across re-sync, exclusion from listings; mirror jti retention survives a
sweep at the proof's exp. Full workspace green against Postgres.
e59aa10 to
f3feafe
Compare
Phase 2 of the iCaptcha work: extend the proof-of-intelligence gate from direct writes to federated propagation, so a node that enforces iCaptcha can defend against spam repos that originate on a node which does not.
The proof now travels with the repo and is re-verified at mirror time:
Capture & persist: create_repo/fork_repo store the verified proof (raw token + claims) against the new repo so it can propagate. A new repo_icaptcha_proofs side table holds one proof per repo.
Serve: GET /api/v1/repos/{owner}/{repo}/icaptcha-proof returns the stored token, gated on whole-repo "/" read like the other replication endpoints so a private repo's proof is never disclosed.
Admit at mirror time: on a first-time mirror the sync worker fetches the origin's proof and icaptcha::admit_mirror re-verifies it offline — signature + level + owner-DID binding, but NOT expiry (proofs are short-lived; the origin enforced freshness at creation) — and consumes the jti in the local ledger, so one solved challenge admits at most one mirror per node (a malicious origin can't reuse one proof for many spam repos).
Quarantine: in enforce mode a mirror that fails the gate is kept on disk but hidden from clone/serve (authorize_repo_read + both git serve handlers) and from listings (dedup CTE + federated list), via a new repos.quarantined column. Shadow logs only and writes nothing; off admits normally. Fail-safe: off/shadow/no-key always admit.
Internals: split verify_token (raw token, optional exp check) out of the header path; ProofGuard::consume now returns the VerifiedProof for persistence. DB gains record_repo_proof/get_repo_proof_token, is_repo_quarantined, and (for the deferred operator release surface) set_repo_quarantine/list_quarantined_repo_ids.
Tested: verify_token expiry-skip + signature/level/DID enforcement on the propagation path; proof round-trip; quarantine set/release, preservation across re-sync, and exclusion from listings. Full workspace green against Postgres.
Known tradeoff: a legit repo from a non-iCaptcha (or shadow) origin has no proof and is quarantined by an enforce node — recoverable via operator release. The operator release HTTP endpoint is deferred (no admin-auth surface exists yet); DB methods are in place.
Summary by CodeRabbit
New Features
Bug Fixes