fix(node): gate GET /ipfs/{cid} on per-caller path-scoped visibility (#110)#128
Conversation
…110) get_by_cid served any git object by raw SHA-256 with no auth and no visibility check. For a public repo with a path-scoped deny rule, an unauthenticated caller could recover a withheld blob's oid from the served trees, derive its CID, and fetch the cleartext here, bypassing the pack-path withholding. Bind optional caller identity (optional_signature on a dedicated /ipfs/{cid} sub-router; the pins route stays unsigned, tracked by #121) and gate each iterated repo row against its OWN rules: visibility_check at "/" denies the repo, and withheld_blob_oids skips a blob withheld from the caller when the row carries path-scoped rules. The row is gated directly rather than via authorize_repo_read, whose fuzzy get_repo re-resolve could authorize a different physical row than the one read. The per-repo withheld set is memoized on repo.id within the request. Adds get_by_cid to the authz_guard drift table (visibility_check marker). Scope: this closes the direct unauthenticated scan. A stale-public mirror row still serves withheld content on both this path and the pack path; that pre-existing leak is tracked separately in #124.
…in get_by_cid Code-review follow-ups for #110: - Replace the per-repo list_visibility_rules call with a single list_visibility_rules_for_repos batch query (the existing #97 pattern), removing the N+1 on an unauthenticated scan and the "one repo's DB error 500s the whole request" semantics. - Fail closed on a spawn_blocking JoinError (walk task panic) by skipping the repo, matching the walk-error arm, instead of 500-ing the whole scan. - Add a regression test for the repo-level "/" deny branch (private repo, anon denied / owner served) and note the access gate in the module doc.
Induce a withheld_blob_oids walk error (a ref pointing at a non-tree-ish blob, same technique as visibility_pack::fails_closed_when_a_ref_cannot_be_traversed) and assert the handler skips the whole repo: the withheld blob 404s with no leak, and the public blob in the same repo also 404s (proving fail-closed-skip rather than serve-on-error). Verified load-bearing: mutating the Ok(Err) arm to treat the error as no-withholding makes it RED (secret served 200).
📝 WalkthroughWalkthroughThe IPFS CID Visibility Gating
Sequence Diagram(s)sequenceDiagram
participant Caller
participant get_by_cid
participant DB
participant spawn_blocking
participant ObjectStore
Caller->>get_by_cid: GET /ipfs/{cid} (optional DID via optional_signature)
get_by_cid->>DB: list_visibility_rules_for_repos
DB-->>get_by_cid: per-repo rules map
loop each repo
get_by_cid->>get_by_cid: visibility_check(caller, repo "/" rules)
alt repo denied
get_by_cid->>get_by_cid: skip repo
else path-scoped rules present
get_by_cid->>spawn_blocking: withheld_blob_oids(repo, caller)
spawn_blocking-->>get_by_cid: withheld SHA-256 hex set (memoized)
alt cid hex in withheld set
get_by_cid->>get_by_cid: skip object
else
get_by_cid->>ObjectStore: read_object(sha)
ObjectStore-->>get_by_cid: bytes
get_by_cid-->>Caller: 200 raw bytes
end
else no path-scoped rules
get_by_cid->>ObjectStore: read_object(sha)
ObjectStore-->>get_by_cid: bytes
get_by_cid-->>Caller: 200 raw bytes
end
end
get_by_cid-->>Caller: 404 not found
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/gitlawb-node/src/api/mod.rs (1)
190-192: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick winGuard the “must not use
authorize_repo_read” invariant too.This row only proves
visibility_check(is present. A futureget_by_cidchange could addauthorize_repo_read(and still pass, despite the KTD2a comment.Suggested guard
for (src, func, marker) in rows { let body = fn_body(src, func); assert!( body.contains(marker), "handler `{func}` is missing its gate marker `{marker}` — gate removed or route reclassified" ); + if func == "get_by_cid" { + assert!( + !body.contains("authorize_repo_read("), + "get_by_cid must gate each iterated row directly, not via authorize_repo_read" + ); + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/mod.rs` around lines 190 - 192, The get_by_cid invariant check only verifies visibility_check( is present, so it can miss a regression where authorize_repo_read( is added later. Update the guard in mod.rs for the get_by_cid row to assert both that visibility_check( remains present and that authorize_repo_read( is not used, using the get_by_cid entry in the existing test/scan logic so the KTD2a requirement is enforced directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/ipfs.rs`:
- Around line 109-145: Move the cheap object-presence check in the IPFS read
flow before the expensive withheld-history scan so random/absent CIDs do not
trigger `withheld_blob_oids` across repos. In the
`crates/gitlawb-node/src/api/ipfs.rs` read path, use the existing
`store::read_object`/CID presence logic first, then only run the path-scoped
withheld filter for repos that actually contain the object, and keep the final
bytes-returning path gated by that withheld result.
---
Nitpick comments:
In `@crates/gitlawb-node/src/api/mod.rs`:
- Around line 190-192: The get_by_cid invariant check only verifies
visibility_check( is present, so it can miss a regression where
authorize_repo_read( is added later. Update the guard in mod.rs for the
get_by_cid row to assert both that visibility_check( remains present and that
authorize_repo_read( is not used, using the get_by_cid entry in the existing
test/scan logic so the KTD2a requirement is enforced directly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 93a39896-85ab-4030-b16c-a97ff4808443
📒 Files selected for processing (4)
crates/gitlawb-node/src/api/ipfs.rscrates/gitlawb-node/src/api/mod.rscrates/gitlawb-node/src/server.rscrates/gitlawb-node/src/test_support.rs
| // Per-blob withholding only applies when a path-scoped rule exists (KTD4). | ||
| if has_path_scoped_rule(rules) { | ||
| if !withheld_memo.contains_key(&repo.id) { | ||
| let rp = repo_path.clone(); | ||
| let r = rules.to_vec(); | ||
| let is_public = repo.is_public; | ||
| let owner = repo.owner_did.clone(); | ||
| let caller_for_walk = caller_owned.clone(); | ||
| // Full-history walk shells out to git — keep it off the async runtime. | ||
| let walk = tokio::task::spawn_blocking(move || { | ||
| withheld_blob_oids(&rp, &r, is_public, &owner, caller_for_walk.as_deref()) | ||
| }) | ||
| .await; | ||
| // Fail closed on EITHER a task panic (JoinError) or a walk error: | ||
| // we cannot prove the caller may read here, so skip this repo and | ||
| // let a public copy (if any) serve. Never serve on an unproven gate. | ||
| let set = match walk { | ||
| Ok(Ok(set)) => set, | ||
| Ok(Err(e)) => { | ||
| tracing::warn!(repo = %repo.name, err = %e, "withheld walk failed; skipping repo"); | ||
| continue; | ||
| } | ||
| Err(e) => { | ||
| tracing::warn!(repo = %repo.name, err = %e, "withheld walk task panicked; skipping repo"); | ||
| continue; | ||
| } | ||
| }; | ||
| withheld_memo.insert(repo.id.clone(), set); | ||
| } | ||
| if withheld_memo | ||
| .get(&repo.id) | ||
| .is_some_and(|set| set.contains(&sha256_hex)) | ||
| { | ||
| continue; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win
Avoid running the full withheld walk before proving this repo contains the CID.
Right now any anonymous request for an absent/random CID can force withheld_blob_oids across every visible path-scoped repo before store::read_object is attempted. Move the cheap object-presence check before the full-history walk, then apply the withheld filter before returning bytes.
Suggested shape
+ let object = match store::read_object(&repo_path, &sha256_hex) {
+ Ok(Some(object)) => object,
+ Ok(None) => continue,
+ Err(e) => {
+ tracing::warn!(repo = %repo.name, err = %e, "error reading git object");
+ continue;
+ }
+ };
+
// Per-blob withholding only applies when a path-scoped rule exists (KTD4).
if has_path_scoped_rule(rules) {
...
if withheld_memo
.get(&repo.id)
.is_some_and(|set| set.contains(&sha256_hex))
{
continue;
}
}
- match store::read_object(&repo_path, &sha256_hex) {
- Ok(Some((_obj_type, content))) => {
+ let (_obj_type, content) = object;
+ {
...
- }
- Ok(None) => continue,
- Err(e) => {
- tracing::warn!(repo = %repo.name, err = %e, "error reading git object");
- continue;
- }
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/api/ipfs.rs` around lines 109 - 145, Move the cheap
object-presence check in the IPFS read flow before the expensive
withheld-history scan so random/absent CIDs do not trigger `withheld_blob_oids`
across repos. In the `crates/gitlawb-node/src/api/ipfs.rs` read path, use the
existing `store::read_object`/CID presence logic first, then only run the
path-scoped withheld filter for repos that actually contain the object, and keep
the final bytes-returning path gated by that withheld result.
Closes #110.
GET /ipfs/{cid}served any git object by its raw SHA-256 with no authentication and no visibility check. For a public repo with a path-scoped deny rule (e.g./secret/**), an unauthenticated caller could recover a withheld blob's oid from the served trees, derive its CID, and fetch the cleartext here, bypassing the pack-path withholding. This is the withholding-bypass class of #98 and #116 on the content-addressed egress surface.What changed
/ipfs/{cid}gets its own sub-router underoptional_signature;/api/v1/ipfs/pinsstays unsigned (see residuals).visibility_checkat/denies the row (continue), and for rows with path-scoped rules a blob in the caller'swithheld_blob_oidsset is skipped. The object is served only from a row the caller passes; denial and genuine not-found both return an opaque 404.authorize_repo_read, whose fuzzyget_repore-resolve could authorize a different physical row than the one read. Visibility rules are fetched in one batched query; the per-repo withheld set is memoized within the request; the history walk runs onspawn_blockingand fails closed (skips the repo) on a walk error or task panic.get_by_cidis added to theauthz_guarddrift table so the gate cannot be silently removed.Tests
New
#[sqlx::test]coverage, executed RED-before / GREEN-after: anon and signed-non-reader denied (404, no leak), owner and listed reader served, public blob and withheld-subtree tree CID served (trees are not withheld), multi-repo availability from a public copy when withheld elsewhere, repo-level private deny, and the fail-closed walk-error path (verified load-bearing by mutation). Fullgitlawb-nodesuite green; fmt and clippy clean.Scope and known residuals
This closes the direct unauthenticated CID scan. It does not close the whole withholding bypass on the IPFS surface, and should not be read as doing so:
is_public=truewith no replicated rules)./api/v1/ipfs/pinsstill exposes withheld CIDs to anonymous callers; until it is gated, the opaque-404 here is undermined by that oracle.Note:
store::init_barecreates repos with--object-format=sha1, where a sha256 CID digest never matches a git oid, so this endpoint is dormant for those repos and the gate is defense-in-depth there. The tests seed sha256 repos to exercise the gate.Summary by CodeRabbit
/ipfs/{cid}now respects caller identity when serving content, enabling per-repository and path-based access control.