fix(node): gate GET /ipfs/{cid} on per-caller path-scoped visibility (#110) #128

Merged
kevincodex1 merged 3 commits into main from fix/gate-ipfs-cid-visibility
Jun 30, 2026

@beardthelion beardthelion commented Jun 30, 2026

Collaborator

Closes #110.

GET /ipfs/{cid} served any git object by its raw SHA-256 with no authentication and no visibility check. For a public repo with a path-scoped deny rule (e.g. /secret/**), an unauthenticated caller could recover a withheld blob's oid from the served trees, derive its CID, and fetch the cleartext here, bypassing the pack-path withholding. This is the withholding-bypass class of #98 and #116 on the content-addressed egress surface.
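
To make the "derive its CID" step concrete, here is a minimal sketch of turning a SHA-256 git oid into a CID string. It assumes CIDv1 with the `raw` multicodec (0x55), a sha2-256 multihash (0x12, length 0x20), and multibase base32 lowercase (prefix `b`). The codec the node actually accepts is not shown in this PR, and the function names are hypothetical.

```rust
/// RFC 4648 base32 alphabet in lowercase, as used by multibase prefix `b`.
const BASE32_LOWER: &[u8; 32] = b"abcdefghijklmnopqrstuvwxyz234567";

/// Encode bytes as unpadded lowercase base32.
fn base32_lower_nopad(data: &[u8]) -> String {
    let mut out = String::new();
    let mut buffer: u32 = 0; // bits read but not yet emitted
    let mut bits: u32 = 0; // number of valid low bits in `buffer`
    for &byte in data {
        buffer = (buffer << 8) | u32::from(byte);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(BASE32_LOWER[((buffer >> bits) & 0x1f) as usize] as char);
        }
        buffer &= (1 << bits) - 1; // drop the bits already emitted
    }
    if bits > 0 {
        // Left-align the final partial group and zero-fill the low bits.
        out.push(BASE32_LOWER[((buffer << (5 - bits)) & 0x1f) as usize] as char);
    }
    out
}

/// Decode a 64-character hex SHA-256 into 32 bytes; `None` on malformed input.
fn sha256_hex_to_bytes(hex: &str) -> Option<[u8; 32]> {
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// Hypothetical helper: build a CIDv1 (raw codec, sha2-256) from a git oid.
fn cid_for_sha256_oid(oid_hex: &str) -> Option<String> {
    let digest = sha256_hex_to_bytes(oid_hex)?;
    // version 1, codec raw (0x55), multihash sha2-256 (0x12), digest length 32 (0x20)
    let mut bytes: Vec<u8> = vec![0x01, 0x55, 0x12, 0x20];
    bytes.extend_from_slice(&digest);
    Some(format!("b{}", base32_lower_nopad(&bytes)))
}
```

Nothing in this mapping is secret: anyone who can read a tree entry's oid can compute the matching CID offline, which is why the endpoint itself has to enforce visibility.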

What changed

  • Bind optional caller identity: /ipfs/{cid} gets its own sub-router under optional_signature; /api/v1/ipfs/pins stays unsigned (see residuals).
  • Gate each iterated repo row against its own rules: visibility_check at / denies the row (continue), and for rows with path-scoped rules a blob in the caller's withheld_blob_oids set is skipped. The object is served only from a row the caller passes; denial and genuine not-found both return an opaque 404.
  • The row is gated directly rather than through authorize_repo_read, whose fuzzy get_repo re-resolve could authorize a different physical row than the one read. Visibility rules are fetched in one batched query; the per-repo withheld set is memoized within the request; the history walk runs on spawn_blocking and fails closed (skips the repo) on a walk error or task panic.
  • get_by_cid is added to the authz_guard drift table so the gate cannot be silently removed.
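
The per-row gate described in the bullets above can be sketched as a small in-memory model. This is an illustration only: `RepoRow`, its fields, and `serve_by_oid` are hypothetical stand-ins, the root check is reduced to "public, or caller is owner/reader", and the withheld set is precomputed rather than derived from a history walk.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for one repo row visited by the CID scan.
struct RepoRow {
    is_public: bool,
    owner_did: String,
    reader_dids: Vec<String>,
    /// Blob oids under path-scoped deny rules (withheld from non-readers).
    withheld_oids: HashSet<String>,
    objects: HashMap<String, Vec<u8>>,
}

/// True when the caller is the owner or a listed reader of this row.
fn is_privileged(row: &RepoRow, caller: Option<&str>) -> bool {
    caller.is_some_and(|did| {
        did == row.owner_did.as_str() || row.reader_dids.iter().any(|r| r.as_str() == did)
    })
}

/// Serve the object from the first row the caller passes.
/// `None` covers both denial and genuine not-found (the opaque 404).
fn serve_by_oid<'a>(rows: &'a [RepoRow], caller: Option<&str>, oid: &str) -> Option<&'a [u8]> {
    for row in rows {
        let privileged = is_privileged(row, caller);
        // Root ("/") gate: a private row is skipped for non-privileged callers.
        if !row.is_public && !privileged {
            continue;
        }
        // Path-scoped gate: skip a blob that is withheld from this caller.
        if !privileged && row.withheld_oids.contains(oid) {
            continue;
        }
        if let Some(bytes) = row.objects.get(oid) {
            return Some(bytes.as_slice());
        }
    }
    None
}
```

Because each row is judged against its own rules, a blob withheld in one repo can still be served from another row where the same caller is allowed to read it.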

Tests

New #[sqlx::test] coverage, executed RED-before / GREEN-after: anon and signed-non-reader denied (404, no leak), owner and listed reader served, public blob and withheld-subtree tree CID served (trees are not withheld), multi-repo availability from a public copy when withheld elsewhere, repo-level private deny, and the fail-closed walk-error path (verified load-bearing by mutation). Full gitlawb-node suite green; fmt and clippy clean.

Scope and known residuals

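This closes the direct unauthenticated CID scan. It does not close the whole withholding bypass on the IPFS surface, and should not be read as doing so. Two residuals are named in the commit messages: the /api/v1/ipfs/pins route stays unsigned (tracked by #121), and a stale-public mirror row still serves withheld content on both this path and the pack path (pre-existing, tracked separately in #124).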

Note: store::init_bare creates repos with --object-format=sha1, where a sha256 CID digest never matches a git oid, so this endpoint is dormant for those repos and the gate is defense-in-depth there. The tests seed sha256 repos to exercise the gate.

Summary by CodeRabbit

  • New Features
    • /ipfs/{cid} now respects caller identity when serving content, enabling per-repository and path-based access control.
    • Content that should be hidden is now skipped automatically, even when the object exists in a repository.
  • Bug Fixes
    • Anonymous or unauthorized requests now receive a consistent not-found response for protected content.
    • Public copies of shared content can still be served when a hidden copy exists elsewhere.
    • Private repositories now correctly deny access unless the caller is allowed.

…110)

get_by_cid served any git object by raw SHA-256 with no auth and no
visibility check. For a public repo with a path-scoped deny rule, an
unauthenticated caller could recover a withheld blob's oid from the
served trees, derive its CID, and fetch the cleartext here, bypassing
the pack-path withholding.

Bind optional caller identity (optional_signature on a dedicated
/ipfs/{cid} sub-router; the pins route stays unsigned, tracked by #121)
and gate each iterated repo row against its OWN rules: visibility_check
at "/" denies the repo, and withheld_blob_oids skips a blob withheld
from the caller when the row carries path-scoped rules. The row is gated
directly rather than via authorize_repo_read, whose fuzzy get_repo
re-resolve could authorize a different physical row than the one read.
The per-repo withheld set is memoized on repo.id within the request.

Adds get_by_cid to the authz_guard drift table (visibility_check marker).

Scope: this closes the direct unauthenticated scan. A stale-public
mirror row still serves withheld content on both this path and the pack
path; that pre-existing leak is tracked separately in #124.
…in get_by_cid

Code-review follow-ups for #110:
- Replace the per-repo list_visibility_rules call with a single
  list_visibility_rules_for_repos batch query (the existing #97 pattern),
  removing the N+1 on an unauthenticated scan and the "one repo's DB error
  500s the whole request" semantics.
- Fail closed on a spawn_blocking JoinError (walk task panic) by skipping the
  repo, matching the walk-error arm, instead of 500-ing the whole scan.
- Add a regression test for the repo-level "/" deny branch (private repo,
  anon denied / owner served) and note the access gate in the module doc.
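
The batching change in the first follow-up can be illustrated with a small grouping sketch: one query returns rule rows for every candidate repo, and the handler indexes them by repo id instead of querying once per repo. The `RuleRow` shape and both function names are hypothetical; the real `list_visibility_rules_for_repos` signature is not shown in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical flat row as a batched rules query might return it.
#[derive(Debug, Clone, PartialEq)]
struct RuleRow {
    repo_id: String,
    path_glob: String,
    allow: bool,
}

/// Index the rows of one batched query by repo id.
fn group_rules_by_repo(rows: Vec<RuleRow>) -> HashMap<String, Vec<RuleRow>> {
    let mut by_repo: HashMap<String, Vec<RuleRow>> = HashMap::new();
    for row in rows {
        by_repo.entry(row.repo_id.clone()).or_default().push(row);
    }
    by_repo
}

/// Look up one repo's rules; a repo with no rules yields an empty slice.
fn rules_for<'a>(by_repo: &'a HashMap<String, Vec<RuleRow>>, repo_id: &str) -> &'a [RuleRow] {
    by_repo.get(repo_id).map(Vec::as_slice).unwrap_or(&[])
}
```

With this shape a database failure happens once, before the scan starts, rather than partway through the loop over repos.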
Induce a withheld_blob_oids walk error (a ref pointing at a non-tree-ish
blob, same technique as visibility_pack::fails_closed_when_a_ref_cannot_be_traversed)
and assert the handler skips the whole repo: the withheld blob 404s with no
leak, and the public blob in the same repo also 404s (proving fail-closed-skip
rather than serve-on-error). Verified load-bearing: mutating the Ok(Err) arm to
treat the error as no-withholding makes it RED (secret served 200).
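
The fail-closed rule for the walk can be sketched with only the standard library. `std::thread::spawn(...).join()` is used here as an analogue of awaiting `tokio::task::spawn_blocking`: both yield an outer `Err` when the task panics and an inner `Result` from the walk itself. `withheld_or_skip` is a hypothetical name, and `None` stands for "skip this repo".

```rust
use std::collections::HashSet;
use std::thread;

/// Run a (possibly failing, possibly panicking) walk off the calling thread.
/// Returns `Some(set)` only when the walk completed successfully; any other
/// outcome means the gate is unproven, so the caller must skip the repo.
fn withheld_or_skip<F>(walk: F) -> Option<HashSet<String>>
where
    F: FnOnce() -> Result<HashSet<String>, String> + Send + 'static,
{
    match thread::spawn(walk).join() {
        // The walk finished and produced the withheld set.
        Ok(Ok(set)) => Some(set),
        // The walk reported an error (for example, a ref that cannot be traversed).
        Ok(Err(_walk_error)) => None,
        // The walk panicked; treat it the same as an error.
        Err(_panic) => None,
    }
}
```

The important property is that no arm maps a failure to an empty set: an empty set would mean "nothing is withheld" and would serve the blob, which is the mutation the test above shows going RED.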

coderabbitai Bot commented Jun 30, 2026

Walkthrough

The GET /ipfs/{cid} handler is updated to accept an optional AuthenticatedDid extension and enforce per-repo visibility gating. The server router now applies optional_signature middleware only to this route. The handler fetches all visibility rules in one DB query, gates each repo via visibility_check, computes withheld blob OID sets for path-scoped repos in spawn_blocking, and skips serving withheld objects. Integration tests cover all access-control scenarios.

IPFS CID Visibility Gating

| Layer | File(s) | Summary |
|---|---|---|
| Handler signature and middleware wiring | crates/gitlawb-node/src/api/ipfs.rs, crates/gitlawb-node/src/server.rs | get_by_cid gains an optional AuthenticatedDid parameter; the router applies auth::optional_signature only to /ipfs/{cid}, leaving /api/v1/ipfs/pins unsigned. |
| Visibility gating and withheld-blob logic | crates/gitlawb-node/src/api/ipfs.rs | Caller identity is derived from auth; all repo visibility rules are fetched in one query; each repo is gated via visibility_check; path-scoped repos compute a withheld blob OID set in spawn_blocking and memoize it; objects whose SHA-256 hex is in the withheld set are skipped. |
| Gate-marker test assertion | crates/gitlawb-node/src/api/mod.rs | authz_guard test extended to include ipfs.rs and assert get_by_cid contains visibility_check( rather than routing through authorize_repo_read. |
| CID test fixtures and helpers | crates/gitlawb-node/src/test_support.rs | CidFixture, seed_cid_repos, cid_for_oid, cid_router, cid_anon, cid_signed, and cid_parts create sha256-format git repos with public/secret files and provide anonymous/signed test request utilities. |
| Integration tests for all visibility scenarios | crates/gitlawb-node/src/test_support.rs | Four test functions cover path-scoped /secret/** denial, cross-repo public-copy fallback, repo-level / gate, and fail-closed walk-error behavior. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant get_by_cid
  participant DB
  participant spawn_blocking
  participant ObjectStore

  Caller->>get_by_cid: GET /ipfs/{cid} (optional DID via optional_signature)
  get_by_cid->>DB: list_visibility_rules_for_repos
  DB-->>get_by_cid: per-repo rules map
  loop each repo
    get_by_cid->>get_by_cid: visibility_check(caller, repo "/" rules)
    alt repo denied
      get_by_cid->>get_by_cid: skip repo
    else path-scoped rules present
      get_by_cid->>spawn_blocking: withheld_blob_oids(repo, caller)
      spawn_blocking-->>get_by_cid: withheld SHA-256 hex set (memoized)
      alt cid hex in withheld set
        get_by_cid->>get_by_cid: skip object
      else
        get_by_cid->>ObjectStore: read_object(sha)
        ObjectStore-->>get_by_cid: bytes
        get_by_cid-->>Caller: 200 raw bytes
      end
    else no path-scoped rules
      get_by_cid->>ObjectStore: read_object(sha)
      ObjectStore-->>get_by_cid: bytes
      get_by_cid-->>Caller: 200 raw bytes
    end
  end
  get_by_cid-->>Caller: 404 not found

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #110 (GET /ipfs/{cid} serves any git object by raw hash with no visibility check, leaking withheld blobs): This PR directly remediates the issue by threading caller identity onto the route and applying visibility_check plus withheld_blob_oids filtering before serving any object.

Possibly related PRs

  • Gitlawb/node#28: Both implement per-caller withheld blob OID filtering using visibility_pack::withheld_blob_oids to skip serving denied blobs: #28 (feat(node): subtree content withholding (Phase 3) for #18) on the Git smart-HTTP pack path, this PR on /ipfs/{cid}.
  • Gitlawb/node#84: Directly modifies withheld_blob_oids (full ref scope, fail-closed history walk) that this PR now consumes in the IPFS handler.
  • Gitlawb/node#111: Introduces bulk list_visibility_rules_for_repos and NFC-normalized visibility semantics that this PR's handler relies on for its single-query rule fetch.

Suggested labels

sev:medium, crate:node, subsystem:api, kind:security, subsystem:visibility

Suggested reviewers

  • jatmn
  • kevincodex1

Poem

A bunny once hopped to a CID gate,
No secret blobs served without auth — how great!
The withheld set memoized, spawn_blocking in tow,
Each repo now checked before data can flow.
🐇 "No leaks through the IPFS door today!"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main change: gating GET /ipfs/{cid} by per-caller visibility. |
| Description check | ✅ Passed | The description covers the fix, tests, and residuals, though it doesn't fully mirror every template section. |
| Linked Issues check | ✅ Passed | The code matches #110 by adding caller identity, visibility checks, withheld-blob filtering, opaque 404s, and regression tests. |
| Out of Scope Changes check | ✅ Passed | The added router split, authz guard update, and tests all support the IPFS gating fix and don't introduce unrelated scope. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/gitlawb-node/src/api/mod.rs (1)

190-192: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Guard the “must not use authorize_repo_read” invariant too.

This row only proves visibility_check( is present. A future get_by_cid change could add authorize_repo_read( and still pass, despite the KTD2a comment.

Suggested guard
         for (src, func, marker) in rows {
             let body = fn_body(src, func);
             assert!(
                 body.contains(marker),
                 "handler `{func}` is missing its gate marker `{marker}` — gate removed or route reclassified"
             );
+            if func == "get_by_cid" {
+                assert!(
+                    !body.contains("authorize_repo_read("),
+                    "get_by_cid must gate each iterated row directly, not via authorize_repo_read"
+                );
+            }
         }
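
For readers without the test module at hand, the idea behind the suggested guard can be sketched standalone. This is a hypothetical re-implementation, not the repository's `fn_body` helper: it locates a function by name, extracts its brace-delimited body by counting braces (braces inside strings or comments are not handled), and then checks one required marker and a list of forbidden calls.

```rust
/// Return the brace-delimited body of `fn <name>(` in `src`, or `None`.
/// Simplification: braces inside string literals or comments are counted too.
fn fn_body_sketch<'a>(src: &'a str, name: &str) -> Option<&'a str> {
    let start = src.find(&format!("fn {name}("))?;
    let open = start + src[start..].find('{')?;
    let mut depth = 0usize;
    for (offset, ch) in src[open..].char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&src[open..=open + offset]);
                }
            }
            _ => {}
        }
    }
    None
}

/// Check that `func` still contains `marker` and none of the `forbidden` calls.
fn check_gate(src: &str, func: &str, marker: &str, forbidden: &[&str]) -> Result<(), String> {
    let body = fn_body_sketch(src, func)
        .ok_or_else(|| format!("handler `{func}` not found"))?;
    if !body.contains(marker) {
        return Err(format!("handler `{func}` is missing its gate marker `{marker}`"));
    }
    for bad in forbidden {
        if body.contains(*bad) {
            return Err(format!("handler `{func}` must not call `{bad}`"));
        }
    }
    Ok(())
}
```

A textual guard like this catches accidental removal or rerouting of the gate; it does not prove the gate is applied correctly, which remains the job of the integration tests.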
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/mod.rs` around lines 190 - 192, The get_by_cid
invariant check only verifies visibility_check( is present, so it can miss a
regression where authorize_repo_read( is added later. Update the guard in mod.rs
for the get_by_cid row to assert both that visibility_check( remains present and
that authorize_repo_read( is not used, using the get_by_cid entry in the
existing test/scan logic so the KTD2a requirement is enforced directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/ipfs.rs`:
- Around line 109-145: Move the cheap object-presence check in the IPFS read
flow before the expensive withheld-history scan so random/absent CIDs do not
trigger `withheld_blob_oids` across repos. In the
`crates/gitlawb-node/src/api/ipfs.rs` read path, use the existing
`store::read_object`/CID presence logic first, then only run the path-scoped
withheld filter for repos that actually contain the object, and keep the final
bytes-returning path gated by that withheld result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 93a39896-85ab-4030-b16c-a97ff4808443

📥 Commits

Reviewing files that changed from the base of the PR and between adc20f9 and c2a578a.

📒 Files selected for processing (4)
  • crates/gitlawb-node/src/api/ipfs.rs
  • crates/gitlawb-node/src/api/mod.rs
  • crates/gitlawb-node/src/server.rs
  • crates/gitlawb-node/src/test_support.rs

Comment on lines +109 to +145
        // Per-blob withholding only applies when a path-scoped rule exists (KTD4).
        if has_path_scoped_rule(rules) {
            if !withheld_memo.contains_key(&repo.id) {
                let rp = repo_path.clone();
                let r = rules.to_vec();
                let is_public = repo.is_public;
                let owner = repo.owner_did.clone();
                let caller_for_walk = caller_owned.clone();
                // Full-history walk shells out to git — keep it off the async runtime.
                let walk = tokio::task::spawn_blocking(move || {
                    withheld_blob_oids(&rp, &r, is_public, &owner, caller_for_walk.as_deref())
                })
                .await;
                // Fail closed on EITHER a task panic (JoinError) or a walk error:
                // we cannot prove the caller may read here, so skip this repo and
                // let a public copy (if any) serve. Never serve on an unproven gate.
                let set = match walk {
                    Ok(Ok(set)) => set,
                    Ok(Err(e)) => {
                        tracing::warn!(repo = %repo.name, err = %e, "withheld walk failed; skipping repo");
                        continue;
                    }
                    Err(e) => {
                        tracing::warn!(repo = %repo.name, err = %e, "withheld walk task panicked; skipping repo");
                        continue;
                    }
                };
                withheld_memo.insert(repo.id.clone(), set);
            }
            if withheld_memo
                .get(&repo.id)
                .is_some_and(|set| set.contains(&sha256_hex))
            {
                continue;
            }
        }


🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win

Avoid running the full withheld walk before proving this repo contains the CID.

Right now any anonymous request for an absent/random CID can force withheld_blob_oids across every visible path-scoped repo before store::read_object is attempted. Move the cheap object-presence check before the full-history walk, then apply the withheld filter before returning bytes.

Suggested shape
+        let object = match store::read_object(&repo_path, &sha256_hex) {
+            Ok(Some(object)) => object,
+            Ok(None) => continue,
+            Err(e) => {
+                tracing::warn!(repo = %repo.name, err = %e, "error reading git object");
+                continue;
+            }
+        };
+
         // Per-blob withholding only applies when a path-scoped rule exists (KTD4).
         if has_path_scoped_rule(rules) {
             ...
             if withheld_memo
                 .get(&repo.id)
                 .is_some_and(|set| set.contains(&sha256_hex))
             {
                 continue;
             }
         }
 
-        match store::read_object(&repo_path, &sha256_hex) {
-            Ok(Some((_obj_type, content))) => {
+        let (_obj_type, content) = object;
+        {
             ...
-            }
-            Ok(None) => continue,
-            Err(e) => {
-                tracing::warn!(repo = %repo.name, err = %e, "error reading git object");
-                continue;
-            }
         }
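
The effect of the reordering suggested above can be shown with a small model that counts how often the expensive walk would run. Everything here is hypothetical scaffolding: `Repo` holds a precomputed withheld set, and the `walks` counter stands in for each call to `withheld_blob_oids`. Whether the merged code adopted this ordering is not stated in this conversation.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical repo model: objects on disk plus a precomputed withheld set.
struct Repo {
    has_path_rules: bool,
    objects: HashMap<String, Vec<u8>>,
    withheld: HashSet<String>,
}

/// Presence-first ordering: only repos that actually contain the object pay
/// for the withheld walk. `walks` counts how many walks were triggered.
fn serve_presence_first<'a>(
    repos: &'a [Repo],
    oid: &str,
    walks: &mut usize,
) -> Option<&'a [u8]> {
    for repo in repos {
        // Cheap check first: skip repos that do not hold the object at all.
        let Some(bytes) = repo.objects.get(oid) else {
            continue;
        };
        if repo.has_path_rules {
            *walks += 1; // stands in for the full-history walk
            if repo.withheld.contains(oid) {
                continue; // still gated before any bytes are returned
            }
        }
        return Some(bytes.as_slice());
    }
    None
}
```

In this model a request for an absent or random CID triggers zero walks, while a request for a withheld blob triggers one walk and still returns nothing.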

@kevincodex1 kevincodex1 left a comment

Contributor


LGTM

@kevincodex1 kevincodex1 merged commit 174f25a into main Jun 30, 2026
14 checks passed


Development

Successfully merging this pull request may close these issues.

GET /ipfs/{cid} serves any git object by raw hash with no visibility check, leaking withheld blobs

2 participants