Skip to content

fix(node): gate /ipfs/pins and /arweave/anchors behind authentication (#121)#134

Open
Gravirei wants to merge 42 commits into
Gitlawb:mainfrom
Gravirei:bug_fix_2
Open

fix(node): gate /ipfs/pins and /arweave/anchors behind authentication (#121)#134
Gravirei wants to merge 42 commits into
Gitlawb:mainfrom
Gravirei:bug_fix_2

Conversation

@Gravirei

@Gravirei Gravirei commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Closes #121

Summary

Two node-wide metadata endpoints were served without authentication or visibility filtering, leaking private-repo metadata.

Changes

/api/v1/ipfs/pins (list_pins):

  • Added optional_signature middleware to the route
  • Handler now returns 401 Unauthorized for anonymous callers, stopping anonymous node-wide CID enumeration

/api/v1/arweave/anchors (list_anchors):

  • Added optional_signature middleware to the route
  • When ?repo=<owner>/<name> is provided: gates on authorize_repo_read (deny → 404), preventing anchor metadata leaks for private repos
  • Without ?repo=: requires authentication for the global anchor listing

Testing

  • All 314 tests pass (including PostgreSQL integration tests)
  • Builds clean with no new warnings

Summary by CodeRabbit

  • Bug Fixes
    • Secured IPFS pinned listings: anonymous access is rejected; results are filtered by repo/path visibility, excluding quarantined repos and any withheld objects.
    • Improved Arweave anchor listings: anonymous access is rejected; ?repo=<owner>/<name> is strictly validated and unauthorized repos are hidden; results are capped (limit ≤ 200) and returned with correct count.
    • Updated routing so IPFS pins and Arweave anchors consistently apply optional-signature authentication.
  • Tests
    • Added integration coverage for 401 (anonymous) and 200/404 visibility behavior for both global and ?repo= scopes.

Gravirei added 2 commits June 30, 2026 19:52
Gitlawb#126)

The IPFS visibility gate used withheld_blob_oids (a deny-set enumerating
only reachable blobs), so a dangling/unreachable blob was absent from the
set and served in cleartext to anonymous callers. Flip to an allowed-set
(allowed_blob_set_for_caller) that enumerates reachable blobs the caller
may read: a dangling blob has no path, is never in the set, and 404s.
Move store::read_object before the allowed_blob_set_for_caller
spawn_blocking call so random-CID spray against repos with
path-scoped rules cannot trigger full-history git walks on
repos that don't carry the object.
@github-actions github-actions Bot added the needs-tests Source changed without accompanying tests (advisory) label Jun 30, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Thanks for the contribution. A couple of things will help us review this faster:

  • This changes Rust source but no tests changed. Tests are required for fixes and strongly encouraged for features.

See CONTRIBUTING.md. Update the PR and these notes will clear automatically.

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds optional-signature middleware for /api/v1/ipfs/pins and /api/v1/arweave/anchors, then filters list_pins and list_anchors by caller authentication and repo visibility. Adds integration tests for anonymous, authenticated, and repo-scoped access.

Changes

Auth Gating for Metadata Index Endpoints

Layer / File(s) Summary
Router middleware wiring
crates/gitlawb-node/src/server.rs
/api/v1/ipfs/pins is defined inside the IPFS router so it inherits auth::optional_signature; auth::optional_signature is added to the Arweave anchors route group.
Handler auth enforcement
crates/gitlawb-node/src/api/arweave.rs, crates/gitlawb-node/src/api/ipfs.rs
list_pins now requires a caller context and filters pinned objects through repo visibility, quarantined-repo checks, and withheld-blob handling. list_anchors now accepts optional auth, validates owner/name, rejects anonymous global listings, applies repo read authorization, and filters global results by per-anchor access.
Integration tests for pins and anchors
crates/gitlawb-node/src/test_support.rs
Adds SQLx tests for anonymous and authenticated pins access, plus anonymous, authenticated, owner-allowed, and non-reader-denied anchors access with and without ?repo=.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

  • Gitlawb/node#25: Introduces the repo visibility machinery that list_anchors and list_pins now consume.
  • Gitlawb/node#52: Establishes the authorize_repo_read read-gating pattern used by list_anchors.
  • Gitlawb/node#90: Adds the object-enumeration path that list_pins now uses when building allowed hashes.

Suggested labels

kind:security, subsystem:visibility, subsystem:api, sev:medium

Suggested reviewers

  • kevincodex1
  • jatmn

Poem

🐇 I hop past pins with careful feet,
And anchors only show what's meet.
Signed little hops, both safe and neat,
Keep secret burrows off the street.
A carrot cheer for guarded streams! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the main change: gating the IPFS pins and Arweave anchors endpoints behind auth/visibility checks.
Description check ✅ Passed The description covers the fix, affected endpoints, and testing, though it omits several template sections like kind of change and reviewer verification.
Linked Issues check ✅ Passed The code adds auth and visibility enforcement for both /ipfs/pins and /arweave/anchors as requested in #121.
Out of Scope Changes check ✅ Passed The changes stay focused on auth/visibility hardening for the two listed metadata endpoints and the related tests.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/api/ipfs.rs (1)

186-203: 🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift

Avoid exposing the node-wide pin index to every signed DID.

This blocks anonymous callers, but any authenticated DID still reaches list_pinned_cids() and receives the full node-wide CID index. If authenticated users are not all node-wide admins, private-repo CIDs remain enumerable. Gate this with a real node-wide permission or make pins repo-scoped and authorize repo read before returning entries.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/ipfs.rs` around lines 186 - 203, The current auth
check in list_pins only blocks anonymous callers, but still lets any
authenticated DID reach list_pinned_cids() and see the full node-wide pin index.
Update list_pins to enforce a real node-wide permission using the authenticated
caller from AuthenticatedDid before returning pins, or change the data model so
pins are repo-scoped and authorize repo read access per caller. Keep the
existing unauthorized path for unauthenticated requests, and ensure the final
result only includes entries the caller is allowed to see.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/arweave.rs`:
- Around line 45-57: The global anchor listing path in the arweave handler still
exposes private repo metadata because it only checks that a caller exists before
calling list_arweave_anchors(None, limit). Update the handler in the arweave API
to enforce per-repo read visibility for the no-repo case, or require q.repo to
be present, or gate global listing behind a node-admin capability. Use the
existing caller check and the list_arweave_anchors call site to locate the fix.
- Around line 52-55: The Arweave anchor query is only capping the upper bound in
the handler, so negative `q.limit` values can still reach the database and fail
in `list_arweave_anchors`. Update the limit handling in `arweave.rs` to clamp
the request value to a minimum of zero and a maximum of 200 before passing it
into `state.db.list_arweave_anchors`, keeping the existing `q.repo.as_deref()`
flow unchanged.

---

Outside diff comments:
In `@crates/gitlawb-node/src/api/ipfs.rs`:
- Around line 186-203: The current auth check in list_pins only blocks anonymous
callers, but still lets any authenticated DID reach list_pinned_cids() and see
the full node-wide pin index. Update list_pins to enforce a real node-wide
permission using the authenticated caller from AuthenticatedDid before returning
pins, or change the data model so pins are repo-scoped and authorize repo read
access per caller. Keep the existing unauthorized path for unauthenticated
requests, and ensure the final result only includes entries the caller is
allowed to see.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f9fd5b6a-451e-4f4f-a04b-f97da029b822

📥 Commits

Reviewing files that changed from the base of the PR and between 8d96cab and b558bec.

📒 Files selected for processing (3)
  • crates/gitlawb-node/src/api/arweave.rs
  • crates/gitlawb-node/src/api/ipfs.rs
  • crates/gitlawb-node/src/server.rs

Comment thread crates/gitlawb-node/src/api/arweave.rs Outdated
Comment thread crates/gitlawb-node/src/api/arweave.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/gitlawb-node/src/test_support.rs (1)

1968-2017: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a repo-scoped public success case and assert the filter actually narrows the result set.

These tests only exercise ?repo= against a single private repo with a single anchor. That leaves two intended contracts unpinned: anonymous access to a public repo-scoped listing, and returning only anchors for the requested repo. A regression that blanket-denies public ?repo= reads or ignores the repo predicate would still pass here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/test_support.rs` around lines 1968 - 2017, The
`anchors_repo_denies_anonymous_on_private` and `anchors_repo_allows_owner` tests
only cover a single private repo, so they do not prove that `?repo=` works for
public repositories or that the repo filter is actually applied. Add a
repo-scoped success case for a public repo in the `anchors_router`/`seed_anchor`
test area, and make the assertion check that only anchors for the requested repo
are returned (not just the count). Use the existing `anchors_repo_*` test
patterns and the `anchors_router` request helpers to locate the right spot.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/gitlawb-node/src/test_support.rs`:
- Around line 1968-2017: The `anchors_repo_denies_anonymous_on_private` and
`anchors_repo_allows_owner` tests only cover a single private repo, so they do
not prove that `?repo=` works for public repositories or that the repo filter is
actually applied. Add a repo-scoped success case for a public repo in the
`anchors_router`/`seed_anchor` test area, and make the assertion check that only
anchors for the requested repo are returned (not just the count). Use the
existing `anchors_repo_*` test patterns and the `anchors_router` request helpers
to locate the right spot.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 13b6010e-7968-4a2e-ae89-754bc8ff29d1

📥 Commits

Reviewing files that changed from the base of the PR and between b558bec and 9b71e6a.

📒 Files selected for processing (1)
  • crates/gitlawb-node/src/test_support.rs

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This closes the anonymous hole in #121 and the ?repo= anchor path is correctly visibility-gated, so the direction is right. But the global listings need to filter on current visibility, not just require authentication, and there is a real leak path that auth alone does not cover. I traced the write and read sides end to end against the merged state.

The index tables are correctly gated on the write side: pinning (repos.rs:1055,:1167, behind withheld.is_some()) and anchoring (repos.rs:1249, behind announce) only run for anonymously-root-readable repos, and object_list excludes withheld blobs. So a repo that is private at push time is never indexed. CodeRabbit's finding is still valid though, by a different mechanism than a missing write gate: the gate is evaluated once at push, visibility is mutable afterward, and nothing reconciles the index.

Findings

  • [P2] Filter the global /arweave/anchors and /ipfs/pins listings on current visibility, not just authentication
    crates/gitlawb-node/src/api/arweave.rs:53, crates/gitlawb-node/src/api/ipfs.rs (list_pins)
    This confirms and extends CodeRabbit's open finding on the global anchor listing. Requiring authentication does not authorize: identities here are permissionless (optional_signature verifies a self-made signature, register is open), so any throwaway DID still reads the global lists. And because index rows are never reconciled when a repo is made private after a public push (no DELETE of either table exists, set_visibility touches neither, and both list queries are unfiltered SELECTs), the global lists can serve a now-private repo's slug, owner DID, branch names, commit SHAs, and object CIDs. Content stays gated by GET /ipfs/{cid} (#110/#133), so this is metadata disclosure. The same gap applies to /ipfs/pins, which is not flagged elsewhere. Resolve each row's repo and apply the current-visibility check before returning it (or restrict the global listing to a node-admin capability). Full mechanism and repro in #136.

  • [P2] Fix the ?repo= gate-vs-filter representation drift (and the test that hides it)
    crates/gitlawb-node/src/api/arweave.rs:51
    The gate resolves the repo via authorize_repo_read -> get_repo, which matches the owner by LIKE (full did:key: or bare short form). The result query then filters exact WHERE repo = $1 on the raw ?repo= string, but anchor rows are written short-form ({owner_short}/{name}, repos.rs:1142). So ?repo=did:key:zX/name authorizes (200) yet returns an empty list; only ?repo=zX/name returns the owner's anchors. It fails safe (returns fewer results, not more), so it is a correctness/usability bug, not a leak. The new anchors_repo_* tests pass only because seed_anchor seeds the full-DID form, which production never writes. Normalize the slug from the authorized RepoRecord before querying, and seed the production short form in tests.

  • [P3] Run cargo fmttest_support.rs:1903 fails rustfmt --check
    crates/gitlawb-node/src/test_support.rs:1903
    A long assert_eq! line needs wrapping; the format gate fails. cargo fmt fixes it.

Notes (non-blocking): the parse-fail branch returns NotFound("repo not found") while the gate path returns RepoNotFound; both 404 and deny is indistinguishable from miss, so no oracle, but returning RepoNotFound on both is more consistent. ?limit=-1 reaches Postgres as LIMIT -1 (pre-existing). The new tests use local routers rather than build_router, so the layering fix that is the core of this PR is not asserted end to end; a test through build_router would lock it. The router refactor itself is correct: the old .merge-after-.layer left /ipfs/pins unsigned, and this fixes it.

The ?repo= gating pattern is the right one. Extending the same current-visibility filtering to the two global listings (per #136) is what completes the #121 fix.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/arweave.rs`:
- Around line 36-37: The anonymous global anchor listing path in the arweave API
still allows `list_arweave_anchors(None, limit)` to run when `repo` is absent,
which should be rejected instead. Update the handler in `arweave.rs` around
`caller`/`list_arweave_anchors` so that when no repository is specified and
`auth` yields `None`, the request returns a 401 before any query is executed.
Keep the authenticated and repo-scoped paths unchanged, and make sure the
`list_arweave_anchors` call only happens after a valid caller has been
established.

In `@crates/gitlawb-node/src/api/ipfs.rs`:
- Around line 195-203: The pin listing handler is still continuing when auth is
missing, so anonymous requests get a filtered success response instead of being
rejected. In the IPFS API handler around the caller/auth extraction and
`list_pinned_cids` flow, add an explicit unauthorized check when `caller` is
`None` before loading pins, and return the proper 401 early from this function
instead of proceeding into the database query and filter logic.
- Around line 220-229: The repo selection logic in list_pins is bypassing the
quarantine gate because it calls visibility_check directly instead of the shared
authorization path used by authorize_repo_read. Update the readable-repo
derivation in the loop over repos to preserve the quarantine exclusion before
applying repo-level visibility, preferably by reusing authorize_repo_read or
extracting the quarantine filter into a shared helper referenced by both
list_pins and authorize_repo_read. Ensure the fix is applied around the repo
iteration and visibility_check flow so quarantined repos never contribute CIDs
to the pins index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e012ef44-6106-4e47-a97e-b4a94070946d

📥 Commits

Reviewing files that changed from the base of the PR and between 9b71e6a and 0fed173.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • crates/gitlawb-node/src/api/arweave.rs
  • crates/gitlawb-node/src/api/ipfs.rs
  • crates/gitlawb-node/src/test_support.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/gitlawb-node/src/test_support.rs

Comment thread crates/gitlawb-node/src/api/arweave.rs
Comment thread crates/gitlawb-node/src/api/ipfs.rs
Comment thread crates/gitlawb-node/src/api/ipfs.rs Outdated
…tion

✓ P3 blocker fixed: cargo fmt applied — the format gate will pass.

  ✓ P3 cleanup resolved: withheld_blob_oids is still used by replication code in repos.rs, so it stays.

  • P2 follow-up: Tree/commit disclosure tracked in Gitlawb#135 — out of scope here.

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found issues that need to be addressed before this is ready.

Findings

  • [P1] Complete CodeRabbit's request to reject anonymous global listings
    crates/gitlawb-node/src/api/arweave.rs:36, crates/gitlawb-node/src/api/ipfs.rs:195
    The latest patch still only makes the signature optional and then proceeds when caller is None. As a result, anonymous GET /api/v1/arweave/anchors and GET /api/v1/ipfs/pins return 200 instead of the 401 behavior this PR claims and tests for. I verified this with the PR's own focused tests: cargo test -p gitlawb-node pins_list -- --nocapture fails with pins_list_denies_anonymous returning 200 instead of 401, and cargo test -p gitlawb-node anchors_ -- --nocapture fails with anchors_global_denies_anonymous returning 200 instead of 401. Please complete the current CodeRabbit requests by returning Unauthorized before querying/filtering whenever the global listing has no authenticated caller.

  • [P1] Fix the new endpoint tests so they pass and cover the production data shape
    crates/gitlawb-node/src/test_support.rs:1889, crates/gitlawb-node/src/test_support.rs:1960, crates/gitlawb-node/src/test_support.rs:2011
    The new tests currently seed rows that the new implementation filters out, so the PR's own coverage is red and does not prove the intended production behavior. pins_list_allows_authenticated inserts a bare pinned_cids row but no readable repo/object containing that SHA, while list_pins now only returns pins whose SHA is found by scanning readable repo object databases, so the count is 0 instead of 1. The anchor tests seed some/repo or the full-DID slug (did:key:.../repo), but production anchor writes use the short owner slug from repos.rs, and list_anchors now normalizes authorized ?repo= lookups to that short form, so the owner/global success tests also return count 0. Please seed the same repo/object and short-slug shapes that the push/anchor paths actually write, then keep the assertions on the filtered results.

  • [P2] Preserve the quarantine gate when deriving readable repos for pins
    crates/gitlawb-node/src/api/ipfs.rs:220
    authorize_repo_read hides quarantined repos before it applies visibility rules, but the new list_pins implementation bypasses that helper and iterates list_all_repos() with a direct visibility_check. A quarantined public mirror can therefore still contribute object SHAs to allowed_sha256s, and any matching pinned CID is returned even though the rest of the read surface treats that repo as nonexistent until quarantine is cleared. Please reuse authorize_repo_read semantics here or add the same is_repo_quarantined exclusion before scanning repo objects.

@Gravirei Gravirei requested review from beardthelion and jatmn June 30, 2026 20:14

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core gates verify as resolved on the current head: anonymous global listings return 401 before any query, the pin scan skips quarantined repos, and the tests now seed the production slug and object shapes. The earlier P1/P1#2/P2 all check out on this head. What's left is regression coverage on the new exclusion branches, which currently have no test exercising them.

Findings

  • [P3] Cover the quarantine exclusion in list_pins
    crates/gitlawb-node/src/api/ipfs.rs:235
    The is_repo_quarantined skip is correct, but no pins_ test seeds a quarantined repo, so nothing guards it against regression. Seed a quarantined mirror that shares an object SHA with a pinned CID (and no readable non-quarantined repo carrying that SHA), then assert the pin is withheld.

  • [P3] Cover the path-scoped withheld-blob exclusion in list_pins
    crates/gitlawb-node/src/api/ipfs.rs:260
    list_pins builds its allowed set with withheld_blob_oids, a separate construction from get_by_cid's allowed_blob_set_for_caller, and no pins_ test sets a path-scoped rule, so this subtraction is uncovered. Add a public repo with a /secret/** rule where a withheld blob OID is pinned, and assert an authenticated non-reader's listing excludes that CID while the visible-object pins remain. This is the branch that would leak a private blob SHA through the pin index if the subtraction regresses.

  • [P3] Add a negative case for the global authenticated anchor filter
    crates/gitlawb-node/src/api/arweave.rs:76
    anchors_global_allows_authenticated only proves a public anchor is visible; the per-row authorize_repo_read filter has no test that an authenticated non-reader is denied another repo's private anchor. Add that exclusion assertion, mirroring anchors_repo_denies_non_reader for the ?repo= path.

@Gravirei Gravirei requested a review from beardthelion July 1, 2026 01:33

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
crates/gitlawb-node/src/test_support.rs (3)

2079-2240: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Duplicated owner/short-slug boilerplate across the five anchors tests.

anchors_global_allows_authenticated, anchors_global_denies_non_reader, anchors_repo_denies_anonymous_on_private, anchors_repo_allows_owner, and anchors_repo_denies_non_reader all repeat the identical owner_short/short_slug derivation and repo/anchor seeding sequence. A small shared helper (e.g., seed_owned_repo_with_anchor(is_public, repo_name) -> (Keypair, String /*owner_did*/, String /*short_slug*/)) would cut the duplication and centralize the short-slug convention these tests depend on.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/test_support.rs` around lines 2079 - 2240, The five
anchors tests repeat the same owner DID to short-slug derivation and repo/anchor
seeding logic, so factor that setup into a shared helper in test_support.rs and
reuse it from anchors_global_allows_authenticated,
anchors_global_denies_non_reader, anchors_repo_denies_anonymous_on_private,
anchors_repo_allows_owner, and anchors_repo_denies_non_reader. Have the helper
create the Keypair, derive owner_short/short_slug, seed either a public or
private repo, and insert the anchor so the tests only vary by visibility and
request/assertion details.

2146-2177: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Missing positive-path coverage: anonymous ?repo= access to a public repo's anchors.

The ?repo= tests only cover a private repo (deny-anonymous here, allow-owner at 2180-2212, deny-non-reader at 2214-2240). There's no test asserting an anonymous caller can still read anchors for a public repo via ?repo=, even though the production comments emphasize that gating must not break legitimate anonymous access to public content. Adding that case would directly validate the PR's stated goal of not over-restricting public repos.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/test_support.rs` around lines 2146 - 2177, Add
missing positive-path coverage for anonymous `?repo=` access on a public repo in
the `anchors_repo_denies_anonymous_on_private` area. Create a new test alongside
the existing `anchors_router` / `list_anchors` cases that seeds a public repo
and anchor, calls the `GET /api/v1/arweave/anchors?repo=...` path with
`anon_get`, and asserts `StatusCode::OK` (or the expected success response) to
verify `?repo=` still allows anonymous reads for public content.

1881-2030: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Consider extracting shared setup for the three pins tests.

pins_list_allows_authenticated, pins_list_excludes_quarantined_repos, and pins_list_withholds_path_scoped_blobs each repeat the same Keypair/fs_slug/short/seed_cid_repos boilerplate. Extracting a small helper (e.g., returning (Keypair, owner_did, CidFixture)) would reduce duplication and the risk of the three tests silently drifting apart on the owner-DID/slug convention.

♻️ Sketch of a shared fixture helper
struct PinFixture {
    owner: gitlawb_core::identity::Keypair,
    owner_did: String,
    fx: CidFixture,
}

fn seed_pin_owner(bare_names: &[&str]) -> PinFixture {
    use gitlawb_core::identity::Keypair;
    let owner = Keypair::generate();
    let owner_did = owner.did().to_string();
    let fs_slug = owner_did.replace([':', '/'], "_");
    let short = owner_did.split(':').next_back().unwrap().to_string();
    let fx = seed_cid_repos(&fs_slug, &short, bare_names);
    PinFixture { owner, owner_did, fx }
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/test_support.rs` around lines 1881 - 2030, The three
pins tests repeat the same owner setup, slug derivation, and seed_cid_repos
call, so extract that boilerplate into a small shared helper to keep them
aligned. Add a fixture/helper near pins_list_allows_authenticated,
pins_list_excludes_quarantined_repos, and pins_list_withholds_path_scoped_blobs
that returns the generated Keypair, owner_did, and CidFixture (or equivalent),
and update each test to use it for fs_slug/short and repository seeding. Keep
the existing test-specific assertions and repo/quarantine/visibility setup
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/gitlawb-node/src/test_support.rs`:
- Around line 2079-2240: The five anchors tests repeat the same owner DID to
short-slug derivation and repo/anchor seeding logic, so factor that setup into a
shared helper in test_support.rs and reuse it from
anchors_global_allows_authenticated, anchors_global_denies_non_reader,
anchors_repo_denies_anonymous_on_private, anchors_repo_allows_owner, and
anchors_repo_denies_non_reader. Have the helper create the Keypair, derive
owner_short/short_slug, seed either a public or private repo, and insert the
anchor so the tests only vary by visibility and request/assertion details.
- Around line 2146-2177: Add missing positive-path coverage for anonymous
`?repo=` access on a public repo in the
`anchors_repo_denies_anonymous_on_private` area. Create a new test alongside the
existing `anchors_router` / `list_anchors` cases that seeds a public repo and
anchor, calls the `GET /api/v1/arweave/anchors?repo=...` path with `anon_get`,
and asserts `StatusCode::OK` (or the expected success response) to verify
`?repo=` still allows anonymous reads for public content.
- Around line 1881-2030: The three pins tests repeat the same owner setup, slug
derivation, and seed_cid_repos call, so extract that boilerplate into a small
shared helper to keep them aligned. Add a fixture/helper near
pins_list_allows_authenticated, pins_list_excludes_quarantined_repos, and
pins_list_withholds_path_scoped_blobs that returns the generated Keypair,
owner_did, and CidFixture (or equivalent), and update each test to use it for
fs_slug/short and repository seeding. Keep the existing test-specific assertions
and repo/quarantine/visibility setup unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77f2dac8-2f19-4e9e-bb77-0aa66c200e5f

📥 Commits

Reviewing files that changed from the base of the PR and between beafa8a and 81a5772.

📒 Files selected for processing (1)
  • crates/gitlawb-node/src/test_support.rs

@Gravirei Gravirei force-pushed the bug_fix_2 branch 2 times, most recently from bf9d690 to 069804a Compare July 1, 2026 01:46

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed on 069804a. The delta since my last pass is test-only (test_support.rs, +200/-84); the gates in list_pins and list_anchors are byte-identical to beafa8a where I already verified them, so this pass covers the new coverage.

All three gaps I flagged are closed, and each new test is load-bearing: with its production guard disabled in isolation, only that test fails, with the expected count.

  • quarantine skip: pins_list_excludes_quarantined_repos (0 vs 1)
  • path-scoped withheld-blob subtraction: pins_list_withholds_path_scoped_blobs (secret OID leaks, 1 vs 2)
  • global per-row filter: anchors_global_denies_non_reader (0 vs 1)

CodeRabbit's three nitpicks on 81a5772 (extract the pins/anchors fixtures, add the anonymous-public anchor case) are all addressed here. The full CI suite did not trigger on this head, so I ran it locally: fmt clean, clippy clean, full gitlawb-node suite green (325 passed).

One optional follow-up, not a blocker: list_pins has its own copy of the fail-closed "withheld walk failed, skip repo" arm with no test, while get_by_cid's copy is covered by ipfs_cid_walk_error_fails_closed.

Approving. @kevincodex1 this clears my earlier changes-requested; jatmn's CHANGES_REQUESTED also predates this head.

@Gravirei

Gravirei commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

@kevincodex1 LGTM

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found issues that need to be addressed before this is ready.

Findings

  • [P2] Cover the production router wiring for the signed metadata endpoints
    crates/gitlawb-node/src/test_support.rs:1848, crates/gitlawb-node/src/test_support.rs:2032
    The new tests prove the handlers work when the test manually builds a tiny router and applies optional_signature, but they do not exercise server::build_router, which is the production wiring this PR had to fix. That matters here because /api/v1/ipfs/pins was previously outside the optional-signature layer; if server.rs regressed or the route were left in the old .merge(...).layer(...) shape, the signed pins_list_allows_authenticated test would still pass while the real route would never attach AuthenticatedDid and would return 401 for every signed pins request. Please add at least one integration test through build_router for the signed success path, especially /api/v1/ipfs/pins, so the route-layer fix is actually pinned.

  • [P3] Complete CodeRabbit's request to clamp negative anchor limits
    crates/gitlawb-node/src/api/arweave.rs:69
    CodeRabbit's earlier negative-limit request is still valid on the current head: q.limit.min(200) caps only the upper bound, so /api/v1/arweave/anchors?limit=-1 still passes -1 into list_arweave_anchors, where it is bound directly into PostgreSQL LIMIT. PostgreSQL rejects a negative limit, so a malformed query can turn this listing endpoint into an internal error instead of a bounded empty/small result. Please clamp the lower bound as well, for example with q.limit.clamp(0, 200), before calling state.db.list_arweave_anchors.

@Gravirei Gravirei requested review from beardthelion and jatmn July 1, 2026 17:49
…rsor

- Removed fail-open else branch after MAX_WALKS: unverified path-scoped
  pins are dropped (fail closed) instead of silently included.
- Added MAX_BATCHES = 10 cap on keyset scan iterations (200 rows each,
  2000 rows max per request) so a single request can't scan the full
  pinned_cids table when hidden newer pins push visible results far out.
- Return next_cursor (pipe-delimited pinned_at|repo|sha256_hex) when the
  scan bound is reached before filling the page.
- Accept incoming cursor via query param for client-side pagination.

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found an issue that needs to be addressed before this is ready.

Findings

  • [P2] Do not advance past path-scoped pins that were skipped by the walk budget
    crates/gitlawb-node/src/api/ipfs.rs:392
    The latest walk cap now fails closed instead of returning unverified path-scoped rows, but rows from path-scoped repos after the first 50 distinct walks are still skipped while the batch cursor advances past them. For example, if the 51st path-scoped repo in the scan contains a caller-visible pin, allowed_blobs_by_repo.len() < MAX_WALKS is false, allowed_blobs_by_repo.get(&repo.id) is None, and that pin is omitted; because the response cursor is already moved to the end of the scanned batch, the caller cannot recover that row on the next page. Please stop before advancing past unverified path-scoped rows, return an explicit truncation/error state, or use a cursor/work-budget shape that preserves visible rows instead of silently hiding them.

  • [P2] Follow the pins cursor in the CLI listing
    crates/gl/src/ipfs_cmd.rs:46
    /api/v1/ipfs/pins is now paginated by default (default_limit() is 50 and the handler emits next_cursor), but gl ipfs list still performs a single signed GET with no cursor loop. Any node with more than the first page of visible pins will have older pins omitted from the command even though the subcommand is documented as listing all pinned CIDs, and the output gives no indication that another page exists. Please either page through next_cursor until exhaustion, or change the CLI/API contract to make the bounded first page explicit to users.

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the current head dd24f3d by execution. The gate still holds: I ran anonymous-401, signed-non-reader deny, owner-allow, quarantine (owner path), negative and oversized ?limit, cross-owner slug-collision, and the v10 migration, all passing, and a fail-open mutation of the repo-read gate goes red as expected. The new MAX_BATCHES cap and the limit == 0 early return close the earlier unbounded-scan and zero-limit items.

The new next_cursor reopens the withheld-metadata channel this PR exists to close. Both P2s below come from one place: cursor is taken from batch.last() at ipfs.rs:377, before the per-pin visibility filter, and then emitted.

Findings

  • [P2] Do not put a withheld blob's id in next_cursor
    crates/gitlawb-node/src/api/ipfs.rs:447
    The cursor is set from the last fetched row (ipfs.rs:377) before the loop filters each pin through allowed_blob_set_for_caller, so next_cursor can carry the sha256_hex of a path-scoped-hidden blob. Executed: a signed caller who can read a root-public repo but not its /secret/** subtree gets count: 0, while next_cursor comes back as <pinned_at>|<repo>|<secret_sha> where secret_sha is the withheld blob's OID. Same CID/SHA-of-hidden-pins class you flagged at 17:58, on a different channel, reachable by any registered DID.

  • [P2] next_cursor skips visible pins across pages
    crates/gitlawb-node/src/api/ipfs.rs:377
    Same cause, availability side: the cursor advances to the last fetched row, so the next page resumes past rows the caller was never shown. Executed: five visible pins, no path rules, paged at limit=2 following next_cursor returns only two of the five. Broader than the walk-budget rows in your 18:17 note; it drops plainly-visible rows too.

Both close together by deriving next_cursor from the last pin actually pushed to the response (the last visible row), emitted only when the page fills. I verified that change by execution: it turns both cases above green and leaves the deny/allow/owner/quarantine/limit/collision/migration suite green. The one case it does not cover is resuming when a whole batch window is entirely hidden, which needs an explicit truncation flag or an opaque cursor, i.e. the item below.

  • [P3] Encode next_cursor opaquely
    crates/gitlawb-node/src/api/ipfs.rs:447
    The cursor embeds the RFC3339 pinned_at, which contains +00:00. A client echoing next_cursor back as ?cursor=...+00:00 has the + decoded to a space, corrupting the keyset boundary. An opaque (base64) token also gives you a clean place to carry batch-cap resume state without exposing row ids.

Your 18:17 "follow the pins cursor in the CLI" item is the client half of this and still stands. Not ready: the leak is a still-open confidentiality item and the skip is a correctness regression, both from the same cursor-anchoring fix.

@beardthelion beardthelion dismissed stale reviews from themself July 4, 2026 18:44

Superseded: reviewed an earlier head. Current review is on the current head (dd24f3d).

…I pagination

- next_cursor is now derived from the last *accepted* (visible) pin, not
  from batch.last(). This avoids leaking withheld-blob SHA/CID metadata
  via the cursor and prevents skipping rows the caller was never shown
  across pages (both P2).
- next_cursor is emitted only when the page fills (pins.len() >=
  max_visible), not unconditionally from the DB cursor.
- When MAX_BATCHES is hit before the page fills, emit truncated: true
  plus a resume cursor (from the last visible row if any, or the last
  scanned row as a last resort).
- Cursor is base64-encoded (URL-safe, no pad) for opacity and to avoid
  URL-decoding corruption of RFC3339 +00:00 timestamps (P3).
- gl ipfs list now loops on next_cursor until exhaustion, collecting
  and printing all visible pins across pages.
@Gravirei Gravirei requested a review from jatmn July 4, 2026 19:22

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the changed paths and found issues that still need to be addressed.

Findings

  • [P2] Do not expose hidden-row IDs through the truncated-page cursor
    crates/gitlawb-node/src/api/ipfs.rs:489
    The new cursor is base64url, but it is still just a reversible encoding of pinned_at|repo|sha256_hex. When the scan hits MAX_BATCHES before finding any visible pin, the fallback emits next_cursor from db_cursor, which is the last scanned row rather than an accepted visible row. For a caller whose first 2000 scanned rows are all path-scoped-hidden, the response can therefore return count: 0, truncated: true, and a decodable cursor containing a withheld blob SHA. Please make the resume token genuinely opaque or server-side, or return a truncation/error state that does not encode hidden row metadata.

  • [P3] Report the full paginated pin count in node status
    crates/gl/src/node.rs:191
    /api/v1/ipfs/pins is now paginated by default, and the handler's count is the number of pins in the current page. gl node status still makes one signed request and prints that first-page count as Pinned CIDs, so a node with more than the first page of visible pins will be underreported as 50 or fewer. Please either follow next_cursor until exhaustion, as gl ipfs list now does, or change the status text/API contract so it does not present a page count as the total pinned CID count.

  • [P2] Preserve pre-existing non-tip pins during the v10 ownership migration
    crates/gitlawb-node/src/db/mod.rs:863
    The v10 backfill only assigns repo/owner_did when a pinned row's cid matches branch_cids, then falls every other existing pin back to empty strings. Those rows are then invisible to the new /api/v1/ipfs/pins query, which only asks for real readable (repo, owner_did) pairs. Since the pin table was populated from the full pushed object list rather than only current branch-tip CIDs, upgraded nodes can silently lose most historical object pins from the listing even when the repo is still readable. Please backfill from a source that preserves the original repo for pinned objects, or keep an explicit compatibility path for legacy rows instead of migrating them into an ownerless bucket that the endpoint never returns.

  • [P3] Expose the identity directory for signed IPFS pin reads
    crates/gl/src/ipfs_cmd.rs:20
    gl ipfs list now requires a keypair because the endpoint returns 401 for anonymous callers, and gl node status signs the pins panel when a keypair is available. Both code paths only load the default identity directory, even though the helper functions already accept a dir argument and most signed gl subcommands expose --dir. Users who keep their active identity outside the default location cannot list pins, and node status reports that identity is required even though there is no flag to supply it. Please thread a --dir option through these signed read paths so the new auth gate remains usable for the same identity layouts as the rest of the CLI.

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed on e2addc0. Splitting response_cursor from db_cursor is the right direction and closes the leak on the page-filled and truncated-with-visible paths; I paged that path end to end and it is clean (the cursor derives from the last returned pin, no skip or dup across pages, terminates correctly). One branch still leaks, so this is not ready yet.

Findings

  • [P1] Do not seed next_cursor from db_cursor on a truncated all-hidden page
    crates/gitlawb-node/src/api/ipfs.rs:488
    When a scan reaches MAX_BATCHES with no visible pins, response_cursor is None and the code falls back to db_cursor, the last fetched row, which is a withheld pin, base64-encoded. I drove this end to end: a caller who passes the repo-level gate for a repo with a /secret/** rule, scanning a hidden-only stretch, gets truncated:true and a next_cursor that base64-decodes straight back to a withheld pin's pinned_at|repo|sha256_hex. URL_SAFE_NO_PAD is an encoding, not opacity, so the withheld content hash is recoverable, which is the exact disclosure #121/#136 defend against. It does not need a large hidden set: batch_count increments before the fetch, so a scan that finishes right on the 1800-row batch boundary is still flagged truncated and still emits the stale db_cursor, leaking even when there is no further page. Reachable by any authenticated caller who can read such a repo at the repo level.

  • [P1 fix] Make the resume point an opaque server-side token, not the row's own columns
    Just omitting the cursor here stops the leak (I confirmed: the page then returns truncated:true with no next_cursor), but on its own it starves a legitimately visible row that sits past a hidden stretch longer than the scan bound: the caller gets truncated with no cursor and can never advance to it. If the endpoint should stay live through hidden spans, the scan-position resume needs to be a genuinely opaque token (encrypted, or server-side state), never a base64 of the row tuple and never HMAC-signed plaintext. Keep deriving the visible next_cursor from the last emitted row as you do now.

  • [P2] Bound the CLI pagination loop
    crates/gl/src/ipfs_cmd.rs:49
    gl ipfs list follows next_cursor with no page cap and no check that the cursor advanced. Against an honest node it terminates (the keyset scan is strictly monotone), but the node URL is caller-supplied, and a hostile or misconfigured node returning a constant next_cursor loops the client forever while all_pins grows in memory (it is only printed after the loop). An honest but large result set also grows client memory without bound. Add a page cap, break if the returned cursor equals the previous one, and consider streaming the output.

  • [P3] Add a cursor guard test; the fix currently has none
    crates/gitlawb-node/src/api/ipfs.rs
    There is no test for next_cursor, truncated, or the base64 round-trip anywhere, which is how the P1 branch regressed with nothing to catch it. Two cases belong here: seed a path-scoped repo with a hidden-only stretch past the scan bound and assert the emitted next_cursor does not base64-decode to any withheld sha256_hex; and a visible-pagination case asserting the cursor equals the last returned row with no skip or dup across pages.

The anonymous reject, the quarantine and dedup gates, the limit==0/negative clamp, and the per-page status check are all intact on this head.

@beardthelion beardthelion dismissed their stale review July 4, 2026 20:31

Superseded by re-review on e2addc0.

@Gravirei Gravirei requested review from beardthelion and jatmn July 5, 2026 00:10

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the changed paths and found issues that still need to be addressed.

Findings

  • [P2] Do not expose unmapped legacy pins as public
    crates/gitlawb-node/src/api/ipfs.rs:288
    v10 falls back to repo = '' / owner_did = '' for pre-existing pinned rows that cannot be mapped through branch_cids, and list_pins then unconditionally adds that empty pair to every signed caller's readable set. Those rows no longer have a repo context, so they bypass the current-visibility filtering this PR is adding; a pin created while a repo was public, or an ambiguous/non-tip historical object from a repo later made private, is returned to any authenticated DID as an "assumed public" legacy pin. Please do not treat unmapped legacy rows as globally readable: either backfill enough repo context to apply the same visibility gate, hide/expire the unmapped rows, or expose them only through an explicitly privileged compatibility path.

  • [P2] Keep paginating past hidden-only scan windows without silently ending the list
    crates/gitlawb-node/src/api/ipfs.rs:501
    When the scan reaches MAX_BATCHES before accepting any visible pin, the handler now returns truncated: true with no next_cursor; the new cursor guard test even seeds an older visible pin behind the 2005 hidden rows and asserts this no-cursor response. That avoids leaking the hidden db_cursor, but it also makes the older visible pin unreachable, and both gl ipfs list and gl node status stop as soon as next_cursor is absent, so users get an incomplete listing/count with no recovery path. Please either provide a non-leaking resume token for the scan position, or make truncation an explicit client-visible error/state that is not treated as end-of-list.

P2: Remove the legacy-pin empty-repo compatibility path that exposed
unmapped legacy pinned_cids rows as unconditionally readable to every
authenticated caller. Legacy rows with empty repo/owner_did are now
invisible (they don't match the UNNEST pair filter).

P2: Add a non-leaking truncated_cursor resume mechanism. When the
keyset scan hits MAX_BATCHES before accepting any visible pin (all
hidden rows), the response now includes a truncated_cursor that
encodes only the pinned_at timestamp (no repo or SHA). The caller can
echo this back as ?truncated_cursor=<token> to resume scanning past
the hidden window. The SQL query for truncated_cursor uses a
(pinned_at < ) condition that avoids passing any hidden-blob
identifier into the query.
@Gravirei Gravirei requested a review from jatmn July 5, 2026 01:45

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the changed paths and found issues that still need to be addressed.

Findings

  • [P2] Follow truncated_cursor in the GL pin readers
    crates/gl/src/ipfs_cmd.rs:81
    The server now returns truncated: true plus truncated_cursor when it scans a hidden-only window without accepting a visible pin, and the new cursor guard test proves callers must echo that token to reach an older visible pin behind the hidden span. gl ipfs list and gl node status still only read next_cursor and stop as soon as it is absent, so they treat this recoverable truncated page as end-of-list. That means the CLI can still omit visible pins and undercount Pinned CIDs in exactly the hidden-window case this update is trying to make recoverable. Please handle truncated_cursor in both pagination loops, or surface truncation as an explicit incomplete/error state instead of ending the listing.

  • [P2] Preserve the full keyset when resuming from truncated pages
    crates/gitlawb-node/src/db/mod.rs:2334
    The truncated resume query drops the (repo, sha256_hex) tie-breakers and resumes with only pinned_at < $3, while the real scan order is (pinned_at DESC, repo DESC, sha256_hex DESC). If the last hidden row scanned at the batch cap shares its pinned_at with later rows in that same ordered timestamp group, the resume skips every remaining row with that timestamp, including visible pins. The PR's own tests seed many rows with identical pinned_at, so this is a supported shape rather than an impossible state. Please make the non-leaking resume token preserve a total-order position, or otherwise avoid skipping same-timestamp rows when resuming after a truncated hidden window.

  • [P2] Do not expose hidden pin timestamps through truncated_cursor
    crates/gitlawb-node/src/api/ipfs.rs:512
    The all-hidden truncated path now avoids putting the hidden row's repo and SHA in the cursor, but it still returns truncated: true and a base64url token that decodes directly to the last scanned hidden row's exact pinned_at. pinned_at is part of the /ipfs/pins metadata this PR is gating, and the response also tells the caller they just crossed a hidden-only window. A caller who can read the repo root but not a withheld subtree can therefore still learn timing/existence information about hidden pins. Please make this resume token genuinely opaque, or return a truncation/error state that does not encode metadata from hidden rows.

…n CLI

P2: Replace pinned_at-in-base64 truncated_cursor with a genuinely opaque
token. The server stores (pinned_at, repo, sha256_hex) under a random
UUID in an in-memory Mutex<HashMap> and returns only the UUID. Callers
echo it back via ?truncated_cursor= and the server looks up the full
cursor — no hidden-row metadata is ever exposed (no pinned_at, repo,
or sha256_hex leaks through the wire). Entries TTL at 5 minutes.

P2: Both gl ipfs list (ipfs_cmd.rs) and gl node status (node.rs) now
follow truncated_cursor when next_cursor is absent and truncated is
true, enabling recovery past hidden-only scan windows.

P2: With the full DB cursor preserved in the cache, the resume query
uses the complete (pinned_at, repo, sha256_hex) 3-tuple, so no rows
are skipped due to same-timestamp grouping.
@Gravirei Gravirei requested a review from jatmn July 5, 2026 02:48

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the changed paths again, including the latest hidden-window pagination changes, and found issues that still need to be addressed.

Findings

  • [P2] Do not advance past path-scoped pins after the walk budget is exhausted
    crates/gitlawb-node/src/api/ipfs.rs:424
    The current loop still treats every path-scoped repo after the first 50 distinct allowed-blob walks as invisible: once allowed_blobs_by_repo.len() < MAX_WALKS is false, no entry is inserted for the new repo, and the later allowed_blobs_by_repo.get(&repo.id) check fails. Because db_cursor is already advanced to batch.last() before filtering, a caller-visible pin in the 51st path-scoped repo is skipped and cannot be recovered by following next_cursor or truncated_cursor. Please stop before advancing past unverified path-scoped rows, return an explicit truncation/error state for this budget, or carry a resume shape that preserves those rows instead of silently dropping them.

  • [P2] Do not consume truncated cursors in a way that makes GET pagination non-retryable
    crates/gitlawb-node/src/state.rs:113
    take_truncated_cursor removes the opaque hidden-window token on the first lookup, and list_pins silently falls back to None when that token is unknown or expired. If the request using truncated_cursor is retried after a dropped response, timeout, client reconnect, or intermediate retry, the same GET token no longer resumes from the stored DB position; the handler restarts at the beginning and the client has no signal that the cursor was consumed. Pagination tokens need to be safe to retry, or invalid/expired tokens need to produce an explicit error/incomplete state rather than silently returning the first page again.

  • [P2] Treat stalled IPFS pagination as incomplete instead of returning partial success
    crates/gl/src/ipfs_cmd.rs:57
    crates/gl/src/node.rs:201
    The new GL readers follow next_cursor and truncated_cursor, but they no longer check that the returned cursor actually advances. A misconfigured or hostile node can return the same cursor forever; gl ipfs list will make up to 1000 requests, print the accumulated pins as a successful listing after only a warning, and gl node status silently breaks at the cap and reports the partial Pinned CIDs total as if it were complete. Since the node URL is user-controlled and the existing Arweave recovery path already guards non-advancing cursors, please restore an advancement guard for both cursor types and make the page cap an error/incomplete result rather than a successful count/listing.

  • [P2] Preserve legacy non-tip pins instead of migrating them out of the listing
    crates/gitlawb-node/src/db/mod.rs:863
    The v10 backfill still only assigns repo/owner_did when an old pinned_cids.cid matches branch_cids, then falls every other pre-existing pin back to empty strings. The current list_pins no longer exposes that empty pair, which fixes the public-leak variant, but it also means upgraded nodes silently lose historical/non-tip object pins from /api/v1/ipfs/pins, gl ipfs list, and the Pinned CIDs status count even when the repo is still readable. Since pins were recorded from pushed object lists rather than only current branch tips, please either backfill enough repo context to keep applying the visibility gate to these rows, or make the legacy-unmapped state explicit instead of moving those pins into an ownerless bucket that the endpoint never queries.

… guard, legacy pins

P2: Walk budget exhaustion now signals truncated instead of silently
dropping rows. db_cursor is advanced per-pin (not per-batch). When a
new path-scoped repo exceeds MAX_WALKS, the loop breaks without
advancing past that pin so the caller can retry with a fresh walk
cache.

P2: truncated_cursor is now retry-safe — the server stores the full
(pinned_at, repo, sha256_hex) under a random UUID in an in-memory
cache and never removes entries on lookup. Expired/unknown tokens
simply don't match (fall back to start).

P2: CLI advancement guard — gl ipfs list and gl node status now
detect non-advancing cursors and error instead of silently looping
to MAX_PAGES or returning partial counts.

P2: Re-added legacy-pin compat path with explicit log warnings.
Pre-Gitlawb#134 pins with empty repo/owner_did are returned to any
authenticated caller with a tracing::warn! so operators can
backfill repo context.
@Gravirei Gravirei requested a review from jatmn July 5, 2026 03:24

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the changed paths and found an issue that still needs to be addressed.

Findings

  • [P2] Do not return unmapped legacy pins to every signed caller
    crates/gitlawb-node/src/api/ipfs.rs:294
    The v10 migration still maps old pinned rows that cannot be matched through branch_cids to repo = '' / owner_did = '', and list_pins now deliberately adds that empty pair to every signed caller's readable query set. Those rows then take the pin.repo.is_empty() branch and are returned without any repo visibility context, so a pin created while a repo was public, or an ambiguous/non-tip historical object from a repo later made private, is exposed to any authenticated DID. Please complete the earlier legacy-pin review item by not treating unmapped rows as globally readable: backfill enough repo context to apply the same visibility gate, hide/expire the unmapped rows, or expose them only through an explicitly privileged compatibility path.

  • [P2] Make truncated pin cursors durable or reject invalid tokens explicitly
    crates/gitlawb-node/src/api/ipfs.rs:347
    The new truncated_cursor is stored only in AppState.truncated_cursors, and get_truncated_cursor returns None when the token is missing, expired, generated before a process restart, or routed to another machine. list_pins then falls back to None via truncated_resume.or(initial_cursor), so a client trying to resume past a hidden-only scan window silently restarts at the first page instead of getting an error or incomplete state. That makes the new GET pagination token non-durable and non-retryable in normal deployment/restart paths, and gl ipfs list / gl node status can duplicate or miscount pages rather than knowing the listing is incomplete. Please either make the truncated resume token self-contained/durable enough to work across retries and node instances, or return an explicit invalid/expired cursor error instead of treating it as an absent cursor.

kevincodex1 pushed a commit that referenced this pull request Jul 5, 2026
* fix(gl,ipfs): sign gl ipfs list so it works under the #134 pins auth gate

/api/v1/ipfs/pins now 401s anonymous callers (#134). Load the caller's
identity (--dir, default keystore) and use NodeClient::get_signed; when no
identity exists, propagate load_keypair_from_dir's existing 'gl identity new'
error instead of a raw 401. Tests: signed-headers present, empty, and
no-identity-issues-no-request (mockito).

* fix(gl,node): sign the gl node status pins panel, degrade gracefully

The pins panel signs its /api/v1/ipfs/pins read (#134 gates it behind auth)
via an injectable fetch_pins helper with four states: pins, empty,
unavailable (signed read rejected/errored), and needs-identity (no keypair,
no request issued). cmd_status loads the identity gracefully so a missing
keystore never aborts status; peers/repos/p2p/events panels stay anonymous.
Tests drive fetch_pins directly against a mock node.

* fix(gl): address review — surface pins HTTP errors, correct empty-panel label

- gl ipfs list: check the /ipfs/pins response status before parsing, so a
  rejected signed read (e.g. 401) surfaces as an error instead of silently
  printing 'No IPFS pins recorded'. Matches the sibling gl ipfs get.
- gl node status: PinsPanel::Empty now renders 'Pinned CIDs: 0' (a reachable
  node with zero pins), reserving 'unavailable' for the failure state; the
  prior label reused 'IPFS not configured' and misreported the zero-pins case.
- Carry the resolved count in PinsPanel::Pins, removing the duplicated
  count-fallback closure between fetch_pins and the render arm.

* test(gl,node): cover fetch_pins error branches and pins-panel rendering

Extract the pins-panel render into a testable pins_status_line helper and
assert all four states' output (closes the untested cmd_status render path,
including the empty->'Pinned CIDs: 0' label). Add fetch_pins tests for the
two error branches that were code-traced but unexecuted: a malformed 200
body and a transport error both degrade to Unavailable without panicking.

* style(gl): rustfmt the ipfs list pins-response parse

* fix(gl,node): honor --dir identity selector in `gl node status` (#146)

`gl node status` always loaded the default keystore via
`load_keypair_from_dir(None)`, so a user who created or selected an identity
with `--dir` saw the status pins panel render "sign in to view" even though
`gl ipfs list --dir` authenticated fine. Add a `--dir` option to
`NodeCmd::Status` and thread it into the signed pins fetch, so both CLI pins
callers select the same identity directory.

An explicit `--dir` that can't be loaded (missing, corrupt, or unreadable) no
longer masquerades as "no identity, run `gl identity new`": the pins identity
is resolved up front into Keyed / Anonymous / DirUnusable, and the unusable
explicit-dir case renders a distinct "no usable identity in <dir>" panel line
that names the path, while a missing default keystore (no `--dir`) still
degrades quietly. A pins failure never aborts the dashboard.

Tests cover both flag branches (Some/None) and the resolver both ways: a good
`--dir` loads Keyed, an unusable one reports DirUnusable/IdentityError without
issuing a request. The new symbols were absent before, so they compile-RED
first.

* test(gl,node): pin the pins-identity resolver to the --dir key

Follow-up test hardening on the #146 --dir wiring after review:

- resolve_pins_auth_loads_keyed_identity_from_explicit_dir now destructures
  Keyed and asserts loaded.did() equals the --dir key's DID. A bare Keyed(_)
  match passed even when the load fell through to the default keystore; the
  DID assertion catches that regression (confirmed red under a simulated
  fallthrough, green on the real code).
- add resolve_pins_auth_none_is_never_dir_unusable: an absent --dir must
  degrade to Anonymous or Keyed, never DirUnusable. It asserts only the
  None to DirUnusable mapping, so it is stable whether or not the test
  runner has a default ~/.gitlawb identity.

Test module only; no production behavior change.

---------

Co-authored-by: t <t@t>
@beardthelion beardthelion dismissed their stale review July 5, 2026 13:46

Superseded by a re-review of the current head (9087e8c); the findings there are re-verified and updated below.

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the current head (9087e8c) by execution. Concurring with jatmn's two open findings, and raising the legacy-pin one to P1 after reproducing the leak. Separately, the two INV-9 items the bots left unresolved (the next_back anchor slug and the normalize_owner_did duplicate) are already fixed on this head, so no action is needed there. Two blockers remain.

Findings

  • [P1] Do not return unmapped legacy pins to any signed caller
    crates/gitlawb-node/src/api/ipfs.rs:294, crates/gitlawb-node/src/api/ipfs.rs:416
    This is jatmn's open finding, and it reproduces as a live leak. list_pins pushes the empty ("", "") pair into every signed caller's readable set unconditionally, and the if pin.repo.is_empty() branch returns those rows with only a tracing::warn, no visibility decision. I seeded a private repo plus one v10-backfilled repo=''/owner_did='' pin and drove list_pins as a freshly generated stranger keypair: the response was 200 with count:1 carrying the pin's sha256_hex, the object id of a withheld blob. Identities are permissionless, so any signed DID is effectively public, which puts this at P1 rather than P2. Drop the empty pair from the query set (serve nothing until the row is backfilled) or restrict the legacy branch to an operator identity; a warn log is not an access control. The /arweave/anchors twin is not affected: list_anchors builds its query set only from readable repos and there is no repo='' backfill for arweave_anchors.

  • [P2] Bound and make durable the truncated pin cursor, or drop the server-side map
    crates/gitlawb-node/src/state.rs:101, crates/gitlawb-node/src/api/ipfs.rs:536
    truncated_cursors is an in-process Arc<Mutex<HashMap>> with a 300s TTL and no size or per-caller cap, and add_truncated_cursor runs an O(n) retain over the whole map on every insert under a single mutex. On the all-hidden pagination path any signed DID mints an entry per request, so the map grows one entry per request until the TTL catches up, the retain turns into O(n squared) contention that blocks every cursor user rather than just the abuser, and expired entries are evicted only on the next insert, not on read. The same map is also the durability gap jatmn raised: an unknown, expired, post-restart, or cross-node token falls back to None and silently restarts pagination at page 1. Both problems go away if the truncated cursor is a self-contained opaque token (encode the scan position with an expiry) instead of server-side state, which is how the visible next_cursor is already emitted.

  • [P3] Rebase onto current main before merge
    This branch's base predates #143, and both touch db/mod.rs, so the rebase will conflict there. Re-review the resolved diff before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crate:node gitlawb-node — the serving node and REST API kind:bug Defect fix — wrong or unsafe behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unauthenticated metadata indexes leak private-repo data: /ipfs/pins and /arweave/anchors

4 participants