
feat(node): subtree content withholding (Phase 3) for #18 #28

Open
beardthelion wants to merge 12 commits into Gitlawb:main from beardthelion:feat/phase3-subtree-withholding


beardthelion (Contributor) commented Jun 6, 2026

Phase 1 (#25) made "private" a property of a path subtree and enforced whole-repo reads. This adds the piece Phase 1 deferred: a mode-B subtree rule now actually withholds that subtree's file content on the git read path, while the directory structure and blob SHAs stay visible.

How it works

withheld_blob_oids walks each ref's tree and, reusing the existing visibility_check, returns the blob OIDs a caller may not read. A blob is withheld only if it is denied at every path it appears at, so a blob that is also reachable through an allowed path is still sent.
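The "denied at every path" rule can be sketched as a pure function over git ls-tree -r output. This is a minimal illustration, not the PR's code: withheld_from_listing and its can_read predicate are hypothetical stand-ins for withheld_blob_oids and visibility_check (the real check also takes the rules, repo visibility, owner DID, and caller), and it assumes paths are printed unquoted (real ls-tree output quotes unusual paths unless -z is used).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: given `git ls-tree -r <ref>` output for every ref,
/// return the blob ids that `can_read` denies at *every* path they appear at.
/// `can_read` is an assumed stand-in for the node's visibility_check.
fn withheld_from_listing<F>(listing: &str, can_read: F) -> HashSet<String>
where
    F: Fn(&str) -> bool,
{
    // oid -> whether the caller may read it at one or more paths seen so far.
    let mut readable_somewhere: HashMap<String, bool> = HashMap::new();
    for line in listing.lines() {
        // ls-tree lines look like "<mode> SP <type> SP <oid> TAB <path>".
        let Some((meta, path)) = line.split_once('\t') else {
            continue;
        };
        let fields: Vec<&str> = meta.split(' ').collect();
        if fields.len() != 3 || fields[1] != "blob" {
            continue; // skip gitlinks ("commit" entries) and malformed lines
        }
        let readable = readable_somewhere
            .entry(fields[2].to_string())
            .or_insert(false);
        if !*readable {
            // One allowed path is enough for the blob to be sent.
            *readable = can_read(path);
        }
    }
    readable_somewhere
        .into_iter()
        .filter(|(_, readable)| !*readable)
        .map(|(oid, _)| oid)
        .collect()
}
```

A blob such as a shared license file that sits both under a private subtree and at a public path ends up readable, so it is not withheld.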

upload_pack_excluding builds the response pack from all reachable objects minus those blob OIDs (commits and trees are always included, so SHAs stay intact) and frames it as a protocol v0 upload-pack response, matching what info_refs advertises.
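The v0 framing amounts to a NAK pkt-line followed by the pack, either raw or split across side-band-64k band-1 packets and closed with a flush-pkt. A minimal sketch, assuming the pack bytes are already built; frame_v0_pack_response and pkt_line are hypothetical names, not functions from the PR.

```rust
/// Encode one pkt-line: 4 hex digits counting the header plus the payload.
fn pkt_line(payload: &[u8]) -> Vec<u8> {
    let mut out = format!("{:04x}", payload.len() + 4).into_bytes();
    out.extend_from_slice(payload);
    out
}

// Largest pkt-line is 65520 bytes: 4 length bytes + 1 band byte + data.
const SIDEBAND_64K_DATA_MAX: usize = 65520 - 4 - 1;

/// Hypothetical sketch of a protocol v0 stateless-rpc upload-pack response.
fn frame_v0_pack_response(pack: &[u8], side_band_64k: bool) -> Vec<u8> {
    // No common commits are acknowledged: a single NAK precedes the pack.
    let mut out = pkt_line(b"NAK\n");
    if side_band_64k {
        for chunk in pack.chunks(SIDEBAND_64K_DATA_MAX) {
            let mut payload = Vec::with_capacity(chunk.len() + 1);
            payload.push(1u8); // band 1 carries pack data
            payload.extend_from_slice(chunk);
            out.extend_from_slice(&pkt_line(&payload));
        }
        out.extend_from_slice(b"0000"); // flush-pkt ends the sideband stream
    } else {
        // Without side-band, the raw pack bytes follow the NAK directly.
        out.extend_from_slice(pack);
    }
    out
}
```

Emitting NAK first, rather than a v2 "packfile" section, is what keeps the response consistent with the v0 advertisement that info_refs produces.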

git_upload_pack branches to the filtered serve only when the caller has at least one withheld blob. Public and fully authorized clones take the unchanged fast path and stay byte-identical to before.

Client behavior worth knowing

A pack that omits a blob still referenced by a sent tree is not closed under reachability, so a stock full git clone is refused at fetch time with "remote did not send all necessary objects". A partial clone (git clone --filter=...) accepts it with the private blob absent, and the tree entry and SHA remain visible. This is a git constraint the server cannot override: the client has to opt into partial-clone semantics. A plain git clone working without --filter is a separate client-side follow-up (git-remote-gitlawb).
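Why a stock clone refuses the filtered pack can be modeled as a set difference: blob ids named by the received trees but absent from the pack. This is a simplified model, not git's actual connectivity check (which walks the full object graph), and unresolved_blobs is a hypothetical name. A stock clone needs the result to be empty, while a partial clone tolerates such ids as objects the remote has promised.

```rust
use std::collections::HashSet;

/// Simplified model of the check behind "remote did not send all necessary
/// objects": blob ids referenced by received trees but missing from the pack.
fn unresolved_blobs<'a>(
    tree_entries: &[(&'a str, &'a str)], // (path, blob oid) listed by sent trees
    pack_oids: &HashSet<&str>,
) -> Vec<&'a str> {
    let mut missing: Vec<&'a str> = tree_entries
        .iter()
        .map(|&(_, oid)| oid)
        .filter(|oid| !pack_oids.contains(*oid))
        .collect();
    // The same blob can appear at several paths; report it once.
    missing.sort_unstable();
    missing.dedup();
    missing
}
```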

The security guarantee, that private bytes never enter the served pack, holds for every client and is what the tests assert.

Tests

  • withheld_blob_oids truth table (owner / listed reader / non-reader / no rules) against real git repos.
  • build_filtered_pack excludes the withheld OID and keeps the public one.
  • A real git partial clone through the actual info_refs + upload_pack_excluding: the withheld blob's bytes never arrive, its tree entry and SHA remain.
  • A real incremental git fetch after a partial clone: new objects arrive, the withheld blob stays absent.

Full suite green, fmt and clippy clean.

Out of scope (follow-ups)

  • git-remote-gitlawb partial-clone UX so a non-reader can clone without --filter.
  • Thin-pack negotiation: the filtered serve ignores have/want and re-sends the full object set minus the withheld blobs. Correct, but not bandwidth-optimal.
  • info/refs is intentionally not gated on subtree rules (refs expose commit tips only); noted in code.
  • Replication-path enforcement (Phase 2) is unrelated and still waiting on the A/B replication-semantics call in Path/package-scoped visibility — make "private" a property of a subtree, not a whole repo #18.

Refs #18.

Summary by CodeRabbit

  • New Features
    • Added privacy controls for Git repositories. Blobs and files can now be restricted based on user permissions, allowing authorized users to access full repository content while unauthorized users receive filtered views that preserve structure but exclude private content.

…al clone

upload_pack_excluding emitted a v2 packfile section, but info_refs
advertises v0, so real clients negotiated v0 and rejected the response
with 'expected ACK/NAK, got packfile'. Frame the v0 stateless-rpc shape
instead (NAK, then the pack via side-band-64k when offered).

Add an end-to-end test that stands up info_refs + upload_pack_excluding
and runs a real git partial clone, asserting the withheld blob's bytes
never reach the client while its tree entry and SHA stay visible. A stock
full clone cannot consume the pack (it is not closed under reachability,
so fetch fails the connectivity check); a partial clone is required.
…tion choice

Add a real-git test that partial-clones, pushes a new commit server-side,
then fetches: the new object arrives and the withheld blob stays absent.
This pins down that ignoring have/want negotiation (always sending a
self-contained pack of all refs minus withheld, with NAK) is correct for
both clone and fetch; the only cost is a fetch re-sends the full object
set. Refactor the real-git tests onto a shared server harness and document
the negotiation decision in code and in the plan's follow-ups.
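Both commits rely on reading the v0 request only far enough to choose the framing: the client lists its capabilities on the first want line, and the have lines can be ignored because a self-contained pack is always sent. A hypothetical parser for that (parse_v0_request and V0Request are illustrative names; lines such as done, filter, and shallow are deliberately not modeled):

```rust
#[derive(Debug, Default)]
struct V0Request {
    wants: Vec<String>,
    haves: Vec<String>,
    side_band_64k: bool,
}

/// Split a v0 upload-pack request body into pkt-lines and collect wants,
/// haves, and whether the client offered side-band-64k.
fn parse_v0_request(mut body: &[u8]) -> Result<V0Request, String> {
    let mut req = V0Request::default();
    while !body.is_empty() {
        if body.len() < 4 {
            return Err("truncated pkt-line header".to_string());
        }
        let len_hex = std::str::from_utf8(&body[..4]).map_err(|e| e.to_string())?;
        let len = usize::from_str_radix(len_hex, 16).map_err(|e| e.to_string())?;
        if len == 0 {
            body = &body[4..]; // flush-pkt: separates wants from haves
            continue;
        }
        if len < 4 || len > body.len() {
            return Err(format!("bad pkt-line length {len}"));
        }
        let (pkt, rest) = body.split_at(len);
        body = rest;
        let line = std::str::from_utf8(&pkt[4..]).map_err(|e| e.to_string())?;
        let mut parts = line.trim_end_matches('\n').split(' ');
        match parts.next() {
            Some("want") => {
                if let Some(oid) = parts.next() {
                    req.wants.push(oid.to_string());
                }
                // Only the first want line carries the capability list.
                if req.wants.len() == 1 {
                    req.side_band_64k = parts.any(|cap| cap == "side-band-64k");
                }
            }
            Some("have") => {
                if let Some(oid) = parts.next() {
                    req.haves.push(oid.to_string());
                }
            }
            _ => {} // done, filter, shallow, deepen: not needed for this sketch
        }
    }
    Ok(req)
}
```

A server that re-sends everything minus the withheld blobs only needs side_band_64k from this; haves are parsed here just to show what is being ignored.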
coderabbitai Bot commented Jun 6, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 3fdb8f5f-023a-45fa-b5d7-f1140fc04eb9

📥 Commits

Reviewing files that changed from the base of the PR and between 85a9711 and 0c8a1b7.

📒 Files selected for processing (1)
  • .gitignore

📝 Walkthrough

This PR implements blob-level withholding for Git smart-HTTP upload-pack responses. A new visibility_pack module computes withheld blob OIDs based on repo visibility rules, smart_http.rs adds filtered pack generation and serving, and the API handler routes requests through the filtered path when withheld blobs exist.

Changes

Subtree Blob Withholding on Upload-Pack

| Layer | File(s) | Summary |
|---|---|---|
| Blob Visibility Computation | crates/gitlawb-node/src/git/mod.rs, crates/gitlawb-node/src/git/visibility_pack.rs | New visibility_pack module exposes withheld_blob_oids, which enumerates blob OIDs and repo-relative paths via git ls-tree -r, applies visibility_check per (oid, path), and returns OIDs denied at every path occurrence. Includes unit tests for owner, reader, non-reader, and no-rule scenarios. |
| Filtered Pack Generation and Serving | crates/gitlawb-node/src/git/smart_http.rs | Adds build_filtered_pack to shell out to git rev-list and git pack-objects to build a pack excluding withheld blob OIDs. Adds upload_pack_excluding to serve the filtered pack as a protocol v0 response with optional side-band-64k framing. Includes unit tests validating pack contents and end-to-end tests confirming real git clone --filter=blob:none and incremental git fetch correctly omit withheld blob bytes while preserving tree structure. |
| Upload-Pack Handler Integration | crates/gitlawb-node/src/api/repos.rs | Handler computes withheld blob OIDs in spawn_blocking, then conditionally routes to smart_http::upload_pack or upload_pack_excluding based on whether any OIDs are withheld. Preserves whole-repo visibility gate and error-class mapping; clarifies that advertisement is unaffected and blob withholding occurs during pack building. |
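build_filtered_pack is summarized above as rev-list plus pack-objects with the withheld OIDs removed. A sketch of that pipeline under stated assumptions: the exact flags (rev-list --objects --all, pack-objects --stdout) and the helper names keep_list and build_pack_excluding are illustrative, not taken from the PR. Only keep_list is pure; build_pack_excluding needs a git binary and a repository.

```rust
use std::collections::HashSet;
use std::io::Write;
use std::path::Path;
use std::process::{Command, Stdio};

/// Keep every `git rev-list --objects` line whose object id (first field) is
/// not withheld. Commits and trees always pass, so their SHAs stay intact;
/// the optional path is kept as a delta-naming hint for pack-objects.
fn keep_list(rev_list_output: &str, withheld: &HashSet<String>) -> String {
    let mut out = String::new();
    for line in rev_list_output.lines() {
        let oid = line.split(' ').next().unwrap_or("");
        if oid.is_empty() || withheld.contains(oid) {
            continue;
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}

/// Hypothetical shape of the shell-out; flags are assumptions, not the PR's code.
#[allow(dead_code)]
fn build_pack_excluding(repo: &Path, withheld: &HashSet<String>) -> std::io::Result<Vec<u8>> {
    let listing = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["rev-list", "--objects", "--all"])
        .output()?;
    if !listing.status.success() {
        return Err(std::io::Error::new(std::io::ErrorKind::Other, "git rev-list failed"));
    }
    let keep = keep_list(&String::from_utf8_lossy(&listing.stdout), withheld);

    let mut child = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["pack-objects", "--stdout"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;
    // Feed stdin from another thread so a full stdout pipe cannot deadlock us.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let writer = std::thread::spawn(move || stdin.write_all(keep.as_bytes()));
    let out = child.wait_with_output()?;
    writer.join().expect("stdin writer panicked")?;
    if !out.status.success() {
        return Err(std::io::Error::new(std::io::ErrorKind::Other, "git pack-objects failed"));
    }
    Ok(out.stdout)
}
```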

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Gitlawb/node#25: Both PRs modify crates/gitlawb-node/src/api/repos.rs's git-upload-pack read handling to enforce path-scoped visibility using visibility_check, with this PR further extending it to filter out withheld blob OIDs via visibility_pack/upload_pack_excluding.

Suggested reviewers

  • kevincodex1

Poem

🐰 With whiskers twitching, the rabbit spins,
Withheld blobs tucked safely within,
Pack-objects hops, git rev-list skips,
Filtered clones from visibility's tips!
Smart-HTTP serves what eyes may see—
Subtrees hidden, yet trees run free. 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: implementing Phase 3 of subtree content withholding, which is the core focus of all file changes (visibility_pack module, upload_pack_excluding handler, and visibility enforcement). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
crates/gitlawb-node/src/api/repos.rs (1)

398-411: ⚡ Quick win

Consider fast-path when no subtree rules apply.

withheld_blob_oids runs git ls-tree -r for every ref on every upload-pack request, even for fully public repos with no visibility rules. This is unnecessary overhead when there are no subtree restrictions.

A quick check before the tree walk could skip the work entirely:

```diff
+    // Fast path: if the repo is public and has no visibility rules, no blobs are withheld.
+    let withheld = if record.is_public && rules.is_empty() {
+        std::collections::HashSet::new()
+    } else {
+        visibility_pack::withheld_blob_oids(
+            &disk_path,
+            &rules,
+            record.is_public,
+            &record.owner_did,
+            caller,
+        )
+        .map_err(|e| AppError::Git(e.to_string()))?
+    };
```

Alternatively, push this optimization into withheld_blob_oids itself.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/repos.rs` around lines 398 - 411, Currently every
upload-pack calls visibility_pack::withheld_blob_oids which does expensive git
tree walks even when no subtree rules apply; short-circuit this by checking
before the call (e.g., if record.is_public is true AND rules is empty/no subtree
rules) and skip calling withheld_blob_oids, directly calling
smart_http::upload_pack, or alternatively add an early-return fast-path inside
withheld_blob_oids itself to return an empty Vec when rules indicate no
filtering; update the code paths around withheld_blob_oids and the subsequent
conditional that uses withheld.is_empty() (the smart_http::upload_pack vs
upload_pack_excluding branch) so behavior is unchanged when filtering is needed.
crates/gitlawb-node/src/git/smart_http.rs (1)

128-169: ⚖️ Poor tradeoff

Blocking I/O in async context.

build_filtered_pack uses synchronous std::process::Command but is called from the async upload_pack_excluding. On large repositories, git rev-list and git pack-objects can take significant time, blocking the tokio worker thread.

Consider wrapping in tokio::task::spawn_blocking or switching to tokio::process::Command with async I/O to avoid starving other tasks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/git/smart_http.rs` around lines 128 - 169,
build_filtered_pack performs blocking std::process::Command calls (git rev-list
/ git pack-objects) but is invoked from async upload_pack_excluding, which can
block the tokio runtime; fix by moving the blocking work into a spawn_blocking
closure or converting the function to async and using tokio::process::Command
with async I/O. Specifically, either (A) leave build_filtered_pack as-is but
call it via tokio::task::spawn_blocking from upload_pack_excluding and await the
JoinHandle, or (B) change build_filtered_pack to async and replace
std::process::Command usages with tokio::process::Command and async stdin/stdout
handling so it no longer blocks the runtime; update return signatures and error
propagation accordingly while keeping the same behavior for collecting keep oids
and returning the packed Vec<u8>.
crates/gitlawb-node/src/git/visibility_pack.rs (1)

25-31: ⚡ Quick win

Consider adding a timeout to prevent indefinite blocking.

The synchronous git ls-tree command has no timeout. On a corrupted or malicious repository, this could block the async runtime thread indefinitely. Since this runs per-ref and is called from the git_upload_pack async handler, a hung git process would stall the request.

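Consider enforcing a deadline on the child process (std::process::Command has no built-in timeout, so this means polling Child::try_wait and killing the child on expiry) or switching to async tokio::process::Command with tokio::time::timeout.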
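Because std::process::Command has no timeout option of its own, a synchronous deadline means polling Child::try_wait and killing the child when time runs out. A sketch of that approach, not code from the PR (which deliberately left the timeout out); output_with_deadline is a hypothetical helper.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Run `cmd` and return its stdout, or an error if it has not exited
/// within `limit` (the child is then killed and reaped).
fn output_with_deadline(cmd: &mut Command, limit: Duration) -> Result<Vec<u8>, String> {
    let mut child = cmd
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
        .map_err(|e| e.to_string())?;
    // Drain stdout on another thread so a large listing cannot fill the pipe
    // and stall the child while we poll for its exit.
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let reader = thread::spawn(move || {
        let mut buf = Vec::new();
        stdout.read_to_end(&mut buf).map(|_| buf)
    });
    let deadline = Instant::now() + limit;
    loop {
        match child.try_wait().map_err(|e| e.to_string())? {
            Some(status) => {
                let buf = reader
                    .join()
                    .map_err(|_| "stdout reader panicked".to_string())?
                    .map_err(|e| e.to_string())?;
                return if status.success() {
                    Ok(buf)
                } else {
                    Err(format!("process exited with {status}"))
                };
            }
            None if Instant::now() >= deadline => {
                let _ = child.kill();
                let _ = child.wait(); // reap the killed process
                return Err(format!("timed out after {limit:?}"));
            }
            None => thread::sleep(Duration::from_millis(20)),
        }
    }
}
```

Usage would look like output_with_deadline(Command::new("git").args(["ls-tree", "-r", refname]).current_dir(repo_path), Duration::from_secs(10)), treating a timeout like a failed listing.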

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/git/visibility_pack.rs` around lines 25 - 31, The git
ls-tree invocation (creating `listing` via std::process::Command with `refname`
and `repo_path`) can block the async runtime; replace the synchronous call with
an async spawn using tokio::process::Command and wrap the await in
tokio::time::timeout (e.g., a 5 to 10 second Duration such as Duration::from_secs(10)). Specifically, change the
code that builds and awaits `Command::output()` to use
tokio::process::Command::new("git")...output().await inside
tokio::time::timeout(...). If the timeout elapses, handle it by logging or
treating it like a non-successful listing (skip this ref) and
propagate/contextualize errors similarly to the existing .context("git ls-tree
-r failed") semantics so callers like the async git_upload_pack handler are not
blocked indefinitely.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/superpowers/plans/2026-06-05-phase3-subtree-content-withholding.md`:
- Line 712: A stray fenced code block delimiter (```) with no language label
triggers markdownlint MD040. Fix it by removing the stray fence or by adding a
language specifier after the opening backticks (e.g., ```json, ```bash) so the
fenced block is valid and the linter warning is resolved.
- Around line 102-108: Update the documented protocol framing from "v2 packfile
section" to the actual v0 framing used by the implementation: change the
description that mentions emitting `packfile\n`, `pkt_line`, and driving `git
upload-pack --stateless-rpc` with `GIT_PROTOCOL=version=2` to instead describe
the v0 sideband framing (sideband-64k with `0x01` for pack data, `0x02` for
progress if present, pack payload beginning with `PACK...`, pkt-line chunking
with `0x01` prefixes, and termination with the `0000` flush), and remove or
correct any references to v2-specific control lines or flags so the text matches
the current implementation's v0 contract.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5a58718e-8f0d-4eb9-ad32-a28944a9f957

📥 Commits

Reviewing files that changed from the base of the PR and between 6abaf1d and b0af815.

📒 Files selected for processing (5)
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/git/smart_http.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
  • docs/superpowers/plans/2026-06-05-phase3-subtree-content-withholding.md

Comment thread docs/superpowers/plans/2026-06-05-phase3-subtree-content-withholding.md Outdated
Comment thread docs/superpowers/plans/2026-06-05-phase3-subtree-content-withholding.md Outdated
Move the two blocking git shell-outs in the filtered upload-pack path off
the async worker thread, matching the tokio::process / spawn_blocking usage
already in this file: build_filtered_pack (rev-list + pack-objects) and
withheld_blob_oids (per-ref ls-tree) now run inside spawn_blocking so a large
repo cannot stall the tokio runtime. Behavior is unchanged.

Also fix the Task 0 findings block in the Phase 3 plan: it still recorded v2
packfile framing, which is the exact path that failed against a real client
and was corrected to v0. The block now documents the shipped v0 contract.
Drop a stray trailing code fence flagged by markdownlint (MD040).

The speculative ls-tree timeout and the public/no-rules fast-path from the
review are intentionally left out: the timeout guards against adversarial
repos we do not yet host, and the fast-path is a micro-optimization not worth
the extra branch right now.
kevincodex1 (Contributor) left a comment

Please remove the plans under docs. I think it's not necessary to commit them.


Run:
```bash
cd "$(mktemp -d)" && export FIX=$PWD
```

Can we remove these docs? I think we should not commit them.

kevincodex1 asked to keep the superpowers planning docs out of the repo. The
Phase 3 plan was scaffolding for this change, not something the project needs
to carry. Removing it leaves only the code and tests in the PR.
beardthelion (Contributor, Author) commented Jun 7, 2026

Good catch, removed it.


Labels

None yet

Projects

None yet

2 participants