Workspace v3.0: S3 hardening, remote git, typed errors, SDK alignment#32
Merged
Conversation
S3WorkspaceBackend::read_text previously buffered the full GetObject response into memory — a stray read on a multi-GB object would OOM the agent process. Add an S3BackendConfig.max_read_bytes ceiling (default 10 MiB) and inspect Content-Length on the response before consuming the body; oversized objects are rejected with a clear error and never buffered. Responses without Content-Length are refused rather than risking unbounded reads.
S3 has no atomic read-modify-write — the prior edit/patch flow read a file then wrote it back, silently overwriting concurrent writers (last-writer-wins). Add an optional WorkspaceFileSystemExt trait with read_text_with_version + write_text_if_version, implemented on the S3 backend via ETag + If-Match on PutObject. Map HTTP 412 to a typed WorkspaceVersionConflict (anyhow-downcastable). The edit and patch tools now route through WorkspaceServices::read_for_edit / write_for_edit, capture the ETag during the read, and surface a "Concurrent modification detected" error on conflict so the model can re-read and retry instead of clobbering. Backends without versioning (local, future plain backends) keep the old plain-write semantics — the helper transparently falls through.
Object storage has no native search, so grep and glob were previously hidden whenever the S3 backend was wired. Add WorkspaceSearch on S3WorkspaceBackend via LIST + GET + regex, off by default to keep the cost surprise out of the default path. Hosts opt in with S3BackendConfig::enable_search(true). Two hard ceilings bound the per-call API cost: max_objects_scanned caps the LIST/pagination scope (default 500), and max_grep_bytes_per_object skips oversized bodies before downloading (default 1 MiB). Either ceiling sets WorkspaceGrepResult::truncated so callers can detect incomplete scans. Glob keeps the local backend's recursion convention — "*.rs" stays at the immediate level, "**/*.rs" recurses — which required filtering nested keys manually since the `glob` crate's Pattern::matches crosses '/' by default.
Two small ergonomics + observability changes on the S3 backend that hosts have been blind to: 1. S3 list_dir on a missing prefix used to silently return Ok(empty), so a typo'd path looked identical to a real empty directory. Track the total LIST response count (including the prefix marker that we skip from entries) and bail with "S3 path not found" on non-root paths that observed zero entries. Paths with only a zero-byte directory marker still return Ok(empty), matching the S3-tooling convention for empty directories. 2. Wrap every S3 SDK call (GET, PUT, conditional PUT, two flavours of LIST) with a `tracing::debug!` event carrying structured fields (op, bucket, target, bytes, outcome, duration_ms). Hosts that want to meter S3 cost can subscribe at DEBUG level without the backend taking a dependency on any metrics framework. Zero-cost when the level is disabled.
…workspaces
Phase 4.2 of the non-local workspace hardening track. Object storage
cannot host a .git directory, so until now the git tool was hidden on
S3 sessions and the model lost branch / commit / diff awareness. This
adds a small HTTP/JSON client (per the Phase 4.1 RFC) that delegates
git operations to a host-operated gitserver, plus a one-liner factory
for attaching it on top of any existing WorkspaceServices.
let ws = WorkspaceServices::s3(s3_cfg)
.with_remote_git(RemoteGitBackendConfig::new(url, repo_id)
.bearer_token(token))?;
Implements WorkspaceGit (status, log, list_branches, create_branch,
checkout, diff, list_remotes, is_repository) and
WorkspaceGitStashProvider (list_stashes, stash). WorkspaceGitWorktreeProvider
is intentionally not implemented — worktrees are a local-FS concept
that does not map to a remote service.
Recoverable 409 / 422 responses surface as a typed RemoteGitConflict
(anyhow-downcastable), the same pattern as WorkspaceVersionConflict
on the S3 path. Bearer token + tls auth; mTLS reserved for a follow-up
and returns a clear error at construction if requested.
Client-side ceilings (request_timeout 30s, max_log_entries 200,
max_diff_bytes 1 MiB) plus tracing::debug! per call with the same
field shape used by emit_s3_call_event, so a single subscriber meters
both backends.
23 new unit tests via wiremock cover every op shape, the conflict
mapping for each 409 code, error mapping for 401/404/500, body
truncation in both directions, and an end-to-end GitTool integration
test that drives the actual git tool through a wiremock-backed
RemoteGitBackend.
The tools::builtin::git module is bumped to pub(crate) so the
integration test in workspace::remote_git can reach GitTool.
wiremock is added as a dev-dependency only.
The Rust core gained max_read_bytes / search_enabled / max_objects_scanned / max_grep_bytes_per_object on S3BackendConfig and the entire RemoteGitBackend across Phase 1-4, but the language SDKs were still locked to the pre-hardening surface — JS / Python users could not raise the read ceiling, enable degraded grep, or attach remote git. This commit threads all of it through. Node: - JsS3BackendConfig gains four optional Phase 1-3 fields. - New JsRemoteGitBackendConfig POJO with the full RemoteGitBackendConfig field set (base URL, repo id, bearer token, mTLS placeholders, timeout, diff/log caps). - SessionOptions gains a top-level remoteGit field that decorates whatever workspaceBackend was provided; passing remoteGit without workspaceBackend errors clearly rather than silently no-op'ing. - index.d.ts updated to match the napi-generated shape. - 3 new unit tests: Phase 1-3 fields thread to core, remoteGit attaches on top of S3, standalone remoteGit raises a typed error. Python: - S3WorkspaceBackend class gains four new constructor args mirroring the Node shape. New RemoteGitBackendConfig pyclass and a remote_git attribute on SessionOptions. - BackendKind enum boxes the S3 variant after the new fields pushed it past the large_enum_variant clippy threshold. - 3 parallel unit tests verifying the same behaviour as the Node side. Clippy -D warnings clean across core / Node SDK / Python SDK.
Previously RemoteGitBackend::new errored out as "mTLS not yet implemented" whenever client_cert_pem or client_key_pem was set, forcing every deployment to bearer-token-only auth. Wire the PEM files into reqwest::Identity::from_pem so gitservers requiring client cert auth (defence-in-depth deployments, hostile-tenant environments) can be reached. Implementation: - New load_mtls_identity helper reads both PEM files, concatenates them with a newline separator, and parses via Identity::from_pem. All I/O and parse errors include the source path for debuggability. - Both files must be present; setting only one returns a clear error naming the missing field, rather than silently degrading to bearer or surprising at TLS handshake. - The "no auth" warn now covers no-bearer + no-mTLS together so enabling mTLS no longer trips the localhost warning. Tests (3 new): - mtls_requires_both_cert_and_key: half-pair → error mentions field - mtls_rejects_invalid_pem_blob: garbage PEM → error mentions path - mtls_accepts_self_signed_pair_from_rcgen: rcgen-generated cert + PKCS#8 key → backend constructs successfully. rcgen is dev-only. SDK docs (Node + Python) updated to reflect the implemented state. RFC §6 in apps/docs (root repo) updated separately.
S3 grep previously fetched candidate objects sequentially, so scanning N matches took N × round-trip latency — with the default 500-object cap on a typical 50 ms-RTT region that is ~25 s of wall-clock just on GETs. Pipeline the GETs with futures::stream::buffer_unordered so up to `search_concurrency` (default 8) downloads are in flight at once, bringing the worst case down to ~N/8 RTTs. Implementation: - New search_concurrency config field on S3BackendConfig + backend struct, builder and accessor. Defaults to 8; 0 resets to default at backend construction. - grep() now runs in three phases: filter the listing (sequential, no I/O), fan out GET + per-file regex via buffer_unordered, then sort matches by workspace path and accumulate output until max_output_size is hit. Output ordering is deterministic regardless of S3 response order — callers and tests see the same layout every run. - truncation accounting unchanged: scan-ceiling truncation OR output- size truncation both set WorkspaceGrepResult.truncated. SDK exposure (Node + Python) mirrors the rest of the Phase 1-3 fields: - Node JsS3BackendConfig.searchConcurrency, threaded through s3_config_to_core - Python S3WorkspaceBackend constructor gains search_concurrency, threaded through to_core 3 new core tests on the config flow (default applies, 0 resets, override). Existing 25 remote_git tests + Node 19 + Python 12 still green; clippy -D warnings clean across core / Node SDK / Python SDK.
…fields WorkspaceServices::with_remote_git previously went through WorkspaceServicesBuilder to assemble the decorated services. The builder does not expose `local_root` (it's only set by WorkspaceServices::local()'s direct constructor), so every call to `local().with_remote_git()` was silently losing local_root — breaking the deprecated `ToolContext::resolve_path` path and any future code that branches on `local_root.is_some()`. More broadly, any field added to WorkspaceServices going forward would be silently dropped by the decorator, since the builder route opts every unknown field into its default. Fix: add `WorkspaceServices::with_git_provider(&self, git, git_stash)` which produces the new services via an explicit struct literal listing every field. Adding a new field to WorkspaceServices now causes a compile error in this method, forcing the author to make an explicit decision about whether a git-provider swap preserves it. The same pattern can be reused for any future decorator (with_local_search, with_command_runner, ...). `with_remote_git` is now a one-liner over `with_git_provider`. Per RFC §8, git_worktree is explicitly reset to None — worktrees and the git provider are the same domain; routing one to a remote service while keeping the other local would surface inconsistent state to the model. Regression test asserts local_root, command_runner, and search all survive `with_remote_git` while git is replaced and git_worktree becomes None.
… cap RemoteGitBackend::diff previously called post_json, which deserialises the entire HTTP body before any client-side size check kicks in. A gitserver returning a 1 GiB diff JSON would be fully buffered and parsed before max_diff_bytes was applied — turning the supposed display cap into an OOM vector. Two-layer fix: 1. **Eager Content-Length rejection.** When the server advertises a body larger than the hard cap, refuse to consume any bytes and return a typed "Content-Length exceeds client cap" error. 2. **Stream-bound accumulation.** When Content-Length is absent or the server lies, the body is read via reqwest::Response::bytes_stream() and the accumulator aborts once it exceeds the cap. Memory is bounded at cap + one chunk. Hard cap = `max_diff_bytes * 4` (floor 64 KiB). The 4× factor leaves room for legitimate large-but-over-soft diffs that still want to reach the post-decode display truncation; anything past that is treated as a misbehaving server. Soft truncation (`max_diff_bytes` → display cut + "[truncated by client]" marker) is unchanged so the existing test `diff_enforces_client_max_diff_bytes` still passes verbatim. Refactor: `map_error_response` becomes a sync function taking the response body as `&str`. Both `post_json` and `post_unit` extract the text themselves before delegating, and the new `post_streamed` helper shares the same error path — keeping the conflict-vs-generic mapping in one place across three call shapes. 2 new wiremock tests cover layer 1 (oversized Content-Length) and layer 2 (oversized chunked body).
…ence Behavioural invariants for WorkspaceFileSystem and the optional WorkspaceFileSystemExt CAS surface previously lived only in commit messages, code comments, and the maintainer's head. A new backend (GCS, container, browser) would have to reverse-engineer them from existing implementations, and there was no mechanism to catch silent drift in current backends. Introduces workspace::conformance (test-only, pub(crate)) with two public entry points: * assert_filesystem_conformance — required surface: read-after-write roundtrip, read of nonexistent path errors, write overwrites existing, write creates parent path components, list_dir on root succeeds, list_dir sees just-written entries, list_dir on nonexistent path errors (matching the local backend's behaviour we introduced in Phase 3.1). * assert_filesystem_ext_conformance — optional CAS surface: version token is non-empty, matching-version write succeeds, stale-version write yields a downcastable WorkspaceVersionConflict, empty expected_version is rejected. The suite is exercised against two backends: * InMemoryFileSystem — a new HashMap-backed reference impl with a single mutex covering both files and the version counter so the CAS is genuinely atomic. Serves as the smallest possible backend that satisfies the contract, doubling as documentation for what real backends have to do. * LocalWorkspaceBackend — proves the contract is implementable over real I/O. S3 is intentionally out of scope here because a full LIST/GET/PUT mock is heavyweight; that's a separate piece of test infrastructure to be added when worth the cost. WorkspaceCommandRunner, WorkspaceSearch, and the WorkspaceGit family are also out of scope for the first cut — their semantics are too implementation-defined to capture in a single contract.
Two near-identical in-memory mocks coexisted: VersionedMemoryFs in
workspace/mod.rs tests (introduced in Phase 1.2 for the
read_for_edit/write_for_edit suite) and InMemoryFileSystem in
workspace/conformance.rs (introduced in Phase 6.3 as the reference
backend for the conformance suite). They served overlapping purposes
but were maintained independently — and VersionedMemoryFs had a
genuine bug, holding two separate mutexes for state and counter, so a
concurrent writer slipping between the two locks could defeat the
"stale-version yields conflict" test it was meant to support.
Delete VersionedMemoryFs and route the read_for_edit/write_for_edit
tests through InMemoryFileSystem. Net effect:
* One mock to maintain instead of two.
* The mock that backs these tests now satisfies the conformance suite,
so drift between "what tests exercise" and "what the contract
documents for new backends" is impossible.
* CAS atomicity is genuinely atomic in the fixture (single mutex over
both files map and version counter), matching what real backends
must do.
* Tests no longer hardcode literal version strings ("v0", "v-other").
They capture the auto-generated version at read time and assert
structural properties (version is non-empty / actual differs from
expected), so the version scheme of the mock can change without
breaking the assertions.
Phase 7.2 of the post-self-critique cleanup. No production code change.
…anges) Phase 7.3.a — first half of the typed-error refactor. This commit adds the typed surface without touching any trait method or callsite, so it compiles green against today's anyhow::Result-returning impls. The companion Phase 7.3.b commit will flip every trait signature to WorkspaceResult and is the v3.0.0 breaking change. What lands: * `WorkspaceError` — `#[non_exhaustive]` enum with six structured variants (NotFound, VersionConflict, RemoteGitConflict, InvalidArgument, Timeout, Unsupported) plus a Backend(anyhow::Error) catch-all for unmapped failures. * `WorkspaceResult<T>` type alias. * `#[from]` impls for the two existing public conflict structs (WorkspaceVersionConflict, RemoteGitConflict) so backends can build the enum directly without going through anyhow. * `WorkspaceError::from_anyhow(err)` constructor that **preserves** the typed variant when the source anyhow::Error originally wrapped one of the known conflict structs — the plain `Into` path drops to `Backend(_)` because anyhow erases the source type at the value level. This is the migration bridge for Phase 7.3.b. * Re-exported at crate root alongside the other workspace public types. 8 unit tests cover the round-trip behaviour, the `?` lift to anyhow::Result via the blanket Error impl, and Display output for every variant. 1627 lib tests green; clippy -D warnings clean. The next commit (7.3.b) will: * Change WorkspaceFileSystem, WorkspaceFileSystemExt, WorkspaceCommandRunner, WorkspaceSearch, WorkspaceGit, WorkspaceGitStashProvider, WorkspaceGitWorktreeProvider trait methods to return WorkspaceResult<T>. * Update Local, S3, RemoteGit, InMemoryFileSystem implementations. * Update WorkspaceServices helpers (read_for_edit, write_for_edit, run_with_timeout) and every tool that calls them. * Bump crate version to 3.0.0. Splitting the migration in two preserves bisect at every commit.
BREAKING. Phase 7.3.b — the cascade following the additive 7.3.a type introduction. WorkspaceFileSystem and WorkspaceFileSystemExt now return WorkspaceResult<T> (= Result<T, WorkspaceError>) instead of anyhow::Result<T>. Backend implementations emit typed variants — NotFound, VersionConflict, InvalidArgument, Unsupported — where the failure category is known; opaque failures fall into the Backend escape hatch. Migrated: * trait surface: WorkspaceFileSystem (read_text, write_text, list_dir) and WorkspaceFileSystemExt (read_text_with_version, write_text_if_version) * WorkspaceServices helpers: read_for_edit, write_for_edit. The generic run_with_timeout is now polymorphic in the error type via `E: From<anyhow::Error>` so the same helper works for futures that return WorkspaceResult (the migrated fs paths) and futures that return anyhow::Result (the search/git tools, not migrated this release). * backend impls: LocalWorkspaceBackend, S3WorkspaceBackend (incl. get_object_text helper, classify_get_error, classify_list_error, map_put_error now all return the typed enum / variants directly), InMemoryFileSystem (conformance reference impl). * test fixtures: PlainFs, EmptyFs in workspace tests; MemoryWorkspace in the integration test; TestWorkspaceFs in agent_api tests; MemoryWorkspaceFs in tools/mod tests; AlwaysConflictFs in the edit conflict-surface test. * tool layer: edit / patch swap their downcast_ref calls for direct enum matching (`matches!(e, WorkspaceError::VersionConflict(_))`). The `?` operator paths continue to work because thiserror's derived Error impl lifts WorkspaceError into anyhow via the blanket impl. * conformance suite: assertion that CAS conflicts produce WorkspaceVersionConflict is now matches!(WorkspaceError::VersionConflict(_)). NOT migrated this release (intentional — kept on anyhow::Result): * WorkspaceCommandRunner, WorkspaceSearch, WorkspaceGit, WorkspaceGitStashProvider, WorkspaceGitWorktreeProvider. * WorkspacePathResolver and the internal path-normalisation helpers. These are stable as anyhow::Result today and would all be additive (non-breaking) future migrations: their callers already lift any typed error variant they emit through #[from] today, and a future release can swap their signatures to WorkspaceResult without breaking SDK or downstream callers. crate version bump: 2.6.0 → 3.0.0 in core/Cargo.toml; SDK Cargo.toml deps bumped to match. 1627 lib tests + 19 Node SDK tests + 12 Python SDK tests all green; clippy -D warnings clean.
v3.0 introduced the typed WorkspaceError enum on the workspace trait
surface, but that improvement died at the tool boundary: every
failure ended up as a plain `ToolOutput::error(string)`, so SDK
callers were forced back to regex-matching the output to detect e.g.
concurrent-modification conflicts.
This commit threads typed error info all the way through:
WorkspaceError → ToolOutput.error_kind → ToolResult.error_kind
↓ ↓
AgentEvent::ToolEnd.error_kind → ToolCallResult.error_kind
↓ ↓
Node `errorKindJson` + `ToolErrorKind` union ↓
Python `error_kind_json` + parsed `error_kind` dict ↓
SDK callers switch/match on `.type` — no string scan needed.
New types:
* `ToolErrorKind` (#[non_exhaustive], serde-tagged on `type`,
six variants: version_conflict / remote_git_conflict / not_found /
invalid_argument / unsupported / timeout).
* `ToolErrorKind::from_workspace_error()` constructor mapping each
known `WorkspaceError` variant; `Backend(_)` returns `None` so the
catch-all stays untyped.
Wiring (all additive, no breaking API change):
* `ToolOutput`/`ToolResult`/`ToolCallResult` gain optional
`error_kind` field; serde skips it when None.
* `AgentEvent::ToolEnd` gains `error_kind` so streaming consumers see
it too.
* `edit` and `patch` tools populate `error_kind` via the mapper. The
human-readable retry hint in `content` is unchanged — the model
keeps seeing exactly what it saw before.
* Three other `ToolOutput { ... }` literals (bash, skill x2) plus
one `ToolResult { ... }` literal in the agent runtime updated to
include `error_kind: None`. Existing `ToolOutput::error()` /
`ToolOutput::success()` factories also default it to None.
SDK exposure:
* Node: new `errorKindJson: Option<String>` on JsToolResult and
AgentEvent. New `ToolErrorKind` discriminated-union TypeScript type
in index.d.ts. `tool_result_from_core` and the AgentEvent mapper
serialise the typed value via `serde_json::to_string`.
* Python: new `error_kind_json` (raw) and `error_kind` (parsed dict
via `json_string_to_py`) getters on PyToolResult and PyAgentEvent.
Tests (5 new):
* core: edit conflict-surface test now asserts both the typed enum
match and the serialised wire shape (`type`, `path`, `expected`,
`actual`).
* node SDK: tool_result_from_core round-trips a VersionConflict +
produces None on success.
* python SDK: matching pair of round-trip + none-on-success tests.
1627 core / 21 node / 14 python lib tests green; clippy -D warnings
clean across all three crates.
Phase 9 — release prep cap stone for the 8-phase workspace rework. * Bump SDK package metadata to 3.0.0 across: - sdk/node/package.json (root version + 6 native-deps) - sdk/node/package-lock.json (root + nested) - sdk/node/examples/package-lock.json - sdk/python/pyproject.toml All artifacts now pass `./check-version.sh 3.0.0`. * CHANGELOG: promote [Unreleased] to [3.0.0] with date 2026-05-20. * README: add a "Typed Tool Errors (v3.0+)" section under the Remote Git Backend block, showing the Rust match / Node discriminated union / Python dict-match patterns side by side. Documents the six initial ToolErrorKind variants and the non_exhaustive escape hatch for future additions. Release sanity checks (all green): * `cargo test --lib` (no features): 1599 passed * `cargo test --lib --features s3`: 1627 passed * `cargo test --lib --features ahp`: 1612 passed * `cargo test --lib --all-features`: 1644 passed * `cargo fmt --check` across core / sdk-node / sdk-python: clean * `cargo clippy -D warnings --lib --tests` previously green across all three crates in the prior Phase 8 commit * `cargo doc --no-deps --features s3`: builds (warnings are pre-existing rustdoc::invalid_rust_codeblocks, not regressions) * `./check-version.sh 3.0.0`: all artifacts aligned
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
v2.6 → v3.0 release of the workspace subsystem. 16 commits across Phase 1–9 of an incremental hardening track, ending with a single breaking-but-mechanical signature change on the workspace fs trait surface.
Phase index
4f667cb33671887cd7247dfdebcb2d1651fda83de41017d38c6027883f1dec5dad37ce03d36c57bf17d099d742ce0c2377ac1a0ce2dc8436Breaking change
Only one, in
e0c2377:WorkspaceFileSystemandWorkspaceFileSystemExttrait methods now returnWorkspaceResult<T>(=Result<T, WorkspaceError>) instead ofanyhow::Result<T>.?to lift errors intoanyhow::Resultkeep workingunchanged via the blanket
From<WorkspaceError> for anyhow::Errorimpl.err.downcast_ref::<WorkspaceVersionConflict>()should switch to
matches!(err, WorkspaceError::VersionConflict(_)).The other 5 traits (CommandRunner, Search, Git, GitStashProvider, GitWorktreeProvider)
still return
anyhow::Resultso backend implementors that don't touchthe fs traits don't need to update anything.
Test plan
cargo test --lib(no features): 1599 passedcargo test --features s3 --lib: 1627 passedcargo test --features ahp --lib: 1612 passedcargo test --all-features --lib: 1644 passedcargo fmt --checkacross core + Node SDK + Python SDK: cleancargo clippy -D warnings --lib --tests: clean across all three cratescargo doc --no-deps --features s3: builds (only pre-existing warnings)./check-version.sh 3.0.0: all artifacts alignedRelease flow
Once this PR merges, tag
v3.0.0on main →.github/workflows/release.ymlfires:publish-crate→ crates.iopublish-node→ npm (6 native targets)publish-python→ PyPI (maturin)github-releasewith auto-generated notesOut of scope
Surfaced in self-critique but deliberately deferred:
WorkspaceResult(additive when it happens)