chore: merge upstream/main into native-e2e-expansion, resolve conflicts #4

Merged

JerrettDavis merged 88 commits into native-e2e-expansion from
copilot/update-pr-branch-from-main on Apr 28, 2026
Conversation

Copilot AI commented Apr 28, 2026

Upstream chopratejas/headroom main had diverged significantly (Rust rewrite, SmartCrusher port, diff_compressor retirement), leaving the native-e2e-expansion PR with merge conflicts.

Conflict resolutions

  • .github/workflows/ci.yml — Kept pytest-cov in the macOS native wrapper pip install; the downstream pytest --cov invocation and Codecov upload step require it (upstream dropped it incidentally).
  • tests/test_transforms/test_smart_crusher_bugs.py — Accepted upstream's removal of the TestHelperCoverage Python test class; the underlying Python helpers (_percentile_linear, _detect_sequential_pattern, etc.) were deleted with the Python SmartCrusher implementation in Stage 3c.1b — Rust parity fixtures and bug{1–4}_* tests in crates/headroom-core/ cover the same invariants.

chopratejas and others added 30 commits April 24, 2026 13:39
Bootstrap the Rust port of Headroom. Additive only — no existing Python
code modified. Ships the workshop, not the widgets.

Layout
  Cargo.toml (workspace) + rust-toolchain.toml
  crates/headroom-core    — transform library, stub only
  crates/headroom-proxy   — axum binary, /healthz only
  crates/headroom-py      — PyO3 cdylib, exposes headroom._core.hello()
  crates/headroom-parity  — Rust-vs-Python oracle harness + parity-run CLI

Tooling
  Makefile: test, test-parity, bench, build-proxy, build-wheel, fmt, lint
  .github/workflows/rust.yml: test, wheels (linux/mac), audit, parity-nightly
  deny.toml for cargo-deny

Parity corpus
  tests/parity/recorder.py + scripts/record_fixtures.py
  125 recorded fixtures across 5 leaf transforms (ccr, tokenizer,
  log_compressor, diff_compressor, cache_aligner)

Docs
  RUST_DEV.md — developer setup and workspace reference
  docs/spec/022-rust-migration.md — migration plan and stage breakdown

.gitignore: whitelist scripts/record_fixtures.py; ignore target/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds out crates/headroom-proxy from a /healthz stub into a transparent
reverse proxy: catch-all router that forwards every method/path/query to
--upstream verbatim, streaming both request and response bodies through
reqwest without buffering. Adds clap-based config (CLI + env), thiserror
error type with sane upstream-status mapping, JSON tracing-subscriber
logging, and graceful shutdown. The library surface (build_app, AppState,
Config) is reused by the integration tests.
Implements RFC 7230 6.1 hop-by-hop filtering on both request and response
sides (Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization,
TE, Trailers, Transfer-Encoding, Upgrade), plus the additional headers
listed inside any incoming Connection: header. Injects X-Forwarded-For
(appending to existing value if any), X-Forwarded-Proto, X-Forwarded-Host,
and X-Request-Id. The proxy module wires these in for both HTTP and WS.
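
A minimal sketch of the request-side strip, assuming the http crate's
HeaderMap (the real helper applies the same logic on the response side):

    use http::{header::HeaderName, HeaderMap};

    const HOP_BY_HOP: &[&str] = &[
        "connection", "keep-alive", "proxy-authenticate",
        "proxy-authorization", "te", "trailers", "transfer-encoding",
        "upgrade",
    ];

    fn strip_hop_by_hop(headers: &mut HeaderMap) {
        // RFC 7230 6.1: also strip whatever the Connection header names.
        let listed: Vec<HeaderName> = headers
            .get_all("connection")
            .iter()
            .filter_map(|v| v.to_str().ok())
            .flat_map(|v| v.split(','))
            .filter_map(|name| name.trim().parse::<HeaderName>().ok())
            .collect();
        for name in HOP_BY_HOP {
            headers.remove(*name);
        }
        for name in listed {
            headers.remove(name);
        }
    }
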
When the catch-all sees an Upgrade: websocket request, it hands the request to the ws
module: axum upgrades the client side, tokio-tungstenite connects to the
upstream (rewriting http->ws / https->wss while preserving path + query),
and two pumps shovel messages until either side closes. Forwarded headers
exclude what tungstenite manages (Host, Upgrade, Connection, Sec-*) but
preserve Authorization, Sec-WebSocket-Protocol, etc. Supports text,
binary, ping, pong, and close frames in both directions.
/healthz returns 200 unconditionally (own health). /healthz/upstream
proxies a GET to the upstream's /healthz and returns 200 when reachable
+ 2xx, 503 otherwise. Both endpoints are intercepted in axum and never
forwarded; documented in RUST_DEV.md as reserved paths.
15 integration tests across six suites that spin up the proxy on an
ephemeral port pointed at a per-test mock upstream:

- integration_http: all 7 methods round-trip with body, status passthrough
  for 404/500/502, query strings preserved, 1MB POST streams through.
- integration_sse: a 10-event in-process hyper SSE upstream emits at 50ms
  cadence; chunks reach the client with max gap < 500ms (loose CI bound)
  and a client disconnect propagates to the upstream within 2s.
- integration_ws: 5 text + 5 binary messages echo through a tungstenite
  upstream byte-equal; client-initiated close propagates.
- integration_headers: hop-by-hop strip both directions, X-Forwarded-*
  injection, X-Forwarded-For appends to existing value, multi-valued
  response headers preserved.
- integration_body: 5MB POST round-trips byte-equal; streaming response
  yields first byte before the upstream finishes sending.
- integration_health: own /healthz always 200; /healthz/upstream is 200
  when upstream healthy and 503 when down.

The Sec-WebSocket-Protocol forwarding is exercised implicitly by the WS
tests via tungstenite handshake. The harness lives at tests/common/mod.rs
and is shared by every integration suite.
…rs (chopratejas#31)

Adds `HEADROOM_QDRANT_URL`, `_HOST`, `_PORT`, `_API_KEY`, `_HTTPS`,
`_PREFER_GRPC`, `_GRPC_PORT` support across the memory stack:

- `headroom/memory/qdrant_env.py`: shared resolver helper with
  explicit-arg > env > default precedence (URL wins over host/port;
  booleans parsed via standard truthy set).
- `memory/easy.py`, `backends/{mem0,direct_mem0}.py`,
  `proxy/memory_handler.py`: call the resolver so
  `Memory(backend="qdrant-neo4j")`, `Mem0Config`, and the proxy's
  `MemoryConfig` all honor the same env keys.
- `proxy/models.py` + `proxy/server.py`: `ProxyConfig` picks up the
  same keys so hosted Qdrant (e.g. Qdrant Cloud) works without code
  changes.
- `cli/proxy.py`: adds `--memory-qdrant-{url,host,port,api-key}`
  flags that override the env when present.
- `tests/test_memory/test_qdrant_env.py`: unit coverage for
  precedence, URL-vs-host/port, boolean parsing, and unset defaults.
- `CHANGELOG.md`: documented under [Unreleased] / Added.

Explicit constructor arguments still win; unset env keeps the existing
localhost:6333 defaults, so this is backwards-compatible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
maturin>=1.5 requires -m to point to Cargo.toml, not pyproject.toml.
Fixes wheel build job failure in CI (all three matrix targets).
Also switches to manifest-path: action param for cleaner workflow syntax.
Applies same fix to Makefile build-wheel and develop targets.
Bug 1 (HIGH) health.rs: Url::join('healthz') used relative resolution,
stripping non-trailing-slash base paths. Fixed with set_path('/healthz').
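
A quick demonstration of the difference, using the url crate:

    use url::Url;

    fn main() {
        let base = Url::parse("http://upstream:8080/api/v1").unwrap();
        // join() does RFC 3986 relative resolution: the last segment of a
        // non-trailing-slash base path is replaced, not appended to.
        assert_eq!(
            base.join("healthz").unwrap().as_str(),
            "http://upstream:8080/api/healthz"
        );
        // set_path() is deterministic regardless of the base path shape.
        let mut fixed = base.clone();
        fixed.set_path("/healthz");
        assert_eq!(fixed.as_str(), "http://upstream:8080/healthz");
    }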

Bug 2 (HIGH) main.rs: graceful_shutdown_timeout was configured and logged
but never enforced. Now sleeps for the configured duration after signal
before axum exits, giving in-flight LLM streams time to drain.

Bug 3 (MEDIUM) websocket.rs: WS pump half-close could hang forever if
close() on one side failed. Replaced tokio::join! on async blocks with
spawned tasks + CancellationToken so either direction cancels the other.
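
The shape of the fix, sketched with pump bodies elided (tokio +
tokio-util; names are illustrative):

    use tokio_util::sync::CancellationToken;

    async fn run_pumps() {
        let token = CancellationToken::new();
        let (t1, t2) = (token.clone(), token.clone());

        let client_to_upstream = tokio::spawn(async move {
            tokio::select! {
                _ = t1.cancelled() => {}
                _ = async { /* pump client -> upstream */ } => {}
            }
            t1.cancel(); // this side is done: cancel the other pump
        });
        let upstream_to_client = tokio::spawn(async move {
            tokio::select! {
                _ = t2.cancelled() => {}
                _ = async { /* pump upstream -> client */ } => {}
            }
            t2.cancel();
        });
        let _ = tokio::join!(client_to_upstream, upstream_to_client);
    }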

Bug 4 (MEDIUM) proxy.rs/websocket.rs: URL path-join logic was copy-pasted
verbatim in two places. Extracted to join_upstream_path() helper; websocket
now calls it instead of duplicating the 15-line block.

Bug 5 (MEDIUM) proxy.rs: mid-stream upstream errors were silently swallowed
by Body::from_stream. Added a .map() wrapper that logs before re-raising.

Bug 6 (LOW) websocket.rs: WS session log was missing the request path,
making it hard to correlate logs with client sessions. Added path field.

Bug 7 (LOW) websocket.rs: scheme match arm 'ws'|'wss' borrowed joined
immutably while set_scheme needed a mutable borrow. Fixed by using literal
'ws' (set_scheme on an already-ws URL is a no-op for the ws case).
- Replace full-sha image tags with type=sha,format=short (7-char) so the
  primary package versions list stops accumulating long sha-only entries.
- Route cosign signatures into a sibling GHCR package via
  COSIGN_REPOSITORY=<image>-signatures, so the main image's package
  version list stays clean. GHCR does not yet implement the OCI 1.1
  Distribution Referrers API (community discussion #163029, June 2025),
  so legacy signature mode is used here -- OCI 1.1 mode would force the
  signature manifest's subject into the same repo as the image and
  override COSIGN_REPOSITORY. Verifiers must export the same
  COSIGN_REPOSITORY value when running 'cosign verify'.
- Add a promote-latest job that runs after the variant matrix and
  re-pushes the :latest tag pointing at the root image with a unique
  index annotation. This forces a fresh manifest digest, generating a
  new GHCR package version with current timestamp so :latest sits at
  the top of the version listing instead of whichever variant happened
  to finish last.
When the docker workflow is triggered directly by release.published
(rather than via workflow_call from the Release parent), inputs.enable_ref_tags
is null, which produced an empty enable= attribute that the metadata-action
rejected. Default to true on non-release triggers and skip ref/pr tags
on release events where they don't apply anyway.
CodeQL alert chopratejas#61 (CWE-275, actions/missing-workflow-permissions):
add explicit `permissions: contents: read` to the rust workflow root.
Defaults the GITHUB_TOKEN to read-only across all jobs, so even if the
repo policy changes, this workflow stays at least-privilege. No job in
this workflow needs write — wheels/audit/parity all read-only.

Add real end-to-end test suite at tests/e2e_real.rs gated behind
HEADROOM_E2E=1. Spawns the actual Python Headroom proxy as a subprocess,
runs the Rust proxy in-process in front of it, and exercises:
  - health endpoints across the full chain
  - Anthropic non-streaming (real API call)
  - Anthropic streaming SSE (real API call) with chunk-level validation
  - OpenAI non-streaming (real API call)
  - X-Request-Id generation and pass-through

Adds tokio-process feature for Command/Child usage. Loads .env at the
repo root for API keys (does not log values). Tests skip cleanly when
HEADROOM_E2E is unset, so cargo test stays fast.
Adds 0.10.7-ab46594 (root) and 0.10.7-<variant>-<sha> (variants) so
images can be referenced by an exact version+commit pair without
relying on the moving variant or :latest tags.
Previously, any package registered under the headroom.proxy_extension
entry-point group auto-loaded at proxy startup. A user pip-installing a
plugin (or pulling one in transitively) would get its middleware running
in front of all their LLM traffic with zero opt-in or visibility — the
same mechanism that masked the Shield Enterprise streaming bug.

Change: install_all() now takes an explicit enabled set (or reads
HEADROOM_PROXY_EXTENSIONS). Discovery still runs to enumerate what's
available, but only the names the operator opted into are actually installed.
The literal '*' is a wildcard for trusted environments.

  CLI:  headroom proxy --proxy-extension shield_enterprise
        headroom proxy --proxy-extension shield_enterprise,mypkg
        headroom proxy --proxy-extension '*'
  Env:  HEADROOM_PROXY_EXTENSIONS=shield_enterprise

The startup banner now shows discovered + enabled extensions:
  Extensions:   discovered=shield_enterprise (opt-in: --proxy-extension ...)
  Extensions:   ENABLED shield_enterprise (available: shield_enterprise)
  Extensions:   ENABLED (wildcard) shield_enterprise

Names that were requested but not found are logged as warnings.

Adds proxy_extensions: list[str] | None to ProxyConfig. Plumbs it
through CLI -> ProxyConfig -> install_all(enabled=...).

This is a behavior change for users who relied on auto-loading.
Existing Shield/extension users must add --proxy-extension or set
HEADROOM_PROXY_EXTENSIONS to keep their middleware running.
Cargo.lock: pick up tokio-util added in the WS half-close fix.
RUST_DEV.md: document how to run headroom-proxy in passthrough mode
(listen + upstream flags, e2e test gate, env vars).
cargo fmt --check failed in CI: import order in proxy.rs (cfg(test)
attributes before/after non-attr imports) and a few line-wrapping
nits in e2e_real.rs. Ran cargo fmt --all to fix.

maturin-action@v1 does not have a 'manifest-path' input — the action
warned 'Unexpected input(s) manifest-path' and proceeded to invoke
maturin from the repo root, which sees the workspace Cargo.toml with
no [package] section and bails. Move -m crates/headroom-py/Cargo.toml
back inside the 'args' string.
# Conflicts:
#	headroom/proxy/server.py
GitHub Actions deprecated the macos-13 runner label. The validate-workflows
actionlint step in CI fails because macos-13 is no longer in the available
labels list. macos-15-intel is the current x86_64 macOS runner.

(A bump from macos-14 to macos-15 for arm64 proved unnecessary; macos-14
is still valid and we keep it for cache warmth.)
phase-0: rust workspace scaffolding + parity harness
ci(docker): clean up image tags, signatures, and Latest indicator
fix(memory): resolve Qdrant connection from HEADROOM_QDRANT_* env vars (chopratejas#31)
Stage 2 of the Rust port: a `headroom_core::tokenizer` module mirroring the
Python `headroom.tokenizers` surface, with three backends behind a single
`Tokenizer` trait.

Backends, in dispatch order:

1. HuggingFace (`HfTokenizer`) — pure-Rust `tokenizers` crate loading any
   public `tokenizer.json`. Covers the gap between OpenAI (tiktoken) and the
   Anthropic/Gemini estimator: Cohere `command-*`, Llama-3.x, Mistral, Qwen,
   BERT, T5, etc. Construct from bytes or a file path; register against a
   model-name prefix via `register_hf` for automatic dispatch. No `hf-hub`
   auto-download yet — keeps networking, auth, and `~/.cache/huggingface` out
   of core. Longest-prefix wins; lookups are RwLock-protected.
2. Tiktoken (`TiktokenCounter`) — `tiktoken-rs` 0.11 BPE for OpenAI / o-series
   families. Byte-identical to Python `tiktoken` for ordinary text. Lazy
   shared `Arc<CoreBPE>` per encoding (o200k_base, cl100k_base, p50k_base,
   r50k_base).
3. Estimation (`EstimatingCounter`) — `chars / cpt` last-resort fallback.
   Matches Python's `max(1, int(len(text) / cpt + 0.5))` round-half-up
   formula (a self-review caught and fixed an earlier `ceil`-based version
   that diverged in the middle of the range, e.g. 5 chars at 4.0 cpt).
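
A sketch of the fallback formula (Rust chars() standing in for
Python's len() on text):

    // max(1, int(len(text) / cpt + 0.5)) — round-half-up, not ceil.
    fn estimate_tokens(text: &str, chars_per_token: f64) -> usize {
        let est = (text.chars().count() as f64 / chars_per_token + 0.5) as usize;
        est.max(1)
    }

    // 5 chars at 4.0 cpt: 5/4 + 0.5 = 1.75 -> 1 token.
    // A ceil-based version returns 2 — the divergence described above.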

Tests: 43 unit tests + 5 proptests; parity 40/40 byte-equal.
Bench: criterion baseline on small/medium/large inputs.
Workspace MSRV bumped 1.78 → 1.80 for `LazyLock`/`OnceLock`.

No proxy wiring. Library-only; production behavior unchanged.
The `qdrant-env-vars` change in d3c37d7 (PR chopratejas#266) added `qdrant_url` and
`qdrant_api_key` keys to the kwargs that `MemoryHandler` passes into
`DirectMem0Adapter.__init__`. The corresponding assertion in
`test_ensure_initialized_fast_paths_and_qdrant_variants` was missed in that
PR and has been failing on `main` ever since. Surfacing here because it
fails on every PR's CI; not caused by the Rust tokenizer work this branch
adds.

The two new keys are both `None` when the corresponding `HEADROOM_QDRANT_*`
env vars are unset, which is the case in this test.
Stage 2.1: closes the loop on the HuggingFace tokenizer story. Stage 2
shipped `HfTokenizer::from_bytes`/`from_file`, which required callers to
manage their own tokenizer.json files. This adds the third constructor:

    let t = HfTokenizer::from_pretrained("CohereForAI/c4ai-command-r-v01")?;
    register_hf("command-", t);

`from_pretrained` is a thin wrapper around the `hf-hub` crate's blocking
`ureq` API. First call downloads `tokenizer.json` to `~/.cache/huggingface/
hub` (or `$HF_HOME` if set); subsequent calls reuse the on-disk cache. Uses
the `main` revision; gated repos (Llama, Mistral) require `HF_TOKEN` in env
or `~/.cache/huggingface/token`.

Also adds `try_register_hf(prefix, repo)` as the obvious one-liner for
proxy startup code:

    let _ = try_register_hf("command-", "CohereForAI/c4ai-command-r-v01");
    let _ = try_register_hf("mistral-", "mistralai/Mistral-7B-v0.1");

Each call is independent — a download failure for one model (e.g. gated
without a token) does not affect others.

`HfTokenizerError` gains a new `Hub` variant so callers can distinguish
"couldn't fetch" from "fetched but malformed" — relevant when deciding
whether to retry, surface to the user, or fall back to the estimator.

Why blocking, not async: `from_pretrained` is called once at startup. A
sync API works from `main()`, from a `OnceLock` initializer, or from
`tokio::task::spawn_blocking` if a tokio caller needs it later. The async
hf-hub backend would force callers to await at startup, which doesn't fit
the `register_hf` registry pattern.

Why rustls, not native-tls: keeps the binary statically linkable for AWS
deploys (no system OpenSSL dependency).

Tests: a network-dependent integration test (`#[ignore]`d in CI; hits HF
for `gpt2`, ~1.4 MB) verifies the real download + load + count path. A
non-network negative test verifies that an invalid repo name surfaces as
`HfTokenizerError::Hub`, not a panic. 44 unit tests + 5 proptests +
1 doctest pass; parity stays 40/40 byte-equal.
…-hub

rust(stage 2.1): HfTokenizer::from_pretrained via hf-hub
Stage 3a: first real transform port. Faithful Rust port of
`headroom.transforms.diff_compressor` with byte-equal parity against all
20 recorded fixtures.

# Algorithm (matching Python)

1. Hand-rolled unified-diff parser (state machine over `diff --git`,
   `index`, `--- a/`, `+++ b/`, `@@`, mode/binary/rename markers, +/- /
   space lines, "other" lines like `\ No newline at end of file`).
2. File cap (`max_files=20`): when fired, sort by total changes (most
   first) and keep top N.
3. Per-file hunk cap (`max_hunks_per_file=10`): keep first + last + top
   relevance-scored middle, then resort by hunk-header start line to
   restore appearance order.
4. Relevance scoring: change-density base + user-query word overlap
   + priority patterns (ERROR / IMPORTANCE / SECURITY regexes —
   matches `error_detection.PRIORITY_PATTERNS_DIFF`).
5. Per-hunk context trim: keep `max_context_lines=2` lines either side
   of each `+`/`-` line.
6. CCR cache_key: `md5(original)[:24]` (matches
   `compression_store.CompressionStore.store`). Emitted only when
   compression saved >20% of lines.

Parity result: `[diff_compressor ] total=20 matched=20 skipped=0 diffed=0`.
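
The cache_key derivation from step 6, sketched with the md-5 crate:

    use md5::{Digest, Md5};

    // Lowercase MD5 hex of the original text, truncated to 24 chars —
    // matching Python's md5(original)[:24].
    fn ccr_cache_key(original: &str) -> String {
        let digest = Md5::digest(original.as_bytes());
        let hex: String = digest.iter().map(|b| format!("{b:02x}")).collect();
        hex[..24].to_string()
    }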

# Information preservation hardening

Three pass-through paths inherited from Python that we keep deliberately
(changing them would lose information):
- Below `min_lines_for_ccr` (50): return input unchanged.
- No diff sections parsed: return input unchanged.
- Below 20% compression savings: emit compressed output but no CCR
  marker (the original is the cheaper representation anyway).

Plus a parity-bound subtlety: `compressed_line_count` is captured BEFORE
the CCR retrieval marker is appended, both for the marker text
(`compressed to N`) and the result field. The output string therefore
ends up with one more line than the field reports — by design, matching
Python exactly. An off-by-one bug from recounting after appending the
CCR marker was caught and pinned by a synthetic 8-file diff test.

# Observability — the Rust escape hatch

Python's `DiffCompressionResult` has thin observability: input/output
line counts, additions/deletions, hunks_kept/removed, files_affected,
cache_key. The Rust port adds a sidecar `DiffCompressorStats` struct
with metrics Python doesn't emit:

- `files_dropped: Vec<String>` — names (old → new path) of files
  silently discarded by the `max_files` cap. Python loses these.
- `hunks_dropped_per_file: BTreeMap<String, usize>` — per-file hunk
  drops, stable iteration via `BTreeMap`.
- `context_lines_input` / `context_lines_kept` / `context_lines_trimmed`
  — directly proxies info loss from the context trim.
- `largest_hunk_kept_lines` / `largest_hunk_dropped_lines` — outlier
  detection (a single huge dropped hunk is much worse than many small ones).
- `parse_warnings: Vec<String>` — surfaces malformed input rather than
  dropping silently.
- `processing_duration_us` — latency budget.
- `cache_key_emitted` + `ccr_skipped_reason: Option<String>` — explicit
  signal for "we chose not to emit CCR and this is why".

A `tracing::info!(target: "diff_compressor", ...)` event is emitted on
every call, carrying these fields for OTel scraping in prod. The
sidecar struct is returned alongside via `compress_with_stats`; the
parity-only `compress` API discards it.

# Module layout

- `crates/headroom-core/src/transforms/mod.rs` — namespace, doc comment
  with the guiding principle ("information preservation > aggressive
  compression") so future ports inherit the philosophy.
- `crates/headroom-core/src/transforms/diff_compressor.rs` — full port
  (parser, scorer, hunk selector, context trimmer, formatter, CCR layer,
  stats, tracing).

# Dependencies added to headroom-core

- `md-5 = "0.10"` — for the CCR cache_key (matches Python MD5[:24]).
- `regex = "1"` — was a transitive dep via tokenizers; now a direct
  dependency for the hunk-header parser and priority patterns.

# Tests

6 unit tests covering pass-through paths, MD5 hex truncation, the
Python `split("\n")` line-count semantics, sidecar stats emission,
and a synthetic 8-file diff that locks the byte-equal behavior found
in the parity fixtures.
chopratejas and others added 23 commits April 27, 2026 13:01
…e-arg-list-too-long

ci(docker): fix Argument list too long when signing bake outputs
…ilder

Stage 3c.2 PR1 — the public extension surface that lets Enterprise
crates plug richer components into SmartCrusher without forking. Three
traits, one builder, behavior-equivalent on every parity fixture.

The three traits:

- Scorer (re-exported from `crate::relevance::RelevanceScorer`).
  Already a trait; OSS HybridScorer (BM25 + fastembed). Enterprise
  point: per-tenant Loop-trained scorer.

- Constraint (new in `traits.rs`). `must_keep(items, item_strings)
  -> Vec<usize>` — indices the allocator must keep regardless of
  saliency. OSS defaults: `KeepErrorsConstraint`,
  `KeepStructuralOutliersConstraint` — thin wrappers around the
  existing `detect_error_items_for_preservation` and
  `detect_structural_outliers` functions. Enterprise point:
  BusinessRuleConstraint, RegulatoryConstraint::HIPAA, and so on.

- Observer (new in `traits.rs`). `on_event(&CrushEvent)` fires once
  per top-level `crush()` call with strategy + sizes + elapsed_ns.
  OSS default: TracingObserver — writes to the `tracing` crate at
  debug, zero-cost when filtered out. Enterprise point:
  AuditObserver, MetricsObserver, LoopTrainingObserver.

The builder (`builder.rs`):

`SmartCrusherBuilder::new(config)` starts EMPTY (no scorer, no
constraints, no observers — explicit composition; "no silent
fallbacks" applied to the API surface). Methods stack:
with_scorer, add_constraint, add_default_oss_constraints (appends
KeepErrors + KeepStructuralOutliers), add_observer,
with_default_oss_setup (HybridScorer + default constraints +
TracingObserver in one call).

`SmartCrusher::new(config)` is preserved as the OSS default factory
(equivalent to `SmartCrusher::builder(config).with_default_oss_setup()
.build()`). Every existing caller (proxy, content_router,
integrations, evals) continues to work unchanged.
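
A hedged usage sketch of the builder surface described above (the
constructor arguments for the scorer and observer are assumptions):

    // Explicit composition — starts empty, methods stack.
    let crusher = SmartCrusher::builder(config)
        .with_scorer(Box::new(HybridScorer::default()))
        .add_constraint(Box::new(KeepErrorsConstraint))
        .add_constraint(Box::new(KeepStructuralOutliersConstraint))
        .add_observer(Box::new(TracingObserver))
        .build();
    // Equivalent OSS preset:
    // SmartCrusher::builder(config).with_default_oss_setup().build()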

Internal refactor:

`SmartCrusherPlanner` now holds `&[Box<dyn Constraint>]` and
iterates the configured constraints via a new
`apply_constraints(items, item_strings, keep)` method. Replaces four
hardcoded `detect_structural_outliers` +
`detect_error_items_for_preservation` call sites in the four plan
methods. With the OSS default constraint stack the must-keep set is
byte-identical to pre-PR1 — verified by all 17 parity fixtures.

`SmartCrusher` gained two fields: `constraints: Vec<Box<dyn
Constraint>>` and `observers: Vec<Box<dyn Observer>>`. New
`from_parts` constructor (#[doc(hidden)]) is the builder's exit
point.

What did NOT change in this PR:

- The internal planning algorithm (lossless tabular, saliency
  scoring, structured markers — those are PR 2/3/4).
- The string/number/object/mixed-array crusher paths in
  `crushers.rs` and the `prioritize_indices` helper in
  `orchestration.rs` — they still call the detection functions
  directly. Path B from the design doc: dict-array path is the
  primary value plugin point; lifting the leaf compressors can come
  later if customers ask.

Tests:

15 new tests across `traits.rs`, `constraints.rs`, `observer.rs`,
`builder.rs`. Coverage: each constraint trait method called and
pinned (errors flagged, structural outliers detected, item_strings
cache parity, empty-array safety); builder empty-build path,
default-OSS-stack append, add_constraint order preservation,
with_default_oss_setup yields expected counts, observer fires
end-to-end on a real crush; TracingObserver name stable, on_event
doesn't panic.

Verification:
- cargo test --workspace: 403 passed (was 388, +15 new), 0 failed.
- parity: 17/17 byte-equal for smart_crusher.
- make ci-precheck: green.

Stage 3c.2 PR sequence:
- PR 1 (this commit): three traits + builder.
- PR 2 (next): improvement A — TabularCompactor.
- PR 3: improvement B — saliency scoring + structured allocator.
- PR 4: improvement C — structured marker formatter.
- PR 5: ENT-A — `headroom-enterprise` scaffold.
Stage 3c.2 PR2. Adds an opt-in compaction stage that runs BEFORE the
existing lossy pipeline. When configured, it tries to losslessly
re-shape arrays of objects into a recursive Compaction IR and renders
that to bytes via a pluggable Formatter trait. When not configured
(default OSS), behavior is byte-equal with the pre-PR2 path — all 17
SmartCrusher parity fixtures stay green.

# What lands

- Recursive Compaction IR (`compaction/ir.rs`): Table / Buckets /
  OpaqueRef / Untouched. CellValue can hold a nested Compaction so
  multi-level cases (stringified-JSON inside cells, heterogeneous
  arrays bucketed by discriminator, opaque blobs CCR-substituted)
  share one tree shape.

- Cell classifier (`compaction/classifier.rs`): per-cell decision —
  Scalar / JsonObject / JsonArray / StringifiedJson(parsed) /
  Opaque(kind). Conservative: in doubt, return Scalar.

- TabularCompactor (`compaction/compactor.rs`): array → IR. Handles
  uniform-nested flattening into dotted columns ("meta.region",
  "meta.tier"), stringified-JSON parsing + recursion, opaque-blob
  CCR-substitution (12-char SHA-256 prefix), and heterogeneous
  bucketing by discriminator. Falls through to a sparse Table when
  no clean discriminator exists, so we always do better than the
  lossy path for object arrays.

- Formatter trait (`compaction/formatter.rs`) + two impls:
  - JsonFormatter: structured JSON for debugging / programmatic use.
  - CsvSchemaFormatter: [N]{col:type,col:type} declaration + CSV
    rows. Steals TOON's row-count-and-shape declaration without
    adopting TOON's bespoke escaping. CSV is the format LLMs are
    strongest at — every model has seen millions of examples in
    training. >30% smaller than raw JSON serialization on tabular
    fixtures (a rendering is sketched after this list).

- Wiring (`crusher.rs`, `builder.rs`): SmartCrusher gains an optional
  compaction stage. Builder methods with_compaction(stage) and
  with_default_compaction() opt in. CrushArrayResult gets two new
  fields (compacted, compaction_kind) populated only when the stage
  runs. strategy_info becomes compaction kind when compaction won.
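
For intuition, a hypothetical CsvSchemaFormatter rendering of a
three-row object array (column names from the dotted-flattening
example above; the exact type labels are assumptions):

    [3]{id:int,name:str,meta.region:str}
    1,alice,us-east
    2,bob,eu-west
    3,carol,us-east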

# Why this design

- Three-trait extension surface preserved. PR1 added Constraint /
  Observer / Scorer; PR2 adds Formatter as the fourth pluggable
  seam. Enterprise plug-ins land cleanly without forking core.

- Empty default builder rule held. SmartCrusherBuilder::new() still
  produces a no-compaction crusher. with_default_compaction() is
  the explicit OSS preset. No silent fallbacks.

- Recursive IR was the unlock. A flat table-of-scalars IR would have
  collapsed the moment a cell held nested JSON. Making
  CellValue::Nested hold another Compaction made stringified-JSON
  parsing + heterogeneous bucketing + opaque substitution all share
  one renderer pass.

- CCR substitution for opaque cells. Strings classified as
  base64/HTML/long-opaque become structured markers keyed by 12-char
  SHA-256 prefix. The full bytes round-trip via the CCR store (PyO3
  bridge owns actual storage; this PR emits the marker and computes
  the hash).

# Tests

- 60 new unit tests across IR / classifier / compactor / formatter /
  wiring (448 total in headroom-core, was 388).
- 17/17 SmartCrusher parity fixtures byte-equal — default-config
  path completely unchanged.
- 21/21 Python parity tests pass via PyO3 bridge.
- make ci-precheck green: ruff, mypy, cargo fmt/clippy/test
  (1.95.0), commitlint.

# Deferred to follow-up PRs

- ToonFormatter (small; ship after eval harness compares formats)
- Diff/code detection in cells → routes to DiffCompressor /
  CodeCompressor (coupled to ContentRouter Phase 4)
- Budget-aware row dropping (Constraint-respecting) when rendered
  size exceeds budget
- Format A/B eval harness
- ContentRouter unification (Phase 4)

Modules: crates/headroom-core/src/transforms/smart_crusher/compaction/*, builder.rs, crusher.rs, mod.rs
…r1-traits

feat(rust): SmartCrusher extension surface — Constraint, Observer, Builder
…r2-tabular-compactor

feat(rust): SmartCrusher PR2 — lossless-first tabular compaction
…estoration

Stage 3c.2 PR4. Restores Python's CCR-Dropped semantics on the lossy
path (the cornerstone reversibility guarantee that the port had
silently dropped) and flips the OSS default to lossless-first with a
configurable savings threshold.

# The user-visible behavior

Default `SmartCrusher::new()` now runs:

  1. Try lossless compaction.
  2. If savings >= `lossless_min_savings_ratio` (default 0.30), ship
     it — `compacted` populated, `ccr_hash = None`, nothing dropped.
  3. Otherwise fall through to the lossy path — drop rows AND
     populate `ccr_hash` so the runtime can cache the full original
     for tool-call retrieval.

**No data is ever lost.** "Lossy" means "compressed view inline; full
payload retrievable via CCR cache" — same semantics as Python's
SmartCrusher with CCR enabled. The runtime (PyO3 bridge / proxy
server) owns the cache; this crate computes the hash and emits a
marker so the prompt knows where to look.

# What changed

- `SmartCrusherConfig.lossless_min_savings_ratio: f64` (default 0.30).
  Single configurable knob — Enterprise overrides as needed. Below
  the threshold, lossless declines and lossy + CCR runs.

- `SmartCrusher::new(cfg)` flips to include the compaction stage by
  default. `SmartCrusher::without_compaction(cfg)` is the explicit
  opt-out for callers / fixtures that depend on pre-PR4 behavior.

- `crush_array` rewritten:
  - Lossless-first dispatch with savings-ratio gate
  - Lossy path now hashes the full original (12-char SHA-256 prefix)
    and emits a CCR-Dropped marker in `dropped_summary` whenever
    rows are dropped
  - `ccr_hash` field populated whenever rows were dropped
  - `process_value` substitutes the compacted string into the JSON
    tree when lossless wins, so `crush()` output reflects the win

- PyO3 bridge: `SmartCrusher.without_compaction()` static method;
  `SmartCrusherConfig` exposes the new `lossless_min_savings_ratio`
  field; Python `SmartCrusher` wrapper accepts `with_compaction=True`
  (default) and routes to the right Rust constructor.

- Parity harness: legacy 17 fixtures use `without_compaction()` so
  byte-equal coverage of the lossy path is preserved.

# Tests

- Rust: 281/281 smart_crusher unit tests pass (was 277). Six new
  tests cover: lossless wins above threshold, lossy falls through
  below threshold, CCR hash deterministic + input-dependent, lossy
  without compaction emits CCR, passthrough paths don't emit CCR,
  without_compaction yields no compacted field.
- Python parity: 21/21 (legacy fixtures via without_compaction).
- Python lossless default smoke: 3/3 new tests in
  test_smart_crusher_lossless_default.py.
- Python retention: 21/21 (updated to opt into the lossy path
  explicitly since their semantics target row-level retention).
- make ci-precheck green.

Modules:
  crates/headroom-core/src/transforms/smart_crusher/{config,crusher}.rs
  crates/headroom-parity/src/lib.rs
  crates/headroom-py/src/lib.rs
  headroom/transforms/smart_crusher.py
  tests/test_quality_retention.py
  tests/test_transforms/test_smart_crusher_{lossless_default,rust_parity}.py
…rom main)

Stage 3c.2 PR3a-redux. The original PR3a (chopratejas#286) was merged on GitHub
on 2026-04-27 but its content never reached main — its parent merge
(chopratejas#285 PR2) was squash-merged, which drops commits stacked on top of
the source branch. The walker module disappeared along with the
intermediate commits.

This PR re-lands `walker.rs` and updates the `compaction/mod.rs`
re-exports — same content as commit 4db4f46 from the lost branch.
No new functionality, no behavior change for existing crusher paths.

# What the walker does (recap)

Recursive descent over any JSON value:

  match value {
    Object(m) => recurse into each field
    Array(xs) => recurse into items, then try TabularCompactor on the array
    String(s) => parse-as-JSON-and-recurse / CCR-substitute / leave
    scalar    => unchanged
  }

Compactable spots become inline strings holding the rendered bytes.
The wrapping JSON structure is preserved.

# Why chore, not feat

This is a re-land of previously-merged-but-lost code. No new feature
shipped — fixing a regression caused by the stacked-PR squash-merge
accident. Using `chore` keeps the version from bumping for what is
effectively a parity restoration.

# Tests

- 13 walker unit tests pass.
- 462/462 headroom-core lib tests overall.
- 17/17 SmartCrusher parity fixtures byte-equal (untouched).
- 185/185 Python tests pass.
- make ci-precheck green.

Module: crates/headroom-core/src/transforms/smart_crusher/compaction/walker.rs
PR4 flipped the OSS default to lossless-first. The MCP server and
LangChain eval tests assert wire-format and row-level retention
properties that belong to the lossy path; the lossless path
substitutes a CSV+schema STRING in place of arrays, which is great
for LLM prompts but wire-incompatible with consumers that iterate
the JSON.

Pin both call sites to the lossy + CCR-Dropped path via
`with_compaction=False`. Same retention semantics as Python's
pre-PR4 SmartCrusher behavior — full payload still cached via CCR
for tool retrieval; nothing is lost.

Modules:
- headroom/integrations/mcp/server.py — runtime MCP wrapper
- tests/test_integrations/langchain/test_evals.py — eval fixture

CI run that surfaced these: actions/runs/25025161868
Two more tests broke after PR4's lossless-first default flip — same
root cause as the langchain/MCP fixes (chopratejas#287's first patch):

- tests/test_proxy_ccr.py — TestEndToEndTOINIntegration asserts
  CCR-cache state after compression. Lossless wins on the test
  fixture and skips CCR entirely (nothing dropped). Pin to lossy
  via with_compaction=False so the cache assertion holds.

- tests/test_text_compressors.py — TestSmartCrusherTextIntegration
  asserts JSON-array shape round-trip. Lossless substitutes a
  CSV+schema string. Pin to lossy + JSON shape via
  with_compaction=False. Lossless coverage exists separately in
  test_smart_crusher_lossless_default.py.

Same pattern, same fix. CI run that surfaced these:
actions/runs/25025876328
…r4-lossless-first-default

feat(rust): SmartCrusher PR4 — lossless-first default + CCR-Dropped restoration
…r3a-redux-walker

chore(rust): re-land DocumentCompactor walker (squash-merge lost it from main)
…paque strings

Stage 3c.2 PR5. Closes the gap between the public `crush()` API and
the standalone `DocumentCompactor` walker. Augments
`SmartCrusher::process_value`'s String branch to mirror the walker's
two String cases:

  1. Stringified-JSON containers: parse, recurse via process_value,
     re-emit. The wrapping field stays a string but its contents are
     processed end-to-end. Special-cases the lossless-compaction
     path (when recursion returns Value::String) to avoid double-
     JSON-encoding.

  2. Opaque blobs (long base64 / HTML / long-text strings):
     substitute with `<<ccr:HASH,KIND,SIZE>>` markers — same format
     as walker.rs and PR4's lossy CCR-Dropped markers, so downstream
     consumers can pattern-match regardless of which path emitted.

Typed as chore so the package version doesn't bump.

# Why this matters

After PR4, calling `SmartCrusher::new()` and then `crush(json_blob)`
gets lossless-first compaction on top-level arrays. But for tool
outputs that wrap a JSON-encoded payload INSIDE a string field, or
contain opaque blobs (base64-encoded files, HTML chunks), the public
API was a no-op. The walker handled these but only via its own
entry point; users calling `crush()` never hit it.

PR5 brings walker semantics into the public path. Same vision the
user described early in this stage:

> JSON within JSON, opaque payloads, multi-layered

now works for `crush()` callers without any extra setup.

# Implementation (~80 lines)

- `process_string` method on SmartCrusher: dispatches to JSON-recurse
  / CCR-substitute / passthrough.
- `try_parse_json_container_str`: cheap parse-only-containers helper.
- `ccr_marker_for_string` + `opaque_kind_label` + `humanize_bytes`:
  marker formatting matching walker.rs byte-for-byte.

Reuses every PR2 primitive — no new traits, no new IR, no new
abstractions.
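
A hedged reconstruction of the marker helper (the 12-char SHA-256
prefix and the <<ccr:HASH,KIND,SIZE>> shape come from the text above;
the kind label and size rendering are assumptions):

    use sha2::{Digest, Sha256};

    fn ccr_marker_for_string(s: &str, kind: &str) -> String {
        let digest = Sha256::digest(s.as_bytes());
        let hex: String = digest.iter().map(|b| format!("{b:02x}")).collect();
        format!("<<ccr:{},{},{}B>>", &hex[..12], kind, s.len())
    }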

# Tests (6 new)

- Short string passthrough (no false positives)
- Stringified-JSON array recurses (50 items inside a string field)
- Opaque base64 blob → CCR marker substitution
- Top-level plain text passthrough (crush(plain_text) unchanged)
- Short JSON-looking strings unchanged (no false-positive opaque)
- Helper parses only containers, not bare scalars

303/303 smart_crusher tests pass (was 297 → +6 PR5).
185/185 Python tests. make ci-precheck green.

Module: crates/headroom-core/src/transforms/smart_crusher/crusher.rs
CcrStore trait + InMemoryCcrStore (1000 entries, 5-min TTL, FIFO
eviction, idempotent re-store) live at the crate root. SmartCrusher's
lossy crush_array path now actually stashes the full original [items]
canonical-JSON into the configured store keyed by the same ccr_hash it
embeds in the prompt marker -- closing the no-data-loss contract that
was previously hash-only.

PyO3 surface:
- crusher.crush_array_json(items_json) -> dict with ccr_hash + kept items
- crusher.ccr_get(hash) -> Optional[str] for retrieval
- crusher.ccr_len() -> int for telemetry

Python shim passes both through. Default constructors enable the store
(matches Python's CCR-enabled default); without_compaction() also gets
it because CCR is a contract, not an opt-in extra.
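
The contract, sketched (signatures assumed, not the crate's literal
trait):

    pub trait CcrStore: Send + Sync {
        /// Idempotent: re-storing an existing hash is a no-op.
        fn put(&self, hash: &str, canonical_json: &str);
        /// None once evicted (FIFO past capacity) or past the TTL.
        fn get(&self, hash: &str) -> Option<String>;
        fn len(&self) -> usize;
    }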

Tests proving compress -> store -> retrieve -> reconstruct:
- 7 unit tests in ccr.rs (put/get/eviction/expiry)
- 9 Rust integration tests (crates/headroom-core/tests/ccr_roundtrip.rs)
- 10 Python tests including 4 explicit before/after element-equality
  assertions through both the native PyO3 surface and the Python shim

Plugin manifest versions auto-bumped by the sync-plugin-versions
pre-commit hook (unrelated to CCR but co-resident in the working tree).
….toml

The action was set to @stable, which installs whatever the latest
stable is (1.95.0 right now). Then maturin invokes cargo, which reads
rust-toolchain.toml and re-resolves to "1.95.0 + clippy + rustfmt".
rustup treats stable and 1.95.0 as distinct toolchain identities and
refuses the second install with:

  failed to install component 'clippy-preview-x86_64-unknown-linux-gnu',
  detected conflict: 'bin/cargo-clippy'

This was intermittent across the matrix (only test (3.10) tripped on
the most recent run; others got lucky on cache state). Pinning the
action ref to 1.95.0 makes both sides ask for the exact same toolchain
identity, so the second install is a no-op and the conflict can't fire.

Bump procedure stays the same: when rust-toolchain.toml's channel
changes, update these refs in lock-step.

Plugin manifests auto-bumped 0.11.0 -> 0.13.2 by sync-plugin-versions
hook (unrelated to the workflow fix).
…r5-walker-integration

chore(rust): walker semantics in process_value — stringified-JSON + opaque strings
…r7-ccr-store

chore(rust): SmartCrusher CCR storage layer + roundtrip verification
Closes four gaps in the Rust SmartCrusher pipeline that, together,
wire CCR storage end-to-end so the LLM can actually retrieve dropped
data:

1. CCR-Dropped marker is now injected into process_value's lossy-path
   output as a sentinel object {"_ccr_dropped": "<<ccr:HASH N_rows_offloaded>>"}
   appended to the kept-items array. Previously the store held the
   original but no pointer reached the prompt -- the retrieval contract
   was data-on-server, no-way-to-ask. Sentinel-as-object preserves the
   array-of-dicts shape so downstream iteration with x.get(...) keeps
   working.

2. Walker / process_value drift removed. process_value gains a
   Value::String arm that handles stringified-JSON containers (parse,
   recurse, re-encode) and opaque blobs (CCR marker + store) -- same
   semantics walker.rs has always had, now reachable from the main
   crush() pipeline.

3. Opaque-string CCR now stores originals. DocumentCompactor gains an
   Option<Arc<dyn CcrStore>> field; emit_opaque_ccr_marker calls
   store.put when one is configured. Same hash regardless of store
   presence -- runtime contract is stable across configurations.
   Same wiring is shared between walker.rs and process_value via the
   extracted helper.

4. PyO3 surface adds SmartCrusher.compact_document_json(doc_json) ->
   compacted-json string. Routes through the crusher's existing CCR
   store, so ccr_get resolves both row-drop and opaque-string hashes.

Tests:
- 5 new Rust integration tests in ccr_roundtrip.rs (marker visibility,
  nested-array marker, opaque-string roundtrip, stringified-JSON
  recursion, walker-with-store)
- 4 new Python tests covering the marker visible-to-LLM contract via
  both the native PyO3 surface and the Python shim
- 5 legacy parity fixtures re-recorded (dict_array_*, duplicate_dicts_40)
  -- their lossy outputs now carry the sentinel; Rust + Python both
  match the new bytes (parity-run smart_crusher: 17/17)
The PR8 marker injection appends a sentinel object
{"_ccr_dropped": "<<ccr:HASH N_rows_offloaded>>"} to the kept-items
array on the lossy path so the LLM sees the retrieval pointer in the
prompt. Tests that iterate compressed arrays via subscript access
(e["level"], r["status"], i["labels"], m["text"]) hit KeyError on
the sentinel because it doesn't share the record schema.

Same root cause as the test_quality_retention fixes in PR8 -- these
integration tests were left out of that pass.

Ship a public helper headroom.transforms.smart_crusher.strip_ccr_sentinels
so tests can use it cleanly: `for e in strip_ccr_sentinels(entries):`
and production callers iterating compressed output get a single
canonical filter instead of inlining the _ccr_dropped check.

The 7 previously-failing tests in PR chopratejas#292 CI now pass:
  - langchain test_100_percent_errors_preserved_logs
  - langchain test_errors_preserved_with_many_errors
  - langchain test_search_results_with_query_term
  - mcp test_all_log_errors_preserved
  - mcp test_slack_significant_compression_with_content
  - mcp test_database_error_status_preserved
  - mcp test_github_bugs_partial_preservation

753 tests across the integration + transforms + retention suites pass
locally. Plugin manifests auto-bumped 0.13.3 -> 0.13.4 by the
sync-plugin-versions hook (unrelated to this fix).
…r8-marker-injection-walker-unify

chore(rust): SmartCrusher CCR marker injection + walker unification
…, single-serialize CCR write

Three orthogonal hot-path fixes targeting concurrent-request throughput.
Each is independently bench-measured below; the proxy hot path benefits
from all three at once.

== 1. PyO3 GIL release on heavy compute ==

PyO3 methods (crush, smart_crush_content, crush_array_json,
compact_document_json, compress, compress_with_stats) used to hold the
GIL across the entire Rust call. Result: a 100ms compress() blocked
EVERY other Python thread for 100ms — multi-worker uvicorn deployments
serialized through SmartCrusher.

Wrap each compute call in `py.allow_threads(|| ...)`. Inputs (`&str`
from Python) are copied to owned `String` first because PyO3 ties them
to the GIL hold. PyDict construction stays on the GIL side.

Measured: 4 Python threads each running 20 crushes:
  before (GIL held): ~3.3s wall    (serialized — equivalent to 4×0.83s)
  after (allow_threads): 826ms wall (4.01x speedup, perfect parallel)
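
The pattern, in a self-contained sketch (heavy_compress is a stand-in
for the real compute):

    use pyo3::prelude::*;

    fn heavy_compress(input: &str) -> String {
        input.to_owned() // placeholder for the CPU-bound work
    }

    #[pyfunction]
    fn compress(py: Python<'_>, text: &str) -> PyResult<String> {
        let owned = text.to_owned(); // &str borrows are tied to the GIL hold
        Ok(py.allow_threads(move || heavy_compress(&owned)))
    }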

== 2. CcrStore: Mutex<HashMap> -> DashMap-backed sharded ==

Single Mutex was the dominant bottleneck under multi-worker load — every
put/get serialized through one lock. Replace with DashMap (sharded
concurrent map, lock-free reads within a shard) plus a separate
small Mutex<VecDeque> for FIFO insertion-order eviction. Reads of
distinct keys never contend; writes only contend during the brief
order-queue push or capacity-sweep.

A/B bench (200 mixed put/get ops × N threads, in benches/ccr_store.rs):
  Threads | DashMap   Legacy Mutex  Speedup
  -------------------------------------------
       1  |   63 µs        71 µs       1.13x
       2  |   98 µs       194 µs       2.0x
       4  |  178 µs       707 µs       4.0x
       8  |  342 µs      1267 µs       3.7x

Legacy degrades ~linearly with thread count; DashMap stays near-flat
per-thread. Real multi-worker scaling.
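
The store's shape, sketched (TTL bookkeeping elided; field and method
names are illustrative):

    use std::collections::VecDeque;
    use std::sync::Mutex;
    use dashmap::DashMap;

    struct ShardedCcrStore {
        map: DashMap<String, String>,   // sharded; reads take no global lock
        order: Mutex<VecDeque<String>>, // FIFO insertion order for eviction
        capacity: usize,
    }

    impl ShardedCcrStore {
        fn put(&self, hash: String, payload: String) {
            if self.map.insert(hash.clone(), payload).is_none() {
                let mut order = self.order.lock().unwrap();
                order.push_back(hash);
                while order.len() > self.capacity {
                    if let Some(oldest) = order.pop_front() {
                        self.map.remove(&oldest); // FIFO eviction
                    }
                }
            }
        }

        fn get(&self, hash: &str) -> Option<String> {
            self.map.get(hash).map(|entry| entry.value().clone())
        }
    }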

== 3. Single-serialize the lossy CCR payload ==

The lossy `crush_array` path used to serialize the full array TWICE:
once in `hash_array_for_ccr` (allocates `Value::Array(items.to_vec())`,
deep-clones every Value subtree, then serializes), and a second time
in the store-write site. For a 50-item dict array that's ~MB of
allocator pressure per crushed array.

Introduce `canonical_array_json` (serializes `&[Value]` directly — same
bytes as `Value::Array(items.to_vec())` but no wrapper allocation +
no tree clone), call it ONCE per lossy path, then both hash and store
from those same bytes. Hash-format stable — all 17 parity fixtures
match byte-for-byte.
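
The essence of the fix (serde_json serializes a &[Value] slice as an
array, byte-identical to Value::Array of the same items, with no
wrapper allocation or subtree clone):

    use serde_json::Value;

    fn canonical_array_json(items: &[Value]) -> String {
        // Value maps have string keys, so serialization cannot fail.
        serde_json::to_string(items).expect("Value serialization is infallible")
    }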

== Tests ==

- 8 ccr.rs unit tests including a new concurrent-stress test (8 threads
  × 200 puts/gets, every key readable afterwards)
- 14 ccr_roundtrip integration tests stay green
- parity-run smart_crusher: 17/17 fixtures match
- 479 lib + 14 integration + 185 Python tests all pass
- New benches/ccr_store.rs runs the A/B and is committed for regression
  visibility

== Dependencies added ==

- dashmap v6  (mature, widely-used in tokio/linkerd ecosystem)
…il-dashmap-single-serialize

perf(rust): tier-1 multi-worker wins — GIL release, sharded CCR store, single-serialize CCR write
Co-authored-by: JerrettDavis <2610199+JerrettDavis@users.noreply.github.com>
JerrettDavis marked this pull request as ready for review April 28, 2026 17:01
JerrettDavis merged commit f6ddbcf into native-e2e-expansion Apr 28, 2026
JerrettDavis deleted the copilot/update-pr-branch-from-main branch April 28, 2026 17:01
JerrettDavis pushed a commit that referenced this pull request Apr 30, 2026
`headroom/transforms/tag_protector.py` was a regex-driven scan-and-
replace loop that ran on every kompress call from ContentRouter
(`content_router.py:1089`). The Python implementation had five real
bugs we now fix in the port — the most consequential being a
`str.replace(.., .., 1)` first-occurrence-replace bug that silently
collapsed two identical custom-tag blocks in the same input to a
single placeholder + a stray duplicate of the second block.

# Bug fixes (each pinned by a `fixed_in_3e4` test)

* **#1: O(n²) on nested custom tags.** Python's `while changed` loop
  restarted a full regex scan after every replacement. Rust walks
  once in linear time on input length.
* **#2: First-occurrence replace bug.** `result.replace(orig, ph, 1)`
  replaces the FIRST textual match, not the matched offset. Two
  identical custom-tag blocks collapsed to one placeholder + a stray
  duplicate of the second block. The Rust walker stitches output by
  offset so distinct blocks always get distinct placeholders.
* **#3: Silent 50-iteration cap.** Python had a hard `max_iterations
  = 50` safety limit that quietly truncated tag protection on deeply
  nested input. The Rust walker is bounded by input length only.
* **#4: Self-closing pass duplicate-replace risk.** Python ran a
  second loop with the same `replace_first` bug for self-closers.
  Rust handles self-closers in the same single pass.
* **#5: Placeholder collision.** If the input contained a literal
  `{{HEADROOM_TAG_…}}` substring, Python silently let the collision
  break restoration. Rust salts the prefix and reports it in stats.

# Architecture

Two-phase walker:
* Phase 1 (`identify_spans`): linear scan over input bytes, hand-
  rolled tag-open / tag-close lexer (no regex). Maintains a stack of
  open custom tags; on a matching close, collapses the inner span
  into a single `Span { start, end, Block }`. Self-closing custom
  tags become `Span { ..., SelfClosing }` immediately. Marker-only
  mode (`compress_tagged_content=true`) emits Open/CloseMarker spans
  instead. Orphan opens stay un-protected (matches Python behavior).
  Orphan closes are emitted verbatim and counted in stats.
* Phase 2 (`emit_output`): walks `text` once, splicing placeholders
  for span ranges and copying everything else verbatim. Offset-based,
  never `str.replace`.
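
Phase 2's splice, sketched (Span fields per Phase 1; placeholder text
assumed). Because output is stitched by offset, two identical blocks
can never collapse onto one placeholder:

    struct Span { start: usize, end: usize, placeholder: String }

    fn emit_output(text: &str, spans: &[Span]) -> String {
        let mut out = String::with_capacity(text.len());
        let mut cursor = 0;
        for sp in spans { // sorted, non-overlapping
            out.push_str(&text[cursor..sp.start]);
            out.push_str(&sp.placeholder);
            cursor = sp.end;
        }
        out.push_str(&text[cursor..]);
        out
    }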

PyO3 surface: `protect_tags`, `restore_tags`, `is_html_tag`,
`known_html_tag_names`. The Python shim retires the regex internals
and re-exports `KNOWN_HTML_TAGS` (rebuilt from the Rust list) +
`_is_html_tag` for backwards compat with `content_router.py` and the
existing test surface.

# Test plan

* 25 Rust unit tests including 4 `fixed_in_3e4_*` bug-fix tests
* 27 Python tests (23 existing + 4 new `fixed_in_3e4` parity tests)
* 5 integration tests in `test_tag_protection_integration.py` pass
* `make ci-precheck` clean