feat: Offline mode for Pipelex Gateway setup and dry-run #900
Lays out a TDD plan for offline-safe Pipelex setup: cache remote config on first init, fall back to cache when network is unavailable, and fail clearly when a referenced gateway model is missing from both fresh and cached specs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply 9 review-driven edits to TODOS.md: move source provenance off GatewayConfig, cache raw JSON, keep RemoteConfigFetchError, require fresh data for doc generators, add retry-exhaustion and regression tests, replace test env-var backdoor with PIPELEX_REMOTE_CONFIG_URL.
…and provenance tracking
- Refactored `RemoteConfigFetcher.fetch_remote_config()` to return a `RemoteConfigResult` containing the fetched config, source of the config (FRESH or CACHED), and cache timestamp.
- Introduced `RemoteConfigUnavailableError` for scenarios where both network fetch and cache fallback fail, providing user-facing error messages with remediation steps.
- Added `RemoteConfigStaleWarning` to indicate when a cached config is used due to network issues.
- Updated all existing callers of `fetch_remote_config()` to accommodate the new return type and error handling.
- Enhanced tests to cover new behaviors, including success cases, network failures, and validation errors.
- Ensured that the internal retry logic raises `RemoteConfigFetchError` while the outer layer handles user-facing errors appropriately.
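The fetch-then-fall-back flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the Pipelex implementation: the `fetch` and `load_cache` callables are hypothetical stand-ins for the network call and the on-disk cache, `cached_at` being `None` for fresh results is an assumption, and raising `RemoteConfigUnavailableError` on a `require_fresh` refusal is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class RemoteConfigSource(str, Enum):
    FRESH = "fresh"
    CACHED = "cached"


class RemoteConfigFetchError(Exception):
    """Internal error: the network fetch failed after retries."""


class RemoteConfigUnavailableError(Exception):
    """User-facing error: no fresh config and no usable cache."""


@dataclass(frozen=True)
class RemoteConfigResult:
    config: dict
    source: RemoteConfigSource
    cached_at: Optional[float]  # assumption: None when the config is fresh


def fetch_remote_config(
    fetch: Callable[[], dict],  # hypothetical network call
    load_cache: Callable[[], Optional[Tuple[dict, float]]],  # hypothetical cache reader
    require_fresh: bool = False,
) -> RemoteConfigResult:
    try:
        return RemoteConfigResult(fetch(), RemoteConfigSource.FRESH, None)
    except RemoteConfigFetchError as exc:
        if require_fresh:
            # Doc/fixture generators must not be fed cached data.
            raise RemoteConfigUnavailableError("fresh remote config required; cache refused") from exc
        cached = load_cache()
        if cached is None:
            raise RemoteConfigUnavailableError("remote config unreachable and no usable local cache") from exc
        config, cached_at = cached
        return RemoteConfigResult(config, RemoteConfigSource.CACHED, cached_at)
```

Callers can branch on `result.source` to decide, for example, whether to emit a stale-cache warning.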
…y specs
- Added GatewayUnknownModelError to handle cases where a model referenced in the deck is not found in the active gateway specs.
- Enhanced model manager to enforce gateway model membership, raising the new error when discrepancies are detected.
- Updated remote config fetcher to include source provenance (FRESH vs CACHED) for better error messaging and telemetry control.
- Refactored related tests to ensure proper coverage for the new error handling and gateway configuration scenarios.
- Introduced RemoteConfigSource enum to streamline source tracking for remote configurations.
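A source-aware membership check along the lines of this commit might look like the sketch below. The function name, its parameters, and the hint wording are illustrative assumptions; only the error name, the FRESH/CACHED distinction, and the idea of skipping the check when no source is set come from this PR.

```python
from enum import Enum
from typing import Iterable, Optional, Set


class RemoteConfigSource(str, Enum):
    FRESH = "fresh"
    CACHED = "cached"


class GatewayUnknownModelError(Exception):
    pass


def check_gateway_membership(
    deck_handles: Iterable[str],
    gateway_handles: Set[str],
    source: Optional[RemoteConfigSource],
) -> None:
    """Raise if the deck references a handle absent from the gateway specs."""
    if source is None:
        # No gateway specs were loaded, so there is nothing to validate against.
        return
    missing = sorted({handle for handle in deck_handles if handle not in gateway_handles})
    if not missing:
        return
    if source is RemoteConfigSource.CACHED:
        # Hypothetical wording: cached specs may simply be out of date.
        hint = "Specs came from the local cache and may be stale; go online to refresh them."
    else:
        hint = "Specs were fetched fresh; check the handle for typos."
    raise GatewayUnknownModelError(f"Unknown gateway model(s): {', '.join(missing)}. {hint}")
```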
…hen cache is refused
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dbaa02743d
Greptile Summary
This PR adds offline support for Pipelex Gateway setup, validation, and dry-run flows. The main changes are:
Confidence Score: 5/5
This looks safe to merge.
Important Files Changed
Reviews (8): Last reviewed commit: "fix: re-validate cached payload when pri..."
8 issues found across 36 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="pipelex/cogt/models/model_manager.py">
<violation number="1" location="pipelex/cogt/models/model_manager.py:143">
P2: Gateway membership collection misses extract/img_gen/search choice defaults, so invalid default handles for those model types are not validated.</violation>
<violation number="2" location="pipelex/cogt/models/model_manager.py:183">
P1: Bare HANDLE references are not resolved through alias/waterfall mappings, which can raise `GatewayUnknownModelError` for valid deck references.</violation>
<violation number="3" location="pipelex/cogt/models/model_manager.py:195">
P1: Add cycle detection in the WATERFALL resolution path; without a visited guard, a self-referential or cyclic waterfall can loop indefinitely and hang setup.</violation>
<violation number="4" location="pipelex/cogt/models/model_manager.py:200">
P1: Waterfall membership validation only inspects the first fallback, causing false unknown-model errors when later fallbacks are valid.</violation>
</file>
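The alias/waterfall findings above (resolve bare handles, guard against cycles, inspect every fallback) could be addressed with an expansion routine like this sketch. The `aliases` and `waterfalls` mappings and the function name are hypothetical shapes chosen for illustration, not the structures used in `model_manager.py`.

```python
from typing import Dict, List, Set, Tuple


def collect_candidates(
    ref: str,
    aliases: Dict[str, str],
    waterfalls: Dict[str, List[str]],
) -> List[str]:
    """Expand a reference into every concrete handle it can resolve to."""
    candidates: List[str] = []
    # The visited set is keyed by (kind, name) so that an alias and a
    # waterfall sharing an identifier are each expanded once.
    visited: Set[Tuple[str, str]] = set()
    stack = [ref]
    while stack:
        name = stack.pop()
        if name in waterfalls and ("waterfall", name) not in visited:
            visited.add(("waterfall", name))
            # Push in reverse so fallbacks are popped in declared order.
            stack.extend(reversed(waterfalls[name]))
        elif name in aliases and ("alias", name) not in visited:
            visited.add(("alias", name))
            stack.append(aliases[name])
        elif name not in waterfalls and name not in aliases:
            # A concrete handle: record it once.
            if name not in candidates:
                candidates.append(name)
        # Otherwise the name was already expanded: a cycle, so skip it.
    return candidates
```

A membership check can then consider every returned candidate rather than only the first fallback.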
<file name="pipelex/cli/commands/init/command.py">
<violation number="1" location="pipelex/cli/commands/init/command.py:82">
P2: Cache priming checks gateway enablement from layered/project-preferred config instead of the init target directory, so global vs local init can prime (or skip) based on the wrong backends.toml.</violation>
</file>
<file name="pipelex/pipelex.py">
<violation number="1" location="pipelex/pipelex.py:219">
P2: Do not mark the dummy no-model-specs path as `FRESH`; setting a source here triggers gateway membership validation against empty placeholder specs and can break commands that intentionally skip spec loading.</violation>
</file>
<file name="pipelex/system/pipelex_service/remote_config_fetcher.py">
<violation number="1" location="pipelex/system/pipelex_service/remote_config_fetcher.py:237">
P1: Handle `OSError` around cache persistence so an unwritable `~/.pipelex/cache` does not fail an otherwise successful fresh remote-config fetch.</violation>
</file>
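One way to keep an unwritable cache directory from failing a successful fetch is to treat the write as best-effort, as in this sketch. The helper name and the warning text are assumptions for illustration.

```python
import json
import sys
from pathlib import Path


def write_cache_best_effort(cache_path: Path, payload: dict) -> bool:
    """Try to persist the payload; report failure instead of raising."""
    try:
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(json.dumps(payload), encoding="utf-8")
    except OSError as exc:
        # Read-only or full cache dir: warn on stderr and carry on.
        print(f"warning: could not write cache at {cache_path}: {exc}", file=sys.stderr)
        return False
    return True
```

Returning a boolean lets a caller such as cache priming distinguish "fetched and cached" from "fetched but not cached".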
<file name="tests/e2e/agent_cli/test_offline_run_dry.py">
<violation number="1" location="tests/e2e/agent_cli/test_offline_run_dry.py:83">
P3: Parse and return the last JSON object in the CLI output, not the first decodable one, so preamble JSON fragments don't get mistaken for the final agent envelope.</violation>
</file>
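For the test-helper finding above, returning the last JSON object in mixed CLI output can be done with `json.JSONDecoder.raw_decode`, as in this sketch. The function name is hypothetical; the scan skips past each decoded object so nested objects are not mistaken for top-level ones.

```python
import json


def last_json_object(output: str) -> dict:
    """Return the last top-level JSON object found in the output text."""
    decoder = json.JSONDecoder()
    last = None
    idx = output.find("{")
    while idx != -1:
        try:
            obj, end = decoder.raw_decode(output, idx)
        except json.JSONDecodeError:
            # Not valid JSON from this brace: try the next one.
            idx = output.find("{", idx + 1)
            continue
        last = obj
        # Resume after the decoded span so inner braces are skipped.
        idx = output.find("{", end)
    if last is None:
        raise ValueError("no JSON object found in output")
    return last
```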
1 issue found across 11 files (changes from recent commits).
<file name="pipelex/cogt/models/model_manager.py">
<violation number="1" location="pipelex/cogt/models/model_manager.py:243">
P2: Cycle detection in `_collect_candidates` uses only the reference name, so alias/waterfall entries with the same identifier are falsely treated as cycles. This can produce an empty candidate list and incorrectly skip gateway membership validation.</violation>
</file>
…handling for remote config issues
Adds the `warnings` field to the agent CLI JSON success contract in agent-cli.md (was missing the field this branch introduces), and notes remote-config cache priming in init.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents how Pipelex stays usable when the Gateway remote config service is unreachable: BYOK skips the fetch entirely, Gateway mode falls back to the primed on-disk cache, and only live inference still needs the network. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A directory containing a .pipelex/ config dir is now recognized as a project root. Previously such a directory fell through to the global ~/.pipelex/ config, silently ignoring the project's own overrides. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
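The lookup order this commit describes (a project's own `.pipelex/` wins over the global one) can be illustrated with the sketch below. Walking up parent directories is an assumption made for the example, as are the function and parameter names; the commit only states that a directory containing `.pipelex/` is now recognized as a project root.

```python
from pathlib import Path


def find_config_dir(start: Path, global_dir: Path) -> Path:
    """Prefer the nearest project-level .pipelex/ dir, else the global one."""
    for directory in (start, *start.parents):
        candidate = directory / ".pipelex"
        # The global config dir itself does not mark a project root.
        if candidate.is_dir() and candidate != global_dir:
            return candidate
    return global_dir
```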
# Conflicts: # CHANGELOG.md
Address two P1 review findings on the offline-mode work:
- remote_config_fetcher: a cache with a valid wrapper but a malformed raw_config let a raw Pydantic ValidationError escape the offline fallback. Catch it and raise RemoteConfigUnavailableError with the normal remediation. Reword the message to "no usable local cache" so it is accurate for both missing and unusable caches.
- preprocess_test_models_cmd: _fetch_gateway_models swallowed require_fresh refusals into empty model lists, letting offline fixture generation proceed without any pipelex_gateway entries. Let the error propagate and surface a clear offline-mode panel.
Adds regression tests for both paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A successful remote-config fetch does not guarantee the on-disk cache was written: RemoteConfigFetcher treats the cache write as opportunistic and swallows OSErrors (read-only / full cache dir) with only a stderr warning. attempt_prime_remote_config_cache trusted the fetch result alone, so it could return primed=True while no usable cache existed, making `pipelex-agent init` emit `cache_primed: true` and leaving later offline runs to fail with RemoteConfigUnavailableError. Verify a usable cache exists via RemoteConfigCache.load() after the fetch; report priming failure with a clear remediation message otherwise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 issue found across 2 files (changes from recent commits).
<file name="pipelex/cli/commands/init/command.py">
<violation number="1" location="pipelex/cli/commands/init/command.py:112">
P2: The new priming read-back check treats `RemoteConfigCache.load()` as a usability check, but it only validates the cache wrapper. This can still report `primed=True` with an unusable cached payload.</violation>
</file>
The priming read-back check treated RemoteConfigCache.load() as a usability check, but load() only validates the cache wrapper, not the inner raw_config payload. A malformed payload could still report primed=True. Now call to_remote_config() and treat a ValidationError as a non-primed result, matching the existing check in RemoteConfigFetcher.fetch_remote_config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
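The two-level read-back described here (wrapper first, then the inner payload) might be sketched as follows. The `validate_payload` callable is a hypothetical stand-in for the real model validation and is assumed to raise `ValueError` on a malformed payload; the `raw_config` key is the wrapper field named in this PR.

```python
import json
from pathlib import Path
from typing import Callable


def cache_is_usable(cache_path: Path, validate_payload: Callable[[dict], object]) -> bool:
    """True only if the wrapper loads and the inner payload validates."""
    try:
        wrapper = json.loads(cache_path.read_text(encoding="utf-8"))
        raw_config = wrapper["raw_config"]
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, invalid JSON, or a wrapper without raw_config.
        return False
    try:
        validate_payload(raw_config)
    except ValueError:
        # The wrapper was fine but the payload inside it is not.
        return False
    return True
```

Priming would then report success only when this check passes after the fetch.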
Summary
- On-disk cache at `~/.pipelex/cache/remote_config.json` that Pipelex Gateway falls back to when the remote config service is unreachable. Setup, validation, and `pipelex-agent run bundle --dry-run` complete normally offline; only the actual inference call still needs the network at runtime. The cache is primed on every successful fetch and at `pipelex init` while online. When the gateway is disabled (BYOK), no remote fetch is attempted at all.
- `RemoteConfigStaleWarning` (`UserWarning`) surfaced on the agent-CLI JSON envelope as `warnings: [{"type": "RemoteConfigStale", ...}]`; suppresses telemetry (no-op) on stale cache; refuses cache for doc/fixture generators via a new `require_fresh=True` flag.
- `RemoteConfigUnavailableError` (offline + cold cache, with two-path remediation) and `GatewayUnknownModelError` (source-aware messaging when a deck references a gateway handle absent from the fresh-or-cached specs). Both wired through the Rich CLI (`error_handlers.py`) and the agent CLI (`AGENT_ERROR_HINTS` / `AGENT_ERROR_DOMAINS`).
- `RemoteConfigFetcher.fetch_remote_config()` now returns a `RemoteConfigResult(config, source, cached_at)`; `ModelManager.setup()` and `BackendLibrary._load_gateway_model_specs()` accept a `gateway_config_source: RemoteConfigSource | None` parameter so the membership check can branch its error message on FRESH vs CACHED. `GatewayConfig` stays `extra="forbid"` and source-free; provenance is plumbed alongside.
- `PIPELEX_REMOTE_CONFIG_URL` env var to override the default URL (useful for staging/testing).

See `TODOS.md` for the full phased implementation plan (Phases 0–7) and the per-checkpoint status blocks with rationale, decisions, and verification notes.

Related
- `mthds-plugins/wip/codex-sandbox-escalation.md`: `--dry-run` no longer requires escalation once Pipelex has been initialised online once.

Deferred follow-ups
Tracked at the bottom of `TODOS.md`:
- `pipelex doctor` cache reporting (presence, age, missing-cache hint).
- `mthds-plugins/wip/codex-sandbox-escalation.md`.
- `pipelex-agent run bundle --output-dir <path>` flag for read-only mounted bundles.
- `GatewayUnknownModelError` end-to-end via a deck override that points a preset at a missing handle.

Test plan
- `make agent-check`: clean (ruff fix-imports, ruff format, plxt fmt, ruff lint, plxt lint, pyright, mypy).
- `make agent-test`: full suite green.
- BYOK offline succeeds (`test_byok_offline_succeeds`).
- Gateway with a known model and a primed cache succeeds offline (`test_gateway_known_with_cache_succeeds_offline`).
- Gateway with no cache and no network fails with `RemoteConfigUnavailableError` (E2E `test_gateway_no_cache_no_network_fails_with_unavailable`).
- Unknown gateway model raises `GatewayUnknownModelError` (integration `test_gateway_unknown_model.py` + E2E).
- `pipelex-dev update-gateway-models` offline → clear refusal, no stale docs written (manual via `PIPELEX_REMOTE_CONFIG_URL=http://127.0.0.1:1/...`, pinned in `test_require_fresh_refuses_cache`).

Documentation
Docs updated in this branch:
- `docs/tools/cli/agent-cli.md`: added the `warnings` array to the agent CLI JSON success contract. The branch introduces this top-level envelope field (`warnings: [{"type": "RemoteConfigStale", ...}]`) but the machine-facing output contract didn't document it. Added a JSON example and noted that `RemoteConfigStale` is emitted on offline cache fallback.
- `docs/tools/cli/init.md`: added an "Offline cache priming" note: `pipelex init` now primes `~/.pipelex/cache/remote_config.json` when the gateway is enabled, and warns (without failing) when run offline.
- `docs/features/gateway.md`: added an "Offline Behavior" section explaining BYOK vs Gateway offline modes, the cache fallback, the stale warning, and the `RemoteConfigUnavailableError` cold-cache case.

CHANGELOG
`[Unreleased]` already documents the feature accurately and follows the repo's Keep-a-Changelog style; no voice changes needed.

Documentation Debt
Remaining gap (shipped surface with no coverage in the docs site):
- `PIPELEX_REMOTE_CONFIG_URL`: the new env var is not documented under `docs/configuration/`. Reference gap; niche (staging/testing override), low priority.

No architecture diagrams reference the changed modules; no diagram drift.
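How a `UserWarning` subclass can end up in the JSON envelope's `warnings` array is sketched below using the standard `warnings` module. Only the `warnings` field and the `"RemoteConfigStale"` type come from this PR; the `success` and `message` keys, the wrapper function, and always emitting the list (even when empty) are assumptions for illustration.

```python
import warnings
from typing import Callable


class RemoteConfigStaleWarning(UserWarning):
    """Raised as a warning when a cached remote config is used."""


def run_with_envelope(command: Callable[[], dict]) -> dict:
    """Run a command and attach any stale-config warnings to its envelope."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once.
        warnings.simplefilter("always")
        result = command()
    stale = [
        {"type": "RemoteConfigStale", "message": str(item.message)}
        for item in caught
        if issubclass(item.category, RemoteConfigStaleWarning)
    ]
    # "success" is a hypothetical envelope key used for this sketch.
    return {"success": True, **result, "warnings": stale}
```

Recording warnings this way keeps them out of stderr while still exposing them to machine consumers of the envelope.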
🤖 Generated with Claude Code
Summary by cubic
Adds offline setup, validation, and dry-run for the Pipelex Gateway by caching the remote config to disk and falling back when the service is unreachable; only live inference still needs the network. Also surfaces stale-cache warnings in the agent CLI JSON and prefers a local `.pipelex/` project config over the global one.

New Features
- Disk cache at `~/.pipelex/cache/remote_config.json`, primed on successful fetches and during `pipelex init`; `RemoteConfigFetcher.fetch_remote_config(require_fresh=False)` returns `RemoteConfigResult { config, source (FRESH|CACHED), cached_at }`, and doc/fixture tools use `require_fresh=True` to refuse cached data.
- Emit `RemoteConfigStaleWarning` when falling back to cache and attach it to the agent CLI success envelope as `warnings: [...]`; disable telemetry on cached configs; BYOK skips remote fetch and runs fully offline; support `PIPELEX_REMOTE_CONFIG_URL` override.
- Raise `GatewayUnknownModelError` when a deck references a missing gateway model, with source-aware hints based on fresh vs cached specs.

Bug Fixes
- Recognize `.pipelex/` as a project root marker so local config isn't ignored; harden alias/waterfall resolution with cycle detection and full fallback expansion.
- A missing or unusable local cache raises `RemoteConfigUnavailableError`; cached JSON validation issues no longer leak raw errors; `preprocess-test-models` now propagates `require_fresh=True` refusals instead of silently generating empty entries.
- `pipelex init` cache priming now verifies the cache exists and re-validates the cached payload; if the cache write or validation fails, it reports clear remediation instead of misreporting success.

Written for commit 2170259. Summary will update on new commits.