feat: telemetry coverage for read-side commands + paid-tier fallback (3.1.0) #82
Merged
Mikola Lysenko (mikolalysenko) merged 11 commits on May 26, 2026
is_telemetry_disabled() now returns true when SOCKET_OFFLINE is "1" or "true". Airgap mode promises "never contact the network", and the telemetry endpoint is a network call, so honoring SOCKET_OFFLINE here keeps every command (apply, remove, rollback, plus future scan/get/etc.) compliant without per-command gating. Adds three integration tests in telemetry_helpers_e2e.rs and extends the existing test_is_telemetry_disabled unit test with the new branch (including the non-truthy values "0" and "").
Assisted-by: Claude Code:opus-4-7
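A minimal sketch of the new branch, assuming the check can be factored into a helper that takes an env-lookup closure. `is_telemetry_disabled_with` is a hypothetical name chosen so the rule can be tested without mutating process env; the pre-existing opt-out checks are elided, and only the `SOCKET_OFFLINE` rule described above is shown, with exact matching on "1" and "true".

```rust
/// Hypothetical testable core of `is_telemetry_disabled()`: the env lookup is
/// injected so tests don't have to mutate global process state.
fn is_telemetry_disabled_with<F>(lookup: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    // Pre-existing opt-out checks would go here (elided in this sketch).

    // Airgap mode: SOCKET_OFFLINE set to "1" or "true" disables telemetry.
    // Anything else, including "0" and "", leaves telemetry enabled.
    matches!(
        lookup("SOCKET_OFFLINE").as_deref(),
        Some("1") | Some("true")
    )
}

/// Thin wrapper over the real process environment.
fn is_telemetry_disabled() -> bool {
    is_telemetry_disabled_with(|key| std::env::var(key).ok())
}
```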
…eeping + vex
Extends PatchTelemetryEventType with 12 new variants covering scan, get (which emits patch_fetched / patch_fetch_failed, for symmetry with the existing apply naming convention), list, repair, setup, unlock, and the new vex (OpenVEX) command. Adds matching convenience tracker functions that funnel through the existing track_patch_event send path; no new HTTP plumbing.
The scan/get trackers carry a fallback_to_proxy flag so we can measure how often the auth endpoint downgrades to the public proxy once that fallback path lands. No call sites yet: wiring into each command file follows in subsequent commits, so this commit stays a pure data-model addition.
Assisted-by: Claude Code:opus-4-7
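For orientation, the data-model addition could look roughly like this. The wire names are the ones listed in the wiring commit that follows; the Rust variant names, the `NEW_VARIANTS` constant, and `as_str` are illustrative assumptions, and the pre-existing apply/remove/rollback variants are left out.

```rust
/// Sketch of the 12 event types added in this commit (pre-existing variants
/// omitted). Variant names are hypothetical; wire names follow the commit log.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PatchTelemetryEventType {
    PatchScanned,
    PatchScanFailed,
    PatchFetched,
    PatchFetchFailed,
    PatchListed,
    PatchRepaired,
    PatchRepairFailed,
    PatchSetup,
    PatchUnlocked,
    PatchUnlockFailed,
    VexGenerated,
    VexFailed,
}

impl PatchTelemetryEventType {
    /// Every variant introduced here, handy for exhaustiveness checks in tests.
    const NEW_VARIANTS: [PatchTelemetryEventType; 12] = [
        Self::PatchScanned,
        Self::PatchScanFailed,
        Self::PatchFetched,
        Self::PatchFetchFailed,
        Self::PatchListed,
        Self::PatchRepaired,
        Self::PatchRepairFailed,
        Self::PatchSetup,
        Self::PatchUnlocked,
        Self::PatchUnlockFailed,
        Self::VexGenerated,
        Self::VexFailed,
    ];

    /// The `event_type` string sent in the telemetry payload.
    fn as_str(self) -> &'static str {
        match self {
            Self::PatchScanned => "patch_scanned",
            Self::PatchScanFailed => "patch_scan_failed",
            Self::PatchFetched => "patch_fetched",
            Self::PatchFetchFailed => "patch_fetch_failed",
            Self::PatchListed => "patch_listed",
            Self::PatchRepaired => "patch_repaired",
            Self::PatchRepairFailed => "patch_repair_failed",
            Self::PatchSetup => "patch_setup",
            Self::PatchUnlocked => "patch_unlocked",
            Self::PatchUnlockFailed => "patch_unlock_failed",
            Self::VexGenerated => "vex_generated",
            Self::VexFailed => "vex_failed",
        }
    }
}
```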
…air, unlock, vex
Each command now fires a success/failure event through the existing track_patch_event send path. Concrete coverage:
- list: patch_listed (count surfaced)
- setup: patch_setup (detected package manager: npm/pnpm)
- unlock: patch_unlocked (was_held + released metadata) + patch_unlock_failed
- repair: patch_repaired (downloaded + cleaned counts) + patch_repair_failed
- scan: patch_scanned (per-tier counts, can_access_paid, ecosystems, fallback_to_proxy=false placeholder) + patch_scan_failed when every batch errored (previously hidden as "zero patches found")
- get (UUID path only for now): patch_fetched on success, patch_fetch_failed on paid_required / not_found / API error. CVE/GHSA/PURL search-error paths also surface patch_fetch_failed.
- vex: vex_generated on success, vex_failed via a small async helper that wraps each emit_envelope_error call site.
Renamed the unlock tracker's "broken" parameter to "released": unlock never breaks a held lock (that's `--break-lock` on mutating subcommands); the bool actually describes whether the lock file was removed.
No new HTTP plumbing; trackers reuse track_patch_event. Behavior preserved on existing apply/remove/rollback paths.
Assisted-by: Claude Code:opus-4-7
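The `patch_scan_failed` rule (fire only when every batch errored, instead of reporting "zero patches found") can be sketched as a pure classifier. `BatchResult`, `ScanTelemetry`, and `classify_scan` are hypothetical stand-ins, not the real scan types; treating an empty batch list as a successful zero-result scan is also an assumption.

```rust
/// Hypothetical outcome of one scan batch: patches found, or an error message.
type BatchResult = Result<usize, String>;

#[derive(Debug, PartialEq)]
enum ScanTelemetry {
    /// At least one batch succeeded (or there were no batches at all).
    Scanned { patches_found: usize, failed_batches: usize },
    /// Every batch errored: surfaced as a failure, not as "zero patches".
    ScanFailed { failed_batches: usize },
}

fn classify_scan(batches: &[BatchResult]) -> ScanTelemetry {
    let failed_batches = batches.iter().filter(|b| b.is_err()).count();
    if !batches.is_empty() && failed_batches == batches.len() {
        return ScanTelemetry::ScanFailed { failed_batches };
    }
    // Partial failures still count as a scan; only successful batches add patches.
    let patches_found: usize = batches.iter().filter_map(|b| b.as_ref().ok()).sum();
    ScanTelemetry::Scanned { patches_found, failed_batches }
}
```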
Three new cargo tests in scan_invariants.rs cover patch-management behaviors the existing matrix didn't pin down:
- scan_prune_keeps_entry_when_package_installed_but_api_silent: a manifest entry must survive --prune when the underlying package is still installed locally but the API no longer returns patches for it. Pins the current --prune scope (crawl-absence, not API-absence) so a future regression to over-pruning is loud.
- scan_prune_removes_withdrawn_patch_entry: when the underlying package is uninstalled (no longer in crawl results), --prune removes the manifest entry even with a stale blob still on disk. The blob is left for the existing repair-side GC to handle.
- scan_detects_update_without_touching_existing_blobs: a newer UUID from the API surfaces in the `updates` array, but scan without --apply must leave the on-disk manifest and blobs byte-for-byte unchanged (read-only invariant).
Assisted-by: Claude Code:opus-4-7
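The `--prune` scope these tests pin down (crawl-absence, not API-absence) reduces to a filter on the crawl set. This is a sketch under assumptions: `ManifestEntry` and `prune` are hypothetical, entries are keyed by a PURL string, and blob cleanup is left out because the commit leaves it to repair-side GC.

```rust
use std::collections::HashSet;

/// Hypothetical manifest entry: which package a patch targets, and its UUID.
#[derive(Debug, Clone, PartialEq)]
struct ManifestEntry {
    purl: String,
    uuid: String,
}

/// `--prune` scope as pinned by the tests: keep an entry while its package is
/// still present in the local crawl, no matter what the API currently says.
/// Blob files are not touched here; repair-side GC owns them.
fn prune(
    entries: Vec<ManifestEntry>,
    crawled_purls: &HashSet<String>,
    // Deliberately unused: API silence is NOT a reason to prune.
    _api_patched_purls: &HashSet<String>,
) -> Vec<ManifestEntry> {
    entries
        .into_iter()
        .filter(|entry| crawled_purls.contains(&entry.purl))
        .collect()
}
```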
…airgap
New tests/telemetry_e2e.rs spawns the released binary against a
wiremock server that fronts both the patches endpoints AND the
telemetry endpoint, then counts POSTs against
/v0/orgs/{slug}/telemetry filtered by event_type.
Coverage:
- scan_emits_patch_scanned_telemetry_on_success
- list_emits_patch_listed_telemetry_when_telemetry_enabled
- get_emits_patch_fetched_telemetry_on_uuid_lookup_success
(tolerates either fetched/fetch_failed — the apply step is allowed
to fail in the test env; the invariant is that *some* event fires)
- {apply,scan,get,list}_skips_telemetry_in_airgap_mode — confirms the
central is_telemetry_disabled() gate suppresses everything when
SOCKET_OFFLINE=1, regardless of command.
Caught a real test-only bug along the way: send_telemetry_event reads
SOCKET_API_URL from the *environment*, not from the clap --api-url
arg. The test harness now sets both env + flag so the telemetry POST
lands on the same mock recording the API requests.
Assisted-by: Claude Code:opus-4-7
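The counting step in these tests can be pictured as a filter over the requests the mock server recorded. A std-only sketch: `RecordedRequest` is a hypothetical stand-in for the mock's captured request, and it assumes `event_type` has already been parsed out of the JSON body. The airgap tests then expect a count of zero for every event type.

```rust
/// Hypothetical stand-in for a request captured by the mock server.
struct RecordedRequest {
    method: String,
    path: String,
    /// Assumed to be already parsed out of the JSON body.
    event_type: Option<String>,
}

/// True for `/v0/orgs/{slug}/telemetry` with any non-empty slug.
fn is_telemetry_path(path: &str) -> bool {
    let rest = match path.strip_prefix('/') {
        Some(rest) => rest,
        None => return false,
    };
    let segments: Vec<&str> = rest.split('/').collect();
    matches!(segments.as_slice(), ["v0", "orgs", slug, "telemetry"] if !slug.is_empty())
}

/// Counts telemetry POSTs carrying the given event_type.
fn count_telemetry_events(requests: &[RecordedRequest], event_type: &str) -> usize {
    requests
        .iter()
        .filter(|r| r.method == "POST")
        .filter(|r| is_telemetry_path(&r.path))
        .filter(|r| r.event_type.as_deref() == Some(event_type))
        .count()
}
```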
Adds `build_proxy_fallback_client(&overrides)` + `is_fallback_candidate(&err)` in api/client.rs. The constructor builds a public-proxy-mode ApiClient from the same overrides used by `get_api_client_with_overrides`, deliberately dropping the auth token. The classifier flags 401/403 errors as fallback-eligible; everything else (404, 5xx, network, rate-limit, parse) surfaces unchanged.
`scan.rs` and `get.rs` (UUID path) catch the first such error from the authenticated endpoint, log a warning to stderr, rebuild the client, retry the same request once, and continue. A new `fallback_to_proxy` bool, plumbed through to the existing telemetry trackers, records each fallback in telemetry.
Behavior is deliberately conservative:
- Read commands only: `apply`/`remove`/`rollback`/`vex` keep their pre-existing fail-loud-on-auth semantics.
- 404, 5xx, network, rate-limit, and parse errors do NOT trigger fallback; they surface as before so backend issues stay visible.
- Free patches still resolve via the proxy; paid patches return the same "paid_required" structured error the no-token path already emits.
Assisted-by: Claude Code:opus-4-7
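The classify-then-retry-once flow can be sketched with plain closures. `ApiError`, its variants, and `with_proxy_fallback` are hypothetical (rate limiting is modeled as `Http(429)`); only the 401/403 rule, the single retry, the stderr warning, and the returned `fallback_to_proxy` flag come from the description above. Errors from the proxy attempt itself, such as paid_required, are returned unchanged.

```rust
/// Hypothetical error type standing in for the real API client error.
#[derive(Debug, Clone, PartialEq)]
enum ApiError {
    /// Non-2xx HTTP status (429 stands in for rate limiting here).
    Http(u16),
    Network(String),
    Parse(String),
}

/// Only 401/403 are fallback-eligible; everything else surfaces unchanged.
fn is_fallback_candidate(err: &ApiError) -> bool {
    matches!(err, ApiError::Http(401) | ApiError::Http(403))
}

/// Runs `authed`; on a fallback-eligible error, warns on stderr and runs
/// `proxy` exactly once. The bool is the value for `fallback_to_proxy`.
fn with_proxy_fallback<T>(
    authed: impl FnOnce() -> Result<T, ApiError>,
    proxy: impl FnOnce() -> Result<T, ApiError>,
) -> (Result<T, ApiError>, bool) {
    match authed() {
        Err(err) if is_fallback_candidate(&err) => {
            eprintln!("warning: authenticated request failed ({:?}); retrying via public proxy", err);
            (proxy(), true)
        }
        other => (other, false),
    }
}
```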
Two new tests in telemetry_e2e.rs:
- scan_falls_back_to_proxy_on_401_and_tags_telemetry: stands up two mock servers (auth endpoint returns 401, proxy endpoint succeeds) and asserts that scan exits 0 after the swap, the fallback warning hits stderr, and the resulting patch_scanned event carries fallback_to_proxy: true in its metadata.
- scan_does_not_fall_back_on_500: pins the conservative scope of the classifier. A 500 from the auth endpoint must NOT trigger the proxy retry, since backend errors should stay visible. Asserts zero hits against the proxy mock and no fallback warning on stderr.
Assisted-by: Claude Code:opus-4-7
Workspace Cargo.toml, all npm wrapper + per-platform packages, and the PyPI pyproject.toml synced via scripts/version-sync.sh (with a manual fixup for the per-platform packages, since npm install couldn't process the workspace `catalog:` protocol).
CHANGELOG entry covers: telemetry events across the read-side and housekeeping commands, the 401/403 auth → public-proxy fallback in scan/get, the SOCKET_OFFLINE airgap gate, and the new behavioral + lifecycle test coverage that backs all of it.
Assisted-by: Claude Code:opus-4-7
cargo clippy --workspace --all-features -- -D warnings flagged track_patch_scanned under clippy::too_many_arguments (8 arguments against the default limit of 7). Grouping the per-tier counts + ecosystems list + fallback flag + auth tuple into a struct would force every call site to build a config object for a single fire-and-forget tracker, which is worse ergonomics. Allowing the lint on that function is the right call; `track_patch_event` already exists for callers that want full control.
Assisted-by: Claude Code:opus-4-7
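For reference, the annotation amounts to an outer `#[allow(clippy::too_many_arguments)]` on the tracker. The eight parameters and the returned metadata pairs below are illustrative guesses at the shape the commit describes, not the real signature; the real tracker hands its data to `track_patch_event` rather than returning it.

```rust
/// Illustrative only: eight parameters, one over clippy's default
/// `too_many_arguments` threshold of 7, so the lint is allowed on this fn.
#[allow(clippy::too_many_arguments)]
fn track_patch_scanned(
    packages_scanned: usize,
    free_patches: usize,
    paid_patches: usize,
    can_access_paid: bool,
    ecosystems: &[&str],
    fallback_to_proxy: bool,
    api_token: Option<&str>,
    org_slug: Option<&str>,
) -> Vec<(&'static str, String)> {
    // The real tracker would pass the auth values on to the send path;
    // this sketch only acknowledges them.
    let _ = (api_token, org_slug);
    vec![
        ("packages_scanned", packages_scanned.to_string()),
        ("free_patches", free_patches.to_string()),
        ("paid_patches", paid_patches.to_string()),
        ("can_access_paid", can_access_paid.to_string()),
        ("ecosystems", ecosystems.join(",")),
        ("fallback_to_proxy", fallback_to_proxy.to_string()),
    ]
}
```

The struct alternative would need a builder or literal at every call site; the single allow keeps call sites flat.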
The dashboard displays an SRI-format hash (`sha512-<base64>`) of each API token for identification. That is the value stored in api_tokens.hash, NOT what to set in SOCKET_API_TOKEN. Users who copy the displayed hash hit a confusing 401 "Invalid API token" with no hint about the mistake.
Adds two pure helpers in api/client.rs:
- validate_token_shape(): non-authoritative shape check against sktsec_<44>_api / sktsec_<44>_agent. Returns a warning message with a redacted preview of the token when the shape is obviously wrong.
- looks_like_token_hash(): true for sha256-/sha384-/sha512- prefixes.
Wires them into:
- get_api_client_with_overrides: warns on stderr before the first network call when the configured token is malformed.
- resolve_org_slug's 401 branch: appends a "you set the hash, not the token" hint when both conditions are met (Unauthorized + the token starts with sha###-).
Six new unit tests cover the canonical + agent shapes, the SRI hash, short tokens, a missing suffix, and the SRI-prefix detector. README's env-var table now spells out the distinction in one sentence.
Pure additive: valid tokens see no output. The server's regex remains the source of truth; we only flag values that are obviously malformed client-side so the user doesn't waste a round trip.
Assisted-by: Claude Code:opus-4-7
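A sketch of the two helpers plus the 401 hint, under stated assumptions: the 44-character middle segment is taken to be ASCII alphanumeric (the server's regex stays authoritative and may differ), the 8-character preview length is arbitrary, and `unauthorized_hint` is a hypothetical name for the text appended in `resolve_org_slug`'s 401 branch.

```rust
const TOKEN_PREFIX: &str = "sktsec_";
const TOKEN_SUFFIXES: [&str; 2] = ["_api", "_agent"];
// Assumption: the middle segment is 44 ASCII alphanumeric characters.
const TOKEN_BODY_LEN: usize = 44;
const SRI_PREFIXES: [&str; 3] = ["sha256-", "sha384-", "sha512-"];

/// True when the value looks like the dashboard's SRI hash of a token.
fn looks_like_token_hash(token: &str) -> bool {
    SRI_PREFIXES.iter().any(|p| token.starts_with(*p))
}

/// Non-authoritative shape check. Returns None for a well-formed token, or a
/// warning that includes only a short redacted preview of the value.
fn validate_token_shape(token: &str) -> Option<String> {
    let well_formed = token
        .strip_prefix(TOKEN_PREFIX)
        .and_then(|rest| TOKEN_SUFFIXES.iter().find_map(move |s| rest.strip_suffix(*s)))
        .map_or(false, |body| {
            body.len() == TOKEN_BODY_LEN && body.chars().all(|c| c.is_ascii_alphanumeric())
        });
    if well_formed {
        return None;
    }
    // Never echo the full value: show the first few characters and the length.
    let preview: String = token.chars().take(8).collect();
    Some(format!(
        "warning: SOCKET_API_TOKEN does not look like a Socket API token \
         (starts with {:?}, {} chars)",
        preview,
        token.chars().count()
    ))
}

/// Hint appended on a 401 only when the configured value is an SRI hash.
fn unauthorized_hint(token: &str) -> Option<&'static str> {
    if looks_like_token_hash(token) {
        Some("hint: SOCKET_API_TOKEN is set to the token's hash, not the token")
    } else {
        None
    }
}
```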
Setting SOCKET_OFFLINE=1, SOCKET_DEBUG=1, or any other bool global
arg via env failed at clap parse time with:
error: invalid value '1' for '--offline'
[possible values: true, false]
clap's default bool parser only accepts "true"/"false". The internal
env-mirroring in apply_env_toggles() already writes "1" when a flag
is passed (so downstream code in telemetry.rs reads "1" via
read_env_with_legacy), and that internal read-side accepts both "1"
and "true". The user-facing input side was the asymmetric piece.
Wires BoolishValueParser (accepts "true"/"false"/"yes"/"no"/"1"/"0"/
"on"/"off"/"y"/"n") onto every bool global with an env attribute:
offline, global, json, verbose, silent, dry_run, yes, break_lock,
debug, no_telemetry.
CLI flag usage (--debug, --offline, etc.) is unchanged. Env var
usage now matches the canonical "1 means yes" convention every
operator expects.
Assisted-by: Claude Code:opus-4-7
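In the CLI the fix is the `clap::builder::BoolishValueParser` wiring described above; the function below only illustrates the ten spellings the commit lists. It is a hypothetical std-only helper with exact, case-sensitive matching, and clap's own parser may accept additional spellings.

```rust
/// Hypothetical helper mirroring the env spellings listed in the commit.
/// Matching is exact and case-sensitive here; clap may be more lenient.
fn parse_boolish(raw: &str) -> Result<bool, String> {
    match raw {
        "1" | "true" | "yes" | "y" | "on" => Ok(true),
        "0" | "false" | "no" | "n" | "off" => Ok(false),
        other => Err(format!("invalid value '{}'", other)),
    }
}
```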
Wenxin Jiang (Wenxin-Jiang) approved these changes on May 26, 2026
Summary
- 12 new `PatchTelemetryEventType` variants + thirteen tracker functions; all flow through the existing `track_patch_event` send path.
- `scan` and `get` automatically fall back from the authenticated API to the public proxy on 401/403, so a stale or revoked token no longer blocks free patches. A warning goes to stderr and the resulting telemetry event is tagged `fallback_to_proxy: true`. The classifier is conservative: 404, 5xx, network, and rate-limit errors do NOT trigger fallback.
- `SOCKET_OFFLINE=1` (airgap) now disables telemetry universally via `is_telemetry_disabled()`, so apply no longer attempts a 5-second telemetry POST against `api.socket.dev` when the operator explicitly requested airgap.
Test plan
- `cargo test --workspace --all-features` exits 0 locally.
- `tests/telemetry_e2e.rs`: apply/scan/get/list each fire telemetry against a wiremock recorder; `SOCKET_OFFLINE=1` produces zero `/telemetry` POSTs for all four; scan falls back on 401 and tags the resulting event; scan does NOT fall back on 500.
- `scan_invariants.rs`: withdrawn-patch lifecycle (preserve-on-API-silence, prune-on-uninstall, scan-without-apply-is-read-only).
- `telemetry_helpers_e2e.rs`: `SOCKET_OFFLINE` branch of `is_telemetry_disabled()` (truthy + non-truthy values).
- CHANGELOG `## [3.1.0]` reads accurately.
Notes for reviewers
- `apply`/`remove`/`rollback`/`vex` keep their fail-loud semantics; the proxy fallback is intentionally read-side only.
- Versions synced via `scripts/version-sync.sh` (the npm workspace `catalog:` protocol blocked the npm install step; per-platform packages + pyproject finished manually).
Assisted-by: Claude Code:opus-4-7