feat: telemetry coverage for read-side commands + paid-tier fallback (3.1.0) by mikolalysenko · Pull Request #82 · SocketDev/socket-patch

Mikola Lysenko (mikolalysenko) · 2026-05-26T20:32:49Z

Summary

Telemetry now covers every CLI command — scan, get, list, setup, repair, unlock, and the new vex (OpenVEX) — joining the apply/remove/rollback trio that already shipped. Twelve new PatchTelemetryEventType variants + thirteen tracker functions; all flow through the existing track_patch_event send path.
scan and get automatically fall back from the authenticated API to the public proxy on 401/403, so a stale or revoked token no longer blocks free patches. The fallback prints a warning to stderr and tags the resulting telemetry event with fallback_to_proxy: true. The classifier is conservative: 404, 5xx, network, and rate-limit errors do NOT trigger fallback.
SOCKET_OFFLINE=1 (airgap) now disables telemetry universally via is_telemetry_disabled(), so apply no longer attempts a 5-second telemetry POST against api.socket.dev when the operator explicitly requested airgap.

Test plan

cargo test --workspace --all-features — exit 0 locally.
New cargo coverage:
- tests/telemetry_e2e.rs — apply/scan/get/list each fire telemetry against a wiremock recorder; SOCKET_OFFLINE=1 produces zero /telemetry POSTs for all four; scan falls back on 401 + tags the resulting event; scan does NOT fall back on 500.
- scan_invariants.rs — withdrawn-patch lifecycle (preserve-on-API-silence, prune-on-uninstall, scan-without-apply-is-read-only).
- telemetry_helpers_e2e.rs — SOCKET_OFFLINE branch of is_telemetry_disabled (truthy + non-truthy values).
Reviewer to confirm CHANGELOG entry under ## [3.1.0] reads accurately.

Notes for reviewers

No behavior change for apply/remove/rollback beyond the airgap gate.
apply / remove / rollback / vex keep their fail-loud semantics — the proxy fallback is intentionally read-side only.
Version sync ran via scripts/version-sync.sh. The npm workspace `catalog:` protocol blocked the npm install step, so the per-platform packages and pyproject were finished manually.

Assisted-by: Claude Code:opus-4-7

is_telemetry_disabled() now returns true when SOCKET_OFFLINE is "1" or "true". Airgap mode promises "never contact the network"; the telemetry endpoint is a network call, so honoring SOCKET_OFFLINE here keeps every command (apply, remove, rollback, plus future scan/get/etc.) compliant without requiring per-command gating.

Adds three integration tests in telemetry_helpers_e2e.rs and extends the existing test_is_telemetry_disabled unit test with the new branch (including "0" and "" non-truthy values).

Assisted-by: Claude Code:opus-4-7
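The gate described above can be sketched as a pure function plus a thin env-reading wrapper. This is a minimal sketch, not the project's code: the `SOCKET_NO_TELEMETRY` variable name and the `telemetry_disabled_from` helper are hypothetical, and only the `"1"` / `"true"` truthiness rule and the `SOCKET_OFFLINE` name come from the commit message.

```rust
use std::env;

/// Truthy values as named in the commit message: "1" or "true".
/// Anything else, including "0" and "", counts as not set.
fn is_truthy(value: Option<&str>) -> bool {
    matches!(value, Some("1") | Some("true"))
}

/// Pure core of the gate (hypothetical helper): telemetry is disabled when
/// either the explicit opt-out or the airgap switch is truthy.
fn telemetry_disabled_from(no_telemetry: Option<&str>, offline: Option<&str>) -> bool {
    is_truthy(no_telemetry) || is_truthy(offline)
}

/// Env-reading wrapper. SOCKET_NO_TELEMETRY is an assumed variable name;
/// SOCKET_OFFLINE is the airgap switch from the commit message.
fn is_telemetry_disabled() -> bool {
    let no_telemetry = env::var("SOCKET_NO_TELEMETRY").ok();
    let offline = env::var("SOCKET_OFFLINE").ok();
    telemetry_disabled_from(no_telemetry.as_deref(), offline.as_deref())
}
```

Keeping the decision in a pure function lets the truthy and non-truthy branches be unit-tested without mutating process-wide environment state.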

…eeping + vex

Extends PatchTelemetryEventType with 12 new variants covering scan, get (emits patch_fetched / patch_fetch_failed for symmetry with the existing apply naming convention), list, repair, setup, unlock, and the new vex (OpenVEX) command. Adds matching convenience tracker functions that funnel through the existing track_patch_event send path, with no new HTTP plumbing.

The scan/get trackers carry a fallback_to_proxy flag so we can measure how often the auth endpoint downgrades to the public proxy once that fallback path lands.

No call sites yet: wiring into each command file follows in subsequent commits so this commit stays a pure data-model addition.

Assisted-by: Claude Code:opus-4-7

…air, unlock, vex

Each command now fires a success/failure event through the existing track_patch_event send path. Concrete coverage:

- list: patch_listed (count surfaced)
- setup: patch_setup (detected package manager: npm/pnpm)
- unlock: patch_unlocked (was_held + released metadata) + patch_unlock_failed
- repair: patch_repaired (downloaded + cleaned counts) + patch_repair_failed
- scan: patch_scanned (per-tier counts, can_access_paid, ecosystems, fallback_to_proxy=false placeholder) + patch_scan_failed when every batch errored (previously hidden as "zero patches found")
- get (UUID path only for now): patch_fetched on success, patch_fetch_failed on paid_required / not_found / API error. CVE/GHSA/PURL search-error paths also surface patch_fetch_failed.
- vex: vex_generated on success, vex_failed via a small async helper that wraps each emit_envelope_error call site.

Renamed the unlock tracker's "broken" parameter to "released": unlock never breaks a held lock (that's `--break-lock` on mutating subcommands); the bool actually describes whether the lock file was removed.

No new HTTP plumbing; trackers reuse track_patch_event. Behavior preserved on existing apply/remove/rollback paths.

Assisted-by: Claude Code:opus-4-7
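For orientation, the twelve event strings named in this commit message can be collected into one enum. This is an illustrative sketch only: `ReadSideEvent`, `ALL`, `as_str`, and `is_failure` are hypothetical names, and the real `PatchTelemetryEventType` may be laid out or serialized differently.

```rust
/// Hypothetical mirror of the event kinds listed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReadSideEvent {
    PatchListed,
    PatchSetup,
    PatchUnlocked,
    PatchUnlockFailed,
    PatchRepaired,
    PatchRepairFailed,
    PatchScanned,
    PatchScanFailed,
    PatchFetched,
    PatchFetchFailed,
    VexGenerated,
    VexFailed,
}

impl ReadSideEvent {
    /// Every variant, in the order the commit message introduces them.
    const ALL: [ReadSideEvent; 12] = [
        ReadSideEvent::PatchListed,
        ReadSideEvent::PatchSetup,
        ReadSideEvent::PatchUnlocked,
        ReadSideEvent::PatchUnlockFailed,
        ReadSideEvent::PatchRepaired,
        ReadSideEvent::PatchRepairFailed,
        ReadSideEvent::PatchScanned,
        ReadSideEvent::PatchScanFailed,
        ReadSideEvent::PatchFetched,
        ReadSideEvent::PatchFetchFailed,
        ReadSideEvent::VexGenerated,
        ReadSideEvent::VexFailed,
    ];

    /// Wire name of the event, using the strings from the commit message.
    fn as_str(self) -> &'static str {
        match self {
            ReadSideEvent::PatchListed => "patch_listed",
            ReadSideEvent::PatchSetup => "patch_setup",
            ReadSideEvent::PatchUnlocked => "patch_unlocked",
            ReadSideEvent::PatchUnlockFailed => "patch_unlock_failed",
            ReadSideEvent::PatchRepaired => "patch_repaired",
            ReadSideEvent::PatchRepairFailed => "patch_repair_failed",
            ReadSideEvent::PatchScanned => "patch_scanned",
            ReadSideEvent::PatchScanFailed => "patch_scan_failed",
            ReadSideEvent::PatchFetched => "patch_fetched",
            ReadSideEvent::PatchFetchFailed => "patch_fetch_failed",
            ReadSideEvent::VexGenerated => "vex_generated",
            ReadSideEvent::VexFailed => "vex_failed",
        }
    }

    /// Failure events are the ones whose wire name ends in "_failed".
    fn is_failure(self) -> bool {
        self.as_str().ends_with("_failed")
    }
}
```

Note that list and setup have no failure counterpart in the list above, which the sketch reflects.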

Three new cargo tests in scan_invariants.rs covering patch-management behaviors the existing matrix didn't pin down:

- scan_prune_keeps_entry_when_package_installed_but_api_silent: a manifest entry must survive --prune when the underlying package is still installed locally but the API has fallen silent on patches for it. Pins the current --prune scope (crawl-absence, not API-absence) so a future regression to over-pruning is loud.
- scan_prune_removes_withdrawn_patch_entry: when the underlying package is uninstalled (no longer in crawl results), --prune removes the manifest entry even with a stale blob still on disk. The blob is left for the existing repair-side GC to handle.
- scan_detects_update_without_touching_existing_blobs: a newer UUID from the API surfaces in the `updates` array, but scan without --apply must leave the on-disk manifest and blobs byte-for-byte unchanged. Read-only invariant.

Assisted-by: Claude Code:opus-4-7
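The --prune scope pinned by the first two tests reduces to a one-input rule: only crawl absence removes an entry. A minimal sketch, where `PruneDecision` and `prune_decision` are hypothetical names rather than the project's code:

```rust
#[derive(Debug, PartialEq, Eq)]
enum PruneDecision {
    Keep,
    Remove,
}

/// Decide whether `--prune` drops a manifest entry.
/// `found_in_crawl`: the package is still installed locally.
/// `_api_lists_patch`: whether the API still returns a patch for it; it is
/// deliberately ignored, because API silence alone must not prune.
fn prune_decision(found_in_crawl: bool, _api_lists_patch: bool) -> PruneDecision {
    if found_in_crawl {
        PruneDecision::Keep
    } else {
        PruneDecision::Remove
    }
}
```

Passing the API signal in and ignoring it makes the invariant explicit: a later change that starts reading that argument is exactly the over-pruning regression the tests are meant to catch.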

…airgap

New tests/telemetry_e2e.rs spawns the released binary against a wiremock server that fronts both the patches endpoints AND the telemetry endpoint, then counts POSTs against /v0/orgs/{slug}/telemetry filtered by event_type. Coverage:

- scan_emits_patch_scanned_telemetry_on_success
- list_emits_patch_listed_telemetry_when_telemetry_enabled
- get_emits_patch_fetched_telemetry_on_uuid_lookup_success (tolerates either fetched/fetch_failed: the apply step is allowed to fail in the test env; the invariant is that *some* event fires)
- {apply,scan,get,list}_skips_telemetry_in_airgap_mode: confirms the central is_telemetry_disabled() gate suppresses everything when SOCKET_OFFLINE=1, regardless of command.

Caught a real test-only bug along the way: send_telemetry_event reads SOCKET_API_URL from the *environment*, not from the clap --api-url arg. The test harness now sets both env + flag so the telemetry POST lands on the same mock recording the API requests.

Assisted-by: Claude Code:opus-4-7
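The harness fix (set both the environment variable and the flag) can be sketched with `std::process::Command`. This is a sketch under assumptions: `command_for_mock` is a hypothetical helper, the binary name is whatever the caller passes in, and the position of `--api-url` relative to the subcommand is a guess.

```rust
use std::process::Command;

/// Build (but do not spawn) a CLI invocation that points both the clap flag
/// and the environment at the same mock server, so API requests and the
/// telemetry POST land on one recorder.
fn command_for_mock(binary: &str, subcommand: &str, mock_url: &str) -> Command {
    let mut cmd = Command::new(binary);
    cmd.arg(subcommand)
        // The flag covers code that reads the parsed clap argument.
        .arg("--api-url")
        .arg(mock_url)
        // The env var covers code that reads SOCKET_API_URL directly.
        .env("SOCKET_API_URL", mock_url);
    cmd
}
```

A test would then call `.output()` on the returned command and assert on the mock server's recorded requests.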

Adds `build_proxy_fallback_client(&overrides)` + `is_fallback_candidate(&err)` in api/client.rs. The constructor builds a public-proxy-mode ApiClient from the same overrides used by `get_api_client_with_overrides`, deliberately dropping the auth token. The classifier flags 401/403 errors as fallback-eligible; everything else (404, 5xx, network, rate-limit, parse) surfaces unchanged.

`scan.rs` and `get.rs` (UUID path) catch the first such error from the authenticated endpoint, log a warning to stderr, rebuild the client, retry the same request once, and continue. A new `fallback_to_proxy` bool plumbed through to the existing telemetry trackers carries the incidence into observability.

Behavior is deliberately conservative:

- Read commands only: `apply`/`remove`/`rollback`/`vex` keep their pre-existing fail-loud-on-auth semantics.
- 404, 5xx, network, parse errors do NOT trigger fallback; they surface as before so backend issues stay visible.
- Free patches still resolve via the proxy; paid patches return the same "paid_required" structured error the no-token path already emits.

Assisted-by: Claude Code:opus-4-7
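The classify-then-retry-once flow can be sketched synchronously as follows. The `ApiError` enum and `fetch_with_fallback` are hypothetical stand-ins (the real error type and the async call sites in scan.rs / get.rs are not shown in this PR text); only the 401/403 rule and the `is_fallback_candidate` name come from the commit message.

```rust
/// Hypothetical error type standing in for the client's real one.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ApiError {
    Http { status: u16 },
    Network(String),
    Parse(String),
}

/// Only auth failures qualify. 404, rate-limit, 5xx, network, and parse
/// errors stay visible to the caller.
fn is_fallback_candidate(err: &ApiError) -> bool {
    matches!(err, ApiError::Http { status: 401 | 403 })
}

/// Run the authenticated request; on a fallback-eligible error, warn on
/// stderr and retry exactly once through the proxy. The returned bool is
/// what a tracker would record as `fallback_to_proxy`.
fn fetch_with_fallback<T>(
    authed: impl FnOnce() -> Result<T, ApiError>,
    proxy: impl FnOnce() -> Result<T, ApiError>,
) -> (Result<T, ApiError>, bool) {
    match authed() {
        Err(err) if is_fallback_candidate(&err) => {
            eprintln!("warning: authenticated API returned {err:?}; retrying via public proxy");
            (proxy(), true)
        }
        // Success and every other error pass through untouched.
        other => (other, false),
    }
}
```

Because the proxy closure is only invoked inside the guarded arm, a 500 from the authenticated endpoint never reaches the proxy, which matches the "zero hits against the proxy mock" assertion style used in the tests.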

Two new tests in telemetry_e2e.rs:

- scan_falls_back_to_proxy_on_401_and_tags_telemetry: stands up two mock servers (auth endpoint 401s, proxy endpoint succeeds), asserts scan exits 0 after the swap, the fallback warning hits stderr, and the resulting patch_scanned event carries fallback_to_proxy: true in metadata.
- scan_does_not_fall_back_on_500: pins the conservative scope of the classifier. A 500 from the auth endpoint must NOT trigger the proxy retry, because backend errors should stay visible. Asserts zero hits against the proxy mock and no fallback warning on stderr.

Assisted-by: Claude Code:opus-4-7

Workspace Cargo.toml, all npm wrapper + per-platform packages, and PyPI pyproject.toml synced via scripts/version-sync.sh (with manual fixup for the per-platform packages since npm install couldn't process the workspace `catalog:` protocol).

CHANGELOG entry covers: telemetry events across the read-side and housekeeping commands, the 401/403 auth → public-proxy fallback in scan/get, the SOCKET_OFFLINE airgap gate, and the new behavioral + lifecycle test coverage that backs all of it.

Assisted-by: Claude Code:opus-4-7

cargo clippy --workspace --all-features -- -D warnings flagged track_patch_scanned at 8/7 args (the too_many_arguments lint). Grouping the per-tier counts + ecosystems list + fallback flag + auth tuple into a struct would force every call site to build a config object for a single fire-and-forget tracker, which is worse ergonomics. Allowing the lint on this one function is the right call; `track_patch_event` already exists for callers that want full control.

Assisted-by: Claude Code:opus-4-7
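The annotation in question looks roughly like the sketch below. The parameter names are hypothetical, and the function returns a summary string purely so the sketch is checkable; the real tracker is fire-and-forget and its signature is not shown in this PR text.

```rust
/// Hypothetical eight-parameter tracker, one over clippy's default limit
/// of seven, with the lint allowed on this function only.
#[allow(clippy::too_many_arguments)]
fn track_patch_scanned(
    free_count: usize,
    paid_count: usize,
    total_count: usize,
    can_access_paid: bool,
    ecosystems: &[&str],
    fallback_to_proxy: bool,
    api_token: Option<&str>,
    org_slug: Option<&str>,
) -> String {
    // An authenticated scan needs both halves of the auth pair.
    let authed = api_token.is_some() && org_slug.is_some();
    let eco = ecosystems.join(",");
    format!("free={free_count} paid={paid_count} total={total_count} can_access_paid={can_access_paid} ecosystems={eco} fallback_to_proxy={fallback_to_proxy} authed={authed}")
}
```

Scoping the `#[allow]` to the single function keeps the lint active for the rest of the crate.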

The dashboard displays an SRI-format hash (`sha512-<base64>`) of each API token for identification. That's the value stored in api_tokens.hash, NOT what to set in SOCKET_API_TOKEN. Users who copy the displayed hash hit a confusing 401 "Invalid API token" with no hint about the mistake.

Adds two pure helpers in api/client.rs:

- validate_token_shape(): non-authoritative shape check against sktsec_<44>_api / sktsec_<44>_agent. Returns a redacted-preview warning message when the shape is obviously wrong.
- looks_like_token_hash(): true for sha256-/sha384-/sha512- prefixes.

Wires them into:

- get_api_client_with_overrides: warns on stderr before the first network call when the configured token is malformed.
- resolve_org_slug's 401 branch: appends a "you set the hash, not the token" hint when both conditions are met (Unauthorized + the token starts with sha###-).

Six new unit tests cover the canonical + agent shapes, the SRI hash, short tokens, missing suffix, and the SRI-prefix detector. README's env-var table now spells out the distinction in one sentence.

Pure additive: valid tokens see no output. The server's regex remains the source of truth; we only flag values that are obviously malformed client-side so the user doesn't waste a round trip.

Assisted-by: Claude Code:opus-4-7
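A std-only sketch of the two helpers might look like this. It is a sketch under assumptions, not the code in api/client.rs: the return type, the warning text, the six-character preview, and the choice to check only the length of the 44-character middle section (not its alphabet) are all illustrative.

```rust
const TOKEN_PREFIX: &str = "sktsec_";
const TOKEN_BODY_LEN: usize = 44;

/// True when the value carries an SRI-style digest prefix, i.e. it is the
/// hash shown in the dashboard rather than the token itself.
fn looks_like_token_hash(token: &str) -> bool {
    ["sha256-", "sha384-", "sha512-"]
        .iter()
        .any(|prefix| token.starts_with(*prefix))
}

/// Non-authoritative shape check: `sktsec_<44 chars>_api` or
/// `sktsec_<44 chars>_agent`. Returns `None` for a plausible token and a
/// warning with a short redacted preview otherwise.
fn validate_token_shape(token: &str) -> Option<String> {
    let body = token.strip_prefix(TOKEN_PREFIX).and_then(|rest| {
        rest.strip_suffix("_api")
            .or_else(|| rest.strip_suffix("_agent"))
    });
    match body {
        Some(b) if b.chars().count() == TOKEN_BODY_LEN => None,
        _ => {
            // Show only the first few characters so the secret never lands in logs.
            let preview: String = token.chars().take(6).collect();
            Some(format!(
                "SOCKET_API_TOKEN does not look like an API token (starts with \"{preview}...\")"
            ))
        }
    }
}
```

A caller could combine the two: print the shape warning up front, and on a later 401 add the hash-specific hint only when `looks_like_token_hash` is also true.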

Setting SOCKET_OFFLINE=1, SOCKET_DEBUG=1, or any other bool global arg via env crashed at clap parse time:

    error: invalid value '1' for '--offline' [possible values: true, false]

clap's default bool parser only accepts "true"/"false". The internal env-mirroring in apply_env_toggles() already writes "1" when a flag is passed (so downstream code in telemetry.rs reads "1" via read_env_with_legacy), and that internal read-side accepts both "1" and "true". The user-facing input side was the asymmetric piece.

Wires BoolishValueParser (accepts "true"/"false"/"yes"/"no"/"1"/"0"/"on"/"off"/"y"/"n") onto every bool global with an env attribute: offline, global, json, verbose, silent, dry_run, yes, break_lock, debug, no_telemetry.

CLI flag usage (--debug, --offline, etc.) is unchanged. Env var usage now matches the canonical "1 means yes" convention every operator expects.

Assisted-by: Claude Code:opus-4-7
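The asymmetry is easy to see side by side. The sketch below is std-only and does not call clap: `parse_strict_bool` imitates a parser that accepts only "true"/"false", and `parse_boolish` accepts the value set listed in this commit message. Both function names are hypothetical, and clap's actual `BoolishValueParser` may accept additional spellings or differ in case handling.

```rust
/// Strict parsing: only the literal strings "true" and "false" succeed.
fn parse_strict_bool(raw: &str) -> Option<bool> {
    raw.parse::<bool>().ok()
}

/// Bool-like parsing over the value set named in the commit message.
/// Matching here is exact and lowercase, which is a simplification.
fn parse_boolish(raw: &str) -> Option<bool> {
    match raw {
        "true" | "yes" | "1" | "on" | "y" => Some(true),
        "false" | "no" | "0" | "off" | "n" => Some(false),
        _ => None,
    }
}
```

With the strict parser, `SOCKET_OFFLINE=1` is rejected outright; with the bool-like one it resolves to true, which is the behavior operators expect from an env toggle.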

Mikola Lysenko (mikolalysenko) added 9 commits May 26, 2026 15:27

Mikola Lysenko (mikolalysenko) enabled auto-merge (squash) May 26, 2026 20:54

Mikola Lysenko (mikolalysenko) added 2 commits May 26, 2026 17:39

Wenxin Jiang (Wenxin-Jiang) approved these changes May 26, 2026


Mikola Lysenko (mikolalysenko) merged commit 1493421 into main May 26, 2026
42 checks passed

Mikola Lysenko (mikolalysenko) deleted the feat/telemetry-coverage-and-paid-fallback branch May 26, 2026 22:15

Mikola Lysenko (mikolalysenko) merged 11 commits into main from feat/telemetry-coverage-and-paid-fallback

2 participants
