feat: add send_agent_activity endpoint #175
Draft
amascia-gg wants to merge 3 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
- Coverage   95.70%   95.69%   -0.01%
==========================================
  Files           5        5
  Lines        1444     1465      +21
==========================================
+ Hits         1382     1402      +20
- Misses         62       63       +1
amascia-gg added a commit to GitGuardian/ggshield that referenced this pull request on Jun 2, 2026

Extend the AI-agent history framework to ship per-event *usage metadata* (not just MCP PreToolUse calls) to GitGuardian, powering usage dashboards.

- Per-agent sources (Claude, Codex, Cursor) walk transcripts/SQLite and, in a fail-closed `serialize()`, emit only an allow-list of safe structured fields (event type, tool name, model, timestamps, …). Free text (prompts, command strings, tool inputs/outputs, file contents) is never sent, so no secret or PII ever leaves the machine.
- Home paths are anonymised (`/Users/x` -> `~`); per-record size cap + byte-batching bound payloads.
- Base `ActivitySource.serialize` raises by default: a source can never accidentally ship raw content.
- Renamed the package raw_history -> agent_activity (it is metadata, not raw).
- Review fixes to the original framework: `GGClient` typing + `Detail` error handling in the orchestrator, and the home-path leak fix.
- Bump pygitguardian to the commit adding `send_agent_activity()` (GitGuardian/py-gitguardian#175); re-pin once that merges.

Builds on #1244 / #1257 (MCP history framework).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
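For readers who want a concrete picture of the fail-closed allow-list and the home-path anonymisation described in this commit message, here is a minimal sketch. It is not the ggshield implementation: the field names in `SAFE_FIELDS`, the `AllowListSource` class, and the `anonymise_home` helper are all hypothetical, and only the idea that the base `serialize` raises by default comes from the message above.

```python
from typing import Any, Dict

# Hypothetical allow-list; the real field names are not given in the commit message.
SAFE_FIELDS = frozenset({"event_type", "tool_name", "model", "timestamp", "cwd"})


def anonymise_home(value: str, home: str) -> str:
    """Replace a leading home-directory prefix with "~" (e.g. /Users/x -> ~)."""
    home = home.rstrip("/")
    if value == home:
        return "~"
    if value.startswith(home + "/"):
        return "~" + value[len(home):]
    return value


class ActivitySource:
    """Base class: serialising is an explicit opt-in, so it raises by default."""

    def serialize(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError("serialize() must be overridden with an allow-list")


class AllowListSource(ActivitySource):
    """Hypothetical source that copies only allow-listed scalar fields."""

    def __init__(self, home: str) -> None:
        self.home = home

    def serialize(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for key in SAFE_FIELDS:
            if key not in raw:
                continue
            value = raw[key]
            # Nested structures could carry free text, so only scalars pass.
            if not isinstance(value, (str, int, float, bool)):
                continue
            if isinstance(value, str):
                value = anonymise_home(value, self.home)
            out[key] = value
        return out
```

With this shape, a key such as `prompt` is dropped simply because it is not on the list, and a new source that forgets to override `serialize` fails loudly instead of shipping raw content.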
d8f54ff to d0dbef6
amascia-gg added a commit to GitGuardian/ggshield that referenced this pull request on Jun 2, 2026

…i discover --history`

Ship full AI-agent session activity (not just MCP calls) to a raw staging table, keeping the client "dumb" and storing no detected secrets:

- ggshield ships the agent's RAW transcript lines / DB rows verbatim: the data shape never depends on the ggshield version (no per-agent field extraction).
- Before sending, each batch is scanned via the GitGuardian secret-scan API (multi_content_scan); detected secret spans are redacted client-side and home paths anonymised. Fail-closed: a batch that can't be scanned is dropped, never shipped.
- Per-record size cap + byte-batching; re-scan every run (offset-skip is a follow-up).
- Review fixes to the framework: GGClient typing + Detail handling in the orchestrator; POSIX source paths.
- Bumps pygitguardian to send_agent_activity (GitGuardian/py-gitguardian#175).

Aligns with the design doc's staging → canonical structure; the canonical typed table + per-agent adapters are a follow-up MR.

Builds on #1244 / #1257.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
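The batching and fail-closed redaction steps in this commit message could look roughly like the sketch below. Everything here is illustrative: `batch_by_bytes`, `redact`, and `prepare` are hypothetical names, the `scan` callable is a stand-in for the real secret-scan call (its actual return type is not shown in this thread), spans are assumed not to overlap, and skipping oversize records is an assumption, since the message only says a per-record size cap exists.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def redact(text: str, spans: Iterable[Span], mask: str = "REDACTED") -> str:
    """Mask each span, right to left so earlier offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask + text[end:]
    return text


def batch_by_bytes(
    records: Iterable[str], max_record: int, max_batch: int
) -> Iterator[List[str]]:
    """Group records so each batch stays within max_batch UTF-8 bytes."""
    batch: List[str] = []
    size = 0
    for record in records:
        n = len(record.encode("utf-8"))
        if n > max_record:
            continue  # assumption: oversize records are skipped
        if batch and size + n > max_batch:
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += n
    if batch:
        yield batch


def prepare(
    batch: List[str],
    scan: Callable[[List[str]], Optional[List[List[Span]]]],
) -> List[str]:
    """Return redacted records, or nothing at all if the scan fails."""
    try:
        results = scan(batch)
    except Exception:
        return []  # fail closed: an unscannable batch is never shipped
    if results is None or len(results) != len(batch):
        return []
    return [redact(record, spans) for record, spans in zip(batch, results)]
```

The key property is that `prepare` has no code path that returns unscanned text: any scan error, missing result, or length mismatch yields an empty list.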
Add a client method to ship batches of AI-agent activity records (one per transcript line or database row) to GitGuardian. Records are opaque dicts sent verbatim; GitGuardian scans them and strips secrets server-side before storing them. Returns ingested/duplicate counts via the new AgentActivityResponse model, or a Detail on error.

This is the SDK side of ggshield's `ai discover --history` agent-activity collection.

Issue: NHI-1628
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an optional user (serialised UserInfo) to send_agent_activity; when given it is posted alongside the events so the server can store the machine_id with each record and attribute activity / correlate it with the machine inventory.

Issue: NHI-1628
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
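As an illustration of "posted alongside the events", a request body might be assembled as in the sketch below. The JSON key names `events` and `user` and the helper `build_activity_payload` are assumptions for illustration only; the commit message does not show the wire format. The endpoint path is the one quoted in the PR description.

```python
from typing import Any, Dict, List, Optional

# Path quoted in the PR description; the body layout below is an assumption.
ACTIVITY_ENDPOINT = "/v1/nhi/ai/activity"


def build_activity_payload(
    events: List[Dict[str, Any]],
    user: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a request body; the user block is included only when supplied."""
    payload: Dict[str, Any] = {"events": events}
    if user is not None:
        payload["user"] = user
    return payload
```

Omitting the key entirely when `user` is `None`, rather than sending `null`, keeps the request identical to the pre-change shape for callers that never pass a user.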
34ce988 to 9e9c9e6
The server now reports how many records it could not scan and dropped (never stored). Parse it into AgentActivityResponse.dropped, defaulting to 0 so older servers that omit the field still load.

Issue: NHI-1628
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
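The backward-compatible default can be sketched with a plain dataclass. This is a stand-in, not the model added in this PR: the class name `AgentActivityResponseSketch` and the key names `ingested` and `duplicates` are assumptions, and only the `dropped` field with its default of 0 is taken from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class AgentActivityResponseSketch:
    """Hypothetical stand-in for the response model described above."""

    ingested: int
    duplicates: int
    dropped: int = 0  # older servers omit this field

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "AgentActivityResponseSketch":
        return cls(
            ingested=int(data["ingested"]),
            duplicates=int(data["duplicates"]),
            # Missing key falls back to 0 instead of raising KeyError.
            dropped=int(data.get("dropped", 0)),
        )
```

A response from an older server, such as `{"ingested": 3, "duplicates": 1}`, then loads with `dropped == 0`, while a newer server's value is used as-is.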
9e9c9e6 to 72f9e86
Context

SDK side of ggshield's `ggshield ai discover --history`: ggshield ships raw AI-agent activity records (transcript lines / DB rows) to GitGuardian, with secrets and home paths stripped client-side.

What has been done

- `send_agent_activity(events)`: posts a batch of opaque records to `POST /v1/nhi/ai/activity`, returns ingested/duplicate counts.
- `AgentActivityResponse` model + schema.

Related

- `ward-runs-app` MR (staging table + endpoint).

🤖 Generated with Claude Code