feat: add send_agent_activity endpoint #175

Draft
amascia-gg wants to merge 3 commits into master from amascia/agent-activity-endpoint

Conversation

@amascia-gg
Member

@amascia-gg amascia-gg commented Jun 1, 2026

Context

SDK side of ggshield's `ggshield ai discover --history` command: ggshield ships raw AI-agent activity records (transcript lines / DB rows) to GitGuardian, with secrets and home paths stripped client-side.

What has been done

  • New client method send_agent_activity(events): posts a batch of opaque records to POST /v1/nhi/ai/activity, returns ingested/duplicate counts.
  • New AgentActivityResponse model + schema.
  • Records are kept opaque by the SDK — the client strips secrets before sending; they land in the server staging table.
  • Tests (model round-trip + client cassette) + changelog fragment.

Related

Draft.

🤖 Generated with Claude Code

@codecov-commenter

codecov-commenter commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 95.23810% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 95.69%. Comparing base (d3152b7) to head (72f9e86).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pygitguardian/client.py | 90.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
- Coverage   95.70%   95.69%   -0.01%     
==========================================
  Files           5        5              
  Lines        1444     1465      +21     
==========================================
+ Hits         1382     1402      +20     
- Misses         62       63       +1     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 95.69% <95.23%> (-0.01%) ⬇️ |


@amascia-gg amascia-gg self-assigned this Jun 2, 2026
amascia-gg added a commit to GitGuardian/ggshield that referenced this pull request Jun 2, 2026
Extend the AI-agent history framework to ship per-event *usage metadata*
(not just MCP PreToolUse calls) to GitGuardian, powering usage dashboards.

- Per-agent sources (Claude, Codex, Cursor) walk transcripts/SQLite and, in a
  fail-closed `serialize()`, emit only an allow-list of safe structured fields
  (event type, tool name, model, timestamps, …). Free text — prompts, command
  strings, tool inputs/outputs, file contents — is never sent, so no secret or
  PII ever leaves the machine.
- Home paths are anonymised (`/Users/x` -> `~`); per-record size cap +
  byte-batching bound payloads.
- Base `ActivitySource.serialize` raises by default: a source can never
  accidentally ship raw content.
- Renamed the package raw_history -> agent_activity (it is metadata, not raw).
- Review fixes to the original framework: `GGClient` typing + `Detail` error
  handling in the orchestrator, and the home-path leak fix.
- Bump pygitguardian to the commit adding `send_agent_activity()`
  (GitGuardian/py-gitguardian#175); re-pin once that merges.
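
To make the fail-closed allow-list in the bullets above concrete, here is a minimal sketch, not the actual ggshield code. `ActivitySource` and `serialize` are named in the commit message; `SAFE_FIELDS`, `ExampleSource`, and the individual field names are hypothetical and only illustrate the pattern.

```python
from typing import Any, Dict

# Hypothetical allow-list: the real field names are not given in the commit message.
SAFE_FIELDS = frozenset({"event_type", "tool_name", "model", "timestamp"})


class ActivitySource:
    """Base class: refuses to serialize unless a subclass opts in explicitly."""

    def serialize(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        # Raising by default means a new source cannot ship raw content by accident.
        raise NotImplementedError(
            f"{type(self).__name__} must define an explicit allow-list serializer"
        )


class ExampleSource(ActivitySource):
    """Illustrative source that emits only allow-listed scalar fields."""

    def serialize(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        # Unknown keys and nested values (tool inputs, outputs, ...) are dropped.
        return {
            key: value
            for key, value in raw.items()
            if key in SAFE_FIELDS and isinstance(value, (str, int, float, bool))
        }
```

With this shape, a field that appears in a future transcript format is excluded until someone adds it to the allow-list, which is the "fail-closed" property the commit describes.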

Builds on #1244 / #1257 (MCP history framework).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
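
The home-path anonymisation and byte-batching mentioned in the commit message above could look roughly like the sketch below. The limits, the function names, and the choice to skip oversized records (rather than truncate them) are assumptions; the commit message only states that a per-record size cap and byte-batching bound the payloads.

```python
import json
import re
from typing import Any, Dict, Iterable, Iterator, List

MAX_RECORD_BYTES = 64 * 1024   # hypothetical per-record cap
MAX_BATCH_BYTES = 1024 * 1024  # hypothetical per-request cap


def anonymise_home(text: str, home: str) -> str:
    """Replace a home-directory prefix with '~' (e.g. /Users/x/a -> ~/a)."""
    # The lookahead avoids rewriting a longer sibling name such as /Users/xavier.
    pattern = re.escape(home.rstrip("/")) + r"(?![\w.-])"
    return re.sub(pattern, "~", text)


def encoded_size(record: Dict[str, Any]) -> int:
    """Size of the record once serialized as compact UTF-8 JSON."""
    return len(json.dumps(record, separators=(",", ":")).encode("utf-8"))


def batch_by_bytes(
    records: Iterable[Dict[str, Any]],
    max_record_bytes: int = MAX_RECORD_BYTES,
    max_batch_bytes: int = MAX_BATCH_BYTES,
) -> Iterator[List[Dict[str, Any]]]:
    """Group records into batches whose summed record sizes stay under the cap."""
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for record in records:
        size = encoded_size(record)
        if size > max_record_bytes:
            continue  # assumption: oversized records are skipped, not truncated
        if batch and batch_bytes + size > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch
```

Batching on encoded bytes rather than record count keeps request sizes predictable even when individual transcript lines vary widely in length.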
@amascia-gg amascia-gg force-pushed the amascia/agent-activity-endpoint branch from d8f54ff to d0dbef6 Compare June 2, 2026 10:56
amascia-gg added a commit to GitGuardian/ggshield that referenced this pull request Jun 2, 2026
…i discover --history`

Ship full AI-agent session activity (not just MCP calls) to a raw staging
table, keeping the client "dumb" and storing no detected secrets:

- ggshield ships the agent's RAW transcript lines / DB rows verbatim — the data
  shape never depends on the ggshield version (no per-agent field extraction).
- Before sending, each batch is scanned via the GitGuardian secret-scan API
  (multi_content_scan); detected secret spans are redacted client-side and home
  paths anonymised. Fail-closed: a batch that can't be scanned is dropped, never
  shipped.
- Per-record size cap + byte-batching; re-scan every run (offset-skip is a
  follow-up).
- Review fixes to the framework: GGClient typing + Detail handling in the
  orchestrator; POSIX source paths.
- Bumps pygitguardian to send_agent_activity (GitGuardian/py-gitguardian#175).

Aligns with the design doc's staging → canonical structure; the canonical typed
table + per-agent adapters are a follow-up MR. Builds on #1244 / #1257.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
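
A rough sketch of the scan-then-redact step described in the commit message above, under stated assumptions: the scanner is injected as a plain callable returning character spans per line, which stands in for the real `multi_content_scan` call (whose request and response shapes are not reproduced here). `Span`, `Scanner`, `REDACTED`, `redact`, and `prepare_batch` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) character offsets of a detected secret
Scanner = Callable[[Sequence[str]], List[List[Span]]]  # hypothetical scanner interface

REDACTED = "[REDACTED]"  # hypothetical placeholder


def redact(text: str, spans: Sequence[Span]) -> str:
    """Replace every span with a placeholder, merging overlapping spans first."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end in reversed(merged):
        text = text[:start] + REDACTED + text[end:]
    return text


def prepare_batch(lines: Sequence[str], scan: Scanner) -> Optional[List[str]]:
    """Return redacted lines, or None when the batch could not be scanned."""
    try:
        spans_per_line = scan(lines)
    except Exception:
        return None  # fail closed: an unscannable batch is dropped, never shipped
    if len(spans_per_line) != len(lines):
        return None  # a partial result is treated as a failed scan
    return [redact(line, spans) for line, spans in zip(lines, spans_per_line)]
```

Returning `None` instead of the original lines on any scan failure is what makes the step fail-closed: the caller has nothing unredacted to send.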
amascia-gg and others added 2 commits June 3, 2026 22:20
Add a client method to ship batches of AI-agent activity records (one per
transcript line or database row) to GitGuardian. Records are opaque dicts sent
verbatim; GitGuardian scans them and strips secrets server-side before storing
them. Returns ingested/duplicate counts via the new AgentActivityResponse
model, or a Detail on error.

This is the SDK side of ggshield's `ai discover --history` agent-activity
collection.

Issue: NHI-1628

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an optional user (serialised UserInfo) to send_agent_activity; when given
it is posted alongside the events so the server can store the machine_id with
each record and attribute activity / correlate it with the machine inventory.

Issue: NHI-1628

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
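
As an illustration of the optional `user` argument described above, the request body might be assembled as follows. This is a sketch, not the SDK code: the endpoint path comes from the PR description, while the JSON key names `events` and `user` and the helper name `build_activity_body` are assumptions.

```python
from typing import Any, Dict, List, Optional

ACTIVITY_PATH = "/v1/nhi/ai/activity"  # endpoint path from the PR description


def build_activity_body(
    events: List[Dict[str, Any]],
    user: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body for one batch; key names are hypothetical."""
    body: Dict[str, Any] = {"events": events}  # records are passed through as-is
    if user is not None:
        # Only included when given, so callers without user info send the same body as before.
        body["user"] = user
    return body
```

Omitting the key entirely when `user` is `None`, rather than sending `null`, keeps the request unchanged for callers that do not pass it.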
@amascia-gg amascia-gg force-pushed the amascia/agent-activity-endpoint branch from 34ce988 to 9e9c9e6 Compare June 3, 2026 20:22
The server now reports how many records it could not scan and dropped (never
stored). Parse it into AgentActivityResponse.dropped, defaulting to 0 so older
servers that omit the field still load.

Issue: NHI-1628

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
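
The backward-compatible default for `dropped` can be illustrated with a small stand-in model. This is a sketch using a dataclass rather than the SDK's actual `AgentActivityResponse` model and schema; the class name `ActivityCounts` and the JSON key names `ingested` and `duplicates` are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ActivityCounts:
    """Hypothetical stand-in for the response model."""

    ingested: int
    duplicates: int
    dropped: int = 0  # records the server could not scan and did not store

    @classmethod
    def from_payload(cls, payload: Mapping[str, Any]) -> "ActivityCounts":
        return cls(
            ingested=int(payload["ingested"]),
            duplicates=int(payload["duplicates"]),
            # Older servers omit the field, so fall back to 0 instead of failing.
            dropped=int(payload.get("dropped", 0)),
        )
```

For example, `ActivityCounts.from_payload({"ingested": 3, "duplicates": 1})` loads with `dropped == 0`, while a payload from a newer server carries its own value.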
@paulpetit-gg-ext paulpetit-gg-ext force-pushed the amascia/agent-activity-endpoint branch from 9e9c9e6 to 72f9e86 Compare June 4, 2026 15:38