Provenance Composition Model
Schema version: v1.1.8
The PCM records how files changed (where, by whom, and by what kind of action) without storing your source code.
Teams want trustworthy insight into file composition (e.g., human vs. AI effort, paste vs. edit) without risking source code exposure. PCM captures events about edits and produces per-file snapshots with counts and ranges—never raw text.
- Events: Each edit is an event (insert, replace, delete, paste, AI apply, format, tooling).
- Where: Byte ranges for “before” and “after” (precise positions inside a file).
- How much: Line and character counts, plus a content hash (no text).
- Who: An actor (e.g., user/bot/system), using opaque identifiers.
- Origin: Was it human, ai, or untracked?
  - ai: AI suggestion/application
  - human: typing, paste, manual edits
  - observed: observed tool output
  - untracked: when origin can't be determined
  - external: external edits or unknown attribution
- Event: One edit operation to a file.
- Actor: Who performed the edit (e.g., a user or a tool); use non-sensitive IDs.
- Origin: Broad classification of the edit: `human`, `ai`, or `untracked`.
- Snapshot: A per-file JSON summary under `.coderoot/v1/snapshots/` that shows spans and totals by origin.
- Span: A region of a file (by byte range) with an origin and timestamps.
| Operation | Typical meaning | Size fields present |
|---|---|---|
| `insert` | New content added | introduced |
| `replace` | Old content replaced by new | deleted + introduced |
| `delete` | Content removed | deleted |
| `paste` | Pasted content (treated as a human action) | introduced |
| `ai_apply` | AI suggestion applied | introduced (and sometimes deleted) |
| `format` | Automated formatting | Usually size-neutral |
| `tooling` | Tool-driven change (e.g., refactor) | Varies |
| `rename` | File renamed | N/A |
| `move` | File moved | N/A |
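The table can be read as a per-operation check on which size fields an event should carry. The sketch below is one hedged way to encode it; the `missing_size_fields` helper is hypothetical, and it assumes `deleted` is a top-level key mirroring the `introduced` key shown in the event example.

```python
# Expected size fields per operation, transcribed from the operations table.
# None means the table gives no fixed rule ("Usually size-neutral", "Varies", "N/A").
REQUIRED_SIZE_FIELDS = {
    "insert": {"introduced"},
    "replace": {"deleted", "introduced"},
    "delete": {"deleted"},
    "paste": {"introduced"},
    "ai_apply": {"introduced"},  # "deleted" is only sometimes present
    "format": None,
    "tooling": None,
    "rename": None,
    "move": None,
}


def missing_size_fields(event: dict) -> set:
    """Return the size fields the table expects for event["op"] that are absent.

    Assumption: size fields are top-level keys of the event record.
    """
    op = event.get("op")
    if op not in REQUIRED_SIZE_FIELDS:
        raise ValueError(f"unknown operation: {op!r}")
    required = REQUIRED_SIZE_FIELDS[op]
    if required is None:
        return set()
    return {field for field in required if field not in event}
```

An empty result means the event carries every field the table lists for its operation; it says nothing about whether the values themselves are correct.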
- Actor: keep it simple and private. Use an opaque ID and an optional display name (no emails/tokens).
- Origin:
  - human: typing, paste, manual edits
  - ai: AI suggestion/application
  - untracked: when origin can’t be determined
Every tracked file can have a snapshot at:
.coderoot/v1/snapshots/<relative-path>.pcm.json
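As a minimal sketch of the path pattern above, the helper below maps a repo-relative file path to its snapshot location. The function name `snapshot_path` and the rejection of absolute or `..` paths are assumptions for illustration, not part of PCM.

```python
from pathlib import PurePosixPath

# Snapshot root as given in this README.
SNAPSHOT_ROOT = PurePosixPath(".coderoot/v1/snapshots")


def snapshot_path(relative_path: str) -> PurePosixPath:
    """Return the snapshot location for a repo-relative file path.

    Hypothetical helper: appends ".pcm.json" to the full file name,
    following the <relative-path>.pcm.json pattern.
    """
    rel = PurePosixPath(relative_path)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError("expected a path relative to the repo root")
    return SNAPSHOT_ROOT / rel.with_name(rel.name + ".pcm.json")
```

For example, `snapshot_path("src/example.txt")` yields `.coderoot/v1/snapshots/src/example.txt.pcm.json`.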
A snapshot includes:
- Spans: regions with an origin and timestamps
- Summary: `lines_total`, `lines_by_origin`, `chars_by_origin`, and other safe counts
These power reports (e.g., “% human vs. AI”) without exposing any code.
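A report such as “% human vs. AI” could be derived from the summary counts alone. The following is a rough sketch; `origin_percentages` is a hypothetical helper, and it assumes the `summary` object has the `lines_by_origin` shape shown in the snapshot example.

```python
def origin_percentages(summary: dict) -> dict:
    """Convert lines_by_origin counts into percentages of all counted lines."""
    counts = summary["lines_by_origin"]
    total = sum(counts.values())
    if total == 0:
        # An empty file has no meaningful split; report zeros instead of dividing by zero.
        return {origin: 0.0 for origin in counts}
    return {origin: 100.0 * n / total for origin, n in counts.items()}
```

Only counts are read here, so the report never needs access to the file contents.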
- ❌ No source code text
- ❌ No clipboard contents
- ❌ No credentials or personal emails
- ❌ No tool internals or proprietary CI details
Event (illustrative):
{
"schema_version": "1.1.8",
"record_type": "pcm_event",
"event_id": "e-123",
"file_path": "src/example.txt",
"op": "insert",
"origin": "human",
"actor": { "id": "u-abc" },
"after": { "range": { "startByte": 0, "endByte": 12 } },
"introduced": {
"lines": 2,
"chars_total": 12,
"hash": { "algo": "ws-sha256", "value": "…" }
}
}
Snapshot (illustrative):
{
"schema_version": "1.1.8",
"file_path": "src/example.txt",
"updated_at": "2025-10-07T00:00:00Z",
"spans": [
{ "span_id": "s-1",
"range": {"startByte": 0, "endByte": 12},
"origin":"human",
"introduced_at":"2025-10-07T00:00:00Z",
"last_modified_at":"2025-10-07T00:00:00Z" }
],
"summary": {
"lines_total": 2,
"lines_by_origin": { "human": 2, "ai": 0, "untracked": 0 },
"chars_by_origin": { "human": 12, "ai": 0, "untracked": 0 }
}
}
Note: Hashes are shown as … on purpose. PCM uses hashes to verify content without storing it.
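To illustrate how an event can describe inserted content without carrying it, here is a hedged sketch that builds an insert event from text and keeps only counts, a byte range, and a digest. The `build_insert_event` helper is hypothetical. The `ws-sha256` algorithm named in the example is not defined on this page, so the sketch substitutes plain SHA-256 and labels it `sha256`; the line and character counting rules are likewise assumptions.

```python
import hashlib


def build_insert_event(event_id, file_path, actor_id, start_byte, text, origin="human"):
    """Build an insert event that records sizes and a hash, never the text itself."""
    data = text.encode("utf-8")
    return {
        "schema_version": "1.1.8",
        "record_type": "pcm_event",
        "event_id": event_id,
        "file_path": file_path,
        "op": "insert",
        "origin": origin,
        "actor": {"id": actor_id},
        "after": {"range": {"startByte": start_byte, "endByte": start_byte + len(data)}},
        "introduced": {
            # Assumption: lines are counted the way str.splitlines() splits them.
            "lines": len(text.splitlines()),
            # Assumption: chars_total counts Unicode code points, not bytes.
            "chars_total": len(text),
            # Plain SHA-256 stands in for "ws-sha256", which this page does not define.
            "hash": {"algo": "sha256", "value": hashlib.sha256(data).hexdigest()},
        },
    }
```

A reader holding the same content can recompute the digest and compare it with the stored value, which is how a hash can verify content that was never stored.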
- Make a tiny change in any file (add one short line).
- Generate snapshots with your editor integration or CLI.
- Open the snapshot at `.coderoot/v1/snapshots/<relative-path>.pcm.json`.
- Confirm:
  - `summary.lines_total` increased as expected
  - `lines_by_origin.human` (or `ai`) reflects your change
  - No raw text, only counts, ranges, and hashes
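The confirmation steps above can be partly automated. The sketch below is illustrative only: `check_snapshot` is a hypothetical helper, the allowed span keys are taken from the snapshot example on this page, and the rule that `lines_by_origin` sums to `lines_total` is an assumption inferred from that example.

```python
# Span keys that appear in the illustrative snapshot; anything else is flagged.
ALLOWED_SPAN_KEYS = {"span_id", "range", "origin", "introduced_at", "last_modified_at"}


def check_snapshot(snapshot: dict) -> list:
    """Return a list of human-readable problems found in a parsed snapshot."""
    problems = []
    summary = snapshot["summary"]
    # Assumption: per-origin line counts add up to the total.
    if sum(summary["lines_by_origin"].values()) != summary["lines_total"]:
        problems.append("lines_by_origin does not sum to lines_total")
    for span in snapshot["spans"]:
        rng = span["range"]
        if not 0 <= rng["startByte"] <= rng["endByte"]:
            problems.append(f"span {span.get('span_id')} has an invalid byte range")
        extra = set(span) - ALLOWED_SPAN_KEYS
        if extra:
            # Unexpected keys are where raw text would most likely leak in.
            problems.append(f"span {span.get('span_id')} has unexpected keys: {sorted(extra)}")
    return problems
```

Load the snapshot with `json.load` and pass the resulting dict; an empty list means none of these checks failed.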
- Prefer hash-only handling for clipboard-related data.
- Keep `actor` identifiers opaque and local to the repo.
- Review snapshots locally; publish only when numbers match expectations.
- Schema: v1.1.8
- Readers are tolerant of earlier 1.1.x data and will normalize older field names where reasonable.
- This page is a public summary. Implementation details live in the (separate) spec and schema files included in this repo.
Does PCM store my code?
No. PCM stores ranges, counts, and hashes, never raw text.
What if I paste content?
It’s recorded as a paste operation, and the origin is human.
What if AI applies a change?
It’s recorded as ai_apply with origin ai.
Why byte ranges?
They’re precise and resilient to line ending differences. Lines/columns may appear as hints, but byte ranges are the source of truth.
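For readers who want a line/column hint from a byte offset, a minimal sketch follows. The helper name `byte_offset_to_line_col` is hypothetical, and the convention used (1-based lines, 0-based column measured in bytes, lines split on `\n`) is an assumption rather than something PCM specifies.

```python
def byte_offset_to_line_col(data: bytes, offset: int) -> tuple:
    """Map a byte offset into (line, column) for the given file contents.

    Lines are 1-based; the column is the 0-based byte distance from the line start.
    """
    if not 0 <= offset <= len(data):
        raise ValueError("offset is outside the file")
    # Each newline before the offset starts a new line.
    line = data.count(b"\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, so line_start becomes 0.
    line_start = data.rfind(b"\n", 0, offset) + 1
    return line, offset - line_start
```

The conversion needs the file bytes, which is why such hints are derived locally while the stored record keeps only the byte range.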
Where do I find the data?
Per-file snapshots live under .coderoot/v1/snapshots/.