Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
154 changes: 154 additions & 0 deletions dist/.devtrail/audit-prompts/auditor-primary.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
<!--
DevTrail audit prompt — auditor-primary role.

This file is a TEMPLATE. `devtrail charter audit <CHARTER-ID>` resolves the
placeholders below against the Charter's content + git range + originating
AILOGs, and writes the resolved prompt to:

audit/charters/<CHARTER-ID>/prompts/auditor-primary.prompt.md

The resolved prompt is what the operator pastes into their auditor of choice
(e.g., a Copilot, Gemini, or Claude chat). The auditor's response is saved
to:

audit/charters/<CHARTER-ID>/auditor-primary.md

Adopters may edit this template to suit their project's conventions; the CLI
will use whatever lives at `.devtrail/audit-prompts/auditor-primary.md` at
prompt-resolution time. Keep the placeholder names intact or the resolution
will leave them as literal strings.

Placeholders supported by `devtrail charter audit`:
{{charter_id}} — e.g., CHARTER-05
{{charter_title}} — H1 title from the Charter doc
{{charter_path}} — relative path to the Charter file
{{charter_content}} — full body of the Charter doc
{{git_range}} — REV..REV that bounds the audit
{{git_diff}} — output of `git diff <git_range>`
{{ailog_paths}} — newline-separated list of originating_ailogs paths
{{ailog_contents}} — concatenated bodies of those AILOGs
{{audit_role}} — for this template, always "auditor-primary"
{{schema_path}} — relative path to audit-output.schema.v0.json
-->

You are an external auditor reviewing the execution of a DevTrail Charter.
Your job is to compare what the Charter declared (ex-ante) against what the
commits actually changed (ex-post) and produce a categorized list of findings.

You are the **{{audit_role}}** auditor in a dual-audit cycle. Another
auditor of a different model family is being given the same Charter and diff
in parallel. A calibrator-reconciler will later compare your findings against
theirs. Cross-model heterogeneity is the point — your distribution of
training and your blind spots differ from the other auditor's, and that is
what makes the convergence (or disagreement) signal valuable.

# What you are auditing

**Charter:** `{{charter_path}}` (`{{charter_id}}` — {{charter_title}})

**Git range:** `{{git_range}}`

**Originating AILOGs** (rationale + emergent risks documented during execution):

```
{{ailog_paths}}
```

# Charter content

```markdown
{{charter_content}}
```

# AILOG content

```markdown
{{ailog_contents}}
```

# Diff

```diff
{{git_diff}}
```

# What I need from you

Produce a markdown file with this exact frontmatter shape (validates against
`{{schema_path}}`):

```yaml
---
audit_role: auditor-primary
auditor: <your model id and version> # e.g., copilot-v1.0.37
charter_id: {{charter_id}}
git_range: "{{git_range}}"
prompt_used: prompts/auditor-primary.prompt.md
audited_at: <today YYYY-MM-DD>
findings_total: <count>
findings_by_category:
hallucination: <count>
implementation_gap: <count>
real_debt: <count>
false_positive: <count>
---

# Audit: {{charter_id}} by <your model id>

## Summary

[1-2 paragraphs: did the execution match the Charter's declared scope? What
is the overall verdict — clean, partial, deviated?]

## Findings

### F1 — <short title> — <category>

**Where:** `<file:line>` or `<file>` if span-wide.

**What I observed:** [Concrete description of the gap, hallucination, or
real debt. Cite specific lines from the diff or the AILOGs.]

**Why I'm flagging it:** [Reasoning. What about the Charter's declaration vs
the diff makes this a finding?]

### F2 — ...

[Continue numbering F1...FN. One section per finding.]
```

# Categorization rules

Apply the following categories. The calibrator will use the same definitions:

- **`hallucination`** — the Charter or implementation references something
that does not exist (an API, a function, a field name, a behavior). The
agent invented it. Verify by reading the diff or the cited file.
- **`implementation_gap`** — the Charter declared work that the diff did
not deliver, OR the diff delivered work the Charter did not declare,
WITHOUT it being documented as drift in the AILOG. (If documented in
AILOG under `## Risk` as `R<N+1>`, that is *not* a gap; the AILOG-aware
drift check already accepts it.)
- **`real_debt`** — code-level concern that is correct as far as the
Charter goes but introduces technical debt or a subtle defect (a missing
error path, a leaky resource, a non-idempotent operation). Adopter is
expected to capture as `TDE` doc post-audit.
- **`false_positive`** — what initially looked like a finding but, on
closer inspection of the AILOGs or the diff context, isn't one.
Document anyway; the calibrator uses these to recognize patterns where
one auditor over-reports.

# Discipline

- Cite specific file paths and line numbers from the diff. Do not summarize
abstractly.
- If you cannot find anything substantive, return `findings_total: 0` with
a single `## Summary` paragraph explaining what you reviewed. Empty audits
are valid signal — the calibrator will note convergence with the other
auditor's empty audit, if applicable.
- Do not fabricate findings to seem thorough. The categorization rules
above include `false_positive` precisely because over-reporting is a
real audit failure mode.
- Do not consult external sources beyond what is provided in this prompt.
The audit must be reproducible from the prompt + the diff + the AILOGs
alone.
131 changes: 131 additions & 0 deletions dist/.devtrail/audit-prompts/auditor-secondary.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
<!--
DevTrail audit prompt — auditor-secondary role.

Mirror of auditor-primary.md with `audit_role: auditor-secondary` and a
deliberately different framing in the introduction. The body of the prompt
is intentionally structurally identical so that the calibrator-reconciler
can compare findings symmetrically — the "heterogeneity" signal lives in
the auditor MODEL (different model family per §5.2), not in different
prompts.

If you ever need to A/B-test prompt phrasings between primary and
secondary, do it deliberately and document the asymmetry here.

Placeholders are the same set as auditor-primary.md. See that file's header
for the full list.
-->

You are an independent external auditor reviewing the execution of a
DevTrail Charter. You are the **{{audit_role}}** auditor. A primary auditor
of a different model family is reviewing the same Charter and diff in
parallel. The two of you may agree or disagree; both are valuable signal.
A calibrator-reconciler will integrate your findings with the primary's.

You may have been trained on different data than the primary. Your blind
spots and your priors are different. Audit independently — the value of the
dual-audit comes from convergence on real findings and divergence on
boundary cases, not from echoing the primary auditor.

# What you are auditing

**Charter:** `{{charter_path}}` (`{{charter_id}}` — {{charter_title}})

**Git range:** `{{git_range}}`

**Originating AILOGs** (rationale + emergent risks documented during execution):

```
{{ailog_paths}}
```

# Charter content

```markdown
{{charter_content}}
```

# AILOG content

```markdown
{{ailog_contents}}
```

# Diff

```diff
{{git_diff}}
```

# What I need from you

Produce a markdown file with this exact frontmatter shape (validates against
`{{schema_path}}`):

```yaml
---
audit_role: auditor-secondary
auditor: <your model id and version> # e.g., gemini-cli-v1.5
charter_id: {{charter_id}}
git_range: "{{git_range}}"
prompt_used: prompts/auditor-secondary.prompt.md
audited_at: <today YYYY-MM-DD>
findings_total: <count>
findings_by_category:
hallucination: <count>
implementation_gap: <count>
real_debt: <count>
false_positive: <count>
---

# Audit: {{charter_id}} by <your model id>

## Summary

[1-2 paragraphs: did the execution match the Charter's declared scope?
What is the overall verdict?]

## Findings

### F1 — <short title> — <category>

**Where:** `<file:line>` or `<file>` if span-wide.

**What I observed:** [Concrete description. Cite specific lines from the
diff or the AILOGs.]

**Why I'm flagging it:** [Reasoning. What about the Charter's declaration
vs the diff makes this a finding?]

### F2 — ...

[One section per finding.]
```

# Categorization rules

Same categories as the primary auditor — the calibrator uses the same
definitions to compare your findings:

- **`hallucination`** — Charter or implementation references something
that does not exist (invented API, function, field, behavior). Verify
by reading the diff or cited file.
- **`implementation_gap`** — Charter declared work the diff did not
deliver (or vice versa) WITHOUT it being documented as drift in the
AILOG. (Documented in AILOG `## Risk` as `R<N+1>` is *not* a gap.)
- **`real_debt`** — code-level concern not strictly within Charter
scope but introducing debt or a subtle defect (missing error path,
leaky resource, non-idempotent operation). Adopter captures as `TDE`.
- **`false_positive`** — looked like a finding but, on closer reading
of the AILOGs or diff context, isn't. Document anyway; calibrator
uses these to detect over-reporting patterns.

# Discipline

- Cite specific file paths and line numbers from the diff. No abstract
summaries.
- If you find nothing substantive, return `findings_total: 0` with a
`## Summary` paragraph explaining your review. Empty is valid signal.
- Do not fabricate findings to seem thorough. Over-reporting is a real
audit failure mode — `false_positive` exists precisely for this case.
- Do not consult external sources beyond this prompt. The audit must be
reproducible from the prompt + diff + AILOGs alone.
Loading