st verdict

st-verdict — Score AI accuracy and explain the results

Reads fact-check data from a container and produces two outputs: a stacked bar chart comparing each AI's accuracy (a true · ~true · opinion · ~false · false breakdown per author) and a written analysis. The analysis is a short caption by default (suppress it with --no-ai-short); the other --ai-* flags select a title, longer caption, summary, or story instead. Run after st-fact for a single-AI verdict, or after st-cross to compare all providers head-to-head.

Run after: st-fact, st-cross

Multi-model (0.9.0+): when same-make agents author distinct stories in the same container, the chart and score_authors() rank each agent as a separate author (Opus and Sonnet rated independently). See Multi-Model.
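The per-author breakdown described above can be sketched as follows. This is a minimal illustration, not the code behind score_authors() or the chart: the `(make, model, verdict)` row layout and the `verdict_breakdown` helper are hypothetical. The point it shows is that keying on make and model together, rather than make alone, keeps same-make agents apart.

```python
from collections import Counter, defaultdict

# Verdict order used in the chart description above.
VERDICTS = ["true", "~true", "opinion", "~false", "false"]

def verdict_breakdown(rows):
    """Return {(make, model): {verdict: percent}} from (make, model, verdict) rows.

    The row layout is a hypothetical stand-in for the container data.
    """
    counts = defaultdict(Counter)
    for make, model, verdict in rows:
        # Keying on (make, model) keeps Opus and Sonnet as separate authors.
        counts[(make, model)][verdict] += 1
    breakdown = {}
    for author, tally in counts.items():
        total = sum(tally.values())
        breakdown[author] = {v: round(100 * tally[v] / total, 1) for v in VERDICTS}
    return breakdown

# Illustrative rows only; not real fact-check results.
rows = [
    ("anthropic", "opus", "true"),
    ("anthropic", "opus", "~false"),
    ("anthropic", "sonnet", "true"),
    ("anthropic", "sonnet", "true"),
    ("anthropic", "sonnet", "opinion"),
    ("anthropic", "sonnet", "false"),
]
result = verdict_breakdown(rows)
```

Each inner dictionary sums to roughly 100 and corresponds to one stacked bar.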

st-verdict workflow

Example output

st-verdict bar chart

st-verdict --ai-caption projector_sonos_options.json — example caption (Gemini):

This chart presents a cross-AI fact-check on stories detailing RV and portable entertainment systems. XAI's Grok-4-1-fast-reasoning emerged as the most accurate AI author, with approximately 82% of its claims verified as True or ~True — significantly outperforming Anthropic's Claude-opus-4-5 at ~66%. Grok's superior performance stems from a high proportion of unequivocally True claims (54%) and a remarkably low False or ~False rate of just 3%, while Claude exhibited a combined inaccuracy rate of 15%.

Options

Option	Description
`file.json`	Path to the JSON container
`--cache`	Enable API cache (default: enabled)
`--no-cache`	Disable API cache
`-v`, `--verbose`	Verbose output
`-q`, `--quiet`	Minimal output

Chart output

Option	Description
`--display`	Display chart on screen (default: on)
`--no-display`	Suppress on-screen display
`--file`	Save chart PNG to file (default: off)
`--path PATH`	Output directory for PNG (default: `./tmp/`)

AI content generation

Option	Description
`--ai-title`	Generate a ≤10-word title → stdout
`--ai-short`	Generate a ≤80-word short caption → stdout (default: on when no other `--ai-*` flag is given)
`--no-ai-short`	Suppress the automatic short caption
`--ai-caption`	Generate a 100–160-word detailed caption → stdout
`--ai-summary`	Generate a 120–200-word summary → stdout
`--ai-story`	Generate an 800–1200-word narrative → stdout
`--agent AI`	AI provider for content generation (default: `xai`)
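When the generated text is captured from stdout and passed to another script, the word limits in the table can be checked with a small helper. The sketch below is hypothetical and not part of st-verdict; only the flag names and word bounds come from the table.

```python
# Word-count bounds per flag, copied from the options table above.
# Each value is (min_words, max_words); a minimum of 0 means the table
# states only an upper bound.
WORD_LIMITS = {
    "--ai-title": (0, 10),
    "--ai-short": (0, 80),
    "--ai-caption": (100, 160),
    "--ai-summary": (120, 200),
    "--ai-story": (800, 1200),
}

def within_limits(flag, text):
    """Return True if the whitespace-separated word count fits the flag's bounds."""
    low, high = WORD_LIMITS[flag]
    word_count = len(text.split())
    return low <= word_count <= high

title_ok = within_limits("--ai-title", "Grok leads Claude on accuracy")
caption_ok = within_limits("--ai-caption", "too short")
```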

What-is lens — focused claim breakdown

Switches the AI from "summarise the chart" to "summarise the claims that fall on one side of the truth ledger" — or, with --what-is-missing, to identify what the report failed to cover at all. Aggregates per-claim verdicts and explanations across all fact-checkers in the container, then asks one AI to synthesize them into a focused report. Pair with any --ai-* flag to control level of detail; --ai-summary is auto-enabled if no detail flag is given.

Option	Description
`--what-is-false`	Aggregate every claim marked `false` / `partially_false` and produce a focused breakdown of what is inaccurate or disputed
`--what-is-true`	Aggregate every claim marked `true` / `partially_true` and produce a focused breakdown of what is verified or supported
`--what-is-missing`	Identify what important aspects of the prompt the report failed to mention (omissions / coverage gaps)
`-s N`, `--story N`	Story index to analyse with the lens (default: the highest-scoring story per score_authors() since cross-st 0.8.0)

The three lenses are mutually exclusive (and also exclusive with --how-to-fix below).
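One common way to enforce this kind of exclusivity in a Python CLI is an argparse mutually exclusive group. The sketch below assumes that approach; it is not st-verdict's actual parser, and the `st-verdict-sketch` program name is hypothetical.

```python
import argparse

def build_parser():
    """Build a toy parser in which at most one lens flag may be given."""
    parser = argparse.ArgumentParser(prog="st-verdict-sketch")
    parser.add_argument("file", help="path to the JSON container")
    lens = parser.add_mutually_exclusive_group()
    for flag in ("--what-is-false", "--what-is-true",
                 "--what-is-missing", "--how-to-fix"):
        lens.add_argument(flag, action="store_true")
    return parser

# A single lens parses cleanly.
args = build_parser().parse_args(["--what-is-false", "subject.json"])
```

Passing two flags from the group makes argparse print a usage error and exit with a non-zero status.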

# Detailed breakdown of inaccurate claims (e.g. "is this fake news?")
st-verdict -s 1 --what-is-false --ai-summary subject.json

# Positive-evidence summary — what the report got right
st-verdict -s 1 --what-is-true --ai-caption subject.json

# Coverage-gap analysis — what the report failed to mention
st-verdict -s 1 --what-is-missing --ai-summary subject.json

# Long-form analysis suitable for sharing as feedback
st-verdict -s 1 --what-is-false --ai-story --no-display subject.json

Each lens analyses one story at a time (the AI author at index N). Since cross-st 0.8.0 the default is the highest-scoring story per score_authors() (composite Coverage / Completeness / Accuracy / Calibration); pass -s 1, -s 2, … to inspect a specific author. To compare across authors, run the command once per index. Override the scoring weights with --score-weights cov=…,comp=…,acc=…,cal=….
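To make the weight string concrete, the sketch below parses a `cov=…,comp=…,acc=…,cal=…` value and combines four component scores. Both the numeric values and the weighted-average formula are assumptions for illustration; this page does not state how score_authors() combines the components.

```python
def parse_score_weights(spec):
    """Parse a 'cov=..,comp=..,acc=..,cal=..' string into a dict of floats."""
    allowed = {"cov", "comp", "acc", "cal"}
    weights = {}
    for part in spec.split(","):
        key, _, value = part.partition("=")
        key = key.strip()
        if key not in allowed:
            raise ValueError(f"unknown weight key: {key!r}")
        weights[key] = float(value)
    return weights

def composite(scores, weights):
    """Weighted average of the components (an assumed formula)."""
    total = sum(weights.values())
    return sum(scores[key] * weight for key, weight in weights.items()) / total

# Hypothetical weights and component scores.
weights = parse_score_weights("cov=1,comp=1,acc=2,cal=1")
score = composite({"cov": 0.8, "comp": 0.6, "acc": 0.9, "cal": 0.7}, weights)
```

With these illustrative numbers, accuracy counts twice as much as each other component.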

The lens reads story[N].fact[] entries, so run st-cross first to ensure that multiple AIs have fact-checked the report. The more checkers that flag the same claim, the stronger the signal in the resulting analysis. The --what-is-missing lens additionally reads the original prompt (data[0].prompt) and the report markdown so the AI can reason about what should be there but isn't.
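The aggregation step can be sketched as a count of how many fact-checkers placed each claim on one side of the ledger. Only the `fact` list comes from the description above. The `claims`, `claim`, and `verdict` field names are hypothetical, and the real container schema may differ.

```python
from collections import defaultdict

FALSE_SIDE = {"false", "partially_false"}

def flagged_claims(story, verdicts=FALSE_SIDE):
    """Count how many fact-checkers put each claim on the given side."""
    hits = defaultdict(int)
    for fact in story["fact"]:        # one entry per fact-checker
        for item in fact["claims"]:   # hypothetical field name
            if item["verdict"] in verdicts:
                hits[item["claim"]] += 1
    # Strongest signal first: claims flagged by the most checkers.
    return sorted(hits.items(), key=lambda pair: (-pair[1], pair[0]))

# Illustrative story with two fact-checkers.
story = {
    "fact": [
        {"claims": [{"claim": "A", "verdict": "false"},
                    {"claim": "B", "verdict": "true"}]},
        {"claims": [{"claim": "A", "verdict": "partially_false"},
                    {"claim": "C", "verdict": "false"}]},
    ]
}
ranked = flagged_claims(story)
```

Passing `{"true", "partially_true"}` as `verdicts` gives the equivalent ranking for the --what-is-true side.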

See also: Showcase Workflows — copy-pastable transcripts for each lens.

Recommendation lens — `--how-to-fix`

After looking at the chart and the lens output, the natural next question is "so what do I do about it?" The recommendation lens asks the AI to read the score breakdown, the verdict mix, and (at --ai-summary / --ai-story detail levels) the report itself, then recommend exactly one next action. The recommendation is human-facing prose; st-verdict never auto-invokes the suggested tool.

Option	Description
`--how-to-fix`	Recommend exactly one of: `st-fix`, `st-bang -N`, `st-merge`, or `publish-as-is`. Default detail level: `--ai-short` (single concrete recommendation sentence).

The four candidate actions and when each is recommended:

Recommendation	Triggered when
`st-fix subject.json`	Report is mostly sound but has clusters of false / partially_false claims
`st-bang -N subject.json`	Sample size too small (one story) or scores vary wildly across fact-checkers
`st-merge subject.json`	Multiple stories already present; combining their strongest sections beats any single one
`publish-as-is`	Per-author scores are high (avg ≥ 1.5) AND zero false / partially_false claims

The output's last line always has the exact shape `Recommendation: <command> — <reason>.`, so it is easy to spot in terminal scrollback.
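Because the final line has a fixed shape, a wrapper script can extract the recommended command from it. The sketch below assumes the output has been captured as a string. The sample text is invented for illustration, and `parse_recommendation` is a hypothetical helper rather than part of cross-st.

```python
import re

# Shape documented above: "Recommendation: <command> — <reason>."
RECOMMENDATION = re.compile(
    r"^Recommendation:\s*(?P<command>.+?)\s+—\s+(?P<reason>.+)$"
)

def parse_recommendation(output):
    """Return (command, reason) from the last non-empty line, or None."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return None
    match = RECOMMENDATION.match(lines[-1])
    if match is None:
        return None
    return match.group("command"), match.group("reason")

# Invented sample output, not a real st-verdict transcript.
sample = (
    "Scores vary across checkers.\n"
    "Recommendation: st-fix subject.json — two clusters of false claims.\n"
)
parsed = parse_recommendation(sample)
```

Since st-verdict never invokes the suggested tool itself, a wrapper like this should still leave the decision to run the command with a person.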

# One-line recommendation (default)
st-verdict -s 1 --how-to-fix subject.json

# Three-paragraph technical recommendation with alternatives considered
st-verdict -s 1 --how-to-fix --ai-summary subject.json

# Long-form recommendation analysis suitable for archival
st-verdict -s 1 --how-to-fix --ai-story --no-display subject.json

Architectural note: As of cross-st 0.7.0, all interpretive --ai-*, --what-is-*, and --how-to-fix flags live here in st-verdict. st-fact is now a pure verifier (it produces fact-check data; st-verdict interprets it). This is the GATHER → VERIFY → INTERPRET division of responsibility.

Related: Three Stages, Showcase Workflows, st-cross, st-heatmap, st-analyze

For developers

Built on mmd_plot.py. Data comes from mmd_data_analysis.get_flattened_fc_data().
