# st-heatmap — Plot a cross-product fact-check score heatmap

Generates a color-coded grid showing how every AI-pair scored in the cross-product fact-check. **Rows** are evaluator AIs; **columns** are target story authors. Darker cells = higher veracity scores. The diagonal shows self-evaluation scores.

**Run after:** `st-cross`

## Examples

```bash
st-heatmap --display subject.json                             # show chart on screen
st-heatmap --file subject.json                                # save PNG to ./tmp/
st-heatmap --display --ai-caption subject.json                # chart + AI narrative (default AI)
st-heatmap --file --ai-caption --agent openai subject.json    # save PNG + caption via openai
st-heatmap --file --ai-title --agent gemini subject.json      # save PNG + title via gemini
st-heatmap --display --file --ai-summary subject.json         # screen + save PNG + summary
st-heatmap --display --file --ai-story subject.json           # screen + save PNG + full story
```

## Options

### Chart output

| Flag | Description | Default |
|------|-------------|---------|
| `--display` | Display heatmap on screen | off |
| `--file` | Save heatmap as PNG | off |
| `--path PATH` | Output directory for PNG | `./tmp` |

### AI content generation

All AI flags print generated text to stdout after the chart is produced. Combine with `--file` to save the chart and generate content in one command.

| Flag | Output | Word limit |
|------|--------|------------|
| `--ai-title` | Short title for the heatmap | ≤ 10 words |
| `--ai-short` | Short caption | ≤ 80 words |
| `--ai-caption` | Detailed caption with patterns and outliers | 100–160 words |
| `--ai-summary` | Concise summary | 120–200 words |
| `--ai-story` | Comprehensive narrative | 800–1,200 words |

Use `--agent NAME` to select the provider (default: your `DEFAULT_AGENT` setting). Supported: `anthropic`, `xai`, `gemini`, `openai`, `ollama`.

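The word limits in the table above can be checked against captured output. The following is a minimal sketch, not part of `st-heatmap`: `WORD_LIMITS` and `within_limit` are hypothetical names, and counting whitespace-separated tokens is an assumption about what "word" means here.

```python
# Word limits copied from the table above; None means no lower bound.
# Hypothetical helper, not part of st-heatmap.
WORD_LIMITS = {
    "--ai-title": (None, 10),
    "--ai-short": (None, 80),
    "--ai-caption": (100, 160),
    "--ai-summary": (120, 200),
    "--ai-story": (800, 1200),
}


def within_limit(flag, text):
    """Return True if text fits the documented word range for flag."""
    low, high = WORD_LIMITS[flag]
    count = len(text.split())  # assumption: words are whitespace-separated
    return (low is None or count >= low) and count <= high
```

Text captured from stdout could be passed to `within_limit` together with the flag that produced it. Whether stdout contains only the generated text is not stated in this document, so the captured text may need trimming first.
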
### Other

| Flag | Description |
|------|-------------|
| `--agent AI` | AI provider for content generation (default: `xai`) |
| `--cache` | Enable API response cache (default: on) |
| `--no-cache` | Disable API cache for this run |
| `-v`, `--verbose` | Verbose output |
| `-q`, `--quiet` | Minimal output |

---

## Example output

The heatmap below was generated from a pizza dough fact-checking run across 5 AI providers using `--file --ai-caption`:

```bash
st-heatmap --agent openai --ai-caption --file pizza_dough.json
```

![Cross-product heatmap — pizza dough domain](st-heatmap-example.png)

**AI-generated caption** (`--ai-caption`, via `openai`):

> This heatmap evaluates the veracity of AI-generated content focused on the domain
> of crafting homemade pizza dough, with scores ranging from 0.9 to 1.8. In terms
> of column patterns, the target AI "gemini:gemini-2.5-flash" boasts a consistently
> dark column, indicating it is highly trusted across all evaluators. Conversely,
> "perplexity:sonar-pro" shows a lighter column, suggesting that its stories are
> viewed with more skepticism. Evaluator-wise, "gemini:gemini-2.5-flash" emerges as
> the most lenient, consistently giving higher scores. In contrast,
> "perplexity:sonar-pro" appears stricter, as evidenced by its lower scores across
> the board, reflecting a more conservative assessment of truthfulness.
>
> Examining the diagonal reveals a slightly darker hue compared to the surrounding
> cells, suggesting a mild self-promotion bias, where AIs rate their own work more
> favorably. A notable outlier is "perplexity:sonar-pro" evaluating itself with a
> score of 0.9, the lowest in the matrix. This score is explained by the presence of
> five partially false and two false counts, indicating a critical self-assessment.
> For technical readers interested in selecting an AI for pizza dough content,
> "gemini:gemini-2.5-flash" offers a reliable choice, consistently earning high trust
> scores from various evaluators.

---

## Reading the heatmap

- **Dark column** — that AI's reports are consistently trusted by all other evaluators
- **Light column** — that AI's reports are viewed with more scepticism
- **Dark row** — that evaluator is lenient (gives high scores to everyone)
- **Light row** — that evaluator is strict (gives low scores to everyone)
- **Diagonal** — self-evaluation; a darker diagonal suggests mild self-promotion bias
- **Single outlier cell** — one AI finds another's work significantly more or less credible

Scores above **1.5** indicate strong factual agreement. Scores below **1.0** suggest frequent disagreement or identified errors.

---

**Related:** [st-cross](st-cross) · [st-verdict](st-verdict) · [st-speed](st-speed) · [st-analyze](st-analyze)

---

## For developers

Uses `mmd_data_analysis.get_flattened_fc_data()` to build the score matrix, then renders with `mmd_plot`. AI content flags (`--ai-caption` etc.) call `process_prompt()` from `ai_handler` with the flattened score data as context. The `--path` directory is created automatically if it does not exist.
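
The reading rules under "Reading the heatmap" can also be checked numerically. The sketch below is illustrative only and does not use or reproduce `mmd_data_analysis` or `mmd_plot`. It assumes the scores are already available as a square nested list where `scores[i][j]` is the score evaluator `i` gave to target `j`; the `summarize` function, the labels, and the numbers are all made up for illustration.

```python
from statistics import mean


def summarize(labels, scores):
    """Row, column, diagonal, and off-diagonal averages of a square matrix.

    scores[i][j] is assumed to be the score evaluator i gave to target j.
    """
    n = len(labels)
    # High row mean: a lenient evaluator (dark row).
    row_mean = {labels[i]: mean(scores[i]) for i in range(n)}
    # High column mean: a widely trusted target (dark column).
    col_mean = {
        labels[j]: mean(scores[i][j] for i in range(n)) for j in range(n)
    }
    # Self-evaluation cells versus every other cell.
    diagonal = mean(scores[i][i] for i in range(n))
    off_diagonal = mean(
        scores[i][j] for i in range(n) for j in range(n) if i != j
    )
    return row_mean, col_mean, diagonal, off_diagonal


# Made-up scores for three hypothetical AIs.
labels = ["a", "b", "c"]
scores = [
    [1.8, 1.6, 1.0],
    [1.6, 1.4, 1.0],
    [1.4, 1.2, 0.8],
]

row_mean, col_mean, diag, off_diag = summarize(labels, scores)
most_lenient = max(row_mean, key=row_mean.get)   # darkest row
most_trusted = max(col_mean, key=col_mean.get)   # darkest column
self_bias_hint = diag > off_diag                 # darker diagonal
```

A diagonal mean above the off-diagonal mean corresponds to the "darker diagonal" case described above. It is a rough hint, not a statistical test.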