st speed

b2o2i edited this page May 10, 2026 · 7 revisions

st-speed — Analyze AI provider performance and speed

Compares AI provider performance across a container: generation time, tokens per second, fact-checking throughput, and consistency. Useful for choosing a provider when speed matters.

Run after: st-bang, st-cross

Related: st-stones, st-cross, st-heatmap, Multi-Model

Multi-model (0.9.0+): when same-make agents (e.g. anthropic-opus and anthropic-sonnet) appear in a container, st-speed shows one row per agent with make:model labels for disambiguation. See Multi-Model.


Usage

st-speed report.json                             # Full performance summary (all AIs)
st-speed --agent gemini report.json              # Filter display to one AI
st-speed --agent openai --ai-caption report.json # All-AI summary + caption written by OpenAI
st-speed --ai-short report.json                  # All-AI summary + short caption (default AI)
st-speed --history crypto/*.json                 # Trend analysis across multiple files
st-speed --csv timings.csv report.json           # Export raw timing data to timings.csv

Example output

Basic performance summary

st-speed projector_sonos_options.json

Performance Summary: projector_sonos_options.json
======================================================================

Story Generation:
AI            Time    Tokens    Tok/s    Samples

openai        00:18     1631    88.57          1
perplexity    00:23     2725   115.57          1
gemini        00:53    11141   207.56          1
anthropic     01:17     4421    57.37          1
xai (cache)            3687        —          1

Fact-Checking Performance:
AI            Avg     Median    Min     Max     StdDev    Samples    Segments

openai        01:53    01:43    00:50   03:09    50.4s          5    29/job
perplexity    02:59    02:31    01:41   05:16    83.2s          5    29/job
gemini        05:21    04:58    02:51   08:32   124.4s          5    29/job
xai           06:16    05:53    03:22   09:50   140.0s          5    29/job
anthropic     10:15    09:46    05:04   16:45   251.9s          5    29/job

Note: Each sample is one complete fact-check job.
      'Segments' shows avg AI calls per job (typically 20-50 paragraphs).
======================================================================
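
The columns above can be reproduced with a few lines of Python. This is a minimal sketch, not st-speed's actual code: the function names and the sample durations are hypothetical, and it assumes StdDev is the sample standard deviation, which the page does not state. The table's 88.57 Tok/s for 1631 tokens implies roughly 18.4 seconds, so Tok/s appears to be computed from the unrounded time rather than the displayed 00:18.

```python
import statistics


def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput of one generation call."""
    return tokens / seconds


def mmss(seconds: float) -> str:
    """Format a duration as MM:SS, like the Time/Avg/Median columns."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def summarize(job_seconds: list) -> dict:
    """Summarize complete fact-check jobs (one duration in seconds per job)."""
    return {
        "avg": mmss(statistics.mean(job_seconds)),
        "median": mmss(statistics.median(job_seconds)),
        "min": mmss(min(job_seconds)),
        "max": mmss(max(job_seconds)),
        # Assumption: sample standard deviation (n - 1 denominator).
        "stdev": f"{statistics.stdev(job_seconds):.1f}s",
        "samples": len(job_seconds),
    }


# Hypothetical durations for five fact-check jobs, in seconds.
jobs = [60.0, 90.0, 120.0, 150.0, 180.0]
summary = summarize(jobs)
print(summary)
print(f"{tokens_per_second(1631, 18.415):.2f} tok/s")
```

With only five samples per provider, as in the table above, the median and the Min/Max range are usually a safer guide than StdDev alone.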

With AI-generated caption

Using --agent openai --ai-caption generates the caption with OpenAI but still shows all providers in the performance table:

st-speed --agent openai --ai-caption projector_sonos_options.json

Performance Summary: projector_sonos_options.json
======================================================================

Story Generation:
AI            Time    Tokens    Tok/s    Samples

openai        00:18     1631    88.57          1
perplexity    00:23     2725   115.57          1
gemini        00:53    11141   207.56          1
anthropic     01:17     4421    57.37          1
xai (cache)            3687        —          1

Fact-Checking Performance:
AI            Avg     Median    Min     Max     StdDev    Samples    Segments

openai        01:53    01:43    00:50   03:09    50.4s          5    29/job
perplexity    02:59    02:31    01:41   05:16    83.2s          5    29/job
gemini        05:21    04:58    02:51   08:32   124.4s          5    29/job
xai           06:16    05:53    03:22   09:50   140.0s          5    29/job
anthropic     10:15    09:46    05:04   16:45   251.9s          5    29/job

Note: Each sample is one complete fact-check job.
      'Segments' shows avg AI calls per job (typically 20-50 paragraphs).

Detailed Caption (generated by openai):
──────────────────────────────────────────────────────────────────────
OpenAI leads the speed race on both fronts — wrapping up story
generation in under 20 seconds and completing a full 29-segment
fact-check in under 2 minutes on average. Perplexity is a close
second for generation but falls to nearly 3 minutes per fact-check.
Gemini, xAI, and Anthropic bring up the rear, with Anthropic averaging
over 10 minutes per job and a standard deviation of over 4 minutes —
making it the least predictable choice. For time-sensitive workflows,
OpenAI or Perplexity are the clear picks; Gemini sits in the middle
ground. Anthropic's high variance suggests it should only be used
where response time is not a constraint.
──────────────────────────────────────────────────────────────────────
======================================================================

Key behaviour: when combined with an --ai-* flag, --agent selects which AI writes the caption; it does not filter the performance table, so every provider is still shown and you get the full comparison. Use --agent without any --ai-* flag to filter the display to a single provider.
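
The two roles of --agent can be written as a small decision function. This is an illustrative sketch of the rule described here, not st-speed's source: resolve_agent and its return shape are hypothetical names, and "auto" stands in for the documented default.

```python
from typing import List, Optional, Tuple


def resolve_agent(agent: Optional[str], ai_flags: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (display_filter, content_author) for a given command line.

    display_filter: provider to restrict the table to, or None for all providers.
    content_author: provider that writes the AI content, or None if none is requested.
    """
    if ai_flags:
        # A content flag is present: show every provider, and let --agent
        # (or the default) decide who writes the text.
        return None, agent or "auto"
    # No content flag: --agent, if given, only filters the display.
    return agent, None


print(resolve_agent("openai", ["--ai-caption"]))  # all providers, caption by openai
print(resolve_agent("gemini", []))                # table filtered to gemini
```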


AI content options

Flag          Output                                    Length
--ai-title    Punchy headline                           ≤ 10 words
--ai-short    One-paragraph summary                     ≤ 80 words
--ai-caption  Two-paragraph detailed caption            100–160 words
--ai-summary  Technical summary with recommendations    120–200 words
--ai-story    Full narrative report (saved to JSON)     800–1200 words

Combine any content flag with --agent <provider> to choose who writes it:

st-speed --agent anthropic --ai-summary report.json
st-speed --agent gemini --ai-story report.json

Options

Option                    Description
file.json [file.json …]   Path to one or more JSON container files
--agent AI                AI for content generation (default: auto). When used with --ai-* flags,
                          selects which AI generates the content but does not filter the
                          performance display. Without --ai-* flags, also filters the display
                          to one provider.
--csv CSV                 Export raw timing data to a CSV file
--history                 Analyze trends across multiple files
--cache                   Enable API response caching (default for AI content generation)
--no-cache                Disable API response caching (forces fresh AI calls)
-q, --quiet               Minimal output
-v, --verbose             Verbose output (show generation details)

For developers

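st-speed reads timing{} dicts from data[] entries (generation) and fact[].timing dicts (fact-checking). st-gen and st-fact write timing on every non-cached call and omit it on cache hits, which is why a cached entry such as xai (cache) in the example output shows a token count but no time or Tok/s.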
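
A rough sketch of walking a container for these dicts follows. The layout is an assumption made for illustration: the "make" and "elapsed" keys, and the nesting of fact[] inside each data[] entry, are hypothetical and may not match the real container schema. Only the presence or absence of timing comes from the description above.

```python
# Hypothetical container: one fresh entry with timing, one cache hit without.
container = {
    "data": [
        {
            "make": "openai",
            "timing": {"elapsed": 18.4},
            "fact": [{"timing": {"elapsed": 3.9}}, {}],  # second segment was cached
        },
        {"make": "xai"},  # cache hit: no timing dict was written
    ]
}


def split_by_timing(container: dict):
    """Separate data[] entries that carry a timing dict from cache hits."""
    timed, cached = [], []
    for entry in container.get("data", []):
        (timed if "timing" in entry else cached).append(entry)
    return timed, cached


def fact_timings(container: dict) -> list:
    """Collect every fact[].timing dict that is present."""
    return [
        fact["timing"]
        for entry in container.get("data", [])
        for fact in entry.get("fact", [])
        if "timing" in fact
    ]


timed, cached = split_by_timing(container)
print([e["make"] for e in timed], [e["make"] for e in cached])
print(fact_timings(container))
```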

For fact-checks, elapsed time is extrapolated from fresh segments so that partially-cached runs remain comparable with fully-fresh ones (see extract_fact_check_timing() in source).
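
One plausible way to do such an extrapolation is to scale the mean fresh-segment time up to the full segment count. This is a sketch under that assumption only: extrapolate_elapsed is a hypothetical helper, and extract_fact_check_timing() in the source may use a different rule.

```python
from typing import List, Optional


def extrapolate_elapsed(fresh_seconds: List[float], total_segments: int) -> Optional[float]:
    """Estimate the elapsed time of a fully fresh job.

    fresh_seconds: per-segment durations for segments that were not cached.
    total_segments: number of segments in the job, cached or not.
    Returns None when there is nothing to extrapolate from.
    """
    if not fresh_seconds or total_segments <= 0:
        return None
    mean_fresh = sum(fresh_seconds) / len(fresh_seconds)
    return mean_fresh * total_segments


# Hypothetical job: 29 segments, of which only 10 were fresh at 4 s each.
estimate = extrapolate_elapsed([4.0] * 10, 29)
print(estimate)
```

Under this rule a fully cached job yields no estimate at all, so it would be left out of the averages instead of counting as a zero-second run.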
