Releases: llmci-cli/llmci
llmci 0.4.1
Patch release: proxy cost pricing for direct targets.
Added
- settings.price_overrides: per-model input_per_token / output_per_token USD rates for cases where litellm cannot compute cost (internal LLM proxies).
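As a rough illustration of what per-token rates imply, the sketch below computes a call's cost from its token counts. The `TokenPrice` class, the `estimate_cost` helper, the model name, and the rates are all hypothetical; this is not llmci's internal code, only the arithmetic one would expect from `input_per_token` and `output_per_token`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical container for one model's USD-per-token rates."""
    input_per_token: float
    output_per_token: float


def estimate_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call from its token counts."""
    return input_tokens * price.input_per_token + output_tokens * price.output_per_token


# Made-up rates for an internal proxy model; not real pricing.
overrides = {
    "internal/proxy-model": TokenPrice(input_per_token=2e-6, output_per_token=6e-6),
}
cost = estimate_cost(overrides["internal/proxy-model"], input_tokens=1_000, output_tokens=500)
print(f"{cost:.6f}")  # 0.005000
```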
Install: pip install llmci==0.4.1
See CHANGELOG.
llmci 0.4.0
Cross-provider migration, few-shot strategy, and PII allow-list for safety gates.
Added
- Cross-provider migration: llmci migrate accepts provider/model refs with per-side base URLs
- Few-shot migration strategy: --strategy few_shot inlines train examples as demos
- PII allow-list: pii_leakage criteria accept allow_list (literal or regex: entries)
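One way an allow-list with literal and regex: entries might be evaluated is sketched below. The `is_allowed` helper and the sample entries are hypothetical, and the choice to require a full-string regex match is an assumption; the release notes do not say how llmci matches patterns.

```python
import re

REGEX_PREFIX = "regex:"


def is_allowed(value: str, allow_list: list[str]) -> bool:
    """Return True if value equals a literal entry or matches a 'regex:' entry."""
    for entry in allow_list:
        if entry.startswith(REGEX_PREFIX):
            # Assumption: the pattern must match the whole value, not a substring.
            if re.fullmatch(entry[len(REGEX_PREFIX):], value):
                return True
        elif entry == value:
            return True
    return False


# Hypothetical allow-list: one literal address and one pattern.
allow_list = ["support@example.com", r"regex:.+@example\.org"]
print(is_allowed("alice@example.org", allow_list))
print(is_allowed("bob@elsewhere.test", allow_list))
```

A detected PII span for which `is_allowed` returns True would then be ignored rather than counted as a leak.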
See CHANGELOG for full details.
llmci 0.3.0
Post-0.2.0 follow-ups: deeper gate trust, RAG faithfulness, red-team mutation, and multimodal evals.
Highlights
- Composite judge caching: agent outcome/trajectory LLM calls share .llmci/cache/judges/
- Calibration trend history: --save-snapshot appends to a history log with trend table
- Gate warnings: llmci run warns on missing baselines or significance misconfig
- Per-claim faithfulness: RAG decompose_claims: true for atomic grounding checks
- LLM attack mutation: llmci redteam generate --mutate for broader adversarial coverage
- Multimodal targets: images/audio fields on dataset rows for direct API evals
- Example 18: multimodal vision eval (examples/18-multimodal-vision)
Install: pip install llmci==0.3.0
Full changelog: https://github.com/llmci-cli/llmci/blob/main/CHANGELOG.md#030---2026-06-06
llmci 0.2.0
Major release: CI gate trust, deeper eval quality, safety/red-team, plugin API, and seventeen runnable examples.
Highlights
CI gate hardening
- Flake resistance (samples_per_example, significance gating)
- Response caching for direct API targets
- Cost/token metrics (cost_mean, tokens_*)
- Portable reports: JUnit, SARIF, JSON, HTML
Eval quality
- RAG judge (faithfulness, relevance, retrieval metrics)
- Pairwise judge with position-swap bias control
- Judge calibration & drift detection (per-criterion support)
- Output diffs vs baseline in reports
- Structured-output (JSON Schema) judge
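Position-swap bias control in a pairwise judge can be sketched as follows: the judge is asked twice with the candidates in opposite order, and a winner is declared only when both verdicts agree. The `Judge` signature, the `pairwise_with_swap` helper, and the toy judges are hypothetical stand-ins for an LLM call, not llmci's plugin API.

```python
from typing import Callable

# Hypothetical judge signature: (prompt, first_answer, second_answer) -> "first" or "second".
Judge = Callable[[str, str, str], str]


def pairwise_with_swap(judge: Judge, prompt: str, a: str, b: str) -> str:
    """Return "a", "b", or "tie" after judging both presentation orders."""
    forward = judge(prompt, a, b)   # a is shown first
    backward = judge(prompt, b, a)  # b is shown first
    a_wins_forward = forward == "first"
    a_wins_backward = backward == "second"
    if a_wins_forward and a_wins_backward:
        return "a"
    if not a_wins_forward and not a_wins_backward:
        return "b"
    # The verdict flipped with the ordering, so it reflects position, not quality.
    return "tie"


def always_first(prompt: str, first: str, second: str) -> str:
    """Toy judge with pure position bias."""
    return "first"


def prefers_longer(prompt: str, first: str, second: str) -> str:
    """Toy judge that is position-independent: it picks the longer answer."""
    return "first" if len(first) >= len(second) else "second"


print(pairwise_with_swap(always_first, "q", "x", "y"))                 # tie
print(pairwise_with_swap(prefers_longer, "q", "long answer", "short"))  # a
```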
Safety & plugins
- Safety judge (PII, toxicity, jailbreak)
- Red-team attack generator (llmci redteam generate)
- Plugin API: custom judges, metrics, and report sinks
Examples
examples/11–17, including integrated pre-merge gate with committed baselines
Install: pip install llmci==0.2.0
Full changelog: https://github.com/llmci-cli/llmci/blob/main/CHANGELOG.md#020---2026-06-06
llmci 0.1.9
Added
- Release metadata consistency check for package version, action install version, and changelog links.
- Manual real-LLM example workflow for API-key-dependent examples.
- GitHub Action inputs for explicit config paths, discovered config runs, and baseline updates.
Fixed
- Duplicate llmci PR comments from parallel matrix jobs are merged into one canonical comment and stale duplicates are cleaned up.
Install: pip install llmci==0.1.9
llmci 0.1.8
Added
- --include and --exclude filters for llmci discover and llmci run --all.
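The sketch below shows one plausible reading of include/exclude filtering over discovered config paths. It assumes the filters are glob patterns, which the release notes do not state; the `filter_configs` helper and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_configs(paths, include=(), exclude=()):
    """Keep paths that match any include glob (all paths if none given) and no exclude glob."""
    kept = []
    for path in paths:
        included = not include or any(fnmatchcase(path, pat) for pat in include)
        excluded = any(fnmatchcase(path, pat) for pat in exclude)
        if included and not excluded:
            kept.append(path)
    return kept


# Hypothetical discovered config files.
paths = ["evals/chat/llmci.yaml", "evals/rag/llmci.yaml", "experiments/llmci.yaml"]
selected = filter_configs(paths, include=["evals/*"], exclude=["*rag*"])
print(selected)  # ['evals/chat/llmci.yaml']
```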
Changed
- Dogfood matrix evals now use LLMCI_REPORT_SLICE so PR comments merge into one combined report.
llmci 0.1.7
Added
- llmci discover to list config files in a repository.
- llmci run --all to run every discovered config.
llmci 0.1.6
Added
- llmci run --config <path> to run evals from an alternate config file.
llmci 0.1.5
Fixed
- S3 dataset URI validation runs before the optional boto3 import.
- PyPI publish workflow grants contents: read so checkout works alongside trusted publishing.
Install: pip install llmci==0.1.5
llmci 0.1.4
Added
- Remote eval datasets: dataset accepts s3:// and https:// URIs (string or {source, cache}). S3 requires pip install 'llmci[s3]'. Cached under .llmci/cache/datasets/ by default.
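To make the two accepted shapes concrete, the sketch below normalizes a dataset value given either as a plain string or as a {source, cache} mapping. The `normalize_dataset` helper, the "kind" field, the bucket name, and the treatment of cache as a directory path are assumptions for illustration; only the default cache location comes from the notes above.

```python
from urllib.parse import urlparse

DEFAULT_CACHE = ".llmci/cache/datasets/"  # default location stated in the release notes


def normalize_dataset(spec):
    """Turn a string or {source, cache} mapping into one uniform dict."""
    if isinstance(spec, str):
        spec = {"source": spec}
    source = spec["source"]
    # Assumption: "cache" is a directory path; the real field may mean something else.
    cache = spec.get("cache", DEFAULT_CACHE)
    scheme = urlparse(source).scheme
    kind = scheme if scheme in ("s3", "https") else "local"
    return {"source": source, "cache": cache, "kind": kind}


# Hypothetical bucket and paths.
print(normalize_dataset("s3://my-bucket/evals/train.jsonl"))
print(normalize_dataset({"source": "https://example.com/data.jsonl", "cache": "/tmp/ds"}))
```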
Changed
- Repository and package metadata URLs updated for the llmci-cli GitHub organization.
Install: pip install llmci==0.1.4