README revamp: fix broken commands, add demo GIF, correct competitor table, polished by agentecobuilder · Pull Request #173 · LayerLens/stratix-python

agentecobuilder · 2026-05-21T14:48:37Z

Summary

README revamp to fix several broken user flows, improve clarity, and tighten product positioning. All commands and code blocks were tested locally on a fresh macOS environment to confirm they work.

Substantive fixes (issues that would affect users)

Quick Start now uses commands that actually ship. Removed references to stratix init, which doesn't exist in the SDK, and replaced it with the working PublicClient flow.
Fixed the broken "Next Steps" section, which had the same non-existent stratix init command. Replaced it with the actual working install, API key, and first-call sequence.
Quoted "layerlens[cli]" in install commands. It was failing in zsh (default macOS shell) because of the unquoted brackets.
Switched python to python3. Default macOS no longer ships a python alias, so the original command was failing.
Clarified the two-client model. Added a callout explaining when to use PublicClient vs Stratix, so users aren't confused when pc appears in code before it's introduced.
Promoted the early-access install warning. The note about needing --extra-index-url is now a GitHub callout instead of a line that's easy to skim past.
Removed the broken CI badge that was rendering as a broken image icon.
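
To illustrate why the quoting fix matters: in a shell that globs, an unquoted `[cli]` is a character class, so the shell tries to match filenames before pip ever sees the argument. The sketch below is only an illustration and not part of the SDK. It uses Python's `fnmatch`, which follows the same bracket semantics, and `shlex.quote` to show the pattern behavior and a safely quoted command.

```python
import fnmatch
import shlex

requirement = "layerlens[cli]"

# As a glob, "[cli]" is a character class: it matches exactly one of c, l, or i.
# The pattern therefore matches "layerlensc" but not the literal requirement string.
matches_single_char = fnmatch.fnmatchcase("layerlensc", requirement)
matches_literal = fnmatch.fnmatchcase(requirement, requirement)

# shlex.quote wraps the requirement in single quotes so a POSIX-style shell
# hands it to pip unchanged.
quoted = shlex.quote(requirement)
command = f"pip install {quoted}"

print(matches_single_char, matches_literal)
print(command)
```

Double quotes, as used in the README, have the same effect as the single quotes produced here.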

Competitor table updates

DeepEval: "~14 metrics" → "50+ metrics" (per their official live site)
Langfuse prompt-level comparison: "Not built-in" → "Prompt experiments + side-by-side (UI)"

Additions

Demo GIF at the top of the README, showing the SDK in action, listing 217 frontier models from 4 different vendors in 5 lines of Python.
Pricing section between the comparison table and Installation, clarifying the free vs. Stratix Premium split.
Replaced a dead link embed with the richer samples/ directory.
Replaced the previous Examples table with a richer Samples table.

mmercuri · 2026-05-25T07:22:50Z

Accuracy review — README revamp (PR #173)

Reviewed against the live repo (stratix-python at HEAD), atlas-app, and the public docs of each cited competitor. Net: most of the substantive fixes check out, but the PR introduces three dead links in the new "Next steps" block that should be fixed before merge, plus one self-inconsistent count.

✓ Verified accurate

| Claim | Verification |
| --- | --- |
| `stratix init` doesn't ship | Confirmed: `src/layerlens/cli/commands/` contains `auth, bulk, ci, evaluate, evaluations, integration, judge, replay, scorer, space, synthetic, trace`. No `init.py`. |
| `LAYERLENS_STRATIX_API_KEY` is the env var `PublicClient()` reads | Confirmed: `src/layerlens/_public_client.py:67` does `os.environ.get("LAYERLENS_STRATIX_API_KEY")` when no `api_key` is passed. |
| `"layerlens[cli]"` quoting fixes the zsh failure | Confirmed: unquoted brackets are glob characters in zsh, so the unquoted command aborts with `no matches found`. |
| `python3` over `python` on macOS | Confirmed: macOS 13+ ships no `python` alias by default. |
| CI badge was broken | Confirmed: `.github/workflows/` has `run-tests.yaml`, `check-format.yaml`, `publish-sdk.yaml`, etc.; there is no `ci.yml`, so the old badge was pointing at a non-existent workflow. |
| `samples/` exists; `examples/` removed | Confirmed: 13 sample subdirs (`cicd, claude-code, cli, copilotkit, core, cowork, data, industry, instrument, integrations, mcp, modalities, openclaw`); `examples/` is gone. |
| "70+ production-ready samples" | Conservative: the actual count is 131 `.py` files at depth 3 across the 13 subdirs (core: 20, instrument: 54, openclaw: 30, industry: 10, cowork: 5, integrations: 4, copilotkit: 3, modalities: 3, cicd: 2, mcp: 1). |
| DeepEval: `~14 metrics` → `50+ metrics` | ✓ Validated against deepeval.com/docs/metrics-introduction; their own page says "50+ SOTA, ready-to-use metrics." Categories enumerated: RAG (5), Agents (6), Chatbots (4), Safety (6), Image (5), Others (4) + Custom (G-Eval, DAG, Arena G-Eval, conversational variants). |
| Langfuse: `Not built-in` → `Prompt experiments + side-by-side (UI Supported)` | ✓ Validated against langfuse.com/docs/prompts/experiments and `/docs/playground`; both pages confirm "Prompt Experiments," "Playground," "side-by-side," and "compare versions." |
| Logo URL | `https://layerlens-public-assets.s3.us-east-1.amazonaws.com/logo-full.png` → HTTP 200. |
| Live URLs in README | `https://stratix.layerlens.ai`, `https://app.layerlens.ai`, `https://sdk.layerlens.ai/package` all return HTTP 200. |
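
The sample count above can be re-derived with a few lines of standard-library Python. This is a minimal sketch, not the script used for the review: the helper name `count_py_files` is hypothetical, and it counts every `.py` file under each subdirectory recursively, which may differ from the depth-3 rule quoted in the table.

```python
from collections import Counter
from pathlib import Path


def count_py_files(samples_root: str) -> Counter:
    """Count .py files under each immediate subdirectory of samples_root."""
    counts: Counter = Counter()
    root = Path(samples_root)
    for subdir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob walks the subdirectory recursively; depth is not restricted here.
        counts[subdir.name] = sum(1 for f in subdir.rglob("*.py") if f.is_file())
    return counts
```

Calling `count_py_files("samples")` from the repository root and summing the values gives a total that can be compared against the "70+" claim.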

✗ Bugs introduced by this PR (please fix before merge)

1. Three dead links in the new "Next steps" block — the PR explicitly removes examples/ (replaced by samples/), but the new Next steps still points at ./examples/ paths that don't exist:

-- **[Run a custom evaluation](./examples/)** ➡️ score your own model on any benchmark
-- **[Gate CI/CD on eval results](./examples/ci-gate)** ➡️ `layerlens ci run --threshold 0.8` in your pipeline
-- **[Upload and evaluate agent traces](./examples/agent-traces)** ➡️ multi-step trace analysis

Real paths:

./examples/ → ./samples/
./examples/ci-gate → ./samples/cicd/ (note: subdir is cicd, not ci-gate)
./examples/agent-traces → no such subdir exists in samples/. Either create it or point at the closest match (e.g., ./samples/instrument/ or ./samples/openclaw/).

Given that the PR's own headline is "fix broken commands," dead links here are especially worth catching.
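
Dead relative links like these can be caught mechanically. The sketch below is one possible check, assuming inline Markdown links of the form `[text](./path)`. The function name `find_dead_relative_links` is hypothetical, and the regex deliberately ignores reference-style links.

```python
import re
from pathlib import Path

# Matches inline Markdown links: [text](target). Reference-style links are ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_dead_relative_links(markdown: str, repo_root: str) -> list[str]:
    """Return relative link targets in `markdown` that do not exist under repo_root."""
    root = Path(repo_root)
    dead = []
    for target in LINK_RE.findall(markdown):
        # Skip absolute URLs (anything with a scheme) and in-page anchors.
        if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*:", target) or target.startswith("#"):
            continue
        path_part = target.split("#", 1)[0]  # drop any trailing anchor
        if not (root / path_part).exists():
            dead.append(target)
    return dead
```

Run against the README text with the repository root as `repo_root`, a non-empty result would list paths such as `./examples/ci-gate` that no longer exist.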

2. Self-inconsistent benchmark count

Hero strapline: "Evaluate 200+ models across 100+ benchmarks"
Compete table row: "100+ benchmarks, 200+ models"
New Pricing section: "query 200+ models, 50+ benchmarks, and run head-to-head comparisons"

Pick one number and use it consistently across all three locations.
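
A check along these lines could flag the mismatch automatically. This is a rough sketch, assuming counts are always written as `<number>+ <noun>` as in the three strings quoted above; `collect_counts` is a hypothetical helper, not part of the repository.

```python
import re
from collections import defaultdict

# Matches phrases such as "200+ models" or "50+ benchmarks".
COUNT_RE = re.compile(r"(\d+)\+\s+(models|benchmarks)")


def collect_counts(snippets: dict[str, str]) -> dict[str, set[int]]:
    """Map each noun to the set of distinct counts claimed across all snippets."""
    seen: dict[str, set[int]] = defaultdict(set)
    for text in snippets.values():
        for number, noun in COUNT_RE.findall(text):
            seen[noun].add(int(number))
    return dict(seen)


# The three README locations quoted in the review.
readme_claims = {
    "hero": "Evaluate 200+ models across 100+ benchmarks",
    "compete_table": "100+ benchmarks, 200+ models",
    "pricing": "query 200+ models, 50+ benchmarks, and run head-to-head comparisons",
}

counts = collect_counts(readme_claims)
inconsistent = {noun: sorted(vals) for noun, vals in counts.items() if len(vals) > 1}
print(inconsistent)  # {'benchmarks': [50, 100]}
```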

⚠ Worth re-checking (couldn't verify from source)

| Claim | What to do |
| --- | --- |
| `217 frontier models` / `4 different vendors` (GIF caption) | Atlas-app catalog lives in `docker/db/mongodb/seed/seed.js` and is loaded into Mongo, so I couldn't enumerate from a checkout. Recommend running `pc.models.get(page_size=500)` against prod and counting `total_count` + distinct providers right before merge so the caption matches reality. |
| Benchmark ID `"aime2024"` in the Quick Start code | Same: couldn't enumerate against atlas-app source. The PR body says "all commands tested locally," which is good; just confirm the slug still resolves at merge time. |
| Model IDs `"openai/gpt-4o"`, `"anthropic/claude-opus-4"` | These are older IDs. Current frontier in the catalog (per platform memory) includes GPT-5.3, Claude Opus 4.6, Gemini 3.1 Pro/Flash. Either confirm `gpt-4o` and `claude-opus-4` still resolve, or swap to current frontier IDs so the Quick Start lands green for anyone who copies it. |
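
Once the model list has been fetched, the tally itself is simple. The sketch below assumes model IDs follow the `vendor/model` shape of the IDs quoted above. It works on plain strings and makes no SDK calls, since the response shape of `pc.models.get` is not shown in this thread; `vendor_counts` is a hypothetical helper.

```python
from collections import Counter


def vendor_counts(model_ids: list[str]) -> Counter:
    """Count models per vendor, taking the vendor to be the text before the first '/'."""
    return Counter(model_id.split("/", 1)[0] for model_id in model_ids)


# The first two IDs are quoted in the review; the last two are made-up placeholders.
model_ids = [
    "openai/gpt-4o",
    "anthropic/claude-opus-4",
    "example-vendor/model-a",
    "example-vendor/model-b",
]

per_vendor = vendor_counts(model_ids)
print(len(model_ids), "models from", len(per_vendor), "vendors")
```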

Light suggestions

Demo GIF caption ("217 frontier models from 4 different vendors in 5 lines of Python") will age. Consider keeping the caption claim-free (e.g., "list frontier models in 5 lines of Python") so that a changing model count doesn't make the README stale.
The "Get a key from app.layerlens.ai → Settings → API Keys" path should be screenshot-verified at merge time to make sure the nav matches the current dashboard.

Nice work overall on the compete fixes — both DeepEval and Langfuse updates landed accurately against their respective public docs.

agentecobuilder · 2026-05-26T15:30:59Z

Pushed fixes

Dead links: Three ./examples/ paths in Next steps --> ./samples/core/, ./samples/cicd/, ./samples/instrument/
Benchmark count: Unified to 50+ across hero, "what makes it click" bullet, compete table, and pricing. Verified via pc.benchmarks.get
GIF caption: Dropped 217 so it ages well
Bonus fix: The top-nav "Examples" link pointed at a non-existent #examples anchor; I've updated it to "Samples" --> #samples
Quick Start: Updated to the new 1.8.0 SDK API --> `benchmark_key`/`model_key_1`/`model_key_2` parameters, with `anthropic/claude-3.5-haiku` (valid model with AIME 2024 data) replacing the previously invalid `anthropic/claude-opus-4`... verified the snippet returns a real `ComparisonResponse` end-to-end against prod.

Ready for another review

agentecobuilder added 10 commits May 20, 2026 14:33

Update README.md

1a67969

Add demo GIF

80b651e

Update README.md

b5f4957

Update README.md

e379f6a

Update README.md

bbb523f

Update competitor table

Update README.md

da59399

pricing section

Update README.md

2dfe9a2

Update README.md

f03116b

Update README.md

1c61df6

Update README.md

312021b

Replace Examples section with Samples (surface 70+ samples per HoP feedback)

Update README.md

1d6c0ac

Address accuracy review: fix dead links, unify benchmark count, fix Samples nav anchor, de-claim GIF caption, update Quick Start to 1.8.0 SDK API

m-peko approved these changes May 26, 2026

View reviewed changes

m-peko merged commit 731872f into LayerLens:main May 26, 2026
7 checks passed

m-peko merged 11 commits into LayerLens:main from agentecobuilder:main
