
README revamp: fix broken commands, add demo GIF, correct competitor table, polished #173

Merged: m-peko merged 11 commits into LayerLens:main from agentecobuilder:main on May 26, 2026


@agentecobuilder


Summary

README revamp to fix several broken user flows, improve clarity, and tighten product positioning. All commands and code blocks were tested locally on a fresh macOS environment to confirm they work.

Substantive fixes (issues that would affect users)

  • Quick Start now uses commands that actually ship. Removed references to stratix init, which doesn't exist in the SDK, and replaced it with the working PublicClient flow.
  • Fixed the broken "Next Steps" section, which had the same non-existent stratix init command. Replaced it with the actual working install, API key, and first-call sequence.
  • Quoted "layerlens[cli]" in install commands. It was failing in zsh (default macOS shell) because of the unquoted brackets.
  • Switched python to python3. Default macOS no longer ships a python alias, so the original command was failing.
  • Clarified the two-client model. Added a callout explaining when to use PublicClient vs Stratix, so users aren't confused when pc appears in code before it's introduced.
  • Promoted the early-access install warning. The note about needing --extra-index-url is now a GitHub callout instead of a line that's easy to skim past.
  • Removed the broken CI badge that was rendering as a broken image icon.
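To see why the unquoted install command fails in zsh: the brackets in layerlens[cli] form a glob character class that matches a single c, l, or i, so the shell tries to expand the argument as a filename pattern. Python's fnmatch uses the same bracket semantics, which makes it a convenient way to illustrate the matching; it is only an illustration and does not reproduce zsh's own "no matches found" error.

```python
from fnmatch import fnmatchcase

# Unquoted, the shell treats this argument as a glob pattern, not a literal string.
pattern = "layerlens[cli]"

# "[cli]" is a character class: it matches exactly ONE of 'c', 'l', or 'i'.
candidates = ["layerlensc", "layerlensl", "layerlensi", "layerlens[cli]"]
for name in candidates:
    print(f"{name!r:20} matches pattern: {fnmatchcase(name, pattern)}")

# The literal extras spec never matches its own pattern, so unless a file such as
# ./layerlensc happens to exist, zsh finds no match and (with its default NOMATCH
# option) aborts the command. Quoting passes the string to pip verbatim.
assert not fnmatchcase("layerlens[cli]", pattern)
```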

Competitor table updates

  • DeepEval: "~14 metrics" → "50+ metrics" (per their official live site)
  • Langfuse prompt-level comparison: "Not built-in" → "Prompt experiments + side-by-side (UI)"

Additions

  • Demo GIF at the top of the README showing the SDK in action: listing 217 frontier models from 4 different vendors in 5 lines of Python.
  • Pricing section between the comparison table and Installation, clarifying the free vs. Stratix Premium split.
  • Replaced a dead link embed with the richer samples/ directory.
  • Replaced the previous Examples table with a richer Samples table.
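A quick way to sanity-check sample counts like the ones cited later in this thread is to tally Python files per top-level directory under samples/. This is a minimal sketch assuming a local checkout; count_samples is a hypothetical helper, and it counts .py files recursively (rglob), which may differ from a fixed-depth count.

```python
from collections import Counter
from pathlib import Path


def count_samples(samples_root: Path) -> Counter:
    """Count .py files under each top-level subdirectory of samples_root (hypothetical helper)."""
    counts: Counter = Counter()
    for subdir in sorted(p for p in samples_root.iterdir() if p.is_dir()):
        # rglob walks the whole subtree; use a fixed-depth glob to match a depth-limited count.
        counts[subdir.name] = sum(1 for _ in subdir.rglob("*.py"))
    return counts


if __name__ == "__main__":
    root = Path("samples")
    if root.is_dir():
        counts = count_samples(root)
        for name, n in counts.most_common():
            print(f"{name:14} {n}")
        print("total:", sum(counts.values()))
```

Summing the per-directory figures quoted in the review (20 + 54 + 30 + 10 + 5 + 4 + 3 + 3 + 2 + 1) gives 132 rather than the stated 131, so a fresh run like this is a cheap way to settle the exact total.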

@mmercuri


Accuracy review — README revamp (PR #173)

Reviewed against the live repo (stratix-python at HEAD), atlas-app, and the public docs of each cited competitor. Net: most of the substantive fixes check out, but the PR introduces three dead links in the new "Next steps" block that should land green before merge, plus one self-inconsistent count.

✓ Verified accurate

| Claim | Verification |
|---|---|
| stratix init doesn't ship | Confirmed: src/layerlens/cli/commands/ contains auth, bulk, ci, evaluate, evaluations, integration, judge, replay, scorer, space, synthetic, trace. No init.py. |
| LAYERLENS_STRATIX_API_KEY is the env var PublicClient() reads | Confirmed: src/layerlens/_public_client.py:67 does os.environ.get("LAYERLENS_STRATIX_API_KEY") when no api_key is passed. |
| "layerlens[cli]" quoting fixes the zsh failure | Confirmed: unquoted brackets are glob characters in zsh, and an unmatched pattern aborts the command with "no matches found". |
| python3 over python on macOS | Confirmed: macOS 13+ ships no python alias by default. |
| CI badge was broken | Confirmed: .github/workflows/ has run-tests.yaml, check-format.yaml, publish-sdk.yaml, etc. There is no ci.yml, so the old badge was pointing at a non-existent workflow. |
| samples/ exists; examples/ removed | Confirmed: 13 sample subdirs (cicd, claude-code, cli, copilotkit, core, cowork, data, industry, instrument, integrations, mcp, modalities, openclaw); examples/ is gone. |
| "70+ production-ready samples" | Conservative: actual count is 131 .py files at depth 3 across the 13 subdirs (core: 20, instrument: 54, openclaw: 30, industry: 10, cowork: 5, integrations: 4, copilotkit: 3, modalities: 3, cicd: 2, mcp: 1). |
| DeepEval: ~14 metrics → 50+ metrics | ✓ Validated against deepeval.com/docs/metrics-introduction: their own page says "50+ SOTA, ready-to-use metrics." Categories enumerated: RAG (5), Agents (6), Chatbots (4), Safety (6), Image (5), Others (4), plus Custom (G-Eval, DAG, Arena G-Eval, conversational variants). |
| Langfuse: Not built-in → Prompt experiments + side-by-side (UI) | ✓ Supported. Validated against langfuse.com/docs/prompts/experiments and /docs/playground: both pages confirm "Prompt Experiments," "Playground," "side-by-side," and "compare versions." |
| Logo URL | https://layerlens-public-assets.s3.us-east-1.amazonaws.com/logo-full.png → HTTP 200. |
| Live URLs in README | https://stratix.layerlens.ai, https://app.layerlens.ai, https://sdk.layerlens.ai/package all return HTTP 200. |
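The api_key fallback confirmed above can be pictured with a small standalone sketch. It mirrors only the behavior described in the review (an explicit api_key wins, otherwise LAYERLENS_STRATIX_API_KEY is read from the environment). resolve_api_key is a hypothetical name, not the SDK's code, and raising an error when neither source is set is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical mirror of the described lookup: explicit argument first, then the env var."""
    if api_key is not None:
        return api_key
    key = os.environ.get("LAYERLENS_STRATIX_API_KEY")
    if not key:
        # Assumption: how the real client reacts to a missing key is not covered in this thread.
        raise RuntimeError("Set LAYERLENS_STRATIX_API_KEY or pass api_key explicitly.")
    return key


if __name__ == "__main__":
    os.environ["LAYERLENS_STRATIX_API_KEY"] = "demo-key"
    print(resolve_api_key())            # falls back to the environment variable
    print(resolve_api_key("explicit"))  # an explicit argument takes precedence
```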

✗ Bugs introduced by this PR (please fix before merge)

1. Three dead links in the new "Next steps" block — the PR explicitly removes examples/ (replaced by samples/), but the new Next steps still points at ./examples/ paths that don't exist:

-- **[Run a custom evaluation](./examples/)** ➡️ score your own model on any benchmark
-- **[Gate CI/CD on eval results](./examples/ci-gate)** ➡️ `layerlens ci run --threshold 0.8` in your pipeline
-- **[Upload and evaluate agent traces](./examples/agent-traces)** ➡️ multi-step trace analysis

Real paths:

  • ./examples/ → ./samples/
  • ./examples/ci-gate → ./samples/cicd/ (note: the subdir is cicd, not ci-gate)
  • ./examples/agent-traces → no such subdir exists in samples/. Either create it or point at the closest match (e.g., ./samples/instrument/ or ./samples/openclaw/).

Given that the PR's own headline is "fix broken commands," dead links here are especially worth catching.
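Dead relative links like these are easy to catch mechanically. A minimal sketch, assuming the README uses inline Markdown links of the form [text](target): it skips absolute URLs and pure #anchors, strips any #fragment, and reports targets that don't exist relative to the README's directory. find_dead_links is a hypothetical helper, not part of the repo.

```python
import re
from pathlib import Path

# Inline Markdown link: [text](target); captures the target up to whitespace or ')'.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_dead_links(readme: Path) -> list[str]:
    """Return relative link targets in readme that don't exist on disk."""
    dead = []
    base = readme.parent
    for target in LINK_RE.findall(readme.read_text(encoding="utf-8")):
        if re.match(r"^[a-z][a-z0-9+.-]*:", target, re.I) or target.startswith("#"):
            continue  # skip https:, mailto:, etc., and in-page anchors
        path = target.split("#", 1)[0]  # drop any #fragment before checking the file
        if not (base / path).exists():
            dead.append(target)
    return dead


if __name__ == "__main__":
    readme = Path("README.md")
    if readme.is_file():
        for target in find_dead_links(readme):
            print("dead link:", target)
```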

2. Self-inconsistent benchmark count

  • Hero strapline: "Evaluate 200+ models across 100+ benchmarks"
  • Compete table row: "100+ benchmarks, 200+ models"
  • New Pricing section: "query 200+ models, 50+ benchmarks, and run head-to-head comparisons"

Pick one number and use it consistently across all three locations.
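A small check can keep figures like these from drifting apart again. The sketch below assumes counts are written as "<number>+ benchmarks", as in the three quotes above, and collects every distinct figure; more than one value means the copy is inconsistent. distinct_counts is a hypothetical helper.

```python
import re

COUNT_RE = re.compile(r"(\d+)\+\s+benchmarks", re.IGNORECASE)


def distinct_counts(text: str) -> set[int]:
    """Return every distinct 'N+ benchmarks' figure found in text."""
    return {int(n) for n in COUNT_RE.findall(text)}


# The three strings quoted in the review.
readme_snippets = """
Evaluate 200+ models across 100+ benchmarks
100+ benchmarks, 200+ models
query 200+ models, 50+ benchmarks, and run head-to-head comparisons
"""

counts = distinct_counts(readme_snippets)
print(sorted(counts))  # [50, 100]
if len(counts) > 1:
    print("inconsistent benchmark counts:", sorted(counts))
```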

⚠ Worth re-checking (couldn't verify from source)

| Claim | What to do |
|---|---|
| 217 frontier models / 4 different vendors (GIF caption) | The Atlas-app catalog lives in docker/db/mongodb/seed/seed.js and is loaded into Mongo; I couldn't enumerate it from a checkout. Recommend running pc.models.get(page_size=500) against prod and counting total_count + distinct providers right before merge so the caption matches reality. |
| Benchmark ID "aime2024" in the Quick Start code | Same: couldn't enumerate against atlas-app source. The PR body says "all commands tested locally," which is good; just confirm the slug still resolves at merge time. |
| Model IDs "openai/gpt-4o", "anthropic/claude-opus-4" | These are older IDs. The current frontier in the catalog (per platform memory) includes GPT-5.3, Claude Opus 4.6, and Gemini 3.1 Pro/Flash. Either confirm that gpt-4o and claude-opus-4 still resolve, or swap to current frontier IDs so the Quick Start lands green for anyone who copies it. |
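Once a model list has been fetched (for example via the pc.models.get(page_size=500) call suggested above, whose response shape is not shown in this thread), counting distinct providers is straightforward if IDs follow the provider/model form used in the Quick Start ("openai/gpt-4o", "anthropic/claude-opus-4"). This sketch works on plain ID strings only; provider_counts is a hypothetical helper, and the sample IDs come from this thread rather than the live catalog.

```python
from collections import Counter


def provider_counts(model_ids: list[str]) -> Counter:
    """Count models per provider, assuming IDs look like 'provider/model-name'."""
    # IDs without a '/' are skipped because their provider can't be inferred.
    return Counter(mid.split("/", 1)[0] for mid in model_ids if "/" in mid)


ids = ["openai/gpt-4o", "anthropic/claude-opus-4", "anthropic/claude-3.5-haiku"]
by_provider = provider_counts(ids)
print(len(ids), "models from", len(by_provider), "providers")  # 3 models from 2 providers
```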

Light suggestions

  • The demo GIF caption ("217 frontier models from 4 different vendors in 5 lines of Python") will age. Consider keeping it claim-free (e.g., "list frontier models in 5 lines of Python") so that changes in the model count don't leave the README stale.
  • The "Get a key from app.layerlens.ai → Settings → API Keys" path should be screenshot-verified at merge time to make sure the nav matches the current dashboard.

Nice work overall on the compete fixes — both DeepEval and Langfuse updates landed accurately against their respective public docs.

Address accuracy review: fix dead links, unify benchmark count, fix Samples nav anchor, de-claim GIF caption, update Quick Start to 1.8.0 SDK API
@agentecobuilder


Pushed fixes

  • Dead links: Three ./examples/ paths in Next steps --> ./samples/core/, ./samples/cicd/, ./samples/instrument/
  • Benchmark count: Unified to 50+ across the hero, the "what makes it click" bullet, the compete table, and pricing. Verified via pc.benchmarks.get.
  • GIF caption: Dropped 217 so it ages well.
  • Bonus fix: The top nav "Examples" link pointed to a non-existent #examples anchor; I've updated it to "Samples" → #samples.
  • Quick Start: Updated to the new 1.8.0 SDK API (benchmark_key / model_key_1 / model_key_2 parameters), with anthropic/claude-3.5-haiku (a valid model with AIME 2024 data) replacing the previously invalid anthropic/claude-opus-4. Verified that the snippet returns a real ComparisonResponse end-to-end against prod.

Ready for another review

m-peko merged commit 731872f into LayerLens:main on May 26, 2026
7 checks passed

3 participants