LayerLens · m-peko · Apr 21, 2026 · Mar 30, 2026
diff --git a/.gitignore b/.gitignore
@@ -19,4 +19,6 @@ codegen.log
 Brewfile.lock.json
 
 .DS_Store
-.coverage
+.coverage
+docs/review/
+marc-only/
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
@@ -34,6 +34,19 @@
   * [Judges and Traces](examples/judges-and-traces.md)
   * [Public API](examples/public-api.md)
 
+## Samples & Tutorials
+* [Samples Overview](samples-guide.md)
+  * [Core SDK Operations](samples-guide.md#core-sdk-operations-18-samples) -- Traces, judges, evaluations, results, models, benchmarks, async
+  * [Industry Solutions](samples-guide.md#industry-solutions-10-samples) -- Healthcare, finance, legal, government, insurance, retail
+  * [Multi-Agent Evaluation](samples-guide.md#multi-agent-evaluation-5-samples) -- Cowork and Agent Teams patterns
+  * [Content-Type Evaluations](samples-guide.md#content-type-evaluations-3-samples) -- Text, brand, document
+  * [CI/CD Integration](samples-guide.md#cicd-integration-2-samples--workflow) -- Quality gates, pre-commit hooks, GitHub Actions
+  * [LLM Provider Integrations](samples-guide.md#llm-provider-integrations-2-samples) -- OpenAI, Anthropic
+  * [OpenClaw Agent Evaluation](samples-guide.md#openclaw-agent-evaluation-10-demos--skill) -- Cage match, code gate, safety audit, red-team
+  * [MCP Server](samples-guide.md#mcp-server-1-sample) -- LayerLens as tools for Claude and other MCP clients
+  * [CopilotKit Integration](samples-guide.md#copilotkit-integration-2-agents--ui-components) -- LangGraph CoAgents, React components
+  * [Claude Code Skills](samples-guide.md#claude-code-skills-6-skills) -- Slash commands for CLI and desktop
+
 ## Troubleshooting
 * [Overview](troubleshooting/README.md)
   * [Common Issues](troubleshooting/common-issues.md)

diff --git a/docs/examples/README.md b/docs/examples/README.md
@@ -1,41 +1,34 @@
-# Examples
+# Code Examples
 
-This section provides practical code examples for common SDK use cases. All examples are available as runnable scripts in the [`examples/`](../../examples/) directory.
+This section provides practical code examples for common SDK use cases. All examples are available as runnable scripts in the [`samples/`](../../samples/) directory.
 
 ## Quick Reference
 
-| Example | Description |
-| ------- | ----------- |
-| [`client_simple.py`](../../examples/client_simple.py) | Minimal sync client usage |
-| [`client.py`](../../examples/client.py) | Full sync evaluation workflow |
-| [`async_client_simple.py`](../../examples/async_client_simple.py) | Minimal async client usage |
-| [`async_client.py`](../../examples/async_client.py) | Full async evaluation workflow |
-| [`async_run_evaluations.py`](../../examples/async_run_evaluations.py) | Run multiple evaluations in parallel |
-| [`get_models.py`](../../examples/get_models.py) | Filter models by name, company, region, type |
-| [`get_benchmarks.py`](../../examples/get_benchmarks.py) | Filter benchmarks by name and type |
-| [`get_evaluation.py`](../../examples/get_evaluation.py) | Fetch an evaluation by ID |
-| [`evaluation_sorting.py`](../../examples/evaluation_sorting.py) | Sort and filter evaluations |
-| [`compare_evaluations.py`](../../examples/compare_evaluations.py) | Compare two models on a benchmark |
-| [`paginated_results.py`](../../examples/paginated_results.py) | Paginate through evaluation results |
-| [`all_results_no_pagination.py`](../../examples/all_results_no_pagination.py) | Fetch all results at once |
-| [`fetch_results_async.py`](../../examples/fetch_results_async.py) | Fetch results for multiple evaluations concurrently |
-| [`create_custom_model.py`](../../examples/create_custom_model.py) | Create a custom model with an OpenAI-compatible API |
-| [`create_custom_benchmark.py`](../../examples/create_custom_benchmark.py) | Create a custom benchmark from a JSONL file |
-| [`create_smart_benchmark.py`](../../examples/create_smart_benchmark.py) | Create an AI-generated benchmark from documents |
-| [`manage_project_models_benchmarks.py`](../../examples/manage_project_models_benchmarks.py) | Add/remove models and benchmarks from a project |
-| [`judges.py`](../../examples/judges.py) | Create, list, update, and delete judges |
-| [`traces.py`](../../examples/traces.py) | Upload, list, get, and delete traces |
-| [`trace_evaluations.py`](../../examples/trace_evaluations.py) | Run judges on traces, estimate cost, get results |
-| [`async_judges_and_traces.py`](../../examples/async_judges_and_traces.py) | Async judge and trace evaluation workflow |
-| [`judge_optimizations.py`](../../examples/judge_optimizations.py) | Estimate, run, and apply judge optimizations |
-| [`public_models.py`](../../examples/public_models.py) | Browse, search, and filter public models |
-| [`public_benchmarks.py`](../../examples/public_benchmarks.py) | Browse public benchmarks and download prompts |
-| [`public_evaluations.py`](../../examples/public_evaluations.py) | Get public evaluation details and results |
+| Sample | Description |
+|--------|-------------|
+| [`benchmark_evaluation.py`](../../samples/core/benchmark_evaluation.py) | Run a model against a benchmark, wait for completion, retrieve results |
+| [`quickstart.py`](../../samples/core/quickstart.py) | Minimal end-to-end trace evaluation |
+| [`async_workflow.py`](../../samples/core/async_workflow.py) | Full async evaluation workflow with concurrent operations |
+| [`async_results.py`](../../samples/core/async_results.py) | Fetch results for multiple evaluations concurrently |
+| [`model_benchmark_management.py`](../../samples/core/model_benchmark_management.py) | Filter models by name/company/region, add/remove from project |
+| [`evaluation_filtering.py`](../../samples/core/evaluation_filtering.py) | Sort and filter evaluations by status, accuracy, date |
+| [`compare_evaluations.py`](../../samples/core/compare_evaluations.py) | Compare two models on a benchmark with outcome filtering |
+| [`paginated_results.py`](../../samples/core/paginated_results.py) | Paginate through results or fetch all at once |
+| [`custom_model.py`](../../samples/core/custom_model.py) | Register a custom model with an OpenAI-compatible API |
+| [`custom_benchmark.py`](../../samples/core/custom_benchmark.py) | Create custom and smart benchmarks from data files |
+| [`create_judge.py`](../../samples/core/create_judge.py) | Create, list, update, and delete judges |
+| [`basic_trace.py`](../../samples/core/basic_trace.py) | Upload, list, get, and delete traces |
+| [`trace_evaluation.py`](../../samples/core/trace_evaluation.py) | Run judges on traces, estimate cost, get results with steps |
+| [`judge_optimization.py`](../../samples/core/judge_optimization.py) | Estimate, run, and apply judge optimizations |
+| [`public_catalog.py`](../../samples/core/public_catalog.py) | Browse public models, benchmarks, evaluations, and prompts |
+| [`integration_management.py`](../../samples/core/integration_management.py) | List, inspect, and test configured integrations |
 
 ## Guides
 
-- [Creating Evaluations](creating-evaluations.md) - Sync, async, and parallel evaluations
-- [Retrieving Results](retrieving-results.md) - Paginated, bulk, and concurrent result fetching
-- [Models and Benchmarks](models-and-benchmarks.md) - Filtering, custom models, custom/smart benchmarks, project management
-- [Judges and Traces](judges-and-traces.md) - Judge CRUD, trace uploads, trace evaluations, and optimizations
-- [Public API](public-api.md) - Public models, benchmarks, evaluations, and comparisons
+- [Creating Evaluations](creating-evaluations.md) -- Sync, async, and parallel evaluations
+- [Retrieving Results](retrieving-results.md) -- Paginated, bulk, and concurrent result fetching
+- [Models and Benchmarks](models-and-benchmarks.md) -- Filtering, custom models, custom/smart benchmarks, project management
+- [Judges and Traces](judges-and-traces.md) -- Judge CRUD, trace uploads, trace evaluations, and optimizations
+- [Public API](public-api.md) -- Public models, benchmarks, evaluations, and comparisons
+
+For the complete samples catalog including industry solutions, OpenClaw agent evaluation, CI/CD integration, and more, see the [Samples Guide](../samples-guide.md).
diff --git a/docs/examples/creating-evaluations.md b/docs/examples/creating-evaluations.md
@@ -8,7 +8,7 @@ Examples for creating evaluations on the Stratix platform using the LayerLens Py
 
 ### Using Synchronous Client
 
-> Source: [`examples/client.py`](../../examples/client.py)
+> Source: [`samples/core/benchmark_evaluation.py`](../../samples/core/benchmark_evaluation.py)
 
 ```python
 from layerlens import Stratix
@@ -49,7 +49,7 @@ else:
 
 ### Minimal Sync Example
 
-> Source: [`examples/client_simple.py`](../../examples/client_simple.py)
+> Source: [`samples/core/benchmark_evaluation.py`](../../samples/core/benchmark_evaluation.py)
 
 ```python
 from layerlens import Stratix
@@ -70,7 +70,7 @@ evaluation = client.evaluations.create(
 
 ### Using Async Client
 
-> Source: [`examples/async_client_simple.py`](../../examples/async_client_simple.py)
+> Source: [`samples/core/async_workflow.py`](../../samples/core/async_workflow.py)
 
 ```python
 import asyncio
@@ -106,7 +106,7 @@ if __name__ == "__main__":
 
 ## Sorting and Filtering Evaluations
 
-> Source: [`examples/evaluation_sorting.py`](../../examples/evaluation_sorting.py)
+> Source: [`samples/core/evaluation_filtering.py`](../../samples/core/evaluation_filtering.py)
 
 ```python
 import asyncio
@@ -163,7 +163,7 @@ if __name__ == "__main__":
 
 ## Comparing Evaluations
 
-> Source: [`examples/compare_evaluations.py`](../../examples/compare_evaluations.py)
+> Source: [`samples/core/compare_evaluations.py`](../../samples/core/compare_evaluations.py)
 
 ```python
 from layerlens import PublicClient
@@ -200,7 +200,7 @@ comparison = client.comparisons.compare(
 
 ## Running Multiple Evaluations in Parallel
 
-> Source: [`examples/async_run_evaluations.py`](../../examples/async_run_evaluations.py)
+> Source: [`samples/core/async_results.py`](../../samples/core/async_results.py)
 
 ```python
 import asyncio
@@ -253,7 +253,7 @@ if __name__ == "__main__":
 
 ### Paginated Results
 
-> Source: [`examples/paginated_results.py`](../../examples/paginated_results.py)
+> Source: [`samples/core/paginated_results.py`](../../samples/core/paginated_results.py)
 
 ```python
 import asyncio
@@ -298,7 +298,7 @@ if __name__ == "__main__":
 
 ### All Results Without Pagination
 
-> Source: [`examples/all_results_no_pagination.py`](../../examples/all_results_no_pagination.py)
+> Source: [`samples/core/paginated_results.py`](../../samples/core/paginated_results.py)
 
 ```python
 import asyncio
@@ -326,7 +326,7 @@ if __name__ == "__main__":
 
 ### Fetch Results for Multiple Evaluations Concurrently
 
-> Source: [`examples/fetch_results_async.py`](../../examples/fetch_results_async.py)
+> Source: [`samples/core/async_results.py`](../../samples/core/async_results.py)
 
 ```python
 import asyncio
@@ -385,3 +385,11 @@ except layerlens.NotFoundError:
 except layerlens.APIError as e:
     print(f"API error: {e}")
 ```
+
+## Related Samples
+
+- [`samples/core/benchmark_evaluation.py`](../../samples/core/benchmark_evaluation.py) -- Full model+benchmark evaluation workflow with result pagination
+- [`samples/core/run_evaluation.py`](../../samples/core/run_evaluation.py) -- Evaluation lifecycle management
+- [`samples/core/trace_evaluation.py`](../../samples/core/trace_evaluation.py) -- Trace-level evaluation with judges
+- [`samples/core/async_results.py`](../../samples/core/async_results.py) -- Concurrent async evaluation and result fetching
+- [`samples/core/compare_evaluations.py`](../../samples/core/compare_evaluations.py) -- Side-by-side evaluation comparison
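Review note: this PR repoints many relative links from `examples/` to `samples/core/`, so a quick link check may be worth running before merge. The sketch below is not part of the repository; `find_broken_links` and the `docs` root are hypothetical names, and it only handles inline links of the form `[text](target)`.

```python
import re
from pathlib import Path

# Matches inline markdown links such as [text](target); the target is group 1.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that do not exist on disk."""
    broken = []
    for target in LINK_RE.findall(markdown_file.read_text(encoding="utf-8")):
        # Skip external URLs and pure in-page anchors.
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        path_part = target.split("#", 1)[0]  # drop any "#section" fragment
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken


if __name__ == "__main__":
    docs_root = Path("docs")  # assumed location of the markdown sources
    if docs_root.is_dir():
        for md in sorted(docs_root.rglob("*.md")):
            for target in find_broken_links(md):
                print(f"{md}: broken link -> {target}")
```

Run from the repository root, this would list any `Source:` or `Related Samples` target that does not resolve. It checks file paths only and does not validate `#anchor` fragments.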
diff --git a/docs/examples/judges-and-traces.md b/docs/examples/judges-and-traces.md
@@ -4,7 +4,7 @@ Examples for working with judges, traces, and trace evaluations on the Stratix p
 
 ## Creating and Managing Judges
 
-> Source: [`examples/judges.py`](../../examples/judges.py)
+> Source: [`samples/core/create_judge.py`](../../samples/core/create_judge.py)
 
 ```python
 import time
@@ -51,7 +51,7 @@ print(f"Deleted judge {deleted.id}")
 
 ## Uploading and Managing Traces
 
-> Source: [`examples/traces.py`](../../examples/traces.py)
+> Source: [`samples/core/basic_trace.py`](../../samples/core/basic_trace.py)
 
 ```python
 import os
@@ -94,7 +94,7 @@ print(f"Deleted: {deleted}")
 
 ## Running Trace Evaluations
 
-> Source: [`examples/trace_evaluations.py`](../../examples/trace_evaluations.py)
+> Source: [`samples/core/trace_evaluation.py`](../../samples/core/trace_evaluation.py)
 
 ```python
 import time
@@ -150,7 +150,7 @@ client.judges.delete(judge.id)
 
 ## Judge Optimizations
 
-> Source: [`examples/judge_optimizations.py`](../../examples/judge_optimizations.py)
+> Source: [`samples/core/judge_optimization.py`](../../samples/core/judge_optimization.py)
 
 Optimization requires that the judge has at least 10 annotations (trace evaluation results). Run trace evaluations first to build up annotation data.
 
@@ -221,7 +221,7 @@ client.judges.delete(judge.id)
 
 ## Async Judges and Traces
 
-> Source: [`examples/async_judges_and_traces.py`](../../examples/async_judges_and_traces.py)
+> Source: [`samples/core/async_results.py`](../../samples/core/async_results.py)
 
 ```python
 import os

diff --git a/docs/examples/models-and-benchmarks.md b/docs/examples/models-and-benchmarks.md
@@ -4,7 +4,7 @@ Examples for browsing, filtering, creating, and managing models and benchmarks u
 
 ## Filtering Models
 
-> Source: [`examples/get_models.py`](../../examples/get_models.py)
+> Source: [`samples/core/model_benchmark_management.py`](../../samples/core/model_benchmark_management.py)
 
 ```python
 import asyncio
@@ -56,7 +56,7 @@ if __name__ == "__main__":
 
 ## Filtering Benchmarks
 
-> Source: [`examples/get_benchmarks.py`](../../examples/get_benchmarks.py)
+> Source: [`samples/core/model_benchmark_management.py`](../../samples/core/model_benchmark_management.py)
 
 ```python
 import asyncio
@@ -98,7 +98,7 @@ if __name__ == "__main__":
 
 ## Creating a Custom Model
 
-> Source: [`examples/create_custom_model.py`](../../examples/create_custom_model.py)
+> Source: [`samples/core/custom_model.py`](../../samples/core/custom_model.py)
 
 Custom models let you evaluate any model accessible via an OpenAI-compatible chat completions endpoint.
 
@@ -139,7 +139,7 @@ if __name__ == "__main__":
 
 ## Creating a Custom Benchmark
 
-> Source: [`examples/create_custom_benchmark.py`](../../examples/create_custom_benchmark.py)
+> Source: [`samples/core/custom_benchmark.py`](../../samples/core/custom_benchmark.py)
 
 Custom benchmarks are created from JSONL files with `input` and `truth` fields.
 
@@ -197,7 +197,7 @@ Optional field: `subset` (for grouping prompts into categories).
 
 ## Creating a Smart Benchmark
 
-> Source: [`examples/create_smart_benchmark.py`](../../examples/create_smart_benchmark.py)
+> Source: [`samples/core/custom_benchmark.py`](../../samples/core/custom_benchmark.py)
 
 Smart benchmarks use AI to automatically generate benchmark prompts from uploaded documents. Supported file types: `.txt`, `.pdf`, `.html`, `.docx`, `.csv`, `.json`, `.jsonl`, `.parquet`.
 
@@ -238,7 +238,7 @@ if __name__ == "__main__":
 
 ## Managing Project Models and Benchmarks
 
-> Source: [`examples/manage_project_models_benchmarks.py`](../../examples/manage_project_models_benchmarks.py)
+> Source: [`samples/core/model_benchmark_management.py`](../../samples/core/model_benchmark_management.py)
 
 Add and remove public models and benchmarks from your project.
 

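Review note: the custom benchmark section touched above says benchmarks are built from JSONL files with `input` and `truth` fields, plus an optional `subset`. As a rough illustration of that file shape only, the sketch below writes such a file with the standard library. The row values, the `write_benchmark_jsonl` helper, and the choice of plain strings for every field are assumptions, not something the docs confirm.

```python
import json
from pathlib import Path

# Hypothetical rows: "input" and "truth" are the fields named in the docs;
# "subset" is the optional grouping field. The values are made up.
ROWS = [
    {"input": "What is 2 + 2?", "truth": "4", "subset": "arithmetic"},
    {"input": "What is the capital of France?", "truth": "Paris", "subset": "geography"},
    {"input": "Spell 'cat' backwards.", "truth": "tac"},
]

REQUIRED_FIELDS = ("input", "truth")


def write_benchmark_jsonl(rows, path):
    """Write one JSON object per line, rejecting rows missing a required field."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for i, row in enumerate(rows, start=1):
            missing = [f for f in REQUIRED_FIELDS if f not in row]
            if missing:
                raise ValueError(f"row {i} is missing {missing}")
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


if __name__ == "__main__":
    out = write_benchmark_jsonl(ROWS, "benchmark.jsonl")
    print(f"wrote {len(ROWS)} rows to {out}")
```

Validating rows before upload keeps a malformed line from surfacing later as a server-side error; the upload call itself is left to `samples/core/custom_benchmark.py`.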
diff --git a/docs/examples/public-api.md b/docs/examples/public-api.md
@@ -17,7 +17,7 @@ public = PublicClient()
 
 ## Public Models
 
-> Source: [`examples/public_models.py`](../../examples/public_models.py)
+> Source: [`samples/core/public_catalog.py`](../../samples/core/public_catalog.py)
 
 ```python
 from layerlens import PublicClient
@@ -79,7 +79,7 @@ if __name__ == "__main__":
 
 ## Public Benchmarks
 
-> Source: [`examples/public_benchmarks.py`](../../examples/public_benchmarks.py)
+> Source: [`samples/core/public_catalog.py`](../../samples/core/public_catalog.py)
 
 ```python
 from layerlens import PublicClient
@@ -144,7 +144,7 @@ if __name__ == "__main__":
 
 ## Public Evaluations
 
-> Source: [`examples/public_evaluations.py`](../../examples/public_evaluations.py)
+> Source: [`samples/core/public_catalog.py`](../../samples/core/public_catalog.py)
 
 ```python
 from layerlens import PublicClient
@@ -207,7 +207,7 @@ if __name__ == "__main__":
 
 ## Comparing Evaluations
 
-> Source: [`examples/compare_evaluations.py`](../../examples/compare_evaluations.py)
+> Source: [`samples/core/compare_evaluations.py`](../../samples/core/compare_evaluations.py)
 
 Compare how two models perform on the same benchmark, prompt by prompt.
 

diff --git a/docs/examples/retrieving-results.md b/docs/examples/retrieving-results.md
@@ -4,7 +4,7 @@ Examples for fetching evaluation results using the LayerLens Python SDK, includi
 
 ## Paginated Results
 
-> Source: [`examples/paginated_results.py`](../../examples/paginated_results.py)
+> Source: [`samples/core/paginated_results.py`](../../samples/core/paginated_results.py)
 
 Walk through results page by page with full control over page size.
 
@@ -83,7 +83,7 @@ if __name__ == "__main__":
 
 ## All Results Without Pagination
 
-> Source: [`examples/all_results_no_pagination.py`](../../examples/all_results_no_pagination.py)
+> Source: [`samples/core/paginated_results.py`](../../samples/core/paginated_results.py)
 
 Use `get_all()` to fetch every result in a single call. Simpler but loads everything into memory.
 
@@ -122,7 +122,7 @@ if __name__ == "__main__":
 
 ## Fetch Results for Multiple Evaluations Concurrently
 
-> Source: [`examples/fetch_results_async.py`](../../examples/fetch_results_async.py)
+> Source: [`samples/core/async_results.py`](../../samples/core/async_results.py)
 
 Use `asyncio.gather` to load results for several evaluations in parallel.
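Review note: for readers unfamiliar with the pattern this section names, here is a minimal standalone sketch of `asyncio.gather` fan-out. `fetch_results` is a hypothetical stand-in that sleeps instead of calling the SDK, and the evaluation IDs are invented; the real client calls live in `samples/core/async_results.py`.

```python
import asyncio


async def fetch_results(evaluation_id: str) -> list[dict]:
    """Hypothetical stand-in for an SDK call that returns one evaluation's results."""
    await asyncio.sleep(0.01)  # simulates network latency
    return [{"evaluation_id": evaluation_id, "score": 1.0}]


async def fetch_all(evaluation_ids: list[str]) -> dict[str, list[dict]]:
    """Fetch results for every ID concurrently and key them by evaluation ID."""
    # gather returns results in the same order as the awaitables passed in.
    batches = await asyncio.gather(*(fetch_results(eid) for eid in evaluation_ids))
    return dict(zip(evaluation_ids, batches))


if __name__ == "__main__":
    results = asyncio.run(fetch_all(["eval-a", "eval-b", "eval-c"]))
    for eid, rows in results.items():
        print(eid, len(rows))
```

Because `gather` preserves input order, zipping the IDs with the returned batches is safe. By default, the first exception raised by any task propagates to the caller; passing `return_exceptions=True` would collect failures alongside successes instead.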