docs(optimizers): DSPy prompt optimization — user docs, example notebook, and navigation#5853

Open
MoShiha wants to merge 13 commits into crewAIInc:main from MoShiha:feat/dspy-docs

@MoShiha MoShiha commented May 18, 2026

Summary

This is PR 3 of 3 in the DSPy prompt optimization feature. It adds all user-facing documentation for the DSPyOptimizer API introduced in PR 2 (#5842).

Dependency: This PR must be merged after #5842. All new API types it documents (DSPyOptimizer, OptimizationResult, AgentInstructions, optimization events) are introduced in that PR.

Changes

  • docs/en/concepts/dspy-optimization.mdx — Mintlify concept page with:

    • Quickstart (15-line copy-paste example)
    • Sequence diagram showing the full compile() flow
    • Algorithm comparison table (MIPROv2 vs BootstrapFewShot vs GEPA)
    • Metric writing guide: keyword metric + pairwise LLM-judge pattern
    • API reference for DSPyOptimizer, OptimizationResult, AgentInstructions
    • Observability section (event bus integration)
    • Limitations and Troubleshooting accordions
    • See Also cards linking to Training, Testing, Memory
  • examples/dspy_optimization.ipynb — 9-cell end-to-end notebook:

    • Two-agent email-drafting crew (researcher + writer)
    • Pairwise LLM-judge metric using claude-haiku-4-5-20251001 with position-bias mitigation
    • 10 labeled training examples covering diverse business email types
    • Baseline measurement → compile (MIPROv2, num_trials=20) → result inspection → comparison
    • No hardcoded API keys; all loaded from environment
  • docs/docs.json — adds dspy-optimization to Core Concepts navigation between training and memory

  • lib/crewai/src/crewai/optimizers/dspy_optimizer.py — removes import-not-found from # type: ignore comment; fixes mypy unused-ignore error when dspy is installed
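
The two metric patterns named in the list above (a keyword metric and a pairwise LLM-judge with position-bias mitigation) can be sketched roughly as follows. This is a minimal illustration, not the code from the docs page or the notebook: the function names, argument shapes, and the `judge` callable are all assumptions, and a real judge would call an LLM rather than the toy functions used here.

```python
from typing import Callable

# Hypothetical judge signature: (task, answer_a, answer_b) -> "A" or "B".
Judge = Callable[[str, str, str], str]


def keyword_metric(expected_keywords: list[str], output: str) -> float:
    """Fraction of expected keywords found in the output (case-insensitive)."""
    if not expected_keywords:
        return 0.0
    text = output.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


def pairwise_metric(task: str, candidate: str, reference: str, judge: Judge) -> float:
    """Compare candidate and reference, asking the judge in both orders.

    Position-bias mitigation: the candidate earns full credit only if it
    wins both when shown first and when shown second.
    """
    first = judge(task, candidate, reference)   # candidate shown as "A"
    second = judge(task, reference, candidate)  # candidate shown as "B"
    wins = int(first == "A") + int(second == "B")
    return wins / 2


def always_a(task: str, a: str, b: str) -> str:
    """Toy judge with pure position bias: always prefers the first answer."""
    return "A"


def longer_wins(task: str, a: str, b: str) -> str:
    """Toy judge with no position bias: prefers the longer answer."""
    return "A" if len(a) >= len(b) else "B"
```

With the position-biased toy judge the candidate scores 0.5 regardless of content, which is the signal that swapping the order is meant to expose.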

Proof of quality

uv run pytest tests/optimizers/ -q           → 28 passed ✓
uvx ruff check src/crewai/optimizers/        → All checks passed! ✓
uv run mypy src/crewai/optimizers/           → Success: no issues in 3 source files ✓
grep -r 'sk-' examples/dspy_optimization.ipynb → 0 matches ✓

What is explicitly deferred

  • VCR cassette tests (Spec 07): require real LLM API calls to record; planned as a follow-up after merge
  • Security review (Spec 08): threat model written; /prompt-injection-audit pass runs after merge

Test plan

  • Preview docs/en/concepts/dspy-optimization.mdx in Mintlify dev server — all components render
  • Verify dspy-optimization appears in Core Concepts sidebar between Training and Memory
  • Open examples/dspy_optimization.ipynb — cells run top-to-bottom with API keys set
  • Confirm pip install 'crewai[dspy]' installs cleanly in a fresh venv

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added an optional prompt-optimization tool for automating multi-agent instruction tuning (available via an install extra)
    • API to render an agent's effective system prompt
    • Emitted optimization lifecycle events for observability (start/trial/completed/failed)
  • Documentation

    • New comprehensive guide on prompt optimization, algorithms, metrics, observability, and troubleshooting
    • Example notebook demonstrating end-to-end multi-agent optimization
  • Tests

    • Expanded test coverage for optimizer behavior and agent instruction APIs

mohamedalishiha and others added 10 commits May 17, 2026 17:26
Confirm role, goal, backstory, system_template, and prompt_template as
stable public API (safe to read and write after construction) by updating
their Field descriptions in BaseAgent.

Add Agent.get_effective_system_prompt() -> str, which returns the fully
rendered system prompt as it would be sent to the LLM. Respects
system_template and prompt_template overrides, and immediately reflects
in-place writes to role, goal, and backstory.

This is the read/write seam that DSPy optimizers and other instrumentation
tools need to inspect and update agent instructions without patching internals.
See: crewAIInc#5818

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Implements DSPy-based prompt optimization as an optional extra
(pip install 'crewai[dspy]'). DSPyOptimizer wraps a Crew as a
dspy.Module and runs MIPROv2, BootstrapFewShot, or GEPA to
algorithmically improve agent backstory/goal instructions, then
writes the optimized instructions back to the crew in-place.

- Add crewai.optimizers package with DSPyOptimizer, OptimizationResult,
  and AgentInstructions types; lazy import guards against missing dspy
- Add 4 optimizer lifecycle events (started/trial/completed/failed)
  integrated with the crewai_event_bus
- Before-LLM-call hook injects few-shot demos from compiled module
- Hooks always cleaned up in finally block; supports MIPROv2,
  BootstrapFewShot, GEPA; compatible with dspy>=2.5 (tested on 3.2.1)
- 28 unit tests; all quality gates pass (ruff, mypy --strict)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- pyproject.toml: tighten dspy constraint to >=2.5,<4 (untested future
  major versions should not be auto-adopted)
- optimizer_events.py: add crew_name/algorithm to
  OptimizationTrialCompletedEvent for event correlation across concurrent runs
- dspy_optimizer.py: encode only agent.backstory in the DSPy signature
  (not backstory+goal) so the optimized text writes back to backstory without
  duplicating goal text in the rendered system prompt
- dspy_optimizer.py: reset self._compiled_module = None at the start of
  compile() so a second call never reads stale demos from the previous run
- base_agent.py: add optimizer-update note to goal/backstory field
  descriptions to match the existing note on role (consistency)

Not implemented (with reasoning):
- Empty-prompt defensive check in get_effective_system_prompt: Pydantic
  validators already enforce non-empty role/goal/backstory
- Set _compiled_module = crew_module before teleprompter.compile(): would
  inject uncompiled demos into optimization trials, corrupting scores
- Per-trial OptimizationTrialCompletedEvent emission: DSPy has no
  per-trial callback; event is reserved for future implementation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… contract

DSPy teleprompters do not expose a per-trial callback, so the event can
never be emitted accurately. The class is kept in optimizer_events.py for
forward compatibility and will be re-exported when DSPy adds callback
support. Tests updated to import the class directly from its module.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… assertions

Async event delivery doesn't guarantee that the first captured event belongs
to the current compile run. Use next() filtered by event content so the
assertion finds the specific event rather than assuming its position.
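
A minimal sketch of that assertion pattern, assuming a simplified event class (the `StartedEvent` dataclass and `find_event` helper here are hypothetical, not the classes in `optimizer_events.py`):

```python
from dataclasses import dataclass


@dataclass
class StartedEvent:
    """Hypothetical stand-in for an optimizer 'started' event."""
    algorithm: str
    num_trials: int


def find_event(received: list, algorithm: str, num_trials: int):
    """Return the first captured event whose content matches, or None."""
    return next(
        (
            e
            for e in received
            if isinstance(e, StartedEvent)
            and e.algorithm == algorithm
            and e.num_trials == num_trials
        ),
        None,
    )


# Events captured by a handler; the first is a leftover from another run,
# so asserting on received[0] would check the wrong event.
received = [StartedEvent("BootstrapFewShot", 5), StartedEvent("MIPROv2", 20)]
match = find_event(received, "MIPROv2", 20)
```

Passing a default of `None` to `next()` lets the test fail on a plain assertion instead of an unhandled `StopIteration`.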

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds one-liner docstrings to all source functions/classes and test
functions that were missing them. Brings combined coverage across PR
files from ~22% to 100%, satisfying CodeRabbit's pre-merge check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…elds

Replace bare isinstance any() check with next() filtered by algorithm and
num_trials, matching the pattern used for Started and Failed event tests.
Consistent with CR comment on async assertion brittleness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace length-only check with uuid.UUID() parsing and .version == 4
assertion to reject any 36-char non-UUID string.
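
The check described above can be sketched like this (the helper name `is_uuid4` is an assumption for illustration; the actual test may inline the assertion):

```python
import uuid


def is_uuid4(value: str) -> bool:
    """True only if value parses as a UUID and its version field is 4."""
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        # uuid.UUID raises ValueError for strings that are not valid UUIDs,
        # including arbitrary 36-character strings.
        return False
```

A length-only check would accept `"x" * 36`; parsing rejects it, and the version comparison also rejects well-formed UUIDs of other versions.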

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 412b5b62-e319-4d11-98cb-e92a942b01f2

📥 Commits

Reviewing files that changed from the base of the PR and between 1062023 and a8e461d.

📒 Files selected for processing (5)
  • docs/docs.json
  • docs/en/concepts/dspy-optimization.mdx
  • examples/dspy_optimization.ipynb
  • lib/crewai/src/crewai/optimizers/dspy_optimizer.py
  • lib/crewai/tests/optimizers/test_dspy_optimizer.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • docs/en/concepts/dspy-optimization.mdx
  • lib/crewai/tests/optimizers/test_dspy_optimizer.py
  • lib/crewai/src/crewai/optimizers/dspy_optimizer.py
  • examples/dspy_optimization.ipynb

📝 Walkthrough

Adds DSPy-based offline prompt optimization (DSPyOptimizer) with teleprompter algorithms, optimizer lifecycle events, Agent.get_effective_system_prompt(), docs, example notebook, tests, and an optional crewai[dspy] extra.

Changes

DSPy Optimizer Implementation

| Layer | File(s) | Summary |
|---|---|---|
| Event Lifecycle & Data Types | lib/crewai/src/crewai/optimizers/types.py, lib/crewai/src/crewai/events/types/optimizer_events.py, lib/crewai/src/crewai/events/types/__init__.py | Adds optimizer event classes (started, trial-completed, completed, failed) and dataclasses (AgentInstructions, OptimizationResult with score_delta). |
| Agent System Prompt Introspection & Field Stability | lib/crewai/src/crewai/agent/core.py, lib/crewai/src/crewai/agents/agent_builder/base_agent.py | New Agent.get_effective_system_prompt() and expanded Pydantic field descriptions marking role/goal/backstory as stable public API that optimizers may update. |
| DSPyOptimizer Core Implementation | lib/crewai/src/crewai/optimizers/dspy_optimizer.py | Implements DSPy availability detection, Crew->Module wrapper, predictor signature helpers, teleprompter selection (MIPROv2/BootstrapFewShot/GEPA), before-LLM hook injection, baseline/optimized scoring, compiled-module extraction, in-place agent writeback, event emission, and hook cleanup. |
| Package Structure & Optional Dependency | lib/crewai/pyproject.toml, lib/crewai/src/crewai/optimizers/__init__.py | Adds crewai[dspy] optional extra and lazy-loads DSPyOptimizer via __getattr__ to avoid importing dspy at package import time. |
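
The lazy-loading approach mentioned for the optimizers package (a module-level `__getattr__`, as defined by PEP 562) can be sketched as below. This is not the PR's `__init__.py`: the package object, the `_LAZY` table, and the use of the standard-library `fractions` module as a stand-in for the optional dependency are all assumptions chosen so the sketch runs on its own.

```python
import importlib
import types

# Hypothetical package object standing in for an optimizers/__init__.py.
pkg = types.ModuleType("demo_optimizers")

# Public name -> (module to import, attribute to fetch). `fractions` stands
# in for a heavy optional dependency such as dspy.
_LAZY = {"Fraction": ("fractions", "Fraction")}


def _lazy_getattr(name: str):
    """Module-level __getattr__: import the dependency on first access."""
    if name in _LAZY:
        module_name, attr = _LAZY[name]
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            raise ImportError(
                f"{name} requires the optional dependency '{module_name}'"
            ) from exc
        value = getattr(module, attr)
        setattr(pkg, name, value)  # cache so later lookups skip __getattr__
        return value
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")


# In a real package this function would simply be named __getattr__ at
# module top level; here it is attached to the stand-in module object.
pkg.__getattr__ = _lazy_getattr
```

Importing the package stays cheap, and a missing extra surfaces as an `ImportError` only when the lazily loaded name is actually used.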

Agent Instruction API Tests

| Layer | File(s) | Summary |
|---|---|---|
| Agent Instruction Stability Tests | lib/crewai/tests/agents/test_agent_instruction_api.py | Tests readability/writability of role/goal/backstory and templates, get_effective_system_prompt() returns non-empty string and reflects in-place writes, and custom templates render without raw placeholders. |

DSPyOptimizer Tests

| Layer | File(s) | Summary |
|---|---|---|
| Data Types, Events & Mock Setup | lib/crewai/tests/optimizers/test_dspy_optimizer.py (ranges 1–320) | Validates AgentInstructions and OptimizationResult invariants, optimizer event contracts, and establishes mock dspy/crew fixtures for deterministic testing. |
| Optimizer Behavior & Lifecycle | lib/crewai/tests/optimizers/test_dspy_optimizer.py (ranges 322–659) | Covers constructor/compile validation (empty trainset, non-callable metric, duplicate roles), hook cleanup across success/exception paths, event-bus integration (started/completed/failed), algorithm selection, and optimized instruction writeback to agent backstory. |

Documentation & Example

| Layer | File(s) | Summary |
|---|---|---|
| DSPy Optimization Documentation | docs/en/concepts/dspy-optimization.mdx | Adds a comprehensive guide: overview, installation (optional extra), quickstart, lifecycle, algorithm guidance, metric examples (keyword and LLM-judge), API reference, multi-agent patterns, observability, limitations, and troubleshooting. |
| Email Optimization Example Notebook | examples/dspy_optimization.ipynb | Notebook demonstrating a 2-agent email crew, Anthropic pairwise judge metric, labeled training set, MIPROv2 compilation, and baseline-vs-optimized comparison. |

Sequence Diagram (high-level compilation flow):

sequenceDiagram
  participant User
  participant DSPyOptimizer
  participant DSPyTeleprompter
  participant Crew
  participant Agent
  participant EventBus
  User->>DSPyOptimizer: compile(trainset, num_trials, algorithm)
  DSPyOptimizer->>Crew: run baseline kickoff
  DSPyOptimizer->>EventBus: emit OptimizationStartedEvent
  loop per trial
    DSPyOptimizer->>DSPyTeleprompter: compile/run trial
    DSPyTeleprompter->>Crew: run with injected demos (before-LLM hook)
    Crew->>Agent: generate outputs using backstory
  end
  DSPyOptimizer->>Agent: write optimized instructions into agents
  DSPyOptimizer->>Crew: run kickoff to measure optimized score
  DSPyOptimizer->>EventBus: emit OptimizationCompletedEvent
  DSPyOptimizer->>User: return OptimizationResult

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Suggested labels

size/L

"I'm a rabbit in the code patch light,
I nibble prompts and make them bright,
DSPy hops in, few-shots take flight,
Agents learn to write emails right." 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding documentation (docs), an example notebook, and navigation for DSPy prompt optimization, which aligns with all major file additions in the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
examples/dspy_optimization.ipynb (1)

101-102: ⚡ Quick win

Use a holdout topic for baseline vs optimized output comparison.

The current comparison uses a topic already present in trainset, which can overstate gains in the demo output.

Proposed fix
-baseline_output = crew.kickoff(inputs={"topic": "Q1 earnings call"})
+eval_topic = "customer webinar follow-up"
+baseline_output = crew.kickoff(inputs={"topic": eval_topic})
...
-optimized_output = result.crew.kickoff(inputs={"topic": "Q1 earnings call"})
+optimized_output = result.crew.kickoff(inputs={"topic": eval_topic})

Also applies to: 329-329

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/dspy_optimization.ipynb` around lines 101 - 102, The baseline
comparison uses a topic present in trainset which can bias results; change the
input to crew.kickoff (the baseline_input used to produce baseline_output) to a
true holdout topic not present in trainset (e.g., define holdout_topic and use
that string for both baseline and optimized runs), update the other occurrence
at the second kickoff call around line 329 to use the same holdout_topic
variable, and ensure any references to trainset or training examples remain
unchanged so the holdout stays unseen during evaluation.
docs/en/concepts/dspy-optimization.mdx (1)

176-177: ⚡ Quick win

Consider using an env-configurable judge model for documentation examples.

While claude-haiku-4-5-20251001 is currently supported, using an environment variable with a current default would improve maintainability and reduce future updates when models are deprecated or rotated. Alternatively, you could use the stable alias claude-haiku-4-5 instead of the dated version.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/concepts/dspy-optimization.mdx` around lines 176 - 177, Replace the
hard-coded dated model string "claude-haiku-4-5-20251001" with an
environment-configurable judge model so docs use a current default; update the
example to reference an env var (e.g., JUDGE_MODEL) fallback to a stable alias
like "claude-haiku-4-5" so the example shows using process.env.JUDGE_MODEL ||
"claude-haiku-4-5" (or the equivalent in the repo's docs examples) instead of
the fixed "claude-haiku-4-5-20251001" value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/dspy_optimization.ipynb`:
- Around line 41-42: The notebook uses assert statements to check environment
variables OPENAI_API_KEY and ANTHROPIC_API_KEY which can be skipped when Python
is run with -O; replace these asserts with explicit runtime checks: call
os.getenv for "OPENAI_API_KEY" and "ANTHROPIC_API_KEY" and if a value is
missing, raise a clear exception (e.g., RuntimeError or SystemExit) or call
sys.exit with a descriptive message so the notebook fails immediately and with a
helpful error; update the two lines that reference os.getenv("OPENAI_API_KEY")
and os.getenv("ANTHROPIC_API_KEY") accordingly.
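
A minimal sketch of the suggested replacement for the `assert` checks, assuming a hypothetical helper named `require_env` (the notebook's actual cell may be structured differently). The optional `environ` argument exists only so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional, Sequence


def require_env(names: Sequence[str], environ: Optional[Mapping[str, str]] = None) -> None:
    """Raise RuntimeError listing every missing or empty variable.

    Unlike an assert statement, this check is not removed when Python
    runs with the -O flag.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

In the notebook this would be called once near the top, for example `require_env(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])`.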

In `@lib/crewai/src/crewai/optimizers/dspy_optimizer.py`:
- Around line 79-82: Detect and prevent duplicate agent.role values before
building self.agent_predictors by adding a validation in compile(): iterate
crew.agents and collect agent.role into a set, raise a clear error if a
duplicate is found (or alternatively use a guaranteed-unique identifier such as
agent.id when constructing the dict), then build self.agent_predictors =
{agent.role: _dspy.ChainOfThought(_build_signature_for_agent(agent)) for agent
in crew.agents} only after the uniqueness check so demo injection/writeback that
uses self.agent_predictors, agent.role, and methods referenced in compile()
won't misroute.
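
The collision this comment describes, and the kind of guard it asks for, can be shown with plain strings. The helper name `check_unique_roles` is hypothetical and the role list is made up; the real validation lives in `compile()` and operates on `crew.agents`.

```python
from collections import Counter


def check_unique_roles(roles: list[str]) -> None:
    """Raise ValueError naming every role that appears more than once."""
    duplicates = sorted(role for role, count in Counter(roles).items() if count > 1)
    if duplicates:
        raise ValueError(f"Duplicate agent roles: {duplicates}")


# Why the guard matters: a dict keyed by role silently keeps only the last
# agent for each duplicated role.
roles = ["researcher", "writer", "writer"]
predictors_by_role = {role: index for index, role in enumerate(roles)}
```

Here `predictors_by_role` has two entries for three agents, so one agent's predictor has been overwritten without any error.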

In `@lib/crewai/tests/optimizers/test_dspy_optimizer.py`:
- Around line 460-544: Flush the global event bus before subscribing to clear
stale events (call crewai_event_bus.flush() before crewai_event_bus.on(...)) and
scope assertions to the current optimizer instance by capturing the event source
in your handlers (the first arg `src` in `_on_started`, `_on_completed`,
`_on_failed`) and only accept events where `src is optimizer` (or filter
received events by `src == optimizer`) when searching for the expected
OptimizationStartedEvent/OptimizationCompletedEvent/OptimizationFailedEvent;
keep the existing checks on algorithm/num_trials/error in addition to the source
check.
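
A rough sketch of the source-scoping idea, using stand-in classes rather than `crewai_event_bus` (whose API is not reproduced here). The only detail taken from the comment is that handlers receive the event source as their first argument.

```python
class Optimizer:
    """Hypothetical stand-in for the object that emits optimizer events."""

    def __init__(self, algorithm: str) -> None:
        self.algorithm = algorithm


received: list[tuple[Optimizer, dict]] = []  # (source, event) pairs


def on_started(source: Optimizer, event: dict) -> None:
    """Handler that records the source alongside the event."""
    received.append((source, event))


mine = Optimizer("MIPROv2")
other = Optimizer("MIPROv2")

on_started(other, {"algorithm": "MIPROv2"})  # event from a parallel test
on_started(mine, {"algorithm": "MIPROv2"})   # event from this test

# Both events have identical content, so a content filter cannot tell them
# apart; filtering on the identity of the source can.
matches = [event for source, event in received if source is mine]
```

Identity comparison (`is`) is used instead of equality so that two optimizers configured the same way are still distinguished.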

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ac9439ae-2f3d-4e79-99c5-1df1f6e93a22

📥 Commits

Reviewing files that changed from the base of the PR and between e8aa870 and c409e04.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • docs/docs.json
  • docs/en/concepts/dspy-optimization.mdx
  • examples/dspy_optimization.ipynb
  • lib/crewai/pyproject.toml
  • lib/crewai/src/crewai/agent/core.py
  • lib/crewai/src/crewai/agents/agent_builder/base_agent.py
  • lib/crewai/src/crewai/events/types/__init__.py
  • lib/crewai/src/crewai/events/types/optimizer_events.py
  • lib/crewai/src/crewai/optimizers/__init__.py
  • lib/crewai/src/crewai/optimizers/dspy_optimizer.py
  • lib/crewai/src/crewai/optimizers/types.py
  • lib/crewai/tests/agents/test_agent_instruction_api.py
  • lib/crewai/tests/optimizers/__init__.py
  • lib/crewai/tests/optimizers/test_dspy_optimizer.py

Comment thread examples/dspy_optimization.ipynb Outdated
Comment thread lib/crewai/src/crewai/optimizers/dspy_optimizer.py
Comment thread lib/crewai/tests/optimizers/test_dspy_optimizer.py
mohamedalishiha and others added 3 commits May 18, 2026 20:13


Three targeted fixes responding to coderabbitai review:

1. dspy_optimizer.py: add duplicate-role guard in compile() — two agents
   sharing the same role collide in agent_predictors (keyed by role), causing
   silent misrouting of few-shot demo injection and backstory writeback.
   Raises ValueError with the duplicate role names before the optimization
   loop starts.

2. test_dspy_optimizer.py: scope event bus assertions to the current optimizer
   instance — subscribe after creating the optimizer, call flush() before
   subscribing to drain stale events, capture (src, event) tuples, and filter
   with s is optimizer so tests cannot pass on events from parallel tests.

3. dspy_optimizer.py: drop redundant type: ignore[import-not-found] comment
   that triggers a mypy unused-ignore error when dspy is installed.

29 optimizer tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR 3 of the DSPy prompt optimization feature: user-facing documentation
and a runnable end-to-end example demonstrating the optimizer API.

Changes:
- docs/en/concepts/dspy-optimization.mdx: Mintlify page covering quickstart,
  sequence diagram, algorithm comparison, metric writing guide, API reference,
  observability, limitations, and troubleshooting
- examples/dspy_optimization.ipynb: 9-cell notebook with a two-agent
  email-drafting crew, pairwise LLM-judge metric, and 10 training examples
- docs/docs.json: add dspy-optimization to Core Concepts navigation (between
  training and memory)
- lib/crewai/src/crewai/optimizers/dspy_optimizer.py: drop redundant
  type: ignore[import-not-found] comment that caused mypy unused-ignore error

All 28 optimizer unit tests pass. ruff clean. mypy clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four targeted fixes responding to coderabbitai review on the docs PR:

1. examples/dspy_optimization.ipynb: replace assert statements for env var
   checks with explicit RuntimeError so the notebook fails immediately with
   a clear message even when Python is run with -O optimizations enabled.

2. examples/dspy_optimization.ipynb: use a held-out topic (HOLDOUT_TOPIC =
   customer webinar follow-up) for baseline and optimized crew comparison so
   the demo comparison is fair and does not overstate gains by using a topic
   that appears in the training set.

3. examples/dspy_optimization.ipynb: use os.getenv(JUDGE_MODEL, claude-haiku-4-5)
   stable alias instead of pinned dated version so the notebook keeps working
   across minor model updates; users can still pin via env var.

4. docs/en/concepts/dspy-optimization.mdx: same JUDGE_MODEL env var pattern
   with claude-haiku-4-5 stable alias in the LLM-judge metric code example.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MoShiha commented May 18, 2026

All 3 actionable items and 2 nitpicks from the CodeRabbit review are addressed:

Actionable (implemented):

  1. assert → RuntimeError (notebook cell 2) — commit a8e461da6. Replaced both assert statements with an explicit RuntimeError that lists all missing env vars and is safe with -O optimization flags.
  2. Duplicate agent role guard (dspy_optimizer.py) — commit d6fd8a6f1. Added upfront validation in compile() that detects duplicate agent.role values and raises ValueError with the full list before the optimization loop starts.
  3. Event bus source scoping (test file) — commit d6fd8a6f1. All three event tests now create the optimizer before subscribing, call flush() to drain stale events, capture (src, event) tuples, and filter with s is optimizer.

Nitpicks (implemented):

  4. Holdout topic (notebook cell 4/9) — commit a8e461da6. Introduced HOLDOUT_TOPIC = "customer webinar follow-up" for the baseline and optimized comparison; this topic is absent from all 10 trainset examples.
  5. Stable judge model alias (notebook + docs) — commit a8e461da6. Changed "claude-haiku-4-5-20251001" to os.getenv("JUDGE_MODEL", "claude-haiku-4-5") in both the notebook and the docs LLM-judge example so the code stays working across minor model updates while still allowing users to pin a specific version.
