Feature Area
Core functionality
Is your feature request related to a an existing bug? Please link it here.
NA
Describe the solution you'd like
Problem
Developers building production CrewAI applications spend significant manual effort tuning role, goal, backstory, and task description fields by hand — iterating on prompts without a systematic method to measure improvement. This is the classic "prompt engineering treadmill": changes are based on intuition, results are hard to reproduce, and there is no objective signal for when a crew is actually better.
The community has been solving this with workarounds for over a year:
Every one of these workarounds monkey-patches the internal LLM call chain because there is no stable, documented seam to hook into — meaning they break on every CrewAI release.
Proposed Solution
Ship a crewai[dspy] optional extra that provides a DSPyOptimizer class — a thin adapter between CrewAI's existing infrastructure and DSPy's optimization algorithms (MIPROv2, BootstrapFewShot, GEPA, etc.).
Key insight: the infrastructure already exists
The LLM hooks system introduced in #1875 provides almost everything needed:
# crewai/hooks/llm_hooks.py — already shipped
from crewai.hooks.llm_hooks import (
register_before_llm_call_hook,
register_after_llm_call_hook,
LLMCallHookContext,
)
# LLMCallHookContext already exposes:
# context.messages — the full composed prompt (mutable in-place)
# context.agent — the agent (role, goal, backstory, system_template)
# context.task — the task (description, expected_output)
# context.crew — the crew instance
# context.response — the LLM's response (in after hooks)
A crewai[dspy] adapter would use these hooks to:
- During optimization runs: capture
(messages, response) pairs and score them against a developer-provided metric function
- After convergence: write optimized instructions back to
agent.role, agent.goal, agent.backstory, or agent.system_template
- At inference time: inject optimized few-shot examples into
context.messages via a before hook
This is structurally identical to how crewai[mem0] plugs into the memory system — a framework-level optional extra, not a runtime tool.
Developer experience: before vs. after
Before (current workaround — breaks on CrewAI updates):
import dspy
from crewai import Crew, Agent, Task
# Monkey-patch the internal LLM method (version-coupled, fragile)
original_call = crew.agents[0].llm._call
def patched_call(prompt, **kwargs):
response = original_call(prompt, **kwargs)
dspy_module.update(prompt, response)
return response
crew.agents[0].llm._call = patched_call
# Run optimizer separately with no awareness of crew structure
optimizer = dspy.MIPROv2(metric=my_metric)
# ... glue code to connect DSPy signatures to CrewAI agents ...
After (proposed crewai[dspy]):
from crewai import Crew, Agent, Task
from crewai.optimizers.dspy import DSPyOptimizer # crewai[dspy]
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, write_task],
)
def quality_metric(example, prediction) -> float:
"""Score output quality — any callable returning 0.0–1.0."""
judge = dspy.ChainOfThought("output -> score: float")
return float(judge(output=prediction.final_output).score)
optimizer = DSPyOptimizer(
crew=crew,
metric=quality_metric,
algorithm="MIPROv2", # or "BootstrapFewShot", "GEPA"
)
# Run optimization against labeled examples
result = optimizer.compile(trainset=my_examples, num_trials=20)
# Optimized crew — same interface, better prompts
optimized_crew = result.crew
optimized_crew.kickoff(inputs={"topic": "climate change"})
# Inspect what changed
print(result.score_delta) # +0.18
print(result.optimized_instructions) # dict of agent_role -> new instructions
Implementation Scope
This proposal is scoped to three independently-reviewable PRs to stay within the contributing guide's size/XL threshold:
PR 1 — Stable read/write access to agent instructions (in core)
Confirm or add a documented, public path to read and write the effective instructions of an agent after construction. Today agent.role, agent.goal, agent.backstory, agent.system_template, and agent.prompt_template are all writable Pydantic fields, but this is undocumented as a public API.
Ask: Add a doc comment confirming these fields are stable and intended for programmatic rewrite, and add a helper agent.get_effective_system_prompt() -> str if one doesn't already exist.
Files: lib/crewai/src/crewai/agent/core.py
Size: < 50 lines
PR 2 — DSPyOptimizer as crewai[dspy] optional extra (in core)
Add lib/crewai/src/crewai/optimizers/dspy_optimizer.py and declare the optional extra:
# lib/crewai/pyproject.toml
[project.optional-dependencies]
dspy = ["dspy>=2.5,<3"]
The DSPyOptimizer class:
- Registers before/after LLM call hooks during the optimization loop
- Uses
before_kickoff_callbacks / after_kickoff_callbacks on Crew to demarcate runs
- Delegates to
crew.train() mechanics for the outer training loop
- Writes optimized instructions back via the documented agent fields from PR 1
- Returns an
OptimizationResult dataclass with crew, score_delta, optimized_instructions, version_id
Files: lib/crewai/src/crewai/optimizers/__init__.py, lib/crewai/src/crewai/optimizers/dspy_optimizer.py, lib/crewai/pyproject.toml
Size: ~300 lines
PR 3 — Example in crewai-examples
End-to-end notebook: email-drafting crew optimized with MIPROv2 against an LLM-judge metric. Adapts the working monkey-patch tutorial at Ronoh4/dspy_crewai_course into the clean API.
What This Is NOT
To be explicit about scope, given the history of related closed issues:
- Not an "auto-improve" feature:
DSPyOptimizer does not run automatically or connect to any hosted service. It is an offline, developer-invoked, local optimization loop — no different in kind from crew.train().
- Not a replacement for manual prompt crafting: It is a tool for developers who want to measure and improve their crews against a metric they define.
- Not a hosted prompt-management product: It stores optimized configs locally. It does not touch CrewAI Enterprise, AMP, or any cloud observability feature.
- Not a new dependency in the default install:
dspy is only installed when a developer explicitly runs pip install crewai[dspy].
Acceptance Criteria
Additional Context
Prior art in CrewAI:
- #1875 — DSPy-style callbacks accepted and shipped → the contribution shape this proposal follows
crew.train() / TaskEvaluator — the existing training loop this optimizer extends
crewai[mem0] — the optional-extra packaging pattern this follows
Prior art elsewhere:
Willingness to contribute: Yes — happy to submit PR 1 and PR 2 if the maintainer team signals interest in the approach. Would appreciate a comment confirming the proposed file locations and optional-extra name before starting.
Filed against: crewAIInc/crewAI main branch
Related: #1875, #3280, #3015
Label suggestion: feature-request, integration, Core functionality
Describe alternatives you've considered
Alternatives Considered
1. Leave it to the community (status quo)
The workaround ecosystem (courses, monkey-patch tutorials, third-party optimizers) handles it. Rejected: the workarounds break on every CrewAI release because they patch internals. A stable hook surface in core prevents this fragility — even if CrewAI never ships DSPyOptimizer itself, the hooks protect the community from churn.
2. Standalone crewai-dspy package (not in core)
Ship the adapter entirely outside the monorepo. Considered: this is viable and has precedent (LangChain's approach). Not preferred: the crewai[mem0] pattern (optional extra in core) gives tighter CI coupling — changes to hooks or agent internals are caught in the same test suite that tests the optimizer, rather than silently breaking a downstream package. The path for crewai-tools (external → absorbed into monorepo) suggests the maintainers prefer eventual consolidation.
3. Only add the hook documentation (PR 1 only)
Document the existing LLM hooks as the official stable seam without shipping an adapter. Also acceptable as a first step if the maintainer team prefers to let the community build the adapter first.
Additional context
No response
Willingness to Contribute
Yes, I'd be happy to submit a pull request
Feature Area
Core functionality
Is your feature request related to a an existing bug? Please link it here.
NA
Describe the solution you'd like
Problem
Developers building production CrewAI applications spend significant manual effort tuning
role,goal,backstory, and taskdescriptionfields by hand — iterating on prompts without a systematic method to measure improvement. This is the classic "prompt engineering treadmill": changes are based on intuition, results are hard to reproduce, and there is no objective signal for when a crew is actually better.The community has been solving this with workarounds for over a year:
crewai's LLM call chain (tested against CrewAI 0.152.0 + DSPy 2.6.27)Every one of these workarounds monkey-patches the internal LLM call chain because there is no stable, documented seam to hook into — meaning they break on every CrewAI release.
Proposed Solution
Ship a
crewai[dspy]optional extra that provides aDSPyOptimizerclass — a thin adapter between CrewAI's existing infrastructure and DSPy's optimization algorithms (MIPROv2, BootstrapFewShot, GEPA, etc.).Key insight: the infrastructure already exists
The LLM hooks system introduced in #1875 provides almost everything needed:
A
crewai[dspy]adapter would use these hooks to:(messages, response)pairs and score them against a developer-provided metric functionagent.role,agent.goal,agent.backstory, oragent.system_templatecontext.messagesvia a before hookThis is structurally identical to how
crewai[mem0]plugs into the memory system — a framework-level optional extra, not a runtime tool.Developer experience: before vs. after
Before (current workaround — breaks on CrewAI updates):
After (proposed
crewai[dspy]):Implementation Scope
This proposal is scoped to three independently-reviewable PRs to stay within the contributing guide's
size/XLthreshold:PR 1 — Stable read/write access to agent instructions (in core)
Confirm or add a documented, public path to read and write the effective instructions of an agent after construction. Today
agent.role,agent.goal,agent.backstory,agent.system_template, andagent.prompt_templateare all writable Pydantic fields, but this is undocumented as a public API.Ask: Add a doc comment confirming these fields are stable and intended for programmatic rewrite, and add a helper
agent.get_effective_system_prompt() -> strif one doesn't already exist.Files:
lib/crewai/src/crewai/agent/core.pySize: < 50 lines
PR 2 —
DSPyOptimizerascrewai[dspy]optional extra (in core)Add
lib/crewai/src/crewai/optimizers/dspy_optimizer.pyand declare the optional extra:The
DSPyOptimizerclass:before_kickoff_callbacks/after_kickoff_callbacksonCrewto demarcate runscrew.train()mechanics for the outer training loopOptimizationResultdataclass withcrew,score_delta,optimized_instructions,version_idFiles:
lib/crewai/src/crewai/optimizers/__init__.py,lib/crewai/src/crewai/optimizers/dspy_optimizer.py,lib/crewai/pyproject.tomlSize: ~300 lines
PR 3 — Example in
crewai-examplesEnd-to-end notebook: email-drafting crew optimized with MIPROv2 against an LLM-judge metric. Adapts the working monkey-patch tutorial at Ronoh4/dspy_crewai_course into the clean API.
What This Is NOT
To be explicit about scope, given the history of related closed issues:
DSPyOptimizerdoes not run automatically or connect to any hosted service. It is an offline, developer-invoked, local optimization loop — no different in kind fromcrew.train().dspyis only installed when a developer explicitly runspip install crewai[dspy].Acceptance Criteria
pip install crewai[dspy]succeeds without errorsDSPyOptimizer(crew, metric).compile(trainset)runs an optimization loop and returns anOptimizationResultresult.crewproduces measurably better outputs on the training metric than the baselinecrewai[dspy]is not installed (no import at module level)compile()completes (no global state leak)crewai-examplesAdditional Context
Prior art in CrewAI:
crew.train()/TaskEvaluator— the existing training loop this optimizer extendscrewai[mem0]— the optional-extra packaging pattern this followsPrior art elsewhere:
DSPyOptimizercould emit events compatible withWillingness to contribute: Yes — happy to submit PR 1 and PR 2 if the maintainer team signals interest in the approach. Would appreciate a comment confirming the proposed file locations and optional-extra name before starting.
Filed against:
crewAIInc/crewAImain branchRelated: #1875, #3280, #3015
Label suggestion:
feature-request,integration,Core functionalityDescribe alternatives you've considered
Alternatives Considered
1. Leave it to the community (status quo)
The workaround ecosystem (courses, monkey-patch tutorials, third-party optimizers) handles it. Rejected: the workarounds break on every CrewAI release because they patch internals. A stable hook surface in core prevents this fragility — even if CrewAI never ships
DSPyOptimizeritself, the hooks protect the community from churn.2. Standalone
crewai-dspypackage (not in core)Ship the adapter entirely outside the monorepo. Considered: this is viable and has precedent (LangChain's approach). Not preferred: the
crewai[mem0]pattern (optional extra in core) gives tighter CI coupling — changes to hooks or agent internals are caught in the same test suite that tests the optimizer, rather than silently breaking a downstream package. The path forcrewai-tools(external → absorbed into monorepo) suggests the maintainers prefer eventual consolidation.3. Only add the hook documentation (PR 1 only)
Document the existing LLM hooks as the official stable seam without shipping an adapter. Also acceptable as a first step if the maintainer team prefers to let the community build the adapter first.
Additional context
No response
Willingness to Contribute
Yes, I'd be happy to submit a pull request