In this example, we use GEPA to evolve an entire DSPy program, not just its instructions, including changes to the program's structure and dataflow. Starting from a simple dspy.ChainOfThought module for MATH questions, GEPA tunes it into a full DSPy program.

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Verify OPENAI_API_KEY is loaded
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found; set it in a .env file or in the environment")

In [None]:
import dspy

In [None]:
import random

from dspy.datasets import MATH

dataset = MATH(subset="algebra")

# Shuffle the train and dev sets
random.Random(0).shuffle(dataset.train)
random.Random(0).shuffle(dataset.dev)

print(len(dataset.train), len(dataset.dev), len(dataset.test))

Let's inspect an example from the training set.

In [None]:
example = dataset.train[0]
print("Question:", example.question)
print("Reasoning:", example.reasoning)
print("Answer:", example.answer)

Let's define a simple DSPy program to solve this task.

Unlike dspy.GEPA, which takes an instantiated DSPy module as input, here we want to evolve the full DSPy program. A candidate is therefore the program's source code, stored as a string. The seed program does not need to be sophisticated: it only needs to demonstrate the expected input/output interface and, optionally, the tools that are available. You can also describe any other relevant details of the environment in comments.

In [None]:
program_src = """import dspy
program = dspy.ChainOfThought("question -> answer")"""
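Because the candidate is plain source code, something has to turn the string back into a callable program before it can be scored. A minimal sketch of that step, assuming the convention visible in `program_src` (the source binds a top-level variable named `program`): execute the string in a fresh namespace and read that variable back. The helper `load_program_from_source` is hypothetical, and the stand-in `EchoModule` replaces `dspy.ChainOfThought` so the sketch runs without an LM; the real `DspyAdapter` may load candidates differently.

```python
from typing import Any, Dict


def load_program_from_source(src: str, entry: str = "program") -> Any:
    """Execute candidate source code and return the object bound to `entry`.

    Hypothetical helper for illustration; not part of the gepa API.
    """
    namespace: Dict[str, Any] = {}
    # Run the candidate in its own namespace so its definitions do not
    # overwrite names in the caller's globals.
    exec(src, namespace)
    if entry not in namespace:
        raise ValueError(f"candidate source does not define `{entry}`")
    return namespace[entry]


# Stand-in for `dspy.ChainOfThought("question -> answer")` so the sketch
# runs without an LM: this "program" just echoes the stripped question.
toy_src = '''
class EchoModule:
    def __call__(self, question):
        return {"answer": question.strip()}

program = EchoModule()
'''

toy_candidate = {"program": toy_src}
toy_program = load_program_from_source(toy_candidate["program"])
print(toy_program(question=" 2 + 2 "))  # {'answer': '2 + 2'}
```

Since `exec` runs whatever code the reflection LM writes, this pattern is only appropriate in a trusted or sandboxed environment.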

GEPA interfaces with external frameworks through an adapter. Here, we use the DspyAdapter from GEPA's full-program adapter module, which works with candidates given as DSPy program source.

In [None]:
from gepa.adapters.dspy_full_program_adapter.full_program_adapter import DspyAdapter
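Conceptually, an adapter has two jobs: score a candidate on a batch of examples, and package those results as feedback that a reflection LM can read. The toy below illustrates only those two responsibilities. `ToyAdapter`, `ToyEvalBatch`, and their method signatures are hypothetical and simplified; gepa defines its own adapter interface, which this sketch does not reproduce exactly. Here the candidate is a single text prefix instead of program source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyEvalBatch:
    outputs: List[str]
    scores: List[float]
    feedback: List[str]


class ToyAdapter:
    """Toy adapter: the candidate is {"program": prefix}, and the "program"
    answers a question by prepending that prefix to it."""

    def evaluate(self, batch: List[Dict[str, str]], candidate: Dict[str, str]) -> ToyEvalBatch:
        outputs, scores, feedback = [], [], []
        for ex in batch:
            out = candidate["program"] + ex["question"]
            ok = out == ex["answer"]
            outputs.append(out)
            scores.append(1.0 if ok else 0.0)
            feedback.append("correct" if ok else f"expected {ex['answer']!r}, got {out!r}")
        return ToyEvalBatch(outputs, scores, feedback)

    def make_reflective_dataset(self, eval_batch: ToyEvalBatch) -> Dict[str, List[Dict[str, str]]]:
        # In this toy, only failed examples are kept, keyed by the
        # candidate component they should inform.
        records = [
            {"output": out, "feedback": fb}
            for out, score, fb in zip(eval_batch.outputs, eval_batch.scores, eval_batch.feedback)
            if score < 1.0
        ]
        return {"program": records}


toy_batch = [
    {"question": "x", "answer": "ans: x"},
    {"question": "y", "answer": "y"},
]
toy_adapter = ToyAdapter()
toy_result = toy_adapter.evaluate(toy_batch, {"program": "ans: "})
print(toy_result.scores)  # [1.0, 0.0]
print(toy_adapter.make_reflective_dataset(toy_result))
```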

In [None]:
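def metric_fn(example, pred, trace=None):
    # Score the prediction with the dataset's built-in MATH metric.
    score = dataset.metric(example, pred)
    # Return textual feedback alongside the score; GEPA's reflection LM
    # reads it when proposing changes to the program.
    if score:
        feedback_text = f"The provided answer '{pred.answer}' is correct."
    else:
        feedback_text = (
            f"The provided answer '{pred.answer}' is incorrect. "
            f"The correct answer is '{example.answer}'. "
            f"Here's the step by step solution:\n{example.reasoning}"
        )
    return dspy.Prediction(score=score, feedback=feedback_text)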
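To see concretely what the reflection LM receives, here is the same score-plus-feedback logic run on toy inputs. It is a sketch under two assumptions: `exact_match` is a hypothetical stand-in for `dataset.metric` (whose comparison logic is not shown here), and a plain dict stands in for `dspy.Prediction` so the example runs without DSPy.

```python
from types import SimpleNamespace


def exact_match(example, pred) -> bool:
    # Hypothetical stand-in for `dataset.metric`: whitespace-insensitive equality.
    return example.answer.strip() == pred.answer.strip()


def metric_with_feedback(example, pred):
    score = exact_match(example, pred)
    if score:
        feedback = f"The provided answer '{pred.answer}' is correct."
    else:
        feedback = (
            f"The provided answer '{pred.answer}' is incorrect. "
            f"The correct answer is '{example.answer}'. "
            f"Here's the step by step solution:\n{example.reasoning}"
        )
    # A plain dict stands in for dspy.Prediction(score=..., feedback=...).
    return {"score": score, "feedback": feedback}


toy_example = SimpleNamespace(answer="5", reasoning="2x = 10, so x = 5.")
print(metric_with_feedback(toy_example, SimpleNamespace(answer="5"))["score"])  # True
print(metric_with_feedback(toy_example, SimpleNamespace(answer="4"))["feedback"])
```

The wrong-answer case carries the full reference solution, which gives the reflection LM something concrete to diagnose rather than a bare 0.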

In [None]:
# Reflection LM: proposes revised program source from the feedback.
reflection_lm = dspy.LM("openai/gpt-4.1", temperature=1.0, max_tokens=32000)
# Task LM: runs the candidate DSPy programs on MATH questions.
task_lm = dspy.LM("openai/gpt-4.1-nano", max_tokens=32000)

adapter = DspyAdapter(
    task_lm=task_lm,
    metric_fn=metric_fn,
    num_threads=80,
    # Calling a dspy.LM returns a list of completions; pass on the first as a string.
    reflection_lm=lambda x: reflection_lm(x)[0],
)

Let's evaluate the base program on the test set.

In [None]:
o = adapter.evaluate(dataset.test, {"program": program_src})

The base program scores 67.1% on the test set.

Let's launch the GEPA optimization.

In [None]:
from gepa import optimize

o = optimize(
    seed_candidate={"program": program_src},
    trainset=dataset.train,
    valset=dataset.dev[:200],
    adapter=adapter,
    reflection_lm=lambda x: reflection_lm(x)[0],
    max_metric_calls=2000,
    display_progress_bar=True,
)
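The `max_metric_calls=2000` argument caps how many metric evaluations the search may spend. To illustrate how such a budget bounds a propose/evaluate/accept loop, here is a deliberately simplified greedy hill climber on a toy task. It is not GEPA's algorithm: GEPA uses a reflection LM to rewrite the program from feedback and keeps several promising candidates rather than a single best. `budgeted_hill_climb` and the `toy_*` names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, str]
Example = Tuple[int, int]  # (input, expected output)


def budgeted_hill_climb(
    seed: Candidate,
    valset: List[Example],
    score: Callable[[Candidate, Example], float],
    propose: Callable[[Candidate, random.Random], Candidate],
    max_metric_calls: int,
    rng_seed: int = 0,
) -> Tuple[Candidate, float, int]:
    """Greedy propose/evaluate/accept loop under a metric-call budget.

    Simplified illustration only; not GEPA's actual search procedure.
    """
    rng = random.Random(rng_seed)
    calls = 0

    def evaluate(candidate: Candidate) -> float:
        nonlocal calls
        calls += len(valset)  # one metric call per validation example
        return sum(score(candidate, ex) for ex in valset) / len(valset)

    best, best_score = seed, evaluate(seed)
    # Stop before a full evaluation would exceed the budget.
    while calls + len(valset) <= max_metric_calls:
        child = propose(best, rng)
        child_score = evaluate(child)
        if child_score > best_score:  # keep strict improvements only
            best, best_score = child, child_score
    return best, best_score, calls


# Toy task: the "program" is an integer offset added to the input;
# the hidden target offset is 3.
toy_valset = [(x, x + 3) for x in range(4)]


def toy_score(candidate: Candidate, ex: Example) -> float:
    x, y = ex
    # Graded score so that partial progress is visible to the search.
    return 1.0 / (1 + abs(x + int(candidate["program"]) - y))


def toy_propose(candidate: Candidate, rng: random.Random) -> Candidate:
    # Random local edit; GEPA would instead ask a reflection LM to rewrite the program.
    return {"program": str(int(candidate["program"]) + rng.choice([-1, 1]))}


toy_best, toy_best_score, toy_used = budgeted_hill_climb(
    {"program": "0"}, toy_valset, toy_score, toy_propose, max_metric_calls=200
)
print(toy_best, toy_best_score, toy_used)
```

The budget check runs before each evaluation, so the loop never spends more than `max_metric_calls`; a larger budget simply allows more proposals.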

Let's look at the DSPy program found by GEPA.

In [None]:
print(o.best_candidate["program"])

Finally, let's evaluate the optimized program on the test set.

In [None]:
_ = adapter.evaluate(dataset.test, o.best_candidate)

On the test set, GEPA takes the program from **67% to 93%**, within a budget of 2,000 metric calls.