Python tools for testing and running GEPA optimization on extraction tasks.
Launch the interactive CLI:

```bash
uv run cli.py
```

Features:
- Interactive example set selection
- Visual configuration with preset selection
- Live optimization progress
- Results display with baseline vs optimized scores
- Keyboard navigation (arrow keys, Enter, Escape, q to quit)
Or run non-interactively with flags:

```bash
# With presets
uv run python cli.py -e event -n 10 --preset fast|balanced|thorough

# Custom parameters
uv run python cli.py -e event -n 10 -a light -r 3 -b 4 -t 2
```

Test compiled GEPA programs on new inputs:
```bash
# Single text input
uv run python infer.py --program results/path/to/optimized_program.json --schema event --text "Meeting tomorrow at 10am"

# Single example file
uv run python infer.py --program optimized_program.json --schema event --example examples/event/conference.json

# Test example set (with accuracy metrics)
uv run python infer.py --program optimized_program.json --schema event --example-set event

# Batch processing
uv run python infer.py --program optimized_program.json --schema event --batch inputs.jsonl
```

Available sets: contact_details, event, or add your own in ./examples/
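For `--batch`, a plausible `inputs.jsonl` layout is one JSON object per line; the `text` field name here is an assumption for illustration, not confirmed by the repo:

```jsonl
{"text": "Team standup on Friday at 9:30 in Room 2B"}
{"text": "Lunch with Dana next Tuesday at noon"}
```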
Requirements:
- Schema file: ./schemas/<name>.json
- Example JSON files: ./examples/<name>/*.json
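For orientation, a minimal `./schemas/event.json` might be a JSON Schema like the following (hypothetical contents; the field names are illustrative, not taken from the repo):

```json
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "date": {"type": "string"},
    "time": {"type": "string"}
  },
  "required": ["title"]
}
```

A matching file under `./examples/event/` would then pair an input text with its expected extraction (again, the `text`/`expected` keys are assumptions):

```json
{
  "text": "Meeting tomorrow at 10am",
  "expected": {"title": "Meeting", "date": "tomorrow", "time": "10am"}
}
```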
Presets:

| Preset | Auto Mode | Reflection Batch | Eval Batch | Use Case |
|---|---|---|---|---|
| fast | light | 2 | 3 | Development |
| balanced | light | 3 | 4 | General use |
| thorough | medium | 5 | 6 | Production |
Parameters:
- Auto Mode (-a): Optimization intensity (light/medium/heavy)
- Reflection Batch (-r): Examples for prompt reflection
- Eval Batch (-b): Examples for evaluation
- Num Examples (-n): Training examples (fewer = more exploration)
- Threads (-t): Parallel execution
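Per the table above, `--preset thorough` should be equivalent to spelling the parameters out explicitly (an inference from the table, not verified against the code):

```bash
uv run python cli.py -e event -n 10 -a medium -r 5 -b 6
```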
Configure via .env file (copy from .env.example):
```bash
# MLflow tracking (optional)
MLFLOW_URL=http://ubuntuserver:5000

# Teacher model (generates optimized prompts)
TEACHER_API_KEY="not-needed"
TEACHER_MODEL="gpt-oss-120B"
TEACHER_BASE_URL="http://genaihulk:8080/v1"

# Student model (executes extraction tasks)
STUDENT_API_KEY="not-needed"
STUDENT_MODEL="qwen3-coder-30b"
STUDENT_BASE_URL="http://localhost:1234/v1"

# Paths (optional - defaults to repo directories)
XTR_SCHEMAS_DIR=./schemas
XTR_EXAMPLES_DIR=./examples
XTR_OUTPUT_DIR=./results
```
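A minimal sketch of how these variables might be consumed at startup, assuming the `python-dotenv` package (the repo's actual loading code may differ):

```python
import os
from dotenv import load_dotenv  # python-dotenv (assumed dependency)

load_dotenv()  # reads .env from the current working directory

# Optional settings fall back to the repo defaults.
mlflow_url = os.getenv("MLFLOW_URL")  # None -> tracking disabled
schemas_dir = os.getenv("XTR_SCHEMAS_DIR", "./schemas")
examples_dir = os.getenv("XTR_EXAMPLES_DIR", "./examples")
output_dir = os.getenv("XTR_OUTPUT_DIR", "./results")

# Model settings: name, endpoint, and key per role.
teacher = {
    "model": os.environ["TEACHER_MODEL"],
    "base_url": os.environ["TEACHER_BASE_URL"],
    "api_key": os.environ["TEACHER_API_KEY"],
}
```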
Run unit tests:

```bash
uv run python -m unittest discover tests
```

Scoring: precision/recall-based F-beta metric with the following components:
- Base parse score: 0.2 (valid JSON)
- Base schema score: 0.2 (passes schema validation)
- Field quality: 0.5 weight (F1.5 score of correct fields)
- Coverage bonus: 0.1 weight (recall of expected fields)
Scoring examples:
- 0.0: JSON parse failure
- 0.16: Schema validation failed (base parse × 0.8 penalty)
- 0.4: Valid JSON matching schema (no fields matched)
- 0.4-1.0: Partial to full field matches
- 1.0: Exact match
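A minimal sketch of this rubric in Python, interpreting "F1.5" as an F-beta score with β = 1.5 (a hypothetical helper; the repo's actual metric may match fields differently):

```python
import json

BETA = 1.5           # recall-weighted F-beta ("F1.5")
_MISSING = object()  # sentinel so absent keys never match expected values

def score(raw_output: str, expected: dict, schema_valid: bool) -> float:
    try:
        predicted = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return 0.0  # JSON parse failure
    if not schema_valid or not isinstance(predicted, dict):
        return 0.2 * 0.8  # base parse score with 0.8 penalty -> 0.16
    matched = sum(1 for k, v in expected.items() if predicted.get(k, _MISSING) == v)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(expected) if expected else 0.0
    if precision + recall == 0:
        f_beta = 0.0
    else:
        f_beta = (1 + BETA**2) * precision * recall / (BETA**2 * precision + recall)
    # 0.2 parse + 0.2 schema + 0.5 * field quality + 0.1 * coverage bonus
    return 0.2 + 0.2 + 0.5 * f_beta + 0.1 * recall
```

An exact match yields 0.2 + 0.2 + 0.5 + 0.1 = 1.0, and a schema-valid output with no matched fields yields 0.4, consistent with the scoring examples above.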
```bash
# Install dependencies
uv sync

# Add new dependency
uv add <package-name>
```

Verify models are running:
```bash
curl http://genaihulk:8080/v1/models   # Teacher
curl http://localhost:1234/v1/models   # Student
```

Check example files:

```bash
ls ./examples/ ./schemas/
```
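If anything is off, a small preflight script can exercise both checks in one go (a sketch, assuming the OpenAI-compatible `/v1/models` endpoints and default directories from the configuration above):

```python
import os
import urllib.request

ENDPOINTS = {
    "teacher": "http://genaihulk:8080/v1/models",
    "student": "http://localhost:1234/v1/models",
}

# Ping each model server's /v1/models endpoint.
for role, url in ENDPOINTS.items():
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"{role}: OK (HTTP {resp.status})")
    except OSError as exc:
        print(f"{role}: unreachable ({exc})")

# Confirm the schema and example directories exist and are populated.
for path in ("./examples", "./schemas"):
    if os.path.isdir(path):
        print(f"{path}: {len(os.listdir(path))} entries")
    else:
        print(f"{path}: missing")
```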