reval

reval correlates your Langfuse eval sessions with your git history and uses a multi-agent LLM pipeline to pinpoint which code changes caused which metric regressions. It produces a report with explanations, evidence, and suggested fixes.

Installation

From PyPI:

pip install reval-cli

From source:

git clone https://github.com/calebevans/reval.git
cd reval
pip install .

For development (includes pytest, mypy, ruff, pre-commit):

pip install ".[dev]"

Requires Python 3.10+.

Quick Start

  1. Generate a starter config:

     reval init

  2. Set your Langfuse credentials (or add them to reval.yaml):

     export LANGFUSE_BASE_URL="https://cloud.langfuse.com"
     export LANGFUSE_PUBLIC_KEY="pk-..."
     export LANGFUSE_SECRET_KEY="sk-..."

  3. Run an analysis against a Langfuse eval session:

     reval analyze --eval-results <session-id>

  4. To compare two sessions (current vs. baseline) and correlate regressions with code changes:

     reval analyze \
       --eval-results <current-session-id> \
       --eval-baseline <baseline-session-id> \
       --base main

Configuration

reval is configured through a reval.yaml file in your project root. Every field has a sensible default, so the file is optional for simple use cases.

langfuse:
  api_url: https://cloud.langfuse.com
  public_key: pk-...
  secret_key: sk-...
  project_id: ""                  # auto-detected if omitted
  current_session_id: ""          # or use --eval-results
  baseline_session_id: ""         # or use --eval-baseline
  publish: false                  # post results back to Langfuse

metrics:
  - name: answer_relevancy
    threshold: 0.05               # flag if score drops by more than this
  - name: faithfulness
    threshold: 0.05

relevance:
  include_patterns: []            # empty = include all non-ignored files
  ignore_patterns:
    - "**/tests/**"
    - "**/__pycache__/**"
    - "*.md"
    - "*.lock"
  category_mappings:
    prompt:
      - "**/prompts/**"
      - "**/*.prompt"
    model_config:
      - "**/config/model*"
      - "**/*llm_config*"
    retrieval:
      - "**/retrieval/**"
      - "**/rag/**"
    tool_definition:
      - "**/tools/**"
      - "**/functions/**"
    output_parsing:
      - "**/parsers/**"
      - "**/schema*"
    eval_config:
      - "**/eval*"

llm:
  model: openai/gpt-4o            # any LiteLLM model identifier
  temperature: 0.2
  max_tokens: 4096
  context_window: null             # override the model's default context window
  diff_model: null                 # use a different model for diff analysis
  eval_model: null                 # use a different model for eval analysis
  synthesis_model: null            # use a different model for synthesis

git:
  base: HEAD                       # base commit ref
  head: working                    # "working" = uncommitted changes

Configuration Sections

langfuse - Connection settings for your Langfuse instance. Credentials can also be set through environment variables (see below). Set publish: true to write analysis results back to Langfuse as comments.

metrics - List of metric names and their regression thresholds. A metric is flagged as regressed when current_score - baseline_score falls below -threshold, i.e. when the score drops by more than the threshold. Each threshold defaults to 0.05 if not specified.
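The threshold check above can be sketched in a few lines of Python (a minimal illustration of the documented rule, not reval's actual implementation; the function name is hypothetical):

```python
def is_regressed(current_score: float, baseline_score: float,
                 threshold: float = 0.05) -> bool:
    """Flag a metric as regressed when its score drops by more than the threshold."""
    return (current_score - baseline_score) < -threshold

# A 0.06 drop exceeds the default 0.05 threshold:
print(is_regressed(0.78, 0.84))  # True
# A 0.03 drop does not:
print(is_regressed(0.81, 0.84))  # False
```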

relevance - Controls which files from the git diff are included in analysis. Files matching ignore_patterns are excluded. If include_patterns is non-empty, only files matching at least one include pattern (and no ignore pattern) are kept. The category_mappings section maps glob patterns to semantic categories (prompt, model_config, retrieval, etc.) so the analysis agents understand the role of each changed file.
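The include/ignore logic described above can be approximated as follows (an illustrative sketch using Python's fnmatch, whose `*` semantics differ slightly from full gitignore-style globbing; the function name is hypothetical):

```python
from fnmatch import fnmatch

def is_relevant(path: str, include_patterns: list[str],
                ignore_patterns: list[str]) -> bool:
    """Ignore patterns always exclude; a non-empty include list
    additionally requires at least one matching include pattern."""
    if any(fnmatch(path, p) for p in ignore_patterns):
        return False
    if include_patterns:
        return any(fnmatch(path, p) for p in include_patterns)
    return True  # empty include list = include all non-ignored files

ignore = ["**/tests/**", "**/__pycache__/**", "*.md", "*.lock"]
print(is_relevant("src/prompts/system.prompt", [], ignore))  # True
print(is_relevant("src/tests/test_rag.py", [], ignore))      # False
print(is_relevant("README.md", [], ignore))                  # False
```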

llm - Model configuration. The model field accepts any LiteLLM model identifier (e.g. openai/gpt-4o, anthropic/claude-sonnet-4-20250514, vertex_ai/gemini-2.0-flash). You can assign different models to each analysis agent using diff_model, eval_model, and synthesis_model.

git - The commit refs to diff. Set head to working to diff uncommitted changes against base, or set both to commit SHAs/branch names.
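As a rough sketch of how these two refs translate into a diff invocation (purely illustrative; the actual command reval constructs may differ):

```python
def diff_args(base: str, head: str) -> list[str]:
    """Build git diff arguments: head="working" diffs uncommitted
    changes against base; otherwise diff the two refs."""
    if head == "working":
        return ["git", "diff", base]
    return ["git", "diff", f"{base}..{head}"]

print(diff_args("HEAD", "working"))  # ['git', 'diff', 'HEAD']
print(diff_args("main", "feature"))  # ['git', 'diff', 'main..feature']
```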

Environment Variables

Langfuse credentials can be provided through environment variables instead of (or in addition to) reval.yaml. Environment variables take precedence when the corresponding config field is left empty.

| Variable | Config equivalent | Description |
| --- | --- | --- |
| LANGFUSE_BASE_URL | langfuse.api_url | Langfuse API URL |
| LANGFUSE_PUBLIC_KEY | langfuse.public_key | Langfuse public key |
| LANGFUSE_SECRET_KEY | langfuse.secret_key | Langfuse secret key |
| LANGFUSE_PROJECT_ID | langfuse.project_id | Langfuse project ID (auto-detected if omitted) |
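The documented precedence (environment variable used only when the config field is empty) amounts to a simple fallback, sketched here with a hypothetical helper:

```python
import os

def resolve(config_value: str, env_var: str) -> str:
    """A non-empty config value wins; otherwise fall back to the environment."""
    return config_value or os.environ.get(env_var, "")

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-from-env"
print(resolve("", "LANGFUSE_PUBLIC_KEY"))              # pk-from-env
print(resolve("pk-from-yaml", "LANGFUSE_PUBLIC_KEY"))  # pk-from-yaml
```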

CLI Reference

reval init

Generate a starter reval.yaml with interactive prompts.

reval init [--output PATH]

| Option | Default | Description |
| --- | --- | --- |
| --output | reval.yaml | Path for the generated config file |

reval analyze

Run the analysis pipeline. This is the main command.

reval analyze [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --eval-results | (required) | Langfuse session ID for the current eval run |
| --eval-baseline | (none) | Langfuse session ID for the baseline run (omit for single-session mode) |
| --base | From config or HEAD | Base commit ref |
| --head | From config or working | Head ref (working for uncommitted changes) |
| --config | reval.yaml | Path to config file |
| --output | terminal | Output format: terminal, json, or markdown |
| --output-file | (stdout) | Write the report to a file instead of stdout |
| --threshold | 0.05 | Global regression threshold (overrides per-metric config) |
| --model | From config | LLM model to use (overrides config) |
| --publish / --no-publish | From config | Publish results back to Langfuse |
| --verbose | false | Show debug information |

reval report

Re-render a previously saved JSON report in a different format.

reval report REPORT_FILE [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --output | terminal | Output format: terminal, json, or markdown |
| --output-file | (stdout) | Write the report to a file instead of stdout |

Example: save a JSON report, then render it as markdown later:

reval analyze --eval-results sess-123 --output json --output-file report.json
reval report report.json --output markdown

Analysis Modes

Compare mode

Activated when you provide both --eval-results and --eval-baseline. reval fetches both sessions from Langfuse, diffs the git history between --base and --head, and runs three agents:

  1. Diff agent examines code changes in isolation and forms hypotheses about their potential eval impact.
  2. Eval agent investigates each regressed test case by comparing outputs, scores, and evaluator reasoning between current and baseline runs.
  3. Synthesis agent correlates the diff and eval findings into a final report with explanations and suggested fixes.

Single-session mode

Activated when you omit --eval-baseline. reval analyzes a single eval session without a baseline comparison. It loads source files matching your relevance patterns, runs the eval agent on any test cases that fall below threshold, and produces findings about what may be going wrong.

Output Formats

| Format | Flag | Description |
| --- | --- | --- |
| Terminal | --output terminal | Rich tables and panels with color-coded diffs (default) |
| JSON | --output json | Machine-readable output; can be re-rendered with reval report |
| Markdown | --output markdown | Tables and fenced diff blocks, suitable for PRs or documentation |

All formats can be written to a file with --output-file PATH.

Publishing to Langfuse

When --publish is passed (or langfuse.publish is set to true in config), reval posts its analysis results back to Langfuse:

  • A session comment with the full markdown report is added to the current session.
  • A trace comment with relevant findings is added to each failed trace.

This makes it easy to review reval's analysis directly in the Langfuse UI alongside your eval results.

About

re-evaluate eval regressions
