Evergreen: Efficient Claim Verification for Semantic Aggregates

This repository contains the code and experiments for Evergreen: Efficient Claim Verification for Semantic Aggregates.

Prerequisites

Python 3.12+
uv
DuckDB
Snowflake account for Cortex AI language and embedding model access
Yelp Open Dataset

Setup

Install dependencies:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --all-groups

Configure the Snowflake connection by creating ~/.snowflake/connections.toml:

[evergreen]
account = "<account identifier>"
user = "<login name>"
password = "<programmatic access token>"
role = "<role>"
database = "<database>"
schema = "<schema>"
warehouse = "<warehouse>"

Set the cache directory:

export EVERGREEN_CACHE_DIR_ROOT=~/.cache/evergreen/

Build the docs, then run the tests (which depend on the docs) to verify the setup:

uv run mkdocs build
uv run pytest

Data Preparation

Download the Yelp Open Dataset and extract it to data/yelp_dataset/. Then extract the evaluation datasets using DuckDB:

duckdb -c ".read experiments/scripts/yelp_restaurant_reviews/johns_roast_pork.sql"
duckdb -c ".read experiments/scripts/yelp_restaurant_reviews/mcdonalds_mo.sql"
duckdb -c ".read experiments/scripts/yelp_restaurant_reviews/village_whiskey.sql"

Add embeddings:

uv run python -m experiments.scripts.add_embeddings \
    --dataset_path data/yelp_restaurant_reviews/johns_roast_pork.jsonl \
    --fields text

uv run python -m experiments.scripts.add_embeddings \
    --dataset_path data/yelp_restaurant_reviews/mcdonalds_mo.jsonl \
    --fields text

uv run python -m experiments.scripts.add_embeddings \
    --dataset_path data/yelp_restaurant_reviews/village_whiskey.jsonl \
    --fields text
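The three invocations above differ only in the dataset name, so they can also be generated in a loop. The following is a minimal sketch, not a script from the repository: the function name embedding_commands is hypothetical, while the module path, flags, and dataset paths are copied from the commands above. It only builds and prints the commands; passing each one to subprocess.run(cmd, check=True) would execute them.

```python
import shlex

# Dataset names taken from the commands above.
DATASETS = ("johns_roast_pork", "mcdonalds_mo", "village_whiskey")


def embedding_commands(datasets=DATASETS, fields=("text",)):
    """Build one add_embeddings command (as an argv list) per dataset."""
    commands = []
    for name in datasets:
        commands.append([
            "uv", "run", "python", "-m", "experiments.scripts.add_embeddings",
            "--dataset_path", f"data/yelp_restaurant_reviews/{name}.jsonl",
            "--fields", *fields,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands in copy-pasteable form without running them.
    for cmd in embedding_commands():
        print(shlex.join(cmd))
```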

Reproducing Results

Run the full evaluation:

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls base_rm rag_agent \
    --lms claude-opus-4-6 claude-sonnet-4-6 claude-haiku-4-5

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls evg_ref

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls evg_opt \
    --lms claude-opus-4-6 claude-sonnet-4-6 claude-haiku-4-5 llama4-maverick llama4-scout llama3.1-8b

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls evg_unopt \
    --lms claude-opus-4-6

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls evg_abl_no_es evg_abl_no_rs evg_abl_no_est evg_abl_no_fus evg_abl_no_sf evg_abl_no_cache \
    --lms claude-haiku-4-5

uv run python -m experiments.scripts.run_claim_evaluators --eval_sim_filter

Consider moving any existing results in experiments/results/ to a separate directory to avoid overwriting or double counting results.

Generate figures:

uv run python -m experiments.scripts.plot_results

Figures are saved to experiments/figures/.
