brown-db/evergreen
Evergreen: Efficient Claim Verification for Semantic Aggregates

This repository contains the code and experiments for Evergreen: Efficient Claim Verification for Semantic Aggregates.

Prerequisites

Setup

Install dependencies:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --all-groups

Configure the Snowflake connection by creating ~/.snowflake/connections.toml:

[evergreen]
account = "<account identifier>"
user = "<login name>"
password = "<programmatic access token>"
role = "<role>"
database = "<database>"
schema = "<schema>"
warehouse = "<warehouse>"

Set the cache directory:

export EVERGREEN_CACHE_DIR_ROOT=~/.cache/evergreen/

Build the docs, then run the tests (which depend on them) to verify the setup:

uv run mkdocs build
uv run pytest

Data Preparation

Download the Yelp Open Dataset and extract it to data/yelp_dataset/. Then extract the evaluation datasets using DuckDB:

duckdb -c ".read experiments/scripts/yelp_restaurant_reviews/johns_roast_pork.sql"
duckdb -c ".read experiments/scripts/yelp_restaurant_reviews/mcdonalds_mo.sql"
duckdb -c ".read experiments/scripts/yelp_restaurant_reviews/village_whiskey.sql"
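Before adding embeddings, it can help to confirm that the extraction produced usable files. The sketch below assumes that each SQL script writes one JSON object per line to data/yelp_restaurant_reviews/<name>.jsonl (the paths consumed by the embedding step) and that each record carries a `text` field (the field passed to `--fields`). Both are inferences from the commands in this README, and `count_records_with_field` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def count_records_with_field(path: Path, field: str = "text") -> tuple[int, int]:
    """Return (total records, records with a non-empty `field`) for a JSONL file."""
    total = 0
    with_field = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)  # assumes one JSON object per line
            total += 1
            if record.get(field):
                with_field += 1
    return total, with_field


if __name__ == "__main__":
    # Assumed output location, taken from the add_embeddings commands.
    root = Path("data/yelp_restaurant_reviews")
    for name in ("johns_roast_pork", "mcdonalds_mo", "village_whiskey"):
        path = root / f"{name}.jsonl"
        if path.exists():
            print(name, count_records_with_field(path))
        else:
            print(name, "missing")
```

If the two counts differ for a dataset, some reviews have no text to embed and are worth inspecting first.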

Add embeddings:

uv run python -m experiments.scripts.add_embeddings \
    --dataset_path data/yelp_restaurant_reviews/johns_roast_pork.jsonl \
    --fields text

uv run python -m experiments.scripts.add_embeddings \
    --dataset_path data/yelp_restaurant_reviews/mcdonalds_mo.jsonl \
    --fields text

uv run python -m experiments.scripts.add_embeddings \
    --dataset_path data/yelp_restaurant_reviews/village_whiskey.jsonl \
    --fields text
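The three invocations above differ only in the dataset name, so they can be driven from a short loop. This is a sketch that builds exactly the commands shown in this README and prints them by default; the function names are hypothetical, and no flags beyond `--dataset_path` and `--fields` are assumed.

```python
import subprocess

# Dataset names taken from the commands above.
DATASETS = ("johns_roast_pork", "mcdonalds_mo", "village_whiskey")


def add_embeddings_command(dataset: str) -> list[str]:
    """Build the add_embeddings invocation for one dataset."""
    return [
        "uv", "run", "python", "-m", "experiments.scripts.add_embeddings",
        "--dataset_path", f"data/yelp_restaurant_reviews/{dataset}.jsonl",
        "--fields", "text",
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for dataset in DATASETS:
        command = add_embeddings_command(dataset)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing dataset.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` from the repository root executes the commands in order.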

Reproducing Results

Run the full evaluation:

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls base_rm rag_agent \
    --lms claude-opus-4-6 claude-sonnet-4-6 claude-haiku-4-5

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls evg_ref

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls evg_opt \
    --lms claude-opus-4-6 claude-sonnet-4-6 claude-haiku-4-5 llama4-maverick llama4-scout llama3.1-8b

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls evg_unopt \
    --lms claude-opus-4-6

uv run python -m experiments.scripts.run_claim_evaluators \
    --impls evg_abl_no_es evg_abl_no_rs evg_abl_no_est evg_abl_no_fus evg_abl_no_sf evg_abl_no_cache \
    --lms claude-haiku-4-5

uv run python -m experiments.scripts.run_claim_evaluators --eval_sim_filter

Before running these commands, consider moving any existing results in experiments/results/ to a separate directory so that they are not overwritten or double-counted.

Generate figures:

uv run python -m experiments.scripts.plot_results

Figures are saved to experiments/figures/.
