CloudEval is a CLI for running model evals, comparing models, and generating shareable reports.
It is designed for:
- Cloudflare dogfooding
- public feedback loops
- small, opinionated team evals
- eventually, a broader OSS audience
When you want to compare a model like `workers-ai/@cf/zai-org/glm-5.1` against a baseline, you should be able to:
- run the same dataset against both models
- score the outputs consistently
- generate a report your team can read quickly
- explain the result in plain English
- share the run in Braintrust when needed
CloudEval does that.
Quickstart:

```sh
cd cloudeval
cp .env.example .env
source ~/.nvm/nvm.sh && nvm use 22
node ./bin/cloudeval.mjs doctor
node ./bin/cloudeval.mjs run --dataset agent-quality --models workers-ai/@cf/zai-org/glm-5.1,baseline
```

Commands:

- `cloudeval doctor` — validate Node, config, and env
- `cloudeval init` — scaffold a starter config and sample datasets
- `cloudeval run` — run an eval locally and write a JSON result
- `cloudeval report` — render a JSON result as markdown
- `cloudeval explain` — turn a JSON result into a plain-English summary
- `cloudeval compare` — compare two result files
- `cloudeval run --braintrust` — generate and execute Braintrust evals
To run against Braintrust:

```sh
node ./bin/cloudeval.mjs run \
  --dataset agent-quality \
  --models workers-ai/@cf/zai-org/glm-5.1,baseline \
  --braintrust
```

That will:
- generate Braintrust eval scripts
- run the task model(s)
- score the outputs
- write a shareable summary to `.cloudeval/braintrust/`
CloudEval looks for `evals.config.mjs`.
If it is missing, it falls back to the built-in Cloudflare preset.
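The config schema isn't documented above, so as a rough sketch, a minimal `evals.config.mjs` might default-export something like this (all field names here are assumptions; check the built-in preset under `src/presets/` for the real shape):

```js
// evals.config.mjs — illustrative shape, not the canonical schema
const config = {
  datasets: ['agent-quality'],                         // names of modules under src/datasets/
  models: ['workers-ai/@cf/zai-org/glm-5.1', 'baseline'],
  scorers: ['exact-match'],                            // names registered in src/scorers/registry.mjs
  output: '.cloudeval/results',                        // where JSON results are written
};

export default config;
```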
Relevant env vars:
- `CLOUDFLARE_ACCOUNT_ID`
- `CLOUDFLARE_API_TOKEN`
- `BRAINTRUST_API_KEY`
To add a dataset:
- create a file under `src/datasets/`
- export `{ name, rows }`
- reference it from `evals.config.mjs`
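The steps above fit in one small module. A sketch, assuming each row is an `{ input, expected }` pair (the row shape is an assumption; mirror an existing dataset under `src/datasets/`):

```js
// src/datasets/support-triage.mjs — hypothetical dataset module
// Exports the { name, rows } shape the runner looks for.
export const name = 'support-triage';

export const rows = [
  { input: 'My Worker returns a 1102 error under load.', expected: 'escalate' },
  { input: 'How do I bind a KV namespace in wrangler.toml?', expected: 'docs' },
];
```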
To add a scorer:
- add a rubric in `src/scorers/registry.mjs`
- wire it into the runner/generator
- add a test
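The registry's exact API isn't shown above; as a hypothetical sketch, a rubric entry could map a scorer name to a rubric string plus a scoring function returning 0–1 (the `scorers` map and the `score({ output, expected })` signature are assumptions):

```js
// src/scorers/registry.mjs — hypothetical entry; the real registry shape may differ
export const scorers = {
  'exact-match': {
    rubric: 'Score 1 if the output matches the expected label exactly, else 0.',
    score: ({ output, expected }) =>
      output.trim().toLowerCase() === expected.trim().toLowerCase() ? 1 : 0,
  },
};
```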
To add a provider:
- add an adapter under `src/providers/`
- keep the provider boundary thin
- preserve the local/reporting flow
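As an illustration of a thin provider boundary, here is a hypothetical Workers AI adapter; the `complete(model, prompt)` signature and the response normalization are assumptions, though the URL follows Cloudflare's documented `accounts/{account_id}/ai/run/{model}` REST pattern:

```js
// src/providers/workers-ai.mjs — illustrative adapter; the signature is an assumption
export function createWorkersAiProvider({ accountId, apiToken }) {
  return {
    name: 'workers-ai',
    async complete(model, prompt) {
      const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;
      const res = await fetch(url, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${apiToken}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ messages: [{ role: 'user', content: prompt }] }),
      });
      if (!res.ok) throw new Error(`workers-ai: HTTP ${res.status}`);
      const data = await res.json();
      // Normalize to a plain string so runners and reporting stay provider-agnostic.
      return data.result?.response ?? '';
    },
  };
}
```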
Repository layout:

- `src/cli.mjs` — command entrypoint
- `src/runners/` — local eval execution
- `src/report/` — markdown + explanation output
- `src/providers/` — model/provider adapters
- `src/scorers/` — judging logic and rubrics
- `src/braintrust/` — Braintrust script generation
- `src/datasets/` — sample datasets
- `src/presets/` — Cloudflare and generic presets
Run the test suite with:

```sh
node --test
```

License: MIT