Bitbench

A donation-funded benchmark platform for comparing AI models on Bitcoin-related tasks. Users donate BSV to fund benchmark runs, and results are published transparently.

Overview

The project consists of two main parts:

bench: CLI tool for running benchmarks against 40+ AI models
visualizer: Next.js web app for viewing results and donating to fund benchmarks

How It Works

Funding Flow

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   FUNDING   │ ──▶ │   PENDING   │ ──▶ │   RUNNING   │ ──▶ │  COMPLETED  │
│             │     │             │     │             │     │             │
│ Users donate│     │ Goal reached│     │ Admin runs  │     │ Results     │
│ BSV to suite│     │ awaiting run│     │ benchmarks  │     │ published   │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

Fund: Users donate BSV to test suite addresses via the visualizer
Pending: When the funding goal is reached, the suite enters the pending state
Run: Admin checks funding status and runs benchmarks locally
Publish: Results are committed to repo and deployed to visualizer
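
The four states above can be read as a small state machine. The following is a minimal sketch under that reading; the type and function names are hypothetical and not taken from the repository, and only the state names and their order come from the diagram.

```typescript
// Hypothetical names throughout; only the four states come from the README.
type SuiteStatus = "funding" | "pending" | "running" | "completed";

// Each state has exactly one successor; "completed" is terminal.
const NEXT_STATUS: Record<SuiteStatus, SuiteStatus | null> = {
  funding: "pending", // funding goal reached
  pending: "running", // admin starts the benchmark run
  running: "completed", // results published
  completed: null,
};

// Returns the status a suite should have given its current balance.
function statusAfterDonation(
  status: SuiteStatus,
  balanceUsd: number,
  goalUsd: number,
): SuiteStatus {
  // Only a suite that is still collecting funds can move to "pending".
  if (status === "funding" && balanceUsd >= goalUsd) return "pending";
  return status;
}

// Moves a suite to its single next state, or throws if there is none.
function advance(status: SuiteStatus): SuiteStatus {
  const next = NEXT_STATUS[status];
  if (next === null) throw new Error(`"${status}" is a terminal state`);
  return next;
}
```

Keeping the transitions in one table makes it hard to jump from funding straight to running by accident, which matches the one-way flow in the diagram.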

Address Derivation

Each test suite has a unique donation address derived using Type 42 key derivation:

Master WIF → Suite ID → Deterministic BSV address
Same master key always produces the same addresses
Using the suite ID as the invoice number means addresses survive suite renames
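
The determinism and rename-safety properties above can be illustrated without the real key math. The sketch below is a stand-in, not Type 42 itself: it uses a plain HMAC-SHA256 from Node's `crypto` module in place of the @bsv/sdk derivation, and it produces a hex tag rather than a BSV address. The `Suite` shape, the `deriveSuiteTag` helper, and the example secret are all hypothetical.

```typescript
import { createHmac } from "node:crypto";

// Stand-in for Type 42: NOT the real scheme, only its determinism property.
// The real derivation goes through @bsv/sdk and yields a BSV address.
function deriveSuiteTag(masterSecret: string, suiteId: string): string {
  // HMAC the suite ID (playing the role of the invoice number)
  // under the master secret.
  return createHmac("sha256", masterSecret).update(suiteId).digest("hex");
}

interface Suite {
  id: string; // stable identifier, used for derivation
  name: string; // display name, free to change
}

const suite: Suite = { id: "suite-001", name: "Bitcoin Parsing" };
const before = deriveSuiteTag("example-master-secret", suite.id);

// Renaming the suite leaves the ID, and so the derived value, unchanged.
const renamed: Suite = { ...suite, name: "Transaction Parsing" };
const after = deriveSuiteTag("example-master-secret", renamed.id);

console.log(before === after); // true
```

Because only the ID feeds the derivation, a display name can change freely without stranding funds at an old address.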

Test Suites

Suite	Description	Est. Cost
Bitcoin SPV & Data Protocols	OP_RETURN, Ordinals, Runes, BRC-20	~$30
Bitcoin Script & Transactions	Bitcoin Script, SegWit, Taproot	~$30
Bitcoin Libraries	@bsv/sdk, bitcoinjs-lib, etc.	~$30
Bitcoin Parsing	Transaction and block parsing	~$30
Protocol Parsing	MAP, AIP, B protocol parsing	~$35
sCrypt Smart Contracts	sCrypt language and tooling	~$30
Stratum Mining Protocol	Mining pool protocol	~$30
Type 42 Key Derivation	BRC-42 style key derivation	~$30

Getting Started

Prerequisites

Bun runtime
OpenRouter API key (all models routed through OpenRouter)
Master WIF for donation address derivation

Running the CLI

cd bench
bun install
cp .env.example .env  # Add your API keys

# Interactive CLI with funding checks
bun run cli

# Quick funding status check
bun run funding

# Force run (bypass funding check)
bun run cli --force

CLI Features

The CLI provides:

Main Menu: Choose between viewing funding status or running benchmarks
Funding Status Table: See all suites with their funding progress
Unfunded Protection: Warning when selecting unfunded suites
Real-time Progress: Live benchmark progress with model stats
Force Flag: --force bypasses funding requirements
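
The interaction between the unfunded warning and the force flag can be sketched as a single decision function. This is an assumption about how the pieces fit together, based only on the feature list above; `decideRun` and `RunDecision` are hypothetical names, not the CLI's actual code.

```typescript
// "run": start the benchmark immediately.
// "warn": show the unfunded warning and let the user choose to proceed.
type RunDecision = "run" | "warn";

function decideRun(isFunded: boolean, argv: string[]): RunDecision {
  // --force bypasses the funding requirement entirely.
  const force = argv.includes("--force");
  return isFunded || force ? "run" : "warn";
}

console.log(decideRun(true, [])); // "run"
console.log(decideRun(false, [])); // "warn"
console.log(decideRun(false, ["--force"])); // "run"
```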

Running the Visualizer

cd visualizer
bun install
bun dev

Environment Variables

Visualizer (Vercel)

KV_REST_API_URL=         # Upstash Redis REST URL
KV_REST_API_TOKEN=       # Upstash Redis REST token
MASTER_WIF=              # Master private key for address derivation

Bench CLI

OPENROUTER_API_KEY=      # All models routed through OpenRouter
MASTER_WIF=              # Same master key as visualizer for address matching
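
A missing variable tends to surface late, for example as a failed API call in the middle of a run, so it can help to check up front. The helper below is a hypothetical sketch and not part of the repository; only the variable names come from the lists above.

```typescript
// Variable names as listed in this README.
const VISUALIZER_VARS = ["KV_REST_API_URL", "KV_REST_API_TOKEN", "MASTER_WIF"];
const BENCH_VARS = ["OPENROUTER_API_KEY", "MASTER_WIF"];

// Returns the names of required variables that are unset or blank.
function missingVars(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Example with an inline environment object.
const missing = missingVars(BENCH_VARS, { OPENROUTER_API_KEY: "example-key" });
console.log(missing); // [ "MASTER_WIF" ]
```

In a real script the second argument would typically be `process.env`. Note that `MASTER_WIF` appears in both lists: the two apps must share the same value for their derived addresses to match.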

Architecture

Key Components

Wallet Integration: Yours Wallet Provider for BSV donations
Address Derivation: Type 42 using @bsv/sdk (one address per suite)
Data Storage: Upstash Redis for suites, donations, and run metadata
Balance Checking: WhatsOnChain API
Benchmark Results: Static JSON committed to repo
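
As a rough illustration of the balance-checking component, the sketch below builds a WhatsOnChain balance URL and converts the response into USD. Several things here are assumptions rather than facts about this project: the endpoint path and the `{ confirmed, unconfirmed }` response shape reflect WhatsOnChain's public address-balance endpoint as commonly documented, the choice to count unconfirmed funds is arbitrary, and the README does not say where the BSV/USD rate comes from, so it is passed in as a parameter.

```typescript
const SATS_PER_BSV = 100_000_000;

// Assumed response shape of the balance endpoint (values in satoshis).
interface WocBalance {
  confirmed: number;
  unconfirmed: number;
}

// Assumed endpoint path; check the WhatsOnChain docs before relying on it.
function balanceUrl(address: string): string {
  const base = "https://api.whatsonchain.com/v1/bsv/main";
  return `${base}/address/${encodeURIComponent(address)}/balance`;
}

// Converts a satoshi balance to USD at a caller-supplied rate.
// Counting unconfirmed funds is an assumption made for this sketch.
function balanceUsd(balance: WocBalance, usdPerBsv: number): number {
  const sats = balance.confirmed + balance.unconfirmed;
  return (sats / SATS_PER_BSV) * usdPerBsv;
}

// 0.5 BSV at a hypothetical rate of $40 per BSV.
console.log(balanceUsd({ confirmed: 50_000_000, unconfirmed: 0 }, 40)); // 20
```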

Directory Structure

bitbench/
├── bench/
│   ├── cli.tsx           # Interactive CLI with Ink
│   ├── index.ts          # Benchmark engine
│   ├── funding.ts        # Funding status module
│   ├── check-funding.ts  # Quick funding check script
│   ├── constants.ts      # Model definitions
│   └── tests/            # Test suite JSON files
├── visualizer/
│   ├── app/              # Next.js pages
│   ├── components/       # React components
│   ├── lib/              # Business logic
│   │   ├── addresses.ts  # Type 42 derivation
│   │   ├── suites.ts     # Suite management
│   │   ├── kv.ts         # Redis helpers
│   │   └── types.ts      # TypeScript types
│   └── data/
│       └── benchmark-results.json
└── README.md

Admin Workflow

1. Check Funding Status

cd bench
bun run funding

This shows a table of all suites with their:

Donation address
Current balance (USD)
Funding goal
Progress bar
Funded status (✓/✗)
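
A progress bar and funded marker like the ones in this table could be rendered along these lines. This is a hypothetical sketch, not the CLI's actual rendering code; the characters and the default width are arbitrary choices.

```typescript
// Renders a fixed-width text progress bar with a percentage and funded mark.
function progressBar(balanceUsd: number, goalUsd: number, width = 20): string {
  // Clamp to [0, 1] so overfunded suites do not overflow the bar.
  const ratio =
    goalUsd > 0 ? Math.min(Math.max(balanceUsd / goalUsd, 0), 1) : 0;
  const filled = Math.round(ratio * width);
  const mark = goalUsd > 0 && balanceUsd >= goalUsd ? "✓" : "✗";
  const bar = "█".repeat(filled) + "░".repeat(width - filled);
  return `${bar} ${Math.floor(ratio * 100)}% ${mark}`;
}

console.log(progressBar(15, 30, 10)); // █████░░░░░ 50% ✗
console.log(progressBar(45, 30, 10)); // ██████████ 100% ✓
```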

2. Run Funded Benchmarks

bun run cli

Select "🚀 Run Benchmark"
Choose a funded suite (marked with ✓)
Enter version label (defaults to YYYY-MM-DD)
Watch live progress

If you select an unfunded suite, you'll see a warning with the option to proceed anyway.
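
The default version label from the steps above can be produced with a one-line date helper. This sketch is hypothetical and uses the UTC date; the real CLI may use local time instead.

```typescript
// Formats a date as YYYY-MM-DD using its UTC components.
function defaultVersionLabel(now: Date = new Date()): string {
  return now.toISOString().slice(0, 10);
}

// Uses the typed label if one was entered, otherwise the date default.
function resolveVersionLabel(input: string, now: Date = new Date()): string {
  const trimmed = input.trim();
  return trimmed === "" ? defaultVersionLabel(now) : trimmed;
}

const fixed = new Date(Date.UTC(2025, 0, 5));
console.log(resolveVersionLabel("", fixed)); // 2025-01-05
console.log(resolveVersionLabel("v2-rerun", fixed)); // v2-rerun
```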

3. Publish Results

After a benchmark run:

# Copy results to visualizer
cp bench/results/[suite-id]/[version]/results.json visualizer/data/benchmark-results.json

# Commit and push
git add .
git commit -m "Add [suite-name] benchmark results [version]"
git push

Results are automatically deployed to the visualizer on push.
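
When publishing by hand, the bracketed placeholders in the commands above are easy to mistype. A small helper such as the following could generate the source path and commit message from the same two inputs; it is a hypothetical convenience, not part of the repository, and it only mirrors the path layout and message format shown above.

```typescript
// Destination path as given in the copy command above.
const VISUALIZER_RESULTS = "visualizer/data/benchmark-results.json";

// Source path of a finished run, following the layout shown above.
function resultsPath(suiteId: string, version: string): string {
  return `bench/results/${suiteId}/${version}/results.json`;
}

// Commit message in the format used above.
function commitMessage(suiteName: string, version: string): string {
  return `Add ${suiteName} benchmark results ${version}`;
}

// "example-suite" is a made-up suite ID for illustration.
console.log(`cp ${resultsPath("example-suite", "2025-01-05")} ${VISUALIZER_RESULTS}`);
console.log(commitMessage("Example Suite", "2025-01-05"));
```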

Contributing

Contributions welcome. Please open an issue first to discuss proposed changes.

License

MIT
