Go-based evaluation orchestrator for running Mix Eval tasks using browser automation agents.
Mix-Eval-Go orchestrates agent evaluations by:
- Fetching tasks from Convex evaluation platform
- Creating browser sessions with cloud providers (optional)
- Executing tasks via Mix Agent with SSE streaming
- Collecting tool calls and execution history
- Evaluating results with Claude judge
- Submitting results back to Convex
- Mix Agent running at http://localhost:8088 (see Mix Agent Setup)
- Convex Database with evaluation API (deployment URL + secret key)
- (Optional) Cloud browser provider API keys (Browserbase, Brightdata, Hyperbrowser, Anchor)
# Install dependencies
task install
# Install CLI globally
task install-cli
Create .env file (auto-loads on startup):
cp .env.example .env
Required variables:
- CONVEX_URL - Convex deployment URL
- CONVEX_SECRET_KEY - Convex API secret key
Optional:
- MIX_AGENT_URL - Mix Agent URL (default: http://localhost:8088)
- BROWSERBASE_API_KEY, BRIGHTDATA_USER, etc. - Cloud browser credentials
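The variable rules above (two required values, one optional value with a default) can be sketched in Go as follows. This is a minimal illustration, not the project's actual loader: the `Config` struct and `loadConfig` function are hypothetical names, and the sketch reads the process environment directly, assuming the `.env` file has already been loaded.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config mirrors the documented variables. The struct and its field names
// are illustrative assumptions, not taken from the project source.
type Config struct {
	ConvexURL       string
	ConvexSecretKey string
	MixAgentURL     string
}

// loadConfig reads settings through a lookup function (os.Getenv in normal
// use), which keeps the validation logic easy to exercise in tests.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		ConvexURL:       getenv("CONVEX_URL"),
		ConvexSecretKey: getenv("CONVEX_SECRET_KEY"),
		MixAgentURL:     getenv("MIX_AGENT_URL"),
	}
	if cfg.ConvexURL == "" {
		return Config{}, errors.New("CONVEX_URL is required")
	}
	if cfg.ConvexSecretKey == "" {
		return Config{}, errors.New("CONVEX_SECRET_KEY is required")
	}
	if cfg.MixAgentURL == "" {
		cfg.MixAgentURL = "http://localhost:8088" // documented default
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		return
	}
	fmt.Println("Mix Agent URL:", cfg.MixAgentURL)
}
```

Passing the lookup function as a parameter is a design choice for this sketch only; it lets a test supply a map-backed lookup instead of mutating the real environment.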
# Run single task by ID
mix-eval-go --dataset PostHog_Cleaned_020226 --task-id 93046
# Run task range by index
mix-eval-go --dataset PostHog_Cleaned_020226 --start-index 0 --end-index 9 --parallel 3
# Run with cloud browser provider
mix-eval-go --dataset PostHog_Cleaned_020226 --task-id 93046 --browser-provider browserbase
Options:
- --dataset - Dataset name (required)
- --task-id - Run specific task by ID
- --start-index, --end-index - Run task range
- --parallel - Number of parallel tasks (default: 3)
- --browser-provider - Cloud browser (browserbase, brightdata, hyperbrowser, anchor)
- --run-id - Custom run identifier
- --model - Override LLM model
- --max-steps - Maximum steps per task
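For readers who want to see how options like these map onto Go's standard `flag` package, here is a minimal sketch. It is not the project's actual CLI code: the `options` struct and `parseOptions` function are hypothetical, and the zero defaults for `--start-index`, `--end-index`, and `--max-steps` are assumptions, since the documentation only states a default for `--parallel`.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options collects the documented CLI flags. Names are illustrative.
type options struct {
	dataset, taskID, browserProvider, runID, model string
	startIndex, endIndex, parallel, maxSteps       int
}

// parseOptions parses args (without the program name) and enforces the one
// documented requirement: --dataset must be set.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("mix-eval-go", flag.ContinueOnError)
	fs.StringVar(&o.dataset, "dataset", "", "dataset name (required)")
	fs.StringVar(&o.taskID, "task-id", "", "run a specific task by ID")
	fs.IntVar(&o.startIndex, "start-index", 0, "first task index (default assumed)")
	fs.IntVar(&o.endIndex, "end-index", 0, "last task index (default assumed)")
	fs.IntVar(&o.parallel, "parallel", 3, "number of parallel tasks")
	fs.StringVar(&o.browserProvider, "browser-provider", "", "cloud browser provider")
	fs.StringVar(&o.runID, "run-id", "", "custom run identifier")
	fs.StringVar(&o.model, "model", "", "override LLM model")
	fs.IntVar(&o.maxSteps, "max-steps", 0, "maximum steps per task (default assumed)")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if o.dataset == "" {
		return options{}, errors.New("--dataset is required")
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Printf("dataset=%s parallel=%d\n", o.dataset, o.parallel)
}
```

The standard `flag` package accepts both `-dataset` and `--dataset`, so the double-dash form used in the examples above works unchanged.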
task dev # Development hot reload
task build # Build the binary
task install-cli # Install CLI to ~/go/bin
task test # Run tests with race detection
task test-e2e # Run end-to-end tests (requires Mix Agent)
task tail-dev-log # View dev server logs
task clean # Clean build artifacts
task lint # Run linters
task fmt # Format code
task --list-all # Show all available tasks
During development, use the auto-built local binary instead of reinstalling globally:
# task dev auto-rebuilds bin/mix-eval-go on file changes
./bin/mix-eval-go --dataset foo --task-id 123
# Optional: Create alias for convenience
alias mix-eval-go="./bin/mix-eval-go"
Only use task install-cli when deploying or when using the CLI outside this project. The dev server (task dev) automatically rebuilds bin/mix-eval-go whenever source files change.
mix-eval-go/
├── cmd/mix-eval-go/ # CLI entry point
├── pkg/
│ ├── orchestrator/ # Task orchestration & SSE streaming
│ ├── convex/ # Convex database client
│ └── providers/ # Browser provider implementations
└── test/e2e/ # End-to-end tests
E2E tests require Mix Agent running at localhost:8088:
# Run all tests
task test-all
# Run only e2e tests
task test-e2e
Tests use the //go:build e2e tag and verify the complete workflow with zero mocking.
- Fetch tasks from Convex
- Create browser session (if cloud provider specified)
- Create Mix Agent session
- Stream SSE events in background (manual HTTP due to SDK bug)
- Send task to Mix Agent
- Collect tool calls and screenshots
- Extract execution history
- Judge evaluates completion
- Upload screenshots to Convex
- Submit results to Convex
- Browserbase - High-quality managed browsers
- Brightdata - Global proxy network with browsers
- Hyperbrowser - Stealth browsing capabilities
- Anchor Browser - Mobile and desktop with captcha solving
Mix-Eval-Go is part of a unified evaluation platform with multiple runners:
- evaluation-platform - Shared Convex backend + UI for all runners
- manus-eval (Python) - Evaluates Manus agent (tool-based execution)
- mix-eval-go (Go) - This repository, evaluates Mix Agent
- evaluations-internal (Python) - Original framework for browser-use agent
All runners share task definitions and submit results to the same platform for comparison.
See docs/ for detailed documentation on authentication, GitHub Actions integration, and architecture.
- github.com/recreate-run/mix-go-sdk v0.2.1 - Mix SDK client
- github.com/joho/godotenv v1.5.1 - Environment variable loading
- Go standard library
Proprietary