---
title: Rust Coder OpenEnv
emoji: 🦀
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
---
Rust Coder is a high-fidelity OpenEnv environment designed to evaluate and train LLM agents on real-world Rust systems programming tasks. Unlike toy environments, Rust Coder simulates realistic engineering scenarios involving the borrow checker, concurrency, and memory safety.
Rust is uniquely challenging for AI agents due to its strict compile-time safety guarantees. This environment provides a 10-task progression that measures an agent's ability to:
- Fix borrow checker violations
- Correctly annotate lifetimes
- Resolve concurrency deadlocks
- Write unsafe FFI code correctly
- Identify and prevent memory leaks
- Optimize data pipelines for performance
Type: `RustCoderAction`

The agent submits a single string containing the complete, fixed Rust source code.

| Field | Type | Description |
|---|---|---|
| `code` | string | Full Rust source code to compile and test |
Type: `RustCoderObservation`

The environment returns detailed feedback after each submission:

| Field | Type | Description |
|---|---|---|
| `problem_description` | string | Task requirements and context |
| `header_section` | string | LeetCode-style scaffold (imports + signatures/types) |
| `compilation_success` | bool | Whether rustc compiled the submitted code |
| `compilation_output` | string | Raw compiler errors and warnings |
| `test_results` | list[dict] | Per-test pass/fail results with error details |
| `reward_breakdown` | dict | Weighted score breakdown across 5 dimensions |
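Put together, a step submission and its observation can be handled like this — a minimal sketch using the documented JSON schema, where the sample observation values are hypothetical and only illustrate the field shapes from the tables above:

```python
import json

# Build a step action: the agent submits one string of Rust source.
action = {"action": {"code": 'fn main() { println!("hello"); }'}}
payload = json.dumps(action)  # request body for POST /step

# A hypothetical observation, shaped like the fields documented above:
obs = {
    "compilation_success": True,
    "compilation_output": "",
    "test_results": [{"name": "test_hello", "passed": True}],
    "reward_breakdown": {"compilation": 0.4},
}

if obs["compilation_success"]:
    passed = sum(1 for t in obs["test_results"] if t["passed"])
    print(f"{passed}/{len(obs['test_results'])} tests passed")
```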
Total reward is a weighted sum of 5 dimensions, each normalized to [0, 1]:
| Dimension | Weight | Metric |
|---|---|---|
| Compilation | 40% | Binary success/failure of rustc |
| Correctness | 20% | Fraction of test assertions that pass |
| Coverage | 20% | Fraction of tests that successfully ran |
| Elegance | 10% | Code quality heuristics (penalizes `.unwrap()`, long lines, `unsafe`) |
| Efficiency | 10% | Execution time vs. per-problem baseline |
Reward provides partial signal at every step — compilation alone earns 0.40, passing all tests earns up to 1.0.
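The weighting above can be sketched as a small helper. This is illustrative only, using the weights from the table, not the environment's actual scoring code:

```python
# Weights from the reward table above (each dimension scored in [0, 1]).
WEIGHTS = {
    "compilation": 0.40,
    "correctness": 0.20,
    "coverage": 0.20,
    "elegance": 0.10,
    "efficiency": 0.10,
}

def total_reward(scores: dict) -> float:
    """Total reward is the weighted sum of the per-dimension scores."""
    return sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)

# A submission that compiles but passes nothing still earns 0.40:
print(total_reward({"compilation": 1.0}))  # → 0.4
```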
10 sequential problems with increasing difficulty:
| ID | Title | Difficulty | Skill Evaluated |
|---|---|---|---|
| 1 | Broken CLI Argument Parser | Easy | Enums & pattern matching |
| 2 | Conflicting Borrows | Easy→Med | Borrow checker |
| 3 | Invalid Lifetime Annotations | Medium | Lifetime annotations |
| 4 | Business Logic Errors | Medium | Math & correctness |
| 5 | Linked List Management | Medium | Ownership & data structures |
| 6 | Multi-threaded Deadlocks | Hard | Mutex & concurrency |
| 7 | Async Borrowing Conflicts | Hard | Async/await lifetimes |
| 8 | Unsafe FFI Integration | Hard | unsafe & C interop |
| 9 | Inefficient Data Pipeline | Hard | Performance optimization |
| 10 | Memory Leak Prevention | Hard+ | Weak pointers & ownership |
The environment reads the following variables. Set them as HF Space secrets (Settings → Variables and Secrets) when deploying to Hugging Face, or in a local .env file for development.
| Variable | Required | Default | Description |
|---|---|---|---|
| `HF_TOKEN` | Yes | — | Hugging Face API token for LLM calls |
| `API_BASE_URL` | No | `https://router.huggingface.co/v1` | Inference endpoint |
| `MODEL_NAME` | No | `Qwen/Qwen2.5-72B-Instruct` | Model to use for evaluation |

Note: The `.env` file is excluded from Docker images by `.dockerignore`. On HF Spaces, secrets are injected as OS environment variables by the platform — `load_dotenv()` silently does nothing if no file is present, and `os.getenv()` reads the platform-injected vars. This is the correct behavior.
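That loading pattern looks roughly like the following sketch (the variable names and defaults are the ones documented above; the exact code in this repo may differ):

```python
import os

# Optional: pull variables from a local .env during development.
# On HF Spaces no .env file exists, so load_dotenv() is a silent no-op
# and the platform-injected environment variables are used instead.
try:
    from dotenv import load_dotenv  # python-dotenv, optional dependency
    load_dotenv()
except ImportError:
    pass

HF_TOKEN = os.getenv("HF_TOKEN")  # required, no default
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")

if HF_TOKEN is None:
    print("warning: HF_TOKEN is not set; LLM calls will fail")
```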
```bash
# 1. Clone and enter the repo
git clone https://github.com/your-username/rust_coder
cd rust_coder

# 2. Create .env with your credentials
cat > .env << EOF
HF_TOKEN=hf_your_token_here
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
EOF

# 3. Build the Docker image (uses root Dockerfile)
docker build -t rust_coder:latest .

# 4. Run the environment server
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# 5. Verify it's healthy
curl http://localhost:8000/health
# → {"status": "healthy"}

# 6. Run the inference benchmark
python inference.py
```

```bash
# Build
docker build -t rust_coder:latest .

# Run with .env file
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# View logs
docker logs rust_env

# Stop
docker stop rust_env
```

```bash
# Reset (returns first problem)
curl -X POST http://localhost:8000/reset

# Step (submit Rust code)
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"code": "fn main() { println!(\"hello\"); }"}}'

# Health check
curl http://localhost:8000/health
```

```bash
# Install HF CLI
pip install huggingface_hub

# Login
huggingface-cli login

# Push to Space
openenv push --repo-id your-username/rust-coder
```

Then go to your Space settings and add secrets:

- `HF_TOKEN` → your Hugging Face API token
- `MODEL_NAME` → e.g. `Qwen/Qwen2.5-72B-Instruct`
Baseline using Qwen/Qwen2.5-72B-Instruct via Hugging Face router:
| Metric | Score |
|---|---|
| Average reward | 0.59 |
| Compilation % | ~85% |
| Correctness % | ~45% |
```
rust_coder/
├── Dockerfile                   # Root Dockerfile (used by validator + HF Spaces)
├── server/Dockerfile            # Identical copy (used for -f flag builds)
├── openenv.yaml                 # OpenEnv spec metadata
├── pyproject.toml               # Python package config
├── uv.lock                      # Locked dependencies
├── problems.json                # 10 coding problems dataset
├── models.py                    # Pydantic action/observation types
├── client.py                    # WebSocket client for RustCoderEnv
├── inference.py                 # Baseline inference script (entry point)
├── __init__.py                  # Package exports
└── server/
    ├── app.py                   # FastAPI OpenEnv server entrypoint
    └── rust_coder_environment.py  # Core environment logic
```
- The Hugging Face Space serves the environment via `uvicorn server.app:app` (see `openenv.yaml` and `Dockerfile`).
- The built-in OpenEnv web UI may send an empty action on Step; this environment supports that by auto-calling the LLM when `action.code` is empty (unless disabled via `AUTO_LLM_ON_EMPTY_STEP=0`).
- `inference.py` is the required baseline runner used by the validator/judge. It connects to the running Space and drives `reset()`/`step()` in a loop, emitting strict `[START]`/`[STEP]`/`[END]` stdout lines.