Synapse Code Auditor (OpenEnv Environment)

title	Synapse Code Auditor
emoji	⚡
colorFrom	red
colorTo	pink
sdk	docker
app_port	7860
pinned	false
license	mit
short_description	Code reviewer environment

Synapse Code Auditor is a production-ready OpenEnv environment for evaluating AI code-review behavior across deterministic tasks.

Real-World Use Case

Software teams use code review to catch bugs, improve quality, and enforce maintainability. This environment simulates that workflow so agent performance can be trained and measured with reproducible rewards.

Project Structure

.
├── server/
│   ├── __init__.py
│   └── app.py
├── inference.py
├── pyproject.toml
├── uv.lock
├── validate_submission.py
├── app/
│   ├── __init__.py
│   ├── env.py
│   ├── grader.py
│   ├── inference.py
│   ├── main.py
│   ├── models.py
│   └── tasks.py
├── .dockerignore
├── Dockerfile
├── openenv.yaml
├── README.md
└── requirements.txt

OpenEnv Interface

This environment implements the required interface methods in the core environment class:

reset(task_id=None, seed=42)
step(action)
state()

FastAPI endpoints expose the same functionality:

POST /reset
POST /step
GET /state
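
The call order implied by the three methods above can be sketched as follows. This is only an illustration: `ReviewEnv`, `StubEnv`, and `run_once` are hypothetical names, the return types are left as `Any` because the README does not state them, and the stub is a stand-in rather than the project's environment class.

```python
from typing import Any, Optional, Protocol


class ReviewEnv(Protocol):
    """Hypothetical protocol mirroring the three documented methods."""

    def reset(self, task_id: Optional[str] = None, seed: int = 42) -> Any: ...

    def step(self, action: Any) -> Any: ...

    def state(self) -> Any: ...


class StubEnv:
    """Stand-in for illustration only; NOT the project's environment."""

    def __init__(self) -> None:
        self._state = {"task_id": None, "steps": 0}

    def reset(self, task_id: Optional[str] = None, seed: int = 42) -> dict:
        # Assumption: an unspecified task_id falls back to some default task.
        self._state = {"task_id": task_id or "easy", "steps": 0}
        return {"task_id": self._state["task_id"], "previous_feedback": None}

    def step(self, action: dict) -> dict:
        self._state["steps"] += 1
        return {"echo": action["review"]}

    def state(self) -> dict:
        return dict(self._state)


def run_once(env: ReviewEnv, task_id: str, review: str) -> Any:
    """Call reset, then step, then state, in the order an agent loop would."""
    env.reset(task_id=task_id, seed=42)
    env.step({"review": review})
    return env.state()
```

Any object with these three methods satisfies the protocol, so the same `run_once` helper could in principle drive the real environment class once its return types are known.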

Action and Observation Schemas

Observation:

{
  "task_id": "easy",
  "task_type": "easy",
  "code": "def add_numbers(a, b)\n    result = a + b\n    return result\n",
  "instructions": "Identify the exact syntax error in the code and propose a valid fix.",
  "previous_feedback": null
}

Action:

{
  "review": "The function definition is missing a colon. Change it to def add_numbers(a, b):"
}

Reward:

{
  "score": 0.99,
  "matched_criteria": ["syntax error", "missing colon", "def add_numbers(a, b):"],
  "missed_criteria": [],
  "penalties": {},
  "rationale": "Coverage=0.99; Penalty=0.00; Matched=3/3"
}

Scores are always strictly between 0 and 1 (implemented as 0.01 … 0.99), never exactly 0.0 or 1.0.
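
For readers who prefer types to JSON, the three payloads above could be mirrored with plain dataclasses. This is a sketch, not the contents of app/models.py: the field names come from the examples, while the field types (for instance `previous_feedback` as an optional string and `penalties` as a name-to-float mapping) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    task_id: str
    task_type: str
    code: str
    instructions: str
    previous_feedback: Optional[str] = None  # assumed type; null in the example


@dataclass
class Action:
    review: str


@dataclass
class Reward:
    score: float
    matched_criteria: List[str] = field(default_factory=list)
    missed_criteria: List[str] = field(default_factory=list)
    penalties: Dict[str, float] = field(default_factory=dict)  # assumed value type
    rationale: str = ""

    def __post_init__(self) -> None:
        # Scores are documented as strictly inside (0, 1).
        if not 0.0 < self.score < 1.0:
            raise ValueError(f"score must be in (0, 1), got {self.score}")
```

The `__post_init__` check makes the open-interval rule fail fast: a reward of exactly 0.0 or 1.0 raises instead of passing silently.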

Tasks

Easy: Detect syntax errors
Medium: Suggest optimization
Hard: Full code review with scoring

Each task has deterministic grading criteria; rewards are clamped to (0, 1) (here 0.01–0.99).

Deterministic Grader and Reward Design

Deterministic keyword/criterion matching for each task.
Partial reward from criterion coverage.
Penalties for short, irrelevant, or incorrect responses.
Hard task penalizes missing overall score/rating in the review.

Reward formula:

raw = criterion_coverage - penalties
score = clamp(raw, 0.01, 0.99)
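
A minimal sketch of this formula, assuming case-insensitive substring matching and a penalties mapping whose values are summed. The real grader in app/grader.py may match criteria differently (the example review earlier matches "missing colon" without containing that exact phrase), so `grade` and `clamp` here are illustrative helpers only.

```python
from typing import Dict, List, Tuple


def clamp(value: float, low: float = 0.01, high: float = 0.99) -> float:
    """Keep a raw score inside the documented 0.01 to 0.99 band."""
    return max(low, min(high, value))


def grade(
    review: str, criteria: List[str], penalties: Dict[str, float]
) -> Tuple[float, List[str], List[str]]:
    """Return (score, matched, missed) for one review.

    Assumption: a criterion counts as matched when it appears in the
    review as a case-insensitive substring.
    """
    text = review.lower()
    matched = [c for c in criteria if c.lower() in text]
    missed = [c for c in criteria if c.lower() not in text]
    coverage = len(matched) / len(criteria) if criteria else 0.0
    raw = coverage - sum(penalties.values())
    return clamp(raw), matched, missed
```

With full coverage and no penalties the raw value is 1.0, which the clamp lowers to 0.99; a raw value at or below zero is raised to 0.01.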

Local Setup

1. Create environment and install dependencies

python -m venv .venv
source .venv/bin/activate   # on Windows (Git Bash): . .venv/Scripts/activate
pip install -r requirements.txt

2. Run FastAPI server

Submission requirement: at least 3 tasks with graders, and every reward score strictly in (0, 1).

uvicorn app.main:app --host 0.0.0.0 --port 7860

Docker Run

docker build -t synapse-code-auditor .
docker run --rm -p 7860:7860 synapse-code-auditor

Then test:

curl http://localhost:7860/health
curl -X POST http://localhost:7860/reset -H "Content-Type: application/json" -d "{\"task_id\":\"easy\",\"seed\":42}"

Baseline Inference Script

The baseline runner is inference.py at the project root. It uses the OpenAI client and executes all 3 tasks.

Required environment variables for non-dry-run execution:

LLM_PROVIDER: set to groq for Groq models (default) or meta for Meta-compatible endpoints
GROQ_API_KEY: API key used by the client when LLM_PROVIDER=groq
GROQ_API_BASE_URL: Groq-compatible OpenAI base URL (defaults to https://api.groq.com/openai/v1)
GROQ_MODEL: model identifier for Groq (defaults to llama-3.3-70b-versatile)
META_API_KEY: API key used by the client when LLM_PROVIDER=meta
META_API_BASE_URL: Meta-compatible OpenAI base URL (defaults to https://api.openai.com/v1)
META_MODEL: model identifier for Meta (defaults to gpt-4o-mini)

Optional:

ENV_BASE_URL: environment API base URL (default http://localhost:7860)
DRY_RUN: set to 1 to skip external LLM calls and use deterministic local responses
API_BASE_URL: defaults to https://api.openai.com/v1 when not provided
MODEL_NAME: defaults to gpt-4o-mini when not provided
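
The provider-specific defaults listed above can be resolved along these lines. This is a sketch under assumptions: `resolve_llm_settings` is a hypothetical helper, not a function from inference.py, and it leaves out API_BASE_URL and MODEL_NAME because the README does not say how they interact with the provider-specific variables.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults copied from the variable list above: (base URL, model).
_DEFAULTS = {
    "groq": ("https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
    "meta": ("https://api.openai.com/v1", "gpt-4o-mini"),
}


def resolve_llm_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Resolve provider settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "groq").lower()
    if provider not in _DEFAULTS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    prefix = provider.upper()
    base_url, model = _DEFAULTS[provider]
    return {
        "provider": provider,
        "api_key": env.get(f"{prefix}_API_KEY"),  # None when unset
        "base_url": env.get(f"{prefix}_API_BASE_URL", base_url),
        "model": env.get(f"{prefix}_MODEL", model),
        "env_base_url": env.get("ENV_BASE_URL", "http://localhost:7860"),
        "dry_run": env.get("DRY_RUN") == "1",
    }
```

Passing a plain dict instead of reading `os.environ` directly keeps the helper easy to exercise in tests without touching the real environment.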

Structured logs:

[START] one event at run start
[STEP] one event per task
[END] one event at run finish
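
One way such tagged lines could be produced is a bracketed tag followed by compact JSON. `format_event` is a hypothetical helper, and the compact separators are an assumption chosen to match the sample output shown later in this README.

```python
import json


def format_event(tag: str, payload: dict) -> str:
    """Render one structured log line, e.g. '[STEP] {...}'."""
    if tag not in ("START", "STEP", "END"):
        raise ValueError(f"unknown log tag: {tag!r}")
    # Compact separators avoid spaces after ',' and ':'.
    return f"[{tag}] " + json.dumps(payload, separators=(",", ":"))


if __name__ == "__main__":
    print(format_event("STEP", {"event": "task_completed", "index": 1,
                                "task_id": "easy", "score": 0.99}))
```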

Run:

python inference.py

Required env var example

export LLM_PROVIDER="groq"
export GROQ_API_KEY="your_token_here"
export GROQ_API_BASE_URL="https://api.groq.com/openai/v1"
export GROQ_MODEL="llama-3.3-70b-versatile"
export ENV_BASE_URL="http://localhost:7860"
python inference.py

If you want a Meta-compatible endpoint instead, set LLM_PROVIDER="meta" and use META_API_KEY, META_API_BASE_URL, and META_MODEL.

Pre-submission validator

python validate_submission.py

Validator checks:

OpenEnv manifest keys
/health, /reset, /step, /state endpoint behavior
3+ tasks with graders; each score strictly in (0, 1) (e.g. 0.01–0.99)
root inference.py requirements and structured logs

Example Baseline Output

[START] {"event":"run_started","env_base_url":"inprocess","task_count":3,"dry_run":true}
[STEP] {"event":"task_completed","index":1,"task_id":"easy","score":0.99}
[STEP] {"event":"task_completed","index":2,"task_id":"medium","score":0.99}
[STEP] {"event":"task_completed","index":3,"task_id":"hard","score":0.86}
[END] {"event":"run_finished","average_score":0.9467,"status":"ok"}
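
The reported average can be checked from the [STEP] lines: (0.99 + 0.99 + 0.86) / 3 = 0.9467 to four decimal places. The sketch below parses a log in this format and recomputes it; `step_scores` and `average_score` are hypothetical helpers, and rounding to four places is an assumption inferred from the sample.

```python
import json
import re
from typing import List

# A tagged line: "[TAG] " followed by a JSON object.
_LINE = re.compile(r"^\[(START|STEP|END)\] (\{.*\})$")


def step_scores(log_text: str) -> List[float]:
    """Collect the score of every [STEP] event, in order of appearance."""
    scores = []
    for line in log_text.splitlines():
        match = _LINE.match(line.strip())
        if match and match.group(1) == "STEP":
            scores.append(json.loads(match.group(2))["score"])
    return scores


def average_score(scores: List[float]) -> float:
    """Mean of the scores, rounded to four decimal places."""
    if not scores:
        raise ValueError("no scores to average")
    return round(sum(scores) / len(scores), 4)
```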

Hugging Face Spaces Deployment

This project is container-ready and compatible with Hugging Face Docker Spaces.

Push the repository to a Hugging Face Space configured for Docker.
Ensure port 7860 is exposed (already set in the Dockerfile and openenv.yaml).
The Space then builds and runs the container automatically.

OpenEnv deployment command:

openenv push

API Reference

GET /health
POST /reset with payload {"task_id": "easy|medium|hard", "seed": 42}
POST /step with payload {"action": {"review": "..."}}
GET /state
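
The payloads above can be sent from Python with only the standard library. This is a sketch: the helper names are hypothetical, and the response bodies are simply decoded as JSON because their exact shape is not specified here.

```python
import json
import urllib.request


def build_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset_request(base_url: str, task_id: str = "easy", seed: int = 42) -> urllib.request.Request:
    return build_post(base_url, "/reset", {"task_id": task_id, "seed": seed})


def step_request(base_url: str, review: str) -> urllib.request.Request:
    return build_post(base_url, "/step", {"action": {"review": review}})


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server listening on port 7860, `send(reset_request("http://localhost:7860"))` followed by `send(step_request("http://localhost:7860", "..."))` would walk through one task.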

This environment is ready for end-to-end local execution, Docker execution, and OpenEnv workflow validation.
