| title | Synapse Code Auditor |
|---|---|
| emoji | ⚡ |
| colorFrom | red |
| colorTo | pink |
| sdk | docker |
| app_port | 7860 |
| pinned | false |
| license | mit |
| short_description | Code reviewer environment |

Synapse Code Auditor is a production-ready OpenEnv environment for evaluating AI code-review behavior across deterministic tasks.

Software teams use code review to catch bugs, improve quality, and enforce maintainability. This environment simulates that workflow so agent performance can be trained and measured with reproducible rewards.

.
├── server/
│ ├── __init__.py
│ └── app.py
├── inference.py
├── pyproject.toml
├── uv.lock
├── validate_submission.py
├── app/
│ ├── __init__.py
│ ├── env.py
│ ├── grader.py
│ ├── inference.py
│ ├── main.py
│ ├── models.py
│ └── tasks.py
├── .dockerignore
├── Dockerfile
├── openenv.yaml
├── README.md
└── requirements.txt
This environment implements the required interface methods in the core environment class:
- reset(task_id=None, seed=42)
- step(action)
- state()
FastAPI endpoints expose the same functionality:
- POST /reset
- POST /step
- GET /state
Observation:
{
"task_id": "easy",
"task_type": "easy",
"code": "def add_numbers(a, b)\n result = a + b\n return result\n",
"instructions": "Identify the exact syntax error in the code and propose a valid fix.",
"previous_feedback": null
}

Action:
{
"review": "The function definition is missing a colon. Change it to def add_numbers(a, b):"
}

Reward:
{
"score": 0.99,
"matched_criteria": ["syntax error", "missing colon", "def add_numbers(a, b):"],
"missed_criteria": [],
"penalties": {},
"rationale": "Coverage=0.99; Penalty=0.00; Matched=3/3"
}

Scores always lie strictly between 0 and 1 (implemented as the range 0.01 to 0.99) and are never exactly 0.0 or 1.0.

- Easy: Detect syntax errors
- Medium: Suggest optimization
- Hard: Full code review with scoring
Each task has deterministic grading criteria, and rewards are clamped to the range 0.01 to 0.99 so they stay strictly inside (0, 1).
- Deterministic keyword/criterion matching for each task.
- Partial reward from criterion coverage.
- Penalties for short, irrelevant, or incorrect responses.
- Hard task penalizes missing overall score/rating in the review.
Reward formula:
raw = criterion_coverage - penalties
score = clamp(raw, 0.01, 0.99)
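
The reward formula above can be sketched in Python as follows. This is a minimal illustration, not the project's actual grader: the names `clamp` and `score_review`, the case-insensitive substring matching, and passing penalties as a dict of named deductions are all assumptions made for the example.

```python
def clamp(value: float, low: float = 0.01, high: float = 0.99) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def score_review(review: str, criteria: list[str], penalties: dict[str, float]) -> dict:
    """Hypothetical grader: criterion coverage minus penalties, then clamped."""
    text = review.lower()
    # A criterion counts as matched when it appears verbatim (ignoring case).
    matched = [c for c in criteria if c.lower() in text]
    missed = [c for c in criteria if c.lower() not in text]
    coverage = len(matched) / len(criteria) if criteria else 0.0
    raw = coverage - sum(penalties.values())
    return {
        "score": clamp(raw),
        "matched_criteria": matched,
        "missed_criteria": missed,
        "penalties": penalties,
    }


result = score_review(
    "There is a syntax error: missing colon. Use def add_numbers(a, b):",
    ["syntax error", "missing colon", "def add_numbers(a, b):"],
    {},
)
print(result["score"])  # full coverage, no penalties, clamped to 0.99
```

With all three criteria matched and no penalties, the raw value is 1.0 and the clamp brings it down to 0.99, which is why a perfect review never scores exactly 1.0.
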
python -m venv .venv
. .venv/Scripts/activate
pip install -r requirements.txt

Requires 3+ tasks with graders; each reward score must lie strictly in (0, 1).

uvicorn app.main:app --host 0.0.0.0 --port 7860

docker build -t synapse-code-auditor .
docker run --rm -p 7860:7860 synapse-code-auditor

Then test:

curl http://localhost:7860/health
curl -X POST http://localhost:7860/reset -H "Content-Type: application/json" -d "{\"task_id\":\"easy\",\"seed\":42}"

The baseline runner is inference.py at the project root. It uses the OpenAI client and executes all 3 tasks.

Required environment variables for non-dry-run execution:
- LLM_PROVIDER: set to `groq` for Groq models (default) or `meta` for Meta-compatible endpoints
- GROQ_API_KEY: API key used by the client when `LLM_PROVIDER=groq`
- GROQ_API_BASE_URL: Groq-compatible OpenAI base URL (defaults to `https://api.groq.com/openai/v1`)
- GROQ_MODEL: model identifier for Groq (defaults to `llama-3.3-70b-versatile`)
- META_API_KEY: API key used by the client when `LLM_PROVIDER=meta`
- META_API_BASE_URL: Meta-compatible OpenAI base URL (defaults to `https://api.openai.com/v1`)
- META_MODEL: model identifier for Meta (defaults to `gpt-4o-mini`)

Optional:
- ENV_BASE_URL: environment API base URL (default http://localhost:7860)
- DRY_RUN: set to `1` to skip external LLM calls and use deterministic local responses
- API_BASE_URL: defaults to https://api.openai.com/v1 when not provided
- MODEL_NAME: defaults to gpt-4o-mini when not provided
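
The variable rules above could be resolved along these lines. This is a sketch under stated assumptions rather than the code in inference.py: the function name `resolve_llm_config`, the returned dict shape, and the choice to raise `ValueError` when a key is missing are hypothetical. Only the variable names and default values come from the lists above.

```python
import os
from typing import Mapping, Optional

# Default base URLs and models, copied from the variable list above.
DEFAULTS = {
    "groq": {"base_url": "https://api.groq.com/openai/v1", "model": "llama-3.3-70b-versatile"},
    "meta": {"base_url": "https://api.openai.com/v1", "model": "gpt-4o-mini"},
}


def resolve_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: pick provider settings from environment variables."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "groq").lower()
    if provider not in DEFAULTS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    prefix = provider.upper()  # "GROQ" or "META"
    dry_run = env.get("DRY_RUN") == "1"
    api_key = env.get(f"{prefix}_API_KEY")
    # A key is only needed when real LLM calls will be made.
    if not dry_run and not api_key:
        raise ValueError(f"{prefix}_API_KEY is required unless DRY_RUN=1")
    return {
        "provider": provider,
        "api_key": api_key,
        "base_url": env.get(f"{prefix}_API_BASE_URL", DEFAULTS[provider]["base_url"]),
        "model": env.get(f"{prefix}_MODEL", DEFAULTS[provider]["model"]),
        "env_base_url": env.get("ENV_BASE_URL", "http://localhost:7860"),
        "dry_run": dry_run,
    }


config = resolve_llm_config({"DRY_RUN": "1"})
print(config["provider"], config["model"])
```

Passing the environment as a mapping keeps the helper easy to exercise without touching the real process environment.
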
Structured logs:
- `[START]`: one event at run start
- `[STEP]`: one event per task
- `[END]`: one event at run finish

Run:
python inference.py

export LLM_PROVIDER="groq"
export GROQ_API_KEY="your_token_here"
export GROQ_API_BASE_URL="https://api.groq.com/openai/v1"
export GROQ_MODEL="llama-3.3-70b-versatile"
export ENV_BASE_URL="http://localhost:7860"
python inference.py

To use a Meta-compatible endpoint instead, set LLM_PROVIDER="meta" and provide META_API_KEY, META_API_BASE_URL, and META_MODEL.

python validate_submission.py

Validator checks:
- OpenEnv manifest keys
- /health, /reset, /step, /state endpoint behavior
- 3+ tasks with graders; each score strictly in (0, 1) (e.g. 0.01 to 0.99)
- root inference.py requirements and structured logs

[START] {"event":"run_started","env_base_url":"inprocess","task_count":3,"dry_run":true}
[STEP] {"event":"task_completed","index":1,"task_id":"easy","score":0.99}
[STEP] {"event":"task_completed","index":2,"task_id":"medium","score":0.99}
[STEP] {"event":"task_completed","index":3,"task_id":"hard","score":0.86}
[END] {"event":"run_finished","average_score":0.9467,"status":"ok"}
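
Because each log line is a bracketed tag followed by a JSON object, the sample output above is easy to post-process. The sketch below parses those lines and recomputes the average; the helper names `parse_events` and `average_step_score` are hypothetical and not part of the repository.

```python
import json

SAMPLE_LOG = """\
[START] {"event":"run_started","env_base_url":"inprocess","task_count":3,"dry_run":true}
[STEP] {"event":"task_completed","index":1,"task_id":"easy","score":0.99}
[STEP] {"event":"task_completed","index":2,"task_id":"medium","score":0.99}
[STEP] {"event":"task_completed","index":3,"task_id":"hard","score":0.86}
[END] {"event":"run_finished","average_score":0.9467,"status":"ok"}
"""


def parse_events(log_text: str) -> list:
    """Split each '[TAG] {json}' line into a (tag, payload) pair."""
    events = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue  # ignore anything that is not a structured log line
        tag, _, payload = line.partition("] ")
        events.append((tag.lstrip("["), json.loads(payload)))
    return events


def average_step_score(events: list):
    """Mean of the per-task scores, rounded to four decimals."""
    scores = [payload["score"] for tag, payload in events if tag == "STEP"]
    return round(sum(scores) / len(scores), 4) if scores else None


events = parse_events(SAMPLE_LOG)
print(average_step_score(events))  # 0.9467, matching the [END] event
```
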
This project is container-ready and compatible with Hugging Face Docker Spaces.
- Push the repository to a Hugging Face Space configured for Docker.
- Ensure port 7860 is exposed (already set in Dockerfile and openenv.yaml).
- The Space then builds and runs the container automatically.
OpenEnv deployment command:
openenv push

- GET /health
- POST /reset with payload {"task_id": "easy|medium|hard", "seed": 42}
- POST /step with payload {"action": {"review": "..."}}
- GET /state
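
The endpoints listed above can be called from Python with only the standard library. The following is a hedged sketch: `build_post` and `send` are hypothetical helper names, the base URL assumes the server is running locally on port 7860, and the response is assumed to be a JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed local server address


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Payloads follow the shapes documented in the endpoint list.
reset_request = build_post("/reset", {"task_id": "easy", "seed": 42})
step_request = build_post("/step", {"action": {"review": "..."}})
```

With the server running, `send(reset_request)` followed by `send(step_request)` would perform one reset and one step. Building the requests separately from sending them makes the payloads easy to inspect first.
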
This environment is ready for end-to-end local execution, Docker execution, and OpenEnv workflow validation.