coderMayank69/Synapse-Code-Auditor

title: Synapse Code Auditor
emoji:
colorFrom: red
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Code reviewer environment

Synapse Code Auditor (OpenEnv Environment)

Synapse Code Auditor is a production-ready OpenEnv environment for evaluating AI code-review behavior across deterministic tasks.

Real-World Use Case

Software teams use code review to catch bugs, improve quality, and enforce maintainability. This environment simulates that workflow so agent performance can be trained and measured with reproducible rewards.

Project Structure

.
├── server/
│   ├── __init__.py
│   └── app.py
├── inference.py
├── pyproject.toml
├── uv.lock
├── validate_submission.py
├── app/
│   ├── __init__.py
│   ├── env.py
│   ├── grader.py
│   ├── inference.py
│   ├── main.py
│   ├── models.py
│   └── tasks.py
├── .dockerignore
├── Dockerfile
├── openenv.yaml
├── README.md
└── requirements.txt

OpenEnv Interface

This environment implements the required interface methods in the core environment class:

  • reset(task_id=None, seed=42)
  • step(action)
  • state()

FastAPI endpoints expose the same functionality:

  • POST /reset
  • POST /step
  • GET /state

Action and Observation Schemas

Observation:

{
  "task_id": "easy",
  "task_type": "easy",
  "code": "def add_numbers(a, b)\n    result = a + b\n    return result\n",
  "instructions": "Identify the exact syntax error in the code and propose a valid fix.",
  "previous_feedback": null
}

Action:

{
  "review": "The function definition is missing a colon. Change it to def add_numbers(a, b):"
}

Reward:

{
  "score": 0.99,
  "matched_criteria": ["syntax error", "missing colon", "def add_numbers(a, b):"],
  "missed_criteria": [],
  "penalties": {},
  "rationale": "Coverage=0.99; Penalty=0.00; Matched=3/3"
}

Scores are always strictly between 0 and 1 (implemented as 0.01 … 0.99), never exactly 0.0 or 1.0.
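
The three JSON shapes above can be mirrored as typed records. The following is a minimal sketch using standard-library dataclasses, not the project's actual `app/models.py`; the class names, the type of `previous_feedback`, and the value type of `penalties` are assumptions inferred from the examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """What the agent sees after reset (field names taken from the example above)."""
    task_id: str
    task_type: str
    code: str
    instructions: str
    previous_feedback: Optional[str] = None  # assumed to be text or null


@dataclass
class Action:
    """The agent's response: a free-text code review."""
    review: str


@dataclass
class Reward:
    """Grader output; penalty values are assumed to be floats."""
    score: float
    matched_criteria: List[str] = field(default_factory=list)
    missed_criteria: List[str] = field(default_factory=list)
    penalties: Dict[str, float] = field(default_factory=dict)
    rationale: str = ""

    def __post_init__(self) -> None:
        # Enforce the documented invariant: never exactly 0.0 or 1.0.
        if not 0.0 < self.score < 1.0:
            raise ValueError(f"score must be strictly in (0, 1), got {self.score}")
```

Validating the score in `__post_init__` makes a reward outside (0, 1) fail at construction time rather than later in the run.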

Tasks

  1. Easy: Detect syntax errors
  2. Medium: Suggest optimization
  3. Hard: Full code review with scoring

Each task has deterministic grading criteria. Rewards are clamped to the range 0.01–0.99, so they always fall strictly inside (0, 1).

Deterministic Grader and Reward Design

  • Deterministic keyword/criterion matching for each task.
  • Partial reward from criterion coverage.
  • Penalties for short, irrelevant, or incorrect responses.
  • The hard task penalizes reviews that omit an overall score or rating.

Reward formula:

raw = criterion_coverage - penalties
score = clamp(raw, 0.01, 0.99)
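
The formula above can be sketched in Python as follows. This is an illustrative approximation, not the code in `app/grader.py`: it assumes criteria are matched as case-insensitive substrings and that penalties arrive as a name-to-value mapping. The function name `grade` and the penalty name used in the usage note are hypothetical.

```python
from typing import Dict, List, Optional

MIN_SCORE = 0.01
MAX_SCORE = 0.99


def clamp(value: float, low: float = MIN_SCORE, high: float = MAX_SCORE) -> float:
    """Restrict value to [low, high] so the score stays strictly inside (0, 1)."""
    return max(low, min(high, value))


def grade(review: str, criteria: List[str],
          penalties: Optional[Dict[str, float]] = None) -> dict:
    """Score a review by criterion coverage minus penalties (assumed matching rule)."""
    penalties = penalties or {}
    text = review.lower()
    # Assumption: a criterion counts as matched when it appears as a substring.
    matched = [c for c in criteria if c.lower() in text]
    missed = [c for c in criteria if c.lower() not in text]
    coverage = len(matched) / len(criteria) if criteria else 0.0
    raw = coverage - sum(penalties.values())
    return {
        "score": clamp(raw),
        "matched_criteria": matched,
        "missed_criteria": missed,
        "penalties": penalties,
    }
```

Under these assumptions, a review that matches every criterion scores 0.99 rather than 1.0, and an empty review scores 0.01 rather than 0.0. For example, `grade("Syntax error: missing colon.", ["syntax error", "missing colon"], {"too_short": 0.2})` has full coverage and a raw value of 0.8.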

Local Setup

1. Create environment and install dependencies

python -m venv .venv
. .venv/Scripts/activate   # Windows (Git Bash); on Linux/macOS use: . .venv/bin/activate
pip install -r requirements.txt

2. Run FastAPI server

Start the API server on port 7860. The environment it serves must provide 3+ tasks with graders, and each reward score must lie strictly in (0, 1).

uvicorn app.main:app --host 0.0.0.0 --port 7860

Docker Run

docker build -t synapse-code-auditor .
docker run --rm -p 7860:7860 synapse-code-auditor

Then test:

curl http://localhost:7860/health
curl -X POST http://localhost:7860/reset -H "Content-Type: application/json" -d "{\"task_id\":\"easy\",\"seed\":42}"

Baseline Inference Script

The baseline runner is inference.py at the project root. It uses the OpenAI client and executes all 3 tasks.

Required environment variables for non-dry-run execution:

  • LLM_PROVIDER: set to groq for Groq models (default) or meta for Meta-compatible endpoints
  • GROQ_API_KEY: API key used by the client when LLM_PROVIDER=groq
  • GROQ_API_BASE_URL: Groq-compatible OpenAI base URL (defaults to https://api.groq.com/openai/v1)
  • GROQ_MODEL: model identifier for Groq (defaults to llama-3.3-70b-versatile)
  • META_API_KEY: API key used by the client when LLM_PROVIDER=meta
  • META_API_BASE_URL: Meta-compatible OpenAI base URL (defaults to https://api.openai.com/v1)
  • META_MODEL: model identifier for Meta (defaults to gpt-4o-mini)

Optional:

  • ENV_BASE_URL: environment API base URL (default http://localhost:7860)
  • DRY_RUN: set to 1 to skip external LLM calls and use deterministic local responses
  • API_BASE_URL: defaults to https://api.openai.com/v1 when not provided
  • MODEL_NAME: defaults to gpt-4o-mini when not provided
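
The provider-specific variables and defaults listed above could be resolved along these lines. This is a hedged sketch, not the logic in `inference.py`: the names `LLMConfig`, `load_llm_config`, and `is_dry_run` are hypothetical, and because the README does not say how `API_BASE_URL` and `MODEL_NAME` interact with the provider-specific variables, they are left out.

```python
import os
from typing import Mapping, NamedTuple, Optional


class LLMConfig(NamedTuple):
    provider: str
    api_key: Optional[str]
    base_url: str
    model: str


# Defaults copied from the variable list above: (base URL, model).
_DEFAULTS = {
    "groq": ("https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
    "meta": ("https://api.openai.com/v1", "gpt-4o-mini"),
}


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Pick the GROQ_* or META_* variables based on LLM_PROVIDER (default groq)."""
    provider = env.get("LLM_PROVIDER", "groq").strip().lower()
    if provider not in _DEFAULTS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    prefix = provider.upper()
    default_url, default_model = _DEFAULTS[provider]
    return LLMConfig(
        provider=provider,
        api_key=env.get(f"{prefix}_API_KEY"),
        base_url=env.get(f"{prefix}_API_BASE_URL", default_url),
        model=env.get(f"{prefix}_MODEL", default_model),
    )


def is_dry_run(env: Mapping[str, str] = os.environ) -> bool:
    """DRY_RUN=1 skips external LLM calls."""
    return env.get("DRY_RUN") == "1"
```

Passing the environment in as a mapping keeps the function easy to exercise with a plain dict instead of mutating `os.environ`.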

Structured logs:

  • [START] one event at run start
  • [STEP] one event per task
  • [END] one event at run finish

Run:

python inference.py

Required env var example

export LLM_PROVIDER="groq"
export GROQ_API_KEY="your_token_here"
export GROQ_API_BASE_URL="https://api.groq.com/openai/v1"
export GROQ_MODEL="llama-3.3-70b-versatile"
export ENV_BASE_URL="http://localhost:7860"
python inference.py

If you want a Meta-compatible endpoint instead, set LLM_PROVIDER="meta" and use META_API_KEY, META_API_BASE_URL, and META_MODEL.

Pre-submission validator

python validate_submission.py

Validator checks:

  • OpenEnv manifest keys
  • /health, /reset, /step, /state endpoint behavior
  • 3+ tasks with graders; each score strictly in (0, 1) (e.g. 0.01–0.99)
  • root inference.py requirements and structured logs

Example Baseline Output

[START] {"event":"run_started","env_base_url":"inprocess","task_count":3,"dry_run":true}
[STEP] {"event":"task_completed","index":1,"task_id":"easy","score":0.99}
[STEP] {"event":"task_completed","index":2,"task_id":"medium","score":0.99}
[STEP] {"event":"task_completed","index":3,"task_id":"hard","score":0.86}
[END] {"event":"run_finished","average_score":0.9467,"status":"ok"}
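
Because each log line is a bracketed tag followed by a JSON object, the output above is straightforward to post-process. The sketch below parses such lines and recomputes the average from the `[STEP]` events; the helper names are hypothetical and the line format is assumed to match the example exactly.

```python
import json
from typing import Iterable, Tuple

SAMPLE_LOG = [
    '[START] {"event":"run_started","env_base_url":"inprocess","task_count":3,"dry_run":true}',
    '[STEP] {"event":"task_completed","index":1,"task_id":"easy","score":0.99}',
    '[STEP] {"event":"task_completed","index":2,"task_id":"medium","score":0.99}',
    '[STEP] {"event":"task_completed","index":3,"task_id":"hard","score":0.86}',
    '[END] {"event":"run_finished","average_score":0.9467,"status":"ok"}',
]


def parse_log_line(line: str) -> Tuple[str, dict]:
    """Split '[TAG] {json}' into ('TAG', parsed_json)."""
    tag, _, payload = line.strip().partition(" ")
    if not (tag.startswith("[") and tag.endswith("]")):
        raise ValueError(f"not a structured log line: {line!r}")
    return tag[1:-1], json.loads(payload)


def average_step_score(lines: Iterable[str]) -> float:
    """Mean of the scores reported by [STEP] events, rounded to 4 decimals."""
    scores = [
        payload["score"]
        for tag, payload in map(parse_log_line, lines)
        if tag == "STEP"
    ]
    if not scores:
        raise ValueError("no [STEP] events found")
    return round(sum(scores) / len(scores), 4)
```

For the sample run, `average_step_score(SAMPLE_LOG)` reproduces the `average_score` of 0.9467 reported in the `[END]` event.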

Hugging Face Spaces Deployment

This project is container-ready and compatible with Hugging Face Docker Spaces.

  1. Push the repository to a Hugging Face Space configured for Docker.
  2. Ensure port 7860 is exposed (already set in the Dockerfile and openenv.yaml).
  3. The Space builds and runs the container automatically.

OpenEnv deployment command:

openenv push

API Reference

  • GET /health
  • POST /reset with payload {"task_id": "easy|medium|hard", "seed": 42}
  • POST /step with payload {"action": {"review": "..."}}
  • GET /state
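
The endpoints above can be called from Python with only the standard library. This is a minimal client sketch, not part of the repository: the helper names are hypothetical, and it assumes every endpoint returns a JSON body, which the README does not state for all of them.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(base_url: str, method: str, path: str,
                  payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Create a request for an endpoint, JSON-encoding the payload if given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def reset_request(base_url: str, task_id: str = "easy", seed: int = 42) -> urllib.request.Request:
    """POST /reset with the documented payload."""
    return build_request(base_url, "POST", "/reset", {"task_id": task_id, "seed": seed})


def step_request(base_url: str, review: str) -> urllib.request.Request:
    """POST /step with the review wrapped in an action object."""
    return build_request(base_url, "POST", "/step", {"action": {"review": review}})


def call(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `call(reset_request("http://localhost:7860"))` should return the first observation, and `call(step_request("http://localhost:7860", "..."))` submits a review for grading.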

This environment is ready for end-to-end local execution, Docker execution, and OpenEnv workflow validation.
