Dioxis34/autonomous-coding-assistant

🛠️ Autonomous Coding Assistant

An agentic debugging tool built on Claude's native tool use. You give it a buggy Python file and a plain-English description of the bug. It then reads the code, searches documentation when it is stuck, writes a fix, runs the test suite, reads the failures, and iterates until the tests pass or it reaches a limit of 5 test cycles. Claude decides which tool to call next based on each result; nothing in the code hardcodes a fixed read→fix→test sequence. That result-driven control loop over a real tool surface (code execution, web search, file I/O, pytest) is what makes this an agent rather than a single prompt, and it's the part worth showing an employer.

How the agent loop works

                         ┌──────────────────────────────────────┐
   buggy file +          │                Streamlit UI           │
   bug description ─────▶ │  upload · live log · diff · results   │
                         └───────────────────┬──────────────────┘
                                             │ background thread
                                             ▼
        ┌─────────────────────── agent loop (agent.py) ───────────────────────┐
        │                                                                      │
        │   ┌─────────────┐  tool_use   ┌──────────────────────────────────┐  │
        │   │   Claude    │ ──────────▶ │  dispatch tool (tools.py)        │  │
        │   │ (Sonnet 4.6)│             │  • execute_code   ┐              │  │
        │   │  + adaptive │             │  • web_search     │→ sandbox.py  │  │
        │   │   thinking  │ ◀────────── │  • read_write_file│  (Docker or  │  │
        │   └─────────────┘ tool_result │  • run_tests      ┘   subprocess)│  │
        │          │                    └──────────────────────────────────┘  │
        │          │ tests pass? ── no ──▶ iterate (up to 5 test cycles)       │
        │          ▼                                                           │
        │     tests pass ✓  /  max iterations reached  /  stopped by user      │
        └──────────────────────────────────────────────────────────────────────┘

Each turn, Claude is given the four tool schemas (tool_schemas.py) and decides what to do. The loop intercepts every tool_use block, executes it, logs it, and feeds the tool_result back. This repeats until Claude stops calling tools, the tests pass, the 5-cycle test limit is reached, or the user stops the run.

Project layout

File               Role
app.py             Streamlit UI: setup wizard, inputs, live log, diff/result panel
agent.py           The Claude tool-use loop + iteration logic
tools.py           The four tool implementations + dispatcher
tool_schemas.py    JSON schemas passed to Claude's tools parameter
sandbox.py         Docker / subprocess execution backend
Dockerfile         The coding-agent-sandbox image (pytest preinstalled)
.env.example       Required env vars with placeholder values
requirements.txt   Pinned dependencies
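
To make the split between tool_schemas.py and tools.py concrete, here is a hedged sketch of one schema and a name-based dispatcher. The `name` / `description` / `input_schema` layout is the shape the Anthropic Messages API expects for tools; the `test_path` parameter, the handler body, and the `TOOL_HANDLERS` table are hypothetical and are not taken from this repository.

```python
# Schema in the shape accepted by the Messages API `tools` parameter.
RUN_TESTS_SCHEMA = {
    "name": "run_tests",
    "description": "Run the pytest suite in the session workspace and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {
            # Hypothetical parameter, shown only for illustration.
            "test_path": {"type": "string", "description": "Test file or directory to run."},
        },
        "required": [],
    },
}


def run_tests(test_path="."):
    # Placeholder body: a real handler would hand this off to the sandbox backend.
    return f"(would run pytest on {test_path})"


# Hypothetical dispatch table mapping tool names to handlers.
TOOL_HANDLERS = {"run_tests": run_tests}


def dispatch(name, tool_input):
    """Route a tool_use request to its handler; report errors as text."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"error: unknown tool {name!r}"
    try:
        return handler(**tool_input)
    except TypeError as exc:  # e.g. the model sent an unexpected argument
        return f"error: bad input for {name}: {exc}"


print(dispatch("run_tests", {"test_path": "test_buggy.py"}))
```

Returning errors as text rather than raising lets the loop feed them back as an ordinary tool_result, so the model can correct its own call on the next turn.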

Setup

1. Get API keys

  • Anthropic — sign in at https://console.anthropic.com/, open API Keys, and create a key (starts with sk-ant-).
  • Tavily (web search) — sign up at https://tavily.com/ and copy your key from the dashboard (starts with tvly-). The free tier is plenty for a demo.

2. Install

python -m venv .venv
# Windows (PowerShell):  .venv\Scripts\Activate.ps1
# macOS/Linux:           source .venv/bin/activate
pip install -r requirements.txt

3. (Recommended) Build the sandbox image

docker build -t coding-agent-sandbox .

This builds the image the app uses to run code/tests in an isolated, network-disabled container. You can sanity-check it directly:

docker run --rm coding-agent-sandbox python -c "import pytest; print('sandbox OK')"

Docker is optional but strongly recommended. Without it, the app falls back to running code in a plain subprocess on your machine (see the security notice).
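
The Docker-or-subprocess choice could look roughly like the sketch below. It is an assumption about how such a backend might be written, not the contents of sandbox.py: the `/work` mount point, the `-q` flag, and the function names are illustrative, while `--rm`, `--network none`, `-v`, and `-w` are standard `docker run` options.

```python
import shutil
import subprocess
import sys

IMAGE = "coding-agent-sandbox"  # image name from the build step above


def docker_available():
    # Only checks that the CLI is on PATH; the daemon may still be stopped
    # and the image may not have been built yet.
    return shutil.which("docker") is not None


def build_test_command(workspace, use_docker):
    """Return the argv for running pytest against `workspace`."""
    if use_docker:
        return [
            "docker", "run", "--rm",     # ephemeral container
            "--network", "none",         # no network access inside the sandbox
            "-v", f"{workspace}:/work",  # workspace must be an absolute path
            "-w", "/work",
            IMAGE, "python", "-m", "pytest", "-q",
        ]
    # Fallback: plain subprocess on the host, with no isolation.
    return [sys.executable, "-m", "pytest", "-q"]


def run_tests(workspace, timeout=60):
    """Run the tests and return (passed, combined_output)."""
    use_docker = docker_available()
    cmd = build_test_command(workspace, use_docker)
    proc = subprocess.run(
        cmd,
        cwd=None if use_docker else workspace,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr
```

Keeping command construction separate from execution makes it easy to log exactly what will run before it runs, which matters most in the fallback mode.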

4. Run

# With the venv activated:
streamlit run app.py

# Or without activating (Windows):
.venv\Scripts\python.exe -m streamlit run app.py

On first launch you'll see a setup screen: paste your two keys, and they're validated with a live API call and saved to a local .env. You won't see that screen again (use the Reset API keys expander in the sidebar to re-enter them). Then upload or paste a buggy .py file, describe the bug, optionally attach a pytest file, and click Run agent.
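
A cheap format check can run before the live validation call. The sketch below uses the key prefixes mentioned in step 1 and then writes a .env file; the variable names `ANTHROPIC_API_KEY` and `TAVILY_API_KEY` and both function names are assumptions (check .env.example for the names the app actually reads), and the live API validation itself is not shown.

```python
import tempfile
from pathlib import Path

# Assumed variable names mapped to the prefixes described in step 1.
EXPECTED_PREFIXES = {
    "ANTHROPIC_API_KEY": "sk-ant-",
    "TAVILY_API_KEY": "tvly-",
}


def check_key_format(name, value):
    """Prefix check only; it cannot tell whether the key is actually valid."""
    return value.strip().startswith(EXPECTED_PREFIXES[name])


def save_keys(env_path, keys):
    """Write KEY=value lines to env_path after a format check."""
    bad = [name for name, value in keys.items() if not check_key_format(name, value)]
    if bad:
        raise ValueError(f"unexpected key format: {', '.join(bad)}")
    lines = [f"{name}={value.strip()}" for name, value in keys.items()]
    Path(env_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Demo with placeholder values written to a throwaway directory.
demo_env = Path(tempfile.mkdtemp()) / ".env"
save_keys(demo_env, {
    "ANTHROPIC_API_KEY": "sk-ant-placeholder",
    "TAVILY_API_KEY": "tvly-placeholder",
})
print(demo_env.read_text(encoding="utf-8"))
```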

When the agent finishes, a Download fixed file button appears below the diff so you can save the fixed version — the agent always works on an isolated copy of your file and never modifies the original.
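
The isolated copy and the diff can both be produced with the standard library. This is a sketch of the idea under stated assumptions, not the app's implementation: the function names, the `agent-session-` prefix, and the `original/` and `fixed/` labels are made up for illustration.

```python
import difflib
import shutil
import tempfile
from pathlib import Path


def make_working_copy(original):
    """Copy the uploaded file into a fresh temp workspace and return the copy's path."""
    workspace = Path(tempfile.mkdtemp(prefix="agent-session-"))
    copy = workspace / Path(original).name
    shutil.copy2(original, copy)
    return copy


def unified_diff(before, after, name="buggy.py"):
    """Return a unified diff between two versions of the file's text."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"original/{name}",
        tofile=f"fixed/{name}",
    ))


before = "def add(a, b):\n    return a - b\n"
after = "def add(a, b):\n    return a + b\n"
print(unified_diff(before, after))
```

Because every edit lands on the copy, the diff is always computed against untouched original text.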

⚠️ Security notice

This agent executes code on your computer. When the Docker sandbox is available, code and tests run inside an ephemeral, network-isolated container. When Docker is not available, the app falls back to running code in a plain subprocess directly on your host — with no isolation. The model writes and runs this code autonomously.

  • The live log shows every tool call (timestamp, tool, inputs, output) — review them, especially in subprocess-fallback mode (the UI flags it in yellow).
  • Do not run this against sensitive code or secrets, or on a machine where you cannot afford arbitrary code execution, without reviewing each tool call first.
  • File writes are confined to a throwaway per-session workspace, but executed code is only as contained as your backend (Docker = isolated; subprocess = not). Prefer Docker.
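
Confining file writes to a workspace usually comes down to resolving the requested path and rejecting anything that lands outside the root. A minimal sketch, with a hypothetical `resolve_in_workspace` helper that is not taken from tools.py:

```python
import tempfile
from pathlib import Path


def resolve_in_workspace(workspace, relative_path):
    """Resolve relative_path under workspace, refusing paths that escape it."""
    root = Path(workspace).resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # relative_path replaces root entirely, so it is rejected below too.
    target = (root / relative_path).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{relative_path!r} is outside the workspace")
    return target


workspace = tempfile.mkdtemp(prefix="agent-session-")
print(resolve_in_workspace(workspace, "fixed.py"))

try:
    resolve_in_workspace(workspace, "../escape.py")
except PermissionError as exc:
    print("blocked:", exc)
```

This only constrains paths passed through the file tool. Code that the agent executes can still open any path the backend allows, which is why the Docker backend matters.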

Built with the Anthropic Python SDK, Streamlit, and Tavily.
