chunga7879/ai-roundtable
AI Roundtable

Claude, GPT, and Gemini deliberate together — debating requirements, architecture, and implementation across 7 structured rounds — while the developer stays in control at every decision point.


Why This Exists

When using AI to write code, I kept running into the same problem: the answer depends on which AI you ask.

Claude catches a security issue that GPT missed. GPT suggests a library that Gemini flags as deprecated. Gemini's architecture is clean but Claude's error handling is more robust. There's no single AI that's always right — each has different training data, different strengths, and fundamentally produces outputs by sampling from a probability distribution. Ask the same question twice and you may get different answers.

AI Roundtable's answer: run all three and let the best answer win.

Each AI speaks, sees what the others said, and can revise its position. A rotating judge evaluates the outputs and produces a merge directive — not "use Claude's code," but "use Claude's base, replace the auth function with GPT's argon2 implementation, and apply Gemini's transaction rollback." The result is synthesized output that is better than any single agent could produce alone. This is ensemble learning applied to software engineering.

The second problem: most AI coding tools are black boxes. Devin and similar tools run autonomously and hand you a result. If it's wrong, you don't know why. AI Roundtable exposes the full deliberation in real time. You see every agent's reasoning, every disagreement, every consensus decision. You can intervene, redirect, or override at any point. The developer is a participant, not a spectator.


Features

  • Multi-agent deliberation — Claude, GPT, and Gemini debate each round; up to 3 iterations before consensus
  • 7-round structured workflow — Requirements → Architecture → Development → Code Review → QA → DevOps → Execution Analysis
  • Configurable rounds — Enable/disable any round and add custom instructions per round before starting
  • Rotating judge — Different AI acts as judge each round to prevent anchoring bias
  • Function-level code synthesis — After consensus, the BASE agent merges the best parts from all agents at the function level
  • Chunked code review — Large codebases split into ~20K-token chunks so each agent stays within TPM limits
  • Developer intervention — Inject notes mid-round or at consensus; retry any round with updated context
  • Docker execution — Build and run generated code in Docker; stdout/stderr streamed live; if it fails, agents analyze and fix — user decides whether to re-run
  • WebContainer support — Frontend/fullstack projects run in-browser via WebContainer API
  • GitHub export via Auth0 Token Vault — AI agent pushes code to GitHub server-side; raw token never reaches the browser
  • Session persistence — Resume any session via ?session=<id>; sessions survive server restarts
  • Project dashboard — View and resume all past sessions at /projects

Quick Start

Prerequisites: Python 3.11+, Node.js 18+, Docker, an Auth0 tenant (free at auth0.com), and API keys for at least one AI provider.

1. Backend

cd backend
python -m venv venv && source venv/bin/activate
pip install -r ../requirements.txt

Create backend/.env:

ALLOWED_ORIGINS=http://localhost:3000
AUTH0_DOMAIN=your-tenant.us.auth0.com
AUTH0_AUDIENCE=https://your-api-identifier
AUTH0_MGMT_CLIENT_ID=<M2M app client ID>
AUTH0_MGMT_CLIENT_SECRET=<M2M app client secret>
SECRET_KEY=<run: openssl rand -hex 32>

2. Frontend

cd frontend && npm install

Create frontend/.env.local:

AUTH0_DOMAIN=your-tenant.us.auth0.com
AUTH0_CLIENT_ID=<from Auth0 Application settings>
AUTH0_CLIENT_SECRET=<from Auth0 Application settings>
AUTH0_SECRET=<run: openssl rand -hex 32>
APP_BASE_URL=http://localhost:3000
AUTH0_AUDIENCE=https://your-api-identifier
AUTH0_MGMT_CLIENT_ID=<M2M app client ID>
AUTH0_MGMT_CLIENT_SECRET=<M2M app client secret>

Auth0 Application settings:

  • Allowed Callback URLs: http://localhost:3000/auth/callback
  • Allowed Logout URLs: http://localhost:3000
  • Allowed Web Origins: http://localhost:3000

Auth0 Management API — M2M scopes required: read:users, update:users, read:user_idp_tokens

3. Run

# Terminal 1
cd backend && source venv/bin/activate
uvicorn app.main:app --reload --port 8000

# Terminal 2
cd frontend && npm run dev

Open http://localhost:3000.

4. GitHub Token Vault (optional)

Allows the AI agent to push generated code to GitHub without exposing any token to the browser.

  1. Create a GitHub OAuth App at github.com/settings/developers — callback URL: https://<AUTH0_DOMAIN>/login/callback, scope: repo
  2. Auth0 → Authentication → Social → GitHub → paste credentials → Purpose: Authentication and Connected Accounts for Token Vault
  3. Auth0 → Authentication → Social → GitHub → Applications tab → enable for your app

Sign in with GitHub and the token is stored automatically. Users who sign in with Google/email can still export via Personal Access Token.


Main Use Cases

1. Build a new project from scratch

You describe what you want to build. Three AIs deliberate across up to 7 rounds — agreeing on requirements, architecture, code, review, QA, and deployment setup. At each round you see the full debate, can inject notes, and confirm before moving forward. At the end, download a ZIP or export directly to GitHub.

Best for: APIs, web apps, CLI tools, scripts — anything that can be containerized.


2. Improve an existing codebase

Upload your existing files at the start. The agents extend or modify them rather than starting from scratch — so your structure and conventions are preserved. Useful for adding a feature, refactoring a module, or getting a multi-AI code review.

Best for: Adding features to an existing project, large refactors, getting a second (and third) opinion on code you already wrote.


3. Pick only the rounds you need

Not every project needs all 7 rounds. Running just Requirements + Architecture gives you a detailed spec and system design in minutes. Running just Developer + Code Review gives you working code with a critique. Mix and match.

Common lightweight combos:

  • Spec only: Requirements + Architecture
  • Code only: Developer + Code Review
  • Full cycle without Docker: Requirements → QA (skip DevOps + Execution & Analysis)

4. Stay in control during execution

At every consensus point, you decide what happens next. If the agents disagree, you see the dispute and choose a direction. You can type a note mid-round to redirect the discussion. If a round goes wrong, roll it back and retry with updated instructions.


5. Run and debug generated code (Execution & Analysis)

After DevOps generates a Dockerfile, the Execution & Analysis round auto-builds and runs the container, streams live logs to your browser, and has the agents analyze any failures and apply fixes automatically. Retries up to 3 times before surfacing a final failure.

Note: Success depends on AI-generated Dockerfile quality and project complexity. Simpler projects (single-service APIs) work reliably; complex multi-service setups may need manual intervention.


How It Works

7-Round Structure

| # | Round | Role | Token Budget |
|---|-------|------|--------------|
| 1 | Requirements | Principal Product Engineer | 2048 |
| 2 | Architecture | Distinguished Software Architect | 4096 |
| 3 | Development | Principal Software Engineer | 8192 |
| 4 | Code Review | Staff Engineer | 4096 |
| 5 | QA | Principal QA Engineer | 8192 |
| 6 | DevOps | Senior Platform Engineer | 4096 |
| 7 | Execution & Analysis | SRE / Runtime Debugger | 8192 |

Speaking order is randomized per round to prevent first-speaker anchoring bias. Any round can be skipped via round_configs at session start.

Debate Loop

Each round runs up to 3 iterations. Agents see all previous outputs before speaking:

Agent A speaks → Agent B speaks (sees A) → Agent C speaks (sees A, B)
→ Judge evaluates → CONSENSUS? break : iterate (max 3)
→ After 3 iterations without consensus: escalate to developer
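
The loop can be sketched in Python (a simplified model: `run_debate`, the agent/judge objects, and their methods are illustrative, not the project's actual API):

```python
import random

MAX_ITERATIONS = 3

def run_debate(agents, judge, topic):
    """Simplified model of one round: agents speak in a randomized order,
    each seeing the full transcript so far; the judge checks for consensus
    after every pass, escalating to the developer after 3 failed passes."""
    transcript = []
    for iteration in range(MAX_ITERATIONS):
        order = random.sample(agents, len(agents))  # randomized speaking order
        for agent in order:
            transcript.append((agent.name, agent.speak(topic, transcript)))
        verdict = judge.evaluate(transcript)
        if verdict.startswith("CONSENSUS"):
            return verdict, iteration + 1
    return "ESCALATE_TO_DEVELOPER", MAX_ITERATIONS
```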

Consensus & Synthesis

The judge rotates by round (`selected_agents[round_index % len(selected_agents)]`) so no single AI always anchors decisions.

Three judge formats by round type:

| Category | Rounds | Format |
|----------|--------|--------|
| Discussion | Requirements, Architecture | `CONSENSUS: <summary>` or `DISCUSS: <options>` |
| Code-producing | Developer, QA, DevOps | `CONSENSUS: BASE=<agent>. INCORPORATE: ... MUST_FIX: ...` |
| Code-reviewing | Reviewer, Execution & Analysis | `CONSENSUS: CRITICAL: ... WARNINGS: ... CONFIRMED_CLEAN: ...` |

After consensus in code-producing rounds, the BASE agent re-generates all files incorporating the INCORPORATE directives — merging the best function-level contributions from every agent.
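
A judge directive in that format could be parsed with a small helper (a sketch of one plausible parser; the project's actual logic in `consensus.py` may differ):

```python
import re

def parse_merge_directive(text):
    """Split a code-producing round's verdict into its parts.
    Expected shape: 'CONSENSUS: BASE=<agent>. INCORPORATE: <...> MUST_FIX: <...>'"""
    base = re.search(r"BASE=(\w+)", text)
    incorporate = re.search(r"INCORPORATE:\s*(.*?)(?:\s*MUST_FIX:|$)", text, re.S)
    must_fix = re.search(r"MUST_FIX:\s*(.*)", text, re.S)
    return {
        "base": base.group(1) if base else None,
        "incorporate": incorporate.group(1).strip() if incorporate else "",
        "must_fix": must_fix.group(1).strip() if must_fix else "",
    }
```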

Chunked Code Review

The Reviewer round uses token-budget chunking instead of the debate loop to stay within GPT's ~30K TPM limit:

  1. Files sorted by priority: app code → config/infra → tests
  2. Grouped into ≤ 20K token chunks
  3. All agents review each chunk; findings accumulated
  4. Single consensus check on accumulated findings (text only — no code re-sent)
  5. Synthesizer applies fixes chunk by chunk
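
The grouping step can be sketched as a greedy packer over estimated token counts (illustrative: `estimate_tokens` uses a rough 4-characters-per-token heuristic, not the project's actual tokenizer):

```python
CHUNK_BUDGET = 20_000  # max estimated tokens per review chunk

def estimate_tokens(text):
    # rough heuristic: roughly 4 characters per token
    return len(text) // 4 + 1

def chunk_files(files, budget=CHUNK_BUDGET):
    """Greedily pack (path, content) pairs into chunks under the budget.
    Assumes files are already sorted by priority (app -> config -> tests)."""
    chunks, current, used = [], [], 0
    for path, content in files:
        cost = estimate_tokens(content)
        if current and used + cost > budget:
            chunks.append(current)       # budget exceeded: start a new chunk
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```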

Execution & Analysis (Round 7)

After DevOps round completes:

  1. final_files written to a temp directory
  2. docker compose up --build (if docker-compose.yml present) or docker build && docker run
  3. Stdout/stderr streamed as SSE events to the UI
  4. On container exit → Execution & Analysis round auto-starts; agents analyze the output and apply fixes
  5. After analysis completes → user is asked whether to re-run Docker with the fixed code
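
The build-and-stream step can be modeled with a generic subprocess helper (a sketch: `stream_command` is an illustrative name, and the real round runs Docker and forwards lines as SSE events):

```python
import subprocess

def stream_command(cmd, on_line):
    """Run a command and forward each stdout/stderr line to a callback,
    mirroring how container output is streamed to the UI line by line."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        on_line(line.rstrip("\n"))
    return proc.wait()  # a non-zero exit code triggers the analysis step
```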

Auth0 & Security

Authentication

Sign in with Google or GitHub. Every session is scoped to the authenticated user's sub claim. The backend verifies Auth0 JWTs (RS256, JWKS) on every request.

API Key Storage

| Stage | Where | Encryption |
|-------|-------|------------|
| Saved by user | Auth0 `user_metadata` | Fernet (server-side) |
| Active session | SQLite + in-memory | Fernet |
| In transit | HTTPS only | TLS |

Keys are never sent in request bodies beyond the initial save — the backend reads them from Auth0 user_metadata using the JWT's user_id.
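
A sketch of that server-side encryption using the `cryptography` package's Fernet (function names are illustrative; the project's key handling may differ):

```python
from cryptography.fernet import Fernet

def encrypt_api_key(plaintext, fernet_key):
    # Fernet provides authenticated symmetric encryption (AES-128-CBC + HMAC)
    return Fernet(fernet_key).encrypt(plaintext.encode()).decode()

def decrypt_api_key(token, fernet_key):
    return Fernet(fernet_key).decrypt(token.encode()).decode()
```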

GitHub Token Vault

The GitHub OAuth App is registered once (under the developer's account) as an application identity. Each user who connects GitHub gets their own token — the developer's account is never used for user pushes.

User signs in with GitHub → Auth0 stores that user's token in Token Vault
User clicks "Export to GitHub" → POST /api/github/push
  Server reads user's token from Token Vault (Management API)
  Server pushes files to GitHub using user's token
  Returns repo URL — token never sent to browser

Other Security Measures

  • Rate limiting: 10 req/min per IP (slowapi)
  • Security headers: X-Frame-Options: DENY, X-Content-Type-Options: nosniff, HSTS
  • Dockerfile scanning: rejects `curl | sh`, privileged flags, host network/PID mounts
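
A minimal version of such a scanner (the patterns below are illustrative; the project's actual rule set may be broader):

```python
import re

# patterns that cause a generated Dockerfile to be rejected
FORBIDDEN = [
    re.compile(r"curl[^\n|]*\|\s*(sh|bash)"),   # piping downloads into a shell
    re.compile(r"--privileged"),                # privileged containers
    re.compile(r"--network[= ]host"),           # host network namespace
    re.compile(r"--pid[= ]host"),               # host PID namespace
]

def dockerfile_is_safe(text):
    return not any(p.search(text) for p in FORBIDDEN)
```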

AI Provider Keys

| Provider | Where to get |
|----------|--------------|
| Anthropic (Claude) | console.anthropic.com/settings/keys |
| OpenAI (GPT) | platform.openai.com/api-keys |
| Google (Gemini) | aistudio.google.com/app/apikey |

API Reference

FastAPI backend (/api):

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/session/start` | Create session |
| GET | `/sessions` | List user's sessions (JWT required) |
| GET | `/session/{id}` | Get session state |
| GET | `/session/{id}/round` | Stream current round (SSE) |
| POST | `/session/developer-input` | Inject developer note |
| POST | `/session/{id}/retry` | Roll back to round index |
| GET | `/session/{id}/chat?message=` | Direct Q&A with judge (SSE) |
| GET | `/session/{id}/execute` | Run in Docker (SSE) |
| POST | `/session/{id}/stop` | Stop Docker container |
| POST | `/session/{id}/complete` | Mark complete |
| GET | `/session/{id}/files` | Get generated files |
| GET | `/session/{id}/download` | Download as ZIP |

Next.js proxy routes (/api, Auth0 session required):

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET/PATCH | `/api/user/keys` | AI API keys via Auth0 user_metadata |
| GET | `/api/user/sessions` | Session list (adds Bearer token server-side) |
| GET | `/api/github/status` | GitHub Token Vault connection status |
| POST | `/api/github/push` | Server-side GitHub push via Token Vault |

SSE event types:

| Event | Description |
|-------|-------------|
| `round_start` | Round begins with metadata |
| `agent_start` / `agent_end` | Agent speaking boundaries |
| `token` | Streaming token `{agent, token}` |
| `chunk_start` | Reviewer chunk starting |
| `debate_iteration` | New debate turn |
| `synthesis_start` / `synthesis_end` | Code merge in progress |
| `consensus` | Round result with summary, options, next_round |
| `exec_output` / `exec_done` / `exec_error` | Docker execution events |
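
These arrive as standard SSE frames; a minimal parser sketch (assuming `event:` / `data:` lines with JSON payloads, which is an assumption about the wire format):

```python
import json

def parse_sse(stream_text):
    """Parse raw SSE text into (event, data) pairs.
    Assumes each frame is 'event: <type>' plus 'data: <json>' lines,
    with frames separated by a blank line."""
    events = []
    event_type, data_lines = None, []
    for line in stream_text.splitlines() + [""]:
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_type:
            events.append((event_type, json.loads("\n".join(data_lines) or "null")))
            event_type, data_lines = None, []
    return events
```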

Configuration

backend/.env:

| Variable | Default | Description |
|----------|---------|-------------|
| ALLOWED_ORIGINS | http://localhost:3000 | CORS allowed origins |
| AUTH0_DOMAIN | (unset) | Enables JWT verification when set |
| AUTH0_AUDIENCE | (unset) | Must match frontend value |
| AUTH0_MGMT_CLIENT_ID | (unset) | M2M app, for user_metadata access |
| AUTH0_MGMT_CLIENT_SECRET | (unset) | M2M app secret |
| SECRET_KEY | (required) | Fernet key for API key encryption (`openssl rand -hex 32`) |

frontend/.env.local:

| Variable | Description |
|----------|-------------|
| AUTH0_DOMAIN | Auth0 tenant domain |
| AUTH0_CLIENT_ID | Application client ID |
| AUTH0_CLIENT_SECRET | Application client secret |
| AUTH0_SECRET | Cookie encryption key (`openssl rand -hex 32`) |
| APP_BASE_URL | App base URL |
| AUTH0_AUDIENCE | API identifier |
| AUTH0_MGMT_CLIENT_ID | M2M client ID |
| AUTH0_MGMT_CLIENT_SECRET | M2M client secret |

AI Models:

| Agent | Deliberation | Judge (consensus) |
|-------|--------------|-------------------|
| Claude | claude-sonnet-4-6 | claude-haiku-4-5-20251001 |
| GPT | gpt-4o | gpt-4o-mini |
| Gemini | gemini-1.5-pro | gemini-1.5-flash |

Project Structure

ai-roundtable/
├── backend/
│   ├── app/
│   │   ├── main.py               # FastAPI app, CORS, rate limiting, security headers
│   │   ├── auth.py               # Auth0 JWT verification (RS256 + JWKS)
│   │   ├── config.py             # Settings from environment
│   │   ├── models/schemas.py     # Pydantic request/response models
│   │   ├── routers/session.py    # All API endpoints + debate loop orchestration
│   │   └── services/
│   │       ├── claude.py         # Anthropic streaming client
│   │       ├── gpt.py            # OpenAI streaming client
│   │       ├── gemini.py         # Google Gemini streaming client
│   │       ├── consensus.py      # Judge logic, consensus parsing
│   │       ├── orchestrator.py   # Round order, token budgets, role prompts
│   │       └── auth0_mgmt.py     # Auth0 Management API (user_metadata)
│   └── tests/
│       ├── unit/                 # Consensus, chunked review, session lock
│       └── integration/          # Session lifecycle, restore, retry
└── frontend/
    ├── app/
    │   ├── page.tsx              # Home — routes to setup or session restore
    │   ├── projects/page.tsx     # Session dashboard (Server Component)
    │   └── api/
    │       ├── user/keys/        # AI API keys via Auth0 user_metadata
    │       ├── user/sessions/    # Session list proxy (adds Bearer token)
    │       ├── github/status/    # Token Vault connection check
    │       └── github/push/      # Server-side GitHub push
    ├── components/
    │   ├── ChatRoom.tsx          # Main session UI
    │   ├── SetupScreen.tsx       # Agent selection + API key entry
    │   ├── GitHubExportModal.tsx # Token Vault + PAT export
    │   ├── ConsensusBanner.tsx   # Dispute resolution UI
    │   ├── ExecutionPanel.tsx    # Docker terminal output
    │   └── FileViewerModal.tsx   # In-browser file browser
    └── lib/
        ├── useRoundStream.ts     # Session state + SSE management
        ├── useAgentSetup.ts      # Agent config + Auth0 API key sync
        ├── api.ts                # HTTP/SSE client functions
        └── github.ts             # Client-side GitHub API (PAT fallback)

Testing

# Backend
cd backend && source venv/bin/activate
pytest                                    # all tests
pytest --cov=app --cov-report=term-missing  # with coverage

# Frontend
cd frontend && npm test

Future Improvements

| Area | Current | Direction |
|------|---------|-----------|
| Debate iterations | 3 (hardcoded) | Per-round config; more for complex projects |
| Judge model | Cost-optimized (Haiku, mini, Flash) | User-selectable: cost vs. accuracy |
| Token budget & code review | Fixed per round type; code review splits files into ~12K-token chunks (cross-file relationships can be missed); QA restricted to Claude/Gemini when multiple agents are selected (GPT's 30K TPM limit is too low for a full codebase) | Tiered plans: higher tiers get larger budgets, skip chunking, and allow all agents in QA, sending all files in one pass for a complete, relationship-aware review |
| WebContainer | npm projects only | Python, Deno runtime support |
| Cross-round memory | No persistent context | Long-running project continuity |
| GitHub export | Push to new repo only | Push to existing repo, open PR |
| Orchestration | Custom round loop in orchestrator.py | Migrate to LangGraph: declarative graph nodes per round, conditional edges for retry/consensus, built-in human-in-the-loop and state persistence |
