
InterviewCraft

Deliberate practice engine for tech interviews. Evidence-based scoring · answer diff rewriting · rewind micro-practice · 22-skill graph · salary negotiation simulator



What Is This?

InterviewCraft is a closed-loop training system, not another mock interview tool. Like a sports coach, it records every rep, identifies exactly what broke down, and makes you practice that specific thing until it's solid.

```
ANSWER → LINT (evidence spans) → DIFF (3 versions) → REWIND (re-answer)
   ↑       → DELTA SCORE → SKILL GRAPH UPDATE → ADAPTIVE DRILL PLAN ──┘
```

Every session feeds back into a persistent 22-microskill graph. The system knows which skills are weakest, schedules spaced-repetition drills, and tracks your delta across sessions rather than a one-shot score.


Key Features

| Feature | What it does |
|---|---|
| Evidence-backed scoring | 15-rule rubric. Every triggered rule links to `{start_ms, end_ms}`: the exact moment you said it. No hallucinated quotes. |
| Answer diff (3 versions) | Minimal patch · medium rewrite · ideal answer. Each shows `[+rule → +N points]`. |
| Rewind micro-practice | Re-answer any weak segment. Delta shown immediately: +12 structure, −3 depth. |
| 22-skill graph | Microskills tracked across all sessions with trend lines and spaced-repetition scheduling. |
| Cross-session AI memory | The interviewer AI remembers your patterns across sessions: recurring weaknesses, over-used stories, communication habits. |
| Story bank | Auto-detects STAR stories. Coverage map shows which competencies lack evidence. Overuse warning after 3 uses. |
| Negotiation simulator | AI recruiter with a hidden max budget. Scores anchoring, value articulation, counter-strategy, emotional control. |
| JD Analyzer | Paste a job description; auto-fills session type, company context, and focus skills. |
| Voice delivery analysis | Filler words, WPM, pause patterns, scored against benchmarks after each session. |
| BYOK | Use your own Anthropic / OpenAI / Deepgram / ElevenLabs keys. Encrypted at rest. |
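
As a concrete illustration of evidence-backed scoring, here is a minimal sketch of what an evidence-linked rule hit could look like. The class and rule names are hypothetical, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceSpan:
    """Millisecond offsets into the session audio where a rubric rule fired."""
    start_ms: int
    end_ms: int

@dataclass(frozen=True)
class TriggeredRule:
    """One rubric rule plus the exact moment in the answer that triggered it."""
    rule_id: str
    points: int
    evidence: EvidenceSpan

# A rule only counts if it can point at a concrete {start_ms, end_ms} span,
# which is what rules out hallucinated quotes.
hit = TriggeredRule(
    rule_id="structure.star_result_missing",  # hypothetical rule id
    points=-4,
    evidence=EvidenceSpan(start_ms=61_250, end_ms=64_800),
)
assert hit.evidence.end_ms > hit.evidence.start_ms
```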

Tech Stack

| Layer | Choice | Why |
|---|---|---|
| Backend | FastAPI + Python 3.13 | Async-native, fully typed, fast iteration |
| Frontend | Next.js 15 + Tailwind + Zustand | App Router, edge CDN, zero-config deploys |
| Voice | Deepgram Nova-2 + Claude Sonnet + ElevenLabs | Best-in-class STT / LLM / TTS with a fallback chain |
| Database | PostgreSQL 16 + JSONB skill graph | Flexible schema, prompt-cached rubric reads |
| Cache | Redis 7 | Session state + rate limiting + memory cache |
| AI scoring | Anthropic Claude with prompt caching | Cached rubric makes re-reads ~90% cheaper |
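
The prompt-caching saving in the last row works by marking the large, static rubric as cacheable so repeat scoring calls pay only the cache-read rate. A rough sketch of the request shape, assuming Anthropic's Messages API with `cache_control`; the model name and rubric text are placeholders, and the snippet only builds the payload, it makes no network call:

```python
# Placeholder standing in for the real 15-rule rubric text.
RUBRIC = "Rule 1: Answer follows STAR structure...\n" * 100

def build_scoring_request(transcript: str) -> dict:
    """Build a Messages API payload with a cacheable system prompt."""
    return {
        "model": "claude-3-5-haiku-latest",  # placeholder model name
        "max_tokens": 1024,
        # cache_control on the system block asks the API to cache the prompt
        # prefix up to this point; subsequent calls reusing the identical
        # rubric are billed at the much cheaper cache-read rate.
        "system": [
            {"type": "text", "text": RUBRIC, "cache_control": {"type": "ephemeral"}}
        ],
        "messages": [{"role": "user", "content": transcript}],
    }

req = build_scoring_request("Candidate answer transcript goes here.")
assert req["system"][0]["cache_control"] == {"type": "ephemeral"}
```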

Architecture

System Overview

```mermaid
graph TB
    subgraph Client["Browser"]
        UI["Next.js 15 App\nSSR + Zustand"]
    end

    subgraph Vercel["Vercel CDN"]
        FE["Next.js Frontend\nEdge CDN · Auto HTTPS"]
    end

    subgraph Fly["Fly.io: FastAPI (Python 3.13)"]
        API["REST API\n/api/v1/*"]
        WS["WebSocket\n/api/v1/sessions/{id}/ws"]
        subgraph Store["Data"]
            PG[("PostgreSQL 16\nskill graph · sessions\nstories · usage_logs")]
            RD[("Redis 7\nsession state · cache\nrate limiting")]
        end
    end

    subgraph Providers["AI Providers (BYOK-capable)"]
        DG["Deepgram Nova-2\nSTT · word timestamps"]
        AN["Anthropic Claude\nSonnet → voice LLM\nHaiku → scoring / memory"]
        EL["ElevenLabs\nTTS · mp3_44100_128\nfallback → Deepgram TTS"]
    end

    UI --> FE
    FE -- "JWT Bearer" --> API
    UI -- "WebSocket\n?token=JWT" --> WS
    API --> PG & RD
    WS --> PG & RD
    WS --> DG & AN & EL
```

Why split Fly.io + Vercel? Vercel serverless cannot hold WebSocket connections open for 20–50 min voice sessions. Fly.io runs the stateful backend as a long-lived process; Vercel handles CDN-optimised static + SSR delivery.


Voice Pipeline

```mermaid
sequenceDiagram
    autonumber
    participant Mic as Microphone
    participant WS as WebSocket Server
    participant STT as Deepgram STT
    participant LLM as Claude Sonnet
    participant TTS as ElevenLabs / Deepgram TTS
    participant DB as PostgreSQL

    Mic->>WS: PCM audio chunks (streaming)
    WS->>STT: Forward audio stream
    STT-->>WS: Interim transcripts (low latency)
    STT-->>WS: Final transcript + word timestamps

    Note over WS: Adaptive debounce<br/>~4 s short answers · ~14 s long<br/>measured from last sound (not last word)<br/>[WAIT] token → skip TTS, keep accumulating

    WS->>LLM: System prompt + cross-session memory + transcript
    LLM-->>WS: Streaming response chunks

    Note over WS: Barge-in detection<br/>threshold = 80 · 10 consecutive frames (~1 s)<br/>→ cancel TTS if user speaks

    WS->>TTS: Text chunks (streaming)

    Note over TTS: ElevenLabs primary<br/>401 / timeout → auto-fallback to Deepgram TTS

    TTS-->>WS: Audio (mp3_44100_128)
    WS-->>Mic: Audio playback to user

    WS->>DB: Store segment (question · answer · evidence spans)
    WS->>DB: Trigger async scoring + skill graph update
```
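
The adaptive debounce and barge-in notes in the diagram can be sketched roughly as follows. The ~4 s / ~14 s endpoints and the threshold-80, 10-frame barge-in trigger come from the diagram; the linear ramp between the endpoints is an assumption, not the project's exact curve:

```python
def debounce_window_ms(answer_ms: int,
                       short_ms: int = 4_000,
                       long_ms: int = 14_000) -> int:
    """How long to wait after the *last detected sound* before replying.

    Short answers get a snappy ~4 s window; long, multi-part answers get up
    to ~14 s so the candidate isn't cut off mid-thought. The linear ramp over
    the first 60 s of speech is an illustrative assumption.
    """
    t = min(answer_ms / 60_000, 1.0)
    return int(short_ms + t * (long_ms - short_ms))

def barge_in(frame_levels, threshold: int = 80, needed: int = 10) -> bool:
    """True once `needed` consecutive frames exceed `threshold` (~1 s of
    sustained speech), at which point TTS playback would be cancelled."""
    run = 0
    for level in frame_levels:
        run = run + 1 if level > threshold else 0
        if run >= needed:
            return True
    return False

assert debounce_window_ms(0) == 4_000          # brand-new answer: fast turnaround
assert debounce_window_ms(120_000) == 14_000   # long answer: generous pause budget
assert barge_in([100] * 10)                    # sustained speech cancels TTS
assert not barge_in([100, 0] * 20)             # brief noise spikes do not
```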

The Deliberate-Practice Loop

```mermaid
flowchart LR
    A["🎙️ Voice Answer\n± transcription"] --> B

    B["🔍 Lint\n15-rule rubric\nevidence = start_ms/end_ms spans\nno hallucinated quotes"]

    B --> C["📝 Diff\n3 rewrite versions\nminimal · medium · ideal\neach shows +rule → +N pts"]

    C --> D["⏪ Rewind\nre-answer any weak segment\nsame question · fresh slate"]

    D --> E["📈 Delta Score\ninstant per-rule breakdown\n+12 structure · −3 depth"]

    E --> F["🕸️ Skill Graph\n22 microskills\ntrend · spaced-repetition weight\ncross-session memory"]

    F --> G["🎯 Drill Plan\nweakest skills · longest gap\nadaptive scheduling"]

    G -.->|next session| A

    style A fill:#6366f1,color:#fff,stroke:#4f46e5
    style F fill:#7c3aed,color:#fff,stroke:#6d28d9
    style G fill:#4f46e5,color:#fff,stroke:#3730a3
```

Why this matters: Most AI interview tools are stateless; they give one session, one score, no memory. InterviewCraft accumulates a 22-microskill model of your weaknesses and schedules deliberate repetition on the exact skills that are failing. The loop above is what separates deliberate practice from mock interviews.
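
The drill-plan step of the loop (weakest skills, longest gap) amounts to a priority ordering over the skill graph. A toy sketch under assumed weights; the 0-100 score scale, the 1-point-per-day gap weight, and the skill names are illustrative, not the project's tuned scheduler:

```python
import datetime as dt

def drill_priority(score: float, last_practised: dt.date, today: dt.date) -> float:
    """Rank a microskill for the next session's drill plan.

    Weak skills (low score) and stale skills (long gap since the last rep)
    float to the top, giving a crude spaced-repetition effect.
    """
    weakness = 100.0 - score                  # lower score -> more urgent
    gap_days = (today - last_practised).days  # staleness pressure
    return weakness + gap_days

today = dt.date(2025, 1, 20)
skills = {  # hypothetical microskills: (current score, last practised)
    "star_structure": (42.0, dt.date(2025, 1, 18)),
    "system_design_tradeoffs": (71.0, dt.date(2025, 1, 2)),
    "negotiation_anchoring": (85.0, dt.date(2025, 1, 19)),
}
plan = sorted(skills, key=lambda s: drill_priority(*skills[s], today), reverse=True)
assert plan[0] == "star_structure"  # weakest skill wins despite a recent rep
```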


Quick Start

Prerequisites

- Docker + Docker Compose
- Python 3.13+, Node.js 20+
- API keys: `ANTHROPIC_API_KEY`, `DEEPGRAM_API_KEY`, `ELEVENLABS_API_KEY`

Run locally

```bash
git clone https://github.com/alexdoroshevich/interviewcraft.git
cd interviewcraft

cp .env.example .env
# Edit .env and add your API keys

# Start all services: postgres, redis, backend, frontend
docker compose up -d

# First run: apply DB migrations
cd backend && pip install -e ".[dev]" && alembic upgrade head
```

Seed demo data

```bash
cd backend && python ../scripts/seed_demo.py
```

Loads 10 pre-built sessions, a skill graph, story bank, and negotiation history. Demo login: demo@interviewcraft.dev / demo1234


Google Authentication

The "Continue with Google" button activates automatically once NEXT_PUBLIC_GOOGLE_CLIENT_ID is set:

  1. Google Cloud Console → APIs & Services → Credentials → Create OAuth 2.0 Client ID (Web application)
  2. Authorized Origins: http://localhost:3000 + your production domain
  3. Authorized Redirect URI: http://localhost:8080/api/v1/auth/google/callback
  4. Add to .env:
    GOOGLE_CLIENT_ID=your-client-id
    GOOGLE_CLIENT_SECRET=your-client-secret
    NEXT_PUBLIC_GOOGLE_CLIENT_ID=your-client-id

Development

```bash
# Backend tests
cd backend && pytest -x -q

# Backend lint + type check
cd backend && ruff check . && mypy app/

# Frontend
cd frontend && npm run lint && npm run type-check && npm test

# E2E tests (requires running app)
cd frontend && npm run test:e2e
```

All PRs run these automatically via GitHub Actions. A PR cannot merge unless every gate passes.


Deployment

The backend deploys to Fly.io and the frontend to Vercel.

Frontend (Vercel): Import the repo, set Root Directory to frontend, add NEXT_PUBLIC_API_URL pointing to your Fly.io backend. Every push auto-deploys; every PR gets a preview URL.

Backend (Fly.io):

```bash
flyctl auth login
flyctl apps create <your-app-name>
flyctl postgres create --name <your-db> && flyctl postgres attach <your-db>
flyctl secrets set ANTHROPIC_API_KEY="..." DEEPGRAM_API_KEY="..."  # see .env.example
flyctl deploy --config backend/fly.toml
```

After the initial deploy, GitHub Actions handles all subsequent deploys automatically on push to main.


Database Migrations

```bash
# Apply all pending migrations (local)
cd backend && alembic upgrade head

# Apply on Fly.io
flyctl ssh console --app <your-app-name>
cd /app && python -m alembic upgrade head

# Create a new migration after model changes
cd backend && alembic revision --autogenerate -m "description"
```

Daily Workflow

```bash
# 1. Create a feature branch from main
git checkout main && git pull
git checkout -b feature/my-feature

# 2. Make changes, commit
git commit -m "feat: my feature"
git push origin feature/my-feature

# 3. Open a PR → CI runs automatically
#    All gates must pass before merging

# 4. Merge → auto-deploys to production
```

Project Structure

```text
interviewcraft/
├── backend/
│   ├── app/
│   │   ├── api/v1/          # Route handlers (auth, sessions, scoring, skills…)
│   │   ├── models/          # SQLAlchemy ORM models
│   │   ├── schemas/         # Pydantic request/response schemas
│   │   └── services/        # Voice pipeline, scoring engine, memory, auth
│   └── tests/               # Unit + integration tests
├── frontend/
│   ├── app/                 # Next.js App Router pages
│   ├── components/          # Shared React components
│   └── lib/api.ts           # Typed API client
├── docs/
│   └── adr/                 # Architecture Decision Records
└── scripts/
    ├── seed_demo.py         # Demo data
    └── run_demo.sh          # One-command demo startup
```

Architecture Decisions

Key decisions are documented in docs/adr/:

| ADR | Decision |
|---|---|
| 000 | North Star specification |
| 001 | WebSocket over WebRTC for voice |
| 002 | Full tech stack rationale |
| 003 | Evidence spans + batched scoring |
| 004 | Text-only rewind in MVP |
| 005 | Provider ABC interfaces |
| 006 | Audio never stored, encrypted transcripts |
| 007 | SWE-only scope, extensible architecture |
| 008 | Four-technique variance reduction |

Benchmarks

The benchmarks/ directory contains reproducible evaluations of the system's AI subsystems. All scripts require a valid Anthropic API key and run against the real model APIs. Dated output files are gitignored; only synthetic example.json baselines are committed.

| Benchmark | What it measures | KPI |
|---|---|---|
| memory-recall | Does the LLM accurately recall injected coaching context? | ≥ 95% recall, 0% hallucination |
| scoring-quality | Do automated scores correlate with human judgement? | Pearson r ≥ 0.85, MAE ≤ 10 |
| voice-latency | STT → LLM → TTS latency (mock + production) | E2E p95 < 1 000 ms |
| cost-profile | Cost per session by quality profile and provider | – |

Run any benchmark with --confirm to execute live API calls (see each README for cost estimates). Use the mock scripts to explore latency characteristics without any API keys.
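
The scoring-quality KPIs (Pearson r and MAE against human judgement) can be computed with a few lines of stdlib Python. The score series below are synthetic, purely to show the metric definitions:

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between model scores and human scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mae(xs: list[float], ys: list[float]) -> float:
    """Mean absolute error between the two score series."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic example pairs (model score, human score) on a 0-100 scale.
model = [72.0, 55.0, 88.0, 40.0, 63.0]
human = [70.0, 58.0, 85.0, 45.0, 60.0]
assert pearson_r(model, human) >= 0.85   # correlation KPI
assert mae(model, human) <= 10           # error KPI
```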


Contributing

See CONTRIBUTING.md.

License

MIT
