Deliberate practice engine for tech interviews. Evidence-based scoring · answer diff rewriting · rewind micro-practice · 22-skill graph · salary negotiation simulator
InterviewCraft is a closed-loop training system, not another mock interview tool. It works like a sports coach who records every rep, identifies exactly what broke down, and makes you practice that specific thing until it's solid.
```
ANSWER → LINT (evidence spans) → DIFF (3 versions) → REWIND (re-answer)
   ↑                                                           │
   └── ADAPTIVE DRILL PLAN ← SKILL GRAPH UPDATE ← DELTA SCORE ─┘
```
Every session feeds back into a persistent 22-microskill graph. The system knows which skills are weakest, schedules spaced-repetition drills, and tracks your delta across sessions, not just a one-shot score.
| Feature | What it does |
|---|---|
| Evidence-backed scoring | 15-rule rubric. Every triggered rule links to `{start_ms, end_ms}`, the exact moment you said it. No hallucinated quotes. |
| Answer diff (3 versions) | Minimal patch · medium rewrite · ideal answer. Each shows [+rule → +N points]. |
| Rewind micro-practice | Re-answer any weak segment. Delta shown immediately: +12 structure, −3 depth. |
| 22-skill graph | Microskills tracked across all sessions with trend lines and spaced-repetition scheduling. |
| Cross-session AI memory | The interviewer AI remembers your patterns across sessions: recurring weaknesses, over-used stories, communication habits. |
| Story bank | Auto-detects STAR stories. Coverage map shows which competencies lack evidence. Overuse warning after 3 uses. |
| Negotiation simulator | AI recruiter with a hidden max budget. Scores anchoring, value articulation, counter-strategy, emotional control. |
| JD Analyzer | Paste a job description → auto-fills session type, company context, and focus skills. |
| Voice delivery analysis | Filler words, WPM, pause patterns → scored against benchmarks after each session. |
| BYOK | Use your own Anthropic / OpenAI / Deepgram / ElevenLabs keys. Encrypted at rest. |
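To make "evidence-backed scoring" concrete, here is a minimal sketch of how a triggered rubric rule pinned to a transcript span could be represented. The class and field names below are illustrative assumptions, not the actual InterviewCraft schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceSpan:
    """One triggered rubric rule, pinned to the exact moment in the answer."""
    rule_id: str   # e.g. "depth.vague_outcome" (hypothetical rule name)
    start_ms: int  # span start in the answer audio
    end_ms: int    # span end in the answer audio
    points: int    # score delta contributed by this rule

    def excerpt(self, words: list[tuple[str, int]]) -> str:
        """Return the words whose timestamps fall inside this span,
        so the UI can quote exactly what was said -- no hallucination."""
        return " ".join(w for w, ts in words if self.start_ms <= ts <= self.end_ms)

# Word-level timestamps as produced by an STT pass: (word, start_ms)
words = [("we", 0), ("shipped", 400), ("it", 900), ("late", 1300)]
span = EvidenceSpan("depth.vague_outcome", 900, 1400, -3)
print(span.excerpt(words))  # -> it late
```

Because the quote is reconstructed from word timestamps rather than generated by the LLM, the evidence shown to the user can never drift from the transcript.
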

| Layer | Choice | Why |
|---|---|---|
| Backend | FastAPI + Python 3.13 | Async-native, fully typed, fast iteration |
| Frontend | Next.js 15 + Tailwind + Zustand | App Router, edge CDN, zero-config deploys |
| Voice | Deepgram Nova-2 + Claude Sonnet + ElevenLabs | Best-in-class STT/LLM/TTS with fallback chain |
| Database | PostgreSQL 16 + JSONB skill graph | Flexible schema, prompt-cached rubric reads |
| Cache | Redis 7 | Session state + rate limiting + memory cache |
| AI scoring | Anthropic Claude with prompt caching | Rubric cached = ~90% cheaper on re-reads |
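The prompt-caching saving comes from marking the large, static rubric as cacheable so only the per-answer transcript is billed at full input price on each scoring call. A sketch of the request shape (payload only, never sent; the model name and rubric text are placeholders, not the production values):

```python
RUBRIC = "...the 15-rule scoring rubric, several thousand tokens, static across calls..."

def build_scoring_request(transcript: str) -> dict:
    """Anthropic Messages API payload with the rubric marked cacheable."""
    return {
        "model": "claude-3-5-haiku-latest",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": RUBRIC,
                # Everything up to this marker is cached between calls;
                # cached reads are billed at a fraction of the input price,
                # which is where the ~90% saving on re-reads comes from.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": transcript}],
    }

req = build_scoring_request("Tell me about a time you led a migration...")
```

Only the `transcript` varies between scoring calls, so the cache hit rate on the rubric is effectively 100% within the cache TTL.
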
```mermaid
graph TB
    subgraph Client["Browser"]
        UI["Next.js 15 App\nSSR + Zustand"]
    end
    subgraph Vercel["Vercel CDN"]
        FE["Next.js Frontend\nEdge CDN · Auto HTTPS"]
    end
    subgraph Fly["Fly.io – FastAPI (Python 3.13)"]
        API["REST API\n/api/v1/*"]
        WS["WebSocket\n/api/v1/sessions/{id}/ws"]
        subgraph Store["Data"]
            PG[("PostgreSQL 16\nskill graph · sessions\nstories · usage_logs")]
            RD[("Redis 7\nsession state · cache\nrate limiting")]
        end
    end
    subgraph Providers["AI Providers (BYOK-capable)"]
        DG["Deepgram Nova-2\nSTT · word timestamps"]
        AN["Anthropic Claude\nSonnet → voice LLM\nHaiku → scoring / memory"]
        EL["ElevenLabs\nTTS · mp3_44100_128\nfallback → Deepgram TTS"]
    end
    UI --> FE
    FE -- "JWT Bearer" --> API
    UI -- "WebSocket\n?token=JWT" --> WS
    API --> PG & RD
    WS --> PG & RD
    WS --> DG & AN & EL
```
**Why split Fly.io + Vercel?** Vercel serverless cannot hold WebSocket connections open for 20–50 minute voice sessions. Fly.io runs the stateful backend as a long-lived process; Vercel handles CDN-optimised static + SSR delivery.
```mermaid
sequenceDiagram
    autonumber
    participant Mic as Microphone
    participant WS as WebSocket Server
    participant STT as Deepgram STT
    participant LLM as Claude Sonnet
    participant TTS as ElevenLabs / Deepgram TTS
    participant DB as PostgreSQL
    Mic->>WS: PCM audio chunks (streaming)
    WS->>STT: Forward audio stream
    STT-->>WS: Interim transcripts (low latency)
    STT-->>WS: Final transcript + word timestamps
    Note over WS: Adaptive debounce<br/>~4 s short answers · ~14 s long<br/>measured from last sound (not last word)<br/>[WAIT] token → skip TTS, keep accumulating
    WS->>LLM: System prompt + cross-session memory + transcript
    LLM-->>WS: Streaming response chunks
    Note over WS: Barge-in detection<br/>threshold = 80 · 10 consecutive frames (~1 s)<br/>→ cancel TTS if user speaks
    WS->>TTS: Text chunks (streaming)
    Note over TTS: ElevenLabs primary<br/>401 / timeout → auto-fallback to Deepgram TTS
    TTS-->>WS: Audio (mp3_44100_128)
    WS-->>Mic: Audio playback to user
    WS->>DB: Store segment (question · answer · evidence spans)
    WS->>DB: Trigger async scoring + skill graph update
```
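One way the adaptive debounce in the pipeline above could work (the function and constants here are an illustrative sketch, not the production implementation): the silence window grows with answer length, so short factual replies turn around quickly while long behavioral stories are not cut off mid-thought.

```python
def debounce_window_ms(answer_ms: int,
                       short_ms: int = 4_000,
                       long_ms: int = 14_000,
                       ramp_ms: int = 60_000) -> int:
    """Silence to wait (measured from last *sound*, not last word) before
    treating the answer as finished. Grows linearly from ~4 s for short
    answers to ~14 s once the answer reaches a minute or longer."""
    frac = min(answer_ms / ramp_ms, 1.0)
    return round(short_ms + frac * (long_ms - short_ms))

print(debounce_window_ms(0))        # just started  -> 4000
print(debounce_window_ms(30_000))   # 30 s into it  -> 9000
print(debounce_window_ms(120_000))  # long answer   -> 14000
```

Measuring from the last sound rather than the last transcribed word matters because STT finalization lags speech; otherwise the window would start early and clip trailing clauses.
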
```mermaid
flowchart LR
    A["Voice Answer\n± transcription"] --> B
    B["Lint\n15-rule rubric\nevidence = start_ms/end_ms spans\nno hallucinated quotes"]
    B --> C["Diff\n3 rewrite versions\nminimal · medium · ideal\neach shows +rule → +N pts"]
    C --> D["Rewind\nre-answer any weak segment\nsame question · fresh slate"]
    D --> E["Delta Score\ninstant per-rule breakdown\n+12 structure · −3 depth"]
    E --> F["Skill Graph\n22 microskills\ntrend · spaced-repetition weight\ncross-session memory"]
    F --> G["Drill Plan\nweakest skills · longest gap\nadaptive scheduling"]
    G -.->|next session| A
    style A fill:#6366f1,color:#fff,stroke:#4f46e5
    style F fill:#7c3aed,color:#fff,stroke:#6d28d9
    style G fill:#4f46e5,color:#fff,stroke:#3730a3
```
**Why this matters:** Most AI interview tools are stateless: one session, one score, no memory. InterviewCraft accumulates a 22-microskill model of your weaknesses and schedules deliberate repetition on the exact skills that are failing. The loop above is what separates deliberate practice from mock interviews.
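A sketch of how an adaptive drill plan could rank microskills (the scoring formula, weights, and field names are assumptions for illustration): the weakest skills with the longest gap since last practice float to the top.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    score: float          # 0-100 rolling average across sessions
    days_since_drill: int

def drill_priority(s: Skill, gap_weight: float = 2.0) -> float:
    """Higher = drill sooner: raw weakness plus a spaced-repetition
    bonus that grows the longer a skill goes unpracticed."""
    return (100 - s.score) + gap_weight * s.days_since_drill

skills = [
    Skill("quantified-impact", score=55, days_since_drill=9),
    Skill("star-structure",    score=70, days_since_drill=2),
    Skill("tradeoff-analysis", score=40, days_since_drill=1),
]
plan = sorted(skills, key=drill_priority, reverse=True)
print([s.name for s in plan])
# -> ['quantified-impact', 'tradeoff-analysis', 'star-structure']
```

Note how `quantified-impact` outranks the weaker `tradeoff-analysis` because of its nine-day gap: staleness and weakness trade off rather than weakness alone deciding.
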
- Docker + Docker Compose
- Python 3.13+, Node.js 20+
- API keys: `ANTHROPIC_API_KEY`, `DEEPGRAM_API_KEY`, `ELEVENLABS_API_KEY`
```bash
git clone https://github.com/alexdoroshevich/interviewcraft.git
cd interviewcraft
cp .env.example .env
# Edit .env: add your API keys

# Start all services: postgres, redis, backend, frontend
docker compose up -d

# First run: apply DB migrations
cd backend && pip install -e ".[dev]" && alembic upgrade head
```

- Frontend: http://localhost:3000
- API docs: http://localhost:8080/api/docs
```bash
cd backend && python ../scripts/seed_demo.py
```

Loads 10 pre-built sessions, a skill graph, a story bank, and negotiation history. Demo login: `demo@interviewcraft.dev` / `demo1234`
The "Continue with Google" button activates automatically once `NEXT_PUBLIC_GOOGLE_CLIENT_ID` is set:
- Google Cloud Console → APIs & Services → Credentials → Create OAuth 2.0 Client ID (Web application)
- Authorized Origins: `http://localhost:3000` + your production domain
- Authorized Redirect URI: `http://localhost:8080/api/v1/auth/google/callback`
- Add to `.env`:

```bash
GOOGLE_CLIENT_ID=your-client-id
GOOGLE_CLIENT_SECRET=your-client-secret
NEXT_PUBLIC_GOOGLE_CLIENT_ID=your-client-id
```
```bash
# Backend tests
cd backend && pytest -x -q

# Backend lint + type check
cd backend && ruff check . && mypy app/

# Frontend
cd frontend && npm run lint && npm run type-check && npm test

# E2E tests (requires running app)
cd frontend && npm run test:e2e
```

All PRs run these gates automatically via GitHub Actions. A PR cannot merge unless every gate passes.
The backend deploys to Fly.io and the frontend to Vercel.
**Frontend (Vercel):** Import the repo, set Root Directory to `frontend`, and add `NEXT_PUBLIC_API_URL` pointing to your Fly.io backend. Every push auto-deploys; every PR gets a preview URL.
**Backend (Fly.io):**

```bash
flyctl auth login
flyctl apps create <your-app-name>
flyctl postgres create --name <your-db> && flyctl postgres attach <your-db>
flyctl secrets set ANTHROPIC_API_KEY="..." DEEPGRAM_API_KEY="..."  # see .env.example
flyctl deploy --config backend/fly.toml
```

After the initial deploy, GitHub Actions handles all subsequent deploys automatically on push to `main`.
```bash
# Apply all pending migrations (local)
cd backend && alembic upgrade head

# Apply on Fly.io
flyctl ssh console --app <your-app-name>
cd /app && python -m alembic upgrade head

# Create a new migration after model changes
cd backend && alembic revision --autogenerate -m "description"
```

```bash
# 1. Create a feature branch from main
git checkout main && git pull
git checkout -b feature/my-feature

# 2. Make changes, commit
git commit -m "feat: my feature"
git push origin feature/my-feature

# 3. Open a PR: CI runs automatically
#    All gates must pass before merging

# 4. Merge: auto-deploys to production
```

```
interviewcraft/
├── backend/
│   ├── app/
│   │   ├── api/v1/       # Route handlers (auth, sessions, scoring, skills…)
│   │   ├── models/       # SQLAlchemy ORM models
│   │   ├── schemas/      # Pydantic request/response schemas
│   │   └── services/     # Voice pipeline, scoring engine, memory, auth
│   └── tests/            # Unit + integration tests
├── frontend/
│   ├── app/              # Next.js App Router pages
│   ├── components/       # Shared React components
│   └── lib/api.ts        # Typed API client
├── docs/
│   └── adr/              # Architecture Decision Records
└── scripts/
    ├── seed_demo.py      # Demo data
    └── run_demo.sh       # One-command demo startup
```
Key decisions are documented in `docs/adr/`:

| ADR | Decision |
|---|---|
| 000 | North Star specification |
| 001 | WebSocket over WebRTC for voice |
| 002 | Full tech stack rationale |
| 003 | Evidence spans + batched scoring |
| 004 | Text-only rewind in MVP |
| 005 | Provider ABC interfaces |
| 006 | Audio never stored, encrypted transcripts |
| 007 | SWE-only scope, extensible architecture |
| 008 | Four-technique variance reduction |
The `benchmarks/` directory contains reproducible evaluations of the system's AI subsystems. All scripts require a valid Anthropic API key and run against the real model APIs. Dated output files are gitignored; only synthetic `example.json` baselines are committed.

| Benchmark | What it measures | KPI |
|---|---|---|
| memory-recall | Does the LLM accurately recall injected coaching context? | ≥ 95% recall, 0% hallucination |
| scoring-quality | Do automated scores correlate with human judgement? | Pearson r ≥ 0.85, MAE ≤ 10 |
| voice-latency | STT → LLM → TTS latency (mock + production) | E2E p95 < 1,000 ms |
| cost-profile | Cost per session by quality profile and provider | – |
Run any benchmark with `--confirm` to execute live API calls (see each README for cost estimates). Use the mock scripts to explore latency characteristics without any API keys.
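For reference, the p95 figure in the voice-latency KPI is the 95th percentile of end-to-end round-trip samples. A minimal way to compute it from collected timings (nearest-rank method; the sample values are invented for illustration):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[min(rank, len(ordered)) - 1]

# End-to-end STT -> LLM -> TTS round-trip times in ms (illustrative)
e2e_ms = [640, 710, 820, 905, 760, 1180, 690, 850, 930, 700]
print(f"p95 = {percentile(e2e_ms, 95):.0f} ms")  # -> p95 = 1180 ms
```

p95 is the right KPI here because voice UX is governed by worst-typical turns, not the average: a single multi-second stall breaks conversational flow even if the mean stays low.
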
See CONTRIBUTING.md.