
InterviewCraft

Deliberate practice engine for tech interviews. Evidence-based scoring · answer diff rewriting · rewind micro-practice · 22-skill graph · salary negotiation simulator



What Is This?

InterviewCraft is a closed-loop training system, not another mock interview tool. Like a sports coach, it records every rep, identifies exactly what broke down, and makes you practice that specific thing until it's solid.

```
ANSWER → LINT (evidence spans) → DIFF (3 versions) → REWIND (re-answer)
   ↑       → DELTA SCORE → SKILL GRAPH UPDATE → ADAPTIVE DRILL PLAN ──┘
```

Every session feeds back into a persistent 22-microskill graph. The system knows which skills are weakest, schedules spaced-repetition drills, and tracks your delta across sessions rather than a one-shot score.


Key Features

| Feature | What it does |
|---|---|
| Evidence-backed scoring | 15-rule rubric. Every triggered rule links to `{start_ms, end_ms}`: the exact moment you said it. No hallucinated quotes. |
| Answer diff (3 versions) | Minimal patch · medium rewrite · ideal answer. Each shows `[+rule → +N points]`. |
| Rewind micro-practice | Re-answer any weak segment. Delta shown immediately: +12 structure, −3 depth. |
| 22-skill graph | Microskills tracked across all sessions with trend lines and spaced-repetition scheduling. |
| Cross-session AI memory | The interviewer AI remembers your patterns across sessions: recurring weaknesses, over-used stories, communication habits. |
| Story bank | Auto-detects STAR stories. Coverage map shows which competencies lack evidence. Overuse warning after 3 uses. |
| Negotiation simulator | AI recruiter with a hidden max budget. Scores anchoring, value articulation, counter-strategy, emotional control. |
| JD Analyzer | Paste a job description; auto-fills session type, company context, and focus skills. |
| Voice delivery analysis | Filler words, WPM, pause patterns, scored against benchmarks after each session. |
| BYOK | Use your own Anthropic / OpenAI / Deepgram / ElevenLabs keys. Encrypted at rest. |
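
As a concrete illustration of evidence-backed scoring, here is a minimal sketch of what an evidence-linked rule hit could look like. The class and rule names are hypothetical, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceSpan:
    """Millisecond offsets into the session audio where a rubric rule fired."""
    start_ms: int
    end_ms: int

@dataclass(frozen=True)
class TriggeredRule:
    """One rubric rule plus the exact moment in the answer that triggered it."""
    rule_id: str
    points: int
    evidence: EvidenceSpan

# A rule only counts if it can point at a concrete {start_ms, end_ms} span,
# which is what rules out hallucinated quotes.
hit = TriggeredRule(
    rule_id="structure.star_result_missing",  # hypothetical rule id
    points=-4,
    evidence=EvidenceSpan(start_ms=61_250, end_ms=64_800),
)
assert hit.evidence.end_ms > hit.evidence.start_ms
```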

Tech Stack

| Layer | Choice | Why |
|---|---|---|
| Backend | FastAPI + Python 3.13 | Async-native, fully typed, fast iteration |
| Frontend | Next.js 15 + Tailwind + Zustand | App Router, edge CDN, zero-config deploys |
| Voice | Deepgram Nova-2 + Claude Sonnet + ElevenLabs | Best-in-class STT / LLM / TTS with a fallback chain |
| Database | PostgreSQL 16 + JSONB skill graph | Flexible schema, prompt-cached rubric reads |
| Cache | Redis 7 | Session state + rate limiting + memory cache |
| AI scoring | Anthropic Claude with prompt caching | Cached rubric makes re-reads ~90% cheaper |
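
The prompt-caching saving in the last row works by marking the large, static rubric as cacheable so repeat scoring calls pay only the cache-read rate. A rough sketch of the request shape, assuming Anthropic's Messages API with `cache_control`; the model name and rubric text are placeholders, and the snippet only builds the payload, it makes no network call:

```python
# Placeholder standing in for the real 15-rule rubric text.
RUBRIC = "Rule 1: Answer follows STAR structure...\n" * 100

def build_scoring_request(transcript: str) -> dict:
    """Build a Messages API payload with a cacheable system prompt."""
    return {
        "model": "claude-3-5-haiku-latest",  # placeholder model name
        "max_tokens": 1024,
        # cache_control on the system block asks the API to cache the prompt
        # prefix up to this point; subsequent calls reusing the identical
        # rubric are billed at the much cheaper cache-read rate.
        "system": [
            {"type": "text", "text": RUBRIC, "cache_control": {"type": "ephemeral"}}
        ],
        "messages": [{"role": "user", "content": transcript}],
    }

req = build_scoring_request("Candidate answer transcript goes here.")
assert req["system"][0]["cache_control"] == {"type": "ephemeral"}
```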

Architecture

System Overview

```mermaid
graph TB
    subgraph Client["Browser"]
        UI["Next.js 15 App\nSSR + Zustand"]
    end

    subgraph Vercel["Vercel CDN"]
        FE["Next.js Frontend\nEdge CDN · Auto HTTPS"]
    end

    subgraph Fly["Fly.io: FastAPI (Python 3.13)"]
        API["REST API\n/api/v1/*"]
        WS["WebSocket\n/api/v1/sessions/{id}/ws"]
        subgraph Store["Data"]
            PG[("PostgreSQL 16\nskill graph · sessions\nstories · usage_logs")]
            RD[("Redis 7\nsession state · cache\nrate limiting")]
        end
    end

    subgraph Providers["AI Providers (BYOK-capable)"]
        DG["Deepgram Nova-2\nSTT · word timestamps"]
        AN["Anthropic Claude\nSonnet → voice LLM\nHaiku → scoring / memory"]
        EL["ElevenLabs\nTTS · mp3_44100_128\nfallback → Deepgram TTS"]
    end

    UI --> FE
    FE -- "JWT Bearer" --> API
    UI -- "WebSocket\n?token=JWT" --> WS
    API --> PG & RD
    WS --> PG & RD
    WS --> DG & AN & EL
```

Why split Fly.io + Vercel? Vercel serverless cannot hold WebSocket connections open for 20–50 min voice sessions. Fly.io runs the stateful backend as a long-lived process; Vercel handles CDN-optimised static + SSR delivery.


Voice Pipeline

```mermaid
sequenceDiagram
    autonumber
    participant Mic as Microphone
    participant WS as WebSocket Server
    participant STT as Deepgram STT
    participant LLM as Claude Sonnet
    participant TTS as ElevenLabs / Deepgram TTS
    participant DB as PostgreSQL

    Mic->>WS: PCM audio chunks (streaming)
    WS->>STT: Forward audio stream
    STT-->>WS: Interim transcripts (low latency)
    STT-->>WS: Final transcript + word timestamps

    Note over WS: Adaptive debounce<br/>~4 s short answers · ~14 s long<br/>measured from last sound (not last word)<br/>[WAIT] token → skip TTS, keep accumulating

    WS->>LLM: System prompt + cross-session memory + transcript
    LLM-->>WS: Streaming response chunks

    Note over WS: Barge-in detection<br/>threshold = 80 · 10 consecutive frames (~1 s)<br/>→ cancel TTS if user speaks

    WS->>TTS: Text chunks (streaming)

    Note over TTS: ElevenLabs primary<br/>401 / timeout → auto-fallback to Deepgram TTS

    TTS-->>WS: Audio (mp3_44100_128)
    WS-->>Mic: Audio playback to user

    WS->>DB: Store segment (question · answer · evidence spans)
    WS->>DB: Trigger async scoring + skill graph update
```
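
The adaptive debounce and barge-in notes in the diagram can be sketched roughly as follows. The ~4 s / ~14 s endpoints and the threshold-80, 10-frame barge-in trigger come from the diagram; the linear ramp between the endpoints is an assumption, not the project's exact curve:

```python
def debounce_window_ms(answer_ms: int,
                       short_ms: int = 4_000,
                       long_ms: int = 14_000) -> int:
    """How long to wait after the *last detected sound* before replying.

    Short answers get a snappy ~4 s window; long, multi-part answers get up
    to ~14 s so the candidate isn't cut off mid-thought. The linear ramp over
    the first 60 s of speech is an illustrative assumption.
    """
    t = min(answer_ms / 60_000, 1.0)
    return int(short_ms + t * (long_ms - short_ms))

def barge_in(frame_levels, threshold: int = 80, needed: int = 10) -> bool:
    """True once `needed` consecutive frames exceed `threshold` (~1 s of
    sustained speech), at which point TTS playback would be cancelled."""
    run = 0
    for level in frame_levels:
        run = run + 1 if level > threshold else 0
        if run >= needed:
            return True
    return False

assert debounce_window_ms(0) == 4_000          # brand-new answer: fast turnaround
assert debounce_window_ms(120_000) == 14_000   # long answer: generous pause budget
assert barge_in([100] * 10)                    # sustained speech cancels TTS
assert not barge_in([100, 0] * 20)             # brief noise spikes do not
```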

The Deliberate-Practice Loop

```mermaid
flowchart LR
    A["🎙️ Voice Answer\n± transcription"] --> B

    B["🔍 Lint\n15-rule rubric\nevidence = start_ms/end_ms spans\nno hallucinated quotes"]

    B --> C["📝 Diff\n3 rewrite versions\nminimal · medium · ideal\neach shows +rule → +N pts"]

    C --> D["⏪ Rewind\nre-answer any weak segment\nsame question · fresh slate"]

    D --> E["📈 Delta Score\ninstant per-rule breakdown\n+12 structure · −3 depth"]

    E --> F["🕸️ Skill Graph\n22 microskills\ntrend · spaced-repetition weight\ncross-session memory"]

    F --> G["🎯 Drill Plan\nweakest skills · longest gap\nadaptive scheduling"]

    G -.->|next session| A

    style A fill:#6366f1,color:#fff,stroke:#4f46e5
    style F fill:#7c3aed,color:#fff,stroke:#6d28d9
    style G fill:#4f46e5,color:#fff,stroke:#3730a3
```

Why this matters: Most AI interview tools are stateless; they give one session, one score, no memory. InterviewCraft accumulates a 22-microskill model of your weaknesses and schedules deliberate repetition on the exact skills that are failing. The loop above is what separates deliberate practice from mock interviews.
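
The drill-plan step of the loop (weakest skills, longest gap) amounts to a priority ordering over the skill graph. A toy sketch under assumed weights; the 0-100 score scale, the 1-point-per-day gap weight, and the skill names are illustrative, not the project's tuned scheduler:

```python
import datetime as dt

def drill_priority(score: float, last_practised: dt.date, today: dt.date) -> float:
    """Rank a microskill for the next session's drill plan.

    Weak skills (low score) and stale skills (long gap since the last rep)
    float to the top, giving a crude spaced-repetition effect.
    """
    weakness = 100.0 - score                  # lower score -> more urgent
    gap_days = (today - last_practised).days  # staleness pressure
    return weakness + gap_days

today = dt.date(2025, 1, 20)
skills = {  # hypothetical microskills: (current score, last practised)
    "star_structure": (42.0, dt.date(2025, 1, 18)),
    "system_design_tradeoffs": (71.0, dt.date(2025, 1, 2)),
    "negotiation_anchoring": (85.0, dt.date(2025, 1, 19)),
}
plan = sorted(skills, key=lambda s: drill_priority(*skills[s], today), reverse=True)
assert plan[0] == "star_structure"  # weakest skill wins despite a recent rep
```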


Quick Start

Prerequisites

- Docker + Docker Compose
- Python 3.13+, Node.js 20+
- API keys: `ANTHROPIC_API_KEY`, `DEEPGRAM_API_KEY`, `ELEVENLABS_API_KEY`

Run locally

```bash
git clone https://github.com/alexdoroshevich/interviewcraft.git
cd interviewcraft

cp .env.example .env
# Edit .env and add your API keys

# Start all services: postgres, redis, backend, frontend
docker compose up -d

# First run: apply DB migrations
cd backend && pip install -e ".[dev]" && alembic upgrade head
```

Seed demo data

```bash
cd backend && python ../scripts/seed_demo.py
```

Loads 10 pre-built sessions, a skill graph, story bank, and negotiation history. Demo login: demo@interviewcraft.dev / demo1234


Google Authentication

The "Continue with Google" button activates automatically once NEXT_PUBLIC_GOOGLE_CLIENT_ID is set:

  1. Google Cloud Console → APIs & Services → Credentials → Create OAuth 2.0 Client ID (Web application)
  2. Authorized Origins: http://localhost:3000 + your production domain
  3. Authorized Redirect URI: http://localhost:8080/api/v1/auth/google/callback
  4. Add to .env:
    GOOGLE_CLIENT_ID=your-client-id
    GOOGLE_CLIENT_SECRET=your-client-secret
    NEXT_PUBLIC_GOOGLE_CLIENT_ID=your-client-id

Development

```bash
# Backend tests
cd backend && pytest -x -q

# Backend lint + type check
cd backend && ruff check . && mypy app/

# Frontend
cd frontend && npm run lint && npm run type-check && npm test

# E2E tests (requires running app)
cd frontend && npm run test:e2e
```

All PRs run these automatically via GitHub Actions. A PR cannot merge unless every gate passes.


Deployment

The backend deploys to Fly.io and the frontend to Vercel.

Frontend (Vercel): Import the repo, set Root Directory to frontend, add NEXT_PUBLIC_API_URL pointing to your Fly.io backend. Every push auto-deploys; every PR gets a preview URL.

Backend (Fly.io):

```bash
flyctl auth login
flyctl apps create <your-app-name>
flyctl postgres create --name <your-db> && flyctl postgres attach <your-db>
flyctl secrets set ANTHROPIC_API_KEY="..." DEEPGRAM_API_KEY="..."  # see .env.example
flyctl deploy --config backend/fly.toml
```

After the initial deploy, GitHub Actions handles all subsequent deploys automatically on push to main.


Database Migrations

```bash
# Apply all pending migrations (local)
cd backend && alembic upgrade head

# Apply on Fly.io
flyctl ssh console --app <your-app-name>
cd /app && python -m alembic upgrade head

# Create a new migration after model changes
cd backend && alembic revision --autogenerate -m "description"
```

Daily Workflow

```bash
# 1. Create a feature branch from main
git checkout main && git pull
git checkout -b feature/my-feature

# 2. Make changes, commit
git commit -m "feat: my feature"
git push origin feature/my-feature

# 3. Open a PR → CI runs automatically
#    All gates must pass before merging

# 4. Merge → auto-deploys to production
```

Project Structure

```text
interviewcraft/
├── backend/
│   ├── app/
│   │   ├── api/v1/          # Route handlers (auth, sessions, scoring, skills…)
│   │   ├── models/          # SQLAlchemy ORM models
│   │   ├── schemas/         # Pydantic request/response schemas
│   │   └── services/        # Voice pipeline, scoring engine, memory, auth
│   └── tests/               # Unit + integration tests
├── frontend/
│   ├── app/                 # Next.js App Router pages
│   ├── components/          # Shared React components
│   └── lib/api.ts           # Typed API client
├── docs/
│   └── adr/                 # Architecture Decision Records
└── scripts/
    ├── seed_demo.py         # Demo data
    └── run_demo.sh          # One-command demo startup
```

Architecture Decisions

Key decisions are documented in docs/adr/:

| ADR | Decision |
|---|---|
| 000 | North Star specification |
| 001 | WebSocket over WebRTC for voice |
| 002 | Full tech stack rationale |
| 003 | Evidence spans + batched scoring |
| 004 | Text-only rewind in MVP |
| 005 | Provider ABC interfaces |
| 006 | Audio never stored, encrypted transcripts |
| 007 | SWE-only scope, extensible architecture |
| 008 | Four-technique variance reduction |

Benchmarks

The benchmarks/ directory contains reproducible evaluations of the system's AI subsystems. All scripts require a valid Anthropic API key and run against the real model APIs. Dated output files are gitignored; only synthetic example.json baselines are committed.

| Benchmark | What it measures | KPI |
|---|---|---|
| memory-recall | Does the LLM accurately recall injected coaching context? | ≥ 95% recall, 0% hallucination |
| scoring-quality | Do automated scores correlate with human judgement? | Pearson r ≥ 0.85, MAE ≤ 10 |
| voice-latency | STT → LLM → TTS latency (mock + production) | E2E p95 < 1 000 ms |
| cost-profile | Cost per session by quality profile and provider | – |

Run any benchmark with --confirm to execute live API calls (see each README for cost estimates). Use the mock scripts to explore latency characteristics without any API keys.
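
The scoring-quality KPIs (Pearson r and MAE against human judgement) can be computed with a few lines of stdlib Python. The score series below are synthetic, purely to show the metric definitions:

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between model scores and human scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mae(xs: list[float], ys: list[float]) -> float:
    """Mean absolute error between the two score series."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic example pairs (model score, human score) on a 0-100 scale.
model = [72.0, 55.0, 88.0, 40.0, 63.0]
human = [70.0, 58.0, 85.0, 45.0, 60.0]
assert pearson_r(model, human) >= 0.85   # correlation KPI
assert mae(model, human) <= 10           # error KPI
```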


Contributing

See CONTRIBUTING.md.

License

MIT
