TalkitoutAI is a full-stack practice platform for resume review and mock interview coaching. The repository combines:
- A FastAPI backend for resume parsing, question generation, speech feedback, video feedback, and real-time audio/video session endpoints.
- A React + TypeScript + Vite frontend for the landing page, resume upload flow, and live mock interview experience.
The product currently supports two main workflows:
- Resume analysis: upload a PDF or DOCX resume and extract structured profile data.
- Mock interview or pitch practice: generate questions, capture audio/video, transcribe speech live, and compute post-session feedback.
```
interview_ai/
|-- backend/
|   |-- app/
|   |   |-- api/              # FastAPI routers by feature
|   |   |-- analysis/         # Speech/video scoring logic
|   |   |-- parsers/          # Resume and video helpers
|   |   |-- questions/        # Question request models + generator
|   |   |-- utils/            # Upload/file validation helpers
|   |   |-- config.py         # Environment-driven settings
|   |   |-- main.py           # FastAPI app factory + router registration
|   |   |-- models.py         # Resume response models
|   |   |-- realtime_state.py # Shared WebRTC / WebSocket state
|   |   |-- speech_models.py  # Speech feedback request/response models
|   |   `-- video_models.py   # Video feedback request/response models
|   |-- requirements.txt
|   `-- README.md
|-- frontend/
|   |-- public/
|   |-- src/
|   |   |-- components/
|   |   |   |-- Interview/    # Recording, transcription, questions, feedback UI
|   |   |   `-- parsers/      # Resume upload + analysis UI
|   |   |-- hooks/            # Session type + feedback request hooks
|   |   |-- pages/            # App routes
|   |   |-- App.tsx           # Router setup
|   |   `-- main.tsx          # Vite entrypoint
|   |-- .env.example
|   |-- package.json
|   `-- README.md
`-- README.md
```
- `agents.md` is the root guidance file for repository-aware coding agents.
- `.codex/config.toml` stores project-specific navigation and workflow hints.
- `.codex/tree.toml` stores a curated source tree and key cross-app relationships.
If you want to finalize a change set with git, use the standard non-interactive flow:

```shell
git add .
git commit -m "Describe the change"
git push
```

Review `git status` before committing so unrelated work is not included by accident.
- Accepts `.pdf` and `.docx` resume uploads.
- Extracts raw text from the document.
- Detects major sections such as `skills`, `experience`, and `education`.
- Uses heuristics plus spaCy NER to pull fields like name, email, phone, skills, education, and experience.
- Returns a normalized JSON response that the frontend renders in the analysis screen.
- Lets the user choose interview or pitch mode.
- Generates question sets from role, company, and call type input.
- Uses WebSocket audio streaming for live transcription.
- Uses WebRTC to create a live browser-to-backend media session.
- Captures client-side video frames during the session.
- Sends transcript text to the backend for speech scoring.
- Sends collected video frames to the backend for video-presence scoring.
- If `GEMINI_API_KEY` is configured, the backend attempts Gemini-based question generation.
- If Gemini is unavailable, the backend falls back to local template-based question banks for:
  - interviews
  - sales calls
  - presentations
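The fallback decision can be sketched as follows. This is a simplified sketch, not the backend's actual code: the `call_gemini` hook, the template contents, and the warning strings are all hypothetical stand-ins.

```python
# Sketch of Gemini-or-templates fallback. `call_gemini` and the template
# bank below are illustrative assumptions, not the real implementation.
import os

LOCAL_TEMPLATES = {
    "interview": ["Tell me about a recent project.", "Why this role?"],
    "sales": ["Who is the buyer for this product?"],
    "presentation": ["What is the one takeaway for your audience?"],
}

def generate_questions(call_type: str, call_gemini=None) -> tuple[list[str], list[str]]:
    """Return (questions, warnings), preferring Gemini when configured."""
    warnings: list[str] = []
    if os.getenv("GEMINI_API_KEY") and call_gemini is not None:
        try:
            return call_gemini(call_type), warnings
        except Exception:
            warnings.append("Gemini call failed; using local templates.")
            return LOCAL_TEMPLATES.get(call_type, []), warnings
    warnings.append("Gemini not configured; using local templates.")
    return LOCAL_TEMPLATES.get(call_type, []), warnings
```

The key property is that the caller always gets a usable question list, with the warning channel explaining which path was taken.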
- React 19
- TypeScript
- Vite
- React Router
- Tailwind CSS 4
- FastAPI
- Pydantic / pydantic-settings
- Uvicorn
- pdfplumber
- python-docx
- spaCy
- pyresparser
The backend code also imports libraries used by the real-time interview flow:
- aiortc
- whisperlivekit
- faster-whisper
- numpy
These are used by the live audio/video endpoints and are now pinned in backend/requirements.txt.
- `src/App.tsx` defines the public routes.
- `Home` renders the marketing landing page.
- `GetStarted` drives the resume upload flow.
- `InterviewType` lets the user pick `interview` or `pitch`.
- `MockInterview` coordinates:
  - question generation
  - audio streaming
  - WebRTC session setup
  - transcript accumulation
  - speech/video feedback requests
- `app/main.py` creates the FastAPI application and registers feature routers.
- `ResumeParser` handles PDF/DOCX extraction and resume field parsing.
- `generate_questions()` creates question sets from local templates or Gemini.
- `generate_feedback()` scores transcript quality using rule-based features and a linear model.
- `generate_video_feedback()` scores visual presence from frame metadata and a linear model.
The app currently exposes these routes:
- `/` - landing page
- `/get-started` - choose resume or interview path
- `/interview-type` - choose interview vs pitch session
- `/mock-interview` - live session page
- `/settings` - theme and appearance settings
- `/auth` - Supabase email/password and Google sign-in
- `/account` - saved per-user interview history
- `/user` - logged-in session history and feedback review page
- `GET /` - Returns a simple service status payload.
- `GET /health` - Returns `{ "status": "healthy" }`.
- `POST /parse-resume/` - Multipart form request with:
  - `file`: uploaded resume
  - `filePath`: extra form field currently sent by the frontend

  Returns `ParseResponse`.
- `POST /parse-resumes-batch/` - Multipart batch upload for multiple files. Returns a list of `ParseResponse` items.
- `POST /questions/generate` - JSON body:

  ```json
  {
    "role": "Backend Engineer",
    "company": "Acme",
    "call_type": "interview",
    "num_questions": 10
  }
  ```

  Returns categorized questions plus warnings and input metadata.
- `POST /speech/feedback` - Accepts either `text`, `words` with timestamps, or `segments` with timestamps. Returns:
  - score
  - derived metrics
  - feedback messages
  - warnings
- `POST /video/feedback` - Accepts an array of frame measurements:
  - timestamp
  - face presence
  - gaze / camera-looking flag
  - smile probability
  - head yaw / pitch

  Returns score, metrics, feedback, and warnings.
- `POST /webrtc/offer` - Accepts a WebRTC SDP offer and returns an answer plus a generated `session_id`.
- `WS /asr` - Accepts streamed audio bytes and emits ASR messages.
- `WS /ws/results/{session_id}` - Session-specific results socket.
- Current code only allows the origin `http://localhost:3000`.
The backend parser in backend/app/parsers/resume_parser.py works like this:
- Extracts document text from PDF or DOCX.
- Detects sections using configurable section headings.
- Parses skills, education, and experience from those sections when possible.
- Falls back to regex and heuristic extraction for common fields:
  - `name`
  - `email`
  - `phone`
  - `skills`
  - `education`
  - `experience`
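To illustrate the regex fallback, here is a minimal sketch for the contact fields. The patterns and function name are assumptions for this example; the real parser in `backend/app/parsers/resume_parser.py` combines its own patterns with spaCy NER and section-aware parsing.

```python
# Simplified sketch of regex-based fallback extraction; patterns here are
# illustrative, not the parser's actual ones.
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")

def extract_contact_fields(text: str) -> dict:
    """Pull email and phone out of raw resume text with regex heuristics."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0).strip() if phone else None,
    }
```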
The parser supports section-heading overrides through environment variables:
- `RESUME_SECTION_HEADINGS_JSON`
- `RESUME_SECTION_HEADINGS_PATH`
- `RESUME_SECTION_HEADINGS_MODE`
Use this when your resumes follow custom section names not covered by the built-in defaults.
backend/app/questions/question_generator.py supports:
- role-aware question templates
- interview, sales, and presentation call types
- optional company-specific question injection
- optional Gemini-backed generation
Role detection is heuristic and maps text such as `backend`, `frontend`, `ml`, `data`, and `devops` into internal categories.
If Gemini is not configured, the backend adds a warning and uses local templates instead.
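That kind of keyword mapping can be sketched like this. The category names and keyword lists below are illustrative assumptions, not the backend's actual internal identifiers.

```python
# Simplified sketch of heuristic role detection; keywords and category
# names are assumptions for illustration.
ROLE_KEYWORDS = {
    "backend": ["backend", "server", "api"],
    "frontend": ["frontend", "react", "ui"],
    "ml": ["ml engineer", "machine learning"],
    "data": ["data", "analytics"],
    "devops": ["devops", "sre", "infrastructure"],
}

def detect_role_category(role_text: str) -> str:
    """Map free-form role text to a category, defaulting to 'general'."""
    lowered = role_text.lower()
    for category, keywords in ROLE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "general"
```

The first matching category wins, so more specific keywords should be listed before generic ones if categories overlap.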
Speech scoring is implemented in backend/app/analysis/speech_feedback.py.
It computes features such as:
- filler word rate
- vocabulary variety
- sentence length average and variance
- pause frequency
- long-pause ratio
- speaking rate (words per minute)
- repetition rate
Those features are combined through a lightweight linear model and converted to a 0-100 score.
The backend also returns human-readable coaching suggestions such as:
- reduce filler words
- slow down or speed up
- vary vocabulary
- shorten long sentences
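The feature-to-score step can be sketched as below. The filler list, weights, and target speaking rate are made up for this example; the real features and coefficients live in `backend/app/analysis/speech_feedback.py`.

```python
# Illustrative sketch of rule-based speech features feeding a linear score.
# Weights and the filler set are assumptions, not the backend's real model.
FILLERS = {"um", "uh", "like", "basically"}

def speech_features(text: str, duration_seconds: float) -> dict:
    words = text.lower().split()
    filler_count = sum(1 for w in words if w.strip(".,") in FILLERS)
    return {
        "filler_rate": filler_count / max(len(words), 1),
        "wpm": len(words) / max(duration_seconds / 60.0, 1e-6),
        "vocab_variety": len(set(words)) / max(len(words), 1),
    }

def speech_score(features: dict) -> float:
    """Combine features linearly and clamp into a 0-100 score."""
    score = 70.0
    score -= 200.0 * features["filler_rate"]     # penalize filler words
    score += 20.0 * features["vocab_variety"]    # reward varied vocabulary
    score -= 0.2 * abs(features["wpm"] - 140.0)  # penalize off-pace speaking
    return max(0.0, min(100.0, score))
```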
Video scoring is implemented in backend/app/analysis/video_feedback.py.
It computes:
- face presence rate
- gaze-at-camera rate
- smile rate
- average smile probability
- head movement variability
- long gaze break rate
Those signals are also passed through a simple linear model to produce a score and qualitative feedback.
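The frame-aggregation step can be sketched similarly. The per-frame field names and the weights below are assumptions for illustration; the backend schema and coefficients live in `backend/app/analysis/video_feedback.py`.

```python
# Illustrative sketch of aggregating per-frame measurements into rates,
# then scoring them linearly. Field names and weights are assumptions.
def video_metrics(frames: list[dict]) -> dict:
    n = max(len(frames), 1)
    return {
        "face_presence_rate": sum(f["face_present"] for f in frames) / n,
        "gaze_rate": sum(f["looking_at_camera"] for f in frames) / n,
        "avg_smile": sum(f["smile_prob"] for f in frames) / n,
    }

def video_score(metrics: dict) -> float:
    """Weighted sum of presence signals, clamped into 0-100."""
    score = (
        50.0 * metrics["face_presence_rate"]
        + 35.0 * metrics["gaze_rate"]
        + 15.0 * metrics["avg_smile"]
    )
    return max(0.0, min(100.0, score))
```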
- Node.js 18+
- npm
- Python 3.10+
- pip
Recommended for the live interview flow:
- a working webcam and microphone
- browser permission for media capture
- any system dependencies required by your local `whisperlivekit` / media stack
From the repository root:
```shell
cd backend
python -m venv .venv
```

Activate the virtual environment:

```shell
# Windows PowerShell
.venv\Scripts\Activate.ps1
```

Install the pinned backend packages:

```shell
pip install -r requirements.txt
```

Install the spaCy English model:

```shell
python -m spacy download en_core_web_sm
```

Run the API:

```shell
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
```

In a second terminal:

```shell
cd frontend
npm install
npm run dev
```

The Vite dev server runs on http://localhost:3000.
Create `frontend/.env` from `frontend/.env.example`:

```shell
VITE_API_BASE=http://127.0.0.1:8000
VITE_WS_BASE=ws://127.0.0.1:8000
VITE_SUPABASE_URL=https://your-project-id.supabase.co
VITE_SUPABASE_PUBLISHABLE_DEFAULT_KEY=your-supabase-publishable-key
```

To enable per-user persistence:

- Create a Supabase project.
- Copy the project URL and publishable key into `frontend/.env`.
- Run the SQL in `supabase/schema.sql` inside the Supabase SQL editor.
- If you want Google login, enable the Google provider in Supabase Auth and add your frontend origin as an allowed redirect URL.
- Restart the frontend dev server.
The repository includes two GitHub Actions workflows under .github/workflows:
- `ci.yml` runs on pushes to `main` and on pull requests. It:
  - installs frontend dependencies with `npm ci`
  - runs `npm run lint`
  - runs `npm run build`
  - installs backend dependencies from `backend/requirements.txt`
  - compiles `backend/app` with `python -m compileall`
  - smoke-tests the FastAPI app import with `from app.main import app`
- `dependency-review.yml` runs on pull requests and uses GitHub's dependency review action to flag risky dependency changes before merge.
The CI workflow currently targets:
- Node.js 20
- Python 3.10
The backend reads .env values through pydantic-settings. The main optional settings are:
```shell
GEMINI_API_KEY=your_key_here
GEMINI_MODEL=gemini-1.5-flash
GEMINI_API_URL=https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent
CORS_ALLOW_ORIGINS=["http://localhost:3000"]
WS_ALLOWED_ORIGINS=["http://localhost:3000"]
```

Optional resume parsing overrides:

```shell
RESUME_SECTION_HEADINGS_MODE=merge
RESUME_SECTION_HEADINGS_JSON={"skills":["skills","tools"],"experience":["experience","projects"]}
```

- Open http://localhost:3000.
- Click Get Started.
- Choose the resume path.
- Upload a PDF or DOCX file.
- The frontend posts the file to `POST /parse-resume/`.
- The analysis page renders the extracted JSON fields.
- Open http://localhost:3000/interview-type.
- Choose `Interview` or `Pitch`.
- Generate a question set from role/company/call type input.
- Start the session in audio, video, or both mode.
- Speak into the microphone; audio is streamed to `WS /asr`.
- When the session ends, the frontend sends:
  - transcript text to `POST /speech/feedback`
  - collected vision frames to `POST /video/feedback`
- The feedback panel displays scores, metrics, and improvement notes.
```shell
curl -X POST http://localhost:8000/questions/generate \
  -H "Content-Type: application/json" \
  -d "{\"role\":\"Frontend Engineer\",\"company\":\"Acme\",\"call_type\":\"interview\",\"num_questions\":5}"
```

```shell
curl -X POST http://localhost:8000/speech/feedback \
  -H "Content-Type: application/json" \
  -d "{\"text\":\"I led the migration, reduced latency, and improved reliability.\"}"
```

- The frontend now routes backend calls through `VITE_API_BASE` and `VITE_WS_BASE`.
- Supabase auth and session persistence are frontend-driven. The browser signs users in with Supabase Auth and writes session history directly to Postgres using row-level security.
- Google OAuth now uses the same Supabase auth flow as email/password sign-in.
- The SQL schema and RLS policies for session storage live in `supabase/schema.sql`.
- Per-question review data now also persists in `public.interview_session_answers`, including answer timing and final transcript segments for each saved question.
- Session history lists now read from a lightweight `interview_session_summaries` view, while full transcript/feedback payloads are fetched only when a user opens a specific session.
- The frontend route tree is lazy-loaded, so large interview and account screens are split into separate chunks instead of inflating the initial bundle.
- `frontend/vite.config.ts` defines an `/api` proxy, but the current frontend code calls the backend through explicit absolute URLs, so that proxy is not currently used.
- The backend CORS list and WebSocket allowed origins are configured through `backend/app/config.py` and can be overridden with environment variables.
- The `run_video_pipeline()` function currently only sends a basic status message. Most video scoring in practice comes from client-collected frames posted later to `/video/feedback`.
- Client-side video feedback now sends metric-shaped frames that match the backend schema instead of raw base64 image data.
- `backend/app/parsers/resume_parser.py` will attempt to download the spaCy model automatically if it is missing, but installing it manually is more predictable for local setup and CI.
- `frontend/src/App.tsx` - route registration
- `frontend/src/pages/GetStarted.tsx` - resume upload flow
- `frontend/src/pages/MockInterview.tsx` - interview orchestration
- `frontend/src/components/Interview/WebRTCRecorder.tsx` - browser media capture + WebRTC signaling
- `frontend/src/components/Interview/useWhisper.tsx` - ASR WebSocket client
- `frontend/src/hooks/useFeedbackRequests.ts` - post-session feedback requests
- `backend/app/main.py` - app factory and router registration
- `backend/app/api/` - feature routers for core, resumes, questions, feedback, and realtime endpoints
- `backend/app/parsers/resume_parser.py` - resume parsing logic
- `backend/app/questions/question_generator.py` - question generation
- `backend/app/analysis/speech_feedback.py` - transcript scoring
- `backend/app/analysis/video_feedback.py` - video scoring
- `backend/app/config.py` - environment settings
This codebase is already useful for local experimentation and demos, but it still has some rough edges:
- some pages are placeholders
- browser-dependent vision sampling still needs stronger cross-browser support
- video streaming is only partially implemented server-side
For local development, the project is best treated as a working prototype with a clear frontend/backend split and reasonably self-contained feature modules.