This is a full-stack web application designed to help users practice for technical interviews. It leverages a suite of modern AI models to create a realistic, interactive, and session-based interview experience, providing detailed, multi-metric feedback on user performance.
- AI-Generated Questions: Uses the Gemini API (`gemini-1.5-flash-latest`, with `gemini-2.5-flash-latest` as an alternative) to generate a unique and varied set of interview questions on topics such as Data Structures, Databases, and OOP.
- Text-to-Speech (TTS): Reads questions aloud using Coqui's high-quality `xtts_v2` model, with multiple voice options for the user to choose from.
- Speech-to-Text (STT): Transcribes the user's spoken answers in real time using the `faster-whisper` model.
- Session-Based Flow: Supports multi-question interview sessions where users can answer, re-record if unsatisfied, and continue until they choose to end the session.
- Detailed Performance Analysis: After the session, the user's answers are evaluated by the Gemini API against multiple performance metrics: Problem Understanding, Vocabulary, Analytical Thinking, Professionalism, and Correctness, each with its own score.
- Interactive UI: A clean, modern, and responsive user interface built with Next.js and React, featuring an animated waveform visualizer for audio recording and playback.
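The STT feature above can be sketched with the `faster-whisper` API. The model size, device settings, and helper names below are illustrative assumptions, not the project's actual configuration:

```python
# Sketch of the STT step, assuming faster-whisper's API
# (WhisperModel.transcribe returns an iterable of segments, each with a .text field).
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Segment:
    text: str

def join_segments(segments: Iterable[Segment]) -> str:
    """Collapse Whisper segments into a single transcript string."""
    return " ".join(seg.text.strip() for seg in segments)

def transcribe_answer(audio_path: str) -> str:
    # Heavy import kept local so the helper above stays testable without the model.
    from faster_whisper import WhisperModel
    # "base" is a small model suitable for quick local tests; the app may
    # use a larger or quantized variant.
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return join_segments(segments)
```

Keeping the segment-joining logic separate from the model call makes the text handling easy to test without downloading model weights.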
- Framework: FastAPI
- AI Models:
  - Google Gemini (`gemini-1.5-flash-latest`) for question generation & evaluation.
  - Coqui TTS (`xtts_v2`) for voice generation.
  - `faster-whisper` for speech-to-text transcription.
- Async: `asyncio` and `run_in_threadpool` for non-blocking AI model execution.
- Framework: Next.js & React
- State Management: Zustand
- Styling: Custom CSS
- Audio Visualization: Wavesurfer.js, react-mic
- Deployment: (Ready for Vercel/Netlify)
Follow these steps to set up and run the project locally.
- Git: Download Git
- Python: Version 3.10 or 3.11 recommended. Download Python
- Node.js: Version 18.x or later. Download Node.js
- Clone the repository:
  `git clone https://github.com/kanishksamuraig/Interview_Bot.git`
  `cd Interview_Bot`
- Navigate to the backend directory:
  `cd backend`
- Create a Python virtual environment:
  - Windows (PowerShell): `python -m venv venv`
  - Linux / macOS: `python3 -m venv venv`
- Activate the virtual environment:
  - Windows (PowerShell): `.\venv\Scripts\activate`
  - Linux / macOS: `source venv/bin/activate`
- Install Python dependencies:
  `pip install -r requirements.txt`
- Navigate to the frontend directory:
  `cd ../frontend`
- Install Node.js dependencies:
  `npm install`
You need to create `.env` files to store your secret API keys.
- Backend: In the `backend/` directory, create a file named `.env` and add your Gemini API key: `GEMINI_API_KEY="YOUR_GEMINI_API_KEY_HERE"`
- Frontend: The frontend does not require any environment variables for the current setup.
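How the backend picks up the key can be sketched like this; it assumes `python-dotenv` is used to load the file (a common FastAPI convention), and the helper name is illustrative:

```python
# Sketch: reading GEMINI_API_KEY from backend/.env.
import os

try:
    # python-dotenv, if installed, copies .env entries into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # fall back to whatever is already in the environment

def get_gemini_key() -> str:
    key = os.getenv("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; see backend/.env")
    return key
```

Failing fast with a clear error here is friendlier than letting the first Gemini request fail with an opaque authentication error.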
You must have two separate terminals open to run both the backend and frontend servers simultaneously.
- Backend (first terminal):
  - Navigate to the `backend/` directory and make sure your virtual environment is activated.
  - Run: `uvicorn app:app --reload --reload-dir . --reload-exclude venv`
  - The backend will be running at http://localhost:8000.
- Frontend (second terminal):
  - Navigate to the `frontend/` directory.
  - Run: `npm run dev`
  - The frontend will be running at http://localhost:3000.
You can now open your browser and navigate to http://localhost:3000 to use the application.
- The user selects an interview topic and a preferred voice on the homepage.
- An interview session begins, and the backend calls the Gemini API to generate the first question.
- The question text is converted to speech by the Coqui TTS model and played back to the user with a waveform visualizer.
- A timer starts, and the user records their answer, with a microphone visualizer providing real-time feedback.
- The user's audio is transcribed by the Whisper model. The user can review the transcribed text and choose to either re-record or accept the answer and continue to the next question.
- This cycle repeats until the user ends the session.
- On the results page, all recorded answers are sent to the Gemini API for a detailed, multi-metric evaluation, which is then displayed in a comprehensive table.
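The final evaluation step can be sketched as a prompt that asks Gemini to score each answer on the five metrics and a parser for the JSON it returns. The prompt wording, 0-10 scale, and response handling are illustrative assumptions, not the project's exact code:

```python
# Sketch: building a multi-metric evaluation prompt and parsing
# the model's JSON reply into per-metric scores.
import json
import re

METRICS = ["Problem Understanding", "Vocabulary", "Analytical Thinking",
           "Professionalism", "Correctness"]

def build_eval_prompt(question: str, answer: str) -> str:
    return (
        "Score the candidate's answer from 0-10 on each metric: "
        + ", ".join(METRICS) + ".\n"
        'Respond with JSON only, e.g. {"Correctness": 7, ...}.\n'
        f"Question: {question}\nAnswer: {answer}"
    )

def parse_scores(model_reply: str) -> dict:
    # Models sometimes wrap JSON in markdown code fences; strip them first.
    cleaned = re.sub(r"^```(?:json)?|```$", "", model_reply.strip(),
                     flags=re.M).strip()
    scores = json.loads(cleaned)
    # Default any metric the model omitted to 0 so the table stays complete.
    return {m: scores.get(m, 0) for m in METRICS}
```

Requesting JSON and defensively stripping code fences keeps the results table robust even when the model formats its reply inconsistently.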