An AI-powered interview bot designed for conducting technical interviews for AI Engineering positions, with a focus on RAG (Retrieval Augmented Generation) roles. The bot evaluates candidates on Python experience, project examples, RAG understanding, evaluation metrics, and open-source contributions, providing feedback on performance and areas for improvement.
The bot comprises three main components:
- Speech-to-Text (STT): Utilizes Deepgram's Nova model for accurate transcription.
- Language Model (LLM): Employs Llama 3.2, served via Ollama, for generating contextual questions and feedback.
- Text-to-Speech (TTS): Uses Deepgram TTS for natural-sounding voice output.
To run the bot you will need:
- A Deepgram API key (for STT and TTS)
- A Daily.co API key (for video/audio communication)
- Ollama installed locally with the Llama 3.2 model pulled
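Before running the bot, it can help to confirm that the Ollama server is up and a Llama 3.2 model has been pulled. The check below uses Ollama's documented `/api/tags` endpoint; it is a convenience script, not part of the bot itself:

```python
import requests


def llama_available(base_url: str = "http://localhost:11434") -> bool:
    """Return True if a llama3.2 model is listed by the local Ollama server."""
    try:
        resp = requests.get(f"{base_url}/api/tags", timeout=5)
        resp.raise_for_status()
    except requests.RequestException:
        return False
    models = [m.get("name", "") for m in resp.json().get("models", [])]
    return any(name.startswith("llama3.2") for name in models)


if __name__ == "__main__":
    print("llama3.2 available:", llama_available())
```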
- Create a `.env` file with the following content:

  ```
  DEEPGRAM_API_KEY=your_key_here
  DAILY_API_KEY=your_key_here
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

Start the interview bot:

```bash
python interview_bot.py
```

The bot will conduct a technical interview and provide feedback on the candidate's performance.
Under the hood, the bot is built from:
- Daily Transport Layer: Manages audio/video communication, with Voice Activity Detection (VAD) provided by Silero.
- Speech Services:
  - STT: Deepgram's Nova model for English (en-US).
  - TTS: Deepgram TTS with a British male voice profile.
- LLM Service: Llama 3.2 (served via Ollama) for question generation and response processing.
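For the LLM step in isolation, a minimal sketch of querying Llama 3.2 through Ollama's documented `/api/chat` endpoint might look like the following; the system prompt and example answer are illustrative, not the prompts used by the bot:

```python
import requests


def ask_llm(history: list, user_text: str) -> str:
    """Send the running conversation to a local Ollama server and return the model's reply."""
    messages = history + [{"role": "user", "content": user_text}]
    resp = requests.post(
        "http://localhost:11434/api/chat",  # Ollama's default local endpoint
        json={"model": "llama3.2", "messages": messages, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]


# Example: generate a follow-up question from a candidate's answer.
history = [{"role": "system",
            "content": "You are a technical interviewer for a RAG-focused AI Engineering role."}]
print(ask_llm(history, "I used FAISS with sentence-transformers for retrieval."))
```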
The bot follows a structured interview format with:
- An interviewer persona: Allysa from Meta
- A fixed question set covering the key topics
- Personalized feedback delivered at the end of the interview
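One way the persona and fixed question set could be encoded is sketched below; the wording and topic list are assumptions reconstructed from the description above, not the script shipped in `interview_bot.py`:

```python
# Hypothetical interview script; the real prompt and question list live in interview_bot.py.
INTERVIEWER_PROMPT = (
    "You are Allysa, a technical interviewer at Meta, screening candidates for an "
    "AI Engineering role focused on RAG. Ask one question at a time, listen to the "
    "answer, and ask a brief follow-up before moving on."
)

QUESTIONS = [
    "Tell me about your Python experience.",
    "Walk me through a project you are proud of.",
    "How would you design a RAG pipeline end to end?",
    "Which metrics would you use to evaluate retrieval and generation quality?",
    "Have you contributed to any open-source projects?",
]

CLOSING_INSTRUCTION = (
    "Summarize the candidate's performance and suggest concrete areas for improvement."
)
```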
Configuration is handled through:
- Environment variables for API keys
- Daily.co room configuration via a `configure()` function
- Support for both command-line arguments and environment variables
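A simplified stand-in for that configuration step is sketched below. It assumes `python-dotenv` and `argparse`, and the `DAILY_ROOM_URL` variable is an illustrative name; this is not the repository's actual `configure()` helper:

```python
import argparse
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed


def configure():
    """Resolve the Daily room URL and API keys from CLI arguments or the .env file."""
    load_dotenv()
    parser = argparse.ArgumentParser(description="AI interview bot")
    parser.add_argument("-u", "--url", default=os.getenv("DAILY_ROOM_URL"),
                        help="Daily.co room URL to join (hypothetical variable name)")
    parser.add_argument("-k", "--apikey", default=os.getenv("DAILY_API_KEY"),
                        help="Daily.co API key")
    args = parser.parse_args()

    deepgram_key = os.getenv("DEEPGRAM_API_KEY")
    if not (args.url and args.apikey and deepgram_key):
        raise SystemExit("Missing Daily room URL, Daily API key, or Deepgram API key.")
    return args.url, args.apikey, deepgram_key
```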
Voice activity detection uses Silero VAD with configurable parameters:
- Stop threshold: 0.2 seconds
- Start threshold: 0.2 seconds
- Confidence threshold: 0.4
The video call interface runs over WebRTC, hosted through Daily.co.
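The VAD thresholds listed above could be collected into a small configuration object; the dataclass below is an illustrative stand-in (the field names are assumptions), not the Silero or transport API itself:

```python
from dataclasses import dataclass


@dataclass
class VADConfig:
    """Silero VAD tuning used by the bot (values from the list above)."""
    start_secs: float = 0.2   # speech must persist this long before it counts as started
    stop_secs: float = 0.2    # silence must persist this long before speech counts as stopped
    confidence: float = 0.4   # minimum model confidence to treat a frame as speech


VAD_CONFIG = VADConfig()
```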
The end-to-end data flow:
- Input: the user's voice is captured via WebRTC
- VAD: Silero VAD handles speech segmentation
- Processing pipeline: User Speech → Deepgram STT → Text → Llama 3.2 → Generated Response → Deepgram TTS → Audio Output
This setup ensures a seamless, natural conversation flow, allowing the AI interviewer to engage in real-time dialogue and provide appropriate follow-up questions or feedback.
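As a rough illustration of the two speech ends of that pipeline, outside the real-time Daily transport, Deepgram's public REST endpoints can be exercised directly on files. The endpoint paths and the `nova-2` model name follow Deepgram's documented API; `aura-helios-en` is assumed here as the British male voice, and the production bot streams audio rather than posting files:

```python
import os

import requests

DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]


def transcribe(wav_path: str) -> str:
    """Send a WAV file to Deepgram's pre-recorded /v1/listen endpoint (Nova model)."""
    with open(wav_path, "rb") as f:
        resp = requests.post(
            "https://api.deepgram.com/v1/listen?model=nova-2&language=en-US",
            headers={"Authorization": f"Token {DEEPGRAM_API_KEY}",
                     "Content-Type": "audio/wav"},
            data=f,
        )
    resp.raise_for_status()
    return resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]


def speak(text: str, out_path: str = "reply.mp3") -> str:
    """Synthesize a reply with Deepgram's /v1/speak endpoint (voice model name is an assumption)."""
    resp = requests.post(
        "https://api.deepgram.com/v1/speak?model=aura-helios-en",
        headers={"Authorization": f"Token {DEEPGRAM_API_KEY}",
                 "Content-Type": "application/json"},
        json={"text": text},
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```

The text returned by `transcribe()` would be passed through the LLM step sketched earlier before being handed to `speak()`.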