A native Python desktop application that simulates realistic job interviews using AI.
- AI-Powered Interviews: Claude LLM drives realistic interview conversations with dynamic follow-up questions
- Human-like Voice: ElevenLabs TTS with emotion modulation for natural-sounding speech
- Real-time Voice Analysis: Pitch, intensity, and tempo analysis based on the Juslin & Laukka (2003) framework (see the sketch after this list)
- Visual Analysis: MediaPipe-based face/pose detection (100% local, free)
  - Eye contact tracking
  - Blink detection
  - Posture analysis
  - Emotion indicators from facial expressions
- Intelligent Interjections: AI interjects when responses drift off-topic, stall in long pauses, or become unclear
- Post-Interview Feedback: Comprehensive grading and actionable improvement tips
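
The analyzer's exact feature set isn't documented here, but a minimal sketch of Juslin & Laukka-style vocal cues (pitch level and variability, intensity, speech rate) might look like the following. librosa is an assumption; the project's real implementation lives in src/audio/analyzer.py.

```python
# Illustrative only: frame-level voice features in the spirit of src/audio/analyzer.py.
import librosa
import numpy as np

def voice_features(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=16000, mono=True)

    # Pitch via probabilistic YIN, bounded like voice_analysis.pitch_range_hz
    f0, _, _ = librosa.pyin(y, fmin=75.0, fmax=500.0, sr=sr)
    # Intensity as RMS energy in dB
    rms = librosa.feature.rms(y=y)[0]
    # Crude tempo proxy: acoustic onsets per second
    onsets = librosa.onset.onset_detect(y=y, sr=sr)

    return {
        "pitch_mean_hz": float(np.nanmean(f0)),  # NaNs mark unvoiced frames
        "pitch_std_hz": float(np.nanstd(f0)),
        "intensity_db": float(np.mean(librosa.amplitude_to_db(rms))),
        "onsets_per_sec": len(onsets) / (len(y) / sr),
    }
```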
```
┌─────────────────────────────────────────────────────────────┐
│                       UI Layer (PyQt6)                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Camera    │  │ Transcript  │  │   Modulation Viz    │  │
│  │   Widget    │  │   Widget    │  │       Widget        │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────┐
│                         Core Engine                          │
│  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐  │
│  │    Session    │   │ Interjection  │   │    Grading    │  │
│  │    Manager    │   │    Engine     │   │    Engine     │  │
│  └───────────────┘   └───────────────┘   └───────────────┘  │
└─────────────────────────────────────────────────────────────┘
                              │
┌───────────────────┬─────────┴─────────┬───────────────────┐
│  Audio Pipeline   │  AI Integration   │   Vision Module   │
│  ┌─────────────┐  │  ┌─────────────┐  │  ┌─────────────┐  │
│  │ Recorder    │  │  │ Claude      │  │  │ Face        │  │
│  │ Analyzer    │  │  │ Client      │  │  │ Analyzer    │  │
│  │ Transcriber │  │  │ ElevenLabs  │  │  │ Gaze        │  │
│  │ Player      │  │  │ TTS         │  │  │ Tracker     │  │
│  └─────────────┘  │  └─────────────┘  │  └─────────────┘  │
└───────────────────┴───────────────────┴───────────────────┘
```
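
The Gaze Tracker's eye-contact logic isn't shown here; one common approach with MediaPipe Face Mesh is to check where the iris sits between the eye corners. A rough sketch follows; the landmark indices are the commonly cited Face Mesh ones and should be treated as illustrative, since the real logic lives in src/vision/gaze_tracker.py.

```python
# Illustrative eye-contact heuristic using MediaPipe Face Mesh iris landmarks.
import cv2
import mediapipe as mp

EYE_OUTER, EYE_INNER, IRIS_CENTER = 33, 133, 468  # commonly cited indices for one eye

face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)  # iris needs refine_landmarks

def looking_at_camera(frame_bgr, tolerance: float = 0.12) -> bool:
    """True if the iris is roughly centered between the eye corners."""
    results = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return False  # no face detected this frame
    lm = results.multi_face_landmarks[0].landmark
    outer, inner, iris = lm[EYE_OUTER], lm[EYE_INNER], lm[IRIS_CENTER]
    ratio = (iris.x - outer.x) / (inner.x - outer.x + 1e-6)  # ~0.5 means centered gaze
    return abs(ratio - 0.5) < tolerance
```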
- Python 3.11+
- Microphone
- Camera (optional, for visual analysis)
- Anthropic API key (Claude)
- ElevenLabs API key
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/AudioInterviewer.git
  cd AudioInterviewer
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set API keys:

  ```bash
  export ANTHROPIC_API_KEY=your_claude_api_key
  export ELEVENLABS_API_KEY=your_elevenlabs_api_key
  ```

  Or create a .env file:

  ```bash
  cp .env.example .env
  # Edit .env with your API keys
  ```
Run the application with the graphical interface:

```bash
python -m src.main
```

For testing without the GUI:

```bash
python -m src.main --cli
```

Command-line options:

```
--cli           Run in CLI mode (no GUI)
--debug         Enable debug logging
--config PATH   Path to custom config file
--help          Show help message
```
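
For example, to run headless with debug logging and a custom config (the file name is a placeholder):

```bash
python -m src.main --cli --debug --config my_settings.yaml
```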
Edit config/settings.yaml to customize:

```yaml
interview:
  default_duration_minutes: 30
  question_count: 5

interjection:
  off_topic_threshold: 0.5
  pause_duration_threshold: 3.0
  clarity_threshold: 0.6
  cooldown_seconds: 10.0

grading:
  weights:
    confidence: 0.2
    clarity: 0.2
    content: 0.3
    engagement: 0.15
    eye_contact: 0.15

audio:
  sample_rate: 16000
  channels: 1
  buffer_size: 1024

voice_analysis:
  pitch_range_hz: [75.0, 500.0]
  silence_threshold_db: -40.0
```
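
To make the grading weights concrete: the overall grade is presumably a weighted sum of the per-dimension scores. A minimal sketch that reads the file with PyYAML (an assumption; the project's loader lives in src/utils/config.py) and applies the weights:

```python
# Illustrative: load settings.yaml and combine hypothetical per-dimension scores.
import yaml

with open("config/settings.yaml") as f:
    settings = yaml.safe_load(f)

weights = settings["grading"]["weights"]
scores = {  # hypothetical 0-100 scores for one session
    "confidence": 72, "clarity": 80, "content": 65,
    "engagement": 90, "eye_contact": 55,
}

overall = sum(weights[k] * scores[k] for k in weights)
print(f"Overall: {overall:.2f}")  # 71.65 with the numbers above
```

Since the default weights sum to 1.0, this is a plain weighted average.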
Project structure:

```
AudioInterviewer/
├── config/                    # Configuration files
│   ├── settings.yaml          # Main settings
│   ├── prompts.yaml           # Interview prompts
│   └── emotions.yaml          # Emotion thresholds
├── data/                      # Data storage
│   ├── sessions/              # Session recordings
│   ├── database/              # SQLite database
│   └── exports/               # Exported reports
├── src/
│   ├── ai/                    # AI integration
│   │   ├── claude_client.py
│   │   ├── elevenlabs_client.py
│   │   ├── context.py
│   │   └── evaluator.py
│   ├── audio/                 # Audio processing
│   │   ├── recorder.py
│   │   ├── analyzer.py
│   │   ├── transcriber.py
│   │   ├── player.py
│   │   └── vad.py
│   ├── core/                  # Core engine
│   │   ├── session_manager.py
│   │   ├── interjection.py
│   │   ├── grading.py
│   │   ├── feedback.py
│   │   ├── events.py
│   │   └── models.py
│   ├── data/                  # Data layer
│   │   ├── database.py
│   │   └── models.py
│   ├── ui/                    # User interface
│   │   ├── main_window.py
│   │   ├── camera_widget.py
│   │   ├── transcript_widget.py
│   │   ├── modulation_widget.py
│   │   ├── feedback_widget.py
│   │   ├── control_panel.py
│   │   ├── settings_dialog.py
│   │   └── styles.py
│   ├── utils/                 # Utilities
│   │   ├── config.py
│   │   ├── constants.py
│   │   └── logger.py
│   ├── vision/                # Visual analysis
│   │   ├── camera.py
│   │   ├── emotion_detector.py
│   │   ├── face_analyzer.py
│   │   ├── gaze_tracker.py
│   │   ├── posture_analyzer.py
│   │   └── models.py
│   └── main.py                # Entry point
├── tests/                     # Test suite
│   ├── test_audio/
│   ├── test_core/
│   ├── test_vision/
│   ├── integration/
│   └── conftest.py
├── scripts/                   # Helper scripts
│   ├── build.sh               # Build script
│   └── setup_dev.sh           # Dev setup
├── plans/                     # Architecture docs
├── pyproject.toml             # Project metadata
├── requirements.txt           # Dependencies
├── .env.example               # Environment template
└── README.md                  # This file
```
Run tests with pytest:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run a specific test module
pytest tests/test_audio/

# Run integration tests
pytest tests/integration/ -m integration
```

Build a standalone executable:

```bash
./scripts/build.sh
```

The executable will be in dist/AIInterviewer.
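
The contents of build.sh aren't shown here, but an output at dist/AIInterviewer is consistent with a PyInstaller-style build; a hypothetical equivalent command:

```bash
pyinstaller --onefile --name AIInterviewer src/main.py
```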
For development setup, run:

```bash
./scripts/setup_dev.sh
```

This will:

- Create a virtual environment
- Install dependencies
- Install development tools (pytest, black, ruff, mypy)
- Create .env from the template
- Create the necessary data directories
- Format with Black:

  ```bash
  black src tests
  ```

- Lint with Ruff:

  ```bash
  ruff check src tests
  ```

- Type check with mypy:

  ```bash
  mypy src
  ```
- Get your Claude API key from the Anthropic Console
- Get your ElevenLabs API key from ElevenLabs
- Voice analysis based on Juslin & Laukka (2003) - "Communication of emotions in vocal expression and music performance"
- Facial analysis using MediaPipe
- LLM powered by Claude
- TTS powered by ElevenLabs
MIT License - See LICENSE for details.
- Fork the repository
- Create a feature branch:

  ```bash
  git checkout -b feature/my-feature
  ```

- Commit your changes:

  ```bash
  git commit -am 'Add my feature'
  ```

- Push to the branch:

  ```bash
  git push origin feature/my-feature
  ```

- Submit a Pull Request