Intelligent offline voice transcription with automatic speaker identification
It's Just Notes is a powerful desktop application that transcribes audio recordings while automatically identifying different speakers. Perfect for meetings, interviews, lectures, podcasts, and any multi-speaker audio content.
- Register unlimited voice profiles
- Automatic speaker identification during transcription
- High-accuracy voice matching using SpeechBrain's ECAPA-VOXCELEB model
- Visual audio segment selector for precise voice training
- Powered by OpenAI's Whisper AI model
- Real-time microphone recording
- Support for pre-recorded audio files (MP3, WAV, OGG, M4A, FLAC)
- Automatic language detection
- Speaker-labeled output format:
  [Speaker Name]: transcribed text
- 100% offline operation - no internet required after initial setup
- All processing happens locally on your machine
- Your voice data never leaves your computer
- No cloud services, no data collection
- English and Portuguese interfaces
- Easy language switching
- More languages can be added easily
- Export voice profiles for backup or sharing
- Import voice presets from collaborators
- Persistent voice storage across sessions
- Delete or update registered voices anytime
- Python 3.8 or higher
- 4GB+ RAM recommended
- Microphone (for live recording)
- Windows, macOS, or Linux
- Clone the repository:
  git clone https://github.com/yourusername/itsjust-notes.git
  cd itsjust-notes
- Install required dependencies:
  pip install -r requirements.txt
- Run the application:
  python main.py
On first launch, the application will automatically download the required AI models (~150MB):
- Whisper base model
- SpeechBrain speaker recognition model
This is a one-time download. After that, the app works completely offline.
customtkinter>=5.2.0
sounddevice>=0.4.6
numpy>=1.24.0
scipy>=1.10.0
openai-whisper>=20230314
torch>=2.0.0
torchaudio>=2.0.0
speechbrain>=0.5.14
If your copy of the project does not include a requirements.txt file, create one with the dependencies above.
Before transcribing, register the voices of people who will be speaking:
- Navigate to the "Manage Voices" tab
- Enter the person's name
- Choose one option:
  - Record Voice (5s): Record 5 seconds of the person speaking
  - Upload Voice Audio: Select an existing audio file
    - Use the visual slider to select a 5-30 second segment
    - Choose clear audio where only that person is speaking
Tips for best results:
- Use quiet environments
- Avoid background music or noise
- 5-30 seconds of clear speech is optimal
- Multiple speakers can be registered
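The 5-30 second guideline above can be enforced before a sample is saved. The following is a minimal sketch, not the application's actual code: `clamp_selection` and its constants are hypothetical names, and it assumes the slider reports a start and end position in seconds.

```python
MIN_SECONDS = 5.0   # shortest sample the guideline above allows
MAX_SECONDS = 30.0  # longest sample the guideline above allows


def clamp_selection(start, end, total):
    """Return a (start, end) window in seconds that fits the clip and lasts 5-30 s."""
    if total < MIN_SECONDS:
        raise ValueError("clip is shorter than the 5 second minimum")
    # Keep both ends inside the clip and in the right order.
    start = max(0.0, min(start, total))
    end = max(start, min(end, total))
    # Stretch short selections and trim long ones.
    length = min(max(end - start, MIN_SECONDS), MAX_SECONDS)
    # Shift the window left if it would run past the end of the clip.
    if start + length > total:
        start = total - length
    return start, start + length
```

For a 60 second clip, a 2 second selection starting at 0:02 is widened to 5 seconds, and a 50 second selection is trimmed to 30.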
After registering voices:
- Go to the "Transcription" tab
- Choose your input method:
  - Start Microphone: Record live audio
  - Upload Long File: Process pre-recorded audio files
- The transcription will appear with speaker labels:
  [John]: Hello everyone, welcome to today's meeting.
  [Sarah]: Thanks John. Let's start with the quarterly review.
  [Unknown]: Can everyone hear me?
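The labeled format above is simple to produce once each segment has a speaker attached. This sketch assumes segments arrive as `(speaker, text)` pairs with `None` for unmatched voices. `format_transcript` is a hypothetical helper, and merging consecutive turns by the same speaker is an assumption rather than confirmed application behavior.

```python
def format_transcript(segments):
    """Render (speaker, text) pairs as '[Speaker]: text' lines."""
    lines = []
    previous = None
    for speaker, text in segments:
        label = speaker or "Unknown"  # unmatched voices get the Unknown label
        text = text.strip()
        if not text:
            continue  # skip empty segments
        if label == previous:
            # Same speaker kept talking: extend the current line.
            lines[-1] += " " + text
        else:
            lines.append(f"[{label}]: {text}")
            previous = label
    return "\n".join(lines)
```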
- Export: Save individual voice profiles as .pkl files
- Import Preset: Load voice profiles from files
- Delete: Remove voice profiles you no longer need
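Exporting and importing .pkl profiles amounts to a pickle round trip. The sketch below shows one way this could look. The dictionary layout (`name`, `embedding`) and the function names are assumptions for illustration, not the application's actual file format.

```python
import pickle
from pathlib import Path


def export_profile(name, embedding, folder):
    """Write one voice profile to <folder>/<name>.pkl and return the path."""
    path = Path(folder) / f"{name}.pkl"
    with path.open("wb") as fh:
        pickle.dump({"name": name, "embedding": list(embedding)}, fh)
    return path


def import_profile(path):
    """Load a profile file and return (name, embedding)."""
    with Path(path).open("rb") as fh:
        data = pickle.load(fh)
    # Reject files that do not look like the assumed profile layout.
    if not isinstance(data, dict) or "name" not in data or "embedding" not in data:
        raise ValueError("not a voice profile file")
    return data["name"], data["embedding"]
```

Unpickling can execute arbitrary code, so it is safest to import .pkl files only from people you trust.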
- Meeting Transcriptions - Automatically label who said what
- Lecture Notes - Transcribe educational content
- Podcast Production - Generate speaker-labeled transcripts
- Interview Documentation - Keep track of multi-speaker conversations
- Accessibility - Create text versions of audio content
- Research - Transcribe focus groups and qualitative interviews
Default configuration in the code:
- Sample Rate: 16,000 Hz
- Channels: Mono (1)
- Speaker Match Threshold: 0.28 (cosine similarity)
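To illustrate how a cosine-similarity threshold such as 0.28 can drive speaker matching, here is a minimal pure-Python sketch. It assumes the threshold is a minimum similarity and that the best-scoring registered profile wins. `identify` and the `profiles` dictionary are hypothetical, and real embeddings would come from the SpeechBrain model rather than hand-written lists.

```python
import math

THRESHOLD = 0.28  # default match threshold listed above


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding, profiles, threshold=THRESHOLD):
    """Return the best-matching profile name, or 'Unknown' below the threshold."""
    best_name, best_score = "Unknown", threshold
    for name, reference in profiles.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Under this assumption, raising the threshold produces more Unknown labels, while lowering it makes false matches more likely.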
The app uses the base model by default. For better accuracy, you can modify line 150:
  # For better accuracy (requires more RAM):
  self.model = whisper.load_model("small")  # or "medium", "large"
Model comparison:
- tiny - Fastest, least accurate (~1GB RAM)
- base - Good balance (default, ~1GB RAM)
- small - Better accuracy (~2GB RAM)
- medium - High accuracy (~5GB RAM)
- large - Best accuracy (~10GB RAM)
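The comparison above can be turned into a small lookup that suggests a model for a given memory budget. This is only a sketch: `pick_model` and the 1 GB headroom default are assumptions, and the RAM figures are the approximate values from the list.

```python
# Approximate RAM needs from the comparison above, largest model first.
MODEL_RAM_GB = [("large", 10), ("medium", 5), ("small", 2), ("base", 1), ("tiny", 1)]


def pick_model(available_gb, headroom_gb=1.0):
    """Suggest the largest model that fits, leaving some RAM for the rest of the app."""
    budget = available_gb - headroom_gb
    for name, needed in MODEL_RAM_GB:
        if needed <= budget:
            return name
    return "tiny"  # smallest option when nothing else fits
```

The returned name can be passed straight to `whisper.load_model`, for example `whisper.load_model(pick_model(8))`.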
itsjust-notes/
├── main.py                  # Main application file
├── requirements.txt         # Python dependencies
├── vozes_cadastradas.pkl    # Saved voice profiles (auto-generated)
├── settings.json            # User settings (auto-generated)
├── README.md                # This file
└── docs/                    # Documentation and screenshots
    └── screenshot.png
- UI Layer (customtkinter)
  - Modern, customizable interface
  - Tabbed navigation
  - Real-time progress indicators
- Audio Processing (sounddevice, scipy, torchaudio)
  - Microphone capture
  - Audio file loading and resampling
  - Audio segment extraction
- Transcription Engine (whisper)
  - Speech-to-text conversion
  - Automatic language detection
  - Segment-based processing
- Speaker Recognition (speechbrain)
  - Voice embedding generation
  - Cosine similarity matching
  - ECAPA-VOXCELEB pre-trained model
Audio Input → Whisper Transcription → Speaker Segmentation →
Voice Embedding → Speaker Matching → Labeled Transcript Output
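Between transcription and voice embedding, each transcribed segment has to be cut out of the audio buffer. Whisper's `transcribe` result reports per-segment `start` and `end` times in seconds, so at the 16,000 Hz mono rate the cut reduces to index arithmetic. The sketch below assumes the audio is held as a flat sequence of samples; `slice_segment` is a hypothetical helper.

```python
SAMPLE_RATE = 16_000  # Hz, matching the default configuration


def slice_segment(samples, start_s, end_s, sample_rate=SAMPLE_RATE):
    """Return the samples between start_s and end_s (seconds), clipped to the buffer."""
    first = max(0, int(round(start_s * sample_rate)))
    last = min(len(samples), int(round(end_s * sample_rate)))
    return samples[first:last]
```

Each slice would then be passed to the embedding model and compared against the registered profiles.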
Contributions are welcome! Here's how you can help:
- Translations: Add more language interfaces
- UI/UX: Improve the user interface
- Features: Add new functionality
- Bug Fixes: Report and fix issues
- Documentation: Improve guides and examples
- Testing: Test on different platforms
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
- Follow PEP 8 guidelines
- Add comments for complex logic
- Update documentation for new features
- Test on multiple platforms when possible
Problem: Models not downloading
- Solution: Check your internet connection on first launch. Models are downloaded once and cached.
Problem: Microphone not working
- Solution: Check system audio permissions. The app needs microphone access.
Problem: Poor speaker recognition
- Solution:
  - Use longer voice samples (15-30 seconds)
  - Ensure clean audio without background noise
  - Re-register voices if accuracy is low
Problem: High RAM usage
- Solution: Use the tiny or base Whisper model instead of larger variants
Problem: Application crashes on startup
- Solution: Ensure all dependencies are installed correctly:
pip install -r requirements.txt --upgrade
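If the crash persists, it can help to check which packages fail to import. The sketch below uses only the standard library. `missing_modules` is a hypothetical helper, and the module names are the import names of the packages listed in the requirements (openai-whisper is imported as `whisper`).

```python
import importlib.util

# Import names for the packages in requirements.txt.
REQUIRED_MODULES = [
    "customtkinter", "sounddevice", "numpy", "scipy",
    "whisper", "torch", "torchaudio", "speechbrain",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Any name it reports points to a package that still needs to be installed in the active Python environment.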
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 2GB | 4GB+ |
| CPU | Dual-core | Quad-core+ |
| Storage | 500MB | 2GB |
| OS | Windows 10, macOS 10.14, Linux | Latest versions |
On a typical modern laptop:
- Voice registration: ~2-5 seconds
- 1 minute audio transcription: ~15-30 seconds
- Speaker identification: Real-time
- No telemetry or tracking
- No data sent to external servers
- All processing is local
- Voice profiles stored locally only
- Open source - audit the code yourself
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 It's Just Notes Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
This project is built with amazing open-source technologies:
- OpenAI Whisper - Speech recognition model
- SpeechBrain - Speaker recognition toolkit
- CustomTkinter - Modern UI framework
- PyTorch - Deep learning framework
Special thanks to all contributors and the open-source community!
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@itsjustnotes.com
- Real-time transcription display
- Export transcripts to multiple formats (PDF, DOCX, SRT)
- Diarization improvements
- Custom model training
- Batch processing mode
- API for integration with other tools
- Mobile version (iOS/Android)
- Cloud sync option (optional, privacy-preserving)
If you find this project useful, please consider giving it a star!
Made with love by It's Just
Website • Documentation • Community
