
🌻 It's Just Notes - Voice Identity Transcriber


Intelligent offline voice transcription with automatic speaker identification

It's Just Notes is a powerful desktop application that transcribes audio recordings while automatically identifying different speakers. Perfect for meetings, interviews, lectures, podcasts, and any multi-speaker audio content.

Screenshot Placeholder


✨ Key Features

🎤 Speaker Recognition

  • Register unlimited voice profiles
  • Automatic speaker identification during transcription
  • High-accuracy voice matching using SpeechBrain's ECAPA-VOXCELEB model
  • Visual audio segment selector for precise voice training

πŸ“ Smart Transcription

  • Powered by OpenAI's Whisper AI model
  • Real-time microphone recording
  • Support for pre-recorded audio files (MP3, WAV, OGG, M4A, FLAC)
  • Automatic language detection
  • Speaker-labeled output format: [Speaker Name]: transcribed text

🔒 Privacy First

  • 100% offline operation - no internet required after initial setup
  • All processing happens locally on your machine
  • Your voice data never leaves your computer
  • No cloud services, no data collection

🌍 Multi-Language Interface

  • English and Portuguese interfaces
  • Easy language switching
  • More languages can be added easily

💾 Voice Profile Management

  • Export voice profiles for backup or sharing
  • Import voice presets from collaborators
  • Persistent voice storage across sessions
  • Delete or update registered voices anytime

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • 4GB+ RAM recommended
  • Microphone (for live recording)
  • Windows, macOS, or Linux

Installation

  1. Clone the repository
git clone https://github.com/yourusername/itsjust-notes.git
cd itsjust-notes
  2. Install the required dependencies
pip install -r requirements.txt
  3. Run the application
python main.py

First Launch

On first launch, the application will automatically download the required AI models (~150MB):

  • Whisper base model
  • SpeechBrain speaker recognition model

This is a one-time download. After that, the app works completely offline.


📦 Dependencies

customtkinter>=5.2.0
sounddevice>=0.4.6
numpy>=1.24.0
scipy>=1.10.0
openai-whisper>=20230314
torch>=2.0.0
torchaudio>=2.0.0
speechbrain>=0.5.14

Create a requirements.txt file with the above dependencies.


📖 How to Use

1️⃣ Register Speaker Voices

Before transcribing, register the voices of people who will be speaking:

  1. Navigate to the "🎤 Manage Voices" tab
  2. Enter the person's name
  3. Choose one option:
    • 🎤 Record Voice (5s): Record 5 seconds of the person speaking
    • 📁 Upload Voice Audio: Select an existing audio file
      • Use the visual slider to select a 5-30 second segment
      • Choose clear audio where only that person is speaking

Tips for best results:

  • Use quiet environments
  • Avoid background music or noise
  • 5-30 seconds of clear speech is optimal
  • Multiple speakers can be registered

2️⃣ Transcribe Audio

After registering voices:

  1. Go to the "📝 Transcription" tab

  2. Choose your input method:

    • ▶ Start Microphone: Record live audio
    • 📁 Upload Long File: Process pre-recorded audio files
  3. The transcription will appear with speaker labels:

[John]: Hello everyone, welcome to today's meeting.
[Sarah]: Thanks John. Let's start with the quarterly review.
[Unknown]: Can everyone hear me?

3️⃣ Manage Voice Profiles

  • 📤 Export: Save individual voice profiles as .pkl files
  • 📥 Import Preset: Load voice profiles from files
  • 🗑️ Delete: Remove voice profiles you no longer need

🎯 Use Cases

  • 📊 Meeting Transcriptions - Automatically label who said what
  • 🎓 Lecture Notes - Transcribe educational content
  • 🎙️ Podcast Production - Generate speaker-labeled transcripts
  • 📞 Interview Documentation - Keep track of multi-speaker conversations
  • 🗣️ Accessibility - Create text versions of audio content
  • 📚 Research - Transcribe focus groups and qualitative interviews

βš™οΈ Configuration

Audio Settings

Default configuration in the code:

  • Sample Rate: 16,000 Hz
  • Channels: Mono (1)
  • Speaker Match Threshold: 0.28 (cosine similarity)
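The threshold is used as a cosine-similarity cutoff when matching a segment's embedding against registered profiles. A minimal sketch of that matching step (function names and the name-to-embedding dict are illustrative assumptions, not the app's exact code):

```python
import numpy as np

MATCH_THRESHOLD = 0.28  # cosine-similarity cutoff from the default config

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_speaker(segment_emb, profiles):
    """Return the best-matching registered name, or 'Unknown' if no
    profile clears the threshold. `profiles` maps name -> embedding."""
    best_name, best_score = "Unknown", MATCH_THRESHOLD
    for name, emb in profiles.items():
        score = cosine_similarity(segment_emb, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Raising the threshold makes matching stricter (more segments labeled [Unknown]); lowering it makes it more permissive.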

Whisper Model

The app uses the base model by default. For better accuracy, you can change the model loaded in main.py (line 150):

# For better accuracy (requires more RAM):
self.model = whisper.load_model("small")  # or "medium", "large"

Model comparison:

  • tiny - Fastest, least accurate (~1GB RAM)
  • base - Good balance (default, ~1GB RAM)
  • small - Better accuracy (~2GB RAM)
  • medium - High accuracy (~5GB RAM)
  • large - Best accuracy (~10GB RAM)

πŸ—‚οΈ File Structure

itsjust-notes/
├── main.py                      # Main application file
├── requirements.txt             # Python dependencies
├── vozes_cadastradas.pkl        # Saved voice profiles (auto-generated)
├── settings.json                # User settings (auto-generated)
├── README.md                    # This file
└── docs/                       # Documentation and screenshots
    └── screenshot.png

πŸ› οΈ Technical Architecture

Core Components

  1. UI Layer (customtkinter)

    • Modern, customizable interface
    • Tabbed navigation
    • Real-time progress indicators
  2. Audio Processing (sounddevice, scipy, torchaudio)

    • Microphone capture
    • Audio file loading and resampling
    • Audio segment extraction
  3. Transcription Engine (whisper)

    • Speech-to-text conversion
    • Automatic language detection
    • Segment-based processing
  4. Speaker Recognition (speechbrain)

    • Voice embedding generation
    • Cosine similarity matching
    • ECAPA-VOXCELEB pre-trained model

Data Flow

Audio Input → Whisper Transcription → Speaker Segmentation →
Voice Embedding → Speaker Matching → Labeled Transcript Output
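The final steps of this flow can be sketched as a pure function that merges Whisper-style segments with a speaker-identification callback (the segment-dict shape follows whisper's model.transcribe(...)["segments"] output; the identify callback is an assumption standing in for the embedding-and-matching stage):

```python
def label_transcript(segments, identify):
    """Turn Whisper-style segments into '[Name]: text' lines.

    segments: list of dicts with 'start', 'end', and 'text' keys, as in
      whisper's model.transcribe(...)["segments"].
    identify: assumed callback (start, end) -> speaker name or 'Unknown'.
    """
    lines = []
    for seg in segments:
        name = identify(seg["start"], seg["end"])
        lines.append(f"[{name}]: {seg['text'].strip()}")
    return "\n".join(lines)
```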

🤝 Contributing

Contributions are welcome! Here's how you can help:

Areas for Contribution

  • 🌍 Translations: Add more language interfaces
  • 🎨 UI/UX: Improve the user interface
  • 🔧 Features: Add new functionality
  • 🐛 Bug Fixes: Report and fix issues
  • 📚 Documentation: Improve guides and examples
  • 🧪 Testing: Test on different platforms

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add amazing feature'
  4. Push to the branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Code Style

  • Follow PEP 8 guidelines
  • Add comments for complex logic
  • Update documentation for new features
  • Test on multiple platforms when possible

πŸ› Troubleshooting

Common Issues

Problem: Models not downloading

  • Solution: Check your internet connection on first launch. Models are downloaded once and cached.

Problem: Microphone not working

  • Solution: Check system audio permissions. The app needs microphone access.

Problem: Poor speaker recognition

  • Solution:
    • Use longer voice samples (15-30 seconds)
    • Ensure clean audio without background noise
    • Re-register voices if accuracy is low

Problem: High RAM usage

  • Solution: Use the tiny or base Whisper model instead of larger variants

Problem: Application crashes on startup

  • Solution: Ensure all dependencies are installed correctly: pip install -r requirements.txt --upgrade

📊 Performance

System Requirements

Component   Minimum                           Recommended
RAM         2GB                               4GB+
CPU         Dual-core                         Quad-core+
Storage     500MB                             2GB
OS          Windows 10, macOS 10.14, Linux    Latest versions

Benchmarks

On a typical modern laptop:

  • Voice registration: ~2-5 seconds
  • 1 minute audio transcription: ~15-30 seconds
  • Speaker identification: Real-time

πŸ” Privacy & Security

  • ✅ No telemetry or tracking
  • ✅ No data sent to external servers
  • ✅ All processing is local
  • ✅ Voice profiles stored locally only
  • ✅ Open source - audit the code yourself

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2024 It's Just Notes Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

πŸ™ Acknowledgments

This project is built on open-source technologies including OpenAI Whisper, SpeechBrain, PyTorch, and CustomTkinter.

Special thanks to all contributors and the open-source community!


πŸ—ΊοΈ Roadmap

Planned Features

  • Real-time transcription display
  • Export transcripts to multiple formats (PDF, DOCX, SRT)
  • Diarization improvements
  • Custom model training
  • Batch processing mode
  • API for integration with other tools
  • Mobile version (iOS/Android)
  • Cloud sync option (optional, privacy-preserving)

⭐ Star History

If you find this project useful, please consider giving it a star! ⭐



Made with 💙 by It's Just

Website • Documentation • Community
