
🌻 It's Just Notes - Voice Identity Transcriber


Intelligent offline voice transcription with automatic speaker identification

It's Just Notes is a powerful desktop application that transcribes audio recordings while automatically identifying different speakers. Perfect for meetings, interviews, lectures, podcasts, and any multi-speaker audio content.

Screenshot Placeholder


✨ Key Features

🎤 Speaker Recognition

  • Register unlimited voice profiles
  • Automatic speaker identification during transcription
  • High-accuracy voice matching using SpeechBrain's ECAPA-VOXCELEB model
  • Visual audio segment selector for precise voice training

πŸ“ Smart Transcription

  • Powered by OpenAI's Whisper AI model
  • Real-time microphone recording
  • Support for pre-recorded audio files (MP3, WAV, OGG, M4A, FLAC)
  • Automatic language detection
  • Speaker-labeled output format: [Speaker Name]: transcribed text

🔒 Privacy First

  • 100% offline operation - no internet required after initial setup
  • All processing happens locally on your machine
  • Your voice data never leaves your computer
  • No cloud services, no data collection

🌍 Multi-Language Interface

  • English and Portuguese interfaces
  • Easy language switching
  • More languages can be added easily

💾 Voice Profile Management

  • Export voice profiles for backup or sharing
  • Import voice presets from collaborators
  • Persistent voice storage across sessions
  • Delete or update registered voices anytime

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • 4GB+ RAM recommended
  • Microphone (for live recording)
  • Windows, macOS, or Linux

Installation

  1. Clone the repository
git clone https://github.com/yourusername/itsjust-notes.git
cd itsjust-notes
  2. Install the required dependencies
pip install -r requirements.txt
  3. Run the application
python main.py

First Launch

On first launch, the application will automatically download the required AI models (~150MB):

  • Whisper base model
  • SpeechBrain speaker recognition model

This is a one-time download. After that, the app works completely offline.


📦 Dependencies

customtkinter>=5.2.0
sounddevice>=0.4.6
numpy>=1.24.0
scipy>=1.10.0
openai-whisper>=20230314
torch>=2.0.0
torchaudio>=2.0.0
speechbrain>=0.5.14

Create a requirements.txt file with the above dependencies.


📖 How to Use

1️⃣ Register Speaker Voices

Before transcribing, register the voices of people who will be speaking:

  1. Navigate to the "🎤 Manage Voices" tab
  2. Enter the person's name
  3. Choose one option:
    • 🎤 Record Voice (5s): Record 5 seconds of the person speaking
    • 📁 Upload Voice Audio: Select an existing audio file
      • Use the visual slider to select a 5-30 second segment
      • Choose clear audio where only that person is speaking

Tips for best results:

  • Use quiet environments
  • Avoid background music or noise
  • 5-30 seconds of clear speech is optimal
  • Multiple speakers can be registered

2️⃣ Transcribe Audio

After registering voices:

  1. Go to the "📝 Transcription" tab

  2. Choose your input method:

    • ▶ Start Microphone: Record live audio
    • 📁 Upload Long File: Process pre-recorded audio files
  3. The transcription will appear with speaker labels:

[John]: Hello everyone, welcome to today's meeting.
[Sarah]: Thanks John. Let's start with the quarterly review.
[Unknown]: Can everyone hear me?

3️⃣ Manage Voice Profiles

  • 📤 Export: Save individual voice profiles as .pkl files
  • 📥 Import Preset: Load voice profiles from files
  • 🗑️ Delete: Remove voice profiles you no longer need

🎯 Use Cases

  • 📊 Meeting Transcriptions - Automatically label who said what
  • 🎓 Lecture Notes - Transcribe educational content
  • 🎙️ Podcast Production - Generate speaker-labeled transcripts
  • 📞 Interview Documentation - Keep track of multi-speaker conversations
  • 🗣️ Accessibility - Create text versions of audio content
  • 📚 Research - Transcribe focus groups and qualitative interviews

βš™οΈ Configuration

Audio Settings

Default configuration in the code:

  • Sample Rate: 16,000 Hz
  • Channels: Mono (1)
  • Speaker Match Threshold: 0.28 (cosine similarity)
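The threshold is used as a cosine-similarity cutoff when matching a segment's embedding against registered profiles. A minimal sketch of that matching step (function names and the name-to-embedding dict are illustrative assumptions, not the app's exact code):

```python
import numpy as np

MATCH_THRESHOLD = 0.28  # cosine-similarity cutoff from the default config

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_speaker(segment_emb, profiles):
    """Return the best-matching registered name, or 'Unknown' if no
    profile clears the threshold. `profiles` maps name -> embedding."""
    best_name, best_score = "Unknown", MATCH_THRESHOLD
    for name, emb in profiles.items():
        score = cosine_similarity(segment_emb, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Raising the threshold makes matching stricter (more segments labeled [Unknown]); lowering it makes it more permissive.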

Whisper Model

The app uses the base model by default. For better accuracy, you can change the model loaded in main.py (line 150):

# For better accuracy (requires more RAM):
self.model = whisper.load_model("small")  # or "medium", "large"

Model comparison:

  • tiny - Fastest, least accurate (~1GB RAM)
  • base - Good balance (default, ~1GB RAM)
  • small - Better accuracy (~2GB RAM)
  • medium - High accuracy (~5GB RAM)
  • large - Best accuracy (~10GB RAM)

πŸ—‚οΈ File Structure

itsjust-notes/
├── main.py                      # Main application file
├── requirements.txt             # Python dependencies
├── vozes_cadastradas.pkl        # Saved voice profiles (auto-generated)
├── settings.json                # User settings (auto-generated)
├── README.md                    # This file
└── docs/                       # Documentation and screenshots
    └── screenshot.png

πŸ› οΈ Technical Architecture

Core Components

  1. UI Layer (customtkinter)

    • Modern, customizable interface
    • Tabbed navigation
    • Real-time progress indicators
  2. Audio Processing (sounddevice, scipy, torchaudio)

    • Microphone capture
    • Audio file loading and resampling
    • Audio segment extraction
  3. Transcription Engine (whisper)

    • Speech-to-text conversion
    • Automatic language detection
    • Segment-based processing
  4. Speaker Recognition (speechbrain)

    • Voice embedding generation
    • Cosine similarity matching
    • ECAPA-VOXCELEB pre-trained model

Data Flow

Audio Input → Whisper Transcription → Speaker Segmentation →
Voice Embedding → Speaker Matching → Labeled Transcript Output
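The final steps of this flow can be sketched as a pure function that merges Whisper-style segments with a speaker-identification callback (the segment-dict shape follows whisper's model.transcribe(...)["segments"] output; the identify callback is an assumption standing in for the embedding-and-matching stage):

```python
def label_transcript(segments, identify):
    """Turn Whisper-style segments into '[Name]: text' lines.

    segments: list of dicts with 'start', 'end', and 'text' keys, as in
      whisper's model.transcribe(...)["segments"].
    identify: assumed callback (start, end) -> speaker name or 'Unknown'.
    """
    lines = []
    for seg in segments:
        name = identify(seg["start"], seg["end"])
        lines.append(f"[{name}]: {seg['text'].strip()}")
    return "\n".join(lines)
```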

🤝 Contributing

Contributions are welcome! Here's how you can help:

Areas for Contribution

  • 🌍 Translations: Add more language interfaces
  • 🎨 UI/UX: Improve the user interface
  • 🔧 Features: Add new functionality
  • 🐛 Bug Fixes: Report and fix issues
  • 📚 Documentation: Improve guides and examples
  • 🧪 Testing: Test on different platforms

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add amazing feature'
  4. Push to the branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Code Style

  • Follow PEP 8 guidelines
  • Add comments for complex logic
  • Update documentation for new features
  • Test on multiple platforms when possible

πŸ› Troubleshooting

Common Issues

Problem: Models not downloading

  • Solution: Check your internet connection on first launch. Models are downloaded once and cached.

Problem: Microphone not working

  • Solution: Check system audio permissions. The app needs microphone access.

Problem: Poor speaker recognition

  • Solution:
    • Use longer voice samples (15-30 seconds)
    • Ensure clean audio without background noise
    • Re-register voices if accuracy is low

Problem: High RAM usage

  • Solution: Use the tiny or base Whisper model instead of larger variants

Problem: Application crashes on startup

  • Solution: Ensure all dependencies are installed correctly: pip install -r requirements.txt --upgrade

📊 Performance

System Requirements

Component   Minimum                           Recommended
RAM         2GB                               4GB+
CPU         Dual-core                         Quad-core+
Storage     500MB                             2GB
OS          Windows 10, macOS 10.14, Linux    Latest versions

Benchmarks

On a typical modern laptop:

  • Voice registration: ~2-5 seconds
  • 1 minute audio transcription: ~15-30 seconds
  • Speaker identification: Real-time

πŸ” Privacy & Security

  • ✅ No telemetry or tracking
  • ✅ No data sent to external servers
  • ✅ All processing is local
  • ✅ Voice profiles stored locally only
  • ✅ Open source - audit the code yourself

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2024 It's Just Notes Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

πŸ™ Acknowledgments

This project is built on open-source technologies including OpenAI Whisper, SpeechBrain, PyTorch, and CustomTkinter.

Special thanks to all contributors and the open-source community!


πŸ—ΊοΈ Roadmap

Planned Features

  • Real-time transcription display
  • Export transcripts to multiple formats (PDF, DOCX, SRT)
  • Diarization improvements
  • Custom model training
  • Batch processing mode
  • API for integration with other tools
  • Mobile version (iOS/Android)
  • Cloud sync option (optional, privacy-preserving)

⭐ Star History

If you find this project useful, please consider giving it a star! ⭐



Made with 💙 by It's Just

Website • Documentation • Community
