# Unified STT Notebook

**Single notebook for all Speech-to-Text models**

This notebook provides a unified interface for:
- **STT Models**: Whisper (tiny, base, small, medium, large), Faster-Whisper (optimized)
- **Input Formats**: Audio files (MP3, WAV, M4A, FLAC, OGG) and Video files (MP4, MOV, AVI, MKV, etc.)
- **Output Formats**: Text transcripts, SRT subtitles, VTT captions, JSON with timestamps
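
For video inputs, the audio track has to be pulled out with ffmpeg before transcription. The sketch below shows one way this could be done; it is not the notebook's actual implementation. The helper names, the extension set, and the 16 kHz mono WAV target are assumptions chosen for illustration (Whisper models operate on 16 kHz mono audio).

```python
from pathlib import Path

# Hypothetical extension set for illustration; the notebook's own list may differ.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def is_video(path: str) -> bool:
    """Return True when the file extension looks like a video container."""
    return Path(path).suffix.lower() in VIDEO_EXTENSIONS


def build_extract_command(video_path: str, wav_path: str) -> list:
    """Build an ffmpeg command that writes the audio track as 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", video_path,     # input file
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to a single (mono) channel
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # encode as 16-bit PCM
        wav_path,
    ]


cmd = build_extract_command("files/talk.mp4", "files/talk.wav")
print(" ".join(cmd))
```

The command list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is on the `PATH`.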

The notebook will automatically install only the dependencies you need based on your selections!

✅ **Works both locally and in Google Colab** - automatically detects environment and downloads required files.

🎬 **Video Support**: Automatically extracts audio from video files (requires ffmpeg)

🤖 **Smart Defaults**:
- **Colab**: Uses `faster-whisper-medium` (best quality, leverages GPU)
- **Local/Mac**: Uses `faster-whisper-base` (fast on CPU, good quality)
- You can override the model in the configuration section!

## 0a) Environment Detection & Setup

**This cell automatically detects if you're running in Google Colab or locally.**

If in Colab, it will download the required Python modules from the GitHub repository.
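
The download step can be made repeatable by fetching only the modules that are not already on disk. The following is a minimal sketch of that idea, not the code the cell below runs; `plan_downloads` and `fetch` are hypothetical helpers, and the base URL in the demo is a placeholder.

```python
from pathlib import Path
import urllib.request


def plan_downloads(base_url: str, modules: list, root: Path) -> list:
    """Return (url, destination) pairs for modules not yet present under root."""
    plan = []
    for module in modules:
        dest = root / module
        if not dest.exists():
            plan.append((base_url.rstrip("/") + "/" + module, dest))
    return plan


def fetch(plan: list) -> None:
    """Download every planned file, creating parent directories as needed."""
    for url, dest in plan:
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))


# Placeholder URL and directory: nothing is downloaded here, the plan is only printed.
demo_plan = plan_downloads(
    "https://example.invalid/repo/",
    ["tts_lib/__init__.py", "tts_lib/config.py"],
    Path("missing_demo_root"),
)
for url, dest in demo_plan:
    print(f"{url} -> {dest}")
```

Separating planning from fetching keeps the network call in one place and makes the skip logic easy to check without a connection.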

In [None]:
import sys
import os
from pathlib import Path

# Detect if running in Google Colab
try:
    import google.colab
    IN_COLAB = True
    print("🌐 Running in Google Colab")
except ImportError:
    IN_COLAB = False
    print("💻 Running locally")

# GitHub repository URL for downloading Python modules
GITHUB_RAW_URL = "https://raw.githubusercontent.com/SVM0N/ttsweb.github.io/main/"

# Required Python modules (in tts_lib folder - shared with TTS notebook)
REQUIRED_MODULES = [
    "tts_lib/__init__.py",
    "tts_lib/config.py",
    "tts_lib/stt_backends.py",
    "tts_lib/output_formatters.py",
    "tts_lib/stt_setup.py",
    "tts_lib/init_system_stt.py",
    "tts_lib/stt_examples.py",
    "tts_lib/cleanup.py"
]

if IN_COLAB:
    print("\n📦 Setting up Colab environment...")
    print("   Downloading required Python modules from GitHub...")
    
    import urllib.request
    
    # Create tts_lib directory
    Path("tts_lib").mkdir(exist_ok=True)
    
    for module in REQUIRED_MODULES:
        url = GITHUB_RAW_URL + module
        try:
            print(f"   → Downloading {module}...")
            urllib.request.urlretrieve(url, module)
            print(f"   ✓ {module} downloaded")
        except Exception as e:
            print(f"   ✗ Failed to download {module}: {e}")
            print(f"     URL: {url}")
    
    # Create files directory for outputs
    files_dir = Path("files")
    files_dir.mkdir(exist_ok=True)
    print(f"\n✓ Created output directory: {files_dir}")
    
    # Install ffmpeg for audio processing
    print("\n📦 Installing system dependencies for audio processing...")
    get_ipython().system('apt-get update -qq')
    get_ipython().system('apt-get install -y -qq ffmpeg')
    print("   ✓ FFmpeg installed")
    
    print("\n✓ Colab environment setup complete!")
    print("  You can now proceed with the rest of the notebook.")
    print("\n📝 Note: To upload audio files, use the file upload button in the sidebar")
    print("  or run: from google.colab import files; uploaded = files.upload()")
    
else:
    print("\n✓ Local environment detected")
    print("  Using local Python modules")
    
    # Check if required modules exist locally
    missing_modules = []
    for module in REQUIRED_MODULES:
        if not Path(module).exists():
            missing_modules.append(module)
    
    if missing_modules:
        print(f"\n⚠️  Warning: Missing modules: {', '.join(missing_modules)}")
        print("  Make sure you're running this notebook from the repository directory")
    else:
        print("  ✓ All required modules found")
    
    # Check for ffmpeg on local system
    import platform
    import subprocess
    try:
        subprocess.run(['ffmpeg', '-version'], capture_output=True, check=True)
        print("\n✓ FFmpeg detected")
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("\n⚠️  FFmpeg not found. For audio processing:")
        if platform.system() == "Darwin":
            print("   Run: brew install ffmpeg")
        elif platform.system() == "Linux":
            print("   Run: sudo apt-get install ffmpeg")
        else:
            print("   Download from: https://ffmpeg.org/")

print("\n" + "="*60)

## 0b) Conda Environment Setup (Optional - Local Only)

**This step helps you manage Python packages and avoid conflicts with your system installation.**

- If you have **conda** installed, you can create a fresh environment for this notebook
- Or use an existing environment by providing its name
- At the end of the notebook, you can easily clean up and delete the environment to free storage
- **Note**: This section is only relevant for local installations, not Google Colab
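
As a rough illustration of what such a setup involves, the sketch below checks whether `conda` is on the `PATH` and builds the command that would create a fresh environment. It is not the implementation of `interactive_conda_setup`; the function names, the environment name, and the Python version are hypothetical.

```python
import shutil


def conda_available(which=shutil.which) -> bool:
    """Return True when a conda executable can be found on the PATH."""
    return which("conda") is not None


def build_create_command(env_name: str, python_version: str = "3.11") -> list:
    """Build the command that creates a new conda environment (version is an assumption)."""
    return ["conda", "create", "--name", env_name, f"python={python_version}", "--yes"]


if conda_available():
    print("conda found; a new environment could be created with:")
    print(" ".join(build_create_command("stt-notebook")))
else:
    print("conda not found; continuing with the current Python environment")
```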

In [None]:
from tts_lib.cleanup import interactive_conda_setup

# Run interactive conda environment setup
environment_created_by_notebook, environment_name = interactive_conda_setup()

## 1) Configuration - Choose Your Setup

**Select which STT model and output formats you want to use.**

The notebook automatically chooses the best model for your environment:
- **Colab**: `faster-whisper-medium` (best quality, uses GPU)  
- **Local**: `faster-whisper-base` (fast, good quality, CPU-optimized)

You can override this by setting `STT_MODEL` manually in the cell below!
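
The selection rule can be summarized as a small function: an explicit override wins, otherwise the default depends on the environment. This is a sketch of the rule described above, with a hypothetical helper name; the configuration cell itself uses an inline `if` block instead.

```python
from typing import Optional

DEFAULT_COLAB_MODEL = "faster-whisper-medium"
DEFAULT_LOCAL_MODEL = "faster-whisper-base"


def choose_model(in_colab: bool, override: Optional[str] = None) -> str:
    """Return the override when given, else the environment-specific default."""
    if override:
        return override
    return DEFAULT_COLAB_MODEL if in_colab else DEFAULT_LOCAL_MODEL


print(choose_model(in_colab=True))
print(choose_model(in_colab=False))
print(choose_model(in_colab=False, override="faster-whisper-small"))
```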

In [None]:
# ========================================
# AUDIO/VIDEO FILE CONFIGURATION
# ========================================
# Path to your audio or video file to transcribe
# Supports audio: MP3, WAV, M4A, FLAC, OGG, etc.
# Supports video: MP4, MOV, AVI, MKV, WebM, etc. (audio will be extracted)
AUDIO_PATH = "files/audio.mp3"

# ========================================
# STT MODEL SELECTION
# ========================================
# Smart defaults based on environment:
#   - Colab (with GPU): faster-whisper-medium (best quality, leverages GPU)
#   - Local runtime: faster-whisper-base (fast, good quality, runs on CPU/Mac)
#
# You can override by uncommenting and setting STT_MODEL manually:
# STT_MODEL = "faster-whisper-small"

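# Auto-select model based on environment (skipped if STT_MODEL is already defined,
# e.g. by the override above or by an earlier run of this cell)
if 'STT_MODEL' not in globals():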
    if IN_COLAB:
        # Colab has more resources (often GPU), use better model
        STT_MODEL = "faster-whisper-medium"
        print("🌐 Colab detected: Using faster-whisper-medium (best quality)")
    else:
        # Local runtime: optimize for speed on CPU/Mac
        STT_MODEL = "faster-whisper-base"
        print("💻 Local detected: Using faster-whisper-base (fast, good quality)")

# Available models (change STT_MODEL above to use):
#   - "whisper-tiny": Fastest, least accurate (~75MB, ~1GB RAM)
#   - "whisper-base": Fast, decent accuracy (~150MB, ~1GB RAM)
#   - "whisper-small": Balanced speed/accuracy (~500MB, ~2GB RAM)
#   - "whisper-medium": Good accuracy, slower (~1.5GB, ~5GB RAM)
#   - "whisper-large": Best accuracy, slowest (~3GB, ~10GB RAM)
#   - "faster-whisper-tiny": Optimized tiny (up to 4x faster)
#   - "faster-whisper-base": Optimized base (up to 4x faster) ⭐ DEFAULT for LOCAL
#   - "faster-whisper-small": Optimized small (up to 4x faster)
#   - "faster-whisper-medium": Optimized medium (up to 4x faster) ⭐ DEFAULT for COLAB
#   - "faster-whisper-large": Optimized large (up to 4x faster, needs 10GB RAM)

# ========================================
# TRANSCRIPTION OPTIONS
# ========================================
# Language code (None = auto-detect, or use "en", "es", "fr", "de", etc.)
LANGUAGE = None

# Task type: "transcribe" or "translate" (translate converts to English)
TASK = "transcribe"

# ========================================
# OUTPUT FORMATS
# ========================================
# Select which output formats to generate (can select multiple):
OUTPUT_FORMATS = {
    "txt": True,      # Plain text transcript
    "srt": True,      # SRT subtitle format
    "vtt": True,      # WebVTT caption format
    "json": True,     # JSON with word-level timestamps
}

# ========================================
# DEVICE CONFIGURATION
# ========================================
# Device to use for STT transcription:
#   - "auto": Automatically select best device (CUDA > MPS > CPU)
#   - "cuda": Force CUDA/GPU
#   - "cpu": Force CPU
#   - "mps": Force Apple Silicon MPS (not supported by faster-whisper)

DEVICE = "auto"

# ========================================
# OUTPUT DIRECTORY
# ========================================
# Directory where transcripts will be saved
OUTPUT_DIR = "files"

# ========================================
# VALIDATION
# ========================================
if not Path(AUDIO_PATH).exists():
    print(f"⚠️  WARNING: Audio/video file not found: {AUDIO_PATH}")
    print("   Please upload a file or update AUDIO_PATH")

if "faster-whisper" in STT_MODEL and DEVICE == "mps":
    print("⚠️  WARNING: Faster-Whisper does not support MPS (Apple Silicon GPU)")
    print("   Will fall back to CPU. Use regular Whisper models for MPS support.")

if not any(OUTPUT_FORMATS.values()):
    print("⚠️  WARNING: No output formats selected!")
    print("   At least one output format should be enabled")

print("\n" + "="*60)
print("CONFIGURATION SUMMARY")
print("="*60)
print(f"Environment: {'Google Colab' if IN_COLAB else 'Local Runtime'}")
print(f"Audio/Video File: {AUDIO_PATH}")
print(f"STT Model: {STT_MODEL}")
print(f"Language: {LANGUAGE or 'Auto-detect'}")
print(f"Task: {TASK}")
print(f"Output Formats: {', '.join([fmt.upper() for fmt, enabled in OUTPUT_FORMATS.items() if enabled])}")
print(f"Device: {DEVICE}")
print(f"Output Directory: {OUTPUT_DIR}")
print("="*60)

## 1.5) Apple Silicon (MPS) Fix

**Automatically detect and fix Apple Silicon compatibility issues.**

If you're on Apple Silicon, this will enable CPU fallback for unsupported operations.
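
The device rules from the configuration cell (priority CUDA > MPS > CPU for `"auto"`, and no MPS support in Faster-Whisper) can be written as one pure function. The sketch below is an illustration under those stated rules; `resolve_device` is a hypothetical helper and may not match how `tts_lib` resolves the device.

```python
def resolve_device(requested: str, model: str, cuda_available: bool, mps_available: bool) -> str:
    """Map a requested device to one that the backend and hardware can actually use."""
    device = requested
    if device == "auto":
        # Priority order stated in the configuration cell: CUDA > MPS > CPU
        if cuda_available:
            device = "cuda"
        elif mps_available:
            device = "mps"
        else:
            device = "cpu"
    # Fall back to CPU when the requested accelerator is not present
    if device == "cuda" and not cuda_available:
        device = "cpu"
    if device == "mps" and not mps_available:
        device = "cpu"
    # Faster-Whisper models cannot run on MPS, so they fall back to CPU
    if device == "mps" and model.startswith("faster-whisper"):
        device = "cpu"
    return device


print(resolve_device("auto", "whisper-base", cuda_available=False, mps_available=True))
print(resolve_device("auto", "faster-whisper-base", cuda_available=False, mps_available=True))
```

With PyTorch installed, the two availability flags can come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.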

In [None]:
import os
import platform

# Check if we're on macOS with Apple Silicon
is_apple_silicon = (
    platform.system() == "Darwin" and 
    platform.machine() == "arm64"
)

if is_apple_silicon:
    print("🍎 Apple Silicon detected")
    print("   Enabling MPS fallback for unsupported operations...")
    
    # Set environment variable to enable CPU fallback for unsupported MPS operations
    os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
    
    print("   ✓ MPS fallback enabled")
    print("   Note: Some operations will fall back to CPU (slightly slower but works)")
else:
    print("✓ No Apple Silicon-specific fixes needed")

## 2) Install Dependencies

**Running automatic dependency installation...**

This will install only what you need based on your configuration.
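
One plausible way to install "only what you need" is to map the model name to a single backend package. The sketch below assumes the PyPI packages `openai-whisper` and `faster-whisper`; whether `install_dependencies` uses exactly this mapping is an assumption, and both helper names are hypothetical.

```python
import sys


def required_packages(stt_model: str) -> list:
    """Return the pip packages assumed to back the given model name."""
    if stt_model.startswith("faster-whisper-"):
        return ["faster-whisper"]
    if stt_model.startswith("whisper-"):
        return ["openai-whisper"]
    raise ValueError(f"Unknown STT model: {stt_model}")


def pip_command(packages: list) -> list:
    """Build a pip install command bound to the running interpreter."""
    return [sys.executable, "-m", "pip", "install", *packages]


print(pip_command(required_packages("faster-whisper-base")))
```

Calling pip through `sys.executable` installs into the interpreter that runs the notebook kernel, which avoids the common mismatch between the kernel and a `pip` found on the `PATH`.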

In [None]:
from tts_lib.stt_setup import install_dependencies

# Install dependencies based on configuration
install_dependencies(
    stt_model=STT_MODEL,
    output_formats=OUTPUT_FORMATS
)

print("\n🚀 Ready to initialize system!")

## 3) Initialize STT System

**Loading STT model...**
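
Before a model is loaded, the `STT_MODEL` string has to be split into a backend and a size. The following sketch shows one way to do that with validation; `parse_model_name` is a hypothetical helper, not necessarily what `initialize_system` does internally.

```python
KNOWN_BACKENDS = ("whisper", "faster-whisper")
KNOWN_SIZES = ("tiny", "base", "small", "medium", "large")


def parse_model_name(name: str) -> tuple:
    """Split a name such as 'faster-whisper-base' into (backend, size)."""
    # rpartition splits on the last hyphen, so 'faster-whisper' stays intact
    backend, _, size = name.rpartition("-")
    if backend not in KNOWN_BACKENDS or size not in KNOWN_SIZES:
        raise ValueError(f"Unsupported STT model: {name!r}")
    return backend, size


print(parse_model_name("faster-whisper-base"))
print(parse_model_name("whisper-large"))
```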

In [None]:
from tts_lib.init_system_stt import initialize_system

# Initialize STT backend and config
stt, config = initialize_system(
    stt_model=STT_MODEL,
    output_dir=OUTPUT_DIR,
    device=DEVICE
)

## 4) Run Transcription

Transcribe the audio file and generate output files in selected formats.
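
SRT and WebVTT differ mainly in the timestamp separator (`,` versus `.`) and in the `WEBVTT` header. The sketch below formats a list of segments both ways; the segment shape (dicts with `start`, `end`, and `text`) is an assumption, and these helpers are illustrative rather than the code in `tts_lib/output_formatters.py`.

```python
def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Format seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments: list) -> str:
    """Render segments as numbered SRT cues (comma before milliseconds)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: list) -> str:
    """Render segments as WebVTT cues (header line, dot before milliseconds)."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines.extend([f"{start} --> {end}", seg["text"].strip(), ""])
    return "\n".join(lines)


demo_segments = [
    {"start": 0.0, "end": 1.5, "text": " Hello"},
    {"start": 1.5, "end": 3.0, "text": " world"},
]
print(to_srt(demo_segments))
print(to_vtt(demo_segments))
```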

In [None]:
from tts_lib.stt_examples import run_transcription

# Run the transcription
result = run_transcription(
    stt=stt,
    config=config,
    audio_path=AUDIO_PATH,
    output_formats=OUTPUT_FORMATS,
    language=LANGUAGE,
    task=TASK
)

# Display results
print("\n" + "="*60)
print("TRANSCRIPTION COMPLETE")
print("="*60)
print("\nTranscript Preview:")
print("-" * 60)
print(result['text'][:500] + ("..." if len(result['text']) > 500 else ""))
print("-" * 60)
print("\nGenerated Files:")
for file_path in result['output_files']:
    print(f"  ✓ {file_path}")
print("="*60)

## 5) Optional Cleanup Sections

The following sections help you manage storage and environments.

### 5a) Delete Conda Environment (Optional)

If you created a new environment at the beginning of this notebook, you can delete it here to free up storage space.
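
The safety rule here is that only an environment created by this notebook should ever be removed. The sketch below shows that guard together with the conda command it would produce; `build_remove_command` is a hypothetical helper and not the implementation of `delete_conda_environment`.

```python
from typing import Optional


def build_remove_command(env_name: Optional[str], created_by_notebook: bool) -> Optional[list]:
    """Return the removal command, or None when deletion should be refused."""
    # Refuse to touch environments the notebook did not create, and never remove base
    if not created_by_notebook or not env_name or env_name == "base":
        return None
    return ["conda", "env", "remove", "--name", env_name, "--yes"]


print(build_remove_command("stt-notebook", created_by_notebook=True))
print(build_remove_command("my-existing-env", created_by_notebook=False))
```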

⚠️ **Warning**: This will permanently delete the environment and all installed packages!

In [None]:
from tts_lib.cleanup import delete_conda_environment

# Delete conda environment if created by this notebook
if 'environment_created_by_notebook' not in globals():
    print("✗ No environment tracking found")
    print("This cell only works if you ran the environment setup cell at the beginning")
else:
    success, environment_created_by_notebook, environment_name = delete_conda_environment(
        environment_name, 
        environment_created_by_notebook
    )

### 5b) Delete Model Caches (Optional)

Delete downloaded models and caches to free up disk space.
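
Before deleting anything, it can help to see how much space the caches take. The sketch below sums file sizes under two commonly used default locations; these paths are assumptions (they change if a custom download directory or cache environment variable is set), and `dir_size_bytes` is a hypothetical helper rather than part of `interactive_cache_cleanup`.

```python
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of regular files under path; return 0 if it does not exist."""
    if not path.exists():
        return 0
    # Skip symlinks so that linked files are not counted twice
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


# Assumed default cache locations; adjust if your setup stores models elsewhere.
CANDIDATE_CACHES = [
    Path.home() / ".cache" / "whisper",
    Path.home() / ".cache" / "huggingface" / "hub",
]

for cache in CANDIDATE_CACHES:
    print(f"{cache}: {dir_size_bytes(cache) / 1e6:.1f} MB")
```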

⚠️ **Warning**: Models will need to be re-downloaded if you run the notebook again!

In [None]:
from tts_lib.cleanup import interactive_cache_cleanup

# Run interactive cache cleanup
interactive_cache_cleanup()