# Somali Solfege Converter - Audio Processing

## PRSE (Python Rapid-Systems Engine) Mode

This notebook implements **Phase 1: Initialization** of the Somali Solfege Converter system.

### Design Blueprint: Phase 1

* **Architectural Choice:** Procedural. The processing is a linear pipeline (extract, load, resample, normalize), so a plain sequence of functions is sufficient for this automation
* **Key Libraries:** `moviepy` (Video handling), `scipy` (I/O), `numpy` (DSP)
* **Memory Strategy:** Immediate conversion to `float32` and deletion of video objects after audio extraction
* **Target Sample Rate:** 22.05 kHz (half the 44.1 kHz CD rate), which halves the sample count for memory efficiency on 8 GB RAM systems
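
To make the memory strategy concrete, the arithmetic can be sketched as below. The helper `audio_buffer_mib` and the 5-minute duration are hypothetical, chosen only for illustration; the comparison assumes the alternative would be stereo audio held as `float64`, which is NumPy's default floating-point type.

```python
def audio_buffer_mib(duration_s, sample_rate, channels=1, bytes_per_sample=4):
    """Approximate size in MiB of an uncompressed in-memory audio buffer."""
    n_bytes = duration_s * sample_rate * channels * bytes_per_sample
    return n_bytes / (1024 ** 2)

# Hypothetical 5-minute recording (illustrative value, not from the notebook).
duration_s = 5 * 60

# Stereo at 44.1 kHz stored as float64 (8 bytes per sample).
stereo_44k_f64 = audio_buffer_mib(duration_s, 44100, channels=2, bytes_per_sample=8)

# Mono at 22.05 kHz stored as float32 (4 bytes per sample), as in the blueprint.
mono_22k_f32 = audio_buffer_mib(duration_s, 22050, channels=1, bytes_per_sample=4)

print(f"44.1 kHz stereo float64: {stereo_44k_f64:.1f} MiB")
print(f"22.05 kHz mono float32:  {mono_22k_f32:.1f} MiB")
print(f"Reduction factor: {stereo_44k_f64 / mono_22k_f32:.0f}x")
```

Halving the sample rate, dropping one channel, and halving the sample width each contribute a factor of two, which is where the overall factor of eight comes from.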

### Supported Formats

* **Video:** .mp4, .mov, .avi, .mkv (audio will be extracted automatically)
* **Audio:** .wav, .mp3, .flac, .ogg

## Cell 1: Environment & Dependencies

This cell checks that each required library can be imported and installs any that are missing into the active Python environment.
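
As a complementary check, the sketch below reports whether the interpreter is running inside a virtual environment and which packages cannot be located, without importing them. It uses only the standard library. The helper names `in_virtualenv` and `missing_packages` are hypothetical, and the `sys.prefix` comparison detects `venv`/`virtualenv` environments only (a conda environment, for example, would not be reported).

```python
import importlib.util
import sys

def in_virtualenv():
    """True when the interpreter runs inside a venv/virtualenv environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

def missing_packages(names):
    """Return the top-level package names that cannot be located (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ['moviepy', 'numpy', 'scipy', 'matplotlib', 'librosa']
print(f"Inside a virtual environment: {in_virtualenv()}")
print(f"Missing packages: {missing_packages(required)}")
```

Because `find_spec` only locates a package, this check is faster than importing heavy libraries, but it does not prove that the import itself would succeed.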

In [None]:
import sys
import subprocess

# List of required libraries
libraries = ['moviepy', 'numpy', 'scipy', 'matplotlib', 'librosa']

def check_setup():
    print("Checking environment dependencies...")
    for lib in libraries:
        try:
            __import__(lib)
            print(f"✅ {lib} is installed.")
        except ImportError:
            print(f"❌ {lib} missing. Installing now...")
            subprocess.check_call([sys.executable, "-m", "pip", "install", lib])

check_setup()

## Cell 2: Audio Extraction & Pre-processing

This cell detects whether the input is a video and, if so, extracts the audio stream; otherwise it loads the audio file directly.
It then converts the audio to mono and resamples it to **22.05 kHz** for memory efficiency.

### Features:
* Automatic video-to-audio extraction
* Stereo-to-mono conversion
* Downsampling to 22.05kHz
* Memory-efficient float32 normalization
* Automatic cleanup of temporary files
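
One caveat on the downsampling step: keeping every Nth sample is fast, but content above the new Nyquist frequency (11.025 kHz for a 22.05 kHz target) folds back into the lower band unless it is low-pass filtered first. The sketch below illustrates the effect with a synthetic 15 kHz test tone; the tone and the helper `dominant_frequency` are assumptions made for this illustration, not part of the pipeline.

```python
import numpy as np

def dominant_frequency(signal, sr):
    """Return the frequency (Hz) of the strongest bin in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(freqs[np.argmax(spectrum)])

sr = 44100
t = np.arange(sr) / sr                  # one second of sample times
tone = np.sin(2 * np.pi * 15000 * t)    # 15 kHz tone, above the new Nyquist of 11.025 kHz

decimated = tone[::2]                   # keep every second sample, no filtering
new_sr = sr // 2

original_peak = dominant_frequency(tone, sr)
alias_peak = dominant_frequency(decimated, new_sr)
print(f"Peak before decimation: {original_peak:.0f} Hz")
print(f"Peak after naive decimation: {alias_peak:.0f} Hz")
```

The tone reappears at 22050 - 15000 = 7050 Hz, a frequency that was never in the recording. A filtered resampler such as `scipy.signal.resample_poly` suppresses such content before reducing the rate.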

In [None]:
import os
import gc
from math import gcd

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

# Import moviepy - compatible with both v1.x and v2.x
try:
    from moviepy import VideoFileClip
except ImportError:
    from moviepy.editor import VideoFileClip

VIDEO_EXTENSIONS = ('.mp4', '.mov', '.avi', '.mkv')

def prepare_audio_input(file_path, target_sr=22050):
    """
    Extracts audio from video if needed and loads it into memory.
    Optimized for 8GB RAM using float32 and downsampling.
    """
    ext = os.path.splitext(file_path)[1].lower()
    temp_audio = "temp_extracted_audio.wav"
    is_video = ext in VIDEO_EXTENSIONS

    try:
        # Step 1: Video to Audio Extraction (If needed)
        if is_video:
            print(f"Video detected. Extracting audio from {file_path}...")
            video = VideoFileClip(file_path)
            try:
                if video.audio is None:
                    raise ValueError(f"No audio track found in {file_path}")
                # logger=None silences progress output; 'verbose' is omitted
                # because moviepy 2.x no longer accepts it
                video.audio.write_audiofile(temp_audio, fps=target_sr, logger=None)
            finally:
                video.close()  # Close file handle immediately
                del video
            load_path = temp_audio
        else:
            load_path = file_path

        # Step 2: Load and Downsample
        print("Loading and normalizing audio...")
        if is_video or ext == '.wav':
            sr, data = wavfile.read(load_path)
            data = data.astype(np.float32)

            # Convert to Mono if Stereo
            if data.ndim > 1:
                data = data.mean(axis=1)

            # Resample with an anti-aliasing filter; handles non-integer rate ratios
            if sr != target_sr:
                g = gcd(int(sr), int(target_sr))
                data = resample_poly(data, target_sr // g, sr // g)
        else:
            # scipy's wavfile reads only WAV, so decode .mp3/.flac/.ogg with librosa
            import librosa
            data, _ = librosa.load(load_path, sr=target_sr, mono=True)
    finally:
        # Cleanup
        if is_video and os.path.exists(temp_audio):
            os.remove(temp_audio)

    # Memory-safe conversion and peak normalization
    samples = np.asarray(data, dtype=np.float32)
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    if peak > 0:
        samples /= peak

    del data
    gc.collect()

    print(f"Done. Loaded {len(samples)/target_sr:.2f}s of audio at {target_sr}Hz.")
    return samples, target_sr

# --- TEST THE CELL ---
# Uncomment the lines below and provide your input file path
# INPUT_FILE = "your_video_or_audio_here.mp4" 
# samples, sr = prepare_audio_input(INPUT_FILE)
print("Audio extraction function ready. Set INPUT_FILE and run to process.")

## Cell 3: Visualize Audio Waveform (Optional)

Once you have loaded audio, you can visualize it to verify the extraction worked correctly.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def plot_waveform(samples, sr, duration_limit=10.0):
    """
    Plot the audio waveform.
    
    Args:
        samples: Audio samples array
        sr: Sample rate
        duration_limit: Maximum duration to plot in seconds (default: 10s)
    """
    # Limit the plot to avoid memory issues
    max_samples = int(duration_limit * sr)
    plot_samples = samples[:max_samples]
    
    time = np.arange(len(plot_samples)) / sr
    
    plt.figure(figsize=(12, 4))
    plt.plot(time, plot_samples, linewidth=0.5)
    plt.xlabel('Time (seconds)')
    plt.ylabel('Amplitude')
    plt.title(f'Audio Waveform (first {len(plot_samples) / sr:.1f}s)')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

# --- VISUALIZE AUDIO ---
# Uncomment to visualize after loading audio
# plot_waveform(samples, sr)
print("Waveform visualization function ready.")
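
A numeric summary can complement the plot when verifying the extraction. The sketch below is a minimal example under stated assumptions: `summarize_audio` is a hypothetical helper, and the synthetic 440 Hz tone stands in for real loaded audio.

```python
import numpy as np

def summarize_audio(samples, sr):
    """Return basic statistics for checking that loaded audio looks sane."""
    samples = np.asarray(samples)
    if samples.size == 0:
        return {"duration_s": 0.0, "dtype": str(samples.dtype),
                "peak": 0.0, "rms": 0.0, "is_mono": samples.ndim == 1}
    as_f64 = samples.astype(np.float64)  # accumulate in float64 for accuracy
    return {
        "duration_s": samples.shape[0] / sr,
        "dtype": str(samples.dtype),
        "peak": float(np.max(np.abs(as_f64))),
        "rms": float(np.sqrt(np.mean(as_f64 ** 2))),
        "is_mono": samples.ndim == 1,
    }

# Synthetic stand-in for loaded audio: 2 seconds of a 440 Hz tone at 22.05 kHz.
demo_sr = 22050
demo_t = np.arange(2 * demo_sr) / demo_sr
demo = np.sin(2 * np.pi * 440 * demo_t).astype(np.float32)

summary = summarize_audio(demo, demo_sr)
print(summary)
```

For peak-normalized audio the peak should be close to 1.0, the dtype should be `float32`, and the duration should match the source file; a peak of 0.0 would indicate silence or a failed extraction.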

## Next Steps: Phase 2 - Pitch Detection

The next phase will implement:
1. **YIN Algorithm** for pitch detection
2. **Note Segmentation** to identify individual notes
3. **Somali Pentatonic Scale Mapping** for solfege conversion

This will allow the system to analyze the musical content and convert it to Somali solfege notation.
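
Phase 2 is not implemented in this notebook. As a preview of the first item, the sketch below shows a simplified single-frame version of the YIN steps (difference function, cumulative mean normalized difference, absolute threshold, parabolic interpolation). The function name `yin_pitch`, the frequency range, the threshold of 0.1, and the synthetic 220 Hz test tone are all assumptions made for illustration, not the project's eventual implementation.

```python
import numpy as np

def yin_pitch(frame, sr, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Estimate the fundamental frequency (Hz) of one frame, or None if unvoiced."""
    frame = np.asarray(frame, dtype=np.float64)
    tau_min = max(1, int(sr / fmax))   # shortest lag (highest pitch) considered
    tau_max = int(sr / fmin)           # longest lag (lowest pitch) considered
    window = len(frame) - tau_max      # samples compared at every lag
    if window <= 0:
        raise ValueError("frame is too short for the requested fmin")

    # Step 1: difference function d(tau) = sum_j (x[j] - x[j + tau])^2
    diff = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        delta = frame[:window] - frame[tau:tau + window]
        diff[tau] = np.dot(delta, delta)

    # Step 2: cumulative mean normalized difference (defined as 1 at tau = 0)
    taus = np.arange(1, tau_max + 1)
    running = np.cumsum(diff[1:])
    cmnd = np.ones(tau_max + 1)
    np.divide(diff[1:] * taus, running, out=cmnd[1:], where=running > 0)

    # Step 3: first lag under the threshold, then follow it down to its local minimum
    tau = tau_min
    while tau < tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            break
        tau += 1
    else:
        return None  # no clear periodicity: treat the frame as unvoiced

    # Step 4: parabolic interpolation around the minimum for sub-sample precision
    refined = float(tau)
    if 0 < tau < tau_max:
        s0, s1, s2 = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
        denom = s0 - 2 * s1 + s2
        if denom != 0:
            refined += 0.5 * (s0 - s2) / denom
    return sr / refined

# Synthetic check: one 2048-sample frame of a 220 Hz tone at 22.05 kHz.
sr = 22050
t = np.arange(2048) / sr
estimate = yin_pitch(np.sin(2 * np.pi * 220 * t), sr)
print(f"Estimated pitch: {estimate:.1f} Hz")
print(f"Silent frame: {yin_pitch(np.zeros(2048), sr)}")
```

A full implementation would slide this over overlapping frames and vectorize the difference function; `librosa.yin` provides a ready-made version that could be used instead.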