# Audio Preprocessing Assignment


## Submitted by: S.Akil


#### Goal: Download public domain podcast-style audio → 16kHz mono speech (basic cleaning, ready for transcription)

#### Steps we will do:

##### 1. Download audio files from online public domain sources (LibriVox MP3s).
##### 2. Load with librosa at the native sample rate, then resample to 16kHz (librosa.load defaults to 22050Hz unless sr is set).
##### 3. Convert stereo → mono.
##### 4. Trim leading/trailing silence (using librosa.effects.trim).
##### 5. Normalize volume.
##### 6. Basic energy-based silence removal (simple VAD alternative).

## 1. Install & Import (minimal, run once)

In [None]:
import librosa  # Audio loading, effects, and resampling wizard
import numpy as np  # Math helper for arrays and peaks
import soundfile as sf  # For saving the processed .wav (high-quality)
import matplotlib.pyplot as plt  # For visualization
print("✅ Imports loaded! Ready for step-by-step processing.")


## Setup File Paths

In [None]:
# Define paths once; use these in all steps below
sample_input = r'C:\Users\Akil S\OneDrive\Desktop\infosys\archive\Data\genres_original\blues\blues.00000.wav'
sample_output = 'blues_processed_16kHz.wav'  # Saves in current Jupyter folder

# Optional: Check if file exists
import os
if os.path.exists(sample_input):
    print(f"✅ Input file ready: {sample_input}")
    print(f"   File size: {os.path.getsize(sample_input) / (1024*1024):.1f} MB")
else:
    print("❌ Input file not found. Check the path!")

print("🔄 Setup complete. Run next cell to load audio.")


## Step 1: Load the Original Audio (Raw Waveform + Sample Rate)

In [None]:
# Step 1: Load Original Audio (Raw Waveform + Sample Rate)
# Why? Gets the raw data; keeps stereo if present for later averaging.

if 'sample_input' not in locals():
    print("❌ Run Cell 2 first to set paths!")
else:
    try:
        y_original, sr_original = librosa.load(sample_input, sr=None, mono=False)  # sr=None keeps native rate; mono=False keeps stereo
        n_samples = y_original.shape[-1]  # Last axis holds samples for both mono (n,) and stereo (2, n)
        print(f"🔄 Step 1: Loaded {sample_input}")
        print(f"   📊 Loaded: Duration={n_samples/sr_original:.2f}s, Channels={'Stereo' if y_original.ndim > 1 else 'Mono'}, SR={sr_original}Hz")
        print(f"   🔍 Shape: {y_original.shape} (samples)")
    except Exception as e:
        raise ValueError(f"❌ Load failed: {e}") from e

print("✅ Step 1 complete. y_original and sr_original ready for next step.")
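
With `mono=False`, librosa returns stereo audio as an array shaped `(channels, samples)`, so `len()` on it gives the channel count rather than the sample count. A minimal sketch of a duration helper that works for both layouts (the name `duration_seconds` and the 22050Hz rate are assumptions for illustration):

```python
import numpy as np

def duration_seconds(y, sr):
    """Duration from the last axis, which holds samples for mono and stereo alike."""
    return y.shape[-1] / sr

sr = 22050  # assumed sample rate for this illustration
mono = np.zeros(sr * 3)          # shape (66150,): 3 seconds, one channel
stereo = np.zeros((2, sr * 3))   # shape (2, 66150): 3 seconds, two channels

print(len(stereo))                   # 2: the number of channels, not samples
print(duration_seconds(mono, sr))    # 3.0
print(duration_seconds(stereo, sr))  # 3.0
```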


## Step 2: Resample to 16kHz & Convert to Mono

In [None]:
# Step 2: Resample to 16kHz & Convert to Mono
# Why? 16kHz is efficient (cuts samples ~27% from 22kHz); mono averages L/R for simplicity.
# Librosa resamples smoothly (anti-aliasing filters prevent distortion).

if 'y_original' not in locals() or 'sr_original' not in locals():
    print("❌ Run Cell 3 first to load audio!")
else:
    target_sr = 16000  # Standard for speech/transcription
    
    # Resample first (works for mono or stereo)
    y_resampled = librosa.resample(y_original, orig_sr=sr_original, target_sr=target_sr)
    
    # Convert to mono if stereo
    if y_resampled.ndim > 1:  # Stereo shape: (2, samples)
        y_mono = np.mean(y_resampled, axis=0)  # Average channels: (samples,)
        print("   🔄 Averaged stereo channels to mono")
    else:
        y_mono = y_resampled
    
    sr_processed = target_sr
    print(f"🔄 Step 2: Resampling {sample_input}")
    print(f"   📊 New: Duration={len(y_mono)/sr_processed:.2f}s, Mono, SR={sr_processed}Hz")
    print(f"   🔍 Shape: {y_mono.shape} (reduced from {y_original.shape})")

print("✅ Step 2 complete. y_mono and sr_processed ready for next step.")
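
The "~27%" figure and the mono averaging can be checked with plain arithmetic. The sketch below assumes a 30-second clip at 22050Hz (an illustrative length, not measured from the file) and approximates the resampled length as the input length scaled by the rate ratio:

```python
import math
import numpy as np

orig_sr, target_sr = 22050, 16000
n_in = 30 * orig_sr                            # assumed 30 s clip: 661500 samples
n_out = math.ceil(n_in * target_sr / orig_sr)  # approximate length after resampling
reduction = 1 - n_out / n_in
print(n_in, n_out, f"{reduction:.1%}")         # 661500 480000 27.4%

# Mono downmix: average the two channels sample by sample
stereo = np.array([[1.0, 0.5, -1.0],
                   [0.0, 0.5,  1.0]])          # shape (2, 3)
mono = stereo.mean(axis=0)                     # shape (3,)
print(mono)
```

Note that averaging cancels content that is out of phase between the channels, as the last sample in the toy array shows.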


## Step 3: Trim Leading/Trailing Silence

In [None]:
# Step 3: Trim Silence (Remove Long Quiet Starts/Ends)
# Why? GTZAN files often have fade-ins/outs; trimming focuses on content and shortens clips.
# top_db=20: Trim everything quieter than 20dB below the peak (lower top_db = more aggressive trimming).

if 'y_mono' not in locals() or 'sr_processed' not in locals():
    print("❌ Run Cell 4 first to resample!")
else:
    trim_db = 20  # dB below peak; higher = less trimming
    
    y_trimmed, trim_info = librosa.effects.trim(y_mono, top_db=trim_db)
    print("🔄 Step 3: Trimming silence in resampled audio")
    print(f"   ✂️ From {len(y_mono)/sr_processed:.2f}s to {len(y_trimmed)/sr_processed:.2f}s")
    print(f"   📍 Trimmed indices: {trim_info} (start={trim_info[0]/sr_processed:.2f}s, end={trim_info[1]/sr_processed:.2f}s)")

print("✅ Step 3 complete. y_trimmed ready for next step.")
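
To make the threshold concrete, here is a simplified sketch of peak-relative trimming: compute frame RMS, express it in dB relative to the loudest frame, and keep the span from the first to the last frame above `-top_db`. This is an illustration only, not librosa's implementation; `trim_silence_sketch` and its frame settings are assumptions.

```python
import numpy as np

def trim_silence_sketch(y, top_db=20, frame_length=2048, hop_length=512):
    """Return (trimmed audio, (start, end)) using a peak-relative RMS threshold."""
    if len(y) == 0:
        return y, (0, 0)
    n_frames = 1 + max(0, len(y) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    ref = rms.max()
    if ref == 0:                       # entirely silent input: nothing to keep
        return y[:0], (0, 0)
    db = 20 * np.log10(np.maximum(rms, 1e-10) / ref)  # dB relative to loudest frame
    active = np.flatnonzero(db > -top_db)
    start = int(active[0]) * hop_length
    end = min(len(y), int(active[-1]) * hop_length + frame_length)
    return y[start:end], (start, end)

# Half a second of silence, one second of tone, half a second of silence
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])
trimmed, (start, end) = trim_silence_sketch(y, top_db=20)
print(f"kept {start / sr:.2f}s to {end / sr:.2f}s of {len(y) / sr:.2f}s")
```

Because the decision is made per frame, the kept span extends slightly beyond the tone on both sides.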


## Step 4: Normalize Volume

In [None]:
# Step 4: Normalize Amplitude (Scale to Peak [-1, 1])
# Why? Ensures all tracks have similar volume, which is key for fair feature extraction (e.g., MFCCs won't bias loud songs).
# np.max(np.abs()) finds peak, divides to cap at 1 (preserves shape, just scales).

if 'y_trimmed' not in locals():
    print("❌ Run Cell 5 first to trim!")
else:
    normalize = True  # Set False to skip
    
    if normalize:
        peak = np.max(np.abs(y_trimmed))
        if peak > 0:  # Avoid divide-by-zero (silent files)
            y_normalized = y_trimmed / peak
            print("🔄 Step 4: Normalizing trimmed audio")
            print(f"   📐 Peak from {peak:.3f} to 1.0 (range now [{np.min(y_normalized):.3f}, {np.max(y_normalized):.3f}])")
        else:
            y_normalized = y_trimmed  # Already silent
            print("   ⚠️ File was silent; no normalization needed.")
    else:
        y_normalized = y_trimmed
        print("   ⏭️ Normalization skipped.")

print("✅ Step 4 complete. y_normalized ready for next step.")
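
Peak normalization is a single scale factor, so relative levels inside the clip are unchanged. A small sketch on a toy array; the helper `peak_normalize` and its `target_peak` parameter are hypothetical additions (for example, `target_peak=10 ** (-1 / 20)` would leave about 1dB of headroom):

```python
import numpy as np

def peak_normalize(y, target_peak=1.0):
    """Scale y so its largest absolute sample equals target_peak."""
    peak = np.max(np.abs(y))
    if peak == 0:              # silent input: nothing to scale
        return y.copy()
    return y * (target_peak / peak)

y = np.array([0.1, -0.4, 0.2])
out = peak_normalize(y)
print(out)                     # approximately [0.25, -1.0, 0.5]
print(out[0] / out[2], y[0] / y[2])  # ratios between samples are preserved
```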


## Step 5: Basic Energy-Based Silence Removal

In [None]:
# Step 5: Reduce Silence (Fixed Attenuation of Quiet Samples)
# Why? Attenuates quiet parts without full VAD: makes speech stand out, reduces background noise.
# Samples more than 20dB below the peak are scaled by 1/ratio (0.25x amplitude, about -12dB),
# e.g. a -30dB sample comes out at roughly -42dB.

if 'y_normalized' not in locals():
    print("❌ Run Cell 6 first to normalize!")
else:
    threshold_db = -20  # Attenuate samples below this level (dB relative to peak)
    ratio = 4  # Quiet samples are multiplied by 1/ratio (0.25x)
    
    # Convert to dB relative to the peak
    y_db = librosa.amplitude_to_db(np.abs(y_normalized), ref=np.max)
    
    # Identify & attenuate quiet parts
    below_threshold = y_db < threshold_db
    gain = np.where(below_threshold, 1 / ratio, 1.0)  # Linear gain: 0.25x for quiet samples, 1x otherwise
    y_compressed = y_normalized * gain  # gain is already linear, so multiply directly (no dB conversion)
    
    y_processed = y_compressed  # Final waveform
    print("🔄 Step 5: Reducing silence in normalized audio")
    print(f"   🔉 Attenuated samples below {threshold_db}dB by a factor of {ratio}")
    print(f"   📊 Non-silent %: ~{100 * np.mean(np.abs(y_processed) > 0.01):.1f}% (rough activity estimate)")

print("✅ Step 5 complete. y_processed ready for save/visualization.")
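
The cell above decides sample by sample, which also scales the low-amplitude parts of loud waveforms near their zero crossings. A frame-based variant that actually removes quiet stretches is sketched below as an alternative, not as the notebook's method; `remove_quiet_frames` and its parameters are assumptions. (librosa also provides `librosa.effects.split` for locating non-silent intervals.)

```python
import numpy as np

def remove_quiet_frames(y, threshold_db=-20.0, frame_length=512):
    """Drop non-overlapping frames whose RMS falls below threshold_db
    relative to the loudest frame. A trailing partial frame is discarded."""
    n_frames = len(y) // frame_length
    if n_frames == 0:
        return y.copy()
    frames = y[: n_frames * frame_length].reshape(n_frames, frame_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    ref = rms.max()
    if ref == 0:                       # entirely silent input: nothing to keep
        return y[:0].copy()
    rms_db = 20 * np.log10(np.maximum(rms, 1e-10) / ref)
    keep = rms_db >= threshold_db
    return frames[keep].reshape(-1)

# Silence, then a loud segment, then a very quiet segment (about -54dB relative)
y = np.concatenate([np.zeros(1024), 0.5 * np.ones(1024), 0.001 * np.ones(1024)])
out = remove_quiet_frames(y)
print(len(y), len(out))  # 3072 1024
```

Cutting frames changes the clip's timing, so this suits transcription input better than tasks that need the original alignment.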


## Step 6: Save the Processed Audio

In [None]:
# Step 6: Save Processed Audio (High-Quality .wav)
# Why? soundfile saves without compression loss, so the file is ready for transcription or features.

if 'y_processed' not in locals() or 'sr_processed' not in locals():
    print("❌ Run Cell 7 first to process!")
else:
    if 'sample_output' not in locals():
        print("❌ Run Cell 2 first for output path!")
    else:
        sf.write(sample_output, y_processed, sr_processed)
        print("🔄 Step 6: Saving processed audio")
        print(f"   💾 Saved: {sample_output} (Duration={len(y_processed)/sr_processed:.2f}s, Mono 16kHz)")
        print(f"   🔍 Final shape: {y_processed.shape}")

print("✅ All steps complete! Check your folder for the WAV file.")
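
After saving, it can be worth confirming that the file on disk really is 16kHz mono. The sketch below uses Python's standard-library `wave` module rather than soundfile, and writes its own small test tone to a temporary file (the file name is hypothetical) so that it runs on its own; in practice the read-back part would point at `sample_output`, assuming the file was stored as integer PCM.

```python
import math
import os
import struct
import tempfile
import wave

sr = 16000
# One second of a 440Hz tone, converted to 16-bit PCM
samples = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
pcm = struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in samples))

path = os.path.join(tempfile.mkdtemp(), "check_16kHz.wav")  # hypothetical file name
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)   # mono
    wf.setsampwidth(2)   # 2 bytes = 16-bit samples
    wf.setframerate(sr)
    wf.writeframes(pcm)

# Read the header back and check channels, sample rate, and frame count
with wave.open(path, "rb") as wf:
    info = (wf.getnchannels(), wf.getframerate(), wf.getnframes())

print(info)  # (1, 16000, 16000)
```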


## Final Stats & Visualization

In [None]:
# Final Stats & Visualization
# Why? Quantify changes; plot to "see" the waveform.

if 'y_processed' not in locals() or 'sr_processed' not in locals():
    print("❌ Run Cell 7 first to process!")
else:
    print("\n📈 Final Stats:")
    print(f" - Amplitude range: [{np.min(y_processed):.3f}, {np.max(y_processed):.3f}] (normalized!)")
    print(f" - Zero-crossing rate: ~{np.sum(np.diff(np.sign(y_processed)) != 0)/len(y_processed)*100:.1f}% of samples (rough noisiness measure, not a silence estimate)")
    print(f" - Total duration: {len(y_processed)/sr_processed:.2f}s")
    
    # Visualize
    plt.figure(figsize=(10, 4))
    time_axis = np.arange(len(y_processed)) / sr_processed  # Sample n occurs at n/sr seconds
    plt.plot(time_axis, y_processed)
    plt.title('Processed Audio: 16kHz Mono, Normalized, Silence Handled')
    plt.xlabel('Time (s)')
    plt.ylabel('Amplitude')
    plt.grid(True)
    plt.show()

print("🎉 Processing pipeline done! Experiment by re-running cells.")
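
Zero-crossing rate and amplitude-based activity measure different things: the first tracks how often the waveform changes sign, the second how much of the clip is above a loudness floor. A small sketch on a synthetic signal; both helper names and the 0.01 threshold are assumptions for illustration:

```python
import numpy as np

def zero_crossing_rate(y):
    """Fraction of adjacent sample pairs whose sign differs."""
    sign = np.signbit(y)
    return np.mean(sign[1:] != sign[:-1])

def active_fraction(y, threshold=0.01):
    """Fraction of samples whose absolute amplitude exceeds threshold."""
    return np.mean(np.abs(y) > threshold)

sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 100 * t + 0.3)   # 1 s of a 100Hz tone
signal = np.concatenate([tone, np.zeros(sr)])    # followed by 1 s of silence

crossings = zero_crossing_rate(tone) * (len(tone) - 1)
print(f"tone crossings per second: {crossings:.0f}")   # about 2 per cycle
print(f"active fraction: {active_fraction(signal):.2f}")  # roughly half the clip
```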
