
# Audio Analysis – Preprocessing Assignment

**Submitted by:** Goutham G  
**Date:** 27th November 2025  

## Objective
Automatic Podcast Transcription and Topic Segmentation

This notebook walks through the standard audio preprocessing steps applied before audio is passed to automatic speech recognition (ASR) and natural language processing (NLP) pipelines.



## 1. Installing & Importing Libraries


In [1]:
%pip install librosa noisereduce soundfile webrtcvad numpy matplotlib

Collecting webrtcvad
  Using cached webrtcvad-2.0.10.tar.gz (66 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: webrtcvad
  Building wheel for webrtcvad (pyproject.toml): started
  Building wheel for webrtcvad (pyproject.toml): finished with status 'error'
Failed to build webrtcvad
Note: you may need to restart the kernel to use updated packages.


  error: subprocess-exited-with-error
  
  × Building wheel for webrtcvad (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [23 lines of output]
      !!
      
              ********************************************************************************
              Please consider removing the following classifiers in favor of a SPDX license expression:
      
              License :: OSI Approved :: MIT License
      
              See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
              ********************************************************************************
      
      !!
        self._finalize_license_expression()
      running bdist_wheel
      running build
      running build_py
      creating build\lib.win-amd64-cpython-311
      copying webrtcvad.py -> build\lib.win-amd64-cpython-311
      running build_ext
      building '_webrtcvad' extension
      error: Microsoft Visual C++ 14.0 or greater is 

In [2]:
import librosa
import numpy as np
import soundfile as sf
import noisereduce as nr
import matplotlib.pyplot as plt
import webrtcvad
import os


## 2. Load & Inspect Audio File


In [3]:
# Raw string so the Windows backslashes are not parsed as escape sequences
audio_path = r"..\dataset\Harward mini OR\OSR_us_000_0010_8k.wav"

y, sr = librosa.load(audio_path, sr=None)
print(f"Sample Rate: {sr}")
print(f"Duration: {len(y)/sr:.2f} seconds")

Sample Rate: 8000
Duration: 33.62 seconds



## 3. Resample Audio to 16 kHz


In [4]:

target_sr = 16000
if sr != target_sr:
    y = librosa.resample(y, orig_sr=sr, target_sr=target_sr)
    sr = target_sr
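
As a quick sanity check on the resampling step, the sketch below computes how many samples to expect after a rate change. It assumes the output length is the input length scaled by `target_sr / orig_sr` and rounded up; treat that rounding rule as an assumption rather than a guarantee about `librosa.resample`. `expected_num_samples` is a hypothetical helper, not a librosa function. Note that upsampling from 8 kHz to 16 kHz doubles the sample count but cannot restore frequency content above the original 4 kHz Nyquist limit.

```python
import math

def expected_num_samples(n_samples: int, orig_sr: int, target_sr: int) -> int:
    """Hypothetical helper: sample count after resampling, assuming ceil rounding."""
    return math.ceil(n_samples * target_sr / orig_sr)

# The clip loaded above lasts about 33.62 s at 8 kHz (duration is rounded to 2 decimals).
n_in = round(33.62 * 8000)
n_out = expected_num_samples(n_in, orig_sr=8000, target_sr=16000)
print(n_in, n_out)
```

Comparing this value with `len(y)` after resampling is a cheap way to confirm the step ran as intended.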



## 4. Convert Stereo to Mono


In [5]:

y = librosa.to_mono(y)
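
`librosa.load` already returns mono audio unless `mono=False` is passed, so the call above leaves this particular signal unchanged. For a multichannel array shaped `(channels, samples)`, downmixing amounts to averaging the channels. The sketch below shows that idea with plain NumPy; `downmix_to_mono` is a hypothetical name used only for illustration.

```python
import numpy as np

def downmix_to_mono(y: np.ndarray) -> np.ndarray:
    """Average the channels of a (channels, samples) array; pass 1-D input through."""
    if y.ndim == 1:
        return y  # already mono
    return y.mean(axis=0)

# Two channels, three samples each
stereo = np.array([[0.2, 0.4, -0.6],
                   [0.0, 0.2, -0.2]])
mono = downmix_to_mono(stereo)
print(mono.shape)
```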



## 5. Amplitude Normalization


In [6]:

# Peak-normalize to [-1, 1]; the small floor avoids dividing by zero on silent audio
y = y / max(np.max(np.abs(y)), 1e-9)



## 6. Noise Reduction


In [7]:

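# Spectral-gating noise reduction; no separate noise clip is passed, so the noise estimate comes from the signal itself
y_denoised = nr.reduce_noise(y=y, sr=sr)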



## 7. Silence Removal


In [8]:

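# Frames more than 20 dB below the loudest frame are treated as silence
intervals = librosa.effects.split(y_denoised, top_db=20)
# Stitch the non-silent regions together (np.concatenate raises on an empty list)
y_nonsilent = np.concatenate([y_denoised[s:e] for s, e in intervals]) if len(intervals) else y_denoised[:0]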
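
To get a feel for what `top_db=20` means, the sketch below converts a decibel threshold into an amplitude ratio. It assumes the threshold is applied to frame RMS relative to the loudest frame, on the usual `20 * log10` amplitude scale; `db_to_amplitude_ratio` is a hypothetical helper, not a librosa function.

```python
def db_to_amplitude_ratio(db_below_reference: float) -> float:
    """Amplitude ratio for a level `db_below_reference` dB under the reference."""
    return 10 ** (-db_below_reference / 20)

# Compare the threshold used above (20 dB) with two more permissive settings
for top_db in (20, 40, 60):
    print(top_db, db_to_amplitude_ratio(top_db))
```

Under that assumption, a 20 dB threshold discards any frame quieter than about one tenth of the loudest frame, which is fairly aggressive and may clip soft speech; a larger `top_db` keeps more of the signal.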



## 8. Voice Activity Detection (VAD)
WebRTC VAD classifies each 30 ms frame as speech or non-speech; only the frames flagged as speech are kept.


In [9]:

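vad = webrtcvad.Vad(2)  # aggressiveness: 0 (least) to 3 (most aggressive)

def frame_generator(frame_duration_ms, audio, sample_rate):
    """Yield consecutive frames of frame_duration_ms from a 1-D sample array."""
    frame_length = int(sample_rate * frame_duration_ms / 1000)
    for i in range(0, len(audio), frame_length):
        yield audio[i:i + frame_length]

# WebRTC VAD expects 16-bit mono PCM; clip before casting so a sample at +1.0 does not overflow int16
pcm16 = (np.clip(y_nonsilent, -1.0, 1.0) * 32767).astype(np.int16)

frame_ms = 30
frame_len = int(sr * frame_ms / 1000)
speech_frames = []
for frame in frame_generator(frame_ms, pcm16, sr):
    # Skip the final partial frame: the VAD only accepts 10, 20, or 30 ms frames
    if len(frame) == frame_len and vad.is_speech(frame.tobytes(), sr):
        speech_frames.append(frame)

speech_audio = np.concatenate(speech_frames) if speech_frames else np.array([], dtype=np.int16)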
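
WebRTC VAD is strict about its input: 16-bit mono PCM at 8, 16, 32, or 48 kHz, in frames of exactly 10, 20, or 30 ms. The sketch below works out the frame size for a given combination so a mismatch can be caught before calling the VAD. `frame_size` and the two constants are hypothetical names introduced here, not part of the `webrtcvad` package.

```python
SUPPORTED_RATES = (8000, 16000, 32000, 48000)  # Hz
SUPPORTED_FRAME_MS = (10, 20, 30)

def frame_size(sample_rate: int, frame_ms: int) -> tuple[int, int]:
    """Return (samples, bytes) for one 16-bit mono frame, or raise on unsupported input."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in SUPPORTED_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    samples = sample_rate * frame_ms // 1000
    return samples, samples * 2  # 2 bytes per int16 sample

samples, n_bytes = frame_size(16000, 30)
print(samples, n_bytes)
```

This is also why the audio is resampled to 16 kHz earlier in the notebook: a rate such as 44.1 kHz would be rejected.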



## 9. Save Preprocessed Audio


In [10]:

sf.write("processed_audio.wav", speech_audio, sr)
print("Preprocessing complete. File saved as processed_audio.wav")


Preprocessing complete. File saved as processed_audio.wav
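
A quick way to confirm what was written is to read the header back. The sketch below uses only the standard-library `wave` module and assumes the file is plain PCM WAV, which should be the case for 16-bit integer samples written to a `.wav` path. `describe_wav` is a hypothetical helper.

```python
import os
import wave

def describe_wav(path: str) -> dict:
    """Read basic header fields from a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": rate,
            "duration_s": n_frames / rate,
        }

# Only inspect the file if the save step above has already run
if os.path.exists("processed_audio.wav"):
    print(describe_wav("processed_audio.wav"))
```

For this notebook the expected header is one channel at 16000 Hz; the duration should be shorter than the original 33.62 s because silence and non-speech frames were dropped.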



## Final Output
The preprocessed audio can now be used directly as input to:
- Automatic Speech Recognition (ASR)
- Topic Segmentation
- Speaker Diarization
