# 1. Raw Audio Waveform (Time Domain)

Recorded speech is a **time-domain signal**: a sequence of amplitude values that
track the variations in air pressure captured by a microphone.

Before any signal processing or machine learning,
we must understand how raw audio looks and behaves.


## Why Time-Domain Representation Matters

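- The x-axis represents **time**, measured in samples (sample index `n` corresponds to `n / sr` seconds)
- The y-axis represents **amplitude**, the signal value at each sample, which tracks the air-pressure variation at that instant
- The **sampling rate** (`sr`, in Hz) is the number of samples captured per second

Speech processing starts here: every later representation of the audio is computed from these raw samples.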
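A minimal sketch of how the two axes relate, assuming NumPy and a synthetic 440 Hz tone at 16 kHz as a stand-in for real speech (all values here are illustrative, not taken from the notebook's recording): sample index `n` maps to time `n / sr` in seconds, and each sample holds one amplitude value.

```python
import numpy as np

# Illustrative parameters (assumed, not read from sample.ogg)
sr = 16_000           # sampling rate in Hz: samples captured per second
duration_ms = 10      # 10 ms of signal
freq_hz = 440.0       # a pure tone, used only as a stand-in for speech

# Sample indices n = 0, 1, ..., N-1: what the x-axis of a raw waveform plot shows
num_samples = sr * duration_ms // 1000
n = np.arange(num_samples)

# Convert sample indices to seconds: t = n / sr
t = n / sr

# One amplitude value per sample
x = 0.5 * np.sin(2 * np.pi * freq_hz * t)

print(num_samples)    # 160 samples in 10 ms at 16 kHz
print(t[1] - t[0])    # spacing between consecutive samples is 1 / sr seconds
```

Dividing the sample index by `sr` is all that is needed to relabel a waveform plot's x-axis in seconds instead of samples.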


```python
# --- Notebook setup (shared across all notebooks) ---
import os
import sys

# Make the project root (parent of the working directory) importable so `src` resolves
PROJECT_ROOT = os.path.abspath("..")
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

import matplotlib
matplotlib.use("Agg")  # headless-safe for script execution; set before importing pyplot

import matplotlib.pyplot as plt

from src.load_audio import load_audio
from src.graph_utils import save_graph
```


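```python
# Load the recording: returns the samples and the sampling rate in Hz
signal, sr = load_audio("sample.ogg")

print(f"Sampling Rate: {sr} Hz")
print(f"Total Samples: {len(signal)}")

# Plot amplitude against sample index (the x-axis is in samples, not seconds)
plt.figure(figsize=(12, 4))
plt.plot(signal)
plt.title("Raw Audio Waveform")
plt.xlabel("Samples")
plt.ylabel("Amplitude")
plt.tight_layout()

# Save the current figure with the project helper, then release it
save_graph("01_waveform.png")
plt.close()
```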


```
Sampling Rate: 16000 Hz
Total Samples: 286685
```
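As a quick check on the printed values, the recording's duration is the number of samples divided by the sampling rate:

```python
# Values copied from the printed output above
sr = 16000            # Sampling Rate, in Hz (samples per second)
num_samples = 286685  # Total Samples

# Duration in seconds = number of samples / samples per second
duration_s = num_samples / sr
print(f"Duration: {duration_s:.2f} s")  # Duration: 17.92 s
```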


## Key Observations

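- The waveform is irregular: amplitude changes rapidly from sample to sample, and which frequencies (pitches) are present is hard to read off by eye
- This complexity motivates transforming the signal into representations where that structure is easier to see, such as the frequency domain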
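A minimal sketch of why such a transform helps, using NumPy's real FFT on a synthetic signal rather than the notebook's recording: two sinusoids at 300 Hz and 1200 Hz (arbitrary, assumed choices) sum to a waveform that no longer looks like a clean wave, yet their magnitude spectrum shows two distinct peaks. The representation used later in this series may differ; this only illustrates the general idea.

```python
import numpy as np

sr = 16_000                     # assumed sampling rate, matching the notebook's audio
t = np.arange(sr) / sr          # 1 second of sample times, in seconds

# Sum of two sinusoids: irregular-looking in the time domain
x = np.sin(2 * np.pi * 300 * t) + 0.6 * np.sin(2 * np.pi * 1200 * t)

# Magnitude spectrum via the real FFT, with the frequency (Hz) of each bin
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / sr)

# The two largest peaks sit at the component frequencies, 300 Hz and 1200 Hz
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])
print(top_two)
```

Because the signal contains a whole number of cycles of each tone in one second, each component falls exactly on an FFT bin, which keeps the two peaks sharp in this toy case.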
