# 5. MFCCs (Mel-Frequency Cepstral Coefficients)

MFCCs are among the most widely used features
in classical speech systems and remain a common input for deep learning-based ones.


## MFCC Pipeline

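1. FFT: split the signal into short, overlapping, windowed frames and take the power spectrum of each
2. Mel filterbanks: pool the power spectrum into triangular bands spaced evenly on the mel scale
3. Logarithmic compression: take the log of each band energy
4. Discrete Cosine Transform (DCT): decorrelate the log energies and keep only the low-order coefficients (commonly around 13)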

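MFCCs capture the **spectral envelope** of speech: the low-order coefficients describe the smooth overall shape of each frame's spectrum and drop most of the fine harmonic detail.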
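
The four steps above can be sketched in plain NumPy. This is an illustrative sketch only, not the `compute_mfcc` helper used later in this notebook, whose implementation is not shown here. The function names (`mfcc_sketch`, `mel_filterbank`, `dct_ii_matrix`) and the default parameters (`n_fft=512`, `hop=256`, `n_mels=40`, `n_mfcc=13`) are assumptions chosen for the example, and the mel formula used is one common variant among several.

```python
import numpy as np

def hz_to_mel(hz):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II basis, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc_sketch(signal, sr, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return an (n_mfcc, n_frames) array of MFCCs for a mono signal."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))

    # Framing: overlapping frames, each multiplied by a Hann window.
    n_frames = 1 + (signal.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)

    # Step 1: FFT of each frame, converted to a power spectrum.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Step 2: pool the power spectrum into mel bands.
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T

    # Step 3: logarithmic compression (small offset avoids log(0)).
    log_mel = np.log(mel_energy + 1e-10)

    # Step 4: DCT across the mel bands, keeping the first n_mfcc coefficients.
    return (log_mel @ dct_ii_matrix(n_mfcc, n_mels).T).T

# Usage on a synthetic one-second 440 Hz tone (not real speech).
sr_demo = 16000
t = np.arange(sr_demo) / sr_demo
tone = np.sin(2 * np.pi * 440.0 * t)
mfcc_demo = mfcc_sketch(tone, sr_demo)
print(mfcc_demo.shape)  # (13, 61)
```

Library implementations typically differ in details such as padding, filter normalization, and log scaling, so the values from this sketch should not be expected to match them exactly.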


In [1]:
# --- Notebook setup (shared across all notebooks) ---
import sys
import os

PROJECT_ROOT = os.path.abspath("..")
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

import matplotlib
matplotlib.use("Agg")  # headless-safe for script execution

import matplotlib.pyplot as plt

from src.load_audio import load_audio
from src.graph_utils import save_graph


In [2]:
import librosa.display
from src.mfcc import compute_mfcc

signal, sr = load_audio("sample.ogg")

mfcc = compute_mfcc(signal, sr)

plt.figure(figsize=(12, 5))
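# Pass sr so the time axis is in real seconds; specshow otherwise assumes 22050 Hz.
# This also assumes compute_mfcc uses librosa's default hop length of 512 samples.
librosa.display.specshow(
    mfcc,
    sr=sr,
    x_axis="time"
)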
plt.colorbar()
plt.title("MFCCs")
plt.tight_layout()

save_graph("05_mfcc.png")
plt.close()


## Why MFCCs Work Well for Speech

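- Compact representation: a dozen or so coefficients per frame summarize the spectrum
- Largely insensitive to pitch variation, because mel-band pooling and dropping the high-order coefficients remove most harmonic fine structure; robustness to additive noise is more limited and usually relies on normalization or other compensation
- Strong correlation with phonetic content, which is carried mainly by the spectral envelope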
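
The compactness and pitch points above can be illustrated with a small synthetic experiment. This is a hedged sketch, not a measurement on real speech: the "log-mel frame" below is made up from a smooth two-bump envelope plus a fast alternating ripple that stands in for pitch harmonics, and the sizes (`n_mels = 40`, `n_keep = 13`) are assumptions. Keeping only the low-order DCT coefficients and inverting should recover the smooth envelope far better than it recovers the rippled frame.

```python
import numpy as np

n_mels, n_keep = 40, 13
band = np.arange(n_mels)

# Orthonormal DCT-II basis over the mel bands (rows are basis vectors).
k = np.arange(n_mels)[:, None]
basis = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * band[None, :] + 1) / (2 * n_mels))
basis[0] /= np.sqrt(2.0)

# Synthetic log-mel frame: smooth envelope plus a fast ripple (hypothetical data).
envelope = np.exp(-((band - 10) / 6.0) ** 2) + 0.6 * np.exp(-((band - 27) / 5.0) ** 2)
ripple = 0.2 * np.cos(np.pi * band)  # alternates +0.2, -0.2 from band to band
log_mel = envelope + ripple

# Keep the first n_keep cepstral coefficients, zero the rest, and invert.
coeffs = basis @ log_mel
truncated = coeffs.copy()
truncated[n_keep:] = 0.0
smoothed = basis.T @ truncated

# RMS error of the reconstruction against the envelope and against the full frame.
err_env = np.sqrt(np.mean((smoothed - envelope) ** 2))
err_raw = np.sqrt(np.mean((smoothed - log_mel) ** 2))
print(f"RMS error vs. envelope:      {err_env:.3f}")
print(f"RMS error vs. rippled frame: {err_raw:.3f}")
```

Because the basis is orthonormal, zeroing the high-order coefficients is a projection onto the slowly varying components, which is why 13 numbers are enough to describe the envelope while the ripple is mostly discarded.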
