
# Final Project Notebook — Music Genre Classification (GTZAN)
**Authors:** Sai Arunanshu Govindarajula, Tejaswini  

This notebook demonstrates **end-to-end usage** of our Music Genre Classification project:

1. Environment setup (Python + dependencies)  
2. Dataset folder expectations  
3. MFCC extraction + segmentation (30s → 10 segments)  
4. Loading the trained CNN model  
5. Making Top-3 predictions with confidence  
6. (Optional) Running the Flask web app demo  


## 1) Environment Setup
If you haven't installed the dependencies yet, run this in a terminal (from a notebook cell, use `%pip install -r requirements.txt` instead):

```bash
pip install -r requirements.txt
```

If you want MP3 upload support in Flask, install FFmpeg and ensure it is in PATH:
- Windows (winget): `winget install Gyan.FFmpeg`
- Verify: `ffmpeg -version`
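
The same check can be done from Python before starting the app. This is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is ours for illustration and is not part of the project.

```python
import shutil

def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None

print("FFmpeg on PATH?", ffmpeg_available())
```

If this prints `False` right after installing FFmpeg, restarting the terminal or the notebook kernel usually lets the updated PATH take effect.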


In [1]:
import sys, platform, os
print("Python:", sys.version)
print("Platform:", platform.platform())
print("Working directory:", os.getcwd())


Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Platform: Windows-10-10.0.26100-SP0
Working directory: c:\Users\aruna\Desktop\EDU\SEM3\ECE-5831 - Pattern Recognition and NN\Project\MusicGenreClassifier\Music-Genre-Classification



## 2) Project Paths
Update these paths to match your local folder structure (repo root recommended).

Expected structure:
- `Data/genres_original/<genre>/<file>.wav`
- model file: `MusicGenre_CNN_79.73.h5` (or your best `.h5`)
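
A quick way to confirm the dataset layout is to count the `.wav` files in each genre folder. The sketch below assumes the `Data/genres_original/<genre>/` structure listed above; `count_wavs_per_genre` is a hypothetical helper, not part of the project code.

```python
from pathlib import Path

def count_wavs_per_genre(dataset_dir: Path) -> dict:
    """Map each genre subfolder name to the number of .wav files it contains."""
    if not dataset_dir.is_dir():
        return {}  # nothing to report if the dataset folder is missing
    return {
        sub.name: sum(1 for _ in sub.glob("*.wav"))
        for sub in sorted(dataset_dir.iterdir())
        if sub.is_dir()
    }

counts = count_wavs_per_genre(Path("Data") / "genres_original")
for genre, n in counts.items():
    print(f"{genre:10s} {n} files")
```

An empty result means the path does not point at the dataset folder.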


In [5]:

from pathlib import Path

PROJECT_ROOT = Path(".")  # set to your repo root if needed
DATASET_DIR = PROJECT_ROOT / "Data" / "genres_original"
MODEL_PATH  = PROJECT_ROOT / "MusicGenre_CNN_.h5"  # change to your model's filename, e.g. MusicGenre_CNN_79.73.h5

print("Dataset dir:", DATASET_DIR.resolve())
print("Model path :", MODEL_PATH.resolve())
print("Dataset exists?", DATASET_DIR.exists())
print("Model exists?  ", MODEL_PATH.exists())


Dataset dir: C:\Users\aruna\Desktop\EDU\SEM3\ECE-5831 - Pattern Recognition and NN\Project\MusicGenreClassifier\Music-Genre-Classification\Data\genres_original
Model path : C:\Users\aruna\Desktop\EDU\SEM3\ECE-5831 - Pattern Recognition and NN\Project\MusicGenreClassifier\Music-Genre-Classification\MusicGenre_CNN_.h5
Dataset exists? True
Model exists?   True



## 3) MFCC Segmentation + Feature Extraction
We follow the same parameters used for training/inference:
- Sample rate: 22050 Hz
- Track duration: 30 s
- 10 segments
- MFCC: 13 coefficients
- FFT: 2048, Hop: 512

Each segment produces a feature tensor of shape `(130, 13, 1)` (frames × MFCC coefficients × channel) once its frames are padded or cropped to 130.
Stacking the 10 segments gives `(10, 130, 13, 1)`.
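
The figure of 130 frames follows from the framing arithmetic. With librosa's default centered framing (`center=True`), a signal of `n` samples yields `1 + n // hop_length` frames. The sketch below works this through for the parameters listed above; the variable names are ours.

```python
SAMPLE_RATE = 22050
TRACK_DURATION = 30   # seconds
NUM_SEGMENTS = 10
HOP_LENGTH = 512
NUM_MFCC = 13

# 661500 samples per track, split evenly into 10 segments
samples_per_segment = SAMPLE_RATE * TRACK_DURATION // NUM_SEGMENTS  # 66150

# Centered framing: one frame per hop, plus one
frames_per_segment = 1 + samples_per_segment // HOP_LENGTH  # 1 + 129 = 130

segment_shape = (frames_per_segment, NUM_MFCC, 1)
batch_shape = (NUM_SEGMENTS,) + segment_shape
print(segment_shape)  # (130, 13, 1)
print(batch_shape)    # (10, 130, 13, 1)
```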


In [6]:

import math
import numpy as np

SAMPLE_RATE = 22050
NUM_MFCC = 13
N_FFT = 2048
HOP_LENGTH = 512
TRACK_DURATION = 30
NUM_SEGMENTS = 10

SAMPLES_PER_TRACK = SAMPLE_RATE * TRACK_DURATION
SAMPLES_PER_SEGMENT = int(SAMPLES_PER_TRACK / NUM_SEGMENTS)
EXPECTED_FRAMES = int(math.ceil(SAMPLES_PER_SEGMENT / HOP_LENGTH))

print("Samples/track   :", SAMPLES_PER_TRACK)
print("Samples/segment :", SAMPLES_PER_SEGMENT)
print("Expected frames :", EXPECTED_FRAMES)


Samples/track   : 661500
Samples/segment : 66150
Expected frames : 130


In [7]:

import librosa

def extract_mfcc_segments(audio_path: str) -> np.ndarray:
    '''
    Returns: X of shape (segments, frames, n_mfcc, 1)
    Pads/crops frames to EXPECTED_FRAMES for model compatibility.
    '''
    y, sr = librosa.load(audio_path, sr=SAMPLE_RATE, mono=True)

    segments = []
    for d in range(NUM_SEGMENTS):
        start = d * SAMPLES_PER_SEGMENT
        finish = start + SAMPLES_PER_SEGMENT
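        if finish > len(y):
            # Skip segments that run past the end of the audio; clips shorter
            # than TRACK_DURATION therefore yield fewer than NUM_SEGMENTS rows.
            continue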

        mfcc = librosa.feature.mfcc(
            y=y[start:finish],
            sr=sr,
            n_mfcc=NUM_MFCC,
            n_fft=N_FFT,
            hop_length=HOP_LENGTH
        ).T  # (frames, 13)

        frames = mfcc.shape[0]
        if frames < EXPECTED_FRAMES:
            pad = np.zeros((EXPECTED_FRAMES - frames, NUM_MFCC), dtype=mfcc.dtype)
            mfcc = np.vstack([mfcc, pad])
        else:
            mfcc = mfcc[:EXPECTED_FRAMES, :]

        segments.append(mfcc)

    if not segments:
        raise ValueError("No valid segments extracted. Audio may be too short or unreadable.")

    X = np.stack(segments, axis=0).astype(np.float32)  # (segments, frames, mfcc)
    X = X[..., np.newaxis]  # add channel dim
    return X
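
The pad/crop step can be checked in isolation on synthetic arrays, without loading any audio. This is a sketch that mirrors the logic in `extract_mfcc_segments`; the standalone helper `pad_or_crop_frames` is hypothetical and exists only for illustration.

```python
import numpy as np

def pad_or_crop_frames(mfcc: np.ndarray, expected_frames: int) -> np.ndarray:
    """Zero-pad or truncate a (frames, n_mfcc) array along the frame axis."""
    frames = mfcc.shape[0]
    if frames < expected_frames:
        pad = np.zeros((expected_frames - frames, mfcc.shape[1]), dtype=mfcc.dtype)
        return np.vstack([mfcc, pad])
    return mfcc[:expected_frames, :]

short = np.ones((128, 13), dtype=np.float32)   # fewer frames than expected
long = np.ones((131, 13), dtype=np.float32)    # more frames than expected
print(pad_or_crop_frames(short, 130).shape)  # (130, 13)
print(pad_or_crop_frames(long, 130).shape)   # (130, 13)
```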



## 4) Load the Trained CNN Model
We load the `.h5` model using Keras.

If loading fails (for example, because the path is incorrect), the cell prints the error and leaves `model` as `None`; update `MODEL_PATH` above and re-run the cell.
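
After loading, it can be worth confirming that the model expects the per-segment feature shape from section 3. The sketch below assumes a single-input model whose `model.input_shape` has the batch dimension first, for example `(None, 130, 13, 1)`; `shape_matches` is a hypothetical helper, not part of the project.

```python
def shape_matches(model_input_shape, segment_shape=(130, 13, 1)) -> bool:
    """Compare a Keras-style input shape (batch dimension first) with the segment shape."""
    return tuple(model_input_shape[1:]) == tuple(segment_shape)

# With a loaded model, the call would be: shape_matches(model.input_shape)
print(shape_matches((None, 130, 13, 1)))  # True
print(shape_matches((None, 128, 13, 1)))  # False
```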


In [8]:

import keras

model = None
try:
    model = keras.models.load_model(MODEL_PATH)
    print("✅ Model loaded successfully.")
    model.summary()
except Exception as e:
    print("❌ Could not load model:", type(e).__name__)
    print(e)



✅ Model loaded successfully.



## 5) Predict Top-3 Genres for an Audio File
This uses the segment-wise approach:
1. Extract MFCC segments → shape `(10, 130, 13, 1)`  
2. Predict per segment → shape `(10, 10)`  
3. Average probabilities across segments → shape `(10,)`  
4. Return Top-3 genres with confidence scores  
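
Steps 2 to 4 can be illustrated without the model by using synthetic per-segment probabilities. In this sketch the random values stand in for `model.predict` output and are not real predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for model.predict(X): 10 segments x 10 genres, each row sums to 1
logits = rng.normal(size=(10, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

avg = probs.mean(axis=0)            # average over segments -> shape (10,)
top3 = np.argsort(avg)[::-1][:3]    # indices of the three largest averages

for rank, idx in enumerate(top3, 1):
    print(f"Top {rank}: class {idx}  {avg[idx] * 100:6.2f}%")
```

Because every row of `probs` sums to 1, the averaged vector also sums to 1, so the reported confidences can be read as percentages over the ten genres.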


In [9]:

# Index-to-genre mapping; the order must match the label encoding used during training.
genre_dict = {
    0: "disco", 1: "pop", 2: "classical", 3: "metal", 4: "rock",
    5: "blues", 6: "hiphop", 7: "reggae", 8: "country", 9: "jazz"
}

def predict_top3(audio_path: str):
    if model is None:
        raise RuntimeError("Model is not loaded. Fix MODEL_PATH and re-run the model loading cell.")

    X = extract_mfcc_segments(audio_path)
    probs = model.predict(X, verbose=0)     # (segments, 10)
    avg = probs.mean(axis=0)               # (10,)
    top = avg.argsort()[-3:][::-1]         # top3 indices desc

    results = [(genre_dict[int(i)], float(avg[int(i)])) for i in top]
    return X.shape, probs.shape, results

# Pick a sample file (update as needed)
sample_audio = DATASET_DIR / "blues" / "blues.00000.wav"
print("Sample audio:", sample_audio)
print("Exists?", sample_audio.exists())

try:
    xshape, pshape, top3 = predict_top3(str(sample_audio))
    print("Input shape:", xshape)
    print("Pred shape :", pshape)
    for rank, (g, s) in enumerate(top3, 1):
        print(f"Top {rank}: {g:10s}  {s*100:6.2f}%")
except Exception as e:
    print("Prediction not executed:", type(e).__name__)
    print(e)


Sample audio: Data\genres_original\blues\blues.00000.wav
Exists? True
Input shape: (10, 130, 13, 1)
Pred shape : (10, 10)
Top 1: blues        64.47%
Top 2: rock         13.18%
Top 3: country      11.86%



## 6) Run the Flask Web App (End-to-End Demo)
From the project root in a terminal:

```bash
python app.py
```

Then open:
- http://127.0.0.1:5000/

**MP3 uploads:** Ensure FFmpeg is installed and reachable in PATH.
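
How `app.py` handles MP3 files is not shown in this notebook. As one possible illustration of why FFmpeg is needed, the sketch below builds an FFmpeg command that converts an MP3 to a mono WAV at the sample rate used for feature extraction. The helpers `build_ffmpeg_cmd` and `convert_to_wav` are hypothetical and are not taken from the project.

```python
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(src: Path, dst: Path, sample_rate: int = 22050) -> list:
    """Build an FFmpeg command: overwrite output, resample, downmix to mono."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(src),           # input file
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # one audio channel (mono)
        str(dst),
    ]

def convert_to_wav(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)

print(" ".join(build_ffmpeg_cmd(Path("song.mp3"), Path("song.wav"))))
```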
