# Audio Classification with a Pre-Trained Model

This notebook extracts audio features and feeds them to our pre-trained AI model for classification.
The features we use are:
- MFCCs
- Chroma
- Mel Spectrogram
- Spectral Contrast
- Tonnetz


## Imports and Dependencies

We import necessary libraries for audio processing, model handling, and numerical computations.
- **TensorFlow/Keras:** Load the pre-trained model
- **Joblib:** Load the saved scaler for feature normalization
- **Librosa:** Audio feature extraction
- **NumPy:** Numerical operations
- **JSON:** Handle label mappings

In [None]:
import tensorflow as tf
import joblib
import json
import numpy as np
import librosa


## File Paths

Define the paths to the model, scaler, and label-mapping JSON file in one place. Centralizing them simplifies maintenance and keeps every run pointed at the same artifacts.

In [None]:
MODEL_PATH = "/app/song-storage/model.keras"
SCALER_PATH = "/app/song-storage/scaler.pkl"
LABELS_PATH = "/app/song-storage/label_order.json"
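
As a minimal sketch of keeping these paths centralized, the three constants could be derived from a single base directory. The helper `artifact_paths` and the `SONG_STORAGE_DIR` environment variable are hypothetical names introduced here for illustration; only the file names and the default directory come from the cell above.

```python
import os
from pathlib import PurePosixPath

DEFAULT_STORAGE_DIR = "/app/song-storage"  # directory used in this notebook


def artifact_paths(base_dir=None):
    """Return the model, scaler, and label paths under one base directory.

    SONG_STORAGE_DIR is a hypothetical override, not something the notebook defines.
    """
    base = PurePosixPath(
        base_dir or os.environ.get("SONG_STORAGE_DIR", DEFAULT_STORAGE_DIR)
    )
    return {
        "model": str(base / "model.keras"),
        "scaler": str(base / "scaler.pkl"),
        "labels": str(base / "label_order.json"),
    }


paths = artifact_paths("/tmp/song-storage")
print(paths["model"])
```

With this layout, moving the artifacts only requires changing one value rather than three.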


## Load Model and Scaler

Load the trained Keras model and standard scaler.
Load the list of dance labels to map predictions back to human-readable classes.
This step prepares the environment for inference on new audio files.


In [None]:
model = tf.keras.models.load_model(MODEL_PATH)
scaler = joblib.load(SCALER_PATH)

with open(LABELS_PATH, "r") as f:
    dance_styles = json.load(f)

print("Model, scaler, and labels loaded.")
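
Because predictions are later paired with labels by position, it may be worth checking the label file's structure right after loading. The sketch below assumes `label_order.json` holds a flat JSON array of unique strings, which the notebook does not state explicitly; `parse_label_order` is a hypothetical helper, and the dance names are made up for the example.

```python
import json


def parse_label_order(raw_json):
    """Parse label JSON and check it is a list of unique strings (assumed format)."""
    labels = json.loads(raw_json)
    if not isinstance(labels, list) or not all(isinstance(x, str) for x in labels):
        raise ValueError("expected a JSON array of strings")
    if len(set(labels)) != len(labels):
        raise ValueError("label list contains duplicates")
    return labels


# Hypothetical file content, for illustration only.
print(parse_label_order('["waltz", "tango", "samba"]'))
```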


## Feature Extractor Class

`AudioFeatureExtractor` encapsulates all audio processing:

- **MFCCs (Mel-Frequency Cepstral Coefficients):** Captures timbral texture
- **Chroma Features:** Harmonic content representation
- **Mel Spectrogram:** Frequency-energy representation
- **Spectral Contrast:** Measures the difference between spectral peaks and valleys in each frequency sub-band
- **Tonnetz:** Tonal centroid features for harmony
- **Concatenation:** Combines mean and variance of all features into a single feature vector

This class also provides a method to directly process an audio file and return a flattened feature vector ready for the model.


In [None]:
class AudioFeatureExtractor:
    """Extract a fixed-length feature vector (per-feature mean and variance) from audio."""
    def __init__(self, sr=22050, n_mfcc=13, n_fft=2048, hop_length=512):
        self.sr = sr
        self.n_mfcc = n_mfcc
        self.n_fft = n_fft
        self.hop_length = hop_length

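    def load_audio(self, file_path):
        # librosa.load resamples to self.sr and, by default, mixes down to mono.
        y, sr = librosa.load(file_path, sr=self.sr)
        # Peak-normalize so the largest absolute sample value is 1.
        y = librosa.util.normalize(y)
        return y, sr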

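    def extract_features(self, y):
        # Each feature is a (rows, frames) matrix; taking the mean and variance
        # over time (axis=1) yields one value per row regardless of clip length.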
        mfccs = librosa.feature.mfcc(y=y, sr=self.sr, n_mfcc=self.n_mfcc, n_fft=self.n_fft, hop_length=self.hop_length)
        mfccs_mean = np.mean(mfccs, axis=1)
        mfccs_var = np.var(mfccs, axis=1)

        chroma = librosa.feature.chroma_stft(y=y, sr=self.sr, n_fft=self.n_fft, hop_length=self.hop_length)
        chroma_mean = np.mean(chroma, axis=1)
        chroma_var = np.var(chroma, axis=1)

        mel = librosa.feature.melspectrogram(y=y, sr=self.sr, n_fft=self.n_fft, hop_length=self.hop_length)
        mel_mean = np.mean(mel, axis=1)
        mel_var = np.var(mel, axis=1)

        contrast = librosa.feature.spectral_contrast(y=y, sr=self.sr, n_fft=self.n_fft, hop_length=self.hop_length)
        contrast_mean = np.mean(contrast, axis=1)
        contrast_var = np.var(contrast, axis=1)

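        # Tonnetz is computed on the harmonic component only; hop_length is not
        # passed here, so librosa's default applies.
        tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=self.sr)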
        tonnetz_mean = np.mean(tonnetz, axis=1)
        tonnetz_var = np.var(tonnetz, axis=1)

        features = np.concatenate([
            mfccs_mean, mfccs_var,
            chroma_mean, chroma_var,
            mel_mean, mel_var,
            contrast_mean, contrast_var,
            tonnetz_mean, tonnetz_var
        ])
        return features

    def extract_features_from_file(self, file_path):
        # The returned sample rate equals self.sr, so it is not needed here.
        y, _ = self.load_audio(file_path)
        return self.extract_features(y)
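
To make the concatenation step concrete, here is a small sketch of how mean-and-variance summarization fixes the vector length. It uses only the standard library on a toy matrix. The row counts assume librosa's defaults for everything the class does not set (128 mel bands, 12 chroma bins, `n_bands=6` for spectral contrast, 6 tonnetz dimensions); `summarize` and `expected_vector_length` are hypothetical helpers, not part of the notebook.

```python
from statistics import fmean, pvariance

# Rows per feature matrix, assuming librosa defaults and n_mfcc=13 as above.
FEATURE_ROWS = {
    "mfcc": 13,
    "chroma": 12,
    "mel": 128,
    "spectral_contrast": 7,  # n_bands=6 produces n_bands + 1 rows
    "tonnetz": 6,
}


def summarize(matrix):
    """Collapse a rows-by-frames matrix into per-row means, then per-row variances."""
    means = [fmean(row) for row in matrix]
    variances = [pvariance(row) for row in matrix]  # population variance, like np.var
    return means + variances


def expected_vector_length(feature_rows=FEATURE_ROWS):
    """Each feature contributes one mean and one variance per row."""
    return 2 * sum(feature_rows.values())


toy = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]  # 2 rows, 3 frames
print(summarize(toy))
print(expected_vector_length())
```

Under these assumptions the vector has 332 entries. If the saved scaler is a scikit-learn estimator, comparing this number with its `n_features_in_` attribute is one way to catch a mismatch between extraction settings and the training setup.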


## Classification Function

`classify_audio` performs the following:

1. Extract features from the input audio using `AudioFeatureExtractor`.
2. Normalize the features using the saved scaler.
3. Run the model to obtain a probability for each dance class.
4. Sort the predictions in descending order of confidence and return structured output with the dance name, confidence score, and speed category. The speed category is currently hard-coded to `"slow"` and is not predicted by the model.


In [None]:
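def classify_audio(file_path, extractor):
    # Extract the feature vector and reshape it to (1, n_features) for the scaler and model.
    features = extractor.extract_features_from_file(file_path).reshape(1, -1)
    # Scale the features with the saved scaler.
    features = scaler.transform(features)
    # predict returns one row of class probabilities per input sample.
    probabilities = model.predict(features)[0]
    # Pair each label with its probability, highest confidence first.
    predictions = sorted(zip(dance_styles, probabilities), key=lambda x: x[1], reverse=True)
    return [
        # speedCategory is a hard-coded placeholder, not a model output.
        {"danceName": dance, "confidence": round(float(conf), 6), "speedCategory": "slow"}
        for dance, conf in predictions
    ]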
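
The pairing and sorting step can be illustrated in isolation. One detail worth noting is that `zip` silently truncates to the shorter input, so a label list that does not match the model's output size would go unnoticed. The sketch below adds an explicit length check; `rank_predictions` is a hypothetical helper and the labels and probabilities are made-up example values.

```python
def rank_predictions(labels, probabilities, top_k=None):
    """Pair labels with probabilities and sort by descending confidence."""
    if len(labels) != len(probabilities):
        # zip() would silently drop the extras and hide the mismatch.
        raise ValueError(
            f"{len(labels)} labels but {len(probabilities)} probabilities"
        )
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Made-up values, for illustration only.
example_labels = ["waltz", "tango", "samba"]
example_probs = [0.2, 0.7, 0.1]
print(rank_predictions(example_labels, example_probs, top_k=2))
```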


## Example Usage

- Instantiate the feature extractor
- Provide the path to an audio file
- Call `classify_audio` to get predictions

The resulting list contains dance names with confidence scores, ready for display or further processing.


In [None]:
extractor = AudioFeatureExtractor()
file_path = "/path/to/audio/file.wav"
predictions = classify_audio(file_path, extractor)
predictions
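
For display, the returned list could be turned into a short ranked summary. This is only a sketch: `format_predictions` is a hypothetical helper, and the sample entries below are made-up values that mirror the dictionary keys returned by `classify_audio`.

```python
def format_predictions(predictions, top_k=3):
    """Render the first top_k prediction dicts as numbered lines with percentages."""
    lines = []
    for rank, item in enumerate(predictions[:top_k], start=1):
        lines.append(f"{rank}. {item['danceName']}: {item['confidence']:.1%}")
    return "\n".join(lines)


# Made-up sample in the same shape as the classify_audio output.
sample = [
    {"danceName": "tango", "confidence": 0.7, "speedCategory": "slow"},
    {"danceName": "waltz", "confidence": 0.2, "speedCategory": "slow"},
    {"danceName": "samba", "confidence": 0.1, "speedCategory": "slow"},
]
print(format_predictions(sample, top_k=2))
```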
