# Step-by-Step: From MFCC `.npy` Files to a Model

## Step 1: Load Metadata

We’ll read `mfcc_metadata.csv` to get:
- The name of the `.npy` file with MFCCs (the `mfcc_file` column),
- The `queen_status` label (an integer class from 0 to 3).

In [1]:
import pandas as pd

# Load metadata file
metadata_df = pd.read_csv("mfcc_metadata.csv")

# Optional: Show distribution of labels
print(metadata_df["queen_status"].value_counts())

queen_status
3    3563
2    1553
0    1038
1     946
Name: count, dtype: int64
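
The label counts above are imbalanced, so it helps to know what accuracy a trivial "always predict the most common class" model would reach. The sketch below works this out from the printed counts. The dictionary is copied by hand from the output, and the variable names are illustrative only.

```python
# Label counts copied from the value_counts() output above.
label_counts = {3: 3563, 2: 1553, 0: 1038, 1: 946}

total = sum(label_counts.values())

# The majority class is the label with the highest count.
majority_label = max(label_counts, key=label_counts.get)

# Accuracy of a model that always predicts the majority class.
majority_baseline = label_counts[majority_label] / total

print(f"Total samples: {total}")
print(f"Majority class: {majority_label}")
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any trained model should beat this figure (roughly 0.50) to be considered useful.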


## Step 2: Prepare Data (Fixed Length for Simpler Models)

We’ll:
- Load the MFCC matrix from each `.npy` file,
- Zero-pad or truncate it to a fixed number of frames (say, 100),
- Flatten it into a 1D vector so that traditional models such as Random Forest or Logistic Regression can use it.
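
To see what these steps do to the array shapes, here is a minimal sketch on toy data. It assumes each MFCC matrix is stored as `(frames, n_mfcc)`, with frames on axis 0. The helper name `fix_length` is hypothetical and not part of any library.

```python
import numpy as np

def fix_length(mfcc, max_frames=100):
    """Zero-pad or truncate a (frames, n_mfcc) matrix along the frame axis."""
    n_frames = mfcc.shape[0]
    if n_frames < max_frames:
        # Append rows of zeros after the last real frame.
        return np.pad(mfcc, ((0, max_frames - n_frames), (0, 0)), mode="constant")
    # Keep only the first max_frames frames.
    return mfcc[:max_frames]

# Toy inputs: one clip shorter than 100 frames, one longer.
short_clip = np.ones((60, 13))
long_clip = np.ones((140, 13))

print(fix_length(short_clip).shape, fix_length(long_clip).shape)
print(fix_length(short_clip).flatten().shape)
```

Both clips end up as `(100, 13)`, and flattening gives a vector of 1300 features. Truncation discards everything after the first 100 frames, so `max_frames` should be chosen with the typical clip length in mind.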

In [2]:
import numpy as np
import os

# Settings
mfcc_dir = "mfcc_npy_files/"
n_mfcc = 13
max_frames = 100  # You can adjust this if needed

# Containers for features and labels
X = []
y = []

# Loop through metadata and load MFCCs
for idx, row in metadata_df.iterrows():
    file_path = os.path.join(mfcc_dir, row["mfcc_file"])
    label = row["queen_status"]

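    # Load MFCC matrix; the code below assumes shape (frames, n_mfcc)
    mfcc = np.load(file_path)
    if mfcc.ndim != 2 or mfcc.shape[1] != n_mfcc:
        raise ValueError(f"Unexpected MFCC shape {mfcc.shape} in {file_path}")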

    # Pad or truncate to fixed length
    if mfcc.shape[0] < max_frames:
        pad_width = max_frames - mfcc.shape[0]
        mfcc = np.pad(mfcc, ((0, pad_width), (0, 0)), mode="constant")
    else:
        mfcc = mfcc[:max_frames]

    # Flatten to 1D vector: [100 frames × 13 mfccs] → 1300 features
    X.append(mfcc.flatten())
    y.append(label)

X = np.array(X)
y = np.array(y)

print("✅ Loaded and prepared all data:", X.shape, y.shape)


✅ Loaded and prepared all data: (7100, 1300) (7100,)


## Step 3: Train a Simple Model (Random Forest)

In [3]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.84      0.35      0.49       208
           1       0.63      0.18      0.28       189
           2       0.64      0.46      0.53       310
           3       0.65      0.97      0.78       713

    accuracy                           0.66      1420
   macro avg       0.69      0.49      0.52      1420
weighted avg       0.67      0.66      0.62      1420
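
The gap between the macro and weighted averages in this report comes from class imbalance. As a rough check, the sketch below recomputes both recall averages from the per-class rows. The numbers are copied by hand from the report, so the results are only accurate to the report's two-decimal rounding.

```python
# Per-class recall and support copied from the classification report above.
report = {
    0: {"recall": 0.35, "support": 208},
    1: {"recall": 0.18, "support": 189},
    2: {"recall": 0.46, "support": 310},
    3: {"recall": 0.97, "support": 713},
}

total_support = sum(row["support"] for row in report.values())

# Macro average: every class counts equally, regardless of size.
macro_recall = sum(row["recall"] for row in report.values()) / len(report)

# Weighted average: each class counts in proportion to its support.
# For recall, this equals overall accuracy (up to rounding in the report).
weighted_recall = (
    sum(row["recall"] * row["support"] for row in report.values()) / total_support
)

print(f"Macro recall:    {macro_recall:.2f}")
print(f"Weighted recall: {weighted_recall:.2f}")
```

The macro figure (0.49) is pulled down by the weak recall on classes 0 to 2, while the weighted figure (0.66) is dominated by class 3, which accounts for about half of the test samples.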



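## What We Get
- A first working classifier that predicts queen status from MFCC audio features, with 0.66 accuracy on the held-out test set.
- Precision, recall, and F1-score for each class. The model favors class 3, which makes up about half of the test set (713 of 1,420 samples): its recall is 0.97, while recall for class 1 is only 0.18.
- A real baseline to compare future models against (such as RNNs or CNNs).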

## Next Steps
- Normalize MFCCs before modeling.
- Try PCA for feature reduction.
- Train sequence models (like LSTMs) on the full [frames × 13] matrices instead of flattened vectors.
- Visualize samples: plot MFCC heatmaps grouped by queen status.
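
The first two ideas, normalization and PCA, can be combined in a single scikit-learn pipeline. The following is a minimal sketch only: it runs on synthetic random data with the same feature width (1300) as the flattened MFCCs, and `n_components=50` is an arbitrary illustrative choice, not a tuned value. The names `X_demo` and `y_demo` are placeholders for the real `X` and `y`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 400 samples x 1300 flattened features.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(400, 1300))
y_demo = rng.integers(0, 4, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# The scaler and PCA are fit on the training split only, inside the pipeline,
# so no statistics from the test split leak into preprocessing.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=50, random_state=42),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
pipeline.fit(X_tr, y_tr)

# Apply only the preprocessing steps to inspect the reduced feature shape.
reduced = pipeline[:-1].transform(X_te)
print("Reduced test shape:", reduced.shape)
```

Because the data here is random noise, the resulting accuracy is meaningless. The point is the structure: swapping in the real `X` and `y` keeps preprocessing and model together, so the same transformations are applied consistently at training and prediction time.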