# 🐓 GallusSense — Train Your Own Rooster Detector!

This notebook walks you through training a machine learning model to detect rooster sounds using your own audio data. The final model can be used with the GallusSense application.

---


### 1. Prepare the Dataset with ESC-50 and Custom Samples

This step prepares the training and testing dataset for our rooster sound detection model.

We do two things here:

- **Automatically download and split the ESC-50 dataset**: We use this public dataset as a source for `non_rooster` samples (urban noise, other animals, and so on). We skip its rooster clips because ESC-50 provides only 40 clips per class, too few to train a robust model.
- **Encourage you to add your own samples**: You should manually add `.wav` or `.mp3` files of `rooster` sounds (and optionally more `non_rooster` sounds) into these folders:
  - `audio/train/rooster`
  - `audio/test/rooster`
  - `audio/train/non_rooster`
  - `audio/test/non_rooster`

This hybrid setup helps us train a model that generalizes better to real-world scenarios where rooster audio is collected independently.
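
The download cell below only creates the `non_rooster` folders, so it can help to create all four folders up front and check how many clips each one holds. The following is a minimal sketch under that assumption; `ensure_layout`, `SPLITS`, `CLASSES` and `AUDIO_EXTENSIONS` are hypothetical names introduced for illustration and are not part of the notebook.

```python
from pathlib import Path

# Folder layout described above; the extensions mirror the formats the notebook accepts.
SPLITS = ("train", "test")
CLASSES = ("rooster", "non_rooster")
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def ensure_layout(root):
    """Create the four class folders under `root` and count the audio files in each."""
    counts = {}
    for split in SPLITS:
        for label in CLASSES:
            folder = Path(root) / split / label
            folder.mkdir(parents=True, exist_ok=True)
            counts[f"{split}/{label}"] = sum(
                1
                for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
            )
    return counts
```

Calling `ensure_layout("audio")` returns a dictionary such as `{"train/rooster": 0, ...}`, which makes an empty `rooster` folder easy to spot before training.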


In [None]:
import os
import zipfile
import shutil
import pandas as pd
from pathlib import Path
from sklearn.model_selection import train_test_split
from urllib.request import urlretrieve

# Define paths
download_url = "https://github.com/karoldvl/ESC-50/archive/master.zip"
zip_path = "./audio/esc-50.zip"
extract_path = "./audio"
audio_output_path = Path("./audio")

# Download the dataset
if not os.path.exists(zip_path):
    print("⬇️ Downloading ESC-50...")
    os.makedirs("./audio", exist_ok=True)
    urlretrieve(download_url, zip_path)

# Unzip the dataset
if not (Path("./audio/ESC-50-master").exists()):
    print("📦 Extracting ESC-50...")
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall("./audio")

# Prepare paths
esc_root = Path("./audio/ESC-50-master")
esc_audio_dir = esc_root / "audio"
esc_meta_path = esc_root / "meta/esc50.csv"

# Define output folders
train_nonrooster_dir = audio_output_path / "train/non_rooster"
test_nonrooster_dir = audio_output_path / "test/non_rooster"

for folder in [train_nonrooster_dir, test_nonrooster_dir]:
    folder.mkdir(parents=True, exist_ok=True)

# Load metadata
df = pd.read_csv(esc_meta_path)

# Keep every clip except those labeled "rooster"
non_rooster_df = df[df["category"] != "rooster"]

# Split dataset: 10% in train, 90% in test
nonrooster_train, nonrooster_test = train_test_split(non_rooster_df, test_size=0.9, random_state=42)

# Copy files
def copy_files(rows, src_dir, dest_dir):
    for _, row in rows.iterrows():
        src = src_dir / row["filename"]
        dst = dest_dir / row["filename"]
        if src.exists():
            shutil.copy(src, dst)

print("📁 Copying non-rooster samples...")
copy_files(nonrooster_train, esc_audio_dir, train_nonrooster_dir)
copy_files(nonrooster_test, esc_audio_dir, test_nonrooster_dir)

print("✅ ESC-50 non_rooster dataset is ready and split into train/test.")


### 2. Generate Metadata for Training Set

Now that the dataset is prepared and structured, we generate a metadata CSV file listing all training audio files and their associated labels.

This metadata is crucial for processing the dataset programmatically in the next steps.

In [None]:
import os
import pandas as pd

base_path = "audio/train"
categories = ["rooster", "non_rooster"]
audio_extensions = [".wav", ".mp3"]

data = []

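for category in categories:
    folder_path = os.path.join(base_path, category)
    # Create the folder if it is missing so os.listdir does not fail
    os.makedirs(folder_path, exist_ok=True)
    for filename in os.listdir(folder_path):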
        if any(filename.lower().endswith(ext) for ext in audio_extensions):
            data.append({
                "filename": os.path.join(folder_path, filename),
                "label": category
            })

metadata_train = pd.DataFrame(data)
metadata_train.to_csv("metadata_train_rf.csv", index=False)
metadata_train.head()

### 3. Extract MFCC Audio Features

In this step, we extract **MFCCs (Mel-frequency cepstral coefficients)** from the audio files.

MFCCs are a compact representation of the audio signal and are commonly used in audio classification tasks because they capture the timbral texture of sounds.

In [None]:
import numpy as np
import librosa
from tqdm import tqdm

metadata = pd.read_csv("metadata_train_rf.csv")
metadata["label_encoded"] = metadata["label"].map({"rooster": 1, "non_rooster": 0})

X, y = [], []
n_mfcc, sr_target = 40, 22050

for row in tqdm(metadata.itertuples(), total=len(metadata)):
    try:
        signal, _ = librosa.load(row.filename, sr=sr_target)
        signal = librosa.util.normalize(signal)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr_target, n_mfcc=n_mfcc)
        X.append(np.mean(mfcc.T, axis=0))
        y.append(row.label_encoded)
    except Exception as e:
        print(f"⚠️ {row.filename}: {e}")

X = np.array(X)
y = np.array(y)

print("✅ MFCC extraction complete", X.shape, y.shape)
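
The cell above averages each MFCC coefficient over time, so clips of any duration become a vector of `n_mfcc` values. The sketch below illustrates only that pooling step; it uses random matrices as stand-ins for real MFCC output, and `pool_over_time` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mfcc = 40  # same number of coefficients as the notebook

# Stand-ins for the MFCC matrices of a short and a long clip (shape: n_mfcc x n_frames).
# Random values are used only to illustrate shapes; real matrices come from librosa.
short_clip = rng.normal(size=(n_mfcc, 87))
long_clip = rng.normal(size=(n_mfcc, 431))


def pool_over_time(mfcc):
    """Average each coefficient across frames, as np.mean(mfcc.T, axis=0) does."""
    return np.mean(mfcc.T, axis=0)


features = np.stack([pool_over_time(short_clip), pool_over_time(long_clip)])
print(features.shape)  # (2, 40)
```

The trade-off is that averaging discards the order of frames, so the classifier sees the overall spectral shape of a clip but not how it evolves over time.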

### 4. Visualize Class Distribution

To better understand the balance of our dataset, we plot the distribution of the different audio classes.

Spotting an imbalance early matters because a heavily skewed dataset can bias the model toward the majority class.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(X)
df["label"] = y
sns.countplot(x="label", data=df)
plt.title("Class Distribution")
plt.xlabel("Class")
plt.ylabel("Number of Samples")
plt.show()

### 5. Train the Random Forest Model

We train a **Random Forest classifier**, a robust and interpretable machine learning model, using the MFCC features and the associated labels.

In [None]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(class_weight="balanced", n_estimators=100, random_state=42)
model.fit(X, y)

print("✅ Model trained!")
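
The training cell passes `class_weight="balanced"`, which scikit-learn documents as weighting each class by `n_samples / (n_classes * count)`. The sketch below reproduces that formula in plain Python on a made-up label list, to show how strongly a minority class gets upweighted; `balanced_class_weights` and the example counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label list: 180 non_rooster (0) and 20 rooster (1) clips.
example_labels = [0] * 180 + [1] * 20
weights = balanced_class_weights(example_labels)
print(weights)
```

In this made-up case the rooster class receives a weight of 5.0 and the non-rooster class roughly 0.56, so each rooster clip counts about nine times as much during training.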

### 6. Save the Trained Model

Once training is complete, we save the trained model to disk so it can later be reused for inference or deployment.

In [None]:
import joblib
joblib.dump(model, "gallussense_custom_rf.pkl")
print("💾 Model saved as gallussense_custom_rf.pkl")
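
To check that a saved model can be reused, it can be reloaded with `joblib.load` and queried with `predict_proba`. The sketch below is a self-contained round trip on a small demo model trained on random data; the feature values, `demo_rf.pkl` and the variable names are assumptions for illustration, not the notebook's actual model or the GallusSense loading code.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for 40-dimensional MFCC features; random values, illustration only.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 40))
y_demo = np.array([0, 1] * 30)

demo_model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "demo_rf.pkl"  # hypothetical file name
    joblib.dump(demo_model, str(model_path))
    reloaded = joblib.load(str(model_path))

# The reloaded model should reproduce the original predictions.
same = np.array_equal(demo_model.predict(X_demo), reloaded.predict(X_demo))

# Columns of predict_proba follow model.classes_, so column 1 is class 1 (rooster).
rooster_proba = float(reloaded.predict_proba(X_demo[:1])[0, 1])
print(same, round(rooster_proba, 2))
```

Any clip scored with the reloaded model needs the same preprocessing as in training (same sample rate, same `n_mfcc`, same time averaging), otherwise the features will not match what the forest learned.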

### 7. Generate Metadata for Test Set

Next, we generate a metadata file for the test set, just like we did for training.

This will allow us to run the same MFCC extraction and evaluation pipeline on the test data.

In [None]:
base_path = "audio/test"
data = []

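for category in ["rooster", "non_rooster"]:
    folder_path = os.path.join(base_path, category)
    # Create the folder if it is missing so os.listdir does not fail
    os.makedirs(folder_path, exist_ok=True)
    for filename in os.listdir(folder_path):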
        if any(filename.lower().endswith(ext) for ext in [".wav", ".mp3"]):
            data.append({
                "filename": os.path.join(folder_path, filename),
                "label": category
            })

metadata_test = pd.DataFrame(data)
metadata_test.to_csv("metadata_test_rf.csv", index=False)
print("✅ Test metadata created!")
metadata_test.head()
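
Steps 2 and 7 run the same folder scan on different base paths, so the logic could be factored into one function. The sketch below shows one possible shape for it; `list_audio_files` is a hypothetical helper, and skipping a missing class folder is a design assumption rather than the notebook's behavior.

```python
from pathlib import Path

AUDIO_EXTENSIONS = (".wav", ".mp3")


def list_audio_files(base_path, categories=("rooster", "non_rooster")):
    """Return one {"filename", "label"} record per audio file under base_path/<category>."""
    records = []
    for category in categories:
        folder = Path(base_path) / category
        if not folder.is_dir():
            continue  # a missing class folder simply contributes no rows
        for path in sorted(folder.iterdir()):  # sorted for a reproducible row order
            if path.suffix.lower() in AUDIO_EXTENSIONS:
                records.append({"filename": str(path), "label": category})
    return records
```

With this helper, the test metadata could be written with `pd.DataFrame(list_audio_files("audio/test")).to_csv("metadata_test_rf.csv", index=False)`, and the training metadata the same way with `"audio/train"`.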

### 8. Evaluate the Model on the Test Set

We evaluate our model on the test data to understand how well it generalizes.

The evaluation includes extracting MFCCs from the test set, predicting the labels, and calculating precision, recall, and F1-score.

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

metadata_test = pd.read_csv("metadata_test_rf.csv")
metadata_test["label_encoded"] = metadata_test["label"].map({"rooster": 1, "non_rooster": 0})

X_test, y_test = [], []

for row in tqdm(metadata_test.itertuples(), total=len(metadata_test)):
    try:
        signal, _ = librosa.load(row.filename, sr=sr_target)
        signal = librosa.util.normalize(signal)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr_target, n_mfcc=n_mfcc)
        X_test.append(np.mean(mfcc.T, axis=0))
        y_test.append(row.label_encoded)
    except Exception as e:
        print(f"⚠️ {row.filename}: {e}")

X_test = np.array(X_test)
y_test = np.array(y_test)

y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred, target_names=["non_rooster", "rooster"]))
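
To make the numbers in the report easier to read, the sketch below computes precision, recall and F1 for the rooster class from raw counts. The labels are a made-up toy example, not results from this notebook, and `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the `positive` class from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy labels (1 = rooster, 0 = non_rooster).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(precision_recall_f1(toy_true, toy_pred))  # (0.75, 0.75, 0.75)
```

Here three of the four predicted roosters are correct (precision 0.75) and three of the four real roosters are found (recall 0.75).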

### 9. Visualize the Confusion Matrix

We visualize the **confusion matrix** to see how often rooster clips are mistaken for non-rooster clips and vice versa.

This is a useful diagnostic to improve model performance.

In [None]:
cm = confusion_matrix(y_test, y_pred, labels=[1, 0])
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=["rooster", "non_rooster"],
            yticklabels=["rooster", "non_rooster"])
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
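
Because the cell above passes `labels=[1, 0]`, the rooster class occupies the first row and column of the matrix. The sketch below rebuilds that layout in plain Python on a made-up toy example to show which cell holds which kind of error; `confusion_counts` is a hypothetical helper, not a scikit-learn function.

```python
def confusion_counts(y_true, y_pred, labels=(1, 0)):
    """Rows follow true labels, columns follow predicted labels, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical toy labels (1 = rooster, 0 = non_rooster).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
cm_toy = confusion_counts(toy_true, toy_pred)
print(cm_toy)  # [[3, 1], [1, 5]]
```

With this ordering, the top-right cell counts missed roosters (false negatives) and the bottom-left cell counts false alarms (false positives).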

### 10. Inspect Misclassifications

Finally, we inspect a few specific errors made by the model, including the audio file, the true and predicted labels, and the model's confidence.

This helps us understand failure modes and improve the dataset or the model.

In [None]:
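probas = model.predict_proba(X_test)

# Row idx of X_test matches row idx of metadata_test only if no file was skipped in Step 8
assert len(y_test) == len(metadata_test), "Some test files failed to load; filenames would be misaligned"

errors = np.where(y_test != y_pred)[0]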

print(f"❌ Number of errors: {len(errors)}")

for idx in errors[:10]:
    file = metadata_test.iloc[idx]["filename"]
    true = "rooster" if y_test[idx] == 1 else "non_rooster"
    pred = "rooster" if y_pred[idx] == 1 else "non_rooster"
    conf = np.max(probas[idx])
    print(f"\n🔍 {file}\n✅ True: {true}\n❌ Predicted: {pred}\n🔢 Confidence: {conf:.2f}")
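
The cell above prints the first ten errors in file order. One possible refinement is to sort the errors by confidence so the most confident mistakes come first, since these often point to mislabeled or unusual clips. The sketch below shows the idea on made-up arrays; every name ending in `_demo` is a hypothetical stand-in for the notebook's `y_test`, `y_pred` and `probas`.

```python
import numpy as np

# Hypothetical values standing in for y_test and the predict_proba output.
y_true_demo = np.array([1, 0, 1, 0, 1])
probas_demo = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.45, 0.55],
    [0.70, 0.30],
    [0.60, 0.40],
])
y_pred_demo = probas_demo.argmax(axis=1)

# Indices of misclassified samples and the confidence of each wrong prediction
error_idx = np.where(y_true_demo != y_pred_demo)[0]
confidence = probas_demo[error_idx].max(axis=1)

# Most confident mistakes first
ranked = error_idx[np.argsort(-confidence)]
print(ranked.tolist())  # [0, 1, 4]
```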