# Authors
 - **Mohammed Essam Mohammed**       ***20220299***
 - **Amr Eihab Abdel-Zaher**         ***20221110***

 - Dataset: Free Spoken Digit Dataset (FSDD)
 - Total Samples: ~3,000 WAV files
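 - Classes: 10 digits (0–9); the experiments below use only digits 0 and 1
 - Features: 16 per recording (13 MFCCs, spectral centroid, spectral rolloff, zero-crossing rate)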

### Feature Descriptions

#### 1–13. MFCCs (Mel-Frequency Cepstral Coefficients)
- **Count**: 13 features  
- **What They Represent**:  
    - MFCCs summarize the timbre (tone quality) of the sound.
    - They are computed on the Mel scale, which approximates how humans perceive pitch, so they suit speech well.
- **How They Are Derived**:  
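    1. Take a short-time Fourier transform (STFT) to get the spectrum of each short frame.
    2. Map the spectrum onto the Mel scale (a perceptual pitch scale) with a bank of triangular filters.
    3. Take the logarithm of the filter energies, then apply a Discrete Cosine Transform (DCT) and keep the first coefficients.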
- **Why Useful**:  
    - They capture the shape of the vocal tract, which differs from one speech sound to another.
    - Well suited to speech tasks such as recognizing digits, emotions, or speaker gender.
    - Think of MFCCs as a fingerprint of the sound’s texture.
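The three derivation steps above can be sketched in plain NumPy. This is a simplified illustration, not librosa's implementation: it assumes the HTK Mel formula, a Hann window, 40 Mel bands and a natural log, and the helper names (`simple_mfcc`, `mel_filterbank`, `dct_ii`) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel formula (assumption; librosa defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are evenly spaced on the Mel scale
    bin_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, bin_freqs.size))
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii(x, n_out):
    # Orthonormal DCT-II over the last axis, keeping the first n_out coefficients
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def simple_mfcc(y, sr, n_mfcc=13, n_fft=512, hop=128, n_mels=40):
    # Step 1: STFT -> power spectrum of overlapping Hann-windowed frames
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Step 2: map onto the Mel scale with the triangular filterbank
    mel_energies = power @ mel_filterbank(sr, n_fft, n_mels).T
    # Step 3: log, then DCT; keep the first n_mfcc coefficients
    log_mel = np.log(mel_energies + 1e-10)
    return dct_ii(log_mel, n_mfcc)  # shape: (n_frames, n_mfcc)

sr = 8000
t = np.arange(sr) / sr
frames_mfcc = simple_mfcc(np.sin(2 * np.pi * 440 * t), sr)
# Averaging over frames, as the notebook does, gives one 13-value vector per clip
clip_vector = frames_mfcc.mean(axis=0)
print(frames_mfcc.shape, clip_vector.shape)  # (59, 13) (13,)
```

`librosa.feature.mfcc` uses different defaults (a larger FFT, more Mel bands, Slaney-style filters and a decibel-scaled log), so its values will not match this sketch numerically; the three-step structure is the same.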

#### 14. Spectral Centroid
- **Count**: 1 feature  
- **What It Represents**:  
    - The “center of mass” of the spectrum: the magnitude-weighted mean frequency.
    - Measures how bright or dark a sound is.
    - A high centroid means more energy at high frequencies.
- **Why Useful**:  
    - Speech sounds like “s”, “f” (fricatives) have high centroids.
    - Helps distinguish between types of spoken digits or speaker styles.
    - A bit like figuring out if a voice is sharp or mellow.
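The centroid of one frame is $\sum_k f_k |X_k| / \sum_k |X_k|$, where $|X_k|$ is the magnitude of FFT bin $k$ and $f_k$ its frequency. As far as we know, librosa applies this formula to every frame of a magnitude spectrogram, and the notebook then averages over frames. The sketch below computes it for a single frame; `spectral_centroid` is a hypothetical helper.

```python
import numpy as np

def spectral_centroid(frame, sr):
    # Magnitude spectrum of one frame and the frequency (Hz) of each FFT bin
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # Weighted mean frequency: sum(f_k * |X_k|) / sum(|X_k|)
    return np.sum(freqs * magnitudes) / (np.sum(magnitudes) + 1e-12)

sr = 8000
t = np.arange(sr) / sr
low = np.sin(2 * np.pi * 300 * t)    # "dark" tone, centroid near 300 Hz
high = np.sin(2 * np.pi * 3000 * t)  # "bright" tone, centroid near 3000 Hz
print(f"{spectral_centroid(low, sr):.1f} Hz vs {spectral_centroid(high, sr):.1f} Hz")
```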

#### 15. Spectral Rolloff
- **Count**: 1 feature  
- **What It Represents**: 
    - The frequency below which a specified percentage (usually 85%) of the total spectral energy is contained.
    - Indicates how far up the spectrum the bulk of the energy extends before it tails off.
- **Why Useful**:  
    - Helps detect whether a sound has more low- or high-frequency components.
    - Can capture differences in speaker pitch or emphasis.
    - Imagine a sound’s frequency distribution: this says how far it “reaches.”
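A single-frame sketch of the rolloff definition: accumulate the spectrum from 0 Hz upward and report the first frequency where the running total reaches 85%. This sketch accumulates the magnitude spectrum; using magnitude rather than power is an assumption here, and `spectral_rolloff` is a hypothetical helper.

```python
import numpy as np

def spectral_rolloff(frame, sr, roll_percent=0.85):
    # Magnitude spectrum and the frequency (Hz) of each FFT bin
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # Running total of spectral magnitude from 0 Hz upward
    cumulative = np.cumsum(magnitudes)
    # First bin whose running total reaches roll_percent of the overall total
    idx = np.searchsorted(cumulative, roll_percent * cumulative[-1])
    return freqs[idx]

sr = 8000
t = np.arange(sr) / sr
# Equal-strength tones: only 50% of the magnitude sits at 500 Hz,
# so the 85% point is reached at the 2000 Hz component
balanced = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 2000 * t)
# Weak 2000 Hz tone: about 91% of the magnitude already sits at 500 Hz
dominant_low = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 2000 * t)
print(spectral_rolloff(balanced, sr), spectral_rolloff(dominant_low, sr))
```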

#### 16. Zero-Crossing Rate (ZCR)
- **Count**: 1 feature  
- **What It Represents**:
    - The rate at which the signal changes from positive to negative or vice versa.
    - A high ZCR means frequent sign changes, typical of noisy or percussive sounds.
- **Why Useful**:  
    - Helps differentiate between voiced sounds (vowels, nasals) and unvoiced sounds (like “s”, “t”), which have a much higher ZCR.
    - Good for detecting silence, fricatives, and even speaker characteristics.
    - Think of it as a measure of how jittery or smooth the waveform is.
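A minimal sketch of the rate as the fraction of adjacent sample pairs whose signs differ. librosa computes this per frame (2048-sample frames by default) and the notebook averages over frames; this hypothetical `zero_crossing_rate` helper computes one rate over a whole signal.

```python
import numpy as np

def zero_crossing_rate(y):
    # Sign bit of each sample (True for negative values)
    signs = np.signbit(y)
    # Fraction of adjacent sample pairs whose signs differ
    return np.mean(signs[1:] != signs[:-1])

sr = 8000
t = np.arange(sr) / sr
# A small phase offset keeps samples off exact zeros
low_tone = np.sin(2 * np.pi * 100 * t + 0.1)    # 2 crossings per cycle, about 200 per second
high_tone = np.sin(2 * np.pi * 1000 * t + 0.1)  # about 2000 crossings per second
noise = np.random.default_rng(0).standard_normal(sr)  # unvoiced-like: sign flips about half the time
print(zero_crossing_rate(low_tone), zero_crossing_rate(high_tone), zero_crossing_rate(noise))
```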

In [1]:
import os
import librosa
import numpy as np
from sklearn.model_selection import train_test_split

In [2]:

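def extract_features(file_path):
    # FSDD recordings are sampled at 8 kHz; load at that rate
    y, sr = librosa.load(file_path, sr=8000)
    # Trim leading and trailing silence
    y, _ = librosa.effects.trim(y)

    # Each librosa feature is a (n_features, n_frames) matrix; averaging over
    # frames turns every clip into a fixed-length vector regardless of duration
    mfccs = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T, axis=0)        # 13 values
    centroid = np.mean(librosa.feature.spectral_centroid(y=y, sr=sr).T, axis=0)  # 1 value
    rolloff = np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr).T, axis=0)    # 1 value
    zcr = np.mean(librosa.feature.zero_crossing_rate(y).T, axis=0)               # 1 value

    # 13 + 1 + 1 + 1 = 16 features
    return np.hstack([mfccs, centroid, rolloff, zcr])

def load_fsdd_features(data_path='recordings', selected_digits=('0', '1')):
    # Tuple default avoids Python's shared-mutable-default pitfall
    X, y = [], []
    # sorted() makes the file order (and hence the train/test split) reproducible
    for fname in sorted(os.listdir(data_path)):
        # FSDD file names start with the spoken digit, e.g. "0_jackson_0.wav"
        if fname.endswith('.wav') and fname[0] in selected_digits:
            label = int(fname[0])
            features = extract_features(os.path.join(data_path, fname))
            X.append(features)
            y.append(label)
    return np.array(X), np.array(y)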
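FSDD names each file `{digit}_{speaker}_{index}.wav` (for example `0_jackson_0.wav`), which is why `fname[0]` yields the label. Reading the first character works for single digits; a slightly more defensive parser splits on the underscores instead. `parse_fsdd_filename` is a hypothetical helper, not part of the notebook or any library.

```python
import os

def parse_fsdd_filename(fname):
    """Return (digit, speaker, index) for an FSDD name like '0_jackson_0.wav'."""
    stem, ext = os.path.splitext(os.path.basename(fname))
    if ext.lower() != '.wav':
        raise ValueError(f"not a WAV file: {fname}")
    digit, speaker, index = stem.split('_')
    return int(digit), speaker, int(index)

print(parse_fsdd_filename('recordings/7_theo_42.wav'))  # (7, 'theo', 42)
```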

In [3]:
X, y = load_fsdd_features()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
print(f"Number of features: {X_train.shape[1]}")
print(f"Unique labels in training set: {np.unique(y_train)}")
print(f"Unique labels in test set: {np.unique(y_test)}")



Training set size: 480
Test set size: 120
Number of features: 16
Unique labels in training set: [0 1]
Unique labels in test set: [0 1]




In [4]:
class NaiveBayesClassifier:
    def __init__(self):
        self.classes = None
        self.mean = {}
        self.var = {}
        self.priors = {}

    def fit(self, X, y):
        self.classes = np.unique(y)
        for cls in self.classes:
            X_cls = X[y == cls]
            self.mean[cls] = np.mean(X_cls, axis=0)
            self.var[cls] = np.var(X_cls, axis=0)
            self.priors[cls] = X_cls.shape[0] / X.shape[0]

    def _log_gaussian_pdf(self, x, mean, var):
        # Work in log space so tiny likelihoods don't underflow to 0 (log(0) = -inf)
        var = var + 1e-6  # small smoothing term to avoid division by zero
        return -0.5 * np.log(2.0 * np.pi * var) - ((x - mean) ** 2) / (2.0 * var)

    def _predict_single(self, x):
        posteriors = {}
        for cls in self.classes:
            prior = np.log(self.priors[cls])
            # Naive independence assumption: per-feature log-likelihoods add up
            likelihood = np.sum(self._log_gaussian_pdf(x, self.mean[cls], self.var[cls]))
            posteriors[cls] = prior + likelihood
        return max(posteriors, key=posteriors.get)

    def predict(self, X):
        return np.array([self._predict_single(x) for x in X])

# Training
nb_classifier = NaiveBayesClassifier()
nb_classifier.fit(X_train, y_train)

# Predicting
y_pred = nb_classifier.predict(X_test)
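The `NaiveBayesClassifier` above is Gaussian Naive Bayes evaluated in log space. With the prior $P(c)$ estimated as the class frequency, the per-feature mean $\mu_{c,j}$ and variance $\sigma^2_{c,j}$ estimated in `fit`, and the $d = 16$ features treated as conditionally independent given the class, `_predict_single` returns

$$
\hat{y} = \arg\max_{c} \left[ \log P(c) + \sum_{j=1}^{d} \left( -\frac{1}{2}\log\left(2\pi\sigma^2_{c,j}\right) - \frac{(x_j - \mu_{c,j})^2}{2\sigma^2_{c,j}} \right) \right]
$$

The code adds a small constant ($10^{-6}$) to the variance terms so that a feature with zero variance in some class does not cause a division by zero.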


In [5]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))


              precision    recall  f1-score   support

           0       0.89      0.95      0.92        60
           1       0.95      0.88      0.91        60

    accuracy                           0.92       120
   macro avg       0.92      0.92      0.92       120
weighted avg       0.92      0.92      0.92       120
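The report's numbers follow from a 2×2 confusion matrix. With 60 test samples per class, a recall of 0.95 for digit 0 and 0.88 for digit 1 corresponds to 57 correctly classified zeros and 53 correctly classified ones; these counts are inferred from the rounded report, not printed by the notebook. The arithmetic below recovers every precision, recall, F1 and accuracy value in the table.

```python
import numpy as np

# Rows = true digit (0, 1), columns = predicted digit; inferred from the report above
cm = np.array([[57, 3],
               [7, 53]])

tp = np.diag(cm)
precision = tp / cm.sum(axis=0)  # column sums = number predicted as each class
recall = tp / cm.sum(axis=1)     # row sums = support (60 each)
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2), round(float(accuracy), 2))
```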



In [None]:
from collections import Counter
import numpy as np

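def bagging_ensemble_nb(X_train, y_train, X_test, n_estimators=10, random_state=None):
    # Pass an int random_state to make the bootstrap samples (and results) reproducible
    rng = np.random.default_rng(random_state)
    preds = []

    for _ in range(n_estimators):
        # Bootstrap sample: draw len(X_train) rows with replacement
        indices = rng.choice(len(X_train), size=len(X_train), replace=True)
        X_sample = X_train[indices]
        y_sample = y_train[indices]

        # Fit one Naive Bayes model per bootstrap sample
        model = NaiveBayesClassifier()
        model.fit(X_sample, y_sample)
        preds.append(model.predict(X_test))

    # Majority vote across the n_estimators predictions for each test sample
    final_preds = np.array([
        Counter(sample_preds).most_common(1)[0][0]
        for sample_preds in zip(*preds)
    ])

    return final_preds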
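Each bootstrap sample in `bagging_ensemble_nb` draws `len(X_train)` rows with replacement, so it repeats some rows and leaves others out (the "out-of-bag" rows). For large $n$, the probability that a given row is never drawn is $(1 - 1/n)^n \approx e^{-1} \approx 0.368$, so each model sees roughly 63% of the distinct training rows. A quick numerical check, with a seeded generator chosen for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 480  # matches the training-set size printed above
unique_fractions = []
for _ in range(200):
    # One bootstrap draw of row indices, as in bagging_ensemble_nb
    indices = rng.choice(n, size=n, replace=True)
    unique_fractions.append(np.unique(indices).size / n)

expected = 1 - (1 - 1 / n) ** n  # probability a given row is drawn at least once
print(f"observed: {np.mean(unique_fractions):.3f}, expected: {expected:.3f}")
```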


In [None]:
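from collections import Counter
from sklearn.linear_model import LogisticRegression

def bagging_ensemble_lr(X_train, y_train, X_test, n_estimators=10, random_state=None):
    # Same bootstrap-and-vote scheme as bagging_ensemble_nb, with logistic regression
    rng = np.random.default_rng(random_state)
    preds = []

    for _ in range(n_estimators):
        # Bootstrap sample drawn with replacement
        indices = rng.choice(len(X_train), size=len(X_train), replace=True)
        X_sample = X_train[indices]
        y_sample = y_train[indices]

        # A generous max_iter helps the solver converge on these unscaled features
        model = LogisticRegression(max_iter=1000)
        model.fit(X_sample, y_sample)
        preds.append(model.predict(X_test))

    # Majority vote across the n_estimators predictions for each test sample
    final_preds = np.array([
        Counter(sample_preds).most_common(1)[0][0]
        for sample_preds in zip(*preds)
    ])

    return final_preds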
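Both ensembles combine votes with `Counter(...).most_common(1)`. With `n_estimators=10` and two classes, a 5-5 tie is possible; `most_common` then returns the label it encountered first, i.e. the vote of the earliest estimator. A vectorized alternative is sketched below. `majority_vote` is a hypothetical helper that assumes non-negative integer labels and breaks ties toward the smaller label (because `np.argmax` returns the first maximum). An odd `n_estimators` avoids two-class ties altogether.

```python
import numpy as np

def majority_vote(preds):
    # preds: shape (n_estimators, n_samples) of non-negative integer labels
    preds = np.asarray(preds)
    n_classes = preds.max() + 1
    # Count votes per class for each sample (column): shape (n_classes, n_samples)
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return np.argmax(counts, axis=0)  # ties resolve to the smallest label

votes = np.array([[0, 1, 1],
                  [0, 1, 0],
                  [1, 1, 0],
                  [1, 0, 0]])
print(majority_vote(votes))  # [0 1 0]
```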


In [None]:
from sklearn.metrics import classification_report


y_pred_bag_nb = bagging_ensemble_nb(X_train, y_train, X_test)
print("Bagged Naive Bayes")
print(classification_report(y_test, y_pred_bag_nb))


y_pred_bag_lr = bagging_ensemble_lr(X_train, y_train, X_test)
print("Bagged Logistic Regression")
print(classification_report(y_test, y_pred_bag_lr))


Bagged Naive Bayes
              precision    recall  f1-score   support

           0       0.89      0.95      0.92        60
           1       0.95      0.88      0.91        60

    accuracy                           0.92       120
   macro avg       0.92      0.92      0.92       120
weighted avg       0.92      0.92      0.92       120

Bagged Logistic Regression
              precision    recall  f1-score   support

           0       1.00      0.97      0.98        60
           1       0.97      1.00      0.98        60

    accuracy                           0.98       120
   macro avg       0.98      0.98      0.98       120
weighted avg       0.98      0.98      0.98       120
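As a cross-check on the hand-written loops, scikit-learn's `BaggingClassifier` implements the same bootstrap-and-aggregate idea. One difference: when the base estimator supports `predict_proba` (as `LogisticRegression` does), it averages predicted probabilities instead of taking a hard majority vote, so results can differ slightly from `bagging_ensemble_lr`. The sketch runs on synthetic data from `make_classification` as a stand-in for the FSDD features (an assumption), and uses new variable names so it does not overwrite the notebook's `X` and `y`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 600 x 16 FSDD feature matrix (assumption)
X_demo, y_demo = make_classification(n_samples=600, n_features=16, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

# Base estimator passed positionally: its keyword name changed across scikit-learn versions
bag = BaggingClassifier(LogisticRegression(max_iter=1000), n_estimators=10,
                        bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)
print(f"accuracy: {accuracy_score(y_te, bag.predict(X_te)):.2f}")
```

Bagging mainly reduces variance, which may be why bagged Naive Bayes, a fairly stable model, reproduces the single model's report exactly above.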

