### **Gaussian Mixture Models for Speaker Identification**
**Goal:** Use Gaussian Mixture Models to learn and identify different speakers based on their voice feature distributions.

In [39]:
import kagglehub
kongaevans_speaker_recognition_dataset_path = kagglehub.dataset_download('kongaevans/speaker-recognition-dataset')

print('Data source import complete.')

Data source import complete.


In [46]:
import os
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Base dataset path
BASE_PATH = "/kaggle/input/speaker-recognition-dataset/16000_pcm_speeches"

# List of speaker folders
speaker_folders = [
    "Benjamin_Netanyau",
    "Jens_Stoltenberg",
    "Julia_Gillard",
    "Magaret_Tarcher",
    "Nelson_Mandela"
]

In [48]:
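def extract_features_from_files(files, folder_path):
    """Return one time-averaged 13-dimensional MFCC vector per file."""
    features = []
    for file in files:
        file_path = os.path.join(folder_path, file)
        # sr=None keeps each file's native sampling rate
        y, sr = librosa.load(file_path, sr=None)
        # mfcc has shape (13, n_frames)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        # Average over frames so every file yields a fixed-length vector
        features.append(np.mean(mfcc.T, axis=0))
    return np.vstack(features)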

In [49]:
models = {}
test_data = []

# Train a GMM for each speaker
for speaker in speaker_folders:
    folder_path = os.path.join(BASE_PATH, speaker)
    files = [f for f in os.listdir(folder_path) if f.endswith(".wav")]

    train_files, test_files = train_test_split(files, test_size=0.3, random_state=42)

    # Extract training features
    train_features = extract_features_from_files(train_files, folder_path)

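    # Train GMM (8 diagonal-covariance components, best of 3 initializations).
    # No random_state is set, so results can vary slightly between runs.
    gmm = GaussianMixture(n_components=8, covariance_type='diag', n_init=3)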
    gmm.fit(train_features)
    models[speaker] = gmm

    # Save test files for evaluation
    for file in test_files:
        test_data.append((os.path.join(folder_path, file), speaker))

In [50]:
# Prediction function
def predict_speaker(file_path):
    y, sr = librosa.load(file_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    feature = np.mean(mfcc.T, axis=0).reshape(1, -1)

    # score() returns the average log-likelihood of the feature under each speaker's GMM
    scores = {speaker: model.score(feature) for speaker, model in models.items()}
    return max(scores, key=scores.get)

In [51]:
# Evaluate accuracy
correct = 0
for file_path, true_speaker in test_data:
    predicted_speaker = predict_speaker(file_path)
    if predicted_speaker == true_speaker:
        correct += 1

accuracy = correct / len(test_data)
print(f"Accuracy: {accuracy:.2%}")

Accuracy: 98.09%


### **Summary**

In this project, we built a speaker recognition system using audio recordings from five different speakers.

We extracted 13 MFCCs (Mel-Frequency Cepstral Coefficients) from each .wav file and averaged them over time, so each recording is represented by a single 13-dimensional vector describing the speaker’s vocal characteristics.
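
In the notebook, each file's MFCC matrix is collapsed into one vector with `np.mean(mfcc.T, axis=0)`. The following is a minimal sketch of that pooling step only; it uses a random array as a stand-in for the output of `librosa.feature.mfcc`, and the frame count of 32 is an arbitrary assumption.

```python
import numpy as np

# Stand-in for librosa.feature.mfcc output, shape (n_mfcc, n_frames).
# The values are random placeholders, not real audio features.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(13, 32))

# Average over the frame axis to get one fixed-length vector per file.
file_vector = np.mean(mfcc.T, axis=0)

print(file_vector.shape)
```

Averaging discards how the coefficients change over time, so recordings of different lengths all map to vectors of the same size.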

A separate Gaussian Mixture Model (GMM) with 8 diagonal-covariance components was trained for each speaker using these features. During prediction, the system computes the average log-likelihood of the test sample's feature vector under each speaker's GMM (`model.score`) and selects the speaker with the highest score.
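
The decision rule can be illustrated on synthetic data. This is a sketch under stated assumptions, not the notebook's pipeline: the two speaker names, the cluster centers, and the component count of 2 are hypothetical, and the vectors are random draws rather than MFCC features.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical speakers, each a well-separated cluster of 13-dimensional vectors.
centers = {"speaker_a": -5.0, "speaker_b": 5.0}
toy_models = {}
for name, center in centers.items():
    X = rng.normal(loc=center, scale=1.0, size=(200, 13))
    toy_models[name] = GaussianMixture(
        n_components=2, covariance_type="diag", random_state=0
    ).fit(X)

# A test vector drawn near speaker_b's center.
sample = rng.normal(loc=5.0, scale=1.0, size=(1, 13))

# score() returns the average log-likelihood per sample under each model.
scores = {name: gmm.score(sample) for name, gmm in toy_models.items()}
predicted = max(scores, key=scores.get)
print(predicted)
```

Because the scores are log-likelihoods, taking the maximum picks the model under which the sample is most probable, which is equivalent to assuming equal prior probability for every speaker.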

Each speaker's files were split 70/30 into training and testing sets, and the final evaluation showed an accuracy of 98.09% on the held-out test files.
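
A single accuracy figure does not show which speakers are misclassified. As one possible extension, the sketch below computes accuracy per speaker from (true, predicted) pairs. The helper `per_speaker_accuracy` and the example labels are hypothetical and are not results from this notebook.

```python
from collections import Counter


def per_speaker_accuracy(pairs):
    """Return {true_label: fraction correct} for (true, predicted) pairs."""
    total = Counter()
    hits = Counter()
    for true_label, predicted_label in pairs:
        total[true_label] += 1
        if true_label == predicted_label:
            hits[true_label] += 1
    return {label: hits[label] / total[label] for label in total}


# Hypothetical labels, for illustration only.
example_pairs = [("a", "a"), ("a", "b"), ("b", "b"), ("b", "b")]
breakdown = per_speaker_accuracy(example_pairs)
print(breakdown)
```

With the notebook's variables, the pairs could be built as `[(true, predict_speaker(path)) for path, true in test_data]`.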

**Key methods used:**
- MFCC feature extraction (librosa)
- Gaussian Mixture Models (sklearn)
- Multi-class classification
