# Gender Recognition Based on Voice

By Laxman Singh Tomar

---

A lot can be achieved by analyzing voice in Speech Analytics, and one of the most foundational tasks is identifying the gender of a speaker from their voice. In this project, I'll walk through the workflow for detecting the gender of the speaker using **MFCC** (Mel Frequency Cepstral Coefficients) and **GMM** (Gaussian Mixture Models), and use these two techniques to reach a noteworthy accuracy.

---

# Outline

1. Introduction to Project
    - 1.1 Project Objective
    - 1.2 Historical Context
    - 1.3 Project Workflow
    
    
2. Project Setup
    - 2.1 Importing the Libraries
    - 2.2 Importing the Data
    - 2.3 Managing the Data
    
    
3. What is MFCC?
    - 3.1 Building the Features Extractor


4. What are GMMs?
    - 4.1 Training the Models


5. Identifying the Gender


6. Conclusions and Analysis


7. Acknowledgements and References

---

# 1. Introduction to Project

## 1.1 Project Objective

To predict the gender of a speaker from their voice samples.

---

## 1.2 Historical Context

The large amount of computing power now available, together with Artificial Intelligence systems, has produced an inflection point in the ability of machines to recognize voices. Faster processing and the large amount of speech data available make the performance of these systems roughly on par with humans. From **Audrey**, a speech recognition system built at Bell Labs in 1952 that could recognize a single voice speaking digits aloud, we've come to having day-to-day conversations with voice assistants like Google Assistant and Siri on our smartphones.

But most of these systems are neutral to the gender of the speaker, and so are the results they give. A system that can respond according to the user's gender is a useful capability: many tasks that depend on gender preferences can be handled by it, which can lead to better customer service and an improved user experience.

![waveform.jpg](attachment:waveform.jpg)

---

## 1.3 Project Workflow 

![Voice_Detection_Workflow.png](attachment:Voice_Detection_Workflow.png)

---

# 2. Project Setup

## 2.1 Importing the Libraries

Here I'm using the standard machine learning tools available in scikit-learn, along with numpy and pandas for data manipulation. Reading the audio files and extracting features rely on scipy and python_speech_features.

In [1]:
# Importing Libraries and Modules

# For Importing Files
import os
import sys
import math
import tarfile

# For Data Manipulation
import numpy as np
import pandas as pd

# For Audio Files Processing
from scipy.io.wavfile import read
from sklearn.mixture import GaussianMixture as GMM
from python_speech_features import mfcc
from python_speech_features import delta 
from sklearn import preprocessing

# To Ignore Warnings
import warnings
warnings.filterwarnings('ignore')

# To Save Models
import pickle

---

## 2.2 Importing the Data

The voice samples of males and females come from **The Free ST American English Corpus dataset**, which can be downloaded from [here](http://www.openslr.org/45). It contains utterances from 10 speakers, 5 from each gender. Each speaker has about 350 utterances.

Once you download the dataset, you need to split it into two parts: Training Set and Testing Set.

- **Training Set** : It's used to train the gender models. 


- **Testing Set** : It's used for testing the accuracy of the gender recognition.

The splitting criterion is entirely up to you. I prefer 2/3 for the Training Set and the rest for the Testing Set. I'll create a class which will help us manage and format our data. We need functions for the following tasks:

1. A function for getting the path where our compressed dataset resides.


2. A function to extract files out of our compressed dataset.


3. A function to create separate folders for our training and testing files.


4. A function which can fill filenames into an empty dictionary.


5. A function which can move files into their respective folders.


6. And of course a driver function for all of the above functions.
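
The 2/3 split mentioned above can be sketched in a few lines. This is only an illustration of the arithmetic: the helper name `split_two_thirds` and the file names are made up for the example and are not part of the project code.

```python
import math

def split_two_thirds(file_names):
    """Split a list into (training, testing), keeping roughly 2/3 for training."""
    cut = math.trunc(len(file_names) * 2 / 3)
    return file_names[:cut], file_names[cut:]

# Hypothetical file names for one speaker, just for illustration
utterances = ["f0001_us_f0001_%05d.wav" % i for i in range(1, 10)]
train, test = split_two_thirds(utterances)
print(len(train), len(test))  # 6 3
```

Because the cut is taken per speaker, every speaker contributes to both sets in the same proportion.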


---

## 2.3 Managing the Data

In [2]:
class Data_Manager:
    # Function #1
    
    def __init__(self, dataset_path):
        self.dataset_path = dataset_path

#-------------------------------------------------------------------------------------------------------------------------------
    # Function #2
    
    def extract_dataset(self, compressed_dataset_file_name, dataset_directory):
        try:
            tar = tarfile.open(compressed_dataset_file_name, "r:gz")
            tar.extractall(dataset_directory)
            tar.close()
            print("Files extraction was successful!")

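        # Catch only archive and file-system errors instead of every exception
        except (tarfile.TarError, OSError, EOFError):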
            print("No extraction was performed !")
            
#-------------------------------------------------------------------------------------------------------------------------------
    # Function #3
    
    def make_folder(self, folder_path):
        try:
            os.mkdir(folder_path)
            print(folder_path, "was created !")
        # os.mkdir raises OSError (e.g. FileExistsError) when the folder cannot be created
        except OSError:
            print("Exception raised: ", folder_path, "could not be created !")
            
#-------------------------------------------------------------------------------------------------------------------------------
    # Function #4
    
    def get_fnames_from_dict(self, dataset_dict, f_or_m):
        training_data, testing_data = [], []

        # The dataset has 5 speakers per gender (0001 to 0005), so the range must end at 6
        for i in range(1, 6):
            length_data       = len(dataset_dict[f_or_m +"000" + str(i)])
            length_separator  = math.trunc(length_data*2/3)

            training_data += dataset_dict[f_or_m + "000" + str(i)][:length_separator]
            testing_data  += dataset_dict[f_or_m + "000" + str(i)][length_separator:]

        return training_data, testing_data
    
#------------------------------------------------------------------------------------------------------------------------------   
    # Function #5
    
    def move_files(self, src, dst, group):
        for fname in group:
            os.rename(os.path.join(src, fname), os.path.join(dst, fname))

#------------------------------------------------------------------------------------------------------------------------------
    # Function #6
    
    def manage(self):

        compressed_dataset_file_name = self.dataset_path
        dataset_directory = compressed_dataset_file_name.split(".")[0]

        # Create the target directory if it does not exist yet
        os.makedirs(dataset_directory, exist_ok=True)

        self.extract_dataset(compressed_dataset_file_name, dataset_directory)

        file_names   = [fname for fname in os.listdir(dataset_directory) if ("f0" in fname or "m0" in fname)]
        dataset_dict = {"f0001": [], "f0002": [], "f0003": [], "f0004": [], "f0005": [],
                        "m0001": [], "m0002": [], "m0003": [], "m0004": [], "m0005": [], }

        for fname in file_names:
            dataset_dict[fname.split('_')[0]].append(fname)

        training_set, testing_set = {},{}
        training_set["females"], testing_set["females"] = self.get_fnames_from_dict(dataset_dict, "f")
        training_set["males"  ], testing_set["males"  ] = self.get_fnames_from_dict(dataset_dict, "m")

        self.make_folder("TrainingData")
        self.make_folder("TestingData")
        self.make_folder("TrainingData/females")
        self.make_folder("TrainingData/males")
        self.make_folder("TestingData/females")
        self.make_folder("TestingData/males")

        self.move_files(dataset_directory, "TrainingData/females", training_set["females"])
        self.move_files(dataset_directory, "TrainingData/males",   training_set["males"])
        self.move_files(dataset_directory, "TestingData/females",  testing_set["females"])
        self.move_files(dataset_directory, "TestingData/males",    testing_set["males"])

#-------------------------------------------------------------------------------------------------------------------------------

if __name__== "__main__":
    data_manager = Data_Manager("SLR45.tgz")
    data_manager.manage()

No extraction was performed !
Exception raised:  TrainingData could not be created !
Exception raised:  TestingData could not be created !
Exception raised:  TrainingData/females could not be created !
Exception raised:  TrainingData/males could not be created !
Exception raised:  TestingData/females could not be created !
Exception raised:  TestingData/males could not be created !


Let me explain briefly what I've done here:

1. **Function #1** : It stores the path where our compressed dataset resides.


2. **Function #2** : It extracts the gzip-compressed tar file into a directory.


3. **Function #3** : It creates a folder for the data.


4. **Function #4** : It splits each speaker's file names into Training Set and Testing Set lists.


5. **Function #5** : It moves files to their respective folders.


6. **Function #6** : It creates the folder where the dataset is decompressed and extracts the archive into it. It then selects our files, fills them into the dictionary, and divides the file names into groups. Finally, once the folders for our files have been created, it moves the files into their respective folders.
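
The grouping step inside the driver function relies on the speaker ID being the part of the file name before the first underscore. Here is a minimal sketch of that idea, assuming file names shaped like the hypothetical ones below; `group_by_speaker` is an illustrative helper, not a function from the class above.

```python
from collections import defaultdict

def group_by_speaker(file_names):
    """Group file names by the speaker ID found before the first underscore."""
    groups = defaultdict(list)
    for fname in file_names:
        speaker_id = fname.split("_")[0]
        groups[speaker_id].append(fname)
    return dict(groups)

# Hypothetical file names, just for illustration
names = [
    "f0001_us_f0001_00001.wav",
    "m0002_us_m0002_00007.wav",
    "f0001_us_f0001_00002.wav",
]
groups = group_by_speaker(names)
print(sorted(groups))  # ['f0001', 'm0002']
```

A `defaultdict` avoids having to list every speaker ID up front, which is handy if the number of speakers changes.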

---

# 3. What is MFCC?

It's time to build a feature extractor now. Many acoustic features can help distinguish males from females, but here I'm going to use **MFCC** or **Mel Frequency Cepstral Coefficients**, since it's one of the best-performing acoustic features in terms of results. Generally, here's how they're derived:

1. Take the Fourier transform of (a windowed excerpt of) a signal. It transforms the time-domain signal into a spectral-domain signal, where the source and filter parts are multiplied together.


2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.


3. Take the logs of the powers at each of the mel frequencies. The log turns the product of source and filter into a sum, which helps in separating them.


4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal. 


5. The MFCCs are the amplitudes of the resulting spectrum.

![mfcc.jpeg](attachment:mfcc.jpeg)
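
To make the five steps concrete, here is a rough NumPy sketch for a single frame. It's a simplified illustration and not what python_speech_features does internally: it skips pre-emphasis and liftering, and the helper names, the Hamming window and the 440 Hz test tone are all my own assumptions for the example.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, rate):
    """Triangular overlapping filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge of the triangle
            bank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            bank[m - 1, k] = (right - k) / (right - center)
    return bank

def dct_ii(x, n_out):
    """Unnormalized DCT-II of a 1-D array, keeping the first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    return (np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * x).sum(axis=1)

def mfcc_one_frame(frame, rate, n_fft=512, n_filters=26, n_ceps=13):
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2 / n_fft   # step 1: power spectrum
    energies = mel_filterbank(n_filters, n_fft, rate) @ power   # step 2: mel filterbank
    log_energies = np.log(np.maximum(energies, 1e-10))          # step 3: log (guarded against 0)
    return dct_ii(log_energies, n_ceps)                         # steps 4-5: DCT amplitudes

# A 25 ms frame of a 440 Hz tone at an assumed 16 kHz sample rate
rate = 16000
t = np.arange(400) / rate
frame = np.sin(2 * np.pi * 440.0 * t)
coeffs = mfcc_one_frame(frame, rate)
print(coeffs.shape)  # (13,)
```

In a real pipeline the same computation runs on every overlapping frame of the recording, giving one row of coefficients per frame.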

---

## 3.1 Building the Features Extractor

To extract MFCC features, I'm going to use a Python module named python_speech_features. It's simple to apply and has good documentation.

It's convenient to build a class that encapsulates the function doing the feature extraction for us:

In [3]:
class Features_Extractor:
    def __init__(self):
        pass
       
    def extract_features(self, audio_path):
        rate, audio  = read(audio_path)
        mfcc_feature = mfcc(audio, rate, winlen = 0.05, winstep = 0.01, numcep = 5, nfilt = 30, nfft = 800, appendEnergy = True)
      
        mfcc_feature  = preprocessing.scale(mfcc_feature)
        deltas        = delta(mfcc_feature, 2)
        double_deltas = delta(deltas, 2)
        combined      = np.hstack((mfcc_feature, deltas, double_deltas))
        return combined

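Let's see what I've just done here! I've built a function which extracts MFCCs from an audio file, normalizes each coefficient to zero mean and unit variance with preprocessing.scale (cepstral mean and variance normalization), and then combines the result with the MFCC deltas and double deltas. It takes audio_path, i.e. the path to the audio wave file, and returns the extracted features matrix. With numcep = 5, each frame ends up with 15 values: 5 MFCCs, 5 deltas and 5 double deltas.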
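
If you're curious what the normalization and the delta stacking amount to, here's a small NumPy-only sketch. It uses the common regression formula for deltas with repeated edge frames, which I'm assuming is close to what the library computes; the helper names and the random stand-in data are made up for the example.

```python
import numpy as np

def normalize_columns(features):
    """Scale every column to zero mean and unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (features - mean) / std

def regression_delta(features, n=2):
    """d_t = sum_i i * (c_{t+i} - c_{t-i}) / (2 * sum_i i^2), for i = 1..n, edges repeated."""
    n_frames = len(features)
    denom = 2 * sum(i * i for i in range(1, n + 1))
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    weights = np.arange(-n, n + 1)
    return np.array([weights @ padded[t:t + 2 * n + 1] for t in range(n_frames)]) / denom

rng = np.random.default_rng(0)
mfccs = rng.normal(size=(50, 5))          # stand-in for 50 frames of 5 MFCCs
scaled = normalize_columns(mfccs)
deltas = regression_delta(scaled, 2)
double_deltas = regression_delta(deltas, 2)
combined = np.hstack((scaled, deltas, double_deltas))
print(combined.shape)  # (50, 15)
```

The deltas describe how each coefficient changes over neighbouring frames, so the stacked matrix carries both the spectral shape and its dynamics.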

The mfcc function takes several arguments:

- **audio**: Audio signal from which we've to compute features


- **rate** : Sample rate of the audio signal we're working with


- **winlen**: Length of the analysis window in seconds; default is 0.025s (25 milliseconds)


- **winstep**: Step between successive windows in seconds; default is 0.01s (10 milliseconds)


- **numcep**: Number of cepstral coefficients to return; default is 13


- **nfilt**: Number of filters in the filterbank; default is 26


- **nfft**: Size of the fft; default is 512


- **appendEnergy**: If it's set True, the zeroth cepstral coefficient is replaced with log of total frame energy
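
To get a feel for how winlen and winstep translate into a number of frames, here's a small arithmetic sketch. It assumes a 16 kHz sample rate (not stated above) and that the last partial frame is padded rather than dropped; `count_frames` is an illustrative helper, not a library function.

```python
import math

def count_frames(n_samples, rate, winlen=0.025, winstep=0.01):
    """Number of analysis windows, assuming the final partial window is padded."""
    frame_len = int(round(winlen * rate))
    frame_step = int(round(winstep * rate))
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)

rate = 16000                                   # assumed sample rate
frame_len = int(round(0.05 * rate))            # samples per window with winlen = 0.05
n_frames = count_frames(3 * rate, rate, winlen=0.05, winstep=0.01)
print(frame_len, n_frames)  # 800 296
```

Under that assumption a 0.05s window holds 800 samples, which would explain nfft = 800 in the extractor: the FFT size is chosen to cover the whole window.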

---

# 4. What are GMMs?

> A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements or features in a biometric system, such as vocal-tract related spectral features in a speaker recognition system. GMM parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori(MAP) estimation from a well-trained prior model. 
>
> [D. Reynolds](https://www.semanticscholar.org/paper/Gaussian-Mixture-Models-Reynolds/734b07b53c23f74a3b004d7fe341ae4fce462fc6)

![gmm.png](attachment:gmm.png)

A Gaussian Mixture Model, popularly known as GMM, is a probabilistic clustering model for representing a certain data distribution as a weighted sum of Gaussian density functions. The densities forming a GMM are also known as its components. The likelihood of a data point is given by the following equation:


$P(X|\lambda) = \sum_{k=1}^{K} w_k P_k(X|\mu_k, \Sigma_k)  $




where $P_k(X|\mu_k, \Sigma_k) $ is the Gaussian Distribution:




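$P_k(X|\mu_k,\Sigma_k) = \frac{1}{\sqrt{(2\pi)^d|\Sigma_k|}} \thinspace e^{-\frac{1}{2}(X-\mu_k)^T \Sigma_k^{-1}(X-\mu_k)}$

where:

$\lambda$ : It represents the model parameters, i.e. the weights, means and covariance matrices estimated from the training data.

$\mu_k$ : It represents the mean of component $k$.

$\Sigma_k$ : It represents the covariance matrix of component $k$.

$w_k$ : It represents the weight of component $k$; the weights sum to 1.

$k$ : It represents the index of the components, with $K$ components in total.

$d$ : It represents the dimension of the feature vector $X$.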
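
As a sanity check on the formulas, here's a small NumPy sketch that evaluates the log of the mixture likelihood for one feature vector. It assumes diagonal covariance matrices (so each one is just a vector of variances) and works in the log domain for numerical stability; the function names and the toy parameters are my own.

```python
import numpy as np

def diag_gaussian_log_pdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    d = len(mean)
    return -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(var)) + np.sum((x - mean) ** 2 / var))

def gmm_log_likelihood(x, weights, means, variances):
    """log of sum_k w_k * P_k(x), computed with the log-sum-exp trick."""
    component_logs = np.array([
        np.log(w) + diag_gaussian_log_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ])
    peak = component_logs.max()
    return peak + np.log(np.sum(np.exp(component_logs - peak)))

# Toy mixture with two components in two dimensions
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [4.0, 4.0]])
variances = np.ones((2, 2))
log_likelihood = gmm_log_likelihood(np.array([0.0, 0.0]), weights, means, variances)
print(log_likelihood)
```

Summing this quantity over all frames of a recording gives the log-likelihood of the whole utterance under the model.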

---

## 4.1 Training the Models

I'm going to build a class that trains the models on my audio samples. Splitting it across separate cells would be tedious, so I'll keep everything in a single cell. Let's see what I aim to achieve here:

1. A function which assigns the paths where our voice samples reside


2. A function which collects voice features from the files


3. A function which generates the GMM models and fits them to our features


4. A function which saves our newly constructed GMM models

In [4]:
class Models_Trainer:
    
    # Function #1
    
    def __init__(self, females_files_path, males_files_path):
        self.females_training_path = females_files_path
        self.males_training_path   = males_files_path
        self.features_extractor    = Features_Extractor()
#-----------------------------------------------------------------------------------------------------------------------------        
    
    # Function #2
    
    def get_file_paths(self, females_training_path, males_training_path):
        females = [ os.path.join(females_training_path, f) for f in os.listdir(females_training_path) ]
        males   = [ os.path.join(males_training_path, f) for f in os.listdir(males_training_path) ]
        return females, males
    
#-----------------------------------------------------------------------------------------------------------------------------
    
    # Function #3  
    
    def collect_features(self, files):
        features = np.asarray(())
        
        for file in files:
            print("%5s %10s" % ("Processing ", file))
            vector = self.features_extractor.extract_features(file)
            if features.size == 0: 
                features = vector
            else:
                features = np.vstack((features, vector))
        return features
    
#------------------------------------------------------------------------------------------------------------------------------ 
    # Function #4
    
    def process(self):
        females, males = self.get_file_paths(self.females_training_path,self.males_training_path)
        
        female_voice_features = self.collect_features(females)
        male_voice_features   = self.collect_features(males)
        
        females_gmm = GMM(n_components = 16, max_iter = 200, covariance_type='diag', n_init = 3)
        males_gmm   = GMM(n_components = 16, max_iter = 200, covariance_type='diag', n_init = 3)
        
        females_gmm.fit(female_voice_features)
        males_gmm.fit(male_voice_features)
        
        self.save_gmm(females_gmm, "females")
        self.save_gmm(males_gmm,   "males")

#-----------------------------------------------------------------------------------------------------------------------------
    # Function #5
    
    def save_gmm(self, gmm, name):

        filename = name + ".gmm"
        
        with open(filename, 'wb') as gmm_file:
            pickle.dump(gmm, gmm_file)
        print ("%5s %10s" % ("Saving", filename,))

#-----------------------------------------------------------------------------------------------------------------------------
if __name__== "__main__":
    models_trainer = Models_Trainer("TrainingData/females", "TrainingData/males")
    models_trainer.process()

ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Okay, I'll explain what I've done here. Let me go through each function and briefly tell you what's happening:

1. **Function #1** : It assigns the paths of the female and male audio samples to their respective variables; signifying that these are training samples.


2. **Function #2** : It builds the lists of female and male file paths and returns them.


3. **Function #3** : It collects features from all speakers of the same gender. It takes the audio files, extracts the MFCC and delta features of each one, and stacks them into a single features matrix.


4. **Function #4** : This function gathers features from Function #3, generates GMM Models and later fits features collected to them. There are 2 separate models for males and females. Finally, generated models are saved.


5. **Function #5** : It's always better to save your models so you don't have to repeat the whole process again. It takes a GMM model and a name for the file. The pickle module is used to dump the models just generated.
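
Here's a tiny end-to-end sketch of the fit, save and reload cycle on synthetic data, so you can try it without the dataset. The random features and the in-memory buffer (instead of a .gmm file) are assumptions made for the example; the GaussianMixture arguments mirror the ones used above, with fewer components to keep it fast.

```python
import io
import pickle

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for a stacked features matrix (frames x 15 features)
features = rng.normal(size=(500, 15))

gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      max_iter=200, n_init=3, random_state=0)
gmm.fit(features)

# Dump the fitted model and load it back, here through an in-memory buffer
buffer = io.BytesIO()
pickle.dump(gmm, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

# The restored model should score the data exactly like the original
same_score = bool(np.isclose(gmm.score(features), restored.score(features)))
print(same_score)
```

Keep in mind that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.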

---

# 5. Identifying the Gender

Finally, all the pieces are about to fall into place. We've already collected features and fitted our GMM models to them. It's time to see how they work on samples they haven't seen yet!

I'm going to create a class once again, which encapsulates several functions. Let's see what I wish to achieve here:

1. A function for necessary variables and to load our previously saved GMM models.


2. A function which returns the paths of the voice samples to be tested.


3. A function to identify the gender by computing the likelihood of a voice sample under the male and female models.


4. A function which reads the samples, picks the model with the higher likelihood as the prediction, and reports the results.
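
The decision rule and the accuracy computation boil down to very little code. Here's a minimal sketch with made-up scores; `pick_gender` and `accuracy_percent` are illustrative helpers, and the numbers are hypothetical, not results from the dataset.

```python
def pick_gender(female_log_likelihood, male_log_likelihood):
    """Return the label of the model that explains the sample better (ties go to female)."""
    return "male" if male_log_likelihood > female_log_likelihood else "female"

def accuracy_percent(expected, predicted):
    """Percentage of samples whose predicted label matches the expected one."""
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return 100.0 * correct / len(expected)

# Hypothetical (female score, male score) pairs for four test files
scores = [(-18.2, -21.5), (-20.1, -19.4), (-17.9, -22.0), (-23.3, -19.0)]
expected = ["female", "male", "female", "female"]
predicted = [pick_gender(f, m) for f, m in scores]
print(predicted)                              # ['female', 'male', 'female', 'male']
print(accuracy_percent(expected, predicted))  # 75.0
```

Log-likelihoods are negative here, so "higher" means closer to zero.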

In [5]:
class Gender_Identifier:
    
    # Function #1
    
    def __init__(self, females_files_path, males_files_path, females_model_path, males_model_path):
        self.females_training_path = females_files_path
        self.males_training_path   = males_files_path
        self.error                 = 0
        self.total_sample          = 0
        self.features_extractor    = Features_Extractor()
        
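        # Load the previously trained models and close the files afterwards
        with open(females_model_path, 'rb') as females_file:
            self.females_gmm = pickle.load(females_file)
        with open(males_model_path, 'rb') as males_file:
            self.males_gmm = pickle.load(males_file)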

#------------------------------------------------------------------------------------------------------------------------------
    
    # Function #2
    
    def get_file_paths(self, females_training_path, males_training_path):

        females = [ os.path.join(females_training_path, f) for f in os.listdir(females_training_path) ]
        males   = [ os.path.join(males_training_path, f) for f in os.listdir(males_training_path) ]
        files   = females + males
        return files

#------------------------------------------------------------------------------------------------------------------------------    
    
    # Function #3
    
    def identify_gender(self, vector):

        # score() returns the average per-frame log-likelihood of the sample under the model
        female_scores         = np.array(self.females_gmm.score(vector))
        female_log_likelihood = female_scores.sum()
        
        male_scores         = np.array(self.males_gmm.score(vector))
        male_log_likelihood = male_scores.sum()

        print("%10s %5s %1s" % ("+ Female Score",":", str(round(female_log_likelihood, 3))))
        print("%10s %7s %1s" % ("+ Male Score", ":", str(round(male_log_likelihood,3))))

        if male_log_likelihood > female_log_likelihood:
            winner = "male"
        else: 
            winner = "female"
        return winner

    
#---------------------------------------------------------------------------------------------------------------------------
    
    # Function #4
    
    def process(self):
        files = self.get_file_paths(self.females_training_path, self.males_training_path)

        for file in files:
            self.total_sample += 1
            print("%10s %8s %1s" % ("--> Testing", ":", os.path.basename(file)))

            vector = self.features_extractor.extract_features(file)
            winner = self.identify_gender(vector)
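            # The parent folder is "females" or "males"; drop the trailing "s" to get the label
            expected_gender = os.path.basename(os.path.dirname(file))[:-1]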

            print("%10s %6s %1s" % ("+ Expectation",":", expected_gender))
            print("%10s %3s %1s" %  ("+ Identification", ":", winner))

            if winner != expected_gender:
                self.error += 1
            print("----------------------------------------------------")

        accuracy     = ( float(self.total_sample - self.error) / float(self.total_sample) ) * 100
        accuracy_msg = "*** Accuracy = " + str(round(accuracy, 3)) + "% ***"
        print(accuracy_msg)


#------------------------------------------------------------------------------------------------------------------------------

if __name__== "__main__":
    gender_identifier = Gender_Identifier("TestingData/females", "TestingData/males", "females.gmm", "males.gmm")
    gender_identifier.process()

FileNotFoundError: [Errno 2] No such file or directory: 'females.gmm'

---

# 6. Conclusions & Analysis

Looking at the predictions, this approach reached 95.749% accuracy on the Testing Set. It may be different for other voice samples. The accuracy could be further improved using GMM normalization, also known as a GMM-UBM (Universal Background Model) system.

It was fun contributing my time towards this project!

---

# 7. Acknowledgements & References

- Machine Learning in Action: Voice Gender Detection

- Reynolds et al.: Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing 10.1 (2000)

- Ayoub Malek's Blog