# Is this voice Male or Female?

In [8]:
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
import numpy as np

# Data preparation

In [9]:
data = pd.read_csv("voice.csv", usecols=range(20), dtype=np.float64)    # read data (without labels)
data.head()

Unnamed: 0,meanfreq,sd,median,Q25,Q75,IQR,skew,kurt,sp.ent,sfm,mode,centroid,meanfun,minfun,maxfun,meandom,mindom,maxdom,dfrange,modindx
0,0.059781,0.064241,0.032027,0.015071,0.090193,0.075122,12.863462,274.402906,0.893369,0.491918,0.0,0.059781,0.084279,0.015702,0.275862,0.007812,0.007812,0.007812,0.0,0.0
1,0.066009,0.06731,0.040229,0.019414,0.092666,0.073252,22.423285,634.613855,0.892193,0.513724,0.0,0.066009,0.107937,0.015826,0.25,0.009014,0.007812,0.054688,0.046875,0.052632
2,0.077316,0.083829,0.036718,0.008701,0.131908,0.123207,30.757155,1024.927705,0.846389,0.478905,0.0,0.077316,0.098706,0.015656,0.271186,0.00799,0.007812,0.015625,0.007812,0.046512
3,0.151228,0.072111,0.158011,0.096582,0.207955,0.111374,1.232831,4.177296,0.963322,0.727232,0.083878,0.151228,0.088965,0.017798,0.25,0.201497,0.007812,0.5625,0.554688,0.247119
4,0.13512,0.079146,0.124656,0.07872,0.206045,0.127325,1.101174,4.333713,0.971955,0.783568,0.104261,0.13512,0.106398,0.016931,0.266667,0.712812,0.007812,5.484375,5.476562,0.208274


In [10]:
y_data = pd.read_csv("voice.csv", usecols=[20])    # read labels
y_data.head()

Unnamed: 0,label
0,male
1,male
2,male
3,male
4,male


In [15]:
data = (data - data.mean()) / data.std()    # normalise the data
x = data.to_numpy(dtype=np.float64)

# map labels to integers: 'male' -> 0, anything else -> 1
y = (y_data['label'] != 'male').astype(int).to_numpy()

In [16]:
x_train = np.vstack((x[:800], x[2300:]))    # test-train split
y_train = np.hstack((y[:800], y[2300:]))

x_test = x[800:2300]
y_test = y[800:2300]
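
The cells above normalise with statistics computed over all rows (test rows included) and split by fixed row ranges. A possible alternative, sketched here as an assumption rather than what this notebook does, is to shuffle the rows first and compute the mean and standard deviation on the training part only. The helper `split_and_normalise`, its `test_fraction` and `seed` parameters, and the random demo data are all hypothetical.

```python
import numpy as np

def split_and_normalise(x, y, test_fraction=0.5, seed=0):
    """Shuffle, split, then standardise using training-set statistics only."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))              # random row order
    n_test = int(round(test_fraction * len(x)))
    test_idx, train_idx = order[:n_test], order[n_test:]

    mean = x[train_idx].mean(axis=0)
    std = x[train_idx].std(axis=0, ddof=1)       # ddof=1 matches pandas' default .std()
    std[std == 0] = 1.0                          # guard against constant columns

    x_train = (x[train_idx] - mean) / std
    x_test = (x[test_idx] - mean) / std          # test rows reuse the training statistics
    return x_train, y[train_idx], x_test, y[test_idx]

# Toy stand-in for the voice features (random numbers, not the real data)
demo_rng = np.random.default_rng(1)
x_demo = demo_rng.normal(size=(100, 20))
y_demo = np.repeat([0, 1], 50)
x_tr, y_tr, x_te, y_te = split_and_normalise(x_demo, y_demo, test_fraction=0.3)
print(x_tr.shape, x_te.shape)
```

With real data the same call would take `x` and `y` as built above, before any normalisation is applied.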

# Model training

In [6]:
model = DecisionTreeClassifier(random_state=1)
model.fit(x_train, y_train)

DecisionTreeClassifier(random_state=1)

# Inference

In [21]:
y_pred = model.predict(x_test)    # make predictions
error_rate = np.mean(y_pred != y_test)    # fraction of misclassified test samples

print("Error rate: {:.3f}".format(error_rate))

Error rate: 0.081
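
For 0/1 labels, the mean absolute difference between predictions and targets is the fraction of misclassified samples, so a value of 0.081 corresponds to roughly 92% of the test samples being classified correctly. The small sketch below uses made-up labels (not the notebook's data) only to show how the two quantities relate.

```python
import numpy as np

# Hypothetical labels, used only to illustrate the arithmetic
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 1, 1, 1, 0, 0, 1, 0])

error_rate = np.abs(y_hat - y_true).mean()   # fraction of mismatches
accuracy = (y_hat == y_true).mean()          # fraction of matches

print("Error rate: {:.3f}".format(error_rate))
print("Accuracy:   {:.3f}".format(accuracy))
```

The two values always sum to one for binary labels, so either can be reported as long as it is named correctly.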


# Discussion

First I tried a fully connected neural network, but it gave poor results (just 50% accuracy, which is what random guessing would give on a two-class problem).

I also studied some of the theory behind the audio digitization pipeline: _**framing**_, the _**Hamming window**_, the _**Fourier transform and its spectrum**_, the _**Mel scale**_ and _**filter banks**_. I even called my physics teacher to clarify some details :)
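
The steps listed above can be sketched in plain NumPy. This is a minimal illustration, not code that was used in this notebook: the function names, the frame length (400 samples), hop (160 samples), FFT size (512) and number of Mel filters (26) are assumed values, and the input is a synthetic sine wave rather than a real recording.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel-scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sample_rate, frame_len=400, hop=160,
                        n_fft=512, n_mels=26):
    # 1. Framing: cut the signal into overlapping frames
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx]

    # 2. Hamming window: taper each frame to reduce spectral leakage
    frames = frames * np.hamming(frame_len)

    # 3. Fourier transform: power spectrum of each (zero-padded) frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft

    # 4. Mel filter bank: triangular filters equally spaced on the Mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):            # rising edge
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):           # falling edge
            fbank[m - 1, k] = (right - k) / (right - centre)

    # 5. Log of the filter-bank energies (small constant avoids log(0))
    return np.log(power @ fbank.T + 1e-10)

# Synthetic 1-second, 200 Hz tone sampled at 16 kHz (not real voice data)
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200.0 * t)
features = log_mel_spectrogram(tone, sr)
print(features.shape)   # (number of frames, number of Mel bands)
```

Each row of `features` describes one frame, so the array can be treated as a 2-D image-like input or as a sequence of per-frame vectors.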

Unfortunately, I did not have enough time to get acquainted with frameworks for audio data. My idea was to use a CNN with Mel spectrograms as input features and then evaluate that model's quality (not just for this task, but also for a British vs. American speech classifier).

Although I do not have much experience with them, I also intended to try some kind of RNN.