# TTT4185 Machine learning for Speech technology

## Computer assignment 2: Classification using the Bayes Decision Rule and Support Vector Machines

This assignment assumes that the student has knowledge about the Bayes Decision Rule, maximum likelihood estimation and support vector machines.

In this assignment we will use `scikit-learn` (http://scikit-learn.org/stable/), which is a powerful and very popular Python toolkit for data analysis and machine learning, and `pandas` (https://pandas.pydata.org), which implements the all-powerful `DataFrame`.

We will also be using a small database of phonemes, where each phoneme is represented by the four first formant positions ("F1"-"F4") and their corresponding bandwidths ("B1"-"B4"). All numbers are in kHz. In addition, the speaker ID and the gender of the speaker are given for each phoneme.

### Problem 1

In this problem we will use the Bayes decision rule to classify vowels based on their formants. The formants have been extracted from the open database `VTR Formants database` (http://www.seas.ucla.edu/spapl/VTRFormants.html) created by Microsoft and UCLA.

(a) Download the files `Train.csv` and `Test.csv` from Blackboard, and load them into a `pandas` dataframe using the command `pd.read_csv`. Using the training data, create a single scatter plot of "F1" vs "F2" for the three vowels
- "ae" as in "bat"
- "ey" as in "bait"
- "ux" as in "boot"

Just eyeing the plots, discuss which classes will be hardest to classify correctly.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
# Load data
train = pd.read_csv("Train.csv")
test = pd.read_csv("Test.csv")

# Extract vowels
aes = train[train["Phoneme"] == 'ae']
eys = train[train["Phoneme"] == 'ey']
uxs = train[train["Phoneme"] == 'ux']

# Plotting here

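plt.figure(figsize=(6,6))
plt.scatter(aes["F1"].values, aes["F2"].values, color='red', label='ae')
plt.scatter(eys["F1"].values, eys["F2"].values, color='blue', label='ey')
plt.scatter(uxs["F1"].values, uxs["F2"].values, color='green', label='ux')
# Label the axes and show the legend so the three classes can be told apart
plt.xlabel("F1 [kHz]")
plt.ylabel("F2 [kHz]")
plt.legend()
plt.show()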


The 'ey' vowel (blue) looks the hardest to classify: in the F1-F2 plane its samples overlap with both the 'ae' and the 'ux' samples.

(b) Use the Bayes Decision Rule to create a classifier for the phonemes 'ae', 'ey' and 'ux' under the following constraints:
- The feature vector $x$ contains the first two formants, "F1" and "F2".
- The distribution of $x$ given a phoneme $c$, $P(x|c)$, is Gaussian.
- Use the maximum likelihood estimator to estimate the model parameters.
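
As a minimal sketch of the estimation step: the maximum likelihood estimates of a Gaussian are the sample mean and the sample covariance normalized by $N$ (not $N-1$). The toy samples below are made up for illustration. Note that `numpy.cov` and `pandas.DataFrame.cov` normalize by $N-1$ by default, so they differ from the ML estimate by a factor $(N-1)/N$.

```python
import numpy as np

# Toy (F1, F2) samples in kHz for one class; the values are made up for illustration
x = np.array([[0.60, 1.80],
              [0.70, 1.70],
              [0.65, 1.90],
              [0.75, 1.60]])
n = len(x)

# Maximum likelihood estimates: sample mean and covariance normalized by N
mu_ml = x.mean(axis=0)
centered = x - mu_ml
sigma_ml = centered.T @ centered / n

# np.cov normalizes by N-1 unless bias=True is given
sigma_unbiased = np.cov(x, rowvar=False)
sigma_biased = np.cov(x, rowvar=False, bias=True)
```

For the class sizes in this assignment the two normalizations should give nearly identical classifiers, but only the $N$-normalized version is the ML estimate in the strict sense.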

In [None]:
import scipy.stats

X_columns = ['SpeakerID', 'F1', 'F2']
X_aes, X_eys, X_uxs = aes[X_columns], eys[X_columns], uxs[X_columns]
X_eg = {'ae': X_aes, 'ey': X_eys, 'ux': X_uxs}

class BayesClassificator:
    def __init__(self, X = X_eg, vowels = ['ae', 'ey', 'ux'], features = ['F1', 'F2'], cov_mode: str = 'cov'):
        self.vowels = vowels
        self.features = features
        X_t = { t: X[t][self.features] for t in self.vowels }
        
        self.mean_ = { t: X_t[t].mean() for t in self.vowels }
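        if cov_mode == 'cov':
            # Full covariance matrix per class
            self.cov_ = { t: X_t[t].cov() for t in self.vowels }
        elif cov_mode == 'answer_g':
            # Diagonal covariance only: keep the per-feature variances (used in (g))
            self.cov_ = { t: np.diag(X_t[t].cov()) for t in self.vowels }
        else:
            raise NotImplementedError(f"Unknown cov_mode: {cov_mode}")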

        # Class priors P(c): relative frequency of each vowel in the training data
        len_Xs = sum(len(x) for x in X_t.values())
        self.prior_ = { t: len(X_t[t]) / len_Xs for t in self.vowels }
    
    def likelihood(self, x):
        return { t: scipy.stats.multivariate_normal.pdf(x, mean=self.mean_[t], cov=self.cov_[t]) for t in self.vowels }
    
    def posterior(self, x):
        likelihood_ = self.likelihood(x)
        evidence = self.evidence(x)
        return { t: float(likelihood_[t] * self.prior_[t] / evidence) for t in self.vowels}

    def evidence(self, x):
        likelihood_ = self.likelihood(x)
        return sum(likelihood_[t] * self.prior_[t] for t in self.vowels)

    def predict(self, x):
        posterior_ = self.posterior(x)
        prediction = self.vowels[0]
        for t in self.vowels:
            if posterior_[t] > posterior_[prediction]: prediction = t
        return prediction
        
    def predict_whole_frame(self, X, target):
        error = 0
        
        for x in np.array(X[self.features]):
            if self.predict(x) != target: error += 1
        return error

In [None]:
X = {'ae': X_aes, 'ey': X_eys, 'ux': X_uxs}
vowels = ['ae', 'ey', 'ux']
bc = BayesClassificator()
for k, x in X.items():
    print(f"success for {k} vowel:", 1 - bc.predict_whole_frame(X=x, target=k) / len(x))

(c) To visualize the class models and the classifier created in (b), plot the contours of each Gaussian distribution in the model, that is, the class-conditional likelihoods $P(x|c)$, using the following function.

In [None]:
import scipy.stats

def plotGaussian(mean, cov, color, ax):
    """ 
        Creates a contour plot for a bi-variate normal distribution
        
        mean: numpy array 2x1 with mean vector
        cov: numpy array 2x2 with covariance matrix
        color: name of color for the plot (see https://matplotlib.org/stable/gallery/color/named_colors.html)
        ax: axis handle where the plot is drawn (can for example be returned by plt.gca() or plt.subplots())
    """
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    x, y = np.mgrid[xlim[0]:xlim[1]:(xlim[1]-xlim[0])/500.0, ylim[0]:ylim[1]:(ylim[1]-ylim[0])/500.0]
    xy = np.dstack((x, y))
    mvn = scipy.stats.multivariate_normal(mean, cov)
    lik = mvn.pdf(xy)
    ax.contour(x,y,lik,colors=color)

In [None]:
vowels = ['ae', 'ey', 'ux']
colors = { vowels[k]: ['red', 'blue', 'green'][k] for k in range(len(vowels))}
X_ = X_eg

minF1, maxF1 = pd.concat([X_[t]['F1'] for t in vowels]).min(), pd.concat([X_[t]['F1'] for t in vowels]).max()
minF2, maxF2 = pd.concat([X_[t]['F2'] for t in vowels]).min(), pd.concat([X_[t]['F2'] for t in vowels]).max()

_, ax = plt.subplots()
ax.set_xlim(minF1, maxF1), ax.set_ylim(minF2, maxF2)
[plotGaussian(X_[t][['F1','F2']].mean(), X_[t][['F1','F2']].cov(), color=colors[t], ax=ax) for t in vowels]
plt.grid()
plt.show()

*Try:* Plot the decision regions for the Bayesian classifier. Tips: Calculate the posterior for each class, use the `numpy.argmax` function to get the decision regions, and `matplotlib.pyplot.contourf` to plot them.

In [None]:
bc = BayesClassificator()

# Evaluate P(x|c) P(c) for every class on a grid covering the (F1, F2) plane
x, y = np.mgrid[minF1:maxF1:500j, minF2:maxF2:500j]
xy = np.dstack((x, y))
joint = np.stack([
    scipy.stats.multivariate_normal.pdf(xy, mean=bc.mean_[t], cov=bc.cov_[t]) * bc.prior_[t]
    for t in bc.vowels
])

# The evidence P(x) is the same for all classes, so the argmax of the joint
# probability equals the argmax of the posterior
regions = np.argmax(joint, axis=0)

_, ax = plt.subplots()
ax.contourf(x, y, regions, levels=[-0.5, 0.5, 1.5, 2.5], colors=[colors[t] for t in bc.vowels], alpha=0.3)
ax.set_xlabel("F1 [kHz]"), ax.set_ylabel("F2 [kHz]")
plt.show()

(d) Test your classifier on the 'ae', 'ey' and 'ux' phonemes from the test set and present your results in a _confusion matrix_, that is, a table where you see how many times 'ae' was correctly classified, how many times it was wrongly classified as 'ey' and so on.
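
To make the layout of the table concrete, here is a small sketch with made-up labels (not results from the data set). In `sklearn.metrics.confusion_matrix`, rows correspond to the true classes and columns to the predicted classes; passing `labels` fixes their order explicitly.

```python
from sklearn.metrics import confusion_matrix

labels = ["ae", "ey", "ux"]
# Made-up labels for illustration only
actual    = ["ae", "ae", "ey", "ey", "ey", "ux", "ux"]
predicted = ["ae", "ey", "ey", "ey", "ux", "ux", "ey"]

# Rows are the true classes, columns the predicted classes, both in the order of `labels`
cm = confusion_matrix(actual, predicted, labels=labels)

# Overall accuracy is the trace divided by the total count
accuracy = cm.trace() / cm.sum()
# Per-class recall: diagonal entries divided by the row sums
recall = cm.diagonal() / cm.sum(axis=1)
```

The per-class recall is often more informative than the overall accuracy when the classes have different sizes.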

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

test = pd.read_csv("Test.csv")
bc = BayesClassificator()

X_test = { t: np.array(test[test["Phoneme"] == t][['F1', 'F2']]) for t in bc.vowels }


prediction = []
actual = []
for t in bc.vowels:
    for x in X_test[t]:
        prediction.append(bc.predict(x))
        actual.append(t)

print("Confusion matrix of the test set:\n", confusion_matrix(actual, prediction))

(e) Extend your classifier to include the features "F1"-"F4" and compare the results with those in (d). Finally use all available information "F1"-"F4" and "B1"-"B4". How does the performance of this classifier compare with the simpler classifiers using fewer features?

In [None]:
features = ['F1', 'F2', 'F3', 'F4']
vowels = ['ae', 'ey', 'ux']
X_train = { t: train[train["Phoneme"] == t][features] for t in vowels }

bcF1_F4 = BayesClassificator(X=X_train, features=features)
X_test = { t: test[test["Phoneme"] == t][features] for t in vowels }

print("Model with features F1 - F4 included:\n")
print('Results on training:')
for k, x in X_train.items():
    print(f"success for {k} vowel:", 1 - bcF1_F4.predict_whole_frame(X=x, target=k) / len(x))
print('\nResults on testing:')
for k, x in X_test.items():
    print(f"success for {k} vowel:", 1 - bcF1_F4.predict_whole_frame(X=x, target=k) / len(x))

prediction = []
actual = []
for t in vowels:
    for x in np.array(X_test[t]):
        prediction.append(bcF1_F4.predict(x))
        actual.append(t)

print("\nConfusion matrix of the test set:\n", confusion_matrix(actual, prediction))

In [None]:
features = ['F1', 'F2', 'F3', 'F4', 'B1', 'B2', 'B3', 'B4']

X_train = { t: train[train["Phoneme"] == t][features] for t in vowels }

bcF1_B4 = BayesClassificator(X=X_train, features=features)
X_test = { t: test[test["Phoneme"] == t][features] for t in vowels }

print("Model with features F1 - F4 - B1 - B4 included:\n")
print('Results on training:')
for k, x in X_train.items():
    print(f"success for {k} vowel:", 1 - bcF1_B4.predict_whole_frame(X=x, target=k) / len(x))
print('\nResults on testing:')
for k, x in X_test.items():
    print(f"success for {k} vowel:", 1 - bcF1_B4.predict_whole_frame(X=x, target=k) / len(x))

predictions = []
actual = []
score = 0
for t in bcF1_B4.vowels:
    for x in np.array(X_test[t]):
        prediction = bcF1_B4.predict(x)
        if prediction == t: score += 1
        predictions.append(prediction)
        actual.append(t)

print("\nScore:", score / len(predictions))
print("\nConfusion matrix of the test set:\n", confusion_matrix(actual, predictions))

(f) We want to make the model slightly more powerful by modeling the feature vector conditional on both the vowel and gender of speaker, that is $P(x|g,c)$, where $g$ is the gender of the speaker and $c$ is the phoneme label. Show how these models can be used for phoneme classification using marginalization over the gender.

Assume that $P(x|g,c)$ is a multivariate Gaussian and compute the maximum likelihood estimates for the models. Compare the result on the test set with the results in (e).
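
One way to write the marginalization is $P(x|c) = \sum_g P(x|g,c)\,P(g|c)$, so the decision becomes $\arg\max_c P(c) \sum_g P(g|c)\,P(x|g,c)$ and the gender of the test sample is never needed. The sketch below illustrates this on synthetic two-dimensional data; the class means, the spread, and the helper name `classify` are assumptions made for the example, not part of the assignment.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
classes, genders = ["ae", "ux"], ["F", "M"]

# Synthetic 2-D training data per (class, gender); the means are made up for illustration
true_means = {("ae", "F"): [0.8, 2.0], ("ae", "M"): [0.6, 1.7],
              ("ux", "F"): [0.4, 1.9], ("ux", "M"): [0.3, 1.6]}
data = {key: rng.normal(loc=m, scale=0.05, size=(200, 2)) for key, m in true_means.items()}
n_total = sum(len(d) for d in data.values())

# One ML-estimated Gaussian per (class, gender) pair, with weight P(c, g) = P(c) P(g|c)
models = {key: multivariate_normal(d.mean(axis=0), np.cov(d, rowvar=False, bias=True))
          for key, d in data.items()}
weights = {key: len(d) / n_total for key, d in data.items()}

def classify(x):
    """Return argmax_c sum_g P(x|g,c) P(g|c) P(c); the gender of x is not used."""
    joint = {c: sum(weights[(c, g)] * models[(c, g)].pdf(x) for g in genders)
             for c in classes}
    return max(joint, key=joint.get)
```

Each class is then effectively modeled by a two-component Gaussian mixture, which differs from selecting a gender-specific classifier with a gender label that is known at test time.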

In [None]:
def evaluate_bayes_classificator_with_gender(features = ['F1', 'F2'], cov_mode = 'cov'):
    vowels = ['ae', 'ey', 'ux']
    genders = ['F', 'M']
    X_train = { g: {} for g in genders}
    for t in vowels:
        for g in genders:
            X_train[g][t] = train[(train["Phoneme"] == t) & (train["Gender"] == g)][features]

    bc_features = { g: BayesClassificator(X=X_train[g], features=features, vowels=vowels, cov_mode=cov_mode) for g in genders }

    X_test = { g: {} for g in genders }
    for t in vowels:
        for g in ['F', 'M']:
            X_test[g][t] = test[(test["Phoneme"] == t) & (test["Gender"] == g)][features]
    
    print(f"Model with features {features[0]} - {features[len(features) - 1]} knowing the gender:\n")
    
    train_card_ = { g:{ t: 0 for t in vowels } for g in genders }
    test_card_ = { g:{ t: 0 for t in vowels } for g in genders }

    for t in vowels:
        for g in genders:
            train_card_[g][t] += len(X_train[g][t])
            test_card_[g][t] += len(X_test[g][t])

    scores_train = { g: {} for g in genders }
    scores_test = { g: {} for g in genders }
    for g in genders: 
        for k, x in X_train[g].items():
            scores_train[g][k] = 1 - bc_features[g].predict_whole_frame(X=x, target=k) / len(x)
        for k, x in X_test[g].items():
            scores_test[g][k] = 1 - bc_features[g].predict_whole_frame(X=x, target=k) / len(x)
    
    prediction = []
    actual = []
    for g in genders:
        for t in bc_features[g].vowels:
            for x in np.array(X_test[g][t]):
                prediction.append(bc_features[g].predict(x))
                actual.append(t)

    score_train = 0
    score_test = 0
    total_train = 0
    total_test = 0
    for t in vowels:
        score_train += (scores_train['F'][t] * len(X_train['F'][t]) + scores_train['M'][t] * len(X_train['M'][t])) 
        score_test += (scores_test['F'][t] * len(X_test['F'][t]) + scores_test['M'][t] * len(X_test['M'][t])) 
        total_train += (len(X_train['F'][t]) + len(X_train['M'][t]))
        total_test += (len(X_test['F'][t]) + len(X_test['M'][t]))
        print(f"Score on phoneme {t} on training: {(scores_train['F'][t] * len(X_train['F'][t]) + scores_train['M'][t] * len(X_train['M'][t])) / (len(X_train['F'][t]) + len(X_train['M'][t])) }")
        print(f"Score on phoneme {t} on testing: {(scores_test['F'][t] * len(X_test['F'][t]) + scores_test['M'][t] * len(X_test['M'][t])) / (len(X_test['F'][t]) + len(X_test['M'][t])) }")


    print("\nScore on training:", score_train / total_train)
    print("Score on testing:",  score_test / total_test)
    print("\nConfusion matrix of the test set:\n", confusion_matrix(actual, prediction))

In [None]:
evaluate_bayes_classificator_with_gender(['F1', 'F2'])

In [None]:
evaluate_bayes_classificator_with_gender(['F1', 'F2', 'F3', 'F4'])

In [None]:
evaluate_bayes_classificator_with_gender(['F1', 'F2', 'F3', 'F4', 'B1', 'B2', 'B3', 'B4'])

(g) When using Gaussian classifiers we often avoid computing the entire covariance matrix, but instead we only use the diagonal of the matrix. Repeat the results in (f) using only diagonal covariance matrices and compare the results.
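
A short sketch of what the diagonal approximation means, using made-up numbers: dropping the off-diagonal terms treats the features as independent, so the density factorizes into a product of univariate Gaussians. It also reduces the number of covariance parameters per class from $d(d+1)/2$ to $d$.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Made-up mean and covariance for a single class, for illustration only
mean = np.array([0.6, 1.8])
full_cov = np.array([[0.010, 0.004],
                     [0.004, 0.040]])
x = np.array([0.65, 1.70])

# Keep only the variances: a diagonal covariance treats the features as independent
diag_cov = np.diag(np.diag(full_cov))
p_diag = multivariate_normal.pdf(x, mean=mean, cov=diag_cov)

# The same density written as a product of one univariate Gaussian per feature
p_product = np.prod(norm.pdf(x, loc=mean, scale=np.sqrt(np.diag(full_cov))))

# The full-covariance density differs because it accounts for the correlation
p_full = multivariate_normal.pdf(x, mean=mean, cov=full_cov)
```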

In [None]:
evaluate_bayes_classificator_with_gender(['F1', 'F2', 'F3', 'F4'], cov_mode='answer_g')

In [None]:
evaluate_bayes_classificator_with_gender(['F1', 'F2', 'F3', 'F4', 'B1', 'B2', 'B3', 'B4'], cov_mode='answer_g')

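These results are slightly lower than the previous ones (except for predicting 'ux' on the test set). This is not surprising, since a diagonal covariance matrix discards the correlations between features and so removes much of the model's complexity. The results may nevertheless be satisfactory.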

### Problem 2

In this problem we use the support vector machine (SVM) to build classifiers. We use the same dataset as in Problem 1. It is up to you to select which features to use.

We use the function `sklearn.svm.SVC` from `scikit-learn` in this problem. First you need to get your data on the format that `SVC` expects, which is a matrix where every row is a feature vector, and a list of integer labels corresponding to each row. We suggest using "ae" = 0, "ey" = 1 and "ux" = 2.

An example on how to use the `SVC` is given in http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC. In short, we do the following (for a linear kernel):
- Instantiate an SVC object: `cls = SVC(kernel='linear')`
- Train the SVM using the feature vector matrix `train_X`, and label vector `train_Y`: `cls.fit(train_X, train_Y)`
- Predict labels on the test set `test_X` using: `cls.predict(test_X)`
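
The three steps above can be sketched end to end as follows. The data are synthetic clusters standing in for the real formant features, and the cluster centers and the dictionary name `label_of` are assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
label_of = {"ae": 0, "ey": 1, "ux": 2}

# Synthetic (F1, F2) clusters standing in for the real formant data
centers = {"ae": [0.7, 1.8], "ey": [0.5, 2.1], "ux": [0.4, 1.7]}
train_X = np.vstack([rng.normal(c, 0.03, size=(50, 2)) for c in centers.values()])
# One integer label per row, in the same order as the stacked clusters
train_Y = np.repeat([label_of[p] for p in centers], 50)

# Instantiate and train a linear SVM
cls = SVC(kernel="linear", C=1)
cls.fit(train_X, train_Y)

# Predict integer labels for new feature vectors (here: the cluster centers)
test_X = np.array([[0.7, 1.8], [0.5, 2.1], [0.4, 1.7]])
pred = cls.predict(test_X)
```

`SVC` also accepts string labels, which is what the plotting helpers below rely on when they pass the predictions through a `LabelEncoder`.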

You can use or adapt the following functions to visualize the SVM decision regions and support vectors in 2D.

In [None]:
import seaborn as sns
from sklearn.preprocessing import LabelEncoder

import warnings
warnings.filterwarnings('ignore')

def Plot_SVM_decision_regions(clf,data,labels):
    '''
    This function is for plotting the decision area of SVM
    
    Args:
    - clf: SVM model
    - data: Data with two features
    - labels: Corresponding labels of the data
    '''
    phonemes = np.array(["ae","ey","ux"])
    x_min, x_max = data[:,0].min() - 0.2, data[:,0].max() + 0.2
    y_min, y_max = data[:,1].min() - 0.2, data[:,1].max() + 0.2
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.002),np.arange(y_min, y_max, 0.002))
    label_encoder = LabelEncoder()
    integer_encoded = label_encoder.fit_transform(phonemes)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = label_encoder.transform(Z)
    Z = Z.reshape(xx.shape)
    #Plotting
    plt.figure(figsize=(10,6))
    # sns.scatterplot(data[:,0],data[:,1],hue=labels)
    plt.scatter(data[:,0],data[:,1], label='training data')
    plt.contourf(xx, yy, Z, cmap=plt.cm.ocean, alpha=0.2)
    plt.legend()
    plt.title('Decision Area of SVM')
    plt.show()

def Plot_Support_Vectors(clf,data):
    '''
    This function is for plotting the support vectors of the SVM model
    
    Args:
    - clf: SVM model
    - data: Data with two features
    '''
    x_min, x_max = data[:,0].min() - 0.2, data[:,0].max() + 0.2
    y_min, y_max = data[:,1].min() - 0.2, data[:,1].max() + 0.2
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.002),np.arange(y_min, y_max, 0.002))
    label_encoder = LabelEncoder()
    phonemes = np.array(["ae","ey","ux"])
    integer_encoded = label_encoder.fit_transform(phonemes)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = label_encoder.transform(Z)
    Z = Z.reshape(xx.shape)
    #Plotting
    plt.figure(figsize=(10,6))
    plt.scatter(clf.support_vectors_[:,0], clf.support_vectors_[:,1], c='k',alpha=0.4,label='support vector')
    plt.contourf(xx, yy, Z, cmap=plt.cm.ocean, alpha=0.2)
    plt.legend()
    plt.title('Support Vectors')
    plt.show()

(a) Create a linear SVM with different penalty terms $C=\{0.1, 1, 10\}$ and compare with the results in Problem 1.

In [None]:
from sklearn.svm import SVC, LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

C = [ 0.1, 1, 10 ]
features = ['F1', 'F2', 'F3', 'F4', 'B1', 'B2', 'B3', 'B4']

X_train_linear = []
y_train_linear = []
X_test_linear = []
y_test_linear = []

for t in vowels:
    for x in np.array(train[(train["Phoneme"] == t)][features]):
        X_train_linear.append(x)
        y_train_linear.append(t)
        
    for x in np.array(test[(test["Phoneme"] == t)][features]):
        X_test_linear.append(x)
        y_test_linear.append(t)

clf_linear = {}
y_pred_linear = {}
score_linear = {}
for c in C:
    # clf_linear[c] = make_pipeline(StandardScaler(), LinearSVC(dual="auto", random_state=0, tol=1e-6, C=c))
    clf_linear[c] = LinearSVC(dual="auto", random_state=0, tol=1e-6, C=c)
    clf_linear[c].fit(X_train_linear, y_train_linear)
    y_pred_linear[c] = clf_linear[c].predict(X_test_linear)
    score_linear[c] = clf_linear[c].score(X_test_linear, y_test_linear)
    print(f"Confusion matrix of linear SVM with penalty term C = {c}:\n", confusion_matrix(y_test_linear, y_pred_linear[c]))
    print('Score (success / length(y)): ', score_linear[c], end="\n\n" )


These results are slightly worse than those found in Problem 1 using the features F1 to B4 without knowing the gender (except for predicting 'ey'). As a reminder, in Problem 1 we got:

Score: 0.7121771217712177

Confusion matrix of the test set:

    [[82 23  0]
     [13 90 11]
     [ 3 28 21]]

With all the features taken into account, the model in Problem 1 seems to be better than the linear SVM model (for every penalty coefficient): it gets higher scores, and the diagonal entries of its confusion matrix are higher.

If we do not look at the gender (as follows), we get a score similar to that of the Problem 1 model ($\approx 0.71$).

In [None]:
C = [ 0.1, 1, 10 ]
features = ['F1', 'F2', 'F3', 'F4', 'B1', 'B2', 'B3', 'B4']
genders = ['F', 'M']

X_train_linear = { g: [] for g in genders}
y_train_linear = { g: [] for g in genders}
X_test_linear = { g: [] for g in genders}
y_test_linear = { g: [] for g in genders}


for g in genders:
    for t in vowels:
        for x in np.array(train[(train["Phoneme"] == t)& (train["Gender"] == g)][features]):
            X_train_linear[g].append(x)
            y_train_linear[g].append(t)
            
        for x in np.array(test[(test["Phoneme"] == t)& (test["Gender"] == g)][features]):
            X_test_linear[g].append(x)
            y_test_linear[g].append(t)

clf_linear = {g: {} for g in genders}
y_pred_linear = {g: {} for g in genders}
score_linear = {g: {} for g in genders}
for c in C:
    print(f"Penalty term C = {c}:")
    for g in genders:
        clf_linear[g][c] = make_pipeline(StandardScaler(), LinearSVC(dual="auto", random_state=0, tol=1e-6, C=c))
        # clf_linear[g][c] = LinearSVC(dual="auto", random_state=0, tol=1e-6, C=c)
        clf_linear[g][c].fit(X_train_linear[g], y_train_linear[g])
        y_pred_linear[g][c] = clf_linear[g][c].predict(X_test_linear[g])
        score_linear[g][c] = clf_linear[g][c].score(X_test_linear[g], y_test_linear[g])
        print(f"Confusion matrix of linear SVM knowing gender {g}\n", confusion_matrix(y_test_linear[g], y_pred_linear[g][c]))
        print(f'Score (success / length(y)), knowing gender {g}: ', score_linear[g][c], end="\n\n" )

The results are similar to those in the previous case.

In [None]:
C = [ 0.1, 1, 10 ]

# The plotting helpers pass the predictions through a LabelEncoder fitted on the
# phoneme strings, so the labels are kept as strings here
vowel_labels = { t: t for t in vowels }
features = ['F1', 'F2']
X_train = []
y_train = []
for t in vowels:
    for x in np.array(train[train["Phoneme"] == t][features]):
        X_train.append(x)
        y_train.append(vowel_labels[t])

X_test = []
y_test = []
for t in vowels:
    for x in np.array(test[test["Phoneme"] == t][features]):
        X_test.append(x)
        y_test.append(vowel_labels[t])

clf_linear_to_plot = {}
y_pred = {}
score = {}

X_train, X_test = np.array(X_train), np.array(X_test)
# Min-max scale both sets using the statistics of the training set only,
# so that the test data are mapped with the same transform as the training data
train_min, train_max = X_train.min(axis=0), X_train.max(axis=0)
X_train = (X_train - train_min) / (train_max - train_min)
X_test = (X_test - train_min) / (train_max - train_min)

for c in C:
    clf_linear_to_plot[c] = SVC(kernel='linear', C=c)
    clf_linear_to_plot[c].fit(X_train, y_train)
    y_pred[c] = clf_linear_to_plot[c].predict(X_test)
    score[c] = clf_linear_to_plot[c].score(X_test, y_test)

for c in C:
    print(f"Linear SVM decision regions for C={c}:")
    Plot_SVM_decision_regions(clf=clf_linear_to_plot[c], data=X_train, labels=y_train)
    print(f"Linear SVM support vectors for C={c}:")
    Plot_Support_Vectors(clf=clf_linear_to_plot[c], data=X_train)

(b) Try different kernels ('rbf', 'poly', 'sigmoid') and compare the results. Choose one of the kernels and use different penalty terms $C$. What happens with the performance on the training set when you increase $C$? What happens with the performance on the test set?

In [None]:
C = [ 0.1, 1, 10 ]
features = ['F1', 'F2', 'F3', 'F4', 'B1', 'B2', 'B3', 'B4']
kernels = ['rbf', 'poly', 'sigmoid']
X_train = []
y_train = []
X_test = []
y_test = []

for t in vowels:
    for x in np.array(train[(train["Phoneme"] == t)][features]):
        X_train.append(x)
        y_train.append(t)
        
    for x in np.array(test[(test["Phoneme"] == t)][features]):
        X_test.append(x)
        y_test.append(t)

clf = {}
y_pred = {}
score_train = {}
score_test = {}

for k in kernels: 
    clf[k] = {}
    y_pred[k] = {}
    score_train[k] = {}
    score_test[k] = {}
    for c in C:
        clf[k][c] = make_pipeline(StandardScaler(), SVC(kernel=k, C=c))
        clf[k][c].fit(X_train, y_train)
        y_pred[k][c] = clf[k][c].predict(X_test)
        score_train[k][c] = clf[k][c].score(X_train, y_train)
        score_test[k][c] = clf[k][c].score(X_test, y_test)

In [None]:
score_train, score_test

On the training set, the score increases with $C$ for both the rbf and poly kernels. This is expected: a larger $C$ penalizes margin violations more heavily, so the model fits the training data more closely. The sigmoid kernel behaves differently: its score decreases as $C$ grows.

On the test set we see the same trend for rbf and poly. For the sigmoid kernel, the score decreases between $C=0.1$ and $C=1$ and then increases from $C=1$ to $C=10$.

We now look at what happens when the gender is known.

In [None]:
C = [ 0.1, 1, 10 ]
features = ['F1', 'F2', 'F3', 'F4', 'B1', 'B2', 'B3', 'B4']
kernels = ['rbf', 'poly', 'sigmoid']

X_train = { g: [] for g in genders}
y_train = { g: [] for g in genders}
X_test = { g: [] for g in genders}
y_test = { g: [] for g in genders}


for g in genders:
    for t in vowels:
        for x in np.array(train[(train["Phoneme"] == t)& (train["Gender"] == g)][features]):
            X_train[g].append(x)
            y_train[g].append(t)
            
        for x in np.array(test[(test["Phoneme"] == t)& (test["Gender"] == g)][features]):
            X_test[g].append(x)
            y_test[g].append(t)

clf = { k: { g: {} for g in genders } for k in kernels }
y_pred = { k: { g: {} for g in genders } for k in kernels }
score_train = { k: { g: {} for g in genders } for k in kernels }
score_test = { k: { g: {} for g in genders } for k in kernels }

for k in kernels: 
    for g in genders:
        for c in C:
            clf[k][g][c] = make_pipeline(StandardScaler(), SVC(kernel=k, C=c))
            clf[k][g][c].fit(X_train[g], y_train[g])
            y_pred[k][g][c] = clf[k][g][c].predict(X_test[g])
            score_train[k][g][c] = clf[k][g][c].score(X_train[g], y_train[g])
            score_test[k][g][c] = clf[k][g][c].score(X_test[g], y_test[g])

In [None]:
score_train, score_test

The score depends on many parameters. When the gender is known, two scenarios occur as $C$ increases: either the score increases from $C=0.1$ to $C=10$, or it increases from $C=0.1$ to $C=1$ and then decreases between $C=1$ and $C=10$.

To see the support vectors considering F1 and F2 on all kernels:

In [None]:
C = [ 0.1, 1, 10 ]

features = ['F1', 'F2']
kernels = ['rbf', 'poly', 'sigmoid']
X_train = []
y_train = []
for t in vowels:
    for x in np.array(train[train["Phoneme"] == t][features]):
        X_train.append(x)
        y_train.append(vowel_labels[t])

X_test = []
y_test = []
for t in vowels:
    for x in np.array(test[test["Phoneme"] == t][features]):
        X_test.append(x)
        y_test.append(vowel_labels[t])
clf = {}
y_pred = {}
score_train = {}
score_test = {}

for k in kernels: 
    clf[k] = {}
    y_pred[k] = {}
    score_train[k] = {}
    score_test[k] = {}
    for c in C:
        # clf[k][c] = make_pipeline(StandardScaler(), SVC(kernel=k, C=c))
        clf[k][c] = SVC(kernel=k, C=c)
        clf[k][c].fit(X_train, y_train)
        y_pred[k][c] = clf[k][c].predict(X_test)
        score_train[k][c] = clf[k][c].score(X_train, y_train)
        score_test[k][c] = clf[k][c].score(X_test, y_test)

In [None]:
# Plot the support vectors of each kernel for the largest penalty term in C
for k in kernels:
    Plot_Support_Vectors(clf=clf[k][C[-1]], data=np.array(X_train))