# SLU18 - Support Vector Machines (SVM) -- Exercises

This notebook covers the following topics:


* Hyperplanes
* Maximal Margin Classifier
* Support Vector Classifier
* Support Vector Machine
* Multi-Class extension
* Support Vector Regression

New tools in this unit:

* [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)
* [SVR](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html)

In [None]:
import pandas as pd
import numpy as np
from hashlib import sha256
import json

import sklearn
# These will be needed to prepare the dataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Seed for reproducibility
np.random.seed(42)

**Let the Music Play**

The year is 2020 and, due to the COVID-19 pandemic, you spend a lot more time indoors than you used to. You realize that one of the few things people can still do (almost) the same way as before is listen to music, so you decide to use your data skills to surprise one of your friends. Using data about your friend's listening habits, you try to build a classifier that predicts, from a song's attributes, whether your friend will like it.

In [None]:
songs_df = pd.read_csv("data/song_data.csv", index_col="id")
print(songs_df.shape)
songs_df.head()

The *target* column records whether or not your friend liked each song. The remaining columns hold attributes of each song that you suspect will help you infer your friend's musical taste. You decide to drop the song title and artist, since you are more interested in the musical attributes.

In [None]:
songs_df = songs_df.drop(columns=["song_title", "artist"])

In [None]:
songs_df.head()

In [None]:
songs_df.target.value_counts(normalize=True)

Since the target variable is binary, you are facing a binary classification problem. You remember that really cool class you had about Support Vector Machines, so you decide to give them a shot.

To train and evaluate your models properly, you split your dataset into a training set and a test set.

In [None]:
def get_X_y_train_test(df, target_col):
    """
    Convert the input dataframe df into the
    train and test features and targets
    """
    X = df.drop(target_col, axis=1)
    y = df[target_col]
    # train test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # SVMs are not scale invariant, so you scale your data beforehand
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    print("X_train of shape ", X_train.shape)
    print("y_train of shape ", y_train.shape)
    print("X_test of shape  ", X_test.shape)
    print("y_test of shape  ", y_test.shape)
    
    return X_train, X_test, y_train, y_test 

In [None]:
X_train, X_test, y_train, y_test = get_X_y_train_test(songs_df, target_col="target")
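
The helper above fits the scaler on the training split only and reuses those statistics on the test split. The sketch below illustrates that pattern on synthetic data; the `toy_` names and the two made-up features are assumptions for illustration and are not part of the song dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative only)
rng = np.random.default_rng(0)
toy_X = np.column_stack([
    rng.normal(loc=120.0, scale=30.0, size=200),  # a large-valued feature
    rng.uniform(low=0.0, high=1.0, size=200),     # a feature bounded in [0, 1]
])
toy_y = rng.integers(0, 2, size=200)

toy_X_tr, toy_X_te, toy_y_tr, toy_y_te = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42
)

toy_scaler = StandardScaler()
toy_X_tr_scaled = toy_scaler.fit_transform(toy_X_tr)  # statistics learned from train only
toy_X_te_scaled = toy_scaler.transform(toy_X_te)      # same statistics reused on test

# Train columns end up with mean 0 and standard deviation 1;
# test columns are only approximately standardized.
print(toy_X_tr_scaled.mean(axis=0).round(3), toy_X_tr_scaled.std(axis=0).round(3))
print(toy_X_te_scaled.mean(axis=0).round(3), toy_X_te_scaled.std(axis=0).round(3))
```

Fitting the scaler on the training split alone keeps information about the test set out of the preprocessing step.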

## Exercise 1: Support Vector Classifier


1.1) Use a support vector classifier to predict which songs your friend will like
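
As a reminder of the general pattern, here is a minimal sketch of fitting a linear-kernel `SVC` on synthetic, well-separated blobs. The data and the `toy_` names are assumptions for illustration only; the exercise itself must use `X_train` and `y_train`.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters (illustrative only)
toy_X, toy_y = make_blobs(
    n_samples=100, centers=[[-3.0, -3.0], [3.0, 3.0]], cluster_std=1.0, random_state=0
)

toy_linear_svc = SVC(kernel="linear")
toy_linear_svc.fit(toy_X, toy_y)

# score() returns the mean accuracy on the given data
toy_accuracy = toy_linear_svc.score(toy_X, toy_y)
print(toy_accuracy)

# With a linear kernel, the separating hyperplane is w.x + b = 0
print(toy_linear_svc.coef_, toy_linear_svc.intercept_)
```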

In [None]:
# Create an SVC estimator using sklearn with a linear kernel 
# train it on the data 
# assign your trained estimator to linear_svc
# linear_svc = ...

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
svc_argument_hash = '7f2fe580edb35154041fa3d4b41dd6d3adaef0c85d2ff6309f1d4b520eeecda3'

assert isinstance(linear_svc, sklearn.svm.SVC) #check if SVC is of the right type
assert svc_argument_hash == sha256(linear_svc.kernel.encode()).hexdigest()  #check if kernel is of the right type
np.testing.assert_almost_equal(linear_svc.score(X_test, y_test), 0.6534653465346535) # check if score is close to what is expected

1.2) Obtain the number of support vectors for each class
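
A fitted `SVC` exposes per-class support vector counts through its `n_support_` attribute, ordered like `classes_`. The sketch below shows this on synthetic overlapping blobs; the data and the `toy_` names are assumptions for illustration only.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping synthetic clusters (illustrative only)
toy_X, toy_y = make_blobs(
    n_samples=200, centers=[[-1.0, 0.0], [1.0, 0.0]], cluster_std=1.0, random_state=0
)
toy_svc = SVC(kernel="linear").fit(toy_X, toy_y)

# One count per class, in the same order as classes_
for label, count in zip(toy_svc.classes_, toy_svc.n_support_):
    print(f"class {label}: {count} support vectors")
```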

In [None]:
# Obtain the number of support vectors for each class of the target variable
# assign the result to n_s_vectors, which should be an array whose first element
# is the number of support vectors of the first class and whose second element is
# the number of support vectors of the second class (in the order of linear_svc.classes_)
# n_s_vectors = ...

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
n_s_vectors_hash = 'cb0ebcc1d1c85083fd69512983299369ac096ac1c3342bd2e42309e089403af0'
# check that n_s_vectors holds the expected per-class counts
assert sha256(np.array([elem for elem in n_s_vectors], dtype=np.int32)).hexdigest() == n_s_vectors_hash

1.3) Obtain the support vectors for the above classifier
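
The support vectors themselves are stored on the fitted estimator as `support_vectors_`, and `support_` holds their row indices in the training data. A minimal sketch on synthetic data (the data and the `toy_` names are assumptions for illustration only):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping synthetic clusters (illustrative only)
toy_X, toy_y = make_blobs(
    n_samples=200, centers=[[-1.0, 0.0], [1.0, 0.0]], cluster_std=1.0, random_state=1
)
toy_svc = SVC(kernel="linear").fit(toy_X, toy_y)

toy_s_vectors = toy_svc.support_vectors_
print(toy_s_vectors.shape)  # (number of support vectors, number of features)

# support_ indexes the training rows that became support vectors
rows_match = np.allclose(toy_s_vectors, toy_X[toy_svc.support_])
print(rows_match)
```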

In [None]:
# Obtain the support vectors for the classifier defined in 1.1
# assign the result to a variable s_vectors
# s_vectors = ...

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
s_vectors_hash = '8546938269b43713b416e3f1426464b79138346aab87385e49ea4cf37221192d'
assert sha256(np.around(s_vectors, decimals=2)).hexdigest() == s_vectors_hash

1.4) Create a new SVC estimator that allows for, at most, 100 training observations to be on the wrong side of the decision hyperplane
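
In scikit-learn, the tolerance for margin violations is controlled through the `C` parameter of `SVC`, which acts as a penalty on violations rather than as a direct count of misclassified observations. The sketch below only illustrates the direction of the effect on synthetic data: a smaller `C` tolerates more violations and typically yields more support vectors. The data, the `C` values, and the `toy_` names are assumptions for illustration and are not the expected answer to this exercise.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping synthetic clusters (illustrative only)
toy_X, toy_y = make_blobs(
    n_samples=200, centers=[[-1.5, 0.0], [1.5, 0.0]], cluster_std=1.0, random_state=0
)

# Fit the same linear model under different penalties and count support vectors
toy_n_sv_by_c = {}
for c_value in (0.001, 1.0, 100.0):
    toy_model = SVC(kernel="linear", C=c_value).fit(toy_X, toy_y)
    toy_n_sv_by_c[c_value] = int(toy_model.n_support_.sum())

print(toy_n_sv_by_c)
```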

In [None]:
# Create a new estimator that allows for, at most, 100 training observations to 
# be on the wrong side of the decision hyperplane and train it on the data
# assign the result to linear_svc_100
# linear_svc_100 = ...

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
svc_parameters_hash = '85a1b9f29e87effe2c0f53b4343a231527a89fcf886be951a8a89ec3558646fa'

assert isinstance(linear_svc_100, sklearn.svm.SVC) # check if SVC is of the right type
# check if SVC parameters are according to what is expected
assert svc_parameters_hash == sha256(json.dumps(linear_svc_100.get_params()).encode()).hexdigest() 
np.testing.assert_almost_equal(linear_svc_100.score(X_test, y_test), 0.6534653465346535) # check if SVC score matches expected

## Exercise 2: Support Vector Machines

Having tried the Support Vector Classifier, you turn to Support Vector Machines to see if they can improve the performance of your classifier. You wonder which kernel you should use, and decide to start with the polynomial kernel.

2.1) Create an SVM with polynomial kernel of degree 2. Fit the model to the data and create new predictions
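
To see why a non-linear kernel can help, here is a minimal sketch comparing a linear kernel with a degree-2 polynomial kernel on synthetic concentric circles, which no straight line can separate. The data and the `toy_` names are assumptions for illustration only.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original feature space
toy_X, toy_y = make_circles(n_samples=300, noise=0.05, factor=0.4, random_state=0)

toy_linear = SVC(kernel="linear").fit(toy_X, toy_y)
toy_poly = SVC(kernel="poly", degree=2).fit(toy_X, toy_y)

# predict() returns one class label per input row
toy_poly_preds = toy_poly.predict(toy_X)

toy_linear_acc = toy_linear.score(toy_X, toy_y)
toy_poly_acc = toy_poly.score(toy_X, toy_y)
print(toy_linear_acc, toy_poly_acc)
```

The degree-2 kernel implicitly works with squared and cross terms of the features, so a boundary that is linear in that expanded space can be circular in the original one.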

In [None]:
# Use an SVM with a polynomial kernel to create predictions
# Begin by creating the estimator
# then train it on the data
# assign your estimator to the variable poly_svm
# and its predictions to the variable poly_preds
# poly_svm = ...
# poly_preds = ...

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
poly_parameters_hash = '0e941e9e50803e150072a70406be935c3d85e513c8f70ec1f9b5f9fcba7b0fa1'
poly_preds_hash = '95de367ef41c1dca7fbf01380952a5dcc714daa7aed42679c08c58798e84b3d2'

assert isinstance(poly_svm, sklearn.svm.SVC) # check if SVC is of the right type
# check if SVC parameters are according to what is expected
assert poly_parameters_hash == sha256(json.dumps(poly_svm.get_params()).encode()).hexdigest()
# check if SVC score matches expected
np.testing.assert_almost_equal(poly_svm.score(X_test, y_test), 0.7079207920792079) 
# check if model predictions match expected result
assert poly_preds_hash == sha256(poly_preds.astype(np.int32)).hexdigest()

2.2) Create an SVM with a radial basis function (RBF) kernel and fit it to the data
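
The radial kernel measures similarity as $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$. The sketch below computes it by hand on synthetic data, compares the result with scikit-learn's `rbf_kernel`, and fits an `SVC` with `kernel="rbf"`. The data, the value of `toy_gamma`, and the `toy_` names are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Two interleaving half-moons (illustrative only)
toy_X, toy_y = make_moons(n_samples=300, noise=0.1, random_state=0)

# Manual RBF kernel: exp(-gamma * squared Euclidean distance)
toy_gamma = 0.5  # arbitrary illustrative value
toy_sq_dists = ((toy_X[:, None, :] - toy_X[None, :, :]) ** 2).sum(axis=-1)
toy_manual_gram = np.exp(-toy_gamma * toy_sq_dists)
toy_kernels_match = np.allclose(toy_manual_gram, rbf_kernel(toy_X, toy_X, gamma=toy_gamma))
print(toy_kernels_match)

# SVC with the radial kernel (gamma left at its default here)
toy_radial = SVC(kernel="rbf").fit(toy_X, toy_y)
toy_radial_acc = toy_radial.score(toy_X, toy_y)
print(toy_radial_acc)
```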

In [None]:
# Use an SVM with a radial kernel 
# Begin by creating the estimator
# then train it on the data
# assign your estimator to the variable radial_svm
# radial_svm = ...

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
radial_parameters_hash = 'ea58a5343263d35db801c24dd5b4815cbaf86dde2434e4e29294f1376f1699b6'

assert isinstance(radial_svm, sklearn.svm.SVC) # check if SVC is of the right type
# check if SVC parameters are according to what is expected
assert radial_parameters_hash == sha256(json.dumps(radial_svm.get_params()).encode()).hexdigest()
# check if SVC score matches expected
np.testing.assert_almost_equal(radial_svm.score(X_test, y_test), 0.7376237623762376)

## Exercise 3: Support Vector Regression

You also wonder whether the energy of a song can be predicted from the remaining attributes.

3.1) Use an SVR estimator to predict the energy of a song
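
For regression, `SVR` follows the same fit/predict pattern as `SVC`, and its `score` method returns the coefficient of determination $R^2$ instead of accuracy. A minimal sketch on a synthetic noisy sine curve (the data and the `toy_` names are assumptions for illustration only):

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve with a single feature (illustrative only)
rng = np.random.default_rng(0)
toy_X = np.sort(rng.uniform(0.0, 5.0, size=(200, 1)), axis=0)
toy_y = np.sin(toy_X).ravel() + rng.normal(scale=0.1, size=200)

toy_svr = SVR(kernel="rbf").fit(toy_X, toy_y)

toy_preds = toy_svr.predict(toy_X)   # continuous predictions, one per row
toy_r2 = toy_svr.score(toy_X, toy_y)  # R^2 on the given data
print(toy_r2)
```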

In [None]:
# Change the target variable to the energy (float)
X_train, X_test, y_train, y_test = get_X_y_train_test(songs_df.drop("target", axis=1), target_col="energy")

In [None]:
# Use an SVR with a radial kernel to create predictions
# Begin by creating the estimator
# then train it on the data
# assign your estimator to the variable svr
# svr = ...

# YOUR CODE HERE
raise NotImplementedError()


In [None]:
svr_parameters_hash = '1c7028317e2a4a052cfc51d74b47fc8e1eb78ee04999b792464352ebe15fa54a'

assert isinstance(svr, sklearn.svm.SVR) # check if SVR is of the right type
# check if SVR parameters are according to what is expected
assert svr_parameters_hash == sha256(json.dumps(svr.get_params()).encode()).hexdigest()
# check if model score is close enough to expected value
np.testing.assert_almost_equal(svr.score(X_test, y_test), 0.7369581003405523)