<a href="https://colab.research.google.com/github/saschaschworm/dsb/blob/master/Exercises/svm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Support Vector Machines

## Packages and Configuration

In [0]:
import numpy as np
import pandas as pd

from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVC, SVR

## Support Vector Regression on Rental Prices Dataset

### Data Import and Preprocessing

In [0]:
data = pd.read_csv('https://raw.githubusercontent.com/saschaschworm/dsb/master/Data%20Sets/Demos%20and%20Exercises/rental_prices/rental_prices_multiple.csv')
X, y = data[['living_space', 'age']].values, data['rental_price'].values

### Modelling and 10-Fold Cross-Validation

In [0]:
# Regression Models
multiple_linear_reg = SGDRegressor(max_iter=1000, eta0=0.0001, random_state=1909)
support_vector_reg = SVR(kernel='linear', C=100)

In [4]:
# 10-Fold Cross-Validation (The Easy Way)
cv_scores_mlr_reg = cross_val_score(multiple_linear_reg, X, y, cv=10, scoring='neg_mean_squared_error', n_jobs=-1, verbose=1)
cv_scores_svm_reg = cross_val_score(support_vector_reg, X, y, cv=10, scoring='neg_mean_squared_error', n_jobs=-1, verbose=1)

[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.1s finished
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.1s finished
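For comparison with the one-line `cross_val_score` call, the following is a minimal sketch of the same 10-fold procedure written as an explicit loop. It assumes a small synthetic data set (`X_demo`, `y_demo` are hypothetical stand-ins, not the rental prices data) so that it runs without a download. For a regressor, `cv=10` corresponds to an unshuffled `KFold(n_splits=10)`, so both approaches should yield the same per-fold errors.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

# Synthetic stand-in for the rental data (assumption: not the real data set).
rng = np.random.default_rng(0)
X_demo = rng.uniform(20, 120, size=(60, 2))
y_demo = 8.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(0, 20, size=60)

model = SVR(kernel='linear', C=100)

# Manual 10-fold loop: fit on nine folds, evaluate on the held-out fold.
fold_mse = []
for train_idx, test_idx in KFold(n_splits=10).split(X_demo):
    model.fit(X_demo[train_idx], y_demo[train_idx])
    y_pred = model.predict(X_demo[test_idx])
    fold_mse.append(mean_squared_error(y_demo[test_idx], y_pred))
fold_mse = np.array(fold_mse)

# The easy way: cross_val_score returns the negated MSE of each fold.
easy_scores = cross_val_score(model, X_demo, y_demo, cv=10,
                              scoring='neg_mean_squared_error')

print(np.allclose(fold_mse, -easy_scores))
```

The explicit loop is mainly useful when per-fold predictions or fitted models need to be inspected; otherwise `cross_val_score` is the shorter option.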


### Evaluation

In [5]:
print(f'Average RMSE in Multiple Linear Regression: {np.sqrt(-np.mean(cv_scores_mlr_reg)):.2f}')
print(f'Average RMSE in Support Vector Machine: {np.sqrt(-np.mean(cv_scores_svm_reg)):.2f}')

Average RMSE in Multiple Linear Regression: 43.32
Average RMSE in Support Vector Machine: 34.77
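One detail about the aggregation used in the print statements above: taking the square root of the mean fold MSE is not the same as averaging the per-fold RMSE values. The small sketch below illustrates the difference with hypothetical fold scores (the numbers are made up for the arithmetic and are not results from this notebook).

```python
import numpy as np

# Hypothetical negated per-fold MSE values, in the form cross_val_score returns.
neg_mse = np.array([-900.0, -1600.0, -2500.0])

# Square root of the mean MSE (the aggregation used in the cell above).
rmse_of_mean = np.sqrt(-np.mean(neg_mse))

# Mean of the per-fold RMSE values (30, 40 and 50 here).
mean_of_rmse = np.mean(np.sqrt(-neg_mse))

print(f'{rmse_of_mean:.2f}')  # 40.82
print(f'{mean_of_rmse:.2f}')  # 40.00
```

Because the square root is concave, the first figure is never smaller than the second; the two coincide only when all folds have the same MSE.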


## Support Vector Classification on Student Exam Performance Data

### Data Import and Preprocessing

In [0]:
data = pd.read_csv('https://github.com/saschaschworm/dsb/raw/master/Data%20Sets/Demos%20and%20Exercises/exam_passing/exam_passing.csv', header=None)
data.columns = ['hours_studied', 'hours_slept', 'passed']
X, y = data[['hours_studied', 'hours_slept']].values, data['passed'].values

### Modelling and 10-Fold Cross-Validation

In [0]:
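# Classification Models
# Note: with the default loss='hinge', SGDClassifier fits a linear SVM, not logistic
# regression (that would need loss='log_loss', or loss='log' in older scikit-learn).
logistic_clf = SGDClassifier(max_iter=1000, eta0=0.0001, random_state=1909)
support_vector_clf = SVC(kernel='linear', C=100)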

In [8]:
# 10-Fold Cross-Validation (The Easy Way)
cv_scores_log_clf = cross_val_score(logistic_clf, X, y, cv=10, scoring='f1', n_jobs=-1, verbose=1)
cv_scores_svm_clf = cross_val_score(support_vector_clf, X, y, cv=10, scoring='f1', n_jobs=-1, verbose=1)

[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.1s finished
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.1s finished


### Evaluation

In [9]:
print(f'Average F1 in Logistic Regression: {np.mean(cv_scores_log_clf) * 100:.2f}%')
print(f'Average F1 in Support Vector Machine: {np.mean(cv_scores_svm_clf) * 100:.2f}%')

Average F1 in Logistic Regression: 88.29%
Average F1 in Support Vector Machine: 88.99%
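As a reminder of what the `'f1'` scorer measures, here is a small worked sketch that computes F1 by hand from precision and recall and compares it with `sklearn.metrics.f1_score`. The labels `y_true_demo` and `y_pred_demo` are hypothetical and unrelated to the exam data.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical labels: 1 = passed, 0 = failed.
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 1, 0])
y_pred_demo = np.array([1, 1, 1, 0, 0, 1, 1, 0])

# Confusion counts for the positive class.
tp = np.sum((y_true_demo == 1) & (y_pred_demo == 1))  # 4 true positives
fp = np.sum((y_true_demo == 0) & (y_pred_demo == 1))  # 1 false positive
fn = np.sum((y_true_demo == 1) & (y_pred_demo == 0))  # 1 false negative

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1 is the harmonic mean of precision and recall.
f1_manual = 2 * precision * recall / (precision + recall)
f1_sklearn = f1_score(y_true_demo, y_pred_demo)

print(f'{f1_manual:.2f}')  # 0.80
```

Unlike accuracy, F1 ignores true negatives, which is why it is often preferred when the positive class is the one of interest.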
