<a href="https://colab.research.google.com/github/code4tomorrow/machine-learning/blob/main/2_intermediate/chapter6/support_vector_machines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Support Vector Machines**

In this notebook, we will apply Support Vector Machines (SVMs) to the same admissions data that we analyzed with Decision Trees. Many of us are headed to college after school, so this is a relevant topic to explore. In particular, we are studying graduate admissions and trying to predict whether a particular student will be admitted, based on features such as their GRE score and GPA.

### **Imports**

In [31]:
import pandas as pd
import numpy as np
import scipy.optimize as opt  # SciPy's optimization module; it is not called directly in this notebook, since scikit-learn fits the model for us

### **Data Import**

Upload the same data file as before (College_admission.csv) so that we can analyze it.

In [32]:
from google.colab import files
files.upload()

Saving College_admission.csv to College_admission (2).csv



In [33]:
dataDf = pd.read_csv("College_admission.csv")  # Load the uploaded CSV into a DataFrame
dataDf.head()  # Preview the first five rows of the data set

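| | admit | gre | gpa | ses | Gender_Male | Race | rank |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 1 | 0 | 3 | 3 |
| 1 | 1 | 660 | 3.67 | 2 | 0 | 2 | 3 |
| 2 | 1 | 800 | 4.0 | 2 | 0 | 2 | 1 |
| 3 | 1 | 640 | 3.19 | 1 | 1 | 2 | 4 |
| 4 | 0 | 520 | 2.93 | 3 | 1 | 2 | 4 |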


### **Cleaning up Data**

Data engineering is a huge topic in itself, so in this course we will largely avoid problematic data. In the following cell, we remove any rows that contain missing values. Feel free to try this on another data set if you'd like.

In [34]:
dataDf = dataDf.dropna() #Removes rows with Null Values.
dataDf.head()

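| | admit | gre | gpa | ses | Gender_Male | Race | rank |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 1 | 0 | 3 | 3 |
| 1 | 1 | 660 | 3.67 | 2 | 0 | 2 | 3 |
| 2 | 1 | 800 | 4.0 | 2 | 0 | 2 | 1 |
| 3 | 1 | 640 | 3.19 | 1 | 1 | 2 | 4 |
| 4 | 0 | 520 | 2.93 | 3 | 1 | 2 | 4 |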
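
To see what dropping missing values does, here is a minimal sketch on a small made-up table. The `toy_df` data and its values are hypothetical and are not taken from College_admission.csv; only `isna()` and `dropna()` are real pandas calls.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from College_admission.csv) with two missing cells.
toy_df = pd.DataFrame({
    "gre": [380, 660, np.nan, 640],
    "gpa": [3.61, 3.67, 4.00, np.nan],
    "admit": [0, 1, 1, 1],
})

# Count the missing values in each column before dropping anything.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# dropna() returns a new DataFrame that keeps only fully populated rows.
clean_df = toy_df.dropna()
print(len(toy_df), "rows before,", len(clean_df), "rows after")
```

Checking the missing-value counts first is a useful habit: if a large share of the rows would be dropped, it may be worth handling the gaps another way.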


### **Train-Test Split**

Next, we prepare separate training and test datasets.

Before splitting the rows, we separate the columns into an input dataframe (the features, X) and an output dataframe (the label, y).

In [35]:
X = dataDf[["gre","gpa","ses","Gender_Male","Race","rank"]]
y = dataDf[["admit"]]

Then, we use the train_test_split function from sklearn to split our data into training and test sets.

In [36]:
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=4)  # Hold out 20% of the rows for testing
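
The effect of `test_size` and `random_state` is easiest to see on stand-in data. The sketch below uses hypothetical arrays (`demo_X`, `demo_y`) rather than the admissions data, and adds the optional `stratify` argument, which the notebook itself does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 6 feature columns, binary labels.
rng = np.random.default_rng(0)
demo_X = rng.normal(size=(100, 6))
demo_y = np.array([0] * 70 + [1] * 30)

# test_size=0.2 holds out 20% of the rows; random_state makes the split repeatable.
# stratify=demo_y (optional) keeps the class proportions the same in both parts.
d_train_x, d_test_x, d_train_y, d_test_y = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=4, stratify=demo_y
)

print(d_train_x.shape, d_test_x.shape)
print(d_train_y.mean(), d_test_y.mean())
```

Changing `random_state` gives a different split, and usually a slightly different accuracy later on, which is worth keeping in mind when comparing models.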

### **Training**

Now, we will train a Support Vector Machine to predict whether a student is admitted.

In [37]:
from sklearn import svm  # scikit-learn's SVM module, needed to create an SVM-based model
clf = svm.SVC(kernel="rbf")  # Create an untrained SVM classifier with an RBF kernel
clf.fit(train_x, train_y.values.ravel())  # Fit the model to the training data; ravel() flattens the one-column label dataframe to the 1-D shape that fit expects


SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
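
One optional variation, which is not part of the original notebook: an RBF kernel measures distances between rows, so a column with a large numeric range (such as gre, in the hundreds) can dominate a column with a small range (such as gpa). A common remedy is to standardize the features before the SVM sees them. The sketch below shows the idea on hypothetical stand-in data; `demo_X`, `demo_y`, and `scaled_clf` are illustrative names.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in data: one wide-range column (like gre)
# and one narrow-range column (like gpa).
rng = np.random.default_rng(0)
wide = rng.uniform(200, 800, size=200)
narrow = rng.uniform(2.0, 4.0, size=200)
demo_X = np.column_stack([wide, narrow])
demo_y = (narrow > 3.0).astype(int)  # the label depends only on the narrow column

# StandardScaler rescales each column to mean 0 and unit variance,
# then the SVC is trained on the rescaled values.
scaled_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_clf.fit(demo_X, demo_y)

demo_score = scaled_clf.score(demo_X, demo_y)  # training accuracy on the toy data
print(demo_score)
```

Because the scaler sits inside the pipeline, it is fitted on the training rows only and then reused unchanged when predicting on test rows. Whether scaling helps on the admissions data is something to check by experiment.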

### **Predictions and Accuracy Test**

We call the trained model's predict method on the test inputs to get a predicted output for each test row.

In [38]:
predictions = clf.predict(test_x)  # Predict admit (0 or 1) for each row of the test inputs

Here, we compute the accuracy score, which is the fraction of test rows that the model classified correctly, to see how good this model is.

In [39]:
from sklearn import metrics
print(metrics.accuracy_score(test_y,predictions))

0.7
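
To make the accuracy score concrete, the sketch below computes it by hand and compares it with `metrics.accuracy_score`. It also computes a simple baseline: the accuracy of always guessing the most common class. The labels and predictions are hypothetical and are not taken from the admissions data.

```python
import numpy as np
from sklearn import metrics

# Hypothetical labels and predictions, not taken from the admissions data.
true_labels = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
predicted = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 0])

# Accuracy is the fraction of predictions that match the true labels.
manual_accuracy = np.mean(true_labels == predicted)
library_accuracy = metrics.accuracy_score(true_labels, predicted)

# Baseline: always predict the most common class in the true labels.
majority_class = np.bincount(true_labels).argmax()
baseline_accuracy = np.mean(true_labels == majority_class)

print(manual_accuracy, library_accuracy, baseline_accuracy)
```

Comparing a model's accuracy with this baseline shows how much it has learned beyond the class frequencies, which matters when one class is much more common than the other.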


The accuracy score ranges from 0 to 1, and a score closer to 1 means that more of the test predictions were correct. You may remember that we applied Decision Trees to this very same problem, possibly with somewhat poorer results, depending on what you ended up with. After all, this is a somewhat randomized process. If your Decision Tree scored lower, the SVM did better on this particular split, although a single train-test split is not conclusive. It is important to consider which algorithms work best for different problems when implementing a solution.

In the following cells, repeat what you have seen above, starting from the Train-Test Split, but change the kernel from 'rbf' to one of the other options: *'linear'*, *'poly'*, or *'sigmoid'*.
See if this makes your accuracy better or worse.
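
If you would like a starting point, here is a rough sketch of how such a comparison could be organized. It runs on hypothetical synthetic data from `make_classification` rather than the admissions data, so its scores say nothing about the real problem; swap in your own train and test sets to complete the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical synthetic data standing in for the admissions features.
demo_X, demo_y = make_classification(n_samples=300, n_features=6, random_state=0)
k_train_x, k_test_x, k_train_y, k_test_y = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=4
)

# Fit one SVC per kernel and record its accuracy on the held-out rows.
kernel_scores = {}
for kernel in ["linear", "poly", "sigmoid"]:
    model = SVC(kernel=kernel)
    model.fit(k_train_x, k_train_y)
    kernel_scores[kernel] = model.score(k_test_x, k_test_y)

for kernel, score in kernel_scores.items():
    print(kernel, round(score, 3))
```

Keeping the same split for every kernel means any difference in accuracy comes from the kernel and not from which rows landed in the test set.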