# Support Vector Machines (SVMs) using scikit-learn in Python - Project Overview
### Let's train and deploy SVMs on another dataset from the UCI Machine Learning Repository
### [Statlog (Australian Credit Approval) Data Set ](https://archive.ics.uci.edu/ml/datasets/Statlog+%28Australian+Credit+Approval%29)<br>

This real dataset concerns credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. <br>

You can download the data file from the UCI website or use the processed version provided in the course material. Later, you can download the raw file from the UCI website and do the data cleaning yourself to practice your skills. <br>


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

**Read the datafile and display the head of your dataframe**

In [3]:
# Code here please, so that you don't lose the output
df = pd.read_csv('Aust_Credit_Approval_Data.csv')
df.head()

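|   | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22.08 | 11.46 | 2 | 4 | 4 | 1.585 | 0 | 0 | 0 | 1 | 2 | 100 | 1213 | 0 |
| 1 | 22.67 | 7.0 | 2 | 8 | 4 | 0.165 | 0 | 0 | 0 | 0 | 2 | 160 | 1 | 0 |
| 2 | 29.58 | 1.75 | 1 | 4 | 4 | 1.25 | 0 | 0 | 0 | 1 | 2 | 280 | 1 | 0 |
| 3 | 21.67 | 11.5 | 1 | 5 | 3 | 0.0 | 1 | 1 | 11 | 1 | 2 | 0 | 1 | 1 |
| 4 | 20.17 | 8.17 | 2 | 6 | 4 | 1.96 | 1 | 1 | 14 | 0 | 2 | 60 | 159 | 1 |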


In [4]:
# Code here please, so that you don't lose the output
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 690 entries, 0 to 689
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   x1      690 non-null    float64
 1   x2      690 non-null    float64
 2   x3      690 non-null    int64  
 3   x4      690 non-null    int64  
 4   x5      690 non-null    int64  
 5   x6      690 non-null    float64
 6   x7      690 non-null    int64  
 7   x8      690 non-null    int64  
 8   x9      690 non-null    int64  
 9   x10     690 non-null    int64  
 10  x11     690 non-null    int64  
 11  x12     690 non-null    int64  
 12  x13     690 non-null    int64  
 13  target  690 non-null    int64  
dtypes: float64(3), int64(11)
memory usage: 75.6 KB
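`df.info()` reports 690 non-null values in every column, so there is nothing to impute. Before training a classifier it is still worth checking class balance; the mean of `target` in the `df.describe()` output (0.445) implies that roughly 44.5% of the applications are labelled 1. A minimal sketch of both checks, assuming a small stand-in DataFrame built from a few columns of the `df.head()` rows (on the real `df` the same calls apply unchanged):

```python
import pandas as pd

# Hypothetical stand-in for df: three columns of the five rows shown by df.head().
# On the real 690-row DataFrame the same two checks apply unchanged.
df = pd.DataFrame(
    {
        "x1": [22.08, 22.67, 29.58, 21.67, 20.17],
        "x13": [1213, 1, 1, 1, 159],
        "target": [0, 0, 0, 1, 1],
    }
)

# Total number of missing cells across all columns
n_missing = int(df.isnull().sum().sum())

# Share of each class in the target column (index = class label)
class_share = df["target"].value_counts(normalize=True)

print(n_missing)
print(class_share)
```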


In [5]:
# Code here please, so that you don't lose the output
df.describe()

|   | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 |
| mean | 31.568203 | 4.758725 | 1.766667 | 7.372464 | 4.692754 | 2.223406 | 0.523188 | 0.427536 | 2.4 | 0.457971 | 1.928986 | 184.014493 | 1018.385507 | 0.444928 |
| std | 11.853273 | 4.978163 | 0.430063 | 3.683265 | 1.992316 | 3.346513 | 0.499824 | 0.49508 | 4.86294 | 0.498592 | 0.298813 | 172.159274 | 5210.102598 | 0.497318 |
| min | 13.75 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 25% | 22.67 | 1.0 | 2.0 | 4.0 | 4.0 | 0.165 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 80.0 | 1.0 | 0.0 |
| 50% | 28.625 | 2.75 | 2.0 | 8.0 | 4.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 160.0 | 6.0 | 0.0 |
| 75% | 37.7075 | 7.2075 | 2.0 | 10.0 | 5.0 | 2.625 | 1.0 | 1.0 | 3.0 | 1.0 | 2.0 | 272.0 | 396.5 | 1.0 |
| max | 80.25 | 28.0 | 3.0 | 14.0 | 9.0 | 28.5 | 1.0 | 1.0 | 67.0 | 1.0 | 3.0 | 2000.0 | 100001.0 | 1.0 |
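The summary shows features on very different scales: `x13` runs from 1 to 100001 (std about 5210), while `x3` stays between 1 and 3. SVMs are sensitive to feature scale, so standardizing the inputs is usually worth trying. A minimal sketch with scikit-learn's `StandardScaler`, assuming the `x3` and `x13` values from `df.head()` as a stand-in for the full columns:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in data: columns x3 and x13 from the five rows shown by df.head()
X_demo = np.array([[2, 1213], [2, 1], [1, 1], [1, 1], [2, 159]], dtype=float)

# StandardScaler subtracts each column's mean and divides by its standard deviation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)

print(X_demo.std(axis=0))    # raw spreads differ by orders of magnitude
print(X_scaled.mean(axis=0)) # each column now has (approximately) zero mean
print(X_scaled.std(axis=0))  # and unit standard deviation
```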


Let's move on to machine learning. 
## Machine Learning 

**Do the train-test split: use `test_size=0.30` and `random_state=42` so the split is reproducible**

In [6]:
# Code here please, so that you don't lose the output
X = df.drop('target', axis=1)
y = df['target']

In [7]:
X

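|   | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22.08 | 11.460 | 2 | 4 | 4 | 1.585 | 0 | 0 | 0 | 1 | 2 | 100 | 1213 |
| 1 | 22.67 | 7.000 | 2 | 8 | 4 | 0.165 | 0 | 0 | 0 | 0 | 2 | 160 | 1 |
| 2 | 29.58 | 1.750 | 1 | 4 | 4 | 1.250 | 0 | 0 | 0 | 1 | 2 | 280 | 1 |
| 3 | 21.67 | 11.500 | 1 | 5 | 3 | 0.000 | 1 | 1 | 11 | 1 | 2 | 0 | 1 |
| 4 | 20.17 | 8.170 | 2 | 6 | 4 | 1.960 | 1 | 1 | 14 | 0 | 2 | 60 | 159 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 685 | 31.57 | 10.500 | 2 | 14 | 4 | 6.500 | 1 | 0 | 0 | 0 | 2 | 0 | 1 |
| 686 | 20.67 | 0.415 | 2 | 8 | 4 | 0.125 | 0 | 0 | 0 | 0 | 2 | 0 | 45 |
| 687 | 18.83 | 9.540 | 2 | 6 | 4 | 0.085 | 1 | 0 | 0 | 0 | 2 | 100 | 1 |
| 688 | 27.42 | 14.500 | 2 | 14 | 8 | 3.085 | 1 | 1 | 1 | 0 | 2 | 120 | 12 |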


In [8]:
y

0      0
1      0
2      0
3      1
4      1
      ..
685    1
686    0
687    1
688    1
689    1
Name: target, Length: 690, dtype: int64

In [9]:
from sklearn.model_selection import train_test_split

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
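With 690 rows and `test_size=0.3`, scikit-learn puts ceil(0.3 × 690) = 207 rows in the test set and 483 in the training set. Because the classes are not perfectly balanced, passing `stratify=y` keeps the class proportions nearly identical in both splits; this is an optional addition, not part of the original exercise. A sketch on synthetic stand-in data with the same shape (an assumption, not the credit data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the credit data (assumption):
# 690 rows, 13 features, roughly 44.5% positive labels
rng = np.random.default_rng(0)
X = rng.normal(size=(690, 13))
y = (rng.random(690) < 0.445).astype(int)

# stratify=y preserves the class ratio of y in both the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(len(X_train), len(X_test))      # 483 207
print(y_train.mean(), y_test.mean())  # class-1 share is nearly the same in both splits
```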

### Importing and training the Support Vector Classifier

**Import SVC and create its instance `svm_model`**

In [12]:
# Code here please, so that you don't lose the output
from sklearn.svm import SVC

# gamma only affects the 'rbf', 'poly' and 'sigmoid' kernels, so it is omitted for a linear kernel
svm_model = SVC(kernel='linear', C=30)
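Two notes on this classifier: `gamma` only affects the `'rbf'`, `'poly'` and `'sigmoid'` kernels, so `gamma='auto'` has no effect with `kernel='linear'`; and because the features are unscaled (see `x13` in the summary statistics), a linear SVC with a large `C` can be slow to converge. One common remedy is to chain a `StandardScaler` and the SVC in a pipeline. A minimal sketch, assuming synthetic stand-in data from `make_classification` rather than the credit file:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the credit data (assumption): 690 rows, 13 features, binary target
X, y = make_classification(
    n_samples=690, n_features=13, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The scaler is fit on the training split only, then reused to transform the test split
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=30))
scaled_svm.fit(X_train, y_train)

test_accuracy = scaled_svm.score(X_test, y_test)
print(test_accuracy)
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, and it lets tools such as `GridSearchCV` refit the scaler on each cross-validation fold.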

**Train the model**

In [13]:
# Code here please, so that you don't lose the output
svm_model.fit(X_train, y_train)
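After fitting, a linear-kernel `SVC` exposes its separating hyperplane: `coef_` holds the weight vector w, `intercept_` holds the bias b, and `decision_function` returns w·x + b, whose sign decides the predicted class. `n_support_` counts the support vectors per class. A small sketch on synthetic stand-in data (an assumption, not the credit file):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 200 rows, 13 features, binary target 0/1
X, y = make_classification(
    n_samples=200, n_features=13, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
model = SVC(kernel="linear", C=30).fit(X, y)

print(model.n_support_)   # number of support vectors for each class
print(model.coef_.shape)  # (1, 13): one weight vector w for a binary problem
print(model.intercept_)   # bias b

# For a linear kernel, decision_function(X) equals X @ w + b
scores = model.decision_function(X)
manual_scores = X @ model.coef_.ravel() + model.intercept_[0]
print(np.allclose(scores, manual_scores))
```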

**Make the predictions**


In [14]:
# Predicted labels for the training set (the test-set score is computed below)
svm_model.predict(X_train)

array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0,
       1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1,
       1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0,
       0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0,
       1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1,
       1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1,
       1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1,
       0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0,
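The predictions above are for the training set, which says little about how the model generalizes. A sketch of predicting on the held-out split instead and summarizing the errors with a confusion matrix and per-class precision and recall, assuming synthetic stand-in data with the same shape as the credit data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the credit data (assumption): 690 rows, 13 features
X, y = make_classification(
    n_samples=690, n_features=13, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = SVC(kernel="linear", C=30).fit(X_train, y_train)

# Predict on the held-out test split, not on the training data
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))
```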

**How do we calculate the score?**


In [15]:
svm_model.score(X_test, y_test)

0.8502415458937198
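`SVC.score` returns the mean accuracy on the data it is given, so 0.8502 means 176 of the 207 test applications were classified correctly. The same number can be computed explicitly with `accuracy_score`; a sketch on synthetic stand-in data (an assumption, not the credit file):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the credit data (assumption)
X, y = make_classification(
    n_samples=690, n_features=13, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = SVC(kernel="linear", C=30).fit(X_train, y_train)

# score() is the mean accuracy on the given data
acc_from_score = model.score(X_test, y_test)

# accuracy_score computes the same fraction of correct predictions explicitly
y_pred = model.predict(X_test)
acc_manual = accuracy_score(y_test, y_pred)
n_correct = int((y_pred == y_test).sum())

print(acc_from_score, acc_manual, n_correct, len(y_test))
```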

Hyperparameter tuning (for example with GridSearch) can improve the model, but with a test accuracy of about 0.85 the predictions are still not great. In a professional setting, you would try different strategies, including feature engineering and different models, to see which one works best for your data. This section focuses specifically on SVMs, to learn how the model works and how to improve its performance.
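A grid search is one way to tune the SVM's hyperparameters. A minimal sketch with `GridSearchCV` over a scaler + SVC pipeline that tries both linear and RBF kernels; the grid values and the synthetic stand-in data are illustrative assumptions, not tuned settings for this dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the credit data (assumption)
X, y = make_classification(
    n_samples=690, n_features=13, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Parameter names use the "<step>__<param>" convention; gamma is only searched for rbf
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]},
]

# 5-fold cross-validation on the training split; the best setting is refit on all of it
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

test_score = grid.score(X_test, y_test)
print(grid.best_params_)
print(test_score)
```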
# Excellent work!