# Support Vector Machine (SVM)

Build an SVM model using the scikit-learn library. Explore both linear and
kernel-based SVMs and examine the effects of different kernel functions.


## Steps

### 1. Understanding SVMs:

https://github.com/asliulusoy/Machine-Learning-CMPE302/blob/main/Week10/Support_Vector_Machines-Main-Ideas.md

### 2. Exploring scikit-learn Documentation:
* **C (Regularization parameter)**
    - Default: 1.0
    - Controls the trade-off between a smooth decision boundary and classifying training points correctly. Regularization strength is inversely proportional to C, so a high value fits the training data more closely. Must be strictly positive.
* **kernel**
    - Default: 'rbf'
    - Options: 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'
    - Specifies the kernel type to be used in the SVM.
* **degree**
    - Default: 3
    - Degree of the polynomial kernel function ('poly'). Controls how much complexity the polynomial kernel can capture; ignored by all other kernels.
* **gamma**
    - Default: 'scale'
    - Kernel coefficient for 'rbf', 'poly', and 'sigmoid'. Determines the influence of individual training examples. High values fit the training dataset closely, which can cause overfitting.
        * if gamma='scale' (default), uses 1 / (n_features * X.var())
        * if gamma='auto', uses 1 / n_features
        * if a float, must be non-negative.
* **coef0**
    - Default: 0.0
    - Independent term in kernel function. Significant in 'poly' and 'sigmoid' kernels.
* **shrinking**
    - Default: True
    - Whether to use the shrinking heuristic to speed up optimization.
* **probability**
    - Default: False
    - Whether to enable probability estimates. Must be set before calling fit, and it slows fitting down because it internally uses 5-fold cross-validation.
* **class_weight**
    - Default: None
    - If None, all classes have weight one. 'balanced' adjusts weights inversely proportional to class frequencies; a dict sets the per-class weights specified by the user.
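
The defaults listed above can be read back from the estimator itself, and the `gamma='scale'` formula can be checked by hand. The following is a minimal sketch, assuming a recent scikit-learn release where `gamma='scale'` is the default; the variable names (`defaults`, `manual_gamma`) are illustrative only, and it fits on the raw iris features rather than on the cleaned data used later in this notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# Read the default hyperparameters from an unfitted estimator.
defaults = SVC().get_params()
for name in ["C", "kernel", "degree", "gamma", "coef0",
             "shrinking", "probability", "class_weight"]:
    print(f"{name}: {defaults[name]!r}")

# Reproduce gamma='scale' by hand: 1 / (n_features * X.var()).
X, y = datasets.load_iris(return_X_y=True)
manual_gamma = 1.0 / (X.shape[1] * X.var())

# Both models should make identical predictions if the formula matches.
pred_scale = SVC(kernel="rbf", gamma="scale").fit(X, y).predict(X)
pred_manual = SVC(kernel="rbf", gamma=manual_gamma).fit(X, y).predict(X)
print("same predictions:", np.array_equal(pred_scale, pred_manual))
```

Passing the float explicitly is mainly useful as a starting point when tuning `gamma` around the value that `'scale'` would have chosen.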

### 3. Data Preprocessing:

In [26]:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

1. Load and clean the "iris" dataset, deciding how to handle missing values if present.
2. Prepare the data by scaling or normalizing.

#### Load & Clean

In [2]:
#LOADING DATASET
iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

In [3]:
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [4]:
df.shape

(150, 5)

Check for null or duplicated values and handle them:

In [5]:
print(df.isnull().sum().sum()) #check null values

0


In [6]:
df.duplicated().sum() #check duplicated values

1

In [7]:
df = df.drop_duplicates() #drop duplicated values

In [8]:
df.shape

(149, 5)
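
Before dropping duplicates it can help to look at which rows are involved. The sketch below is one possible way to do that, assuming the same `load_iris` data frame as above; `dupes`, `n_repeats`, and `deduped` are illustrative names, and resetting the index is an optional extra step that the notebook itself does not perform.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["target"] = iris.target

# keep=False marks every member of a duplicate group, not only the repeats.
dupes = df[df.duplicated(keep=False)]
print(dupes)

# Count rows that repeat an earlier row, then drop them.
n_repeats = int(df.duplicated().sum())
deduped = df.drop_duplicates().reset_index(drop=True)  # renumber rows 0..n-1
print(len(df), "->", len(deduped))
```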

#### Preparing Scaling/Normalizing

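In [9]:
#DATA SCALING
# Note: the scaler is fit on the full dataset here, before the train/test split,
# so statistics from the future test rows influence the scaling.
features = df.columns[:-1]  # every column except 'target'
scaler = StandardScaler()
df[features] = scaler.fit_transform(df[features])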
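
To make the effect of `StandardScaler` concrete, the sketch below compares its output with the per-column z-score computed by hand. It is an illustration on the raw iris features, not a cell from the original notebook; `manual`, `col_means`, and `col_stds` are assumed names.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X, _ = datasets.load_iris(return_X_y=True)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# StandardScaler applies z = (x - mean) / std per column,
# using the population standard deviation (ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

col_means = X_scaled.mean(axis=0)  # expected to be close to 0
col_stds = X_scaled.std(axis=0)    # expected to be close to 1
print("matches manual z-score:", np.allclose(X_scaled, manual))
print(col_means, col_stds)
```

Scaling matters for SVMs because kernels depend on distances or dot products between samples, so a feature with a larger numeric range would otherwise dominate.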

### 4. Model Building

1. Start by building and training a linear SVM model.
2. Then, create kernel-based SVM models using different kernel functions such as RBF, Polynomial, and Sigmoid.

In [10]:
#SPLITTING DATASET

X=df[features]
y=df['target']
X_train, X_test, y_train, y_test=train_test_split(X,y, test_size=0.3, random_state=42)
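
An alternative arrangement, shown here only as a sketch, is to split first and let a pipeline fit the scaler on the training portion alone, so the test rows never contribute to the scaling statistics. This is not what the notebook does: it uses the raw iris data without dropping duplicates, so its numbers are not expected to match the outputs below. The names `pipe` and `test_accuracy` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The scaler inside the pipeline is fit on X_train only.
pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)

# During cross-validation the pipeline is refit per fold,
# so each fold's scaler sees only that fold's training rows.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
print("test accuracy:", np.round(test_accuracy, 4))
print("mean CV score:", np.round(cv_scores.mean(), 4))
```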

#### Linear SVM

In [11]:
linearSVMmodel=SVC(kernel='linear')
linearSVMmodel.fit(X_train, y_train)
y_predLinear= linearSVMmodel.predict(X_test)

In [28]:
cv_scores = cross_val_score(linearSVMmodel, X_train, y_train, cv=5)  # run 5-fold CV once, reuse below
print("Linear SVM Confusion Matrix Results")
print("\nAccuracy Score:")
print(accuracy_score(y_test, y_predLinear))
print("\nCross Validation Score:")
print(cv_scores)
print("\nMean CV Score:")
print(cv_scores.mean())
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_predLinear))
print("\nClassification Report:")
print(classification_report(y_test, y_predLinear))

Linear SVM Confusion Matrix Results

Accuracy Score:
0.9777777777777777

Cross Validation Score:
[1.         0.85714286 0.9047619  1.         0.95      ]

Mean CV Score:
0.9423809523809524

Confusion Matrix:
[[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
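
The figures in the classification report follow directly from the confusion matrix. As a worked check, the sketch below recomputes them from the linear SVM matrix printed above, assuming scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the linear SVM output above
# (rows = true class, columns = predicted class).
cm = np.array([[19, 0, 0],
               [0, 12, 1],
               [0, 0, 13]])

correct = np.diag(cm)                 # correctly classified samples per class
recall = correct / cm.sum(axis=1)     # divide by true-class totals (row sums)
precision = correct / cm.sum(axis=0)  # divide by predicted-class totals (column sums)
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()   # 44 correct out of 45

print("precision:", np.round(precision, 2))
print("recall:   ", np.round(recall, 2))
print("f1-score: ", np.round(f1, 2))
print("accuracy: ", round(float(accuracy), 4))
```

The single off-diagonal entry is one class 1 sample predicted as class 2, which lowers the recall of class 1 and the precision of class 2.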



#### Other Kernel Based SVM Models

In [13]:
#RBF

RBFSVMmodel=SVC(kernel='rbf')
RBFSVMmodel.fit(X_train, y_train)
y_predRBF= RBFSVMmodel.predict(X_test)

In [30]:
cv_scores = cross_val_score(RBFSVMmodel, X_train, y_train, cv=5)  # run 5-fold CV once, reuse below
print("RBF SVM Confusion Matrix Results")
print("\nAccuracy Score:")
print(accuracy_score(y_test, y_predRBF))
print("\nCross Validation Score:")
print(cv_scores)
print("\nMean CV Score:")
print(cv_scores.mean())
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_predRBF))
print("\nClassification Report:")
print(classification_report(y_test, y_predRBF))

RBF SVM Confusion Matrix Results

Accuracy Score:
1.0

Cross Validation Score:
[1.         0.80952381 0.9047619  1.         0.95      ]

Mean CV Score:
0.9328571428571429

Confusion Matrix:
[[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



In [15]:
#POLYNOMIAL
polynomialSVMmodel=SVC(kernel='poly')
polynomialSVMmodel.fit(X_train, y_train)
y_predpoly= polynomialSVMmodel.predict(X_test)

In [31]:
cv_scores = cross_val_score(polynomialSVMmodel, X_train, y_train, cv=5)  # run 5-fold CV once, reuse below
print("Polynomial SVM Confusion Matrix Results")
print("\nAccuracy Score:")
print(accuracy_score(y_test, y_predpoly))
print("\nCross Validation Score:")
print(cv_scores)
print("\nMean CV Score:")
print(cv_scores.mean())
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_predpoly))
print("\nClassification Report:")
print(classification_report(y_test, y_predpoly))

Polynomial SVM Confusion Matrix Results

Accuracy Score:
0.9777777777777777

Cross Validation Score:
[0.95238095 0.85714286 0.9047619  1.         0.85      ]

Mean CV Score:
0.9128571428571428

Confusion Matrix:
[[19  0  0]
 [ 0 13  0]
 [ 0  1 12]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.93      1.00      0.96        13
           2       1.00      0.92      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45



In [17]:
#SIGMOID
sigmoidSVMmodel=SVC(kernel='sigmoid')
sigmoidSVMmodel.fit(X_train, y_train)
y_predsigmoid= sigmoidSVMmodel.predict(X_test)

In [32]:
cv_scores = cross_val_score(sigmoidSVMmodel, X_train, y_train, cv=5)  # run 5-fold CV once, reuse below
print("Sigmoid SVM Confusion Matrix Results")
print("\nAccuracy Score:")
print(accuracy_score(y_test, y_predsigmoid))
print("\nCross Validation Score:")
print(cv_scores)
print("\nMean CV Score:")
print(cv_scores.mean())
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_predsigmoid))
print("\nClassification Report:")
print(classification_report(y_test, y_predsigmoid))

Sigmoid SVM Confusion Matrix Results

Accuracy Score:
0.8888888888888888

Cross Validation Score:
[1.         0.9047619  0.85714286 0.95238095 0.85      ]

Mean CV Score:
0.9128571428571428

Confusion Matrix:
[[19  0  0]
 [ 0  9  4]
 [ 0  1 12]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.90      0.69      0.78        13
           2       0.75      0.92      0.83        13

    accuracy                           0.89        45
   macro avg       0.88      0.87      0.87        45
weighted avg       0.90      0.89      0.89        45



### Results

* **Linear SVM**
    - Accuracy Score: 0.98 (1 of 45 test samples misclassified)
    - Mean CV Score: 0.9424

* **RBF SVM**
    - Accuracy Score: 1.0 (0 of 45 test samples misclassified)
    - Mean CV Score: 0.9329

* **Polynomial SVM**
    - Accuracy Score: 0.98 (1 of 45 test samples misclassified)
    - Mean CV Score: 0.9129

* **Sigmoid SVM**
    - Accuracy Score: 0.89 (5 of 45 test samples misclassified)
    - Mean CV Score: 0.9129
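
The four kernels can also be compared in a single loop that collects the same two scores into one table. This is a sketch under different assumptions from the notebook (raw iris data with no duplicate removal, scaling done inside a pipeline), so the values it prints may differ from the results listed above; `rows` and `summary` are illustrative names.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rows = []
for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    # Same preprocessing for every kernel; only the kernel changes.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    cv_mean = cross_val_score(model, X_train, y_train, cv=5).mean()
    model.fit(X_train, y_train)
    rows.append({
        "kernel": kernel,
        "test_accuracy": model.score(X_test, y_test),
        "mean_cv_score": cv_mean,
    })

summary = pd.DataFrame(rows).set_index("kernel")
print(summary.round(4))
```

With only 45 test samples, a single misclassification shifts test accuracy by about 0.02, so the mean CV score is usually the steadier number to compare kernels on.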