# Support Vector Machine (SVM) Overview

Support Vector Machine (SVM) is a supervised learning algorithm used primarily for classification tasks, though it can also be applied to regression problems. The core idea behind SVM is to find the optimal hyperplane that best separates data points from different classes.

## Key Concepts

1. **Hyperplane**:  
   In an n-dimensional space (where n is the number of features), a hyperplane is a flat affine subspace of dimension n-1. For a 2D space, it's a line; for 3D, it's a plane. SVM finds the hyperplane that best separates the classes.

2. **Support Vectors**:  
   These are the training points closest to the hyperplane, lying on or inside the margin boundaries. They alone determine where the hyperplane sits: removing any other training point and retraining leaves the solution unchanged.

3. **Margin**:  
   The margin is the distance between the hyperplane and the nearest data points of each class. SVM chooses the hyperplane that maximizes this margin. A larger margin tends to generalize better to unseen data, although this is a tendency rather than a guarantee.

4. **Kernel Trick**:  
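   When the data is not linearly separable in the original feature space, SVM can use a kernel function. A kernel computes the inner product between two points as if they had been mapped into a higher-dimensional (possibly infinite-dimensional) feature space, without ever computing that mapping explicitly. A linear boundary in that space corresponds to a non-linear boundary in the original space. Common kernels include:
   - **Linear**: `K(x, x') = x · x'`. No transformation; suitable when the data is already (close to) linearly separable.
   - **Polynomial**: `K(x, x') = (γ x · x' + r)^d`. Equivalent to working with polynomial combinations of the features up to degree `d`.
   - **Radial Basis Function (RBF)**: `K(x, x') = exp(-γ ||x - x'||²)`. Similarity decays with the squared distance between points, so each support vector contributes a Gaussian "bump" centred on itself; `γ` controls how narrow the bumps are.
   - **Sigmoid**: `K(x, x') = tanh(γ x · x' + r)`. Uses the hyperbolic tangent, similar to the activation of a neural network layer.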

5. **Soft Margin vs. Hard Margin**:
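   - **Hard Margin**: Assumes the data is perfectly linearly separable and requires every training point to lie on the correct side of the margin. This approach is sensitive to outliers, and no solution exists if the classes overlap.
   - **Soft Margin**: Allows some points to violate the margin or be misclassified, penalizing these violations through a regularization parameter (`C` in scikit-learn). A small `C` tolerates more violations and gives a wider margin; a large `C` approaches hard-margin behavior. This approach is more robust to outliers.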
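
The first three concepts can be seen together on a tiny example. The sketch below fits scikit-learn's `SVC` with a linear kernel to four made-up, perfectly separable points; the very large `C` is an assumption used to make the soft-margin solver behave like a hard-margin SVM. It reads off the hyperplane `w · x + b = 0` from `coef_` and `intercept_`, lists the support vectors, computes the margin width `2 / ||w||`, and then refits on the support vectors alone.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, perfectly separable 2D dataset (made-up points for illustration)
X = np.array([[-1.0, 0.0], [0.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
y = np.array([0, 0, 1, 1])

# A very large C makes the soft-margin SVC behave like a hard-margin SVM here
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w . x + b = 0
b = clf.intercept_[0]   # offset of the hyperplane
print("w =", w, "b =", b)
print("Support vectors:\n", clf.support_vectors_)
print("Margin width 2/||w|| =", 2 / np.linalg.norm(w))

# Refitting on the support vectors alone should give (numerically) the same hyperplane
clf_sv = SVC(kernel="linear", C=1000.0).fit(clf.support_vectors_, y[clf.support_])
print("w, b from support vectors only:", clf_sv.coef_[0], clf_sv.intercept_[0])
```

Geometrically, the maximum-margin boundary for these points is the line `x1 = 1` (so `w` is close to `[1, 0]` and `b` close to `-1`), the support vectors are `(0, 0)` and `(2, 0)`, and the margin width is 2. The points `(-1, 0)` and `(3, 0)` do not influence the result, which is why the refit matches.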

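The kernel formulas listed above can be written directly in NumPy. The sketch below implements the four kernels and checks them against scikit-learn's `sklearn.metrics.pairwise` functions on random data. The parameter values (`gamma=0.5`, `coef0=1.0`, `degree=3`) are arbitrary choices for the check; note that `SVC` itself defaults to `coef0=0.0`, while the pairwise helpers default to `coef0=1`, so the values are passed explicitly.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel,
)

def k_linear(X, Y):
    # K(x, z) = x . z
    return X @ Y.T

def k_poly(X, Y, gamma=0.5, coef0=1.0, degree=3):
    # K(x, z) = (gamma * x . z + coef0) ** degree
    return (gamma * (X @ Y.T) + coef0) ** degree

def k_rbf(X, Y, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2), via broadcasting over all pairs
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def k_sigmoid(X, Y, gamma=0.5, coef0=1.0):
    # K(x, z) = tanh(gamma * x . z + coef0)
    return np.tanh(gamma * (X @ Y.T) + coef0)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
Y = rng.normal(size=(5, 3))

# Each entry is True when the hand-written kernel matches scikit-learn's
checks = {
    "linear": np.allclose(k_linear(X, Y), linear_kernel(X, Y)),
    "poly": np.allclose(k_poly(X, Y), polynomial_kernel(X, Y, degree=3, gamma=0.5, coef0=1.0)),
    "rbf": np.allclose(k_rbf(X, Y), rbf_kernel(X, Y, gamma=0.5)),
    "sigmoid": np.allclose(k_sigmoid(X, Y), sigmoid_kernel(X, Y, gamma=0.5, coef0=1.0)),
}
print(checks)
```

An SVM only ever needs these kernel values between pairs of points, which is what makes the kernel trick cheap compared with building the high-dimensional features explicitly.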
## Advantages of SVM
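- Effective in high-dimensional spaces, including cases where the number of features exceeds the number of samples.
- Works well when there is a clear margin of separation between the classes.
- Less prone to overfitting than many flexible models, provided `C` and the kernel parameters are tuned, because maximizing the margin acts as a form of regularization.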
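
The regularization behind the soft margin is easiest to see in the standard primal objective, `½||w||² + C · Σᵢ max(0, 1 − yᵢ(w · xᵢ + b))`: the first term rewards a wide margin, the second (the hinge loss) penalizes margin violations, and `C` sets the trade-off. The sketch below only evaluates this objective for a hypothetical, hand-picked hyperplane (`x1 = 1`) and four made-up points with labels in {−1, +1}; it does not train anything (scikit-learn's `SVC` solves the equivalent dual problem through libsvm).

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin SVM objective with hinge loss; labels y must be -1/+1."""
    margins = y * (X @ w + b)                # functional margin of each point
    hinge = np.maximum(0.0, 1.0 - margins)   # slack: 0 when a point is outside the margin
    return 0.5 * np.dot(w, w) + C * hinge.sum(), hinge

# Hypothetical hyperplane x1 = 1 and four labelled points
w = np.array([1.0, 0.0])
b = -1.0
X = np.array([[0.0, 0.0],    # on the margin, correct side
              [2.0, 0.0],    # on the margin, correct side
              [1.5, 0.0],    # inside the margin but correctly classified
              [0.5, 0.0]])   # wrong side of the hyperplane
y = np.array([-1, 1, 1, 1])

for C in (0.1, 1.0, 10.0):
    obj, slack = soft_margin_objective(w, b, X, y, C)
    print(f"C={C:>4}: slack={slack}, objective={obj:.2f}")
```

The slack values are 0, 0, 0.5 and 1.5 for every `C`, but their weight in the objective grows with `C` (objective 0.7, 2.5 and 20.5 here), which is why a large `C` pushes the optimizer toward fewer violations at the cost of a narrower margin.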

## Disadvantages of SVM
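- Scales poorly to very large datasets: training time grows at least quadratically with the number of samples.
- Performance suffers on noisy data where the classes overlap heavily.
- Choosing the kernel, the regularization parameter `C` and kernel parameters such as `gamma` can be tricky and usually requires cross-validation.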
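
Rough arithmetic behind the large-dataset point: a kernel SVM works with pairwise kernel values, so a dense n × n kernel (Gram) matrix of float64 values would need 8·n² bytes. libsvm, which backs scikit-learn's `SVC`, does not actually store the full matrix (it caches kernel rows, controlled by `SVC`'s `cache_size` parameter), so the helper below, `gram_matrix_gb`, is a hypothetical illustration of quadratic growth rather than a measurement of real memory use.

```python
def gram_matrix_gb(n_samples, bytes_per_entry=8):
    """Size of a dense n x n kernel matrix of float64 values, in gigabytes (1e9 bytes)."""
    return n_samples ** 2 * bytes_per_entry / 1e9

# Ten times more samples means one hundred times more kernel entries
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} samples -> {gram_matrix_gb(n):8.3f} GB")
```

For 1,000 samples the full matrix is only 0.008 GB, while 100,000 samples would need 80 GB.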

## Real-World Implementation

We'll implement SVM using a real-world dataset. The steps will include:
1. **Loading the dataset**: We'll use the Breast Cancer Wisconsin (Diagnostic) dataset, which ships with scikit-learn.
2. **Preprocessing**: Handle missing values if there are any, split the data into training and test sets, and standardize the features.
3. **Training the Model**: Apply SVM with an appropriate kernel.
4. **Model Evaluation**: Evaluate the model using accuracy, precision, recall, F1-score and the confusion matrix.
5. **Tuning Parameters**: Use GridSearchCV to find the best hyperparameters.
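
These steps can also be condensed into a scikit-learn `Pipeline`. This is a sketch, not the notebook's own code: it assumes the breast cancer data and an illustrative `C`/`gamma` grid. Putting `StandardScaler` inside the pipeline means `GridSearchCV` refits the scaler on each cross-validation training fold, so no statistics from a validation fold leak into its own scaling. In the step-by-step cells below, the scaler is fitted once on the whole training set before cross-validation; the effect of that is usually small, but the pipeline avoids it entirely.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling lives inside the pipeline, so every CV fold is scaled using its own training part
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Pipeline parameters are addressed as <step name>__<parameter>;
# make_pipeline names each step after its lowercased class name ("svc")
param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.001]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test accuracy:", accuracy_score(y_test, search.predict(X_test)))
```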




A great dataset for implementing SVM is the **Breast Cancer Wisconsin (Diagnostic) Dataset**. It is a well-known dataset in the machine learning community, often used for binary classification tasks. Its features describe characteristics of the cell nuclei in digitized images of fine needle aspirates (FNA) of breast masses, and the goal is to classify each sample as benign or malignant.
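
A quick look at the size and class balance of scikit-learn's built-in copy of the dataset: 569 samples with 30 numeric features, 212 malignant and 357 benign. scikit-learn encodes malignant as 0 and benign as 1, which matters when reading per-class metrics later.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
print(data.data.shape)           # (569, 30): samples x features
print(data.target_names)         # ['malignant' 'benign']: names for labels 0 and 1
print(np.bincount(data.target))  # [212 357]: samples per class
```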

## Step 1: Import Libraries

In [22]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import GridSearchCV


## Step 2: Load the Dataset

In [23]:
data = load_breast_cancer()

# Convert it into a DataFrame for easier exploration
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # 0 = malignant, 1 = benign

# Display the first five rows
df.head()



## Step 3: Preprocessing
Before we train the SVM model, we need to preprocess the data. This includes splitting the dataset into training and testing sets and scaling the features.
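
As a standalone sketch of these two steps, assuming the breast cancer data, the snippet below also passes `stratify=y`, an optional `train_test_split` argument that keeps the benign/malignant ratio (about 63% benign) nearly the same in both splits. It then checks that a scaler fitted on the training split only leaves the training features with zero mean and unit standard deviation; the test features are transformed with the same training statistics and are only approximately standardized.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# stratify=y keeps the share of benign samples (label 1) similar in both splits
print("Benign fraction, train:", y_train.mean().round(3), "test:", y_test.mean().round(3))

scaler = StandardScaler().fit(X_train)   # statistics come from the training split only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training features are now centred with unit variance
print("Train means ~0:", np.allclose(X_train_scaled.mean(axis=0), 0))
print("Train stds ~1:", np.allclose(X_train_scaled.std(axis=0), 1))
```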

In [24]:
# Split the dataset into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


## Step 4: Train the SVM Model
Now, let's train the SVM model using the RBF kernel (a popular choice for non-linear problems).

In [27]:
# Create an SVM classifier with an RBF kernel
model = SVC(kernel='rbf', C=100, gamma=0.001)

# Train the model on the scaled training data
model.fit(X_train_scaled, y_train)


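To see what the trained model actually keeps, the sketch below refits the same configuration (`kernel='rbf'`, `C=100`, `gamma=0.001`) in a self-contained way and inspects `n_support_`, `support_vectors_` and `decision_function`. It re-creates the split with `stratify=y`, so its numbers need not match the notebook's model exactly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = SVC(kernel="rbf", C=100, gamma=0.001).fit(X_train_scaled, y_train)

# Only the support vectors enter the decision function
print("Support vectors per class:", model.n_support_)
print("Fraction of training points that are support vectors:",
      round(model.support_vectors_.shape[0] / X_train_scaled.shape[0], 3))

# For a binary SVC, a positive decision value means model.classes_[1] (benign here)
scores = model.decision_function(X_test_scaled[:5])
print("Decision values:", scores.round(2))
print("Predicted labels:", model.predict(X_test_scaled[:5]))
```

A small fraction of support vectors usually indicates that the classes are well separated in the kernel space; if nearly every training point becomes a support vector, `C` or `gamma` is often worth revisiting.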

## Step 5: Evaluate the Model
After training, we can evaluate the model's performance on the test set.

In [28]:
# Make predictions on the test set
y_pred = model.predict(X_test_scaled)

# Evaluate the model
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nAccuracy Score:", accuracy_score(y_test, y_pred))

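The metrics in the classification report come straight from the confusion matrix. The sketch below uses ten made-up labels (1 is the positive class, which for this dataset would be benign) to compute precision, recall and F1 by hand and compares them with scikit-learn's `precision_score`, `recall_score` and `f1_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical true and predicted labels (1 = positive class)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_hat = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For labels [0, 1], sklearn's confusion matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)         # 5 1 1 3
print(precision, recall, f1)  # 0.75 0.75 0.75
print(precision_score(y_true, y_hat), recall_score(y_true, y_hat), f1_score(y_true, y_hat))
```

In a medical setting like this one, recall on the malignant class (label 0) is often the number to watch, since a missed malignant case is costlier than a false alarm.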



## Step 6: Hyperparameter Tuning with GridSearchCV
To optimize the model, we'll perform hyperparameter tuning using GridSearchCV.

In [29]:
# Candidate hyperparameter values: 5 x 5 x 3 = 75 combinations
param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
    'kernel': ['rbf', 'poly', 'sigmoid']
}

# Exhaustive search with 5-fold cross-validation;
# refit=True retrains the best model on the whole training set
grid_search = GridSearchCV(SVC(), param_grid, refit=True, cv=5, verbose=3)
grid_search.fit(X_train_scaled, y_train)

# Print the best parameters
print("Best parameters:", grid_search.best_params_)

# Evaluate the tuned model on the scaled test set
y_pred_tuned = grid_search.predict(X_test_scaled)
print("\nTuned Model Accuracy Score:", accuracy_score(y_test, y_pred_tuned))

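When the grid gets large, `RandomizedSearchCV` is a common alternative: instead of trying every combination, it samples `n_iter` candidates. If every parameter is given as a list, it samples without replacement, so an `n_iter` larger than the number of combinations just runs them all (with a warning); its real advantage shows with continuous distributions. The sketch below samples `C` and `gamma` log-uniformly with `scipy.stats.loguniform`; the ranges, `n_iter=20` and the scaler-plus-SVC pipeline are illustrative assumptions, not tuned recommendations.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Sample C and gamma from log-uniform distributions instead of fixed lists
param_distributions = {
    "svc__C": loguniform(1e-2, 1e3),
    "svc__gamma": loguniform(1e-4, 1e0),
}

search = RandomizedSearchCV(
    pipe, param_distributions, n_iter=20, cv=5, random_state=0
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
print("Test accuracy:", round(search.score(X_test, y_test), 3))
```

Sampling on a log scale spends the budget evenly across orders of magnitude, which suits `C` and `gamma` because their effect changes multiplicatively.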

