### SVM Hyperparameter Tuning using GridSearchCV - ML
- SVMs are widely used for classification, but their performance depends heavily on hyperparameters such as C and gamma.
- Finding a good combination of these hyperparameters by hand is tedious and error-prone.
- GridSearchCV automates the search by evaluating every combination in a user-supplied grid and selecting the one with the best cross-validation score.
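
To make "systematically testing combinations" concrete, here is a minimal sketch of the loop that GridSearchCV automates. The tiny grid (`C_values`, `gamma_values`) and the variable names are assumptions chosen for illustration; they are not the grid used later in this notebook.

```python
from itertools import product

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Small illustrative grid (values chosen for this sketch only).
C_values = [1, 10]
gamma_values = [0.001, 0.0001]

scores = {}
for C, gamma in product(C_values, gamma_values):
    # Mean accuracy over 5 cross-validation folds for this combination.
    fold_scores = cross_val_score(SVC(C=C, gamma=gamma, kernel='rbf'), X, y, cv=5)
    scores[(C, gamma)] = fold_scores.mean()

# The combination with the highest mean cross-validation accuracy wins.
best_combo = max(scores, key=scores.get)
print(best_combo, round(scores[best_combo], 3))
```

GridSearchCV wraps this same idea and adds conveniences such as refitting the best model and recording per-fold results.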

### Step 1: Importing Necessary Libraries

In [5]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

### Step 2: Loading and Printing the Dataset

In [6]:
cancer = load_breast_cancer()

df_features = pd.DataFrame(cancer['data'], columns = cancer['feature_names'])
df_target = pd.DataFrame(cancer['target'], columns=['Cancer'])

print('Feature Variables:')
print(df_features.info())
print('DataFrame looks like:')
print(df_features.head())

Feature Variables:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 30 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothn

### Step 3: Splitting the Data into Training and Testing Sets
- np.ravel() flattens the single-column target DataFrame into a contiguous 1-D array, which is the shape scikit-learn expects for labels.

In [7]:
X_train, X_test, y_train, y_test = train_test_split(
    df_features, np.ravel(df_target), test_size=0.30, random_state=101
)
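
As a quick illustration of why `np.ravel()` is applied to the target, the sketch below uses a hypothetical three-row DataFrame (`demo_target`) in place of `df_target` and compares the shapes before and after flattening.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row target column, standing in for df_target.
demo_target = pd.DataFrame([0, 1, 1], columns=['Cancer'])

flat = np.ravel(demo_target)

print(demo_target.shape)  # (3, 1): a 2-D column
print(flat.shape)         # (3,): a flat 1-D array of labels
```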

### Step 4: Training an SVM Model without Hyperparameter Tuning

In [8]:
model = SVC()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.95      0.85      0.90        66
           1       0.91      0.97      0.94       105

    accuracy                           0.92       171
   macro avg       0.93      0.91      0.92       171
weighted avg       0.93      0.92      0.92       171
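
"Without tuning" here means `SVC()` runs with its library defaults. A small sketch for checking which values those are in the installed scikit-learn version (the defaults can differ between releases, so treat the printed values as version-dependent):

```python
from sklearn.svm import SVC

# get_params() returns the constructor arguments of the estimator as a dict.
defaults = SVC().get_params()

for name in ('C', 'gamma', 'kernel'):
    print(name, '=', defaults[name])
```

In recent releases this reports an RBF kernel with `C=1.0` and `gamma='scale'`, so the baseline above is already an RBF SVM, just with untuned settings.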



### Step 5: Hyperparameter Tuning with GridSearchCV
- Let's use GridSearchCV to find the best combination of C and gamma for an SVM with the RBF kernel (the only kernel in the grid below).
- C - Controls the trade-off between a wider margin (low C) and correctly classifying every training point (high C).
- gamma - Determines how far the influence of a single training point reaches; a high gamma fits tightly to the data, while a low gamma gives a smoother decision boundary.
- kernel - Defines the function used to transform the data so that the classes can be separated.
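
The effect of gamma can be seen directly from the RBF kernel, which scores the similarity of two points as `exp(-gamma * squared_distance)`. The sketch below uses two made-up points (`a` and `b`, one unit apart) and three illustrative gamma values; none of these come from the notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two hypothetical points whose squared Euclidean distance is 1.
a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 0.0]])

for gamma in [0.01, 1, 100]:
    # With squared distance 1, the kernel value reduces to exp(-gamma).
    similarity = rbf_kernel(a, b, gamma=gamma)[0, 0]
    print(f"gamma={gamma}: similarity={similarity:.6f}")
```

With a small gamma the two points still look similar, so each training point influences a wide region. With a large gamma the similarity collapses toward zero, so influence is very local and the model can fit the training data tightly.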

In [9]:
from sklearn.model_selection import GridSearchCV
param_grid = {
    'C':[0.1, 1, 10, 100, 1000],
    'gamma':[1, 0.1, 0.01, 0.001, 0.0001],
    'kernel':['rbf']
}
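# refit=True retrains the best configuration on the whole training set;
# verbose=3 logs the score of every fold.
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)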
grid.fit(X_train, y_train)


Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.637 total time=   0.0s
[CV 2/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.637 total time=   0.0s
[CV 3/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.625 total time=   0.0s
[CV 4/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.633 total time=   0.0s
[CV 5/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.633 total time=   0.0s
[CV 1/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.637 total time=   0.0s
[CV 2/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.637 total time=   0.0s
[CV 3/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.625 total time=   0.0s
[CV 4/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.633 total time=   0.0s
[CV 5/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.633 total time=   0.0s
[CV 1/5] END .....C=0.1, gamma=0.01, kernel=rbf;, score=0.637 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=0.01, kernel=rbf
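
The "25 candidates, totalling 125 fits" in the log header follows from the grid size. As a sketch, scikit-learn's `ParameterGrid` can enumerate the same combinations; `n_folds = 5` is taken from the log above, and `candidates` is a name introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
    'kernel': ['rbf'],
}

# Every combination of the listed values: 5 C values x 5 gamma values x 1 kernel.
candidates = list(ParameterGrid(param_grid))
n_folds = 5  # number of folds reported in the log

print(len(candidates))            # 25
print(len(candidates) * n_folds)  # 125
```

The cost grows multiplicatively with each added value, which is worth keeping in mind before widening the grid.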

### Step 6: Get the Best Hyperparameters and Model 

In [10]:
print(grid.best_params_)

print(grid.best_estimator_)

{'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
SVC(C=1, gamma=0.0001)
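
Beyond the single best combination, it can be useful to see how the other candidates ranked. The sketch below repeats the search without logging and loads `grid.cv_results_` into a DataFrame. The column selection and the names `results` and `top` are choices made for this example, and the split is rebuilt from the raw arrays so the block runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
    'kernel': ['rbf'],
}
grid = GridSearchCV(SVC(), param_grid, refit=True)
grid.fit(X_train, y_train)

# cv_results_ is a dict of arrays with one entry per candidate.
results = pd.DataFrame(grid.cv_results_)
columns = ['param_C', 'param_gamma', 'mean_test_score',
           'std_test_score', 'rank_test_score']

# Show the five best-ranked candidates (rank 1 is the best).
top = results[columns].sort_values('rank_test_score').head()
print(top)
```

Comparing `mean_test_score` with `std_test_score` across the top rows gives a sense of whether the winner is clearly better or just marginally ahead.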


### Step 7: Evaluating the Optimized Model

In [11]:
grid_predictions = grid.predict(X_test) 

print(classification_report(y_test, grid_predictions))

              precision    recall  f1-score   support

           0       0.94      0.89      0.91        66
           1       0.94      0.96      0.95       105

    accuracy                           0.94       171
   macro avg       0.94      0.93      0.93       171
weighted avg       0.94      0.94      0.94       171
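
`confusion_matrix` is imported in Step 1 but not used above. As a possible follow-up, the sketch below fits an SVC with the parameters reported in Step 6 and prints the confusion matrix for the test set. It rebuilds the split from the raw arrays so it runs on its own; the names `model` and `cm` are introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

# Parameters reported as best by the grid search in Step 6.
model = SVC(C=1, gamma=0.0001, kernel='rbf').fit(X_train, y_train)

# Rows are true classes, columns are predicted classes
# (in this dataset 0 = malignant, 1 = benign).
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
```

The off-diagonal cells show which kind of mistake the model makes, which the aggregate accuracy alone does not reveal.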

