Implement Support Vector Machine Classification using Breast Cancer Dataset

In machine learning, support vector machines (SVMs, also support vector networks) are
supervised learning models with associated learning algorithms that analyze data used for
classification and regression analysis.
An SVM model is a representation of the examples as points in space, mapped so that
the examples of the separate categories are divided by a clear gap that is as wide as possible.
New examples are then mapped into that same space and predicted to belong to a category
based on which side of the gap they fall. This gap is called the margin; because the SVM
chooses the boundary that maximizes it, the classifier is called a maximum-margin classifier.
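
To make the idea of a "widest gap" concrete, here is a minimal pure-Python sketch that measures the geometric margin of two candidate boundaries on a toy dataset. The four points, the labels, and the helper name `geometric_margin` are hypothetical and chosen only for illustration; an SVM would search for the best boundary rather than compare two hand-picked ones.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A positive value means every point lies on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Toy, hand-made data (hypothetical): two points per class,
# separable along the first coordinate.
points = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
labels = [-1, -1, 1, 1]

# Two candidate vertical boundaries: x1 = 1 (centered) and x1 = 0.5 (shifted).
centered = geometric_margin((1.0, 0.0), -1.0, points, labels)
shifted = geometric_margin((1.0, 0.0), -0.5, points, labels)

print("centered boundary margin:", centered)
print("shifted boundary margin:", shifted)
```

Both boundaries separate the classes, but the centered one keeps every point farther away, so it is the one a maximum-margin classifier would prefer.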
In addition to performing linear classification, SVMs can efficiently perform non-linear
classification using the kernel trick, which implicitly maps their inputs into high-dimensional feature spaces.
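
The following sketch illustrates what "implicitly mapping" means, using a homogeneous quadratic kernel on 2-D inputs rather than the RBF kernel used later in this notebook. The function names `poly2_kernel` and `phi` and the two sample vectors are assumptions made for the example. The kernel value computed in the original 2-D space matches the dot product in an explicit 3-D feature space, without ever constructing that space.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous quadratic kernel k(x, z) = (x . z)^2, computed in 2-D."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map whose dot product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

# Two arbitrary sample vectors (hypothetical values).
x = (1.0, 2.0)
z = (3.0, 4.0)

implicit = poly2_kernel(x, z)                            # stays in 2-D
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))    # goes through 3-D

print("kernel value:", implicit)
print("explicit feature-space dot product:", explicit)
```

For the RBF kernel the corresponding feature space is infinite-dimensional, which is why only the kernel form is usable in practice.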
1. Import the Libraries required for SVM. [CO2]

  Import all required libraries, including the visualization libraries, to complete the SVM task.
2. Import the Breast Cancer Dataset from Sklearn Packages. [CO1]

  • Once the Breast Cancer data is loaded, access it as a dictionary-like object through its keys.

  • Describe all the features using the dictionary's feature names.

  • Set up the dataframe, describe its details, and check for missing values.

  • Identify the target class and assign it to the dataframe.

  • Perform exploratory analysis of the dataframe using the seaborn (sns) package.

  • Draw the boxplot of the first 10 columns to verify their role in cancer.
3. Train and Test Data. [CO3]

  • Prepare the Train and Test data from the dataframe.

  • Drop the target (cancer) column to form the feature set, and define the labels from the target column only.

  • Split the data into train and test sets using train_test_split.

4. Train the SVC using the Train Dataset. [CO3]

  • from sklearn.svm import SVC

  • Apply the model.fit to dataset

5. Predict and Analyze the Performance of the SVC Model. [CO4]

  • Apply model.predict(X_test)

  • Generate the classification report, confusion matrix using sklearn.metrics

6. Improve the Accuracy of the Model using GridSearchCV. [CO4]

  • Tune the model with GridSearchCV over the following parameter grid:

  param_grid = {'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
  'kernel': ['rbf']}

  • Apply model.predict with the tuned model

  • Generate the classification report, confusion matrix using sklearn.metrics
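
Steps 5 and 6 both ask for a classification report and a confusion matrix. As a reading aid, the sketch below derives the main report metrics by hand from a 2x2 confusion matrix. The counts are hypothetical toy numbers, not results from this notebook. The layout follows scikit-learn's convention (rows are true classes, columns are predicted classes, labels ordered [0, 1]), and class 1 is treated as the positive class; in `load_breast_cancer`, label 1 corresponds to benign.

```python
# Hypothetical 2x2 confusion matrix (toy counts, not notebook output).
# Row = true class, column = predicted class, labels ordered [0, 1].
cm = [[8, 2],
      [1, 9]]

tn, fp = cm[0]   # true class 0: correctly rejected, wrongly flagged
fn, tp = cm[1]   # true class 1: missed, correctly detected

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)      # of everything predicted as 1, how much was 1
recall = tp / (tp + fn)         # of everything truly 1, how much was found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

`classification_report` reports precision, recall, and F1 for each class in turn, so the row for class 0 is obtained by swapping the roles of the two classes in the same arithmetic.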

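Before running the grid search in step 6, it can help to estimate how much work it involves. The sketch below enumerates the parameter grid from the assignment with the standard library. The fold count of 5 is an assumption: it matches the default of recent scikit-learn releases when `cv` is not given, but it is worth checking against the installed version.

```python
from itertools import product

# Parameter grid from step 6 of the assignment.
param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
    'kernel': ['rbf'],
}

# Every combination of one value per parameter, as a list of dicts.
combos = [dict(zip(param_grid, values))
          for values in product(*param_grid.values())]

n_folds = 5  # assumption: default number of cross-validation folds
total_fits = len(combos) * n_folds

print("candidate settings:", len(combos))
print("cross-validation fits:", total_fits)
```

With `refit=True`, one additional fit on the full training set follows the search, so that the best setting is available for prediction.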
In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
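# Load the dataset (a dictionary-like object) and build a DataFrame that includes the target column
cancer = datasets.load_breast_cancer()
cancer_data = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
cancer_data['target'] = cancer.target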
# Inspect feature names, summary statistics, and missing values
print("Feature names:", cancer.feature_names)
print(cancer_data.describe())
print("Missing values in each column:\n", cancer_data.isnull().sum())
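# Pairwise plots colored by class; with 30 features this grid is large and slow to render
sns.pairplot(cancer_data, hue='target', diag_kind='kde')
plt.show()
# Boxplot of the first 10 feature columns
plt.figure(figsize=(12, 6))
sns.boxplot(data=cancer_data.iloc[:, :10])
plt.title("Boxplot of the first 10 features")
plt.xticks(rotation=45)
plt.show()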
# Separate features (X) from labels (y) and hold out 20% of the rows for testing
X = cancer_data.drop('target', axis=1)
y = cancer_data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train an SVC with default settings (RBF kernel) and evaluate it on the test set
model = SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
# Hyperparameter grid for the RBF kernel
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}

# Cross-validated grid search; refit=True retrains the best setting on the full training set
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2)
grid.fit(X_train, y_train)
best_model = grid.best_estimator_
y_pred_best = best_model.predict(X_test)
print("Best parameters from GridSearchCV:", grid.best_params_)
print("Classification Report for best model:\n", classification_report(y_test, y_pred_best))
print("Confusion Matrix for best model:\n", confusion_matrix(y_test, y_pred_best))


Feature names: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
       mean radius  mean texture  mean perimeter    mean area  \
count   569.000000    569.000000      569.000000   569.000000   
mean     14.127292     19.289649       91.969033   654.889104   
std       3.524049      4.301036       24.298981   351.914129   
min       6.981000      9.710000       43.790000   143.500000   
25%      11.700000     16.170000       75.170000   420.300000   
50%      13.370000     18.840000       8