
# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint
## Not for grading

## Learning Objective

At the end of the experiment, you will be able to:

* Train and evaluate an SVM classifier

## Dataset

### History

Breast cancer (BC) is one of the most common cancers among women worldwide. The average risk of a woman in the United States developing breast cancer at some point in her life is about 13%, or roughly a 1 in 8 chance. Early diagnosis of BC can greatly improve the prognosis and chance of survival for patients, so accurate identification of malignant tumors is of paramount importance.

### Description

The Breast Cancer data set consists of 569 instances. This is a binary classification problem with two tumor classes: each tumor is labeled **benign (1)** or **malignant (0)** based on the geometry and shape of its cell nuclei.

The features of the dataset include:

* ID number
* Diagnosis (M = malignant, B = benign) 

Ten real-valued features are computed for each cell nucleus:

* radius (mean of distances from center to points on the perimeter) 
* texture (standard deviation of gray-scale values) 
* perimeter 
* area 
* smoothness (local variation in radius lengths) 
* compactness (perimeter^2 / area - 1.0) 
* concavity (severity of concave portions of the contour) 
* concave points (number of concave portions of the contour) 
* symmetry 
* fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, giving 30 features in total. For instance, field 3 is Mean Radius, field 13 is Radius SE, and field 23 is Worst Radius.

Class distribution: 357 benign, 212 malignant
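
The figures above can be cross-checked without the CSV. The sketch below assumes that scikit-learn's bundled copy of this dataset (`load_breast_cancer`) matches the file used in this notebook. The bundled copy omits the ID and diagnosis columns, so Mean Radius, Radius SE, and Worst Radius sit at indices 0, 10, and 20 rather than in fields 3, 13, and 23.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy stands in for Breast_Cancer.csv
data = load_breast_cancer()
n_samples, n_features = data.data.shape

# In the bundled copy, target 0 = malignant and target 1 = benign
counts = np.bincount(data.target)
class_counts = {str(name): int(n) for name, n in zip(data.target_names, counts)}

print("instances:", n_samples, "features:", n_features)
print("class distribution:", class_counts)

# One mean, one standard-error, and one "worst" column for the radius feature
radius_columns = [str(data.feature_names[i]) for i in (0, 10, 20)]
print("radius columns:", radius_columns)
```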



In [None]:
!wget https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Breast_Cancer.csv

### Importing Required Packages

In [None]:
import pandas as pd

In [None]:
# Load the dataset and preview the first five rows
breast_cancer = pd.read_csv("Breast_Cancer.csv")
breast_cancer.head()

In [None]:
breast_cancer.shape

In [None]:
# Encode the diagnosis labels: malignant (M) -> 0, benign (B) -> 1
breast_cancer['diagnosis'] = breast_cancer['diagnosis'].map({'M': 0, 'B': 1})
# or
# breast_cancer['diagnosis'] = breast_cancer['diagnosis'].replace(['M', 'B'], [0, 1])

In [None]:
labels = breast_cancer.diagnosis
# Drop the label column and 'id', which carries no predictive information
features = breast_cancer.drop(['diagnosis', 'id'], axis=1)

In [None]:
print(features.head())

In [None]:
labels

### Splitting the data into train and test sets 

Hint: [Train-Test split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Perform the train-test split on the given data, holding out 30% for testing
# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3, random_state=42)
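
Because the classes are imbalanced (357 benign vs. 212 malignant), a purely random split can leave the train and test sets with slightly different class ratios. A minimal sketch of a stratified split follows. It assumes scikit-learn's bundled copy of the dataset in place of the CSV, and the variable names and `random_state=0` are illustrative choices, not part of the original exercise.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: the bundled dataset stands in for `features` and `labels`
X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the benign/malignant ratio (almost) equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print("train size:", len(X_tr), "test size:", len(X_te))
print("benign fraction in train:", round(float(y_tr.mean()), 3))
print("benign fraction in test: ", round(float(y_te.mean()), 3))
```

With the notebook's own variables, the same idea would be passing `stratify=labels` to `train_test_split`.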

### Training an SVM Classifier

Hint: [SVM](https://scikit-learn.org/0.16/modules/generated/sklearn.svm.SVC.html)

In [None]:
from sklearn.svm import SVC

In [None]:
# Create an SVM classifier with a linear kernel
clf = SVC(kernel='linear')

# Fitting the model 
clf.fit(X_train, y_train)

# Predicting on the test dataset
y_pred = clf.predict(X_test)
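
SVMs are sensitive to feature scale, and this dataset mixes features with very different ranges (area is in the hundreds or thousands, smoothness is well below 1). One common option, shown here as a sketch rather than as a required part of this exercise, is to standardize the features inside a pipeline so the scaler is fitted on the training data only. The bundled scikit-learn copy of the dataset and the names `scaled_clf` and `scaled_acc` are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled dataset stands in for Breast_Cancer.csv
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The scaler learns mean and variance from the training split only,
# then the same transformation is applied to the test split at predict time
scaled_clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_clf.fit(X_tr, y_tr)

scaled_acc = scaled_clf.score(X_te, y_te)
print("test accuracy with scaling:", round(scaled_acc, 3))
```

Scaling also tends to make the linear-kernel solver converge much faster on this data.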

In [None]:
from sklearn import metrics
metrics.accuracy_score(y_test, y_pred)
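
Accuracy alone does not show how many malignant tumors were missed, which is the costly error in this setting. The sketch below computes a confusion matrix and the recall for the malignant class (label 0). It assumes scikit-learn's bundled copy of the dataset, and it uses a scaled linear SVM only to keep the example quick to run. The same metric calls apply to `y_test` and `y_pred` from the cells above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled dataset stands in for Breast_Cancer.csv
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# Rows are true classes, columns are predicted classes, ordered [malignant, benign]
cm = confusion_matrix(y_te, y_hat, labels=[0, 1])

# Recall for malignant: the share of malignant tumors the model actually caught
malignant_recall = recall_score(y_te, y_hat, pos_label=0)

print(cm)
print("malignant recall:", round(float(malignant_recall), 3))
```

The entry `cm[0, 1]` counts malignant tumors predicted as benign, the case an early-diagnosis tool most needs to minimize.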