<a href="https://colab.research.google.com/github/Shubham04689/colab_notebooks/blob/main/SVM_Linear_Classifer_Breast_Cancer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Learning Objective

By the end of this experiment, you will be able to:

* Train and evaluate a linear Support Vector Machine (SVM) classifier

### Dataset

#### History

Breast cancer (BC) is one of the most common cancers among women worldwide. In the United States, the average lifetime risk of a woman developing breast cancer is currently about 13%, or roughly a 1 in 8 chance. An early diagnosis of BC can greatly improve the prognosis and chance of survival for patients, so accurate identification of malignant tumors is of paramount importance.

#### Description

The Breast Cancer dataset consists of 569 instances. This is a binary classification problem with two classes: each tumor is classified as **benign (1)** or **malignant (0)** based on features describing the size, shape, and texture of its cell nuclei.

Each record includes:

* ID number
* Diagnosis (M = malignant, B = benign)

Ten real-valued features are computed for each cell nucleus:

* radius (mean of distances from center to points on the perimeter)
* texture (standard deviation of gray-scale values)
* perimeter
* area
* smoothness (local variation in radius lengths)
* compactness (perimeter^2 / area - 1.0)
* concavity (severity of concave portions of the contour)
* concave points (number of concave portions of the contour)
* symmetry
* fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest value (mean of the three largest values) of each of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, and field 23 is Worst Radius.
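To make the field numbering concrete, the sketch below rebuilds the 32-field layout (ID, diagnosis, then 10 means, 10 standard errors, and 10 "worst" values) and looks up fields 3, 13, and 23. The generated names (e.g. `"mean radius"`) and the helper `field_name` are hypothetical labels chosen for illustration; they are not necessarily the column headers in `Breast_Cancer.csv`.

```python
# The ten base measurements listed above, in the order given
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
# mean, standard error, and "worst" (largest) value of each measurement
STATS = ["mean", "se", "worst"]

# Field layout: 1 = ID, 2 = diagnosis, then 10 means, 10 SEs, 10 worsts
fields = ["id", "diagnosis"] + [f"{stat} {name}" for stat in STATS for name in BASE_FEATURES]


def field_name(field_number: int) -> str:
    """Return the (illustrative) name of a 1-based field number."""
    return fields[field_number - 1]


print("feature columns:", len(fields) - 2)
print(field_name(3), "|", field_name(13), "|", field_name(23))
```

Because each block of 10 statistics follows the same order, any field `k` between 3 and 32 maps to measurement `(k - 3) % 10` and statistic `(k - 3) // 10`.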

Class distribution: 357 benign, 212 malignant
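The class distribution also fixes a reference point for the accuracy reported later: a trivial model that always predicts the majority class (benign) is already right about 62.7% of the time. The arithmetic below uses only the counts stated above.

```python
n_benign, n_malignant = 357, 212
n_total = n_benign + n_malignant  # 569 instances

# Accuracy of a trivial classifier that always predicts "benign"
baseline_accuracy = n_benign / n_total
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
print(f"Benign share: {n_benign / n_total:.1%}, malignant share: {n_malignant / n_total:.1%}")
```

A classifier is only useful here if it clearly beats this baseline, ideally while also catching most malignant cases.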



### Import required packages and download the data

In [None]:
import pandas as pd

# Download the dataset (the leading "!" runs a shell command in Colab/Jupyter)
!wget https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Breast_Cancer.csv

### Load the data

In [None]:
breast_cancer = pd.read_csv("Breast_Cancer.csv")
breast_cancer.head()
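Before training, it is worth checking the shape, missing values, and class counts of the loaded data. So that the sketch runs without the CSV, it assumes scikit-learn's bundled `load_breast_cancer` data as a stand-in; it appears to be the same Wisconsin Diagnostic dataset and uses the same encoding (0 = malignant, 1 = benign). On the notebook's own DataFrame the equivalent checks would be `breast_cancer.shape`, `breast_cancer.isnull().sum()`, and `breast_cancer['diagnosis'].value_counts()`.

```python
from sklearn.datasets import load_breast_cancer

# Stand-in for Breast_Cancer.csv: scikit-learn's bundled copy of the
# Wisconsin Diagnostic Breast Cancer data (0 = malignant, 1 = benign)
data = load_breast_cancer(as_frame=True)
df = data.frame  # 30 feature columns plus a "target" column

print(df.shape)
print("Missing values:", int(df.isnull().sum().sum()))

# Show class counts with readable names instead of 0/1
names = dict(enumerate(data.target_names))  # {0: 'malignant', 1: 'benign'}
print(df["target"].value_counts().rename(index=names))
```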

In [None]:
# Encode the diagnosis as integers: malignant (M) -> 0, benign (B) -> 1
breast_cancer['diagnosis'] = breast_cancer['diagnosis'].map({'M': 0, 'B': 1})
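One way to encode the labels is `Series.map` with a dictionary. Any value missing from the dictionary becomes `NaN`, so a stray label (lowercase, extra whitespace) is easy to detect, and recent pandas releases may emit a `FutureWarning` about implicit downcasting when `replace` turns strings into integers. The toy Series below is a hypothetical stand-in for `breast_cancer['diagnosis']`.

```python
import pandas as pd

# Toy column standing in for breast_cancer['diagnosis']
diagnosis = pd.Series(["M", "B", "B", "M", "B"])

# Malignant -> 0, benign -> 1, matching the encoding used in this notebook
encoded = diagnosis.map({"M": 0, "B": 1})

# map() yields NaN for unmapped values, so this catches unexpected labels
assert encoded.notna().all(), "unexpected diagnosis label"
print(encoded.tolist())
```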

In [None]:
features = breast_cancer.drop(['diagnosis', 'id'], axis=1)  # drop the target and the non-predictive 'id' column
labels = breast_cancer['diagnosis']

In [None]:
print(features.head())

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
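# Hold out 30% of the data for testing; fixing random_state makes the split
# reproducible, and stratify keeps the class proportions equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=42, stratify=labels
)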
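With roughly 63% benign cases, an unlucky random split can skew the class ratio of the test set. Passing `stratify` keeps the benign/malignant ratio nearly identical in both splits, and `random_state` makes the split reproducible. The sketch checks this on scikit-learn's bundled copy of the dataset, an assumed stand-in for the CSV; the seed value 42 is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Stand-in data: scikit-learn's bundled copy of the breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
# Fraction of benign (label 1) cases in each split vs. the full dataset
print(f"train: {y_train.mean():.3f}, test: {y_test.mean():.3f}, full: {y.mean():.3f}")
```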

In [None]:
from sklearn.svm import SVC

In [None]:
# Create an SVM classifier with a linear kernel
clf = SVC(kernel='linear')

# Fit the model on the training data
clf.fit(X_train, y_train)

# Predict labels for the test set
y_pred = clf.predict(X_test)
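With `kernel='linear'`, the fitted SVM is a hyperplane: it scores a sample as $f(x) = w \cdot x + b$ and predicts benign when the score is positive and malignant otherwise. The sketch below recovers $w$ and $b$ from `coef_` and `intercept_` and confirms they reproduce `decision_function` and `predict`. Two assumptions differ from the cell above: it uses scikit-learn's bundled copy of the dataset, and it standardizes the features first, a common practice for SVMs because the margin depends on feature scale (results on unscaled features will differ).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Standardize features using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

clf = SVC(kernel="linear").fit(X_train_s, y_train)

# Linear decision function f(x) = w . x + b
w, b = clf.coef_[0], clf.intercept_[0]
scores = X_test_s @ w + b

# Positive scores -> classes_[1] (benign), otherwise classes_[0] (malignant)
manual_pred = np.where(scores > 0, clf.classes_[1], clf.classes_[0])

print(np.allclose(scores, clf.decision_function(X_test_s)))
print((manual_pred == clf.predict(X_test_s)).all())
print("support vectors per class:", clf.n_support_)
```

Only the support vectors (the training points on or inside the margin) determine `w`, which is why `n_support_` is typically much smaller than the training set.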

In [None]:
from sklearn import metrics
metrics.accuracy_score(y_test, y_pred)
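A single train/test split yields one accuracy number that depends on which rows landed in the test set. Cross-validation gives a spread across several splits. The sketch below assumes scikit-learn's bundled copy of the dataset and wraps `StandardScaler` and the linear `SVC` in a pipeline so the scaler is refit on each training fold; both choices are additions to this notebook's setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The scaler inside the pipeline is fit on each training fold only,
# so no statistics leak from the held-out fold
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```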

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

# Generate a classification report
print(classification_report(y_test, y_pred))

# Generate a confusion matrix
print(confusion_matrix(y_test, y_pred))

# Compute precision, recall, and F1-score individually. By default these treat
# label 1 (benign) as the positive class; pass pos_label=0 to score malignant.
from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")
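Because benign is encoded as 1, the default `precision_score`, `recall_score`, and `f1_score` describe the benign class. In a screening setting the malignant class is often the one of interest, for example recall on malignant cases. The sketch below derives all three metrics by hand from a 2x2 confusion matrix for either class and cross-checks them against scikit-learn with `pos_label`. The toy labels and the helper `prf` are hypothetical, for illustration only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Toy labels for illustration only (0 = malignant, 1 = benign)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_hat = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 1])

# Rows are true classes, columns are predicted classes, in label order [0, 1]
cm = confusion_matrix(y_true, y_hat, labels=[0, 1])
print(cm)


def prf(cm, positive):
    """Precision, recall, and F1 for class `positive` from a 2x2 confusion matrix."""
    tp = cm[positive, positive]
    fp = cm[:, positive].sum() - tp  # predicted positive, actually the other class
    fn = cm[positive, :].sum() - tp  # actually positive, predicted the other class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


for positive, name in [(1, "benign"), (0, "malignant")]:
    p, r, f = prf(cm, positive)
    print(f"{name:9s} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
    # Cross-check against scikit-learn, choosing the positive class via pos_label
    assert np.isclose(p, precision_score(y_true, y_hat, pos_label=positive))
    assert np.isclose(r, recall_score(y_true, y_hat, pos_label=positive))
    assert np.isclose(f, f1_score(y_true, y_hat, pos_label=positive))
```

On the notebook's real predictions, `recall_score(y_test, y_pred, pos_label=0)` gives the fraction of malignant tumors in the test set that the classifier caught.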