# Breast Cancer Detection using Machine Learning

Author: Venkat Siddish Gudla

Problem Statement: Build a classifier on the Breast Cancer Wisconsin (Diagnostic) dataset provided by Scikit-Learn and use the Naive Bayes algorithm to predict whether a tumor is malignant or benign.

In [1]:
# importing necessary libraries and the dataset from Scikit-Learn

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

In [2]:
# loading dataset
# creating necessary variables and assigning the attributes of the dataset to them

data = load_breast_cancer()
label_names = data["target_names"]
labels = data["target"]
feature_names = data["feature_names"]
features = data["data"]

In [3]:
# checking the attributes

print(label_names)
print("Class label :", labels[0])
print(feature_names)
print(features[0], "\n")

['malignant' 'benign']
Class label : 0
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
 6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
 1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
 4.601e-01 1.189e-01] 
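Before splitting the data, it can help to check how many samples there are and how they are spread across the two classes. The sketch below uses NumPy's bincount on the label array; the dataset has 569 samples with 30 features each, of which 212 are malignant (label 0) and 357 are benign (label 1), so the classes are somewhat imbalanced.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
features, labels = data["data"], data["target"]

# (number of samples, number of features)
print(features.shape)  # (569, 30)

# Samples per class; label 0 = 'malignant', 1 = 'benign'
counts = np.bincount(labels)
for name, count in zip(data["target_names"], counts):
    print(name, count)
```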



# Splitting the Dataset

To evaluate a classifier fairly, we should always test it on data it has not seen during training.
I divided the data into two parts: an 80% training set and a 20% test set (random_state=42 fixes the shuffle so the split is reproducible):

In [4]:
train, test, train_labels, test_labels = train_test_split(features, labels,
                                                          test_size=0.2,
                                                          random_state=42)
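As a quick sanity check of the split: with 569 samples and test_size=0.2, scikit-learn rounds the test size up, giving 114 test samples and 455 training samples. The second call is only an illustration of the optional stratify parameter, which this notebook does not use; it asks train_test_split to keep the malignant/benign ratio of the test set close to that of the full dataset, which can matter when classes are imbalanced.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
features, labels = data["data"], data["target"]

# Same 80/20 split as in the notebook
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.2, random_state=42)
print(len(train), len(test))  # 455 114

# Optional variant (not used in this notebook): stratified split
_, s_test, _, s_test_labels = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels)

# Fraction of malignant samples in the full data vs. the stratified test set
overall_malignant = np.mean(labels == 0)
stratified_malignant = np.mean(s_test_labels == 0)
print(round(overall_malignant, 3), round(stratified_malignant, 3))
```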

# Using Naive Bayes for Breast Cancer Detection

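I used the Naive Bayes classifier, a simple probabilistic model that often performs well on binary classification tasks. Because all 30 features are continuous measurements, I chose the Gaussian variant (GaussianNB), which assumes that each feature is normally distributed within each class and that the features are independent of one another given the class: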

In [5]:
nb = GaussianNB()
nb.fit(train, train_labels)

GaussianNB()
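To make the Gaussian Naive Bayes idea concrete, here is a minimal from-scratch sketch, assuming NumPy is available. The helper names fit_gaussian_nb and predict_gaussian_nb are hypothetical and not part of Scikit-Learn. Training estimates a prior and a per-feature mean and variance for each class; prediction picks the class with the largest log prior plus summed per-feature Gaussian log-densities. The small variance term mirrors GaussianNB's default var_smoothing=1e-9, so the predictions are expected to agree with Scikit-Learn's on this split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Hypothetical helper: estimate class priors and per-class means/variances."""
    classes = np.unique(y)
    # Small constant added to every variance for numerical stability,
    # scaled by the largest feature variance (as in GaussianNB)
    epsilon = var_smoothing * np.var(X, axis=0).max()
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + epsilon
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, variances, priors


def predict_gaussian_nb(model, X):
    """Hypothetical helper: pick the class with the highest log posterior score."""
    classes, means, variances, priors = model
    scores = []
    for mu, var, prior in zip(means, variances, priors):
        # Summing per-feature log densities is the "naive" independence assumption
        log_lik = (-0.5 * np.sum(np.log(2 * np.pi * var))
                   - 0.5 * np.sum((X - mu) ** 2 / var, axis=1))
        scores.append(np.log(prior) + log_lik)
    return classes[np.argmax(np.column_stack(scores), axis=1)]


data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data["data"], data["target"], test_size=0.2, random_state=42)

model = fit_gaussian_nb(train, train_labels)
manual_output = predict_gaussian_nb(model, test)

# Fraction of test instances where the sketch agrees with Scikit-Learn
nb = GaussianNB().fit(train, train_labels)
agreement = np.mean(manual_output == nb.predict(test))
print(agreement)
```

Working in log space avoids multiplying 30 small densities together, which could underflow to zero.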

After training, we can use the model to make predictions on the test set with the predict() function.

The predict() function returns an array containing one predicted class label (0 = malignant, 1 = benign) for each instance in the test set. We can print these predictions to get a sense of what the model determined:

In [6]:
output = nb.predict(test)
print(output, "\n")

[1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0
 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0
 1 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0
 1 1 0] 
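predict() only reports the winning class. GaussianNB also provides predict_proba(), which returns the estimated posterior probability of each class, with columns ordered as in nb.classes_. The sketch below shows the first few rows and checks that predict() agrees with taking the most probable class; looking at these probabilities can reveal test cases where the model was uncertain.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data["data"], data["target"], test_size=0.2, random_state=42)
nb = GaussianNB().fit(train, train_labels)

output = nb.predict(test)
# One row per test instance, one column per class in nb.classes_ (0, 1)
proba = nb.predict_proba(test)
print(np.round(proba[:5], 3))

# predict() returns the class with the highest posterior probability
assert np.array_equal(nb.classes_[np.argmax(proba, axis=1)], output)
# Each row is a probability distribution over the two classes
assert np.allclose(proba.sum(axis=1), 1.0)
```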



Using the array of true class labels, we can assess the accuracy of our model's predictions by comparing the two arrays (test_labels vs output).

I used the accuracy_score() function provided by Scikit-Learn to compute the fraction of test instances that the classifier labeled correctly:

In [7]:
print(accuracy_score(test_labels, output))

0.9736842105263158
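The score of about 0.974 corresponds to 111 of the 114 test samples being classified correctly. The sketch below recomputes accuracy by hand as the mean of element-wise matches, then uses Scikit-Learn's confusion_matrix and classification_report to show which kinds of mistakes were made. This is worth checking in a diagnostic setting, because predicting a malignant tumor as benign is usually considered more costly than the reverse, and accuracy alone does not distinguish the two.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data["data"], data["target"], test_size=0.2, random_state=42)
output = GaussianNB().fit(train, train_labels).predict(test)

# Accuracy is the fraction of positions where the two arrays agree
manual_accuracy = np.mean(output == test_labels)
print(manual_accuracy, accuracy_score(test_labels, output))

# Rows = true class, columns = predicted class, in label order (0 = malignant, 1 = benign)
cm = confusion_matrix(test_labels, output)
print(cm)
# cm[0, 1] counts malignant tumors that were predicted as benign

# Per-class precision, recall and F1 score
print(classification_report(test_labels, output, target_names=data["target_names"]))
```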
