## Breast cancer classification using the sklearn Gaussian Naive Bayes classifier

In this example, I use the [sklearn breast cancer dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer) and the [Gaussian Naive Bayes](https://scikit-learn.org/stable/modules/naive_bayes.html) classifier to predict whether a tumor is malignant or benign.


**Import relevant libraries**

In [1]:
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

**Load dataset**

In [2]:
data = load_breast_cancer()

**Organize the data from the dataset dictionary**

In [3]:
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

**Look at a portion of the data**

In [4]:
print(label_names)
print(labels[0])
print(feature_names[0])
print(features[0])

['malignant' 'benign']
0
mean radius
[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
 6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
 1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
 4.601e-01 1.189e-01]
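
A single row says little about the dataset as a whole. As a minimal sketch, the shape of the feature matrix and the number of samples per class can be checked too. The names `n_samples`, `n_features`, and `class_counts` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
features = data['data']
labels = data['target']
label_names = data['target_names']

# Rows are tumors, columns are the measurements taken for each tumor
n_samples, n_features = features.shape

# Count the samples in each class; label 0 is 'malignant', label 1 is 'benign'
class_counts = dict(zip(label_names.tolist(), np.bincount(labels).tolist()))

print(n_samples, n_features)
print(class_counts)
```

The dataset holds 569 samples with 30 features each, and benign cases outnumber malignant ones.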


**Split dataset**

In [5]:
# Hold out 33% of the samples for testing; random_state makes the split reproducible
train, test, train_labels, test_labels = train_test_split(features, labels, test_size=0.33, random_state=42)
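
To confirm what the split produced, the shapes of the resulting arrays can be printed. This is a small sketch that repeats the loading step so it runs on its own; it assumes the same `test_size` and `random_state` as above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()

# Same split as in the notebook: 33% of the rows go to the test set
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42
)

# Each shape is (number of samples, number of features)
print(train.shape, test.shape)
```

With 569 samples in total, this leaves 381 rows for training and 188 for testing.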

**Initialize the classifier**

In [6]:
gnb = GaussianNB()

**Train the model** 

In [7]:
# fit() returns the estimator itself, so model and gnb refer to the same object
model = gnb.fit(train, train_labels)
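
Training a Gaussian Naive Bayes model amounts to estimating a prior probability for each class and a per-class mean and variance for each feature. As a hedged sketch, the fitted priors and means can be read from the `class_prior_` and `theta_` attributes of the estimator. The block repeats the earlier steps so it runs on its own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42
)

gnb = GaussianNB()
model = gnb.fit(train, train_labels)

# fit() returns the estimator itself
print(model is gnb)

# Estimated prior probability of each class (order: malignant, benign)
print(gnb.class_prior_)

# Per-class feature means: one row per class, one column per feature
print(gnb.theta_.shape)
```

Because `model` and `gnb` are the same object, calling `predict` on either one gives the same result.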

**Make predictions**

In [8]:
preds = gnb.predict(test)

In [9]:
print(preds)

[1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0
 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0
 1 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0
 1 1 0 0 0 1 1 1 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 0 1 1 0 1 0 0
 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0
 0 1 1]
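
The printed array contains hard class labels (0 = malignant, 1 = benign). When a degree of confidence is more useful than a hard label, `GaussianNB` also provides `predict_proba`. The following is a minimal sketch that refits the same model so it runs on its own; `proba` is a name introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42
)
gnb = GaussianNB().fit(train, train_labels)

# One row per test sample; column 0 is P(malignant), column 1 is P(benign)
proba = gnb.predict_proba(test)

# Show the estimated class probabilities for the first five test samples
print(proba[:5].round(3))
```

Naive Bayes probabilities tend to sit very close to 0 or 1, so they are better read as a ranking than as calibrated risk estimates.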


**Evaluate the accuracy of the model**

In [10]:
print(accuracy_score(test_labels, preds))

0.9414893617021277


The Gaussian Naive Bayes classifier reaches about 94% accuracy on the held-out test set: it correctly labels 177 of the 188 test tumors as malignant or benign. This suggests that the 30 features in the dataset carry enough information to separate the two classes well.
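
Accuracy is a single number and does not show which kind of mistake the classifier makes. As a hedged sketch, `confusion_matrix` from `sklearn.metrics` breaks the test predictions down by true and predicted class. The block repeats the earlier steps so it runs on its own; `cm` is a name introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42
)
gnb = GaussianNB().fit(train, train_labels)
preds = gnb.predict(test)

# Rows are true classes, columns are predicted classes,
# both in the order 0 = malignant, 1 = benign
cm = confusion_matrix(test_labels, preds, labels=[0, 1])
print(cm)

# Correct predictions sit on the diagonal, so this ratio equals the accuracy
print(cm.trace() / cm.sum())
```

The off-diagonal entry in the first row counts malignant tumors predicted as benign, which is usually the costlier error in this setting.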