# Breast cancer detection with MP Neuron

## Data

The data contain many features that describe tumors from real patients. Each patient also has a label that tells us whether the tumor is malignant (cancerous) or benign (not a danger to the patient).
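
To see how that label is encoded, here is a quick check on the scikit-learn copy of the dataset, loaded with the same `load_breast_cancer` function used below. It prints the class names, the number of patients in each class and the size of the feature matrix. In this copy, label 0 is `malignant` and label 1 is `benign`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# target_names[i] is the class name for label i
print(data.target_names.tolist())  # ['malignant', 'benign']

# Number of patients per label: index 0 = malignant, index 1 = benign
counts = np.bincount(data.target)
print(counts)  # [212 357]

# One row per patient, one column per numeric feature
print(data.data.shape)  # (569, 30)
```
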

### References

* W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
* O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.
* W.H. Wolberg, W.N. Street and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters, 77, pages 163-171, 1994.

## Reading Data

```python
from sklearn.datasets import load_breast_cancer
import pandas as pd

breast_cancer = load_breast_cancer()

X = breast_cancer.data      # feature matrix, one row per patient
Y = breast_cancer.target    # labels: 0 = malignant, 1 = benign

df = pd.DataFrame(X, columns=breast_cancer.feature_names)
df
```


| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.30010 | 0.14710 | 0.2419 | 0.07871 | ... | 25.380 | 17.33 | 184.60 | 2019.0 | 0.16220 | 0.66560 | 0.7119 | 0.2654 | 0.4601 | 0.11890 |
| 1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.08690 | 0.07017 | 0.1812 | 0.05667 | ... | 24.990 | 23.41 | 158.80 | 1956.0 | 0.12380 | 0.18660 | 0.2416 | 0.1860 | 0.2750 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.19740 | 0.12790 | 0.2069 | 0.05999 | ... | 23.570 | 25.53 | 152.50 | 1709.0 | 0.14440 | 0.42450 | 0.4504 | 0.2430 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.24140 | 0.10520 | 0.2597 | 0.09744 | ... | 14.910 | 26.50 | 98.87 | 567.7 | 0.20980 | 0.86630 | 0.6869 | 0.2575 | 0.6638 | 0.17300 |
| 4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.19800 | 0.10430 | 0.1809 | 0.05883 | ... | 22.540 | 16.67 | 152.20 | 1575.0 | 0.13740 | 0.20500 | 0.4000 | 0.1625 | 0.2364 | 0.07678 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 564 | 21.56 | 22.39 | 142.00 | 1479.0 | 0.11100 | 0.11590 | 0.24390 | 0.13890 | 0.1726 | 0.05623 | ... | 25.450 | 26.40 | 166.10 | 2027.0 | 0.14100 | 0.21130 | 0.4107 | 0.2216 | 0.2060 | 0.07115 |
| 565 | 20.13 | 28.25 | 131.20 | 1261.0 | 0.09780 | 0.10340 | 0.14400 | 0.09791 | 0.1752 | 0.05533 | ... | 23.690 | 38.25 | 155.00 | 1731.0 | 0.11660 | 0.19220 | 0.3215 | 0.1628 | 0.2572 | 0.06637 |
| 566 | 16.60 | 28.08 | 108.30 | 858.1 | 0.08455 | 0.10230 | 0.09251 | 0.05302 | 0.1590 | 0.05648 | ... | 18.980 | 34.12 | 126.70 | 1124.0 | 0.11390 | 0.30940 | 0.3403 | 0.1418 | 0.2218 | 0.07820 |
| 567 | 20.60 | 29.33 | 140.10 | 1265.0 | 0.11780 | 0.27700 | 0.35140 | 0.15200 | 0.2397 | 0.07016 | ... | 25.740 | 39.42 | 184.60 | 1821.0 | 0.16500 | 0.86810 | 0.9387 | 0.2650 | 0.4087 | 0.12400 |


## Splitting data

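```python
from sklearn.model_selection import train_test_split

# Hold out 25% of the patients (the default test size) for testing;
# stratify=Y keeps the malignant/benign ratio the same in both parts
X_train, X_test, Y_train, Y_test = train_test_split(df, Y, stratify=Y)

print("Length of training data:", len(X_train))
print("Length of test data:", len(X_test))
```

```
Length of training data: 426
Length of test data: 143
```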
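
Because of `stratify=Y`, both parts should keep roughly the class ratio of the full dataset. A quick way to confirm it is sketched below. The fixed `random_state=0` is an assumption added only so the check is reproducible; the split above is unseeded, so its exact rows differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, Y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, stratify=Y, random_state=0)

# Fraction of benign (label 1) patients in each subset
for name, labels in [("full", Y), ("train", Y_train), ("test", Y_test)]:
    print(f"{name:>5}: n={len(labels):3d}, benign fraction={labels.mean():.3f}")
```

The three fractions should agree to about two decimal places; without `stratify`, a small test set can drift further from the overall ratio.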


## MP Neuron implementation

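```python
import numpy as np
from sklearn.metrics import accuracy_score

class MPNeuron:
    """McCulloch-Pitts neuron: fires (True) when the sum of its binary inputs reaches a threshold."""

    def __init__(self):
        self.threshold = None

    def model(self, x):
        # Aggregate the binary inputs and compare the sum with the threshold
        return sum(x) >= self.threshold

    def predict(self, X):
        # Apply the neuron to every row (patient) of X
        Y = []
        for x in X:
            Y.append(self.model(x))
        return np.array(Y)

    def fit(self, X, Y):
        # Brute-force search: try every threshold from 0 to the number of inputs
        accuracy = {}
        for th in range(X.shape[1] + 1):
            self.threshold = th
            Y_predict = self.predict(X)
            accuracy[th] = accuracy_score(Y, Y_predict)  # arguments: (y_true, y_pred)
        # Keep the threshold with the highest training accuracy
        self.threshold = max(accuracy, key=accuracy.get)
```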
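
The class above implements the McCulloch-Pitts rule: with binary inputs x1, ..., xn and an integer threshold b, the output is 1 when x1 + ... + xn >= b and 0 otherwise, and `fit` tries every b from 0 to n and keeps the most accurate one. Since the inputs form a 0/1 matrix, the same rule can be written without Python loops. The sketch below is an alternative, vectorized formulation; `mp_predict` and `mp_fit` are hypothetical helper names, and the toy data are made up for illustration.

```python
import numpy as np

def mp_predict(X, threshold):
    """Vectorized MP neuron: 1 where the row sum of binary inputs reaches the threshold."""
    X = np.asarray(X)
    return (X.sum(axis=1) >= threshold).astype(int)

def mp_fit(X, y):
    """Try every threshold 0..n_features and return the most accurate one.

    Ties go to the smallest threshold, the same tie-break as max() over the
    insertion-ordered dict in MPNeuron.fit. Also returns the accuracy table.
    """
    X, y = np.asarray(X), np.asarray(y)
    accuracies = {b: float(np.mean(mp_predict(X, b) == y)) for b in range(X.shape[1] + 1)}
    best = max(accuracies, key=accuracies.get)
    return best, accuracies

# Hypothetical toy data: 4 samples, 3 binary features
X_toy = np.array([[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0],
                  [0, 0, 0]])
y_toy = np.array([1, 1, 0, 0])

best, acc = mp_fit(X_toy, y_toy)
print(best)                     # 2
print(acc)                      # {0: 0.5, 1: 0.75, 2: 1.0, 3: 0.75}
print(mp_predict(X_toy, best))  # [1 1 0 0]
```

The row sums here are 3, 2, 1 and 0, so only b = 2 separates the two positive samples from the two negative ones.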

## Binarizing the data

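```python
# Split each feature into two equal-width bins between its min and max:
# the lower half of the range becomes 1 and the upper half becomes 0
X_train_normalized = X_train.apply(pd.cut, bins=2, labels=[1, 0])
X_test_normalized = X_test.apply(pd.cut, bins=2, labels=[1, 0])

X_train_normalized
```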

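| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 124 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 359 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 465 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| 271 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 270 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | ... | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| 424 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 173 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 286 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |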
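
`pd.cut(..., bins=2, labels=[1, 0])` maps the lower half of each column's value range to 1 and the upper half to 0, so small measurements, which in this dataset tend to go with benign tumors (label 1), push the neuron's sum up. One caveat: the cell above computes the cut points separately on `X_test`, so the same measurement can land in different bins in the training and test sets. The sketch below bins the test set with the cut points learned on the training set instead; `binarize_with_train_edges` is a hypothetical helper, and clipping unseen extremes into the outer bins is our own design choice.

```python
import numpy as np
import pandas as pd

def binarize_with_train_edges(train, test):
    """Binarize every column into 2 equal-width bins learned on `train` only.

    Lower bin -> 1, upper bin -> 0, matching labels=[1, 0] in the notebook.
    Test values outside the training range fall into the nearest outer bin.
    """
    train_bin, test_bin = {}, {}
    for col in train.columns:
        # retbins=True also returns the bin edges computed from the training column
        train_bin[col], edges = pd.cut(train[col], bins=2, labels=[1, 0], retbins=True)
        # Open the outer edges so unseen extremes in the test set still get a bin
        edges = edges.copy()
        edges[0], edges[-1] = -np.inf, np.inf
        test_bin[col] = pd.cut(test[col], bins=edges, labels=[1, 0])
    return pd.DataFrame(train_bin).astype(int), pd.DataFrame(test_bin).astype(int)

# Hypothetical one-column example: the training range 0..10 gives a cut point at 5
train = pd.DataFrame({"a": [0.0, 1.0, 2.0, 10.0]})
test = pd.DataFrame({"a": [-5.0, 4.0, 6.0, 50.0]})
tr, te = binarize_with_train_edges(train, test)
print(tr["a"].tolist())  # [1, 1, 1, 0]
print(te["a"].tolist())  # [1, 1, 0, 0]
```

With this helper, `X_train_normalized, X_test_normalized = binarize_with_train_edges(X_train, X_test)` would replace the two `apply` calls; the resulting threshold and accuracy may then differ from the numbers reported below.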


## Let's train the model

```python
mp_neuron = MPNeuron()

# The neuron works on a plain NumPy array of 0/1 values
mp_neuron.fit(X_train_normalized.to_numpy(), Y_train)
```


### The optimal threshold

```python
mp_neuron.threshold
```

```
27
```

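**mp_neuron.threshold** is the threshold that gave the highest accuracy on the training labels. With 30 binarized features, the neuron outputs `True` (label 1, benign) when at least 27 of the 30 features fall in the lower bin, and `False` (label 0, malignant) otherwise.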
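
`fit` keeps only the winning threshold and discards the accuracy of the others. To see how sharp the optimum is, the same search can be re-run while keeping the whole table. The sketch below repeats the notebook's pipeline with a fixed `random_state=0`, an assumption for reproducibility, so its best threshold and accuracies need not match the unseeded run above.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
Y = data.target

# random_state=0 is an assumption for reproducibility; the notebook's split is unseeded
X_train, X_test, Y_train, Y_test = train_test_split(df, Y, stratify=Y, random_state=0)

# Same binarization as the notebook: lower half of each column -> 1, upper half -> 0
X_bin = X_train.apply(pd.cut, bins=2, labels=[1, 0]).astype(int).to_numpy()

# Training accuracy for every threshold b = 0..30, i.e. the search MPNeuron.fit runs
row_sums = X_bin.sum(axis=1)
accuracy = {b: float(np.mean((row_sums >= b) == Y_train)) for b in range(X_bin.shape[1] + 1)}

best = max(accuracy, key=accuracy.get)
# Print the accuracy for thresholds around the chosen one
for b in range(max(best - 3, 0), min(best + 3, X_bin.shape[1]) + 1):
    print(b, round(accuracy[b], 3))
```

Threshold 0 predicts every patient as benign, so its accuracy equals the benign fraction of the training set; any useful threshold has to beat that baseline.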

## Let's test the model

```python
y_predict = mp_neuron.predict(X_test_normalized.to_numpy())
y_predict
```

```
array([ True, False,  True,  True, False,  True,  True, False,  True,
        True, False,  True,  True,  True,  True,  True, False,  True,
        True,  True,  True,  True, False,  True,  True, False, False,
       False, False,  True,  True,  True,  True,  True,  True,  True,
        True, False,  True,  True,  True,  True, False,  True,  True,
        True, False, False,  True,  True,  True, False,  True,  True,
       False,  True,  True,  True,  True,  True, False, False, False,
       False, False,  True,  True,  True,  True, False,  True,  True,
       False, False,  True,  True, False,  True, False, False,  True,
        True,  True,  True, False,  True, False, False,  True, False,
        True,  True,  True, False,  True,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True, False,  True,  True,
        True, False,  True, False,  True,  True,  True, False, False,
       False, False,  True,  True, False,  True, False,  True,  True,
        True, False,
```

Each entry of **y_predict** is the prediction for one patient in the test set: `True` means the neuron predicts label 1 (benign) and `False` means label 0 (malignant).
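
To read the booleans as diagnoses, cast them to integers and use them to index the dataset's class names (`True` becomes 1, which is `benign`). The predictions below are a short hypothetical array, not the notebook's output.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

target_names = load_breast_cancer().target_names  # ['malignant', 'benign']
y_predict = np.array([True, False, True, True])    # hypothetical predictions
diagnoses = target_names[y_predict.astype(int)]    # True -> 1 -> 'benign'
print(diagnoses.tolist())  # ['benign', 'malignant', 'benign', 'benign']
```
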

## The result


```python
accuracy_score(Y_test, y_predict)
```

```
0.8321678321678322
```

**accuracy_score** returns the fraction of test patients whose predicted label matches the true label. A value of about 0.832 means the MP neuron classifies 119 of the 143 held-out patients correctly, roughly 83%.
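
Accuracy alone does not say which kind of mistake the neuron makes, and in a diagnosis task calling a malignant tumor benign is arguably the costlier error. The sketch below, on small hypothetical labels, shows that `accuracy_score` is just the mean of the matches and how `confusion_matrix` splits the errors by class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels and predictions (0 = malignant, 1 = benign)
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])

# accuracy_score is the fraction of matching entries
print(accuracy_score(y_true, y_pred))  # 0.6
print(np.mean(y_true == y_pred))       # 0.6

# Rows = true class, columns = predicted class, in label order [0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1]
#  [1 2]]
```

The top-right cell counts malignant patients predicted as benign; running `confusion_matrix(Y_test, y_predict)` on the notebook's variables would give the same breakdown for the real test set.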