# Introduction


I implemented a simple binary perceptron and trained it to classify tumors as malignant or benign (0 and 1 respectively in the raw data set) from 30 measurements per case. The target was an accuracy of 70% or higher, and sklearn could not be used within the perceptron algorithm itself.

# Procedure

There are two phases for this project:
1. getting and loading the dataset
2. implementing, training and testing the perceptron.


In [2]:
import numpy as np
import sklearn
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import random

I imported the data and split it into training and testing sets. The following lines load the UCI ML Breast Cancer Wisconsin (Diagnostic) Data Set, which contains 569 tumor cases, each represented by 30 measurements. The measurements are standardized and the labels are remapped from {0, 1} to {-1, +1}. The first 511 cases are used for training and the remaining 58 for testing.

In [3]:
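# load the data set (img holds the 30 measurements per case; labels are 0 = malignant, 1 = benign)
img,label=sklearn.datasets.load_breast_cancer(return_X_y=True)
# standardize each measurement to zero mean and unit variance
# (note: the scaler is fit on all 569 cases, before the split)
scaler = StandardScaler()
img = scaler.fit_transform(img)
# map labels {0, 1} to {-1, +1} to match the perceptron's outputs
label = 2*label-1
# split the data set: first 511 cases for training, remaining 58 for testing
TRAIN_SIZE = 511
train_img,test_img = img[:TRAIN_SIZE], img[TRAIN_SIZE:]
train_label,test_label = label[:TRAIN_SIZE], label[TRAIN_SIZE:]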
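
The cell above fits the scaler on every case before splitting, so the test rows contribute to the scaling statistics. A minimal sketch of the alternative, computing the mean and standard deviation from the training rows only, is shown here on made-up toy data. All `toy_*` names and values are hypothetical and are not part of the notebook; the arithmetic mirrors what `StandardScaler` does by default (per-column mean and population standard deviation).

```python
import numpy as np

# Hypothetical toy data: 5 cases, 2 measurements each (not from the real data set).
toy_features = np.array([[1.0, 10.0],
                         [2.0, 20.0],
                         [3.0, 30.0],
                         [4.0, 40.0],
                         [5.0, 50.0]])
toy_labels = np.array([0, 1, 1, 0, 1])
toy_train_size = 4

# Split first, so the held-out row cannot influence the scaling statistics.
toy_train, toy_test = toy_features[:toy_train_size], toy_features[toy_train_size:]

# Per-column mean and population standard deviation of the training rows only.
train_mean = toy_train.mean(axis=0)
train_std = toy_train.std(axis=0)

# Apply the same shift and scale to both parts.
toy_train_scaled = (toy_train - train_mean) / train_std
toy_test_scaled = (toy_test - train_mean) / train_std

# Map labels {0, 1} to {-1, +1}.
toy_signed = 2 * toy_labels - 1
print(toy_signed)
```

Whether this changes the measured accuracy on the real data set is not something this sketch shows; it only illustrates the order of operations.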


# Phase Two: Perceptron Model

In [4]:
# Perceptron Class
class Perceptron(object):
    # Initialize the perceptron
    def __init__(self, dim_input = 30, dim_out = 2, learning_rate = 1):
        # model parameters (dim_out is not used by this binary perceptron)
        self.w = np.zeros(dim_input)
        self.bias = 0.0
        
        # learning rate
        self.learning_rate = learning_rate
    
    
    def predict(self,input_array):
        # threshold the linear combination w.x + bias at zero
        if np.dot(self.w, input_array) + self.bias > 0:
            return 1
        else:
            return -1
                    
            
    def train(self, training_inputs, labels):
        # one pass (epoch) over the training examples, in order
        for i in range(len(labels)):
            x = training_inputs[i] # named x to avoid shadowing the built-in input()
            label = labels[i]
            prediction = self.predict(x)
            if prediction != label:
                # mistake-driven update: shift the boundary toward the misclassified example
                self.w += self.learning_rate * label * x
                self.bias += self.learning_rate * label
        

    
    def test(self, testing_inputs, labels):
        # number of correct predictions
        count_correct = 0
        # a list of the predicted labels the same order as the input 
        pred_list = []
        for test_array, label in zip(testing_inputs,labels):
            prediction = self.predict(test_array)
            if prediction == label:
                count_correct += 1
            pred_list.append(prediction)
        accuracy = float(count_correct)/len(labels)
        print('Accuracy is '+str(accuracy))
        return np.asarray(pred_list)
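
The update rule in `train` can be traced by hand on a tiny data set. The following is a minimal sketch, assuming a hypothetical helper `perceptron_epoch` and four made-up, linearly separable points; none of these names or values come from the notebook. It applies the same rule as the class: on a wrong prediction, add `learning_rate * label * x` to the weights and `learning_rate * label` to the bias.

```python
import numpy as np

def perceptron_epoch(weights, bias, inputs, labels, learning_rate=1.0):
    """Run one pass of the mistake-driven update.

    Returns the updated weights, the updated bias, and the number of mistakes.
    """
    mistakes = 0
    for x, y in zip(inputs, labels):
        prediction = 1 if np.dot(weights, x) + bias > 0 else -1
        if prediction != y:
            weights = weights + learning_rate * y * x
            bias = bias + learning_rate * y
            mistakes += 1
    return weights, bias, mistakes

# Hypothetical, linearly separable toy points with labels in {-1, +1}.
toy_inputs = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
toy_labels = np.array([1, 1, -1, -1])

weights, bias = np.zeros(2), 0.0
for epoch in range(10):
    weights, bias, mistakes = perceptron_epoch(weights, bias, toy_inputs, toy_labels)
    if mistakes == 0:
        break  # a full pass without mistakes: the toy points are separated

print(weights, bias, epoch)
```

With zero initial weights the first point scores exactly 0, which the `> 0` test treats as -1, so the very first example triggers an update. On this toy set that single update is enough, and the second pass makes no mistakes.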

In [5]:
# Number of epochs (iterations over the training set)
NUM_EPOCH = 6

In [6]:
perceptron = Perceptron(learning_rate=0.5)
for ii in range(NUM_EPOCH):
    perceptron.train(train_img, train_label)
print(str(NUM_EPOCH)+' epochs')
pred_array = perceptron.test(test_img, test_label)

6 epochs
Accuracy is 0.9655172413793104
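
A single accuracy figure does not show which class the errors fall in. Here is a minimal sketch of tallying predictions against true labels in {-1, +1}, assuming a hypothetical helper `confusion_counts` and made-up label arrays; the numbers below are not the notebook's results.

```python
import numpy as np

def confusion_counts(true_labels, predicted_labels):
    """Return a 2x2 array of counts.

    Rows are the true class (-1, then +1); columns are the predicted class
    (-1, then +1).
    """
    classes = (-1, 1)
    counts = np.zeros((2, 2), dtype=int)
    for row, true_class in enumerate(classes):
        for col, pred_class in enumerate(classes):
            counts[row, col] = np.sum(
                (true_labels == true_class) & (predicted_labels == pred_class)
            )
    return counts

# Hypothetical labels and predictions, for illustration only.
toy_true = np.array([1, 1, -1, -1, 1, -1])
toy_pred = np.array([1, -1, -1, -1, 1, 1])

counts = confusion_counts(toy_true, toy_pred)
# The diagonal holds the correct predictions, so accuracy is trace / total.
accuracy = np.trace(counts) / counts.sum()
print(counts)
print(accuracy)
```

The same function could be called with `test_label` and `pred_array` from the cells above.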
