# Introduction


In this homework you will implement a simple binary perceptron and train it to perform simple classification. This task involves classifying tumors as malignant or benign (0 and 1 respectively) according to 30 different measurements. (While unnecessary for completing the project, you can dive into what each feature means here).

# Grading


1. As long as your perceptron classifier (and the other classifiers you will use) achieves an at least 70% accuracy and runs in less than 3 minutes you will get full credit!
2. No monkey business! Do not use sklearn or any off the shelf perceptron classifiers. Also, sending your code to someone else / asking for someone else for his code will get you an academic suspension.
3. We have a solution with less than 12 lines of code that gets over 94% accuracy. Just to give you ad idea that this is not a complicated/long coding assignment. Just make sure to document your code.


# Procedure

There are two phases for this project:
1. getting and loading the dataset
2. implementing, training and testing the perceptron.


# Template is not enfoced
You do not have to follow this template. As far as you can provide the correct code with a reasonable accuracy, you will be fine.

# Phase One: Packages, Data, and Setup


The package sklearn is a popular machine learning library for python. In addition to implementations of many algorithms and tools for  statistical analysis this package also contains many small datasets of anything. Please note you will not use that library for the perceptron algorithm --that is your task to implement. But you can use it for loading data, etc. 


In [32]:
import numpy as np
import sklearn
from sklearn import datasets
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import random

Next we will import the data and split it to training and testing sets. The following lines load the UCI ML Breast Cancer Wisconsin (Diagnostic) Data Set. which contains 569 cases of tumors (each represented by 30 measurements). We split this data to 500 training cases and the rest for testing.

In [33]:
# load the data set
img,label=sklearn.datasets.load_breast_cancer(return_X_y=True)
# split the data set
TRAIN_SIZE = 511
label = 2*label-1
train_img,test_img = img[:TRAIN_SIZE], img[TRAIN_SIZE:]
train_label,test_label = label[:TRAIN_SIZE], label[TRAIN_SIZE:]

# Phase Two: Perceptron Model

In [34]:
# Perceptron Class
class Perceptron(object):
    # Initialize the perceptron
    def __init__(self, dim_input = 30, dim_out = 2, learning_rate = 1):
        # model parameters 
        self.w = np.zeros(dim_input)
        self.bias = 0.0
        
        # learning rate
        self.learning_rate = learning_rate
    
    
    #https://machinelearningmastery.com/perceptron-algorithm-for-classification-in-python/
    #https://cmci.colorado.edu/classes/INFO-4604/files/slides-3_perceptron.pdf
    #https://numpy.org/doc/stable/reference/generated/numpy.dot.html
    #https://www.maxbartolo.com/ml-index-item/dot-scalar-product/#:~:text=In%20Python%2C%20one%20way%20to,comprehension%20performing%20element%2Dwise%20multiplication.&text=Alternatively%2C%20we%20can%20use%20the,dot()%20function.&text=Keeping%20to%20the%20convention%20of,Ty%20x%20T%20y%20.
    #https://en.wikipedia.org/wiki/Perceptron
    #https://medium.com/@thomascountz/19-line-line-by-line-python-perceptron-b6f113b161f3
    #used these for the predict method
    
    def predict(self,input_array):
        # See the "Perceptron learning rule" slides: w * x
        
        #
        # Complete! Complete! Complete!
        # Complete! Complete! Complete!
        #
        
        
        activation = np.dot(self.w, input_array) + self.bias
        if activation > 0:
            return 1
        else:
            return -1
        
        
        
                    
    #https://machinelearningmastery.com/perceptron-algorithm-for-classification-in-python/
    #http://www.phontron.com/slides/nlp-programming-en-05-perceptron.pdf
    #https://medium.com/@thomascountz/19-line-line-by-line-python-perceptron-b6f113b161f3
    
    #used this for weight and bias
    def train(self, training_inputs, labels):
        #
        # Complete! Complete! Complete!
        # Complete! Complete! Complete!
        #
        for j in labels: #going through labels
            for i in training_inputs: #going through training inputs
                predict = self.predict(i) #for predict
                if predict == labels[j]: #if it matches, continue
                    return True
                else: #if it doesnt, progress
                    self.w += self.learning_rate * (labels[j] - predict) * i
                    self.bias += self.learning_rate * (labels[j] - predict)
                    return False
    
    def test(self, testing_inputs, labels):
        # number of correct predictions
        count_correct = 0
        # a list of the predicted labels the same order as the input 
        pred_list = []
        for test_array, label in zip(testing_inputs,labels):
            prediction = self.predict(test_array)
            if prediction == label:
                count_correct += 1
            pred_list.append(prediction)
        accuracy = float(count_correct)/len(test_label)
        print('Accuracy is '+str(accuracy))
        return np.asarray(pred_list)

In [35]:
# Number of epochs (iterations over the training set)
NUM_EPOCH = 7

In [36]:
perceptron = Perceptron(learning_rate=0.5)
for ii in range(NUM_EPOCH):
    perceptron.train(train_img, train_label)
print(str(NUM_EPOCH)+' epochs')
pred_array = perceptron.test(test_img, test_label)

7 epochs
Accuracy is 0.7586206896551724
