# CSC-321: Data Mining and Machine Learning
# Your Name
## Assignment 6: Classification with a neuron

### Part 1: Perceptron classification

The perceptron, as we saw in class, is the simplest form of neural network, consisting of a single neuron. Because it's so simple, it can only be used for two-class classification problems.

The perceptron is inspired by a single neural cell, called a neuron, which accepts input signals via its dendrites. Similarly, the perceptron receives inputs from examples of training data, which we weight and combine in a linear equation called the activation function.

activation = bias + sum(weight(i) * x(i))

You should notice the similarity between this and the linear regression and logistic regression that we've implemented so far.

Once we've computed the activation, we transform it into the output value using a transfer function, such as the step transfer function below:

prediction = 1.0 IF activation >= 0.0, ELSE 0.0
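
As a rough illustration of the two formulas above, the sketch below computes the activation and then applies the step transfer. The function names `activation` and `step_predict` are made up for this example and are not part of the assignment's required interface. The weights and the two feature rows are taken from the contrived data set used later in this assignment, and `weights[0]` is assumed to hold the bias.

```python
def activation(instance, weights):
    """Weighted sum of the inputs plus the bias (assumed to be weights[0])."""
    total = weights[0]
    for x, w in zip(instance, weights[1:]):
        total += w * x
    return total


def step_predict(instance, weights):
    """Step transfer: 1.0 if the activation is non-negative, else 0.0."""
    return 1.0 if activation(instance, weights) >= 0.0 else 0.0


# Weights and feature rows copied from the contrived example in this assignment
weights = [-0.1, 0.20653640140000007, -0.23418117710000003]
class0 = [2.7810836, 2.550537003]    # labelled 0 in the contrived data
class1 = [7.627531214, 2.759262235]  # labelled 1 in the contrived data

for features in (class0, class1):
    print(features, step_predict(features, weights))
```

Note that an activation of exactly 0.0 maps to class 1.0, because the comparison is `>=`.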

In order for this mechanism to work, we have to estimate the weights given in the activation function. Fortunately, we know how to do that using stochastic gradient descent.

Within each epoch, the weights are updated after every training instance, using the equation:

w = w + learning_rate * (expected - predicted) * x

where (expected - predicted) is the prediction error for that instance.
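
To make the update rule concrete, here is a minimal sketch of a single update step. The helper name `update_weights` is hypothetical, and the sketch assumes that `weights[0]` is the bias and that the last value in the row is the expected class. It starts from all-zero weights and uses the first row of the contrived data set with a learning rate of 0.1.

```python
def update_weights(weights, row, learning_rate):
    """Apply one perceptron update for a single training row.

    Assumes weights[0] is the bias and row[-1] is the expected class.
    Returns the new weights and the error for this row.
    """
    features = row[:-1]
    act = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    predicted = 1.0 if act >= 0.0 else 0.0
    error = row[-1] - predicted
    # The bias has no input attached to it, so it moves by learning_rate * error
    new_weights = [weights[0] + learning_rate * error]
    # Every other weight moves in proportion to its own input
    new_weights += [w + learning_rate * error * x
                    for w, x in zip(weights[1:], features)]
    return new_weights, error


row = [2.7810836, 2.550537003, 0]
new_weights, error = update_weights([0.0, 0.0, 0.0], row, 0.1)
print(error, new_weights)
```

With all-zero weights the activation is 0.0, so the prediction is 1.0 while the expected class is 0. The error is therefore -1.0, and every weight moves in the negative direction.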

This is enough information for you to implement the following (which will be closely related to previous assignments):

- a predict function
    - that takes a single instance, and a list of weights, where weights[0] is the bias
- a stochastic gradient descent function
    - that takes training data, learning rate and a number of epochs
    - where the weights are first assigned zero scores, and then iteratively updated based on the formula
        - w(i) = w(i) + learning_rate * (expected - predicted) * x(i)
    - where you also update the bias based on the formula:
        - bias = bias + learning_rate * (expected - predicted)
- a perceptron function
    - that takes training set, test set, learning rate and epochs
    - that learns the weights using SGD
    - then makes predictions over the test set using these weights
    - and returns these predictions as a list

I've given you a contrived data set for both your predict function, and for testing your SGD function. I've included sample output below. 

Then I want you to apply your classifier to the included sonar data set, using the parameters given, and also run a reasonable baseline algorithm for comparison. You should perform 3-fold cross-validation. You can find out more about this data set here: https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)

The extra twist here is that the data in the sonar data set should be converted to floats EXCEPT for the class (in the last position in each instance), that we should convert to an integer that represents...what? Currently, the class is a nominal category, and we should convert it to an integer: 1 for one class and 0 for the other. Also we will not normalize this data. Why not?

In [1]:
# Implement or copy your code here
import statistics
import random
import csv
import math
from collections import Counter


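def load_data(filename):
    # Read every row of the CSV file into a list of lists of strings;
    # the with-block makes sure the file is closed afterwards
    with open(filename, newline='') as csvFile:
        data = [row for row in csv.reader(csvFile)]
    return data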

def column2Float(dataset,column):
    for instance in dataset:
        instance[column] = float(instance[column])
    return dataset

def mean(listOfValues):
    total = 0
    for num in listOfValues:
        total += num
    return total/len(listOfValues)

def zeroRR(train, test):
    trainY = [i[-1] for i in train]
    testY = [i[-1] for i in test]

    trainYMean = mean(trainY)
    predictions = []
    for i in testY:
        predictions.append(trainYMean)
    return predictions


def rmse_eval(actual, predicted):
    error = 0.0
    for i in range(len(actual)):
        error += (predicted[i] - actual[i])**2
    error = error/len(actual)
    error = error**0.5
    return error

def minmax(dataset):
    listMinMax = []
    for column in range(len(dataset[0])):
        columnData = [dataset[i][column] for i in range(len(dataset))]
        listMinMax.append([min(columnData), max(columnData)])
    return listMinMax

def normalize(dataset, minmax):
    for row in range(len(dataset)):
        for column in range(len(dataset[row])):
            dataset[row][column] = (dataset[row][column] - minmax[column][0]) / (minmax[column][1] - minmax[column][0])

def accuracy(actual, predicted):
    counter = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            counter += 1
    return counter*100/len(actual)

def zeroRC(train, test):
    trainY = [i[-1] for i in train]
    count = Counter(trainY)
    dataMode = count.most_common(1)[0][0]
    return [dataMode for i in test]

random.seed(1)

def cross_validation_data(dataset, folds):
    dataCopy = dataset[:]
    foldLen = len(dataCopy)//folds
    crossData = []
    for i in range(folds - 1):
        currFold = []
        for j in range(foldLen):
            currData = random.choice(dataCopy)
            currFold.append(currData)
            dataCopy.pop(dataCopy.index(currData))
        crossData.append(currFold)
    # The last fold takes whatever instances remain
    currFold = []
    for i in range(len(dataCopy)):
        currData = random.choice(dataCopy)
        currFold.append(currData)
        dataCopy.pop(dataCopy.index(currData))
    crossData.append(currFold)
    return crossData

def predictPer(instance, weights):
    activation = weights[0]
    for i in range(len(instance)-1):
        activation += instance[i]*weights[i+1]
    if activation >= 0:
        return 1.0
    else:
        return 0.0


def sgd_per(dataset, learning_rate, epochs):
    weights = [0 for i in range(len(dataset[0]))]
    for e in range(epochs):
        totalError = 0
        for instance in dataset:
            pred = predictPer(instance, weights)
            error = instance[-1] - pred
            totalError += error**2
            weights[0] += learning_rate*error
            for i in range(1,len(weights)):
                weights[i] += learning_rate*error*instance[i-1]
        print('epoch=', e, 'lrate=', learning_rate, 'error=%.3f' %totalError)
    return weights

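def perceptron(train, test, learning_rate, epochs):
    # Learn the weights on the training set; the parameter is named epochs so
    # that the body no longer silently falls back on the global variable
    weights = sgd_per(train, learning_rate, epochs)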
    print(weights)
    predictions = []
    for entry in test:
        prediction = predictPer(entry, weights)
        predictions.append(round(prediction))
    return predictions




def evaluate_algorithm(dataset, algorithm, folds, metric, *args):
    foldedData = cross_validation_data(dataset, folds)
    scores = []
    for i in range(len(foldedData)):
        copyFolded = foldedData[:]
        test_data = copyFolded.pop(i)
        test = [test_data[j][:-1] for j in range(len(test_data))]
        for j in test:
            j.append(None)
        train = []
        for fold in copyFolded:
            train += fold
        predicted = algorithm(train,test, *args)
        actual = [j[-1] for j in test_data]
        result = metric(actual,predicted)
        scores.append(result)

    return scores


# Contrived data
# Predict should work, given the weights below

dataset = [[2.7810836,2.550537003,0],
    [1.465489372,2.362125076,0],
    [3.396561688,4.400293529,0],
    [1.38807019,1.850220317,0],
    [3.06407232,3.005305973,0],
    [7.627531214,2.759262235,1],
    [5.332441248,2.088626775,1],
    [6.922596716,1.77106367,1],
    [8.675418651,-0.242068655,1],
    [7.673756466,3.508563011,1]]

weights = [-0.1, 0.20653640140000007, -0.23418117710000003]

# Using your SGD function with a learning rate of 0.1, and 5 epochs, should give you:
#
#>epoch=0, lrate=0.100, error=2.000
#>epoch=1, lrate=0.100, error=1.000
#>epoch=2, lrate=0.100, error=0.000
#>epoch=3, lrate=0.100, error=0.000
#>epoch=4, lrate=0.100, error=0.000
#
#[-0.1, 0.20653640140000007, -0.23418117710000003]

weightsTest = sgd_per(dataset, 0.1, 5)
print(weightsTest)


# Parameters for learning over real data

filename = 'sonar.all-data.csv'
sonarData = load_data(filename)
print('Number of Instances:', len(sonarData), 'Number of Features:', len(sonarData[0]))
for column in range(len(sonarData[0])-1):
    column2Float(sonarData, column)
    
def rocketMineClass(data):
    for i in data: 
        if i[-1] == 'R':
            i[-1] = 1
        elif i[-1] == 'M':
            i[-1] = 0
            
rocketMineClass(sonarData)
minmaxSonar = minmax(sonarData)
sonarCopy = sonarData[:]


folds = 3
learning_rate = 0.01
epochs = 500

scores = evaluate_algorithm(sonarCopy, perceptron, folds, accuracy, learning_rate, epochs)
zeroRCScores = evaluate_algorithm(sonarCopy, zeroRC, folds, accuracy)
print('Perceptron:', scores)
print('Perceptron Min: %.3f' % min(scores), 'Perceptron Max: %.3f' % max(scores), 'Perceptron Mean: %.3f' % mean(scores))
print('zeroRC: ', zeroRCScores)
print('zeroRC Min: %.3f' % min(zeroRCScores), 'zeroRC Max: %.3f' % max(zeroRCScores), 'zeroRC Mean: %.3f' % mean(zeroRCScores))


epoch= 0 lrate= 0.1 error=2.000
epoch= 1 lrate= 0.1 error=1.000
epoch= 2 lrate= 0.1 error=0.000
epoch= 3 lrate= 0.1 error=0.000
epoch= 4 lrate= 0.1 error=0.000
[-0.1, 0.20653640140000007, -0.23418117710000003]
Number of Instances: 208 Number of Features: 61
epoch= 0 lrate= 0.01 error=68.000
epoch= 1 lrate= 0.01 error=46.000
epoch= 2 lrate= 0.01 error=49.000
epoch= 3 lrate= 0.01 error=47.000
epoch= 4 lrate= 0.01 error=46.000
epoch= 5 lrate= 0.01 error=37.000
epoch= 6 lrate= 0.01 error=50.000
epoch= 7 lrate= 0.01 error=35.000
epoch= 8 lrate= 0.01 error=35.000
epoch= 9 lrate= 0.01 error=45.000
epoch= 10 lrate= 0.01 error=42.000
epoch= 11 lrate= 0.01 error=36.000
epoch= 12 lrate= 0.01 error=49.000
epoch= 13 lrate= 0.01 error=30.000
epoch= 14 lrate= 0.01 error=39.000
epoch= 15 lrate= 0.01 error=44.000
epoch= 16 lrate= 0.01 error=36.000
epoch= 17 lrate= 0.01 error=38.000
epoch= 18 lrate= 0.01 error=40.000
epoch= 19 lrate= 0.01 error=36.000
epoch= 20 lrate= 0.01 error=46.000
epoch= 21 lrate= 

epoch= 231 lrate= 0.01 error=22.000
epoch= 232 lrate= 0.01 error=28.000
epoch= 233 lrate= 0.01 error=20.000
epoch= 234 lrate= 0.01 error=30.000
epoch= 235 lrate= 0.01 error=10.000
epoch= 236 lrate= 0.01 error=20.000
epoch= 237 lrate= 0.01 error=17.000
epoch= 238 lrate= 0.01 error=30.000
epoch= 239 lrate= 0.01 error=28.000
epoch= 240 lrate= 0.01 error=24.000
epoch= 241 lrate= 0.01 error=26.000
epoch= 242 lrate= 0.01 error=22.000
epoch= 243 lrate= 0.01 error=32.000
epoch= 244 lrate= 0.01 error=20.000
epoch= 245 lrate= 0.01 error=16.000
epoch= 246 lrate= 0.01 error=26.000
epoch= 247 lrate= 0.01 error=16.000
epoch= 248 lrate= 0.01 error=21.000
epoch= 249 lrate= 0.01 error=25.000
epoch= 250 lrate= 0.01 error=14.000
epoch= 251 lrate= 0.01 error=28.000
epoch= 252 lrate= 0.01 error=20.000
epoch= 253 lrate= 0.01 error=26.000
epoch= 254 lrate= 0.01 error=16.000
epoch= 255 lrate= 0.01 error=23.000
epoch= 256 lrate= 0.01 error=13.000
epoch= 257 lrate= 0.01 error=26.000
epoch= 258 lrate= 0.01 error

epoch= 490 lrate= 0.01 error=15.000
epoch= 491 lrate= 0.01 error=26.000
epoch= 492 lrate= 0.01 error=22.000
epoch= 493 lrate= 0.01 error=18.000
epoch= 494 lrate= 0.01 error=11.000
epoch= 495 lrate= 0.01 error=19.000
epoch= 496 lrate= 0.01 error=14.000
epoch= 497 lrate= 0.01 error=11.000
epoch= 498 lrate= 0.01 error=15.000
epoch= 499 lrate= 0.01 error=22.000
[0.26000000000000006, -0.11627600000000317, 0.07053999999999476, 0.23072800000000046, -0.17158499999999868, -0.37406599999998996, -0.14514899999999825, 0.5176939999999928, 0.2185260000000172, -0.19204800000000047, 0.12494900000000428, -0.30247300000000704, -0.2553910000000037, 0.06247800000000376, -0.016958000000002096, 0.04577500000000074, 0.08943600000000061, -0.07131500000000031, 0.22348699999999774, -0.17157300000000753, -0.03417199999999979, -0.022840000000000547, -0.15135500000000277, 0.2868529999999965, -0.76068399999999, 0.36363699999999194, 0.08010599999999828, -0.17779600000001158, 0.24165299999999149, -0.04368399999999886

epoch= 189 lrate= 0.01 error=24.000
epoch= 190 lrate= 0.01 error=20.000
epoch= 191 lrate= 0.01 error=26.000
epoch= 192 lrate= 0.01 error=22.000
epoch= 193 lrate= 0.01 error=21.000
epoch= 194 lrate= 0.01 error=17.000
epoch= 195 lrate= 0.01 error=21.000
epoch= 196 lrate= 0.01 error=20.000
epoch= 197 lrate= 0.01 error=16.000
epoch= 198 lrate= 0.01 error=12.000
epoch= 199 lrate= 0.01 error=24.000
epoch= 200 lrate= 0.01 error=21.000
epoch= 201 lrate= 0.01 error=20.000
epoch= 202 lrate= 0.01 error=18.000
epoch= 203 lrate= 0.01 error=14.000
epoch= 204 lrate= 0.01 error=20.000
epoch= 205 lrate= 0.01 error=24.000
epoch= 206 lrate= 0.01 error=21.000
epoch= 207 lrate= 0.01 error=18.000
epoch= 208 lrate= 0.01 error=23.000
epoch= 209 lrate= 0.01 error=20.000
epoch= 210 lrate= 0.01 error=19.000
epoch= 211 lrate= 0.01 error=16.000
epoch= 212 lrate= 0.01 error=24.000
epoch= 213 lrate= 0.01 error=16.000
epoch= 214 lrate= 0.01 error=18.000
epoch= 215 lrate= 0.01 error=26.000
epoch= 216 lrate= 0.01 error

epoch= 448 lrate= 0.01 error=20.000
epoch= 449 lrate= 0.01 error=14.000
epoch= 450 lrate= 0.01 error=18.000
epoch= 451 lrate= 0.01 error=16.000
epoch= 452 lrate= 0.01 error=20.000
epoch= 453 lrate= 0.01 error=17.000
epoch= 454 lrate= 0.01 error=16.000
epoch= 455 lrate= 0.01 error=16.000
epoch= 456 lrate= 0.01 error=14.000
epoch= 457 lrate= 0.01 error=22.000
epoch= 458 lrate= 0.01 error=14.000
epoch= 459 lrate= 0.01 error=14.000
epoch= 460 lrate= 0.01 error=22.000
epoch= 461 lrate= 0.01 error=17.000
epoch= 462 lrate= 0.01 error=14.000
epoch= 463 lrate= 0.01 error=16.000
epoch= 464 lrate= 0.01 error=20.000
epoch= 465 lrate= 0.01 error=18.000
epoch= 466 lrate= 0.01 error=17.000
epoch= 467 lrate= 0.01 error=16.000
epoch= 468 lrate= 0.01 error=18.000
epoch= 469 lrate= 0.01 error=16.000
epoch= 470 lrate= 0.01 error=20.000
epoch= 471 lrate= 0.01 error=18.000
epoch= 472 lrate= 0.01 error=20.000
epoch= 473 lrate= 0.01 error=16.000
epoch= 474 lrate= 0.01 error=16.000
epoch= 475 lrate= 0.01 error

epoch= 154 lrate= 0.01 error=15.000
epoch= 155 lrate= 0.01 error=22.000
epoch= 156 lrate= 0.01 error=29.000
epoch= 157 lrate= 0.01 error=18.000
epoch= 158 lrate= 0.01 error=14.000
epoch= 159 lrate= 0.01 error=16.000
epoch= 160 lrate= 0.01 error=18.000
epoch= 161 lrate= 0.01 error=26.000
epoch= 162 lrate= 0.01 error=25.000
epoch= 163 lrate= 0.01 error=26.000
epoch= 164 lrate= 0.01 error=24.000
epoch= 165 lrate= 0.01 error=15.000
epoch= 166 lrate= 0.01 error=17.000
epoch= 167 lrate= 0.01 error=20.000
epoch= 168 lrate= 0.01 error=17.000
epoch= 169 lrate= 0.01 error=21.000
epoch= 170 lrate= 0.01 error=17.000
epoch= 171 lrate= 0.01 error=18.000
epoch= 172 lrate= 0.01 error=18.000
epoch= 173 lrate= 0.01 error=16.000
epoch= 174 lrate= 0.01 error=15.000
epoch= 175 lrate= 0.01 error=23.000
epoch= 176 lrate= 0.01 error=14.000
epoch= 177 lrate= 0.01 error=25.000
epoch= 178 lrate= 0.01 error=17.000
epoch= 179 lrate= 0.01 error=15.000
epoch= 180 lrate= 0.01 error=20.000
epoch= 181 lrate= 0.01 error

epoch= 388 lrate= 0.01 error=18.000
epoch= 389 lrate= 0.01 error=18.000
epoch= 390 lrate= 0.01 error=15.000
epoch= 391 lrate= 0.01 error=16.000
epoch= 392 lrate= 0.01 error=18.000
epoch= 393 lrate= 0.01 error=17.000
epoch= 394 lrate= 0.01 error=18.000
epoch= 395 lrate= 0.01 error=15.000
epoch= 396 lrate= 0.01 error=16.000
epoch= 397 lrate= 0.01 error=19.000
epoch= 398 lrate= 0.01 error=12.000
epoch= 399 lrate= 0.01 error=12.000
epoch= 400 lrate= 0.01 error=15.000
epoch= 401 lrate= 0.01 error=17.000
epoch= 402 lrate= 0.01 error=17.000
epoch= 403 lrate= 0.01 error=19.000
epoch= 404 lrate= 0.01 error=16.000
epoch= 405 lrate= 0.01 error=21.000
epoch= 406 lrate= 0.01 error=12.000
epoch= 407 lrate= 0.01 error=19.000
epoch= 408 lrate= 0.01 error=13.000
epoch= 409 lrate= 0.01 error=19.000
epoch= 410 lrate= 0.01 error=16.000
epoch= 411 lrate= 0.01 error=16.000
epoch= 412 lrate= 0.01 error=14.000
epoch= 413 lrate= 0.01 error=18.000
epoch= 414 lrate= 0.01 error=11.000
epoch= 415 lrate= 0.01 error

We do not need to normalize the sonar data because every feature is already between 0 and 1. The perceptron's weights suggest that certain angles are more important than others in differentiating a rock from a mine: the more important angles have weights close to 0.5 or -0.5, while the least important have weights around 0.08, and all other angles fall between these two extremes. Overall, the perceptron differentiates rocks from mines with an accuracy of around 73%, which is fairly accurate. However, I am not sure I would trust it with my life.

The zeroRC baseline had a mean accuracy of 46%. I expected it to score above 50%, since the data set contains more mines than rocks. It may be lower because the folds are drawn at random, so the class balance of a training set need not match that of its test fold: if the training folds happen to contain more rocks than mines, zeroRC predicts rock, while the held-out fold then contains mostly mines. To fix this we should stratify the folds, so that each fold reflects the class balance of the overall data set.
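
Stratifying the folds, as suggested above, could look roughly like the sketch below. It was not run as part of this assignment. The function name `stratified_folds` and its `seed` parameter are hypothetical, the toy data is made up rather than read from the sonar file, and the class label is assumed to be the last value of each row.

```python
import random
from collections import defaultdict


def stratified_folds(dataset, folds, seed=1):
    """Split dataset into folds that each keep roughly the overall class balance.

    Assumes the class label is the last value in every row.
    """
    rng = random.Random(seed)
    # Group the rows by class label
    by_class = defaultdict(list)
    for row in dataset:
        by_class[row[-1]].append(row)
    result = [[] for _ in range(folds)]
    for rows in by_class.values():
        rows = rows[:]      # copy so the caller's data is not reordered
        rng.shuffle(rows)
        # Deal the rows of this class out to the folds one at a time
        for i, row in enumerate(rows):
            result[i % folds].append(row)
    return result


# Made-up data: 40 rows of class 1 and 50 rows of class 0
toy = [[i, 1] for i in range(40)] + [[i, 0] for i in range(50)]
toy_folds = stratified_folds(toy, 3)
for fold in toy_folds:
    ones = sum(row[-1] for row in fold)
    print(len(fold), 'rows,', ones, 'of class 1')
```

Because each class is dealt out separately, no fold can end up with a majority class that differs much from the overall majority, which is the situation that hurt zeroRC here.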