<h3><b>Perceptrons</b></h3>

Implement the perceptron algorithm, where

-><b>data</b> is a numpy array of dimension d by n<br>
-><b>labels</b> is a numpy array of dimension 1 by n<br>
-><b>params</b> is a dictionary specifying extra parameters to this algorithm; your algorithm should run T iterations, where T is the value stored under the key 'T'<br>
-><b>hook</b> is either None or a function that takes the tuple (th, th0) as an argument and displays the separator graphically. We won't be testing this in the Tutor, but it will help you in debugging on your own machine.<br>

It should return a tuple of θ (a d by 1 array) and θ<sub>0</sub>  (a 1 by 1 array).

We have given you some data sets in the code file for you to test your implementation.

Your function should initialize all parameters to 0, then run through the data, in the order it is given, performing an update to the parameters whenever the current parameters would make a mistake on that data point. Perform T iterations through the data. After every parameter update, if hook is defined, call it on the current (th, th0), passed as a single argument (a Python tuple).
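The mistake check and update described above can be sketched for a single data point. This is only an illustration under stated assumptions: the helper name `perceptron_step` is hypothetical and not part of the assignment, labels are assumed to be +1 or -1, and a point lying exactly on the separator is treated as a mistake.

```python
import numpy as np

def perceptron_step(x, y, th, th0):
    """One perceptron step on a single point (hypothetical helper).

    x: d by 1 column, y: scalar label in {-1, +1},
    th: d by 1 array, th0: 1 by 1 array.
    Returns (th, th0, updated).
    """
    # A non-positive margin means the current parameters make a mistake
    # (or the point lies exactly on the separator).
    if y * (th.T @ x + th0) <= 0:
        return th + y * x, th0 + y, True
    return th, th0, False

th = np.zeros((2, 1))
th0 = np.zeros((1, 1))
x = np.array([[1.0], [2.0]])

# Starting from all zeros, the margin is 0, so the first step updates.
th, th0, updated = perceptron_step(x, 1, th, th0)
print(th.T, th0, updated)
```

Running the full algorithm amounts to applying this step to each column of the data, in order, T times.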

When debugging on your own, you can use the procedure test_linear_classifier for testing. By default, it pauses after every parameter update to show the separator. For data sets not in 2D, or just to get the answer, set draw = False.

In [1]:
import numpy as np

def perceptron(data, labels, params={}, hook=None):
    # if T not in params, default to 100
    T = params.get('T', 100)

    # Your implementation here
    (d, n) = data.shape
    # Initialize all parameters to zero (float arrays, so updates are not truncated)
    th = np.zeros((d, 1))
    th0 = np.zeros((1, 1))
    for t in range(T):  # exactly T passes through the data
        for i in range(n):
            x = data[:, i:i+1]   # d by 1 column
            y = labels[0, i]     # scalar label
            # Update on a mistake, or on a point exactly on the separator
            if y * (th.T @ x + th0) <= 0:
                th = th + y * x
                th0 = th0 + y
                if hook:
                    hook((th, th0))
    return th,th0

### Implement averaged perceptron

Regular perceptron can be somewhat sensitive to the most recent examples that it sees. Instead, averaged perceptron produces a more stable output by outputting the average value of th and th0 across all iterations.

Implement averaged perceptron with the same spec as regular perceptron, using the pseudocode below as a guide.

In [2]:
def averaged_perceptron(data, labels, params={}, hook=None):
    # if T not in params, default to 100
    T = params.get('T', 100)

    # Your implementation here
    (d, n) = data.shape
    # Current parameters
    th = np.zeros((d, 1))
    th0 = np.zeros((1, 1))
    # Running sums of the parameters over every step
    ths = np.zeros((d, 1))
    th0s = np.zeros((1, 1))
    for t in range(T):
        for i in range(n):
            x = data[:, i:i+1]   # d by 1 column
            y = labels[0, i]     # scalar label
            if y * (th.T @ x + th0) <= 0:
                th = th + y * x
                th0 = th0 + y
                if hook:
                    hook((th, th0))
            # Accumulate after every point, whether or not an update happened
            ths = ths + th
            th0s = th0s + th0

    # Average over all n*T steps
    return ths / (n * T), th0s / (n * T)

### Evaluating a learning algorithm using a data source

Construct a testing procedure that takes a learning algorithm and a data source as input and runs the learning algorithm multiple times, each time evaluating the resulting classifier as above. It should report the overall average classification accuracy.

You can use our implementation of eval_classifier as above.
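Since eval_classifier is not reproduced in this section, here is a minimal sketch of what such a function might look like. The argument order is inferred from how eval_classifier is called in the code cells in this notebook; the body, the helper `score_sketch`, and the `_sketch` suffixes are assumptions for illustration, not the course's implementation.

```python
import numpy as np

def score_sketch(data, labels, th, th0):
    # Number of points whose predicted sign matches the label.
    # data is d by n, labels is 1 by n, th is d by 1, th0 is 1 by 1.
    return np.sum(np.sign(th.T @ data + th0) == labels)

def eval_classifier_sketch(learner, data_train, labels_train, data_test, labels_test):
    # Train on the training set, then report the fraction of
    # test points classified correctly.
    th, th0 = learner(data_train, labels_train)
    return score_sketch(data_test, labels_test, th, th0) / data_test.shape[1]
```

Any learner that takes (data, labels) and returns (th, th0), such as perceptron or averaged_perceptron, can be passed in.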

Write the function eval_learning_alg that takes:

<b>learner</b> - a function, such as perceptron or averaged_perceptron<br>
<b>data_gen</b> - a data generator; call it with a desired data set size and it returns a tuple (data, labels)<br>
<b>n_train</b> - the size of the learning sets<br>
<b>n_test</b> - the size of the test sets<br>
<b>it</b> - the number of iterations to average over<br>
and returns the average classification accuracy as a float between 0 and 1.

Note: Be sure to generate your training data and then testing data in that order, to ensure that the pseudorandomly generated data matches that in the test code.
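To see why the order of the two calls matters, consider a generator that draws from one shared pseudorandom stream. The sketch below is an assumption for illustration only: `make_data_gen`, its seed, and its labeling rule are hypothetical and are not the generator used by the test code.

```python
import numpy as np

def make_data_gen(seed=0, d=2):
    """Return a hypothetical data_gen(n) backed by one shared random stream."""
    rng = np.random.RandomState(seed)

    def data_gen(n):
        # d by n points drawn uniformly from [-1, 1)
        data = rng.uniform(-1, 1, size=(d, n))
        # 1 by n labels: +1 where the first coordinate is non-negative, else -1
        labels = np.where(data[0:1, :] >= 0, 1, -1)
        return data, labels

    return data_gen

gen = make_data_gen(seed=0)
train_data, train_labels = gen(5)   # drawn first
test_data, test_labels = gen(3)     # drawn second, from the same stream

# Swapping the two calls consumes the stream in a different order,
# so the training data no longer matches for the same seed.
gen_swapped = make_data_gen(seed=0)
other_test, _ = gen_swapped(3)
other_train, _ = gen_swapped(5)
print(np.array_equal(train_data, other_train))
```

With a generator like this, training-then-testing and testing-then-training produce different data sets even with the same seed, which is why the order must match the test code.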

In [3]:
def eval_learning_alg(learner, data_gen, n_train, n_test, it):
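    e = 0
    for i in range(it):
        # Generate training data first, then test data, to match the test code
        data_train, labels_train = data_gen(n_train)
        data_test, labels_test = data_gen(n_test)
        e = e + eval_classifier(learner, data_train, labels_train, data_test, labels_test)
    # Average accuracy over all runs
    return e / it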

### Evaluating a learning algorithm with a fixed dataset

Cross-validation is a strategy for evaluating a learning algorithm using a single training set of size n. Cross-validation takes in a learning algorithm L, a fixed data set D, and a parameter k. It divides D into k pieces of roughly equal size, runs the learning algorithm k different times, evaluates the accuracy of each resulting classifier, and ultimately returns the average of the accuracies over the k "runs" of L.

So, each time, it trains on k−1 of the pieces of the data set and tests the resulting hypothesis on the piece that was not used for training.

When k=n, it is called leave-one-out cross-validation.
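The splitting scheme described above can be sketched as follows. The generator name `kfold_splits` is hypothetical; the sketch assumes the data is d by n and the labels are 1 by n, and it uses numpy's array_split so that k does not have to divide n evenly.

```python
import numpy as np

def kfold_splits(data, labels, k):
    """Yield (train_data, train_labels, test_data, test_labels) for each fold."""
    # Split along the column (data point) axis into k chunks
    data_chunks = np.array_split(data, k, axis=1)
    label_chunks = np.array_split(labels, k, axis=1)
    for i in range(k):
        # Train on every chunk except chunk i, test on chunk i
        train_data = np.concatenate(data_chunks[:i] + data_chunks[i + 1:], axis=1)
        train_labels = np.concatenate(label_chunks[:i] + label_chunks[i + 1:], axis=1)
        yield train_data, train_labels, data_chunks[i], label_chunks[i]

data = np.arange(10).reshape(2, 5)          # d = 2, n = 5
labels = np.array([[1, -1, 1, -1, 1]])

for k in (3, 5):                            # k = 5 = n is leave-one-out
    sizes = [(tr.shape[1], te.shape[1]) for tr, _, te, _ in kfold_splits(data, labels, k)]
    print(k, sizes)
```

With n = 5 and k = 3 the test pieces have sizes 2, 2, and 1, so the training sets have sizes 3, 3, and 4. With k = 5 every test piece is a single point and every training set has the remaining 4.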

In [4]:
def xval_learning_alg(learner, data, labels, k):
    # cross validation of learning algorithm
    e = 0
    # Keep the chunks as lists: chunk sizes differ when k does not divide n
    data_split = np.array_split(data, k, axis=1)
    labels_split = np.array_split(labels, k, axis=1)
    for i in range(k):
        # Chunk i is held out for testing
        test_data = data_split[i]
        test_labels = labels_split[i]
        # The remaining k-1 chunks form the training set
        train_data = np.concatenate(data_split[:i] + data_split[i+1:], axis=1)
        train_labels = np.concatenate(labels_split[:i] + labels_split[i+1:], axis=1)
        e = e + eval_classifier(learner, train_data, train_labels, test_data, test_labels)
    # Average accuracy over the k folds
    return e / k