# Perceptron Lab

In [None]:
# import all our libraries here
%matplotlib inline 
import matplotlib.pyplot as plt # this library will be used for the scatter plot
import numpy as np 
import pandas as pd # to manage data frames and reading csv files


## Creating a Perceptron Classifier Class
The first step is to develop a function that can make predictions.
This is needed both when evaluating candidate weight values during stochastic gradient descent, and after the model is finalized, when we wish to make predictions on test data or new data.
The perceptron class below includes a predict() method that predicts an output value for a given instance using a set of weights.
The first weight is always the bias, as it is standalone and not responsible for a specific input value.
Note: the best weights will be learnt iteratively with gradient descent through the train method.
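
As a standalone illustration of the prediction step described above, here is a minimal sketch. The function name `predict`, the argument names and the hand-picked `example_weights` are assumptions made for this example only; the class later in this notebook wraps the same idea in its own methods.

```python
import numpy as np

def predict(row, weights):
    """Return +1 or -1 for one instance; weights[0] is treated as the bias."""
    # weighted sum of the inputs plus the standalone bias weight
    z = weights[0] + np.dot(weights[1:], row)
    # step activation for binary classification
    return 1 if z >= 0.0 else -1

# hypothetical hand-picked weights (bias first), for illustration only
example_weights = np.array([-0.5, 1.0, 1.0])
print(predict(np.array([0, 0]), example_weights))
print(predict(np.array([1, 1]), example_weights))
```

Keeping the bias as the first weight means an instance can later be augmented with a leading 1 and the whole prediction becomes a single dot product.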


## Training Network Weights
We can estimate the weight values for our training data using stochastic gradient descent.

Stochastic gradient descent requires two parameters:

- Learning Rate: Used to limit the amount each weight is corrected each time it is updated.
- Epochs: The number of times to run through the training data while updating the weights.

These, along with the training data, will be the arguments to the function.

There are 3 loops we need to perform in the function:

- Loop over each epoch.
- Loop over each data instance in the training data for an epoch.
- Loop over each weight and update it for the training instance in an epoch.

As you can see, we update each weight for each instance in the training data, each epoch.

Weights are updated based on the error the model made. The error is calculated as the difference between the expected output value and the prediction made with the candidate weights.

There is one weight for each input attribute, and these are all updated in the same way. Remember that the bias weight has no input attribute of its own, so it is multiplied by a constant input of 1.

w(t+1)= w(t) + learning_rate * (expected(t) - predicted(t)) * x(t)
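
The three loops and the update rule above can be sketched as a plain function. This is a minimal sketch, not the class used in the rest of the notebook: the name `train_weights`, the zero initial weights and the two-instance toy data are assumptions chosen to keep the example short.

```python
import numpy as np

def train_weights(X, t, learning_rate, epochs):
    """Perceptron training with explicit epoch, instance and weight loops."""
    weights = np.zeros(X.shape[1] + 1)          # one extra weight for the bias
    errors_per_epoch = []
    for _ in range(epochs):                     # loop 1: epochs
        sum_error = 0.0
        for x_i, t_i in zip(X, t):              # loop 2: training instances
            x_aug = np.insert(x_i, 0, 1.0)      # constant input of 1 for the bias
            predicted = 1 if np.dot(weights, x_aug) >= 0.0 else -1
            error = t_i - predicted
            for j in range(len(weights)):       # loop 3: weights
                weights[j] += learning_rate * error * x_aug[j]
            sum_error += error ** 2
        errors_per_epoch.append(sum_error)
    return weights, errors_per_epoch

# hypothetical toy data: two instances with targets -1 and +1
toy_X = np.array([[0.0, 0.0], [1.0, 1.0]])
toy_t = np.array([-1, 1])
toy_w, toy_errors = train_weights(toy_X, toy_t, learning_rate=0.1, epochs=3)
print(toy_w, toy_errors)
```

The innermost loop is written out only to mirror the description; with NumPy the same update can be done in one vectorised line.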

In [None]:
class perceptron(object):
    """Perceptron classifier.

    Parameters
    ------------
    lr : float
      Learning rate (between 0.0 and 1.0)
    input_size : int
      number of features in an instance.
    random_state : int
      Random number generator seed for random weight
      initialization.
    epochs : int
        Number of epochs for training the network towards achieving convergence

    Attributes
    -----------
    W : 1d-array
      Weights after fitting.
    E : list
      Sum-of-squares cost function value in each epoch.

    """
    def __init__(self, input_size, lr=0.01, epochs=50, random_state=1):
               
        self.input_size = input_size
        self.epochs = epochs
        self.lr = lr
        self.E = [] # sum of squared errors recorded for each epoch
        self.random_state = random_state

        # seed the random generator and draw small weights from a normal dist centered on zero
        rgen = np.random.RandomState(self.random_state)
        self.W = rgen.normal(loc=0.0, scale=0.01, size=self.input_size+1) #initialise weights and add one for bias
        #self.W = np.random.normal(0.0, pow(input_size, -0.5), (1, input_size))
        #self.W = np.zeros(input_size+1) #initialise weights to zero and add one for bias
        
    
    def net_input(self, x):
        z = self.W.T.dot(x) # dot product between input and the weights matrix 
        return z
    
    def activation(self, x):
        return 1 if x >= 0.0 else -1 #we have a binary classification
    
    def predict (self, x):
        #z = self.W.T.dot(x) # dot product between input and the weights matrix 
        z = self.net_input(x)
        a = self.activation(z)
        return a
    
    def train(self, X, t): # X holds the inputs, t holds the targets
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
          Training vectors, where n_samples is the number of samples and
          n_features is the number of features.
        t : array-like, shape = [n_samples]
          Target values.

        Returns
        -------
        self : object

        """
               
        for _ in range(self.epochs):
            sum_error = 0.0
            for i in range(t.shape[0]): # go through each instance
                x = np.insert(X[i], 0, 1) # remember to insert 1 for the bias input
                y = self.predict(x)
                error = t[i] - y
                self.W = self.W + self.lr * error * x
                sum_error += error**2 # sum of squared errors accumulated over the whole train set
            self.E.append(sum_error)
        return self # as documented in the docstring above


### Toy dataset to test AND gates


In [None]:
inputs = np.array([[0,0], [0,1], [1,0], [1,1]]) # this is an input example for an AND gate
print("Binary inputs to test an AND gate:", inputs)
targets = np.array([-1,-1,-1,1])
print("Outputs from the AND gate should be:", targets)
p = perceptron(input_size=len(inputs[0]), epochs=10)
p.train(inputs, targets)
print("The weights learnt ",  p.W) # these are the weights that would model an AND gate with a perceptron


# Exercise  - logic gates
- How would you test if the above weights learnt by the perceptron correctly model an AND gate? Hint: You can create an instance and use the perceptron to make a prediction p.predict(x). But remember to augment the instance with the extra bias feature.
- Can the perceptron be used for OR gates? If yes, you should now try the above for the OR gate. Simply add a cell in the Jupyter notebook (+ button), type in your code and execute that cell.
- A perceptron cannot model an XOR gate; can you think why this might be the case?
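
As a starting point for the first exercise, here is a minimal sketch of checking a weight vector against a gate's truth table. The helper name `gate_predictions` and the hand-picked weights are assumptions for illustration; they are not the weights learnt above.

```python
import numpy as np

def gate_predictions(weights, inputs):
    """Predict +1/-1 for every row of inputs; weights[0] is the bias weight."""
    predictions = []
    for instance in inputs:
        x = np.insert(instance, 0, 1)   # augment the instance with the bias input of 1
        z = np.dot(weights, x)
        predictions.append(1 if z >= 0.0 else -1)
    return predictions

gate_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_targets = [-1, -1, -1, 1]

# hypothetical hand-picked weights (bias first) that happen to model AND
hand_picked_w = np.array([-1.5, 1.0, 1.0])
and_predictions = gate_predictions(hand_picked_w, gate_inputs)
print(and_predictions == and_targets)
```

To check the trained model instead, pass `p.W` in place of `hand_picked_w`, or call `p.predict(x)` on each augmented instance.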


## Now let's try the perceptron on SONAR data classification

In this use case, you are given a SONAR data set containing 208 patterns obtained by bouncing sonar signals off a metal cylinder (a naval mine) and a rock at various angles and under various conditions. A naval mine is a self-contained explosive device placed in water to damage or destroy surface ships or submarines. Our goal is to build a model that can predict whether an object is a naval mine or a rock based on this data set.
<img src="comic.png">

Now, let us have a look at our SONAR data set:

<img src="sonar.png">


Here, the overall procedure is the same as for the AND gate, with a few differences that are discussed below to avoid any confusion.

Let's first read in the sonar training data, which is stored as a csv file. Once we create the perceptron model we can test its accuracy on the disjoint test set.
For convenience I have converted the class label "R" into integer 1 and "M" into integer 0. You can explore the csv file in an Excel spreadsheet. Note that the class label is the last column.


## What does the Sonar data look like?

In [None]:
df = pd.read_csv('sonar-train.csv')
df.head() # show the first 5 rows
# note you can use tail to view the last 5 rows

## Sonar Data Exercise - Data View:
- You should try the tail() function on the sonar data.
- Another useful function is info(); again, try this out and explore the output (e.g. df.info()).

In [None]:
# a more detailed view can be had with describe()
df.describe()

The count, min and max rows are self-explanatory. std is the standard deviation, which measures how dispersed the values are. The 25%, 50% and 75% rows show the corresponding percentiles: a percentile indicates the value below which a given percentage of observations in a group of observations falls. For example, 25% of the instances have a V1 value lower than 0.013350, while 50% are lower than 0.022650 and 75% are lower than 0.137100. These are often called the 25th percentile (or 1st quartile), the median, and the 75th percentile (or 3rd quartile).
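
The percentile rows described above can also be computed directly. The sketch below uses a small made-up array (`sample_values` is an assumption, not sonar data) and NumPy's `np.percentile` with its default linear interpolation.

```python
import numpy as np

# hypothetical values standing in for one feature column
sample_values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# 25th percentile (1st quartile), median, and 75th percentile (3rd quartile)
q25, q50, q75 = np.percentile(sample_values, [25, 50, 75])
print(q25, q50, q75)

# the median of the same values, computed separately for comparison
print(np.median(sample_values))
```

For a real column you could pass `df['V1'].values` instead of `sample_values`, assuming the first feature column is named V1 as in the describe() output.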


## Histograms of Sonar data
We will first look at a histogram of each feature, and then try to visualise the data using a scatter plot.
Note that we have 60 features in this data set, so for the scatter plot we need to choose 2 features for the x and y axes.

In [None]:
# exploring features
df.hist(bins=50, figsize=(20,15))
plt.show() # the parentheses are needed to actually call show

In [None]:
## Scatter plot of Sonar

In [None]:
#get the class labels for each instance
y = df.iloc[0:, [-1]].values
y = np.asarray(y, dtype=float).flatten() # np.asfarray was removed in NumPy 2.0
#print(y)
#X = df.iloc[0:, [0:59]]
#print(X)

count = np.where(y == 1, 1, 0).sum() # count of the number of instances that belong to class 1
#we can use count as an index as the instances are sorted by the class label
#so if we want to access class=1, then they appear first.
#print(count)

# extract 2 features using their index location
f1 = 0
f2 = 10

X = df.iloc[0:, [f1, f2]].values

# plot data
plt.scatter(X[:count, 0], X[:count, 1], color='red', marker='o', label='1')
plt.scatter(X[count:, 0], X[count:, 1], color='blue', marker='x', label='-1')

plt.xlabel('feature1 ')
plt.ylabel('feature2 ')
plt.legend(loc='upper left')

## plt.savefig('images/02_06.png', dpi=300)
plt.show()


## Sonar Data Exercise - Scatter Plots
You can try out the code above for different feature combinations. 
Through the scatter plots you will notice that this is a hard data set to linearly separate with just two features.

In [None]:
#Now let's read the train and test csv files to create and test our model
train = pd.read_csv('sonar-train.csv')
test = pd.read_csv('sonar-test.csv')
train_data_list = train.iloc[0:].values
print(train_data_list.shape)
test_data_list = test.iloc[0:].values
print(test_data_list.shape)

## Train the perceptron on the SONAR dataset

In [None]:
targets = [] #empty list to hold the class labels
inputs = [] # empty list to hold the instances

#targets = np.zeros(rows) # initialise the 1-dimensional array which will hold the class values
#inputs = np.zeros(shape=(rows,cols)) # initialise the 2-dimensional matrix which has the set of train data


for instance in train_data_list:
    # all columns except the last are the features; the last column is the target class label
    features = np.asarray(instance[:-1], dtype=float) # avoids shadowing the built-in input()
    target = int(instance[-1]) # make sure the class label is an integer

    inputs.append(features) # append to the list of instances
    targets.append(target) # append to the list of targets

# we will convert our lists into numpy arrays so they are compatible with our perceptron class
# for this we use the asarray function
inputs = (np.asarray(inputs)) 
targets = (np.asarray(targets).flatten())
print(len(inputs), len(inputs[0]), len(targets))
    
# create a new instance of the perceptron class and train it to generate the set of weights
p = perceptron(input_size=len(inputs[0]), epochs=100, lr=0.0001)
p.train(inputs, targets) # also works well with lr=0.01 and more epochs, e.g. 1000

#print(p.W)
print("sum of squared errors per epoch:")
print(p.E)


## Plot the model error with increasing epochs

In [None]:
plt.figure(figsize=(20,5)) #width, height settings for figures
plt.plot(range(1, len(p.E) + 1), p.E, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Sum-squared-error') # p.E stores the sum of squared errors per epoch

# plt.savefig('images/02_07.png', dpi=300)
plt.show()

### Now let's test the perceptron on the SONAR test data
We will keep track of the predicted and actual outputs in order to calculate the accuracy of the perceptron on the unseen test data. 
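
The accuracy calculation itself is just the fraction of predictions that match the actual labels. Here is a minimal sketch on made-up labels; the function name `accuracy` and the example lists are assumptions for illustration, not results from the sonar data.

```python
import numpy as np

def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    # comparing the arrays gives True/False per instance; the mean is the accuracy
    return float(np.mean(predicted == actual))

# hypothetical labels: three of the four predictions are correct
example_predicted = [1, -1, 1, 1]
example_actual = [1, -1, -1, 1]
example_accuracy = accuracy(example_predicted, example_actual)
print("accuracy = ", example_accuracy)
```

Note that the comparison only makes sense when predictions and targets use the same label encoding.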

In [None]:
#create an empty list called results to keep track of the network performance on each test instance
results = []
#print ("shape ", test_data_list.shape)
#print(all_values)

#go through all the test instances
for instance in test_data_list:
    # all columns except the last are the features; the last column is the target class label
    features = np.asarray(instance[:-1], dtype=float)
    target_label = int(instance[-1])
    #print("Correct label", target_label)

    #query the perceptron with the test input
    x = np.insert(features, 0, 1) #remember to add the bias value of 1 to the instance
    predict_label = p.predict(x)
    #print("Predicted class:", predict_label )

    #record 1 for a correct prediction and 0 for an incorrect one
    if predict_label == target_label:
        results.append(1)
    else:
        results.append(0)

#print network performance as an accuracy metric
results_array = np.asarray(results, dtype=float)
print("accuracy = ", results_array.sum() / results_array.size)

## Compare different learning rates

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

p1 = perceptron(input_size=len(inputs[0]), epochs=20, lr=0.1)
p1.train(inputs, targets)
ax[0].plot(range(1, len(p1.E) + 1), np.log10(p1.E), marker='o')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('log(Sum-squared-error)')
ax[0].set_title('Learning rate 0.1') # matches the lr=0.1 used to create p1

p2 = perceptron(input_size=len(inputs[0]), epochs=20, lr=0.0001)
p2.train(inputs, targets)
ax[1].plot(range(1, len(p2.E) + 1), p2.E, marker='o')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Sum-squared-error')
ax[1].set_title('Learning rate 0.0001')

# plt.savefig('images/02_11.png', dpi=300)
plt.show()

# Exercises - Perceptron hyperparameters
- Explore different learning rates and numbers of epochs. Do they improve the final test accuracy?
- Try to identify another dataset from the University of California, Irvine (UCI) ML repository, which is where we obtained the Sonar dataset (ideally identify a binary dataset from http://mlr.cs.umass.edu/ml/datasets.html). Explore how you might use the above code to apply the perceptron to that dataset.
- How would you change the weight update steps to implement the Adaline weight update algorithm? Hint: Adaline uses the real values returned by the net_input method instead of the quantised output returned from the activation method.