# What Is a Perceptron?
A perceptron is the basic building block of a neural network.

A single perceptron can classify an input into one of two classes, which makes it a binary classifier (i.e. 0 / 1).
![A perceptron looks like this.](./perceptron_node.png "Perceptron")

## Different parts of a perceptron:

1. Input features (X) from the dataset.
2. Weights (W), one for each input feature, and a bias (b).
3. A net input function.
4. An activation function, which squashes the net input into a bounded range (the sigmoid used here maps it into (0, 1)).
5. An output (0/1, Yes/No, Dog/Cat, etc.).
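To make the parts above concrete, here is a minimal sketch of one perceptron processing a single input sample. The function name perceptron_output and the toy numbers are illustrative assumptions; the sigmoid activation and the 0.5 threshold match what this notebook uses later.

```python
import numpy as np

def perceptron_output(x, w, b):
    """Forward pass of one perceptron for a single input sample x (hypothetical helper)."""
    z = np.dot(w, x) + b          # net input function: weighted sum plus bias
    a = 1 / (1 + np.exp(-z))      # sigmoid activation, always in (0, 1)
    label = int(a >= 0.5)         # threshold at 0.5 to get class 0 or 1
    return a, label

x = np.array([2.0, 0.0])          # input features
w = np.array([1.0, 1.0])          # one weight per feature
a, label = perceptron_output(x, w, b=-1.0)
print(a, label)                   # z = 1, so a is about 0.731 and label is 1
```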



## Sonar Dataset

Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The integration aperture for higher frequencies occurs later in time, since these frequencies are transmitted later during the chirp.

The label associated with each record contains the letter "R" if the object is a rock and "M" if it is a mine (metal cylinder). The numbers in the labels are in increasing order of aspect angle, but they do not encode the angle directly.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

In [2]:
columns = [i for i in range(1,61)]
columns.append("label")
df = pd.read_csv("sonar.all-data",delimiter = ",",names = columns,header = None)

In [3]:
df.head()

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.02 | 0.0371 | 0.0428 | 0.0207 | 0.0954 | 0.0986 | 0.1539 | 0.1601 | 0.3109 | 0.2111 | ... | 0.0027 | 0.0065 | 0.0159 | 0.0072 | 0.0167 | 0.018 | 0.0084 | 0.009 | 0.0032 | R |
| 1 | 0.0453 | 0.0523 | 0.0843 | 0.0689 | 0.1183 | 0.2583 | 0.2156 | 0.3481 | 0.3337 | 0.2872 | ... | 0.0084 | 0.0089 | 0.0048 | 0.0094 | 0.0191 | 0.014 | 0.0049 | 0.0052 | 0.0044 | R |
| 2 | 0.0262 | 0.0582 | 0.1099 | 0.1083 | 0.0974 | 0.228 | 0.2431 | 0.3771 | 0.5598 | 0.6194 | ... | 0.0232 | 0.0166 | 0.0095 | 0.018 | 0.0244 | 0.0316 | 0.0164 | 0.0095 | 0.0078 | R |
| 3 | 0.01 | 0.0171 | 0.0623 | 0.0205 | 0.0205 | 0.0368 | 0.1098 | 0.1276 | 0.0598 | 0.1264 | ... | 0.0121 | 0.0036 | 0.015 | 0.0085 | 0.0073 | 0.005 | 0.0044 | 0.004 | 0.0117 | R |
| 4 | 0.0762 | 0.0666 | 0.0481 | 0.0394 | 0.059 | 0.0649 | 0.1209 | 0.2467 | 0.3564 | 0.4459 | ... | 0.0031 | 0.0054 | 0.0105 | 0.011 | 0.0015 | 0.0072 | 0.0048 | 0.0107 | 0.0094 | R |


Replace R with 0 and M with 1, since the perceptron can only work with numbers.

In [4]:
df["label"] = df["label"].map({'R': 0, 'M': 1})

In [5]:
X_train, X_test, y_train, y_test = train_test_split(df[columns[:-1]], df[columns[-1]], test_size=0.33, random_state=42)
# train_test_split shuffles the rows by default. This matters here because the file lists all R records first and then all M records, so an unshuffled split would be badly imbalanced.
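The imbalance problem described in the comment above can be seen on toy labels sorted the same way as the sonar file. This sketch uses made-up labels, and the stratify option is an addition not used in this notebook: it keeps the class proportions equal in the train and test sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels sorted like the sonar file: all 0s (R) first, then all 1s (M)
y = np.array([0] * 6 + [1] * 6)

# Without shuffling, the test set is simply the last rows of the file
_, y_test_unshuffled = train_test_split(y, test_size=0.33, shuffle=False)
print(y_test_unshuffled)                # [1 1 1 1]: only one class

# Shuffling (the default) plus stratify=y preserves the class proportions
_, y_test_stratified = train_test_split(y, test_size=0.33, stratify=y, random_state=0)
print(np.bincount(y_test_stratified))   # [2 2]
```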

In [6]:
X_train.head()

In [6]:
y_train.head() 

28     0
42     0
79     0
97     1
142    1
Name: label, dtype: int64

In [7]:
X_train = np.array(X_train)
X_test = np.array(X_test)
y_train = np.array(y_train)
y_test = np.array(y_test)

X_train.shape

(139, 60)

## Weights
Let's initialize a weight for each input feature. We have 60 features, so we need 60 weights.

In [8]:
W = np.random.rand(60,1)
W.shape

(60, 1)

## Bias
Let's initialize the bias.


In [9]:
b = np.random.rand()
b

0.4924237103978041

## Forward Pass
The forward pass has two steps:
1. **Net Input Function:** multiply each feature x by its corresponding weight w, then sum the results and add the bias b to get a single value Z.
2. **Activation Function:** apply the activation function to Z.
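Written out for a single training sample i with 60 features, the two steps are as follows (the superscript notation is ours; the symbols match W, b, Z and A used in the code):

```latex
z^{(i)} = \sum_{j=1}^{60} w_j\, x_j^{(i)} + b = W^\top x^{(i)} + b,
\qquad
a^{(i)} = \sigma\!\left(z^{(i)}\right) = \frac{1}{1 + e^{-z^{(i)}}}
```

Stacking all m training samples as the columns of a matrix X of shape (60, m) gives Z = W^T X + b, a row vector of shape (1, m).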


### Net Input Function
We have to compute the net input function for every training sample.


![Net input function.](./NIF.png "Net input function")


In [10]:
X_train = X_train.T
X_train.shape

(60, 139)

In [11]:
X_train.shape
# X_train[1]

(60, 139)

In [12]:
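numOfTrainSamples = X_train.shape[1]
numOfFeatures = X_train.shape[0]
print(numOfTrainSamples, numOfFeatures)
Z = np.zeros(numOfTrainSamples)

# Z[i] = sum over j of X_train[j, i] * W[j] + b
for i in range(numOfTrainSamples):
    for j in range(numOfFeatures):
        # W[j, 0] is a plain scalar (float() on the 1-element array W[j] is deprecated in recent NumPy)
        Z[i] = Z[i] + X_train[j, i] * W[j, 0]
    Z[i] = Z[i] + b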
    

139 60


In [13]:
len(Z)

139

In [14]:
Z[:5]

array([ 9.26226767, 10.6545098 ,  9.50383522,  9.41165453, 11.21489741])

The same net input can be computed much more efficiently with vectorized code, using a single matrix product instead of two nested loops.


In [15]:
W.shape

(60, 1)

In [16]:
X_train.shape

(60, 139)

In [17]:
Z = np.dot(W.T, X_train) + b

In [18]:
Z.shape

(1, 139)

In [19]:
Z[0,:5]

array([ 9.26226767, 10.6545098 ,  9.50383522,  9.41165453, 11.21489741])
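Both versions give the same first five values above. A self-contained check on random data of the same shape (the data here is synthetic, not the sonar set) confirms that the loop and the matrix product agree for every sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples = 60, 139
X = rng.random((n_features, n_samples))   # features x samples, like X_train after .T
W = rng.random((n_features, 1))
b = rng.random()

# Loop version: one sample (column) at a time
Z_loop = np.zeros(n_samples)
for i in range(n_samples):
    Z_loop[i] = np.sum(X[:, i] * W[:, 0]) + b

# Vectorized version: a single matrix product, shape (1, n_samples)
Z_vec = np.dot(W.T, X) + b

print(np.allclose(Z_loop, Z_vec[0]))      # True
```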

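### Activation Function
We apply an activation function to the net input Z. The sigmoid squashes any real value into the range (0, 1), so its output can be read as the probability of class 1; other activation functions have different output ranges.
Commonly used activation functions include:
1. Sigmoid
2. ReLU
3. Leaky ReLU
4. tanh, and more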
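For comparison, the activation functions listed above can be sketched in NumPy as follows. The leaky ReLU slope of 0.01 is an assumed, commonly used value. Only the sigmoid maps into (0, 1), which is why it pairs naturally with a 0.5 threshold.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))            # output in (0, 1)

def relu(z):
    return np.maximum(0, z)                # output in [0, inf)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)   # small, non-zero slope for z <= 0

def tanh(z):
    return np.tanh(z)                      # output in (-1, 1)

z = np.array([-2.0, 0.0, 3.0])
for f in (sigmoid, relu, leaky_relu, tanh):
    print(f.__name__, f(z))
```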

We will use the sigmoid for our example.


![Sigmoid function.](./sigmoid.png "Sigmoid Function")

In [21]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

In [22]:
A = [sigmoid(z) for z in Z[0]]

In [23]:
A

[0.9997356873995312,
 0.9999861054164071,
 0.9994591971727084,
 0.9999339313868731,
 0.9999795746515352,
 0.999621508982769,
 0.9999448960043449,
 0.9999791035387847,
 0.9998052154216333,
 0.9997872309976601,
 0.9999042565705499,
 0.9995023980683508,
 0.9999487569840759,
 0.9995245845628659,
 0.9997782096520852,
 0.9998967449401529,
 0.9997364204506219,
 0.9999934606670113,
 0.999641318103894,
 0.9999730158040315,
 0.9999389604810538,
 0.9999402772381332,
 0.9999720217245707,
 0.9999531813116801,
 0.9999762621205327,
 0.9998878038077287,
 0.9999170573176861,
 0.9998716999777376,
 0.9999839982411639,
 0.9999521671184329,
 0.9999764491447412,
 0.999872499210705,
 0.9995927207007702,
 0.9997714153059643,
 0.9999576442414376,
 0.999915539394716,
 0.9997868640206787,
 0.9999885398443222,
 0.999804055519827,
 0.9999859086392181,
 0.9998448492719584,
 0.999846559101364,
 0.9999203584252668,
 0.9996774665226493,
 0.9999030321284549,
 0.9987652654853588,
 0.9999163678508142,
 0.9996982495208611

A more efficient, vectorized way:

In [24]:
A = sigmoid(Z)

In [25]:
A[0,:5]

array([0.99973569, 0.99998611, 0.9994592 , 0.99993393, 0.99997957])

### What's Next?
We have computed the output values, but what do we do with them? We need the perceptron to answer Rock / Mine, or in other words 0 / 1.
To get there, we apply a threshold to the output values; 0.5 is the usual choice. Outputs of 0.5 or greater are treated as 1, and outputs below 0.5 as 0.

In [26]:
A = np.where(A < 0.5, 0, 1)

In [27]:
A

array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1]])

In [28]:
y_train

array([0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1,
       0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1,
       1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0,
       0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0,
       0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0,
       1, 0, 1, 0, 0, 1, 1])

In [29]:
print(y_train.shape)
print(A.shape)

(139,)
(1, 139)


In [30]:
y_train = np.expand_dims(y_train,axis =0)

In [31]:
y_train.shape

## Output Analysis
Our perceptron predicts 1 (Mine) for every training sample, so every Rock sample in the training set is misclassified. Let's train the perceptron to correct this.


## Back Propagation
In backpropagation we compute the error (also called the loss or cost) with a loss function, and then measure how much each weight contributed to that error by taking the partial derivative of the loss with respect to each weight.

### Error Functions:
1. Mean Error Loss
2. Mean Squared Error 
3. Mean Absolute Error
4. Mean Squared Logarithmic Error Loss (MSLE)
5. Mean Percentage Error
6. Mean Absolute Percentage Error
7. Binary Cross-Entropy
8. Multi-Class Cross-Entropy
9. Squared Hinge Loss
10. Hinge Loss
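A few of the losses listed above, sketched in NumPy for comparison. These are illustrative definitions written for this note, not library functions; the hinge losses assume labels in {-1, +1} and raw scores (such as the net input Z) rather than probabilities.

```python
import numpy as np

def mean_squared_error(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mean_absolute_error(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def hinge_loss(y, score):
    # y must be -1 or +1; score is a raw model output, not a probability
    return np.mean(np.maximum(0.0, 1.0 - y * score))

def squared_hinge_loss(y, score):
    return np.mean(np.maximum(0.0, 1.0 - y * score) ** 2)

y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.6])
print(mean_squared_error(y, y_hat))     # about 0.07
print(mean_absolute_error(y, y_hat))    # about 0.2333

y_pm = np.array([1.0, -1.0, 1.0])
score = np.array([2.0, -0.5, 0.3])
print(hinge_loss(y_pm, score))          # about 0.4
print(squared_hinge_loss(y_pm, score))  # about 0.2467
```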


In this example we will use the binary cross-entropy loss:
![Binary cross entropy loss.](./loss.png "Loss function")

In [32]:
def binary_cross_entropy(A, Y):
    return -(Y * np.log(A) + (1 - Y) * np.log(1 - A)).mean()

In [33]:
J = binary_cross_entropy(A, y_train)

  return -(Y * np.log(A) + (1 - Y) * np.log(1 - A)).mean()
  return -(Y * np.log(A) + (1 - Y) * np.log(1 - A)).mean()


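Our implementation of the loss function fails here because A now holds the thresholded 0/1 labels instead of probabilities. Wherever A is exactly 1, np.log(1 - A) evaluates log(0), which is undefined (NumPy returns -inf and emits the warnings above), so the loss comes out as nan. For now we will use the library function, which clips the predicted probabilities away from exactly 0 and 1. In general, the loss should be computed on the sigmoid outputs before thresholding.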
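A common workaround is to clip the probabilities before taking the log, similar in spirit to what the library function does. This is a sketch; the default eps = 1e-15 is an arbitrary small constant assumed here for illustration.

```python
import numpy as np

def binary_cross_entropy(A, Y, eps=1e-15):
    # Clip probabilities into [eps, 1 - eps] so np.log never receives exactly 0
    A = np.clip(A, eps, 1 - eps)
    return -(Y * np.log(A) + (1 - Y) * np.log(1 - A)).mean()

Y = np.array([1, 0])
print(binary_cross_entropy(np.array([0.9, 0.1]), Y))  # equals -log(0.9), about 0.105
print(binary_cross_entropy(np.array([1.0, 1.0]), Y))  # finite, but large for the wrong prediction
```

Clipping only keeps the value finite. A confidently wrong hard prediction still produces a very large loss, so passing probabilities rather than thresholded labels remains the better fix.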

In [34]:
from sklearn.metrics import log_loss
J = log_loss(y_train,A)

In [35]:
J

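360.21659711854045

Note that this value is not a meaningful binary cross-entropy. log_loss expects 1-D arrays for a binary problem; given the (1, 139) arrays above, it treats the data as a single sample with 139 classes, and the result works out to 73 · ln(139) ≈ 360.2, where 73 is the number of Mine labels in y_train. Passing y_train.ravel() together with the sigmoid probabilities as a 1-D array gives the binary loss.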

## Computing Gradients (Slopes / Derivatives)
Below are the partial derivatives of the loss function.

![dz.](./dz.png "dz")

![dw.](./dw.png "dw")

![db.](./db.png "db")
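For one sample, the chain rule gives these formulas directly, assuming a is the sigmoid output before thresholding. Averaging over the m training samples (numOfTrainSamples) gives dw and db; this matches dw = np.dot(X_train, dz.T) / m and db = sum(dz) / m in the cells that follow.

```latex
\begin{aligned}
\mathcal{L} &= -\left[\, y \log a + (1 - y)\log(1 - a) \,\right], \qquad a = \sigma(z) \\
\frac{\partial \mathcal{L}}{\partial a} &= -\frac{y}{a} + \frac{1 - y}{1 - a}, \qquad
\frac{\partial a}{\partial z} = a(1 - a) \\
dz = \frac{\partial \mathcal{L}}{\partial z} &= \left(-\frac{y}{a} + \frac{1 - y}{1 - a}\right) a(1 - a) = -y(1 - a) + (1 - y)\,a = a - y \\
dw_j &= \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}\, x_j^{(i)}, \qquad
db = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}
\end{aligned}
```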

In [36]:
dz = A - y_train

In [51]:
dz

array([[1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0,
        1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0,
        0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1,
        1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0,
        0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1,
        1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1,
        0, 1, 0, 1, 1, 0, 0]])

In [56]:
A[0][0] , y_train[0][0]

(1, 0)

In [37]:
X_train.shape

(60, 139)

In [38]:
dz.shape

(1, 139)

We need to compute the derivative of the loss with respect to each weight, averaged over all training inputs.

In [39]:
dw = np.zeros(len(W))
# dw[i] is the average over all samples j of dz[j] * X_train[i][j]
for i in range(len(W)):
    for j in range(X_train.shape[1]):
        dw[i] = dw[i] + dz[0][j] * X_train[i][j]
    dw[i] = dw[i] / X_train.shape[1]

In [40]:
dw[:5]

array([0.0104705 , 0.0154964 , 0.01788489, 0.02003237, 0.03098993])

In [41]:
numOfTrainSamples

139

A more efficient, vectorized way:

In [42]:
dw =  np.dot(X_train,dz.T)/numOfTrainSamples

In [43]:
dw[:5]

array([[0.0104705 ],
       [0.0154964 ],
       [0.01788489],
       [0.02003237],
       [0.03098993]])

For the bias, we just need the mean of all the dz values.


In [44]:
db = np.sum(dz,axis =1)/numOfTrainSamples

## Gradient Descent Step
Now we update every weight, and the bias, by stepping in the direction opposite to its slope.

**Learning Rate (alpha)**
alpha controls the size of each update step. If alpha is too high, the updates can overshoot the minimum and the loss may diverge; if alpha is too low, the updates converge to the minimum very slowly.

Typical values of alpha lie between 0 and 1.

Let's set alpha to 0.001.
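The effect of alpha can be seen on a toy problem. This sketch minimizes f(x) = x² (gradient 2x) instead of the real loss, an assumption chosen because every update multiplies x by (1 - 2·alpha): a tiny alpha barely moves, a moderate alpha converges quickly, and an alpha above 1 overshoots further each step and diverges.

```python
def gradient_descent_1d(alpha, steps=20, x0=1.0):
    """Plain gradient descent on f(x) = x**2, whose gradient is 2 * x."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x   # same update rule as W = W - alpha * dw
    return x

for alpha in (0.001, 0.4, 1.1):
    print(alpha, gradient_descent_1d(alpha))
# 0.001 -> about 0.96 (slow), 0.4 -> about 1e-14 (fast), 1.1 -> about 38 (diverges)
```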

### Update formulas for weight and bias

![w_update.](./w_update.png "w_update")


![b_update.](./b_update.png "b_update")

In [45]:
alpha = 0.001

In [46]:
W = W - alpha * dw

In [47]:
b = b - alpha *db

## Epoch
One forward pass and one backward pass over the whole training set is called one epoch.

## Task
1. Write code that runs N epochs, or keeps training until the loss gets close to zero.
2. Compute the loss after each epoch using the sklearn loss function.
3. Once the perceptron is trained, test it on the testing data and report the test accuracy and confusion matrix.
4. Try different values of alpha and see how they affect the training process.
5. Use the vectorized code above to build a 2-layer neural network: the first layer contains 2 perceptrons and the last layer contains 1 perceptron. Compare its performance using accuracy and the confusion matrix.
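For task 3, a possible evaluation pattern is sketched below on toy data. The predict helper is hypothetical and mirrors this notebook's forward pass; it expects X with shape (n_features, n_samples), so X_test would need to be transposed (X_test.T) the same way X_train was. accuracy_score and confusion_matrix are from sklearn.metrics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

def predict(W, b, X):
    """Hypothetical helper: X has shape (n_features, n_samples), like X_train after .T."""
    A = 1 / (1 + np.exp(-(np.dot(W.T, X) + b)))   # forward pass, shape (1, n_samples)
    return np.where(A < 0.5, 0, 1)[0]            # 1-D array of 0/1 predictions

# Toy example: 2 features, 5 samples (columns)
W = np.array([[1.0], [-1.0]])
b = 0.0
X = np.array([[2.0, 0.0, 1.0, 0.0, 3.0],
              [0.0, 2.0, 0.0, 1.0, 1.0]])
y_true = np.array([1, 0, 0, 0, 1])

y_pred = predict(W, b, X)
print(accuracy_score(y_true, y_pred))    # 0.8
print(confusion_matrix(y_true, y_pred))  # rows = true class, columns = predicted class
```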


In [134]:
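from sklearn.metrics import log_loss

# Re-initialize the parameters
W = np.random.rand(60, 1)
b = np.random.rand()

# Labels as a (1, m) row vector, whatever shape y_train has at this point
Y = y_train.reshape(1, -1)
m = X_train.shape[1]   # X_train was transposed earlier to (60, m)
alpha = 0.001
numOfEpochs = 1000

for n in range(numOfEpochs):
    # Forward pass: net input function followed by the sigmoid
    Z = np.dot(W.T, X_train) + b
    A = sigmoid(Z)

    # Loss on the probabilities (1-D arrays), not on thresholded 0/1 values
    J = log_loss(Y[0], A[0])

    # Backward pass
    dz = A - Y
    dw = np.dot(X_train, dz.T) / m
    db = np.sum(dz) / m

    # Gradient descent step
    W = W - alpha * dw
    b = b - alpha * db

    if n % 100 == 0:
        print(n, J)

# Threshold only at the end to get 0/1 predictions
predictions = np.where(A < 0.5, 0, 1)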
