<h1 align = center><font color = '#642AAE'>Neural Networks Basics (Perceptron, Activation Functions) </h1>


#### What are Neural Networks ??


Neural networks are computational models inspired by the structure and function of the human brain. They consist of interconnected nodes, or neurons, that process input data to produce an output. The main components of neural networks include:

1. Input Layer: Receives the input data and passes it to the hidden layers.

2. Hidden Layers: Intermediate layers that process the input data and generate intermediate representations. They can have multiple neurons and can include activation functions to introduce non-linearity and complex patterns.

3. Output Layer: Processes the generated intermediate representations and produces the final output.

Neural networks are designed to learn and adapt to complex patterns in data, making them powerful tools for various applications such as image recognition, natural language processing, and recommendation systems.


<img src = 'NN.png' alt = "Neural Networks">


#### What is Perceptron ?

A perceptron is a simple neural network model that consists of a single layer of neurons. It is a linear classifier, meaning it can only classify data into two classes (e.g., binary classification). The perceptron is based on the McCulloch-Pitts neuron model, which was introduced by George W. McCulloch and Walter Pitts in 1943.

The perceptron algorithm is an iterative learning algorithm used to train a linear classifier. It follows these steps:

1. Initialize the weights and bias randomly.

2. For each training example:

   a. Calculate the predicted output using the current weights and bias.

   b. Compare the predicted output with the actual output.

   c. Update the weights and bias based on the difference between the predicted output and the actual output.

   d. Repeat steps a-c until the desired accuracy is achieved or a maximum number of iterations is reached.

   e. After training, the perceptron can be used to make predictions on new input data.

   f. The perceptron can also be trained using gradient descent, which updates the weights and bias iteratively to minimize the error between the predicted output and the actual output.

   g. The perceptron can be used for binary classification problems, where the output is either 0 or 1.

<img src = 'perceptron.jpg' alt = 'Perceptron'>

#### Activation Functions in Neural Networks

Activation functions are used to introduce non-linearity and complex patterns into the neural network. There are various activation functions available, such as:

1. Sigmoid Function: S-shaped curve that outputs values between 0 and 1. The sigmoid function is widely used in binary classification problems.

2. Hyperbolic Tangent (tanh) Function: S-shaped curve that outputs values between -1 and 1. The tanh function is similar to the sigmoid function but has a better gradient for training neural networks.

3. Rectified Linear Unit (ReLU) Function: Outputs the input value if it is positive, or 0 otherwise. The ReLU function is widely used in neural networks due to its simplicity and efficiency.

4. Leaky ReLU Function: Outputs a small negative value (e.g., 0.01) for negative inputs, and the input value for positive inputs. The leaky ReLU function helps mitigate the vanishing gradient problem in neural networks.

5. Exponential Linear Unit (ELU) Function: Outputs the input value if it is positive, or an exponential of the input value minus 1 for negative inputs. The ELU function has a better initialization and stability compared to ReLU functions.

6. Parametric ReLU Function: Outputs a configurable parameter (e.g., alpha) times the input value if it is positive, or 0 otherwise. The Parametric ReLU function can help improve the performance of neural networks by allowing the network to learn different activation functions for different parts of the input.

<img src = 'activationFunction.png' alt='Activation Functions'>

By using appropriate activation functions, neural networks can model complex patterns and achieve better performance in various applications, such as image recognition, natural language processing, and recommendation systems.








<h2 align = center> <font color ='#EEFF32'>Implementing Neural Network For Regression Problem From Scratch


### Importing Important Libraries

In [111]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler




### Step 1 : Initializing The Weights And Biases

W1 matrix will be initialized with random weights and of the shape of hidden_layer_size * input_layer_size and then scalling with 0.01 

W2 matrix will be initialized with random weights and of the shape of hidden_layer_size * output_layer_size and then scalling with 0.01 

b1 matrix initialized with 0 and of the shape (hidden_layer_size,1)

b2 matrix initialized with 0 and of the shape (output_layer_size,1)





In [93]:

def sigmoid(x):
    return(1/(1 + np.exp(-x)))

def parameter_initialization(input_layer_size,hidden_layer_size,output_layer_size):
    np.random.seed(10)

    W1 = np.random.randn(hidden_layer_size,input_layer_size)*0.01
    b1 = np.zeros((hidden_layer_size,1))
    W2 = np.random.randn(output_layer_size,hidden_layer_size)*0.01
    b2 = np.zeros((output_layer_size,1))
    
    return W1, b1, W2, b2




Mathematics

W1∈R 
hidden_size×input_size
 

b1∈R 
hidden_size×1
 

W2∈R 
output_size×hidden_size
 

b2∈R 
output_size×1

### Step 2 : Defining Forward Propagation

#### What is Forward Pass Or Forward Propagation ?

Forward propagation (or forward pass) refers to the calculation and storage of intermediate variables (including outputs) for a neural network in order from the input layer to the output layer.


In [200]:
def forward_propagation(data , W1, b1 , W2 ,b2):
    Z1 = np.dot(W1, data) + b1

    A1 = np.tanh(Z1)

    Z2 = np.dot(W2,A1) + b2

    return Z1,A1,Z2

    



The forward pass function takes the input X, the weight matrices W1 and W2, and the bias vectors b1 and b2 as inputs.

Input Layer:The input matrix X is the input layer.

Hidden Layer: The activations A1 represent the hidden layer. This is the layer between the input layer and the output layer, where the non-linear transformation of the data occurs.
The hidden layer takes the linear transformation Z1 and applies the activation function np.tanh() to introduce non-linearity.
The hidden layer learns to extract and represent meaningful features from the input data.

Output Layer: The activations Z2 represent the output layer.
The output layer takes the activated values A1 from the hidden layer and applies another linear transformation to produce the final output Z2.

### Step 3 : Calculate the loss (Mean Squared Error).

In [191]:
def computing_loss(y, y_pred):
    m = y.shape[0]
    loss = np.sum((y_pred - y) ** 2) / (2 * m)
    return loss

### Step 4 : Backward Propagation to compute gradients.

#### What is backward propagation ?

Backward propagation (or backward pass) is the process of calculating the gradients of the loss with respect to the weights and biases for each layer in the neural network. This is crucial for updating the weights and biases to minimize the loss.




In [193]:
def backward_propagation (X , Y , Z1 , A1, Z2, W2):
    m = X.shape[1]

    # Output Layer Gradient
    dZ2 = Z2 - Y
    dW2 = np.dot(dZ2,A1.T)/m
    db2 = np.sum(dZ2,axis=1,keepdims=True)/m

    # Hidden Layer Gradient
    dA1 = np.dot(W2.T,dZ2)
    dZ1 = dA1 * (1 -  np.tanh(Z1)** 2 )
    dW1 = np.dot(dZ1,X.T)/m
    db1 = np.sum(dZ1,axis=1,keepdims=True)/m
    return dW1, db1, dW2, db2



### Explaining Each Variable:

dZ2 = Difference between predicted output and true output

dW2 = Gradient of cost function with respect to weights in the output layer

db2 = Gradient of cost function with respect to biases in the output layer

dA1 = Activation function derivative of hidden layer

dZ1 = Derivative of linear transformation of hidden layer

dW1 = Gradient of cost function with respect to weights in the hidden layer

db1 = Gradient of cost function with respect to biases in the hidden layer



### Step 5 : Update the weights using gradient descent.

In [13]:
def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

<h2 align = center> <font color = '#1128FF'>Now Merging All the above fucntions together to make a train function

In [201]:
def trainNN(X,y,input_size,hidden_size,output_size,epochs,learning_rate):
    W1, b1, W2, b2 = parameter_initialization(input_size, hidden_size, output_size)
    for i in range(epochs):
        Z1, A1, Z2 = forward_propagation(X, W1, b1, W2, b2)
        loss = computing_loss(y, Z2)

        dW1 , db1, dW2 , db2 = backward_propagation(X,y, Z1, A1, Z2,W2)

        W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)

        if i % 100 == 0:
            print(f'Epoch {i}, Loss: {loss}')

    return W1, b1, W2, b2

### Making Prediction Function



In [202]:
def predict (X , W1 , b1, W2, b2):
    a,b,Z2 = forward_propagation(X,W1,b1,W2,b2)
    return Z2


### Importing Data

In [176]:
from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
liver_disorders = fetch_ucirepo(id=60) 

X = liver_disorders.data.features.values.T
y = liver_disorders.data.targets.values.T





### Now Using The TrainNN Function

In [264]:
print(X.shape)
print(y.shape)

# Splitting Data
X_train, X_test, y_train, y_test = train_test_split(X.T, y.T, test_size=0.2, random_state=42)
X_train = X_train.T
X_test = X_test.T
y_train = y_train.T
y_test = y_test.T


# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.T).T
X_test = scaler.transform(X_test.T).T



# Setting Up Hyperparameters For Neural Network
input_size = X_train.shape[0]
hidden_size = 2
output_size = y_train.shape[0]
epochs = 40000
learning_rate = 0.001

W1, b1, W2, b2 = trainNN(X_train, y_train, input_size, hidden_size, output_size, epochs, learning_rate)


(5, 345)
(1, 345)
Epoch 0, Loss: 3189.134682082662
Epoch 100, Loss: 2891.620406919029
Epoch 200, Loss: 2648.03138641206
Epoch 300, Loss: 2448.550457700655
Epoch 400, Loss: 2285.129360462447
Epoch 500, Loss: 2151.1565648462324
Epoch 600, Loss: 2041.1842600589976
Epoch 700, Loss: 1950.7018408070421
Epoch 800, Loss: 1875.9475909381674
Epoch 900, Loss: 1813.7546894210132
Epoch 1000, Loss: 1761.4329629522247
Epoch 1100, Loss: 1716.6924277249059
Epoch 1200, Loss: 1677.6133871357133
Epoch 1300, Loss: 1642.6546320468306
Epoch 1400, Loss: 1610.67016603724
Epoch 1500, Loss: 1580.8963440414582
Epoch 1600, Loss: 1552.8914108531712
Epoch 1700, Loss: 1526.4425552993528
Epoch 1800, Loss: 1501.4716120995954
Epoch 1900, Loss: 1477.9625375948467
Epoch 2000, Loss: 1455.9171827069995
Epoch 2100, Loss: 1435.334187474295
Epoch 2200, Loss: 1416.2019291002468
Epoch 2300, Loss: 1398.4977169372523
Epoch 2400, Loss: 1382.1885564346944
Epoch 2500, Loss: 1367.231651503653
Epoch 2600, Loss: 1353.574531704228
Epoch 

### Evaluating Model

In [265]:

y_pred = predict(X_test, W1, b1, W2, b2)

# Evaluate the model
y_test_flat = y_test.flatten()
y_pred_flat = y_pred.flatten()

mse = mean_squared_error(y_test_flat, y_pred_flat)
mae = mean_absolute_error(y_test_flat, y_pred_flat)
r2 = r2_score(y_test_flat, y_pred_flat)

print(f"Mean Squared Error: {mse}")
print(f"Mean Absolute Error: {mae}")
print(f"R-squared: {r2}")


Mean Squared Error: 9.142097812255336
Mean Absolute Error: 2.4152657013875922
R-squared: 0.14116106741093226


<h2 align = center> <font color = "#FFA009"> Conclusion </font></h2>

The model evaluation shows that the model score is 14% which can be further improved by setting the hyperparameters.