## Neural Network implementation using only NumPy.

### Problem Statement: 
* In this project, we implement a neural network from scratch using only NumPy.

### General Architecture:
- __Input Layer__:
    The input layer takes 2 features per sample. The dataset contains 2400 samples (800 per class).
- __Hidden Layer__:
    We use a single hidden layer with 3 neurons and the ReLU activation function.
- __Output Layer__:
    This layer will have 3 units with Softmax activation function to output class probabilities.

### Objective:
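* Create a dataset using the `nnfs` library.
* Implement the neural network architecture on the created dataset.
* Train the neural network by randomly perturbing its weights and biases, using forward propagation to compute the loss and keeping only the changes that lower it.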
* Evaluate the accuracy of the neural network. 

### Dataset:
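* The dataset is created using the `vertical_data` function from the `nnfs` library.
* `vertical_data()` generates a synthetic two-dimensional dataset with a fixed number of samples per class, together with an integer class label for each sample.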

**Import libraries**

In [1]:
import numpy as np
import nnfs
from nnfs.datasets import vertical_data


nnfs.init()

**Create a fully connected (dense) layer using the `Layer_Dense` class.**

### Initialization:

* `n_inputs`: The number of input features to the layer.
* `n_neurons`: The number of neurons in the layer.
* `weights`: The weight matrix, initialized with small random values (standard normal samples scaled by 0.10). It has a shape of (n_inputs, n_neurons).
* `bias`: The bias vector initialized to zeros. It has a shape of (1, n_neurons).

### Forward Pass:

* `input`: The input to the layer, which is typically the output of the previous layer or the input data.
* `output`: The output of the layer, computed as the dot product of the input and the weights, plus the bias.
* The forward pass is essential for propagating the input through the network to produce an output.

In [2]:
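class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        # Small random weights, shape (n_inputs, n_neurons); biases start at zero.
        self.weights = 0.10 * np.random.randn(n_inputs, n_neurons)
        self.bias = np.zeros((1, n_neurons))

    def forward(self, input):
        # (batch, n_inputs) @ (n_inputs, n_neurons) + (1, n_neurons) -> (batch, n_neurons)
        self.output = np.dot(input, self.weights) + self.bias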
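
To make the shapes in the forward pass concrete, here is a minimal standalone sketch of the same computation outside the class. The batch values and the seeded generator `rng` are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

n_inputs, n_neurons = 2, 3
weights = 0.10 * rng.standard_normal((n_inputs, n_neurons))  # shape (2, 3)
bias = np.zeros((1, n_neurons))                               # shape (1, 3)

# A made-up batch of 4 samples with 2 features each.
batch = np.array([[0.1, 0.5],
                  [0.4, 0.6],
                  [0.7, 0.4],
                  [0.2, 0.3]])

# (4, 2) @ (2, 3) -> (4, 3); the (1, 3) bias is broadcast across all 4 rows.
output = np.dot(batch, weights) + bias
print(output.shape)  # (4, 3)
```

Each row of `output` holds the 3 neuron activations for one sample, which is why the next layer must accept 3 inputs.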

**Implement ReLU activation function.**

* ReLU introduces the non-linearity that lets the network model patterns a purely linear stack of layers cannot represent.

* ReLU is defined as:

$$\text{ReLU}(x) = \max(0, x)$$


* The function returns the input unchanged when it is positive and returns zero otherwise.

In [3]:
class Activation_ReLu:
    def forward(self, input):
        # Element-wise max(0, x): negative values become 0, positive values pass through.
        self.output = np.maximum(0, input)
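
As a quick check of the definition, the following sketch applies the same `np.maximum` call to a small hand-written array. The values in `z` are made up for illustration.

```python
import numpy as np

# Made-up pre-activation values for 2 samples and 3 neurons.
z = np.array([[-2.0, 0.0, 3.5],
              [1.2, -0.7, 0.0]])

# Negative entries are replaced by 0; non-negative entries are unchanged.
relu_out = np.maximum(0, z)
print(relu_out)
```

The output keeps the shape of the input, so ReLU can be placed between any two dense layers without changing their dimensions.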

**Implement the softmax activation function.**

* The softmax function is commonly used in the output layer of a neural network for classification tasks. 
* It converts raw output scores into probabilities, making it easier to interpret the output of the model.

$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}$$

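Here $K$ is the number of classes. The outputs form a probability distribution: each value lies between 0 and 1, and together they sum to 1. The implementation subtracts each row's maximum before exponentiating, which leaves the result unchanged but prevents overflow in `np.exp`.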

In [4]:
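class Activation_Softmax:
    def forward(self, input):
        # Subtract each row's maximum before exponentiating to avoid overflow.
        exp_value = np.exp(input - np.max(input, axis=1, keepdims=True))
        # Normalize each row so that its probabilities sum to 1.
        probability = exp_value / np.sum(exp_value, axis=1, keepdims=True)
        self.output = probability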
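
The effect of subtracting the row maximum can be seen in a small standalone sketch. The helper name `softmax` and the score values are assumptions made for this illustration. The second row contains scores large enough that a naive `np.exp` would overflow.

```python
import numpy as np

def softmax(scores):
    # Shift each row so its largest score becomes 0; the ratios are unchanged.
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_values = np.exp(shifted)
    return exp_values / np.sum(exp_values, axis=1, keepdims=True)

# Two made-up rows that differ only by a constant offset of 999.
scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])

probs = softmax(scores)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1
```

Both rows yield the same probabilities, because softmax depends only on the differences between scores within a row.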
        

**Loss calculation**

The loss quantifies the difference between the network's predicted output and the ground-truth labels. The base `Loss` class averages the per-sample losses returned by a subclass's `forward` method.

In [5]:
class Loss:
    def calculate(self, output, y):
        # Per-sample losses come from the subclass's forward(); return their mean.
        sample_losses = self.forward(output, y)
        data_loss = np.mean(sample_losses)
        return data_loss

**Implementing categorical cross-entropy loss**

It measures the dissimilarity between the predicted probability distribution (the output of the model) and the true probability distribution (the target labels, given as class indices or one-hot vectors).

* The forward method takes two arguments: `y_pred` (predicted probabilities) and `y_true` (true class labels).
* It first clips the predicted probabilities to the range [1e-7, 1 - 1e-7] so that the logarithm of 0 is never taken.
* Next, it selects the predicted probability of the correct class for each sample, handling both integer class labels and one-hot encoded labels.
* Finally, it computes the negative log-likelihood of each selected probability and returns these values as the per-sample losses.

In [6]:
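class Loss_CategoricalCrossEntropy(Loss):
    def forward(self, y_pred, y_true):
        samples = len(y_pred)
        # Clip to keep log() away from 0.
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        if len(y_true.shape) == 1:
            # Integer labels: pick the predicted probability at each true class index.
            correct_confidence = y_pred_clipped[range(samples), y_true]
        elif len(y_true.shape) == 2:
            # One-hot labels: mask out all but the true class, then sum per row.
            correct_confidence = np.sum(y_pred_clipped * y_true, axis=1)
        else:
            raise ValueError("y_true must be a 1D array of class indices or a 2D one-hot array")

        negative_log_likelihoods = -np.log(correct_confidence)
        return negative_log_likelihoods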
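
A small worked example may help show that the two label formats lead to the same loss. The probabilities and labels below are made up for illustration, and the computation is written out inline rather than through the class.

```python
import numpy as np

# Made-up softmax outputs for 3 samples and 3 classes.
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])

y_sparse = np.array([0, 1, 2])   # integer class labels
y_onehot = np.eye(3)[y_sparse]   # the same labels, one-hot encoded

clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

# Integer labels: index the probability of the true class in each row.
conf_sparse = clipped[np.arange(len(clipped)), y_sparse]
# One-hot labels: zero out the other classes and sum across each row.
conf_onehot = np.sum(clipped * y_onehot, axis=1)

loss_sparse = -np.log(conf_sparse)
loss_onehot = -np.log(conf_onehot)
mean_loss = loss_sparse.mean()

print(loss_sparse)  # approximately [0.357, 0.223, 0.916]
print(mean_loss)    # approximately 0.4987
```

The third sample has the lowest confidence in its true class (0.4) and therefore contributes the largest loss.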

**Generate sample dataset**

To create the dataset, we use the `vertical_data` function from `nnfs.datasets`.

The function takes two arguments:
* `samples` : 800 - The number of samples to generate per class, so the full dataset contains 800 × 3 = 2400 samples.
* `classes` : 3 - The number of distinct classes for the target labels `y`.


In [7]:
x,y = vertical_data(samples=800,classes=3)

In [8]:
# Preview the first rows of the dataset as a table:

import pandas as pd
df = pd.DataFrame(x, columns=['Column1', 'Column2'])
df['Labels'] = y

rows, cols = df.shape
print(df.head())
print(f'\n{rows} Rows & {cols} Cols.')

    Column1   Column2  Labels
0  0.176405  0.641117       0
1  0.040016  0.578580       0
2  0.097874  0.494253       0
3  0.224089  0.460878       0
4  0.186756  0.594092       0

2400 Rows & 3 Cols.
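
The 2400 rows follow from 800 samples for each of the 3 classes. The sketch below is a hypothetical NumPy-only stand-in that produces data with the same shapes and label layout. The function name `make_vertical_like`, the cluster centres, and the spread are assumptions chosen for illustration; this is not the `nnfs` implementation, and its values will differ from the table above.

```python
import numpy as np

def make_vertical_like(samples, classes, seed=0):
    """Hypothetical stand-in: `samples` points per class in side-by-side 2D clusters."""
    rng = np.random.default_rng(seed)
    x = np.zeros((samples * classes, 2), dtype=np.float32)
    y = np.zeros(samples * classes, dtype=np.uint8)
    for c in range(classes):
        rows = slice(samples * c, samples * (c + 1))
        # Assumed layout: clusters share a vertical centre and are spread horizontally.
        x[rows, 0] = 0.1 * rng.standard_normal(samples) + c / classes
        x[rows, 1] = 0.1 * rng.standard_normal(samples) + 0.5
        y[rows] = c
    return x, y

x_demo, y_demo = make_vertical_like(samples=800, classes=3)
print(x_demo.shape, y_demo.shape)  # (2400, 2) (2400,)
print(np.bincount(y_demo))         # [800 800 800]
```

The labels are stored in class order, which matches the preview above where the first rows all belong to class 0.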


**Define the architecture of neural network using layers and activation functions.**

In [9]:
dense1=Layer_Dense(2,3)
activation1=Activation_ReLu()

dense2=Layer_Dense(3,3)
activation2=Activation_Softmax()


In [10]:
loss_function=Loss_CategoricalCrossEntropy()

**Tracking Best Model Parameters**

Before training, we store a copy of the weights and biases of both dense layers (`dense1` and `dense2`) as the best parameters found so far.

We also initialize `lowest_loss` to a large value (9999999) so that the first loss computed during training is guaranteed to be lower.

Training is a random search. Each iteration adds small random noise to every weight and bias, runs a forward pass, and computes the loss. If the loss is lower than `lowest_loss`, the new parameters and loss are saved as the best so far; otherwise the parameters are restored to the saved best values.
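
The idea of keeping the best parameters and accepting a change only when it lowers the loss can be sketched on a toy problem. The quadratic `loss_fn`, its target point, and the seeded generator are assumptions for illustration; the toy loss merely stands in for the network's cross-entropy loss.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

def loss_fn(w):
    # Toy loss with its minimum (0.0) at w = [3, -1].
    return float(np.sum((w - np.array([3.0, -1.0])) ** 2))

best_w = np.zeros(2)
lowest_loss = loss_fn(best_w)  # 10.0 at the starting point

for _ in range(2000):
    # Propose new parameters by adding small Gaussian noise to the best ones.
    w = best_w + 0.05 * rng.standard_normal(2)
    loss = loss_fn(w)
    if loss < lowest_loss:
        # Accept: remember the improved parameters and their loss.
        best_w, lowest_loss = w.copy(), loss
    # Otherwise the proposal is discarded and the next one starts from best_w again.

print(best_w, lowest_loss)
```

This search needs no gradients, but it scales poorly as the number of parameters grows, since most random proposals do not improve the loss.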



In [11]:
lowest_loss=9999999
best_dense1_weights=dense1.weights.copy()
best_dense1_bias=dense1.bias.copy()
best_dense2_weights=dense2.weights.copy()
best_dense2_bias=dense2.bias.copy()

In [12]:
for iterations in range(1001):

    # Propose new parameters by adding small Gaussian noise to the current ones.
    dense1.weights += 0.05 * np.random.randn(*dense1.weights.shape)
    dense1.bias += 0.05 * np.random.randn(*dense1.bias.shape)
    dense2.weights += 0.05 * np.random.randn(*dense2.weights.shape)
    dense2.bias += 0.05 * np.random.randn(*dense2.bias.shape)

    # Forward pass through both layers.
    dense1.forward(x)
    activation1.forward(dense1.output)
    dense2.forward(activation1.output)
    activation2.forward(dense2.output)

    loss = loss_function.calculate(activation2.output, y)

    # Accuracy: fraction of samples whose most probable class matches the label.
    predictions = np.argmax(activation2.output, axis=1)
    accuracy = np.mean(predictions == y)

    if iterations % 50 == 0:
        print("iteration: ",iterations, " loss: ",loss, f"accuracy: {accuracy:.2f}")

    if loss < lowest_loss:
        # Improvement: save the current parameters as the new best.
        best_dense1_weights = dense1.weights.copy()
        best_dense1_bias = dense1.bias.copy()
        best_dense2_weights = dense2.weights.copy()
        best_dense2_bias = dense2.bias.copy()
        lowest_loss = loss

    else:
        # No improvement: revert to the best parameters found so far.
        dense1.weights = best_dense1_weights.copy()
        dense1.bias = best_dense1_bias.copy()
        dense2.weights = best_dense2_weights.copy()
        dense2.bias = best_dense2_bias.copy()



iteration:  0  loss:  1.1013985 accuracy: 0.33
iteration:  50  loss:  1.0087012 accuracy: 0.62
iteration:  100  loss:  0.89863473 accuracy: 0.78
iteration:  150  loss:  0.7762941 accuracy: 0.67
iteration:  200  loss:  0.7016407 accuracy: 0.86
iteration:  250  loss:  0.5616108 accuracy: 0.78
iteration:  300  loss:  0.4245901 accuracy: 0.92
iteration:  350  loss:  0.40391943 accuracy: 0.89
iteration:  400  loss:  0.33655158 accuracy: 0.92
iteration:  450  loss:  0.2871132 accuracy: 0.93
iteration:  500  loss:  0.25195062 accuracy: 0.94
iteration:  550  loss:  0.22821032 accuracy: 0.94
iteration:  600  loss:  0.20495217 accuracy: 0.93
iteration:  650  loss:  0.18344648 accuracy: 0.93
iteration:  700  loss:  0.17568047 accuracy: 0.94
iteration:  750  loss:  0.18093361 accuracy: 0.93
iteration:  800  loss:  0.18032336 accuracy: 0.94
iteration:  850  loss:  0.16782731 accuracy: 0.93
iteration:  900  loss:  0.16415131 accuracy: 0.94
iteration:  950  loss:  0.16705398 accuracy: 0.93
iteration:

The network reaches roughly 94% accuracy on the training data.
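
For reference, the accuracy figure is the fraction of samples whose highest-probability class matches the label. The sketch below reproduces that calculation on made-up probabilities; the one-hot branch is an added assumption, since the notebook itself only uses integer labels.

```python
import numpy as np

# Made-up softmax outputs for 4 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.3, 0.6],
                  [0.4, 0.5, 0.1]])
y_true = np.array([0, 1, 2, 0])

# Predicted class = index of the largest probability in each row.
predictions = np.argmax(probs, axis=1)
accuracy = np.mean(predictions == y_true)
print(accuracy)  # 0.75, since the last sample is misclassified

# If labels were one-hot encoded, convert them back to class indices first.
y_onehot = np.eye(3, dtype=int)[y_true]
accuracy_onehot = np.mean(predictions == np.argmax(y_onehot, axis=1))
```

Because this accuracy is measured on the same samples used for training, it does not indicate how the network would perform on unseen data.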