In [1]:
import numpy as np # linear algebra

# Neural Network Basics

This is my personal documentation of my journey of learning the basics of Neural Networks.

Thanks to: [sentdex](https://www.youtube.com/@sentdex) for his lessons in NNFS (Neural Networks from Scratch)

#### First, let's import his module to access his datasets

In [2]:
!pip install nnfs

Collecting nnfs
  Downloading nnfs-0.5.1-py3-none-any.whl (9.1 kB)
Installing collected packages: nnfs
Successfully installed nnfs-0.5.1


In [3]:
import nnfs

An example of a 2-D array of inputs:

In [4]:
X = [[1, 2, 3, 2.5],
     [2.0, 5.0, -1.0, 2.0],
     [-1.5, 2.7, 3.3, -0.8]]

## Neural Network:
Okay, so here is a picture of a Neural Network for context:

![](https://victorzhou.com/media/nn-series/network.svg)

It consists of:

* **Neurons:** The circles are called neurons.
    Each neuron has:
    * **Bias:** A value added (+) to the neuron's weighted sum of inputs.
    * **Activation Function:** A function applied to that sum, which determines how strongly the neuron contributes to the output decision.


* **Layers:**
    * Input Layer: The first layer, where the model receives its inputs (in blue)
    * Hidden Layers: The middle two (or however many), each consisting of 6 neurons in this example (in black)
    * Output Layer: The last layer, which produces the output of the model (in green)
    
    
* **Weights:** The lines joining each neuron of one layer to the neurons of its neighbouring layers are called weights. Each weight is multiplied with the output of the previous-layer neuron it comes from.


### The Neural Network Equation:


![](https://pub.mdpi-res.com/universe/universe-08-00120/article_deploy/html/images/universe-08-00120-g001-550.jpg?1645603658)



It is very similar to the equation: y = mx + b

Where,
* **y** = output
* **m** = weights
* **x** = inputs
* **b** = bias

What it basically does is take the **inputs** from the **previous layer's neurons**, **multiply them** by the **weights** (the **connections/lines**), and **sum them** once they **reach the current layer's neuron**. The **neuron's bias** is then **added**, the result goes through the **activation function**, and the neuron **outputs** it to **the next layer**.
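To make the order of operations concrete, here is a minimal sketch of a single neuron. The input, weight, and bias values are made up purely for illustration, and ReLU (covered later in this notebook) is assumed as the activation function.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
inputs = np.array([1.0, 2.0, 3.0, 2.5])     # outputs of the previous layer
weights = np.array([0.2, 0.8, -0.5, 1.0])   # one weight per incoming connection
bias = 2.0                                   # one bias for this neuron

# Weighted sum of the inputs plus the bias (the "mx + b" part)
weighted_sum = np.dot(inputs, weights) + bias

# The activation function is applied last (ReLU assumed here)
output = max(0.0, weighted_sum)

print(weighted_sum, output)  # both are approximately 4.8
```

The bias is added before the activation function runs, so the activation sees the full `mx + b` value.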

### Let's create our code for a Layer

**Layer Dense**

First, we are defining the weights:
* **Weights** will be an array of **random numbers** of the dimension: **(number of inputs, n_neurons)** since **we need weights of each and every input per neuron.**

We will also **scale down** the weights by **multiplying them by a number n with 0 < n < 1**; here, let's take 0.10
> self.weight = 0.10 * np.random.rand(n_inputs, n_neurons)

Next, we will define the biases:

* **Biases** will be (for now), **an array of zeroes** for **each neuron**, so the **dimension** will be: **(1, number of neurons)**

Why?
Well, if the values of the biases are too big, the numbers passing through the network **might just explode** by the **time they reach the output layer**. We **don't** want very big numbers because **they might overflow and become incomputable**, hence we will keep the biases at 0 for now.
> self.biases = np.zeros((1, n_neurons))

**Drawback of having 0 biases:**

If the **weighted sum for a neuron comes out as 0** and our bias (i.e. the value being added to it) is also zero, the neuron outputs 0, so the **input for the next layer will be 0**, and so will its output, since anything multiplied by 0 is 0.

**We do not want** that **since that will create a ripple effect of outputs of 0 for each layer** and we will be left with something called a **dead network**.

So, if we have a **dead network**, try changing the biases to a **non-zero number**.

##### **Forward method of Layer Dense:**

It will simply output the dot product of the inputs and weights plus the biases, i.e. y = mx + b for each neuron, hence:
> self.output = np.dot(inputs, self.weight) + self.biases

In [5]:
class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weight = 0.10 * np.random.rand(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weight) + self.biases      
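As a quick sanity check of the shapes involved, the sketch below repeats the same dot product outside the class. The layer size (4 inputs, 5 neurons) and the input batch are assumptions made only for this example.

```python
import numpy as np

# Hypothetical batch of 3 samples with 4 features each
inputs = np.array([[1, 2, 3, 2.5],
                   [2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])

n_inputs, n_neurons = 4, 5                             # hypothetical layer size
weights = 0.10 * np.random.rand(n_inputs, n_neurons)   # shape (4, 5)
biases = np.zeros((1, n_neurons))                      # shape (1, 5)

# (3, 4) dot (4, 5) -> (3, 5); the (1, 5) biases broadcast across the 3 rows
output = np.dot(inputs, weights) + biases
print(output.shape)  # (3, 5)
```

Storing the weights as (n_inputs, n_neurons) means no transpose is needed in the forward pass, and the output has one row per sample and one column per neuron.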

## Activation Function

Each neuron **should** have an activation function.

**Activation Function:** A function which basically **decides whether** a **neuron should be activated or not** based on its **weighted sum of inputs**. The **choice of activation function** affects the network's ability to learn **complex patterns** and **relationships** in data.

For our example, we will be using **ReLU (Rectified Linear Unit)** Function.

### Rectified Linear Unit (ReLU) function

![](https://assets-global.website-files.com/5d7b77b063a9066d83e1209c/60d24d1ac2cc1ded69730feb_relu.jpg)

**ReLU function:**

> f(x) = max(0, x)

It basically **assigns** the **weighted sum of a neuron's inputs** a value of **0 (not activated)** when it is zero or negative, and **passes 'x' through unchanged (activated)** when it is positive.

In [6]:
class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)
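A small sketch of what `np.maximum(0, inputs)` does to a batch, using made-up weighted sums:

```python
import numpy as np

# Hypothetical weighted sums for 2 samples and 3 neurons
weighted_sums = np.array([[-2.0, 0.0, 1.5],
                          [3.2, -0.1, -7.0]])

# Element-wise max against 0: negatives become 0, positives pass through
relu_output = np.maximum(0, weighted_sums)
print(relu_output)
```

Only `1.5` and `3.2` survive; every other entry becomes `0`.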

Next we will make a **Softmax Activation** for our output layer

## Softmax Activation

![](https://miro.medium.com/v2/resize:fit:1232/0*GxuMPOpGsMoN5RwI)

Softmax Activation function: 

![](https://docs-assets.developer.apple.com/published/c2185dfdcf/0ab139bc-3ff6-49d2-8b36-dcc98ef31102.png)

## Softmax vs Sigmoid

The softmax curve might look like the sigmoid curve, but there is quite a bit of difference:

![](https://i.stack.imgur.com/iJ6vX.png)

## What is the softmax function doing?

Basically, it is putting the output values (output of the model) through:
* The exponential function, i.e. y = e^x
* Then normalizing it

**Remember:** The **inputs** here are the **outputs of our model/output layer**, hence it will be in **batch form** (since we will have a whole dataset of inputs for any given model, in most cases).



* First we will exponentiate each value after subtracting the max value of its row in the batch.

> y = e^(x - max(x))

So:

> exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))

#### Why are we doing that?


Subtracting each row's maximum value from every value in that row will basically:
* Turn the **highest number** in that row into a **0**
* Make the **rest of the numbers** **less than or equal to 0**

Then when we exponentiate it:
* The **max number, now 0**, will **become 1** since **e^0 = 1.0**
* All the **other numbers** (n) in the row will be **0 < n <= 1** (classification heaven: 0-1)

This keeps **np.exp** from **overflowing on large inputs**, and it **does not change the final softmax output**, because the shift cancels out when we normalize.

In the following code:
> exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))

**Remember, since:**

* **we want the max of each respective row (sample)**, and **not of the entire array**, we **use axis=1**.
* And **keepdims=True** keeps the result as a **column of shape (n_samples, 1)**, so it **broadcasts correctly** when subtracted from the inputs.
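The sketch below illustrates the max-subtraction step on a made-up batch. The second row is deliberately large to show what the shift protects against; the values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical raw output-layer values; the second row is large on purpose
inputs = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])

row_max = np.max(inputs, axis=1, keepdims=True)  # shape (2, 1): one max per row
shifted = inputs - row_max                       # largest value in each row becomes 0
exp_values = np.exp(shifted)                     # every value now lies in (0, 1]

print(row_max.shape)   # (2, 1)
print(shifted)         # both rows become [-2, -1, 0]
print(exp_values)

# Without the shift, np.exp overflows to inf for the second row
with np.errstate(over='ignore'):
    naive = np.exp(inputs)
print(np.isinf(naive[1]).all())  # True
```

Both rows end up with identical exponentiated values, because only the differences between values within a row matter.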

### Next, in the softmax function:

* **Dividing the exponential values** by the **sum of the exponential values of respective arrays**

**Why?**

So that we have a **normalized set of values for our outputs**: each row **sums to 1**, so the values can be read as **confidences per class**. This **helps us achieve** our **goal of creating a neural network** which **works well with a multi-class classification problem**.


In [7]:
class Activation_Softmax:
    def forward(self, inputs):
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

Now, we will **input a dataset** which **looks like a spiral** when **scatter plotted**.

Something like:

![](https://telesens.co/wp-content/uploads/2017/09/img_59cbd07be1178.png)

Link to spiral dataset code: https://github.com/Sentdex/nnfs/blob/master/nnfs/datasets/spiral.py

### Training Spiral Data:

Let's create training data in a **spiral format!**

In [8]:
from nnfs.datasets import spiral_data

X, y = spiral_data(samples=100, classes=3)

#### For starters, let's create a **basic first layer** (2 input features, 3 neurons):

In [9]:
basic_dense1 = Layer_Dense(2,3)

In [10]:
print(basic_dense1)

<__main__.Layer_Dense object at 0x78760d174df0>


#### Now, we create an **activation function** for basic_dense1

In [11]:
basic_activation1 = Activation_ReLU()

#### Now, let's create the output layer:

With **softmax activation**.

In [12]:
basic_dense2 = Layer_Dense(3, 3)
basic_activation2 = Activation_Softmax()

#### Let's test it out!

In [13]:
basic_dense1.forward(X)
basic_activation1.forward(basic_dense1.output)

basic_dense2.forward(basic_activation1.output)
basic_activation2.forward(basic_dense2.output)

In [14]:
print(basic_activation2.output[:5])

[[0.33333333 0.33333333 0.33333333]
 [0.3333352  0.333324   0.33334079]
 [0.33334128 0.33331956 0.33333916]
 [0.33334248 0.33330637 0.33335114]
 [0.33334924 0.33330584 0.33334492]]


## Everything is working great!

That was our first ever neural network! Congrats!

3.12.2023

# **Calculating Loss**

## **Categorical Cross-Entropy Loss**

Here is the function in mathematical notation:

![](https://androidkt.com/wp-content/uploads/2023/05/Selection_098.png)

#### **Steps for calculating Categorical Cross-Entropy Loss (also called log-loss):**

For example's sake, let's say the following is the output of our output layer after going through the softmax function:

In [15]:
softmax_output_eg = [0.7, 0.2, 0.1]

And the following is the target output for the same:

In [16]:
target_output_eg = [1, 0, 0]

**The Categorical Cross-Entropy will calculate it as:** (for one-hot encoded targets)

> target value * log(predicted output)

for each index.

Sum over all indices and negate the result (-).

In [17]:
import math

loss_eg = -(math.log(softmax_output_eg[0])*target_output_eg[0] +
            math.log(softmax_output_eg[1])*target_output_eg[1] +
            math.log(softmax_output_eg[2])*target_output_eg[2])

print(f'Categorical Cross-Entropy Loss: {loss_eg}')

Categorical Cross-Entropy Loss: 0.35667494393873245


Since the **product** for the **indices where the target value is 0** will **be 0**, **the sum really boils down to:**

In [18]:
loss_eg = -(math.log(softmax_output_eg[0]))

print(f'Categorical Cross-Entropy Loss: {loss_eg}')

Categorical Cross-Entropy Loss: 0.35667494393873245
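The same calculation can be written in one line with NumPy. This is a minimal sketch using the same example values; the variable names are chosen just for this snippet.

```python
import numpy as np

softmax_output = np.array([0.7, 0.2, 0.1])
target_output = np.array([1, 0, 0])   # one-hot target

# Multiply element-wise, sum over the classes, then negate
loss = -np.sum(target_output * np.log(softmax_output))
print(loss)  # about 0.3567
```

The zeros in the one-hot target wipe out every term except the one for the correct class, which is why the result matches `-log(0.7)`.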


We successfully calculated **Categorical Cross-Entropy Loss**, manually!

### Now, we know that we will be making predictions in batches

#### How to **implement loss** when the **targets** are **sparse** **rather than one-hot encoded**, in batches:

Let's say our outputs from the softmax activation function are in batches like so:

In [19]:
softmax_output_eg2 = [[0.7, 0.2, 0.1],
                      [0.1, 0.5, 0.4],
                      [0.02, 0.9, 0.08]]

But, **our class targets are in a 1D vector** with the **following map**:

* 0: dog
* 1: cat
* 2: human

Our **sparse target vector** would be:

In [20]:
class_targets_eg2 = [0, 1, 1]

So, in the case of a sparse target vector, what the vector is basically saying is:

**For the first softmax output (row 0)**, it is **concerned with the element at index 0 of that list**, i.e. **in our case 0.7**

**For the second (row 1),** it is **concerned with the element at index 1 of that list**, i.e. **in our case: 0.5**

Similarly, **for the third (row 2)**, it is the element at index 1, i.e. **0.9**

In [21]:
# Explaining correspondence. Written as comments so the cell stays valid Python.

# softmax_output_eg2 = [[0.7, 0.2, 0.1],                  class_targets_eg2 = [0,
#                       [0.1, 0.5, 0.4],                                        1,
#                       [0.02, 0.9, 0.08]]                                      1]

The above code explains the correspondence. 

One way we can get the values is by:

In [None]:
for targ_ids, distribution in zip(class_targets_eg2, softmax_output_eg2):
    print(distribution[targ_ids])

We get exactly what we stated before, the values we are interested in for calculating loss.

#### **But there is a better and more efficient way to do it with NumPy:**

First, just turn our softmax outputs into a NumPy array.

Then:

> print(softmax_output_eg2[[0, 1, 2], class_targets_eg2])

What the above code is doing is using NumPy's integer array indexing:

We pass a list of row indices and a list of column indices, and NumPy returns the value at each (row, column) pair, i.e. the outputs we are interested in.

In [None]:
softmax_output_eg2 = np.array([[0.7, 0.2, 0.1],
                               [0.1, 0.5, 0.4],
                               [0.02, 0.9, 0.08]])

print(softmax_output_eg2[[0, 1, 2], class_targets_eg2])

**Nice and clean!**

We can do one more thing to make it future-proof.

I.e. **rather than hard-coding the row indices of our softmax output, we can generate them from its length**:

In [None]:
print(softmax_output_eg2[range(len(softmax_output_eg2)), class_targets_eg2])

And now, **the final touch**. Since **Categorical Cross-Entropy is the negative log of the target class's confidence**, we get:

In [None]:

neg_log_eg2 = -np.log(softmax_output_eg2[
    range(len(softmax_output_eg2)), class_targets_eg2
])
print(neg_log_eg2)

We get our losses for each sample!

Now, **to calculate the loss of the whole batch**, we just **take the mean**.

In [None]:
average_loss_eg2 = np.mean(neg_log_eg2)
print(average_loss_eg2)

#### Now, we might hit a problem with this, i.e. a confidence of 0, since -log(0) = infinity

In [None]:
print(-np.log(0))

So, **if we try to take the mean**, a **single infinite loss** makes the **mean of the whole batch infinite** too (NumPy also emits a divide-by-zero warning), so the **batch loss becomes useless**.
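A quick check of what NumPy actually returns in this situation. The confidences below are hypothetical, with the last one set to exactly 0 on purpose.

```python
import numpy as np

# Hypothetical target-class confidences; the last one is exactly 0
confidences = np.array([0.7, 0.5, 0.0])

# Silence the divide-by-zero warning so the output stays readable
with np.errstate(divide='ignore'):
    losses = -np.log(confidences)

print(losses)           # the last loss is inf
print(np.mean(losses))  # inf: one bad sample swamps the whole batch
```

The mean is computed without raising an exception, but a single infinite value makes the averaged loss carry no information about the other samples.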

A **simple fix** is to **clip the predictions away from 0 and 1 by a fairly insignificant amount**:

Like:
> -np.log(1e-7)

All the way to:
> -np.log(1-1e-7)

In [None]:
print(-np.log(1e-7))
print(-np.log(1-1e-7))

**Here is how we will clip them:**

In [None]:
y_pred_clipped = np.clip(y_pred, 1e-7, 1-1e-7)
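Here is a small runnable sketch of the clipping step. The `y_pred` array is a made-up softmax output that contains exact 0s and a 1, which are the cases clipping is meant to handle.

```python
import numpy as np

# Hypothetical softmax outputs; the first row contains an exact 1 and exact 0s
y_pred = np.array([[1.0, 0.0, 0.0],
                   [0.2, 0.5, 0.3]])

y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
losses = -np.log(y_pred_clipped)

print(np.isfinite(losses).all())  # True: no more infinities
print(losses.max())               # about 16.118, i.e. -log(1e-7)
```

Values that were already strictly between the two bounds, like the second row, pass through unchanged.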

## **Implementing the loss function into our Neural Network:**

**We will start by defining a common (parent) Loss class:**

Its "calculate" function will take in:
* **output** = predicted value outputs of the neural network
* **y** = actual values of the batches

The **forward** method that produces "**sample_losses**" might change depending on the kind of loss metric, so each child class defines its own.

**data_loss** basically takes the **mean** of those sample losses and returns it.

In [22]:
class Loss:
    def calculate(self, output, y):
        sample_losses = self.forward(output, y)
        data_loss = np.mean(sample_losses)
        return data_loss

### **The Categorical Cross-Entropy Loss Class**

It will firstly **inherit** from the **class "Loss"**

It will have a **forward method**, which is what gets called by the **first line of the Loss class' "calculate" function**:
> sample_losses = self.forward(output, y)

The forward method will take:
* **y_pred** = Predicted y values of our neural network
* **y_true** = Actual y values

First things first:

* We will calculate the number of samples in the batch (the length of the predicted values array).
* Clip the predicted y values as discussed before.

Then we need to take into account that the true y values might come either in the form of a **sparse target vector** or as **one-hot encoded 2D arrays**

**Sparse Target Vector example:**
> [1,0,1,0]

**One-Hot Encoded Vector example:**
> [[0,1], [1,0]]

So, **we want our loss function to be able to handle both of these cases**, so we **check the shape of the target array** and **deal with it accordingly**, as discussed above.

Then, we just take **-log(confidences)** and **return our losses**.

In [23]:
class Loss_CategoricalCrossEntropy(Loss):
    def forward(self, y_pred, y_true):
        samples = len(y_pred)
        y_pred_clipped = np.clip(y_pred, 1e-7, 1-1e-7)
        
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[range(samples), y_true]
            
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped*y_true, axis=1)
            
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods
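To check that both branches agree, the sketch below runs the same logic on sparse and one-hot versions of the same targets. `cce_sample_losses` is a hypothetical standalone helper written only for this example (it mirrors the forward method so the snippet can run on its own), and `np.eye` is used as one convenient way to build the one-hot rows.

```python
import numpy as np

def cce_sample_losses(y_pred, y_true):
    """Hypothetical standalone helper mirroring the forward method above."""
    y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
    if y_true.ndim == 1:      # sparse class indices
        confidences = y_pred_clipped[np.arange(len(y_pred)), y_true]
    elif y_true.ndim == 2:    # one-hot encoded rows
        confidences = np.sum(y_pred_clipped * y_true, axis=1)
    else:
        raise ValueError("y_true must be 1-D or 2-D")
    return -np.log(confidences)

y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.5, 0.4],
                   [0.02, 0.9, 0.08]])
sparse_targets = np.array([0, 1, 1])
one_hot_targets = np.eye(3, dtype=int)[sparse_targets]  # rows: [1,0,0], [0,1,0], [0,1,0]

sparse_losses = cce_sample_losses(y_pred, sparse_targets)
one_hot_losses = cce_sample_losses(y_pred, one_hot_targets)

print(np.allclose(sparse_losses, one_hot_losses))  # True
print(np.mean(sparse_losses))                      # about 0.385
```

Note that the targets need to be NumPy arrays rather than plain Python lists, since the shape check relies on array attributes.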

Now, all that is left is for us to call our loss function on our softmax activation function outputs:

In [24]:
loss_function = Loss_CategoricalCrossEntropy()
loss = loss_function.calculate(basic_activation2.output, y)

print(f'Loss: {loss}')

Loss: 1.0987117299631253


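That's about what we should expect from an untrained network: it is almost exactly ln(3) ≈ 1.0986, the loss we get when all 3 classes are guessed with equal confidence.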
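For context, a useful reference point is the loss of a model that assigns the same confidence to every class. With 3 classes, as in the spiral data above, a quick sketch gives:

```python
import numpy as np

n_classes = 3
# Each class gets confidence 1/3, so the loss is -log(1/3) = log(3)
uniform_guess_loss = -np.log(1 / n_classes)
print(uniform_guess_loss)  # about 1.0986
```

A loss near this value means the network is not yet doing better than an even guess, which is expected before any training has happened.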

# **Calculating Accuracy**

Doing that is very simple.

We basically run **np.argmax()** (a function that returns the **index of the maximum value** along a specified axis) **on our softmax outputs** to get the predicted classes, and then **take the mean of comparing them to our class targets**:

In [27]:
predictions = np.argmax(softmax_output_eg2, axis=1)
accuracy = np.mean(predictions == class_targets_eg2)

print(f'acc: {accuracy}')

acc: 1.0
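If the targets arrive one-hot encoded instead of sparse, one possible approach is to convert them back to class indices with `np.argmax` before comparing. The sketch below assumes a made-up one-hot target array in which the last sample's true class is 2, so that not every prediction is correct.

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])

# Hypothetical one-hot targets; the last sample's true class is 2
one_hot_targets = np.array([[1, 0, 0],
                            [0, 1, 0],
                            [0, 0, 1]])

predictions = np.argmax(softmax_outputs, axis=1)     # predicted classes: [0, 1, 1]
class_targets = np.argmax(one_hot_targets, axis=1)   # true classes: [0, 1, 2]

accuracy = np.mean(predictions == class_targets)
print(f'acc: {accuracy}')  # 2 of the 3 predictions are correct
```

The comparison produces an array of booleans, and taking its mean gives the fraction of correct predictions.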


**Remember:** While **accuracy is practical and useful**, the **metric we are most interested in** **while training a neural network is the loss**, which basically **tells us how wrong the model is**. Our goal is to **decrease that loss**.

Link to github repo: https://github.com/PrathamGhoshRoy/NeuralNetworkBasics/tree/main