In [1]:
import numpy as np # linear algebra

# Neural Network Basics

This is my personal documentation of my journey of learning the basics of Neural Networks.

Thanks to: [sentdex](https://www.youtube.com/@sentdex) for his lessons in NNFS (Neural Networks from Scratch)

#### First, let's import his module for accessing his datasets

In [2]:
!pip install nnfs

Collecting nnfs
  Downloading nnfs-0.5.1-py3-none-any.whl (9.1 kB)
Installing collected packages: nnfs
Successfully installed nnfs-0.5.1


In [3]:
import nnfs

An example of a 2-D array of inputs:

In [4]:
X = [[1, 2, 3, 2.5],
     [2.0, 5.0, -1.0, 2.0],
     [-1.5, 2.7, 3.3, -0.8]]

## Neural Network:
Okay, so here is a picture of a Neural Network for context:

![](https://victorzhou.com/media/nn-series/network.svg)

It consists of:

* **Neurons:** The circles in the picture are called neurons.
    Each neuron has:
    * **Bias:** A value added (+) to the neuron's weighted sum of inputs.
    * **Activation Function:** A function applied to that sum, which decides how strongly the neuron contributes to the output decision.


* **Layers:**
    * Input Layer: The first layer, where the model receives its inputs (in blue)
    * Hidden Layers: The middle two (or however many) layers, each consisting of 6 neurons in this example (in black)
    * Output Layer: The last layer, which produces the output of the model (in green)


* **Weights:** The lines joining each neuron of a layer to the neurons of its neighbouring layers are called weights. Each weight is multiplied with the output of the previous layer's neuron that it connects from.


### The Neural Network Equation:


![](https://pub.mdpi-res.com/universe/universe-08-00120/article_deploy/html/images/universe-08-00120-g001-550.jpg?1645603658)



Very similar to the equation of a straight line: y = mx + b
Where,
* **y** = output
* **m** = weights
* **x** = inputs
* **b** = bias

A neuron takes the **outputs of the previous layer** as its **inputs**, **multiplies each one** by the **weight** of its **connection (line)**, and **adds the results together**. The **neuron's bias** is then **added** to that sum, the total goes through the **activation function**, and the result is **output** to **the next layer**.
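To make the equation concrete, here is a minimal sketch of a single neuron. The input, weight and bias values are made up for illustration, and ReLU (introduced further down) is assumed as the activation function.

```python
import numpy as np

# Hypothetical values for one neuron with three inputs.
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.8, -0.5])
bias = 2.0

# Weighted sum plus bias: 1*0.2 + 2*0.8 + 3*(-0.5) + 2.0, roughly 2.3
z = np.dot(inputs, weights) + bias

# Activation (ReLU assumed here): negative sums become 0, positive sums pass through.
output = max(0.0, z)

print(z, output)
```

Since the weighted sum is positive here, the activation passes it through unchanged.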

### Let's create our code for a Layer

**Layer Dense**

First, we are defining the weights:
* **Weights** will be an array of **random numbers** with the shape **(n_inputs, n_neurons)**, since **every neuron needs one weight for each of its inputs.**

We will also **scale down** the weights by **multiplying them by a number n where 1>n>0**; here let's take 0.10. Note that np.random.rand draws from the range [0, 1), so all the starting weights will be small and non-negative.
> self.weight = 0.10 * np.random.rand(n_inputs, n_neurons)

Next, we will define the biases:

* **Biases** will be (for now) **an array of zeros**, one **for each neuron**, so the **shape** will be: **(1, n_neurons)**

Why? 
Well, if the starting values are too big, the numbers can **grow from layer to layer** and **might just explode** by the **time they reach the output layer**. We **don't** want very big numbers because **they might overflow and become incomputable**, hence we will keep the biases at 0 for now.
> self.biases = np.zeros((1, n_neurons))
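A quick sketch to check the shapes described above. The sizes (4 inputs, 3 neurons) are arbitrary values picked for illustration.

```python
import numpy as np

n_inputs, n_neurons = 4, 3  # hypothetical sizes

# One weight per (input, neuron) pair, scaled down by 0.10.
weights = 0.10 * np.random.rand(n_inputs, n_neurons)

# One bias per neuron, stored as a single row.
biases = np.zeros((1, n_neurons))

print(weights.shape)  # (4, 3)
print(biases.shape)   # (1, 3)
```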

**Drawback of having 0 biases:**

If the **weighted sum for a neuron comes out as 0** (or is cut to 0 by the activation function), a **bias of zero** does nothing to lift it, so the **input for the next layer will be 0**, and so will its contribution there, since anything multiplied by 0 is 0. 

**We do not want** that, **since it can create a ripple effect of outputs of 0 through each layer**, and we will be left with something called a **dead network**.

So, if we have a **dead network**, try changing the biases to a **non-zero number**.
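A small sketch of the idea, using an all-zero input as the simplest case. The sizes and the bias value 0.5 are assumptions chosen only for illustration.

```python
import numpy as np

inputs = np.zeros((1, 3))                # a sample that is all zeros
weights = 0.10 * np.random.rand(3, 2)    # any weights at all

zero_biases = np.zeros((1, 2))
nonzero_biases = np.full((1, 2), 0.5)    # hypothetical non-zero biases

# With zero biases the zeros simply pass through to the next layer.
dead_output = np.dot(inputs, weights) + zero_biases

# A non-zero bias gives the next layer something to work with.
live_output = np.dot(inputs, weights) + nonzero_biases

print(dead_output)  # [[0. 0.]]
print(live_output)  # [[0.5 0.5]]
```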

##### **Forward method of Layer Dense:**

It will simply output the dot product of the inputs and the weights, plus the biases, i.e. y = mx + b for each neuron, hence:
> self.output = np.dot(inputs, self.weight) + self.biases
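Here is a rough sketch of that line on a small batch, with hand-picked weights and biases (assumptions, not values from the notebook) so the result is easy to verify by hand. The (1, 2) bias row is broadcast across all 3 samples.

```python
import numpy as np

# A batch of 3 samples with 4 features each.
inputs = np.array([[1.0, 2.0, 3.0, 2.5],
                   [2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])

# Hypothetical weights of shape (n_inputs, n_neurons) = (4, 2).
# Neuron 0 adds features 0 and 2, neuron 1 adds features 1 and 3.
weights = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 0.0],
                    [0.0, 1.0]])
biases = np.array([[0.5, -1.0]])

# (3, 4) dot (4, 2) gives (3, 2): one row per sample, one column per neuron.
output = np.dot(inputs, weights) + biases

print(output.shape)  # (3, 2)
print(output)
```

The first row works out to 1 + 3 + 0.5 = 4.5 for neuron 0 and 2 + 2.5 - 1 = 3.5 for neuron 1.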

In [5]:
class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weight = 0.10 * np.random.rand(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weight) + self.biases      

## Activation Function

Each neuron **should** have an activation function.

**Activation Function:** A function which basically **decides whether** a **neuron should be activated or not** based on its **weighted sum of inputs**. The **choice of activation function** affects the network's ability to learn **complex patterns** and **relationships** in data.

For our example, we will be using the **ReLU (Rectified Linear Unit)** function.

### Rectified Linear Unit (ReLU) function

![](https://assets-global.website-files.com/5d7b77b063a9066d83e1209c/60d24d1ac2cc1ded69730feb_relu.jpg)

**ReLU function:**

> f(x) = max(0, x)

It basically **outputs** **0 (not activated)** when the **weighted sum of a neuron's inputs** is zero or negative, and passes the **positive value 'x' through unchanged (activated)** otherwise.
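A tiny sketch of ReLU on a batch of made-up weighted sums, using the same np.maximum call as the class below:

```python
import numpy as np

# Hypothetical weighted sums for 2 samples and 3 neurons.
weighted_sums = np.array([[-2.0, 0.0, 1.5],
                          [3.2, -0.7, 0.1]])

# Element-wise max(0, x): negatives become 0, positives are kept as they are.
activated = np.maximum(0, weighted_sums)

print(activated)
# [[0.  0.  1.5]
#  [3.2 0.  0.1]]
```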

In [6]:
class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)

Next we will make a **Softmax Activation** for our output layer

## Softmax Activation

![](https://miro.medium.com/v2/resize:fit:1232/0*GxuMPOpGsMoN5RwI)

Softmax Activation function: 

![](https://docs-assets.developer.apple.com/published/c2185dfdcf/0ab139bc-3ff6-49d2-8b36-dcc98ef31102.png)

## Softmax vs Sigmoid

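The softmax curve might look like the sigmoid curve, but there is quite a bit of difference: sigmoid squashes each value on its own, while softmax normalizes a whole set of values together so that they add up to 1.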

![](https://i.stack.imgur.com/iJ6vX.png)

## What is the softmax function doing?

Basically, it is putting the output values (output of the model) through:
* The exponential function, i.e. y = e^x
* Then normalizing the results

**Remember:** The **inputs** here are the **outputs of our model/output layer**, hence it will be in **batch form** (since we will have a whole dataset of inputs for any given model, in most cases).



* First we will subtract the max value of each row (sample) in the batch from every value in that row, then exponentiate the result.

> y = e^(x - max(x))

So:

> exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))

#### Why are we doing that?


Subtracting each row's maximum value from every value in that row will:
* Turn the **highest number** in that row into **0**
* Make the **rest of the numbers** **less than or equal to 0**

Then when we exponentiate it:
* The **max number, now 0**, will **become 1** since **e^0 = 1.0**
* All the **other numbers(n)** in the row will land in the range **1>=n>0** (classification heaven: 0-1)

This keeps np.exp from overflowing on large inputs, and it does not change the final softmax output, because the same shift cancels out when we divide by the sum in the next step.
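One practical reason for the subtraction is that np.exp overflows to inf for large inputs. The sketch below uses deliberately large made-up values to show the difference; np.errstate is only there to silence the overflow warning from the naive version.

```python
import numpy as np

# Hypothetical, deliberately large outputs for a single sample.
logits = np.array([[1000.0, 999.0, 998.0]])

# Naive exponentiation overflows to inf.
with np.errstate(over="ignore"):
    naive = np.exp(logits)

# Shifting by the row maximum keeps every exponent at or below 0.
shifted = logits - np.max(logits, axis=1, keepdims=True)
stable = np.exp(shifted)

print(naive)    # [[inf inf inf]]
print(shifted)  # [[ 0. -1. -2.]]
print(stable)   # roughly [[1.0, 0.368, 0.135]]
```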

In the following code: 
> exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))

**Remember, since:**

* **we want the max of each row (sample)**, and **not of the entire batch**, we **do axis=1**.
* And **keepdims=True** keeps the result as a **column of shape (n_samples, 1)**, so that it **lines up with the rows of the input** when we subtract.
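The effect of the two arguments is easiest to see on a small made-up batch:

```python
import numpy as np

batch = np.array([[1.0, 5.0, 3.0],
                  [7.0, 2.0, 4.0]])

whole = np.max(batch)                            # 7.0, max of the entire array
per_row = np.max(batch, axis=1)                  # [5. 7.], shape (2,)
per_row_kept = np.max(batch, axis=1, keepdims=True)  # [[5.] [7.]], shape (2, 1)

# The (2, 1) column broadcasts across each row of the (2, 3) batch.
shifted = batch - per_row_kept

print(per_row.shape, per_row_kept.shape)  # (2,) (2, 1)
print(shifted)
# [[-4.  0. -2.]
#  [ 0. -5. -3.]]
```

Without keepdims=True the result has shape (2,), which would not line up with a (2, 3) batch when subtracting.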

### Next, in the softmax function:

* **Dividing the exponential values** by the **sum of the exponential values of their respective rows**

**Why?**

So that we have a **normalized set of values for our outputs** (each row adds up to 1, so it can be read as one probability per class), which will **help us achieve** our **goal of creating a neural network** which **works well with a multi-class classification problem**.
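A small worked example of the normalization step, with made-up exponentiated values chosen so the arithmetic is easy to follow:

```python
import numpy as np

# Hypothetical exponentiated values for 2 samples and 3 classes.
exp_values = np.array([[1.0, 2.0, 1.0],
                       [3.0, 1.0, 4.0]])

# Divide every value by the sum of its own row.
probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)

print(probabilities)
# [[0.25  0.5   0.25 ]
#  [0.375 0.125 0.5  ]]
print(probabilities.sum(axis=1))  # [1. 1.]
```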


In [7]:
class Activation_Softmax:
    def forward(self, inputs):
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

Now, we will **input a dataset** which looks like a **spiral** when **scatter plotted**.

Something like:

![](https://telesens.co/wp-content/uploads/2017/09/img_59cbd07be1178.png)

Link to spiral dataset code: https://github.com/Sentdex/nnfs/blob/master/nnfs/datasets/spiral.py

### Training Spiral Data:

Let's create training data in a **spiral format!**

In [8]:
from nnfs.datasets import spiral_data

X, y = spiral_data(samples=100, classes=3)

#### For starters, let's create a **basic first dense layer** (2 input features, 3 neurons):

In [9]:
basic_dense1 = Layer_Dense(2,3)

In [10]:
print(basic_dense1)

<__main__.Layer_Dense object at 0x79de06cd15a0>


#### Now, we create an **activation function** for basic_dense1

In [11]:
basic_activation1 = Activation_ReLU()

#### Now, let's create the output layer:

With **softmax activation**.

In [12]:
basic_dense2 = Layer_Dense(3, 3)
basic_activation2 = Activation_Softmax()

#### Let's test it out!

In [13]:
basic_dense1.forward(X)
basic_activation1.forward(basic_dense1.output)

basic_dense2.forward(basic_activation1.output)
basic_activation2.forward(basic_dense2.output)

In [14]:
print(basic_activation2.output[:5])

[[0.33333333 0.33333333 0.33333333]
 [0.33333844 0.33332724 0.33333432]
 [0.33335953 0.33330719 0.33333328]
 [0.33337674 0.33329111 0.33333215]
 [0.3333764  0.33328889 0.33333471]]
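As a sanity check on the printed rows, the sketch below copies them by hand and confirms that each row sums to roughly 1 and sits very close to a uniform 1/3 per class, which is what one would expect from an untrained network with small random weights.

```python
import numpy as np

# First five rows copied from the printed output above.
output = np.array([[0.33333333, 0.33333333, 0.33333333],
                   [0.33333844, 0.33332724, 0.33333432],
                   [0.33335953, 0.33330719, 0.33333328],
                   [0.33337674, 0.33329111, 0.33333215],
                   [0.3333764, 0.33328889, 0.33333471]])

# Each row is a probability distribution over the 3 classes.
row_sums = output.sum(axis=1)

# Largest gap between any probability and a uniform 1/3.
max_gap = np.abs(output - 1.0 / 3.0).max()

print(row_sums)
print(max_gap)
```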


## Everything is working great!

That was our first ever neural network! Congrats!

3.12.2023

Link to github repo: https://github.com/PrathamGhoshRoy/NeuralNetworkBasics/tree/main