# Lecture 1 - Building Neural Network for Intelligence

## Introduction

### Section 1: Natural Intelligence = Brain

#### Electrical Brain
1. Group of Interconnected Wires (Neuron Connections) with different amounts of Fat Insulation (Myelination)
2. Carrying electrical signals (Data)

#### Network Brain
Brain is a
1. Large 
2. Interconnected
3. Network of
4. Neurons
   1. Group of Neurons at same level are called Layer
5. Check [Brain Neural Network](http://nxxcxx.github.io/Neural-Network/)

#### Single Neuron - 3 Things
1. Input Data through Wire
2. Wire with varying Insulation Strength via Myelination
3. Output Connection
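
The three parts above map loosely onto an artificial neuron: inputs arrive on wires, each wire has a strength (a weight, loosely playing the role of myelination), and the neuron emits a single output. A minimal sketch, with input and weight values made up purely for illustration:

```python
import torch

# Hypothetical values, chosen only for illustration
inputs  = torch.tensor([1.0, 2.0, 3.0])    # 1. input data arriving on three wires
weights = torch.tensor([0.5, -1.0, 1.0])   # 2. strength of each wire
bias    = torch.tensor(0.1)

# 3. output connection: weighted sum of the inputs, passed through ReLU
weighted_sum  = torch.dot(inputs, weights) + bias
neuron_output = torch.relu(weighted_sum)

print(weighted_sum.item(), neuron_output.item())
```

A layer is then simply many such neurons reading the same inputs, each with its own weights.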

#### Single Neuron Diagram
![Biological Neuron Model](https://www.researchgate.net/publication/341241129/figure/fig1/AS:888908187443205@1588943635819/Biological-Neuron-Model.ppm)
![Single Neuron](https://media.geeksforgeeks.org/wp-content/uploads/20230410104038/Artificial-Neural-Networks.webp)
![Layers of Neurons](https://qph.cf2.quoracdn.net/main-qimg-084ade3ed1f8a97709e374090a92e1ca.webp)

#### Brain as Layered Neural Network
![Visual Processing in Brain](https://neuwritesd.files.wordpress.com/2015/10/visual_stream_small.png)



### Section 2: Artificial Intelligence

Brute Force Error Minimizer from Data

```python

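import torch

# Assumes model, error_func, learning_rate and train_data_loader are already defined
for x_actual, y_actual in train_data_loader:
    y_predicted_LOGITS = model(x_actual)
    loss               = error_func(y_predicted_LOGITS, y_actual)

    # autograd.grad needs a sequence of tensors, not the generator returned by model.parameters()
    dError_dWeights = torch.autograd.grad(outputs= loss, inputs = list(model.parameters()) )
    with torch.no_grad():  # update the weights in place, outside the autograd graph
        for weight, gradient in zip(model.parameters(), dError_dWeights):
            weight -= gradient * learning_rate
            print(weight.shape, gradient.shape)
            print(weight, gradient)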

```


### Section 3: Neural Network in PyTorch
Neural Network has 4 Steps
1. Data
2. Model Architecture
3. Model Training
4. Model Evaluation

Model Training has 5 Steps
1. Predict from Existing Weight Values. (Network)
2. Calculate Error of Prediction wrt y_actual
3. Clear dError_dWeights
4. Calculate dError_dWeights
5. Update Weights against the Gradient: $w = w - lr \cdot \frac{\partial E}{\partial W}$

$$
\large trained\_model = \operatorname*{argmin}_{\mathbf{w}, b}\  Loss( y_{predicted}, y_{actual})\\
$$
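
The five training steps can be sketched on a toy problem. This is a minimal illustration, not the lecture's MNIST setup: the data (`y = 2x + 1`), the name `toy_model`, the mean squared error loss and the learning rate are all assumptions chosen so the example runs in a fraction of a second.

```python
import torch, torch.nn as nn

torch.manual_seed(0)

# Toy data (an assumption for illustration): learn y = 2x + 1
x_actual = torch.linspace(-1, 1, 32).unsqueeze(1)
y_actual = 2 * x_actual + 1

toy_model     = nn.Linear(in_features=1, out_features=1)
learning_rate = 0.1

losses = []
for step in range(200):
    y_predicted = toy_model(x_actual)                      # 1. predict from existing weights
    loss = nn.functional.mse_loss(y_predicted, y_actual)   # 2. error of prediction wrt y_actual
    toy_model.zero_grad()                                  # 3. clear dError_dWeights
    loss.backward()                                        # 4. calculate dError_dWeights
    with torch.no_grad():                                  # 5. w = w - lr * gradient
        for w in toy_model.parameters():
            w -= learning_rate * w.grad
    losses.append(loss.item())

print(losses[0], losses[-1])
print(toy_model.weight.item(), toy_model.bias.item())
```

The loss should shrink towards zero while the weight and bias approach 2 and 1, which is the argmin in the formula above for this toy problem.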

In [1]:
!pip install datasets
!pip install wandb
!pip install torchmetrics
!pip install torchinfo
!pip install torchvision
!pip install ipyplot



### Data

In [2]:
import torch, torch.nn as nn
import datasets as huggingface_datasets

import ipyplot

"""
digits_dataset = huggingface_datasets.load_dataset("mnist", split="train")
digits_dataloader = torch.utils.data.DataLoader(digits_dataset, batch_size= 4)
digits_dataset.set_format(type='torch', format_kwargs={"dtype": torch.float32})
ipyplot.plot_images(digits_dataset['image'][0:5]);
"""


In [4]:
from torchvision import datasets as torchvision_datasets
from torchvision import transforms as torchvision_transforms

BATCH_SIZE = 4

train_dataset    = torchvision_datasets.MNIST( root= '../dataset', transform= torchvision_transforms.ToTensor(), train= True, download= True )
train_dataset, validation_dataset = torch.utils.data.random_split(train_dataset, [0.9, 0.1])

train_dataloader      = torch.utils.data.DataLoader( dataset = train_dataset,      batch_size = BATCH_SIZE, shuffle = True )
validation_dataloader = torch.utils.data.DataLoader( dataset = validation_dataset, batch_size = BATCH_SIZE, shuffle = True )

TOTAL_BATCHES = len(train_dataloader)   # number of batches per epoch (an integer, rounded up for a partial last batch)
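
To see what the split and the data loader produce without downloading MNIST, the same calls can be tried on random tensors. A small sketch; `fake_images`, `fake_labels` and the dataset size of 100 are stand-ins invented for this illustration:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Stand-in for MNIST (an assumption so the sketch runs without a download)
fake_images  = torch.rand(100, 1, 28, 28)
fake_labels  = torch.randint(0, 10, (100,))
fake_dataset = TensorDataset(fake_images, fake_labels)

# Fractions are turned into sample counts by random_split
train_part, validation_part = random_split(fake_dataset, [0.9, 0.1])
train_loader = DataLoader(train_part, batch_size=4, shuffle=True)

x_batch, y_batch = next(iter(train_loader))
print(len(train_part), len(validation_part))   # sizes of the two splits
print(x_batch.shape, y_batch.shape)            # one batch of images and labels
print(len(train_loader))                       # number of batches per epoch
```

Note that `len()` of a `DataLoader` counts batches, while `len()` of a dataset counts individual samples.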

In [8]:
ipyplot.plot_images(train_dataset[0][0])

#### Model

In [10]:
import torch, torch.nn as nn

hidden_layer_1st = nn.Linear(in_features = 2, out_features = 4)
hidden_layer_1st = nn.Linear(out_features = 4, in_features = 2)

nn.Linear(out_features = 4, in_features = 2)
nn.Linear(out_features = 8, in_features = 4)
nn.Linear(out_features = 14, in_features = 8)

layer = nn.Linear(out_features = 4, in_features = 2)
layer

Linear(in_features=2, out_features=4, bias=True)

$$y = X W^{\top} + b$$

In [13]:
layer.weight, layer.bias
layer.weight[2]

tensor([ 0.6502, -0.4159], grad_fn=<SelectBackward0>)
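
The linear layer can be checked by hand: `nn.Linear` stores its weight with shape `[out_features, in_features]`, so the forward pass is a matrix product with the transposed weight plus the bias. A short sketch, where `demo_layer` and the random input `x` are illustrative names and values:

```python
import torch, torch.nn as nn

torch.manual_seed(0)
demo_layer = nn.Linear(in_features=2, out_features=4)
x = torch.rand(3, 2)                      # batch of 3 samples, 2 features each

manual = x @ demo_layer.weight.T + demo_layer.bias

print(demo_layer.weight.shape)            # weight is stored as [out_features, in_features]
print(torch.allclose(demo_layer(x), manual))
```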

In [None]:
import keras_core as keras

keras.layers.Dense(units = 4, activation="relu")
keras.layers.Dense(units = 8)
keras.layers.Dense(units = 14)


In [14]:
from torch.nn import ReLU as ActivatePositive

MODEL = nn.Sequential(
    nn.Identity(),                                             # LAYER 1: INPUT LAYER
    nn.Flatten(start_dim=1),                                   #          IMAGE RESHAPE
    nn.Linear(out_features = 20, in_features = 28*28*1),       # LAYER 2: 1st Hidden Layer
    ActivatePositive(),                                        #          Activation Function f(x) = max(0, x): returns 0 if x <= 0, else x
    nn.Linear(out_features = 10 , in_features = 20),           # LAYER 3: Output Layer
    # NO ACTIVATION FUNCTION ON FINAL LAYER. Its raw (pre-activation) outputs are called logits
)

model_parameters = list(MODEL.parameters())

In [19]:
ERROR_FUNC = nn.functional.cross_entropy
LEARNING_RATE = 0.001

OPTIMIZER    = torch.optim.SGD( params= MODEL.parameters() , lr= LEARNING_RATE)
# GRADIENTS  = torch.autograd.grad(outputs = loss, inputs = params)

model, error_func, learning_rate, optimizer = MODEL, ERROR_FUNC, LEARNING_RATE, OPTIMIZER
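
`nn.functional.cross_entropy` expects raw logits, not probabilities, because it applies log-softmax internally. The sketch below illustrates this on made-up logits and labels (both are hypothetical values, not model outputs):

```python
import torch, torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)              # hypothetical raw outputs for a batch of 4 images
labels = torch.tensor([3, 1, 4, 1])      # hypothetical class indices

loss_from_logits = F.cross_entropy(logits, labels)
loss_by_hand     = F.nll_loss(F.log_softmax(logits, dim=1), labels)
probabilities    = F.softmax(logits, dim=1)

print(torch.allclose(loss_from_logits, loss_by_hand))
print(probabilities.sum(dim=1))          # each row of probabilities sums to 1
```

This is why the model ends with a plain `nn.Linear` layer: softmax is only needed when probabilities are to be reported.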

In [21]:
from torchinfo import summary

summary(MODEL, input_size=(1,28,28), 
        verbose=2, col_names = ["input_size", "output_size","kernel_size", "num_params","trainable", "params_percent"], col_width=20);

Layer (type:depth-idx)                   Input Shape          Output Shape         Kernel Shape         Param #              Trainable            Param %
Sequential                               [1, 28, 28]          [1, 10]              --                   --                   True                      --
├─Identity: 1-1                          [1, 28, 28]          [1, 28, 28]          --                   --                   --                        --
├─Flatten: 1-2                           [1, 28, 28]          [1, 784]             --                   --                   --                        --
├─Linear: 1-3                            [1, 784]             [1, 20]              --                   15,700               True                  98.68%
│    └─weight                                                                      [784, 20]            ├─15,680
│    └─bias                                                                        [20]                 └─20
├─ReLU: 

In [17]:
model.named_parameters()
model.parameters()

for name, parameter in model.named_parameters():
    print(name,parameter.shape)

2.weight torch.Size([20, 784])
2.bias torch.Size([20])
4.weight torch.Size([10, 20])
4.bias torch.Size([10])
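
The parameter counts follow directly from these shapes: each `Linear` layer holds `out_features * in_features` weights plus `out_features` biases. A small sketch that rebuilds the same architecture under the hypothetical name `sketch_model` (without the `Identity` layer, so the indices in the parameter names differ) and counts them:

```python
import torch.nn as nn

sketch_model = nn.Sequential(
    nn.Flatten(start_dim=1),
    nn.Linear(in_features=28*28*1, out_features=20),   # 784*20 weights + 20 biases
    nn.ReLU(),
    nn.Linear(in_features=20, out_features=10),        # 20*10 weights + 10 biases
)

counts = {name: parameter.numel() for name, parameter in sketch_model.named_parameters()}
total  = sum(counts.values())

print(counts)
print(total)
```

The first `Linear` layer accounts for 15,700 of the parameters, which agrees with the `Param #` column of the summary table.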


In [None]:
nn.Linear(in_features=1,out_features=32)

**Similar model to the PyTorch one, easier to read & understand in Keras**
```python

import os
os.environ["KERAS_BACKEND"] = "torch"   # must be set before keras_core is imported

import keras_core as keras
from keras_core import layers, models

k_model = models.Sequential([
    layers.Input(shape=(28,28,1)),                  # LAYER 1: Input Layer
    layers.Flatten(),
    layers.Dense(units = 100, activation="relu"),   # LAYER 2: 1st Hidden Layer with Activation Function
    layers.Dense(units = 10 )                       # LAYER 3: Output Layer (logits)
])
k_model.summary()

# The output layer returns logits and the MNIST labels are integer class indices
k_model.compile(loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True), optimizer = "adam")
```

In [None]:
from torchinfo import summary

summary(MODEL, input_size=(1,28,28), 
        verbose=2, col_names = ["input_size", "output_size","kernel_size", "num_params","trainable", "params_percent"]);

#### Model Reducing Error by Looking at Data

```python
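loss = error_func( y_predicted_logits, y_actual )
# Option A: return the gradients explicitly as a tuple
# de_dw = torch.autograd.grad(outputs= loss, inputs = list(model.parameters()) )
# Option B: store the gradients in parameter.grad (this is what optimizer.step() reads)
loss.backward()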
```
$$
\frac{\partial E}{\partial W}
$$
```python
optimizer.step()
# For plain SGD, optimizer.step() is equivalent to:
# with torch.no_grad():
#     for parameter in model.parameters():
#         parameter -= learning_rate * parameter.grad
```
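
The two routes (`loss.backward()` followed by `optimizer.step()`, versus explicit gradients and a manual update) can be compared on a tiny model. This is a sketch under assumptions: `net_a`, `net_b`, the random data and the learning rate are made up for the comparison, and the equivalence shown holds for plain SGD without momentum or weight decay.

```python
import copy
import torch, torch.nn as nn

torch.manual_seed(0)
net_a = nn.Linear(in_features=3, out_features=2)
net_b = copy.deepcopy(net_a)                       # identical starting weights
x, y  = torch.rand(5, 3), torch.tensor([0, 1, 0, 1, 1])
lr    = 0.1

# Route A: backward() fills parameter.grad, optimizer.step() applies the update
sgd    = torch.optim.SGD(net_a.parameters(), lr=lr)
loss_a = nn.functional.cross_entropy(net_a(x), y)
sgd.zero_grad()
loss_a.backward()
sgd.step()

# Route B: explicit gradients and an in-place manual update
loss_b = nn.functional.cross_entropy(net_b(x), y)
grads  = torch.autograd.grad(outputs=loss_b, inputs=list(net_b.parameters()))
with torch.no_grad():
    for parameter, gradient in zip(net_b.parameters(), grads):
        parameter -= lr * gradient

same = all(torch.allclose(pa, pb) for pa, pb in zip(net_a.parameters(), net_b.parameters()))
print(same)
```

The in-place `-=` inside `torch.no_grad()` matters: writing `parameter = parameter - ...` would only rebind a local name and leave the model unchanged.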

In [None]:
x_actual, y_actual = next(iter(train_dataloader))


y_predicted_LOGITS = model(x_actual)
loss               = error_func(y_predicted_LOGITS, y_actual)

# Manual SGD step: autograd.grad returns the gradients instead of storing them in .grad,
# so optimizer.step() would have nothing to apply here
dError_dWeights = torch.autograd.grad(outputs= loss, inputs = list(model.parameters()) )

with torch.no_grad():
    for weight, gradient in zip(model.parameters(), dError_dWeights):
        weight -= gradient * LEARNING_RATE   # in-place update of the parameter tensor

        print(weight.shape, gradient.shape)
        print(weight, gradient)

In [8]:
import torchmetrics
import wandb
wandb.init()

REPEAT = 10

def trainer_function(train_dataloader, model, error_func, optimizer, epochs):
    for epoch_no in range(epochs):
        model.train(mode=True)   # evaluate_model() switches to eval mode, so re-enable training every epoch

        loss_total, accuracy_total = 0, 0
        for batch_no, (x_actual, y_actual) in enumerate(train_dataloader):

            y_predicted_LOGITS = model.forward(x_actual)
            y_predicted_probs  = nn.functional.softmax(y_predicted_LOGITS, dim= 1)
            loss               = error_func(y_predicted_LOGITS, y_actual)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            loss_batch = loss.item()
            accuracy_batch = torchmetrics.functional.accuracy(y_predicted_LOGITS, y_actual, task="multiclass", num_classes=10)
            
            loss_total = loss_total + loss_batch 
            accuracy_total = accuracy_total + accuracy_batch
            metrics_per_batch = {
                "loss": loss_batch,
                "accuracy_batch": accuracy_batch,
                "batch_no": batch_no
            }
            wandb.log(metrics_per_batch)
            # dError_dWeights = torch.autograd.grad(outputs= loss, inputs = model.parameters() )
            # for parameter, gradient in zip(model.parameters(), dError_dWeights):
            #     parameter = parameter - gradient * learning_rate
        
        accuracy_average = accuracy_total / len(train_dataloader)   # average over batches
        metrics_per_epoch = {
            "train_accuracy_epoch": accuracy_average,
            "epoch": epoch_no
        }
        wandb.log(metrics_per_epoch)
        evaluate_model(validation_dataloader, model, error_func)

def evaluate_model(dataloader, model, error_func):
    model.train(mode=False)

    loss_total, accuracy_total = 0, 0
    with torch.no_grad():   # no gradients are needed for evaluation
        for x_actual, y_actual in dataloader:
            y_predicted_LOGITS = model(x_actual)
            loss = error_func(y_predicted_LOGITS, y_actual)
            accuracy = torchmetrics.functional.accuracy(y_predicted_LOGITS, y_actual, task="multiclass", num_classes=10)

            loss_total = loss_total + loss.item()
            accuracy_total = accuracy_total + accuracy.item()

    accuracy_avg = accuracy_total / len(dataloader)   # average over batches, not samples
    wandb.log({"validation_accuracy": accuracy_avg})  # wandb.log expects a dictionary
    

trainer_function(train_dataloader, MODEL, ERROR_FUNC, OPTIMIZER, REPEAT)
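
The accuracy metric used in the trainer can be reproduced by hand: the predicted class is the index of the largest logit, and accuracy is the fraction of predictions that match the labels. A minimal sketch on made-up logits and labels for a three-class problem:

```python
import torch

# Hypothetical logits for 4 samples and 3 classes
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.3, 0.2,  1.5],
                       [0.0, 3.0,  0.5],
                       [1.0, 0.9,  0.8]])
labels = torch.tensor([0, 2, 2, 0])

predicted_classes = logits.argmax(dim=1)                      # index of the largest logit per row
accuracy = (predicted_classes == labels).float().mean()       # fraction of correct predictions

print(predicted_classes, accuracy.item())
```

Softmax is not needed for this step, because it does not change which logit is the largest.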



### Section 4: Brain vs Artificial Neural Network

- The brain does not learn by implementing a single, global optimization principle within a uniform and undifferentiated neural network.
- Rather, biological brains are modular, with distinct but interacting subsystems underpinning key functions such as memory, language, and cognitive control.
- The primate visual system works differently from a typical artificial network. Rather than processing all input in parallel, visual attention shifts strategically among locations and objects, centering processing resources and representational coordinates on a series of regions in turn.
- Continual Learning is the ability to master new tasks without forgetting how to perform prior tasks. The brain does Continual Learning easily. Standard Neural Networks struggle with it: training on a new task tends to overwrite what was learned before, which is called Catastrophic Forgetting.
- Efficient Learning: the ability to rapidly learn about new concepts from only a handful of examples.
- Transfer Learning: the ability to apply knowledge learned on one task to a new, related task.

## Neural Networks in more Detail

### 6 Steps to Learned Neural Network
1. Dataset in Detail
2. Neural Network Forward Pass & Dot Product & Activation
3. Error Function & Calculation for each Data
4. Error Gradient Calculation / Backward Pass
5. PARAMETER update in direction of Error Reduction. Model Training Monitoring
6. Model Report

In [None]:
# TODO: MODEL 
feature_extractor = nn.Sequential(
    nn.Conv2d( out_channels = 50, in_channels = 1, kernel_size = (3,3) , padding="same"),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2,2), stride = 2),
  
    nn.Conv2d(out_channels = 100, in_channels = 50, kernel_size = (3,3), padding="same"),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2,2), stride = 2),

)

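decision_maker = nn.Sequential(
  nn.Flatten(start_dim=1),                                # [N, 100, 7, 7] -> [N, 100*7*7]
  nn.Linear(out_features = 50, in_features = 100*7*7 ),
  nn.ReLU(),                                              # without a non-linearity the two Linear layers collapse into one
  nn.Linear(out_features = 10, in_features = 50)
)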

model = nn.Sequential(
  feature_extractor,
  decision_maker
)
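
The `in_features = 100*7*7` of the first linear layer comes from the shape of the feature maps: with `padding="same"` each convolution keeps the 28x28 size, and each 2x2 max-pool halves it. A sketch that verifies this on a random batch; `sketch_features` and `fake_batch` are names invented for this check:

```python
import torch, torch.nn as nn

sketch_features = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=50, kernel_size=(3, 3), padding="same"),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2, 2), stride=2),     # 28x28 -> 14x14
    nn.Conv2d(in_channels=50, out_channels=100, kernel_size=(3, 3), padding="same"),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2, 2), stride=2),     # 14x14 -> 7x7
)

fake_batch   = torch.rand(4, 1, 28, 28)             # hypothetical batch of 4 MNIST-sized images
feature_maps = sketch_features(fake_batch)
flattened    = torch.flatten(feature_maps, start_dim=1)

print(feature_maps.shape)
print(flattened.shape)
```

The feature maps have to be flattened to `[batch, 100*7*7]` before they reach a linear layer, since `nn.Linear` operates on the last dimension only.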

## Rest

##### Types of Intelligence
1. No Intelligence      - 
1. Narrow Intelligence  - Single Task Intelligence
1. General Intelligence - Multiple Tasks Intelligence
1. Super Intelligence   - More tasks than possible by Single Human

---
##### Complexity of Intelligence
1. Standing Up & Picking Up a Pen
2. Identifying an Object
3. Understanding Words
---
##### Possible Applications via Flexibility
1. Robotics
2. Visual Factory Hand 
3. ChatGPT+

