# Self-study try-it activity 16.1: Jupyter Notebook on using PyTorch

To compute with neural networks, you will be using the Python library `torch`.

Ensure that the libraries `aima3` and `torchviz` are already installed. If not, please use `pip install aima3` and `pip install torchviz` to install these before proceeding.

In [None]:
!pip install aima3

In [None]:
!pip install torchviz

In [None]:
import sys
from aima3 import learning
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

`from aima3 import learning` imports the machine learning module from the `aima3` package, which implements algorithms and concepts from the book "Artificial Intelligence: A Modern Approach." This gives you access to the AI learning functionality provided by the package.

### 1. Using PyTorch for neural networks

To compute with neural networks, you will be using the Python library `torch`:
```python
from __future__ import print_function
import torch
x = torch.rand(5, 3) #Example usage
```


In [None]:
from __future__ import print_function
import torch

The following code demonstrates how to define a linear layer in PyTorch using the `torch.nn` module.

- The line `import torch.nn as nn` imports PyTorch's neural network module, which provides tools to build neural networks.
- `F_1 = nn.Linear(in_features=2, out_features=2, bias=True)` creates a linear layer with two input features and two output features. The `bias=True` argument means the layer includes a bias term in its calculations.
- This linear layer performs a linear transformation on the input data, defined mathematically as \( y = xW^T + b \), where \(x\) is the input, \(W\) is the weight matrix and \(b\) is the bias vector.
- The `print` statement displays the randomly initialised weights and bias parameters of the layer.

This layer is typically a building block in neural networks, representing a fully connected (dense) layer where each input feature is connected to each output feature via learned weights.


In [None]:
import torch.nn as nn
import torch.nn.functional as func

#In PyTorch, you start by defining the linear function of each layer
F_1 = nn.Linear(in_features=2, out_features=2, bias=True) #bias=True adds the bias term to the layer 1 weights
print('A_1:', F_1.weight, '\n', 'b_1:', F_1.bias) #Weights are assigned randomly
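
As a quick check of the formula \( y = xW^T + b \), the following minimal sketch compares the output of a linear layer with the same computation written out by hand. The names `layer`, `y_layer` and `y_manual` are illustrative and not part of the notebook, and the fixed seed is an assumption made only so that repeated runs match.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed only so repeated runs match

layer = nn.Linear(in_features=2, out_features=2, bias=True)
x = torch.tensor([1.0, 2.0])

# Output computed by the layer itself
y_layer = layer(x)

# The same affine map written out: y = x W^T + b
y_manual = x @ layer.weight.T + layer.bias

print(torch.allclose(y_layer, y_manual))
```

If the two results agree, the layer is doing nothing more than a matrix product plus a bias.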

From the layer `F_1` above, you can define a function in Python using either the `lambda` or the `def` keywords:

```python
def apply_F_1(x):
    return F_1.forward(x) #Or just F_1(x)
```

In [None]:
def relu(x):
  return np.maximum(0, x)

In [None]:
import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))
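
To see what these two activation functions do, the sketch below evaluates them on a small sample vector. The array `z` is an arbitrary illustrative input, not a value from the notebook.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])  # arbitrary sample values

print(relu(z))     # negative entries are clipped to 0
print(sigmoid(z))  # every value lies strictly between 0 and 1
```

ReLU leaves positive values unchanged, while the sigmoid squashes any real number into the interval (0, 1), which is why it is used to produce probabilities.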

1. An input tensor in PyTorch is created by converting a NumPy array with specific values and data type.

2. A linear layer in PyTorch is defined, which maps the input features to a specified number of output features and includes a bias term.

3. A function `F_2` is defined, which applies the linear layer `L_2` followed by a ReLU activation function to the input.

4. Finally, evaluate the input tensor by passing it through the defined linear layer and ReLU activation function to get the output.



In [None]:
x = torch.from_numpy(np.array([1., 2.], dtype=np.float64)).float() #Use numpy commands – example input to Layer 2
L_2 = nn.Linear(in_features=2, out_features=3, bias=True) #Define the second step -> 2 - to 3 + the bias term
F_2 = lambda x: func.relu(L_2.forward(x)) #F_2 applies the ReLU activation on the L_2 evaluation.
F_2(x)

You can reproduce the above steps with simple linear algebra, so what `torch` computes is not a mystery:

- Extract the weights and biases from the PyTorch linear layer and convert them to NumPy arrays for matrix operations.

- Perform the affine transformation on the input by multiplying the weight matrix with the input vector and adding the bias, then apply the ReLU activation function to get the final output.

In [None]:
weights = L_2.weight.detach().numpy() #Extract the weights that L_2 uses
x_vect = x.detach().numpy().reshape(2,1) #Take the input as a NumPy column vector
input_of_L_2 = weights.dot(x_vect) + L_2.bias.detach().numpy().reshape(3,1) #Apply the affine transformation
output_of_L2 = relu(input_of_L_2) #Compose with the ReLU function (activate the input)
output_of_L2 #This matches what F_2 returns
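
The agreement between the two routes can also be checked programmatically. The following self-contained sketch builds a fresh 2-to-3 layer (called `layer` here as a stand-in for `L_2`, so its random weights differ from the ones above) and compares the PyTorch result with the NumPy computation.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as func

layer = nn.Linear(in_features=2, out_features=3, bias=True)  # stands in for L_2
x = torch.tensor([1.0, 2.0])

# PyTorch route: affine map followed by ReLU
torch_out = func.relu(layer(x)).detach().numpy()

# NumPy route: W x + b, then clip negatives to zero
W = layer.weight.detach().numpy()  # shape (3, 2)
b = layer.bias.detach().numpy()    # shape (3,)
numpy_out = np.maximum(0, W @ x.numpy() + b)

print(np.allclose(torch_out, numpy_out))
```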

Finally, you apply the last layer:

In [None]:
L_3 = nn.Linear(in_features=3, out_features=2, bias=True)
F_3 = lambda x: L_3.forward(x) #This is just evaluation; here, you do not apply ReLU (or this is "identity activation")

You have now defined all layers. To make a prediction, you can simply compose them:

In [None]:
F = lambda x: F_3(F_2(F_1(x)))

And finally, giving an input to `F` will return an output:

In [None]:
x = torch.from_numpy(np.array([1., 2.], dtype=np.float64)).float()
y = F(x)
print("The output is given by", y)

### To-do:
For the input [-2, 5], use the torch network created and compute the output.

In [None]:
x = torch.from_numpy(np.array([-2, 5], dtype=np.float64)).float()
y = F(x)
print("The output is given by" ,y)

Note that `F` is a function with fixed weights: `torch` initialises the weights randomly when each layer is created, and they stay unchanged until the network is trained.
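
Because the initial weights are random, rerunning the notebook generally gives different outputs. A minimal sketch of one way to make the initialisation repeatable is to set the seed with `torch.manual_seed` before creating a layer. The helper `make_layer` is a hypothetical name introduced here for illustration.

```python
import torch
import torch.nn as nn

def make_layer(seed):
    """Create a 2-to-2 linear layer after fixing the random seed (illustrative helper)."""
    torch.manual_seed(seed)
    return nn.Linear(in_features=2, out_features=2, bias=True)

a = make_layer(0)
b = make_layer(0)

# Same seed before construction, so both layers start from identical parameters
print(torch.equal(a.weight, b.weight))
print(torch.equal(a.bias, b.bias))
```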

### 2. Visualising a network architecture

#### Neural network model using `nn.Sequential` container and visualisation using `torchviz`

This code constructs a small neural network model using PyTorch’s `nn.Sequential` container, runs a random input through it and visualises the computation graph with `torchviz`.

- **Importing modules**:
  - `Variable` from `torch.autograd` wraps tensors for automatic differentiation (since PyTorch 0.4, plain tensors support this directly and `Variable` is deprecated, although it still works).
  - `make_dot` from `torchviz` is used to create a visual graph of the computations for the model’s output tensor.

- **Defining the model with `nn.Sequential`**:
  - `nn.Sequential()` is a sequential container where layers and functions are added in order.
  - `model.add_module('W1', nn.Linear(2, 2))` adds the first linear layer with two input features and two output features.
  - `model.add_module('W2', nn.Linear(2, 3))` adds the second linear layer mapping from two to three features.
  - `model.add_module('relu', nn.ReLU())` adds a ReLU activation function to introduce non-linearity.
  - `model.add_module('W3', nn.Linear(3, 2))` adds a final linear layer mapping from three to two output features.

- **Creating input and computing output**:
  - `x = Variable(torch.randn(1, 2))` creates a single random input tensor with two features wrapped as a variable to track operations for automatic differentiation.
  - `y = model(x)` forwards the input `x` through the model, producing an output tensor `y`.

- **Visualising the computation graph**:
  - `make_dot(y, params=dict(model.named_parameters()))` generates a graphical visualisation of the computation graph starting from output `y`.
  - This graph illustrates the flow of data and operations through the layers, including weights and biases from each layer in the model.
  
This code is useful for defining a simple feedforward neural network, executing a forward pass on random data and visually inspecting the computation graph to understand how tensors and operations connect across the network layers.


In [None]:

from torch.autograd import Variable
from torchviz import make_dot
model = nn.Sequential()
model.add_module('W1', nn.Linear(2,2)) #Add a layer with linear transformation as before
model.add_module('W2', nn.Linear(2,3)) #Add another layer similarly
model.add_module('relu', nn.ReLU()) #Add a ReLU activation
model.add_module('W3', nn.Linear(3, 2)) #Add a final linear layer
x = Variable(torch.randn(1, 2)) #Input variable
y = model(x)
make_dot(y, params=dict(model.named_parameters()))
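
If `torchviz` is not available, the structure of the model can still be inspected in plain text. The sketch below rebuilds the same `nn.Sequential` model, passes a plain tensor through it (no `Variable` wrapper) and lists each learnable parameter with its shape. The dictionary name `shapes` is illustrative.

```python
import torch
import torch.nn as nn

model = nn.Sequential()
model.add_module('W1', nn.Linear(2, 2))
model.add_module('W2', nn.Linear(2, 3))
model.add_module('relu', nn.ReLU())
model.add_module('W3', nn.Linear(3, 2))

# A plain tensor is enough as input
x = torch.randn(1, 2)
y = model(x)
print(y.shape)

# List every learnable parameter with its shape
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```

Note that a layer created as `nn.Linear(2, 3)` stores its weight with shape `(3, 2)`, that is, output features first, which is why the layer computes \( xW^T + b \).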

You can change the activation functions and the number of layers for different purposes. For example, you can represent logistic regression with the architecture shown at the beginning of this notebook. You can visualise the architecture by using `torchviz`.

In [None]:
#This is what a simple logistic regression would look like
model = nn.Sequential()
model.add_module('W0', nn.Linear(2, 2))
model.add_module('logit', nn.Sigmoid()) #Sigmoid activation
x = Variable(torch.randn(1, 2))
y = model(x)
make_dot(y.max(), params=dict(model.named_parameters()))

### Demonstration of logistic regression via a neural network

For a binary classification problem, logistic regression with two input features $x_1$ and $x_2$ gives the probability of an instance belonging to the positive class $y = +1$ as:
$$ \mathbb{P}[y = 1 | x_1, x_2] = \dfrac{1}{ 1 + \exp( - w_0 - w_1 x_1 -  w_2 x_2)} = s(w_0 + w_1x_1 + w_2x_2), $$
where $s(z) = 1 / (1 + \exp(-z))$ is the sigmoid function.

### To-do:
Find the probability $\mathbb{P}[y = 1 | x_1, x_2]$ where $w_0 = 1, w_1 = 0.3, w_2 = -0.1$ for the input $x = (x_1 = 1, x_2 = -2)$.

Try this using simple `sigmoid()`.

**Answer:**

In [None]:
w0,w1,w2 = 1, 0.3, -0.1
x1,x2 = 1, -2
prob = sigmoid(w0 + w1*x1 + w2*x2)
round(prob,4)
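
For reference, the same computation written out by hand, substituting the given weights and input into the formula above:

$$ w_0 + w_1 x_1 + w_2 x_2 = 1 + 0.3 \cdot 1 + (-0.1) \cdot (-2) = 1.5, \qquad s(1.5) = \dfrac{1}{1 + \exp(-1.5)} \approx 0.8176. $$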

### To-do:
Model the same problem with a simple neural network. Report the probability of the same input belonging to class +1 when the weights are fixed accordingly.
*(Hint: `torch` initialises the weights of a layer automatically; however, you can overwrite them with `L_1.weight.data = ...` and `L_1.bias.data = ...`.)*

**Answer:**

In [None]:
L_1 = nn.Linear(in_features=2, out_features=1, bias=True) #bias=True adds the bias term w_0
L_1.weight.data = torch.from_numpy(np.array([[0.3, -0.1]], dtype=np.float64)).float() #Set the transformation weights w_1, w_2
L_1.bias.data = torch.from_numpy(np.array([1], dtype=np.float64)).float() #Set the bias weight w_0
F_1 = lambda x: torch.sigmoid(L_1.forward(x)) #F_1 applies the sigmoid activation to the L_1 evaluation

In [None]:
print('Transformation weights', L_1.weight, '\n', 'bias weight:', L_1.bias) #Weights fixed as specified.

In [None]:
x = torch.from_numpy(np.array([1,-2], dtype=np.float64)).float() #Input
y = F_1(x)
prob_nn = y.item() #Extract the estimated probability
round(prob_nn,4)
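
The network output should agree with the closed-form sigmoid value up to floating-point precision. The following self-contained sketch checks this. It sets the parameters with `copy_` inside `torch.no_grad()` as an alternative to assigning `.data`, and the names `layer` and `prob_formula` are illustrative.

```python
import math
import torch
import torch.nn as nn

layer = nn.Linear(in_features=2, out_features=1, bias=True)
with torch.no_grad():  # set the parameters without tracking gradients
    layer.weight.copy_(torch.tensor([[0.3, -0.1]]))
    layer.bias.copy_(torch.tensor([1.0]))

x = torch.tensor([1.0, -2.0])
prob_nn = torch.sigmoid(layer(x)).item()

# Closed-form value s(w0 + w1*x1 + w2*x2) for comparison
prob_formula = 1 / (1 + math.exp(-(1 + 0.3 * 1 + (-0.1) * (-2))))

print(round(prob_nn, 4), round(prob_formula, 4))
```

The small difference between the two values, if any, comes from PyTorch using 32-bit floats by default while Python's `math` module works in 64-bit precision.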