# Linear Neural Networks

In this notebook, we will explore the mathematical foundations and implementation of Linear Neural Networks. Linear models are the simplest form of neural networks and are primarily used for linear regression tasks.


## Theoretical Background

### Overview
Linear Neural Networks are composed of layers where each neuron performs a linear transformation of the input.

**Type of Function**: Linear

**Nature**: Continuous

**Behavior**: The output is a linear combination of the input features plus a bias (an affine map). Stacking such layers without a nonlinearity between them still yields a single linear map.

### Mathematical Formulation


The output \(y\) is given by:
\[ y = XW + b \]
where \( X \) is the input, \( W \) is the weight matrix, and \( b \) is the bias term, added to every row of \( XW \).
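To make the shapes in the formula concrete, here is a minimal sketch. The sizes (4 samples, 3 features, 2 outputs) are illustrative assumptions, not values from this notebook. It also compares the result with `nn.Linear`, which stores its weight as `(out_features, in_features)`, i.e. the transpose of \( W \) as written above.

```python
import torch
from torch import nn

torch.manual_seed(0)

X = torch.randn(4, 3)   # 4 samples, 3 input features (illustrative sizes)
W = torch.randn(3, 2)   # maps 3 features to 2 outputs
b = torch.randn(2)      # one bias per output, broadcast across the batch

y = X @ W + b           # shape (4, 2)

# nn.Linear computes x @ weight.T + bias, so we load W transposed
layer = nn.Linear(in_features=3, out_features=2)
with torch.no_grad():
    layer.weight.copy_(W.T)
    layer.bias.copy_(b)

y_layer = layer(X)
print(y.shape)
print(torch.allclose(y, y_layer, atol=1e-5))
```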


## Implementation in PyTorch

In [1]:
import torch 
from torch import nn
from torch.optim import SGD

### Mathematical function

In [2]:
#quadratic polynomial function
def f(x):
    return x**2 + 1

x = torch.tensor(4.0, requires_grad=True)
y = f(x)
print("f(x) = ", y.item())

#gradient
y.backward()
print("df/dx = ", x.grad.item())

f(x) =  17.0
df/dx =  8.0
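A gradient returned by autograd can be sanity-checked numerically. The sketch below uses a central finite difference in plain Python. The helper name `central_difference` and the step size `h` are illustrative choices, not part of the notebook.

```python
def f(x):
    return x**2 + 1

def central_difference(func, x, h=1e-5):
    """Approximate func'(x) with a symmetric finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)

approx = central_difference(f, 4.0)
print("numerical df/dx =", approx)
```

For a quadratic the central difference is exact up to floating-point rounding, so the result should agree closely with the autograd value of 8.0.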


In [3]:
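# Linear function y = x @ W + b with W = b = 1.
# Note: W and b are re-created on every call, so each call returns fresh leaf tensors.
def lin_F(x):
    W = torch.tensor([1.0], requires_grad=True)
    b = torch.tensor([1.0], requires_grad=True)

    assert x.shape[-1] == W.shape[0], (
        "Shape mismatch: (m x n) @ (n x p) = (m x p); "
        f"x has last dim {x.shape[-1]} but W has first dim {W.shape[0]}"
    )
    return x @ W + b, W, b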

## Understanding Backprop

In [4]:
# Using PyTorch's .grad and .backward functions
x = torch.tensor([2.0], requires_grad=True)
y_true = torch.tensor([10.0])

#forward pass
y_pred, W, b = lin_F(x)
print("y =", y_pred.item())

#calculate the loss : mean squared error
Loss = ((y_true - y_pred)**2).mean()

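# y_pred is a non-leaf tensor, so ask autograd to keep its gradient.
# (x.grad would give dL/dx, which equals dL/dy_pred here only because W = 1.)
y_pred.retain_grad()

# gradient via PyTorch
Loss.backward()
print("Loss = ", Loss.item())
print("dL/dy_pred (pT)= ", y_pred.grad.item())

# Gradient of loss w.r.t W and b
print("dL/dW (pT)= ", W.grad.item())
print("dL/db (pT)= ", b.grad.item())

y = 3.0
Loss =  49.0
dL/dy_pred (pT)=  -14.0
dL/dW (pT)=  -28.0
dL/db (pT)=  -14.0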
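The values printed above can be checked by hand. A short derivation, assuming the single-sample setup of this cell (\( x = 2 \), \( W = b = 1 \), \( y_{\text{true}} = 10 \)) and writing \( \hat{y} \) for `y_pred`:

\[ \hat{y} = xW + b = 2 \cdot 1 + 1 = 3, \qquad L = (y_{\text{true}} - \hat{y})^2 = 49 \]
\[ \frac{\partial L}{\partial \hat{y}} = -2\,(y_{\text{true}} - \hat{y}) = -14 \]
\[ \frac{\partial L}{\partial W} = \frac{\partial L}{\partial \hat{y}} \cdot x = -28, \qquad \frac{\partial L}{\partial b} = \frac{\partial L}{\partial \hat{y}} \cdot 1 = -14 \]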


In [5]:
# Calculating gradients manually

y_pred, W, b = lin_F(x)
print("y =", y_pred.item())

# Loss = mean(v), with v = u**2 and u = y_true - y_pred

# differentiating Loss w.r.t y_pred (chain rule)
u = y_true - y_pred
v = u**2

du = -1          # du/dy_pred
dv = 2*u         # dv/du
dv_dypred = du*dv

dL = dv_dypred.mean()
print("dL/dy_pred = ", dL.item())

# differentiating Loss w.r.t W and b (chain rule)
dW = dL * x      # dy_pred/dW = x
dB = dL * 1      # dy_pred/db = 1

print("dL/dW = ", dW.item())
print("dL/db = ", dB.item())

y = 3.0
dL/dy_pred =  -14.0
dL/dW =  -28.0
dL/db =  -14.0
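Whether computed by autograd or by hand, these gradients are what a gradient descent step consumes. The following is a minimal sketch of one update using `torch.optim.SGD`. It assumes the same starting values as `lin_F`, but creates `W` and `b` once so they can be updated. The learning rate of 0.05 is an illustrative choice.

```python
import torch
from torch.optim import SGD

x = torch.tensor([2.0])
y_true = torch.tensor([10.0])

# Same starting values as lin_F, created once so the optimizer can update them
W = torch.tensor([1.0], requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)

optimizer = SGD([W, b], lr=0.05)  # lr is an assumed, illustrative value

def mse(W, b):
    y_pred = x @ W + b
    return ((y_true - y_pred) ** 2).mean()

loss_before = mse(W, b)
optimizer.zero_grad()
loss_before.backward()   # W.grad = -28, b.grad = -14
optimizer.step()         # W <- W - lr * W.grad, b <- b - lr * b.grad

loss_after = mse(W, b)
print("loss before:", loss_before.item())
print("loss after: ", loss_after.item())
```

With these numbers the step moves `W` from 1 to about 2.4 and `b` from 1 to about 1.7, so the prediction rises from 3 to about 6.5 and the loss falls from 49 to about 12.25.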


## Creating a simple two-layer neural network with an activation function

In [17]:
torch.manual_seed(0)

<torch._C.Generator at 0x116b12e90>

In [23]:
W1 = torch.randn(2, 3, requires_grad=True)  # Weights for layer 1
b1 = torch.randn(2, requires_grad=True)     # Bias for layer 1

W2 = torch.randn(1, 2, requires_grad=True)  # Weights for layer 2
b2 = torch.randn(1, requires_grad=True)     # Bias for layer 2


# Input
X = torch.randn(1, 3)

# Forward pass
## Output of first layer
z1 = X@W1.T + b1
a1 = torch.relu(z1)          

## Output of second layer
z2 = a1@W2.T + b2 
y_pred = z2                    
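The ReLU between the two layers is what keeps this network from being a plain linear model. As a sketch under assumed shapes (the same layer sizes as above, with an illustrative batch of 5 inputs), two stacked linear layers without an activation collapse into one layer with weight `W2 @ W1` and bias `W2 @ b1 + b2`:

```python
import torch

torch.manual_seed(0)

W1, b1 = torch.randn(2, 3), torch.randn(2)
W2, b2 = torch.randn(1, 2), torch.randn(1)
X = torch.randn(5, 3)  # illustrative batch of 5 inputs

# Two linear layers with no activation in between
y_stacked = (X @ W1.T + b1) @ W2.T + b2

# The equivalent single linear layer
W_eff = W2 @ W1          # shape (1, 3)
b_eff = W2 @ b1 + b2     # shape (1,)
y_single = X @ W_eff.T + b_eff

print(torch.allclose(y_stacked, y_single, atol=1e-5))

# With ReLU in between, the output is in general no longer
# reproducible by a single linear layer
y_relu = torch.relu(X @ W1.T + b1) @ W2.T + b2
```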

In [24]:
# Target
y = torch.tensor([[1.0]])

# Squared-error loss (for a single sample this equals the MSE)
loss = (y - y_pred).pow(2)

In [25]:
# Gradients of loss with respect to output
d_loss_output = -2 * (y - y_pred)

# Gradients of output of layer 2 with respect to weights and biases
d_output_W2 = a1.T
d_output_b2 = 1

# Gradient of loss with respect to W2 and b2
# Note: grad_W2 comes out with shape (2, 1), the transpose of W2's (1, 2) shape
grad_W2 = d_loss_output * d_output_W2
grad_b2 = d_loss_output * d_output_b2

print("Manual gradient for W2:\n", grad_W2)
print("Manual gradient for b2:\n", grad_b2)


Manual gradient for W2:
 tensor([[ 0.0000],
        [21.3885]], grad_fn=<MulBackward0>)
Manual gradient for b2:
 tensor([[5.0055]], grad_fn=<MulBackward0>)


In [26]:
# Backward pass: let autograd compute the gradients
loss.backward()

print("Autograd gradient for W2:", W2.grad)
print("Autograd gradient for b2:", b2.grad)


Autograd gradient for W2: tensor([[ 0.0000, 21.3885]])
Autograd gradient for b2: tensor([5.0055])
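Backpropagating through the hidden layer requires the derivative of ReLU, which is 1 for positive inputs and 0 otherwise. The sketch below, using made-up input values, compares the mask `(z > 0).float()` with the gradient that autograd reports for `torch.relu`:

```python
import torch

# Illustrative inputs covering negative, zero, and positive values
z = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0], requires_grad=True)

a = torch.relu(z)
a.sum().backward()            # d(sum(relu(z)))/dz is the elementwise ReLU derivative

mask = (z > 0).float()        # manual derivative of ReLU
print("autograd:", z.grad)
print("mask:    ", mask)
```

At exactly zero ReLU is not differentiable. Both the mask and PyTorch's implementation use 0 there.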


In [27]:
# Gradient of loss w.r.t. output of ReLU (a1)
d_loss_a1 = d_loss_output@W2

# Gradient of ReLU w.r.t. its input (z1)
d_a1_z1 = (z1 > 0).float()  # Derivative of ReLU

# Gradient of loss w.r.t. the pre-activation of the first layer (z1)
d_loss_z1 = d_loss_a1 * d_a1_z1

# Gradient of z1 w.r.t. W1 and b1
d_z1_W1 = X.t()
d_z1_b1 = 1

# Gradient of loss w.r.t. W1 and b1
grad_W1 = d_loss_z1.T@d_z1_W1.T
grad_b1 = d_loss_z1 * d_z1_b1

print("Manual gradient for W1:\n", grad_W1)
print("Manual gradient for b1:\n", grad_b1)


Manual gradient for W1:
 tensor([[ 0.0000,  0.0000,  0.0000],
        [ 4.9017, -5.0050, -3.0683]], grad_fn=<MmBackward0>)
Manual gradient for b1:
 tensor([[0.0000, 3.8785]], grad_fn=<MulBackward0>)


In [28]:
# Autograd gradients for W1 and b1, for comparison with the manual values
W1.grad, b1.grad

(tensor([[ 0.0000,  0.0000,  0.0000],
         [ 4.9017, -5.0050, -3.0683]]),
 tensor([0.0000, 3.8785]))
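Rather than comparing printed tensors by eye, the agreement between manual and autograd gradients can be asserted in code. This is a self-contained sketch, not the notebook's own cells: it rebuilds a network with the same layer sizes, uses an assumed batch of 4 random samples with a mean-squared-error loss, and checks all four gradients with `torch.allclose`.

```python
import torch

torch.manual_seed(0)

W1 = torch.randn(2, 3, requires_grad=True)
b1 = torch.randn(2, requires_grad=True)
W2 = torch.randn(1, 2, requires_grad=True)
b2 = torch.randn(1, requires_grad=True)

X = torch.randn(4, 3)   # batch of 4 samples (illustrative)
y = torch.randn(4, 1)

# Forward pass
z1 = X @ W1.T + b1
a1 = torch.relu(z1)
y_pred = a1 @ W2.T + b2
loss = (y_pred - y).pow(2).mean()

# Autograd gradients
loss.backward()

# Manual backward pass via the chain rule
with torch.no_grad():
    d_out = 2 * (y_pred - y) / y.numel()      # dL/dy_pred, shape (4, 1)
    grad_W2 = d_out.T @ a1                    # shape (1, 2), same as W2
    grad_b2 = d_out.sum(0)                    # shape (1,)
    d_z1 = (d_out @ W2) * (z1 > 0).float()    # shape (4, 2)
    grad_W1 = d_z1.T @ X                      # shape (2, 3), same as W1
    grad_b1 = d_z1.sum(0)                     # shape (2,)

checks = {
    "W1": torch.allclose(grad_W1, W1.grad, atol=1e-5),
    "b1": torch.allclose(grad_b1, b1.grad, atol=1e-5),
    "W2": torch.allclose(grad_W2, W2.grad, atol=1e-5),
    "b2": torch.allclose(grad_b2, b2.grad, atol=1e-5),
}
print(checks)
```

Writing each weight gradient as `upstream.T @ layer_input` gives it the same shape as the parameter it belongs to, which is what an in-place update needs.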

In [29]:
# Seed for reproducibility
torch.manual_seed(0)

# Parameters (requires_grad=False: gradients are computed by hand below)
W1 = torch.randn(2, 3, requires_grad=False)
b1 = torch.randn(2, requires_grad=False)
W2 = torch.randn(1, 2, requires_grad=False)
b2 = torch.randn(1, requires_grad=False)

# Learning rate
lr = 0.01


In [30]:
def forward(X):
    # Forward pass through the network
    z1 = torch.mm(X, W1.t()) + b1
    a1 = torch.relu(z1)
    z2 = torch.mm(a1, W2.t()) + b2
    return z2, a1, z1

def backward(X, a1, z1, outputs, targets):
    # Calculate loss (MSE)
    loss = (outputs - targets).pow(2).mean()

    # Gradient of the loss w.r.t. the network output
    d_loss_output = 2 * (outputs - targets) / outputs.size(0)

    # Gradients through the second layer; grad_W2 has W2's shape (1, 2)
    grad_W2 = torch.mm(d_loss_output.t(), a1)
    grad_b2 = d_loss_output.sum(0)

    # Gradients through the first layer
    d_loss_a1 = torch.mm(d_loss_output, W2)
    d_a1_z1 = (z1 > 0).float()  # Derivative of ReLU
    d_loss_z1 = d_loss_a1 * d_a1_z1

    # grad_W1 has W1's shape (2, 3); an extra transpose here would make the
    # in-place update fail with a size-mismatch RuntimeError
    grad_W1 = torch.mm(d_loss_z1.t(), X)
    grad_b1 = d_loss_z1.sum(0)

    return grad_W1, grad_b1, grad_W2, grad_b2, loss

def update_params(grad_W1, grad_b1, grad_W2, grad_b2, lr):
    # Update parameters in place using gradient descent
    global W1, b1, W2, b2
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2


In [31]:
# Dummy dataset
X = torch.randn(100, 3)  # 100 samples, 3 features each
targets = torch.randn(100, 1)  # 100 target values

# Training loop: one parameter update per sample
for epoch in range(100):  # 100 epochs
    for i in range(X.size(0)):
        inputs = X[i:i+1]
        target = targets[i:i+1]

        # Forward pass
        outputs, a1, z1 = forward(inputs)

        # Backward pass
        grad_W1, grad_b1, grad_W2, grad_b2, loss = backward(inputs, a1, z1, outputs, target)

        # Update parameters
        update_params(grad_W1, grad_b1, grad_W2, grad_b2, lr)

    if epoch % 10 == 0:
        # Reports the loss of the last sample seen in this epoch
        print(f'Epoch {epoch}, Loss: {loss.item()}')

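The same architecture and training procedure can also be expressed with PyTorch's built-in modules. The sketch below is one possible arrangement, not the notebook's own code: it uses `nn.Sequential`, `nn.MSELoss`, and `torch.optim.SGD`, and trains on the full batch instead of one sample at a time. The synthetic targets are generated from a hypothetical linear rule (`true_w` and the offset 0.5 are made up) so that there is a signal to learn.

```python
import torch
from torch import nn
from torch.optim import SGD

torch.manual_seed(0)

# Synthetic regression data with a learnable signal (hypothetical choice)
X = torch.randn(100, 3)
true_w = torch.tensor([[1.5], [-2.0], [0.5]])
targets = X @ true_w + 0.5

# Same layer sizes as the hand-written network: 3 -> 2 -> 1 with ReLU
model = nn.Sequential(
    nn.Linear(3, 2),
    nn.ReLU(),
    nn.Linear(2, 1),
)
loss_fn = nn.MSELoss()
optimizer = SGD(model.parameters(), lr=0.01)

losses = []
for epoch in range(100):
    optimizer.zero_grad()               # clear gradients from the previous step
    loss = loss_fn(model(X), targets)   # forward pass and loss
    loss.backward()                     # autograd replaces the manual backward()
    optimizer.step()                    # replaces update_params()
    losses.append(loss.item())
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
```

Here `loss.backward()` and `optimizer.step()` take over the roles of the hand-written `backward` and `update_params` functions.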

## Example Use Case

### Dataset
We use a synthetic dataset for demonstration purposes.

### Preprocessing
No preprocessing is required for this simple dataset.

### Training the Model
Training the Linear Neural Network using the synthetic dataset.


## Visualization

## Conclusion and Insights

In this notebook, we explored the fundamentals of Linear Neural Networks, derived gradients by hand and compared them with PyTorch's autograd, and implemented a small two-layer model in PyTorch. Linear models are useful for understanding the basic principles of neural networks and serve as a foundation for more complex architectures.
