## Perceptrons Brain Dump:

TLU: threshold logic unit<br>
LTU: linear threshold unit<br>


#### **how is a single perceptron trained?** <br>
**basic idea:** <br>
* when two neurons fire together, the connection between them becomes stronger (Hebb's rule). <br> 
* the connection weight between two neurons is increased whenever they have the same output. <br>
* perceptrons are trained in a similar way, but the rule also takes into account the error made by the network: it reinforces the connections that help reduce the error.
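A minimal sketch of the plain Hebbian update (no error term), assuming neuron activities are coded as -1/+1 so that "same output" means the product `x_i * y` is positive. `hebbian_update` and `eta` are illustrative names, not from any library.

```python
import numpy as np


def hebbian_update(w, x, y, eta=0.1):
    """Hebb's rule (illustrative): w_i <- w_i + eta * x_i * y.

    Activities are assumed to be -1/+1, so a weight grows when input i and
    the output agree and shrinks when they disagree.
    """
    return w + eta * x * y


w = np.zeros(3)
x = np.array([1.0, -1.0, 1.0])  # activities of three input neurons
y = 1.0                          # activity of the output neuron
w = hebbian_update(w, x, y)
print(w)
```

Inputs 0 and 2 agree with the output, so their weights go up; input 1 disagrees, so its weight goes down. Nothing here looks at whether the output was *correct*, which is exactly what the perceptron rule adds.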

~~~
for each (x, y) in dataset:
    z = dot(w, x) + b
    y_pred = sign(z)
    if y_pred != y:
        w = w + η * y * x
        b = b + η * y
~~~
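A runnable version of the pseudocode above, assuming labels are -1/+1 (so `sign` fits) and treating `sign(0)` as +1. The function name, `max_epochs`, and the early stop after an error-free pass are additions for the sketch.

```python
import numpy as np


def train_perceptron(X, Y, eta=1.0, max_epochs=100):
    """Perceptron learning rule; labels in Y must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(X, Y):
            z = np.dot(w, x) + b
            y_pred = 1 if z >= 0 else -1  # sign(z), with sign(0) taken as +1
            if y_pred != y:               # update only on a mistake
                w = w + eta * y * x
                b = b + eta * y
                mistakes += 1
        if mistakes == 0:                 # a full pass with no errors: converged
            break
    return w, b


# logical AND with -1/+1 labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, Y)
preds = np.where(X @ w + b >= 0, 1, -1)
print(w, b, preds)
```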


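Only works if the data is linearly separable; otherwise the rule keeps making mistakes and the weights never settle. Unlike logistic regression, a perceptron does not output a class probability, only a hard class label based on a threshold, which is one reason logistic regression is often preferred.

Converges in a finite number of steps if a separating hyperplane exists (Perceptron Convergence Theorem).

Simple, but it cannot solve problems that are not linearly separable, such as XOR (hence MLPs / neural nets).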
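To see the XOR limitation concretely: no straight line puts (0,1) and (1,0) on one side and (0,0) and (1,1) on the other, so at least one of the four points is always misclassified. A sketch using the same update rule (with -1/+1 labels, an assumption carried over from the pseudocode):

```python
import numpy as np


def train_perceptron(X, Y, eta=1.0, epochs=100):
    # same update rule as the pseudocode; labels must be -1 or +1
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            if (1 if np.dot(w, x) + b >= 0 else -1) != y:
                w, b = w + eta * y * x, b + eta * y
    return w, b


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y_xor = np.array([-1, 1, 1, -1])  # XOR: +1 when exactly one input is 1

w, b = train_perceptron(X, Y_xor)
preds = np.where(X @ w + b >= 0, 1, -1)
accuracy = np.mean(preds == Y_xor)
print(f"XOR training accuracy after 100 epochs: {accuracy:.2f}")
```

However many epochs you run, accuracy stays below 1.0, because the weights keep cycling instead of converging.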

### Implementing from scratch

```python
import numpy as np
```

```python
class TLU:

    def __init__(self, input_size):
        # one weight per input, plus weights[0] for the constant -1 bias input
        self.weights = np.zeros(input_size + 1)

    def activate(self, x):
        # step function: fire (1) once the weighted sum reaches the threshold
        return 1 if x >= 0 else 0

    def predict(self, row):
        # predict a single row; `row` must already start with the -1 bias input
        xw = np.array(row).dot(self.weights)
        return self.activate(xw)

    def train_tlu(self, data, targets, epochs, lrate):
        for _ in range(epochs):
            # each input row x and its target t
            for row, t in zip(data, targets):
                # add the bias input by inserting -1 at the beginning of the row,
                # so weights[0] acts as the (learned) threshold
                row = np.insert(row, 0, -1)
                y_pred = self.predict(row)

                if y_pred != t:
                    # perceptron rule: w <- w + lrate * (t - y_pred) * x
                    error = t - y_pred
                    self.weights += lrate * error * row

        return self.weights


def tlu_pred(model, data, targets, epochs, lrate=0.2):
    # trains `model` in place and returns the adjusted weights
    adj_w = model.train_tlu(data, targets, epochs, lrate)
    print(model)
    return adj_w
```


```python
# Logical AND data:
andData = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
andTargets = np.array([0, 0, 0, 1])

# Logical OR data:
orData = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
orTargets = np.array([0, 1, 1, 1])
```

```python
model = TLU(input_size=2)
tlu_pred(model, orData, orTargets, epochs=11, lrate=0.6)
```

```
<__main__.TLU object at 0x0000021D8FF4D790>

array([0.6, 0.6, 0.6])
```
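A worked check of what the learned weights `[0.6, 0.6, 0.6]` mean, assuming the same -1 bias input and step activation as the `TLU` above: the unit fires when 0.6·x1 + 0.6·x2 ≥ 0.6, i.e. when at least one input is 1, which is logical OR.

```python
import numpy as np

w = np.array([0.6, 0.6, 0.6])        # learned [threshold, w1, w2] from the run above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
X_bias = np.insert(X, 0, -1, axis=1) # prepend the -1 bias input to every row
z = X_bias @ w                       # -0.6 + 0.6 * x1 + 0.6 * x2
preds = (z >= 0).astype(int)         # step activation, as in TLU.activate
print(z, preds)
```

Note that (0,1) and (1,0) land exactly on the threshold (z = 0) and are classified as 1 only because `activate` uses `>=`; the learned boundary has zero margin.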

### Using Scikit-Learn

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
```

```python
iris = load_iris()
```

```python
X = iris.data[:, (2, 3)]  # only the petal length and petal width
y = (iris.target == 0).astype(int)  # 1 if Iris setosa, else 0
```

```python
percep = Perceptron()
percep.fit(X, y)
```

```python
y_pred = percep.predict([[2, 0.5]])
```
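Under the hood, scikit-learn's `Perceptron` is a linear classifier: for binary targets it predicts the positive class when the score w·x + b, built from the fitted `coef_` and `intercept_`, is greater than 0. A sketch that checks this by hand on the same petal features, assuming scikit-learn is installed; `random_state=42` is an arbitrary choice added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]             # petal length, petal width
y = (iris.target == 0).astype(int)   # 1 = Iris setosa

percep = Perceptron(random_state=42)
percep.fit(X, y)

# score = w . x + b, computed by hand from the fitted attributes
scores = X @ percep.coef_.T + percep.intercept_
manual_pred = (scores.ravel() > 0).astype(int)

print(np.allclose(scores.ravel(), percep.decision_function(X)))
print(np.array_equal(manual_pred, percep.predict(X)))
```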

## Multi Layer Perceptron

### Implementing MLP Backpropagation

```python
import numpy as np
import matplotlib.pyplot as plt
```

```python
class ActivationFunction:
    """Base class for activation functions"""
    def forward(self, x):
        raise NotImplementedError

    def backward(self, x):
        raise NotImplementedError

class ReLU(ActivationFunction):
    def forward(self, x):
        return np.maximum(0, x)

    def backward(self, x):
        return (x > 0).astype(float)

class Sigmoid(ActivationFunction):
    def forward(self, x):
        # Clip to prevent overflow
        x = np.clip(x, -500, 500)
        return 1 / (1 + np.exp(-x))

    def backward(self, x):
        s = self.forward(x)
        return s * (1 - s)

class Tanh(ActivationFunction):
    def forward(self, x):
        return np.tanh(x)

    def backward(self, x):
        return 1 - np.tanh(x) ** 2

class Linear(ActivationFunction):
    def forward(self, x):
        return x

    def backward(self, x):
        return np.ones_like(x)
```
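A quick way to trust these `backward` formulas is to compare them with a central finite difference, (f(x + h) - f(x - h)) / 2h. The sketch below redefines sigmoid and tanh as plain functions so it runs on its own; ReLU is left out because it has a kink at 0, where finite differences are unreliable.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)


def tanh_grad(x):
    return 1 - np.tanh(x) ** 2


def numerical_grad(f, x, h=1e-5):
    # central difference, applied element-wise
    return (f(x + h) - f(x - h)) / (2 * h)


x = np.linspace(-3, 3, 13)
err_sigmoid = np.max(np.abs(sigmoid_grad(x) - numerical_grad(sigmoid, x)))
err_tanh = np.max(np.abs(tanh_grad(x) - numerical_grad(np.tanh, x)))
print(err_sigmoid, err_tanh)
```

Both errors should come out tiny (on the order of 1e-10), confirming that the analytic derivatives match the functions.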


```python
class Layer:
    """A single layer in the neural network"""
    def __init__(self, input_size, output_size, activation):
        # He initialization: weights drawn from a normal distribution with mean 0 and
        # standard deviation sqrt(2 / fan_in), where fan_in = input_size. This suits ReLU.
        # (Xavier/Glorot initialization would use sqrt(2 / (fan_in + fan_out)) instead.)
        self.weights = np.random.randn(input_size, output_size) * np.sqrt(2.0 / input_size)
        self.biases = np.zeros((1, output_size))
        self.activation = activation

        # Store values for backprop
        self.last_input = None
        self.last_z = None  # before activation
        self.last_a = None  # after activation

    def forward(self, x):
        """Forward pass through this layer"""
        self.last_input = x.copy()

        # Linear transformation: z = xW + b
        self.last_z = np.dot(x, self.weights) + self.biases

        # Apply activation function: a = f(z)
        self.last_a = self.activation.forward(self.last_z)

        print("Layer forward:")
        print(f"  Input shape: {x.shape}")
        print(f"  Weights shape: {self.weights.shape}")
        print(f"  Z (before activation): {self.last_z.flatten()[:3]}... (first 3 values)")
        print(f"  A (after activation): {self.last_a.flatten()[:3]}... (first 3 values)")
        print()

        return self.last_a

    def backward(self, grad_output, learning_rate):
        """Backward pass through this layer"""
        print("Layer backward:")
        print(f"  Grad output shape: {grad_output.shape}")

        # Gradient w.r.t. pre-activation (z): dL/dz = dL/da * da/dz
        activation_grad = self.activation.backward(self.last_z)
        grad_z = grad_output * activation_grad

        print(f"  Activation gradient: {activation_grad.flatten()[:3]}... (first 3)")
        print(f"  Grad z: {grad_z.flatten()[:3]}... (first 3)")

        # Gradients w.r.t. weights and biases (summed over the batch)
        grad_weights = np.dot(self.last_input.T, grad_z)
        grad_biases = np.sum(grad_z, axis=0, keepdims=True)

        # Gradient w.r.t. input (for the previous layer): dL/dx = dL/dz * W^T
        # (computed before the weights are updated below)
        grad_input = np.dot(grad_z, self.weights.T)

        print(f"  Grad weights shape: {grad_weights.shape}")
        print(f"  Grad input: {grad_input.flatten()[:3]}... (first 3)")

        # Gradient descent update of weights and biases
        self.weights -= learning_rate * grad_weights
        self.biases -= learning_rate * grad_biases
        print()

        return grad_input
```
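`Layer.__init__` scales standard-normal weights by sqrt(2 / input_size), which is He initialization; Xavier/Glorot initialization instead uses both fan-in and fan-out. A small sketch comparing the two target standard deviations with what sampling actually produces; the layer sizes (512 → 256) and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 512, 256

# what Layer.__init__ does: He (Kaiming) normal initialization
he_std = np.sqrt(2.0 / fan_in)
w_he = rng.standard_normal((fan_in, fan_out)) * he_std

# Xavier / Glorot normal initialization, for comparison
xavier_std = np.sqrt(2.0 / (fan_in + fan_out))
w_xavier = rng.standard_normal((fan_in, fan_out)) * xavier_std

print(f"He:     target std {he_std:.4f}, sampled std {w_he.std():.4f}")
print(f"Xavier: target std {xavier_std:.4f}, sampled std {w_xavier.std():.4f}")
```

He's larger variance compensates for ReLU zeroing out roughly half of its inputs; Xavier's is tuned for symmetric activations such as tanh.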


```python
class MLP:
    """Multi-Layer Perceptron"""
    def __init__(self, layer_sizes, activations):
        self.layers = []

        for i in range(len(layer_sizes) - 1):
            input_size = layer_sizes[i]
            output_size = layer_sizes[i + 1]
            activation = activations[i]

            layer = Layer(input_size, output_size, activation)
            self.layers.append(layer)

        print(f"Created MLP with {len(self.layers)} layers:")
        for i, layer in enumerate(self.layers):
            print(f"  Layer {i}: {layer.weights.shape[0]} → {layer.weights.shape[1]} ({layer.activation.__class__.__name__})")
        print()

    def forward(self, x):
        """Forward pass through entire network"""
        print("=== FORWARD PASS ===")
        current = x

        for i, layer in enumerate(self.layers):
            print(f"Passing through layer {i}:")
            current = layer.forward(current)

        print(f"Final output: {current}")
        print()
        return current

    def backward(self, y_true, y_pred, learning_rate):
        """Backward pass through entire network"""
        print("=== BACKWARD PASS ===")

        # Start with the gradient of the MSE loss w.r.t. the predictions:
        # dL/dy_pred = 2/m * (y_pred - y_true), where m is the number of elements
        # averaged by np.mean in train_step
        loss_grad = (2 / y_true.size) * (y_pred - y_true)
        print(f"Initial loss gradient: {loss_grad}")
        print()

        current_grad = loss_grad

        # Backpropagate through layers in reverse order
        for i in reversed(range(len(self.layers))):
            print(f"Backpropagating through layer {i}:")
            current_grad = self.layers[i].backward(current_grad, learning_rate)

    def train_step(self, x, y, learning_rate=0.01):
        """Single training step"""
        # Forward pass
        y_pred = self.forward(x)

        # Calculate loss (MSE)
        loss = np.mean((y_pred - y) ** 2)

        # Backward pass (also updates the weights)
        self.backward(y, y_pred, learning_rate)

        return loss, y_pred
```
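A standard sanity check for a hand-written backward pass is gradient checking: compute dL/dW1 with the chain rule, exactly as `Layer.backward` does, and compare it with a finite-difference estimate. The sketch below is self-contained and uses a tanh hidden layer instead of ReLU (an assumption, to avoid the ReLU kink), a sigmoid output, and the MSE loss from `train_step`; the data and weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 2))
y = rng.random((4, 1))
W1, b1 = rng.standard_normal((2, 3)), np.zeros((1, 3))
W2, b2 = rng.standard_normal((3, 1)), np.zeros((1, 1))


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def loss_fn(W1_trial):
    # forward pass: tanh hidden layer, sigmoid output, MSE loss
    a1 = np.tanh(X @ W1_trial + b1)
    y_pred = sigmoid(a1 @ W2 + b2)
    return np.mean((y_pred - y) ** 2)


# analytic gradient of the loss w.r.t. W1 via the chain rule
z1 = X @ W1 + b1
a1 = np.tanh(z1)
y_pred = sigmoid(a1 @ W2 + b2)
grad_y = 2 / y.size * (y_pred - y)          # dL/dy_pred for the MSE
grad_z2 = grad_y * y_pred * (1 - y_pred)    # times sigmoid'(z2)
grad_a1 = grad_z2 @ W2.T                    # dL/da1 = dL/dz2 * W2^T
grad_z1 = grad_a1 * (1 - a1 ** 2)           # times tanh'(z1)
grad_W1 = X.T @ grad_z1                     # dL/dW1 = X^T * dL/dz1

# numerical gradient: central difference on each entry of W1
h = 1e-6
num_grad_W1 = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[i, j] += h
        W_minus[i, j] -= h
        num_grad_W1[i, j] = (loss_fn(W_plus) - loss_fn(W_minus)) / (2 * h)

max_err = np.max(np.abs(grad_W1 - num_grad_W1))
print(f"max |analytic - numerical| = {max_err:.2e}")
```

Checking the first layer's weights exercises the whole backward chain (loss, output activation, second weight matrix, hidden activation), so a mismatch anywhere shows up here.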


```python
# Example usage - let's solve a simple problem
print("Creating a simple dataset...")
# Toy problem (linearly separable, so not XOR-like): output 1 if x1 + x2 > 1, else 0
np.random.seed(42)
X = np.random.rand(4, 2)  # 4 samples, 2 features
y = ((X[:, 0] + X[:, 1]) > 1).astype(float).reshape(-1, 1)  # binary target, shape (4, 1)

print("Dataset:")
print(f"X:\n{X}")
print(f"y: {y.flatten()}")
print()

# Create a simple MLP: 2 → 3 → 1
# Number of neurons: Input layer (2) → Hidden layer (3, ReLU) → Output layer (1, Sigmoid)
mlp = MLP(layer_sizes=[2, 3, 1], activations=[ReLU(), Sigmoid()])

print("Training for a few steps...")
losses = []

for epoch in range(3):
    print(f"\n{'='*50}")
    print(f"EPOCH {epoch + 1}")
    print(f"{'='*50}")

    loss, y_pred = mlp.train_step(X, y, learning_rate=0.1)
    losses.append(loss)

    print(f"Loss: {loss:.4f}")
    print(f"Predictions: {y_pred.flatten()}")
    print(f"True values: {y.flatten()}")

# Show how activation functions affect the flow
print(f"\n{'='*50}")
print("ACTIVATION FUNCTION ANALYSIS")
print(f"{'='*50}")

# Test different inputs to see activation behavior
test_input = np.array([[-2, -1, 0, 1, 2]]).T

relu = ReLU()
sigmoid = Sigmoid()
tanh = Tanh()

print("Input values:", test_input.flatten())
print("ReLU output:", relu.forward(test_input).flatten())
print("ReLU gradient:", relu.backward(test_input).flatten())
print()
print("Sigmoid output:", sigmoid.forward(test_input).flatten())
print("Sigmoid gradient:", sigmoid.backward(test_input).flatten())
print()
print("Tanh output:", tanh.forward(test_input).flatten())
print("Tanh gradient:", tanh.backward(test_input).flatten())

print(f"\n{'='*50}")
print("KEY INSIGHTS")
print(f"{'='*50}")
print("1. Forward pass: Input → Linear transform (Wx + b) → Activation → Next layer")
print("2. Backward pass: Loss gradient flows backward through activation derivatives")
print("3. Weights update using gradient of loss w.r.t. weights")
print("4. Each layer stores intermediate values needed for backprop")
print("5. Activation functions introduce non-linearity and affect gradient flow")
```

Creating a simple dataset...
Dataset:
X:
[[0.37454012 0.95071431]
 [0.73199394 0.59865848]
 [0.15601864 0.15599452]
 [0.05808361 0.86617615]]
y: [1. 1. 0. 0.]

Created MLP with 2 layers:
  Layer 0: 2 → 3 (ReLU)
  Layer 1: 3 → 1 (Sigmoid)

Training for a few steps...

EPOCH 1
=== FORWARD PASS ===
Passing through layer 0:
Layer forward:
  Input shape: (4, 2)
  Weights shape: (2, 3)
  Z (before activation): [ 1.10729815 -0.15314274 -0.61861293]... (first 3 values)
  A (after activation): [1.10729815 0.         0.        ]... (first 3 values)

Passing through layer 1:
Layer forward:
  Input shape: (4, 3)
  Weights shape: (3, 1)
  Z (before activation): [ 0.21875934 -0.15162905 -0.00871825]... (first 3 values)
  A (after activation): [0.55447277 0.4621652  0.49782045]... (first 3 values)

Final output: [[0.55447277]
 [0.4621652 ]
 [0.49782045]
 [0.52771308]]

=== BACKWARD PASS ===
Initial loss gradient: [[-0.22276361]
 [-0.2689174 ]
 [ 0.24891023]
 [ 0.26385654]]

Backpropagating through la