# Neural Networks

Neural networks can solve a wide variety of problems like handwriting recognition and face detection

<br>

However, most neural networks are "black boxes" inspecting their details doesn't give you much understanding of how they are solving a problem

## Perceptrons

It is the simplest neural network is the perceptron, which approximates a single neuron with (n) binary inputs. 

It computes a weighted sum of its inputs and "fires" if that weighted sum is (0) or greater

In [3]:
from typing import List

Vector = List[float]

def dot(v:Vector, w:Vector)-> Vector:
    assert len(v) == len(w), "different sizes"
    return sum(v_i*w_i for v_i, w_i in zip(v,w))

def step_function(x:float)-> float:
    return 1.0 if x>=0 else 0.0

def perceptron_output(weights:Vector,bias:float,x:Vector)->float:
    """ Returns 1 if the perceptron 'fires', 0 if not """
    calculation = dot(weights,x)+bias
    return step_function(calculation)

Here bias acts as an intercept. Without bias decison boundary is forced through origin but with bias decision bias can move freely. Bias is like giving the neuron a starting opinion.

with bias the equation is:
`z = w1​x1 ​+ w2​x2​+...+wn​xn​ + b`     

- wi = weights
- xi = inputs
- b = bias

Relate it with line equation:
`y = mx + c`

Step function 

- if z >= 0 → output = 1  → neuron FIRES
- if z < 0  → output = 0  → neuron does NOT fire

So firing simply means

The neuron detected something important and activated.

The perception is simply distinguishing between the half-spaces separated by the hyperlane of points (x) for which:

`dot(weights,x) + bias = 0`

This equation defines the boundary line/plane.

---

**Note:**

A single perceptron can implement AND, OR, NAND, NOR because they are linearly separable. XOR is not linearly separable, so it requires a multi-layer neural network.

---

## Feed-Forward Neural Networks

feed-forward neural network consists of discrete layers of neuron, each connected to the next. This typically entails an

- Input layer(which receives inputs and feeds then forward unchanged)
- One or more **Hidden layers** (each of which consists of neurons that take the outputs of the previous layers, performs some calculation and passes the result to the next layer). Hidden layers are also know as ***Non-input Layers***
- Output layer (which produces the final outputs)


In this we will use **sigmoid function** rather than ***step function*** as sigmoid function provides smoothness and we need smoothness in order to use calculus

In [12]:
import math

def sigmoid(t:float)-> float:
    return 1/(1+math.exp(-t))

Calculate the output

In [9]:
def neuron_output(weights: Vector, inputs:Vector)->float:
    # weights includes the bias term, inputs include a 1
    return sigmoid(dot(weights,inputs))

In [7]:
from typing import List

def feed_forward(neural_network:List[List[Vector]],input_vector:Vector)-> List[Vector]:
    """ Feeds the input vector through the neural network
     returns the outputs of all layers (not just the last one) """
    
    outputs: List[Vector] = []

    for layer in neural_network:
        input_with_bias = input_vector + [1]

        output = [neuron_output(neuron,input_with_bias)for neuron in layer]
        outputs.append(output)

        # Then the input to the next layer is the output of this one 
        input_vector = output

    return outputs

## Backpropagation

Backpropagation is a training algorithm used in neural networks to minimize error by updating weights.


It works in two phases:

-  Forward Pass:
Input goes through the network → prediction is made.

-  Backward Pass:
The error (loss) is propagated backward through the network to compute gradients using the chain rule, and weights are updated using gradient descent.

<br>

`Gradient = Delta * Input`

- Delta: Error signal of a neuron

- Gradient: How much the weight should change

In [5]:
def sqerror_gradients(network: List[List[Vector]],input_vector:Vector,target_vector:Vector)->List[List[Vector]]:
    """ 
    Given a neural network, an input vector and a target vector
    make a prediction and compute the gradient of the squared error loss with respect to the neuron weights 
    """

    # Forward pass
    hidden_outputs, outputs = feed_forward(network, input_vector)

    # Gradients with respect to output neuron pre-activation outputs
    output_deltas = [output * (1-output) * (output-target)
                     for output,target in zip(outputs, target_vector)]
    
    # Gradient with respect to output neuron weights
    output_grads = [[output_deltas[i] * hidden_output
                     for hidden_output in hidden_outputs + [1]]
                     for i, output_neuron in enumerate(network[-1])]
    
    # Gradients with respect to hidden neuron pre-activation outputs

    hidden_deltas = [hidden_output * (1-hidden_output) * dot(output_deltas,[n[i] for n in network[-1]])
                    for i, hidden_output in enumerate(hidden_outputs)]
    
    hidden_grads = [[hidden_deltas[i] * input for input in input_vector + [1]]
                    for i , hidden_neuron in enumerate(network[0])]
    return [hidden_grads, output_grads] 

We'll start by generating the training data and initializing our neural network with random weights:

In [13]:
import random
random.seed(0)

# training data

xs = [[0., 0], [0., 1], [1., 0], [1., 1]]
ys = [[0.], [1.], [1.], [0.]]

# Start with random weights
network = [ # hidden layer: 2 inputs -> 2 outputs
    [[random.random() for _ in range(2+1)],         # 1st hidden neuron
     [random.random() for _ in range(2+1)]],        # 2nd hidden neuron 
     #output layer : 2 inputs -> 1 output
     [[random.random() for _ in range(2+1)]]       # 1st output neuron   
]


def add(v:Vector, w:Vector)-> Vector:
    return[v_i+w_i for v_i,w_i in zip(v,w)]

def scalar_multiplication(c:float, v:Vector)-> Vector:
    return[c*x for x in v]

def gradient_step(v:Vector, gradient:Vector, step_size: float) -> Vector:
    """ Moves 'step size' in the `gradient` direction from `v` """
    assert len(v) == len(gradient)
    step = scalar_multiplication(step_size, gradient)
    return add(v,step)

import tqdm

learning_rate = 1.0

for epoch in tqdm.trange(20000, desc="neural net for xor"):
    for x,y in zip(xs,ys):
        gradients = sqerror_gradients(network,x,y)

        # Take a gradient step for each neuron in each layer
        network = [[gradient_step(neuron,grad,-learning_rate)
                    for neuron,grad in zip(layer,layer_grad)]
                    for layer, layer_grad in zip(network, gradients)]
        

# check that it learned XOR
assert feed_forward(network, [0, 0])[-1][0] < 0.01
assert feed_forward(network, [0, 1])[-1][0] > 0.99
assert feed_forward(network, [1, 0])[-1][0] > 0.99
assert feed_forward(network, [1, 1])[-1][0] < 0.01


# Resulting network has weights that look like:

[   # hidden layer 
    [[7, 7, -3],     # computes OR 
     [5, 5, -8]],    # computes AND 
    # output layer 
    [[11, -12, -5]]  # computes "first but not second"
]

neural net for xor: 100%|██████████| 20000/20000 [00:00<00:00, 26014.38it/s]


[[[7, 7, -3], [5, 5, -8]], [[11, -12, -5]]]

## Example: FizzBuzz

In [15]:
def fizz_buzz_encode(x: int) -> Vector: 
    if x % 15 == 0: 
        return [0, 0, 0, 1] 
    elif x % 5 == 0: 
        return [0, 0, 1, 0] 
    elif x % 3 == 0: 
        return [0, 1, 0, 0] 
    else: 
        return [1, 0, 0, 0] 
 
assert fizz_buzz_encode(2) == [1, 0, 0, 0]
assert fizz_buzz_encode(6) == [0, 1, 0, 0]
assert fizz_buzz_encode(10) == [0, 0, 1, 0]
assert fizz_buzz_encode(30) == [0, 0, 0, 1]


def binary_encode(x:int) -> Vector:
    binary: List[float] = []

    for i in range(10):
        binary.append(x % 2)
        x = x // 2
    return binary
    
                             
#                             1  2  4  8 16 32 64 128 256 512
assert binary_encode(0)   == [0, 0, 0, 0, 0, 0, 0, 0,  0,  0]
assert binary_encode(1)   == [1, 0, 0, 0, 0, 0, 0, 0,  0,  0]
assert binary_encode(10)  == [0, 1, 0, 1, 0, 0, 0, 0,  0,  0]
assert binary_encode(101) == [1, 0, 1, 0, 0, 1, 1, 0,  0,  0]
assert binary_encode(999) == [1, 1, 1, 0, 0, 1, 1, 1,  1,  1]


xs = [binary_encode(n) for n in range(101,1024)]
ys = [fizz_buzz_encode(n) for n in range(101,1024)]


# We will give it 25 hidden units

NUM_HIDDEN = 25

network = [
    # hidden layer: 10 inputs -> NUM_HIDDEN outputs
    [[random.random() for _ in range(10+1)] for _ in range(NUM_HIDDEN)],

    # output_layer: NUM_HIDDEN inputs -> 4 outputs
    [[random.random() for _ in range(NUM_HIDDEN + 1)] for _ in range(4)]
]


def sum_of_squares(v:Vector)->float:
    return dot(v,v)

def squared_distance(v:Vector, w:Vector) -> float:
    return sum_of_squares(subtract(v,w))

def subtract(v:Vector,w:Vector):
    assert len(v) == len(w)
    return[v_i-w_i for v_i, w_i in zip(v,w)]

learning_rate = 1.0 
 
with tqdm.trange(500) as t: 
    for epoch in t: 
        epoch_loss = 0.0 
 
        for x, y in zip(xs, ys): 
            predicted = feed_forward(network, x)[-1] 
            epoch_loss += squared_distance(predicted, y) 
            gradients = sqerror_gradients(network, x, y) 
 
            # Take a gradient step for each neuron in each layer 
            network = [[gradient_step(neuron, grad, -learning_rate) 
                        for neuron, grad in zip(layer, layer_grad)] 
                    for layer, layer_grad in zip(network, gradients)] 
 
        t.set_description(f"fizz buzz (loss: {epoch_loss:.2f})")

fizz buzz (loss: 29.44): 100%|██████████| 500/500 [01:31<00:00,  5.47it/s] 


In [16]:
def argmax(xs: list) -> int: 
    """Returns the index of the largest value""" 
    return max(range(len(xs)), key=lambda i: xs[i])
assert argmax([0, -1]) == 0               
assert argmax([-1, 0]) == 1               
assert argmax([-1, 10, 5, 20, -3]) == 3   

Solve FizzBuzz

In [17]:
num_correct = 0 
for n in range(1, 101): 
 x = binary_encode(n) 
predicted = argmax(feed_forward(network, x)[-1]) 
actual = argmax(fizz_buzz_encode(n)) 
labels = [str(n), "fizz", "buzz", "fizzbuzz"] 
print(n, labels[predicted], labels[actual]) 
if predicted == actual: 
 num_correct += 1 
print(num_correct, "/", 100)

100 buzz buzz
1 / 100
