In [188]:
import random
import math

#### The Perceptron

In [256]:
class Perceptron:
    def __init__(self, input_size: int, learning_rate: float, activation_function: dict):
        self.input_size    = input_size
        self.learning_rate = learning_rate
        
        self.weights    = [random.uniform(-1, 1) for _ in range(input_size)]
        self.bias       = random.uniform(-1, 1)
        self.activation = activation_function
    
    def train(self, inputs, targets, epochs, verbose=False):
        if len(inputs) != len(targets):
            raise ValueError('Inputs and targets must be of the same length')
        
        loss = float('nan')  # placeholder so the final print works even if no training step runs
        for _ in range(epochs):
            for input_vector, target in zip(inputs, targets):
                output              = self.forward_pass(input_vector)
                weighted_sum        = output['weighted_sum']
                activated_output    = output['activation']
                activation_gradient = self.activation['activation_diff'](weighted_sum)          # da/dz
                loss                = (activated_output - target[0]) ** 2

                if verbose:
                    print(f"Loss: {loss:.7f}, Weights: {self.weights}")

                # Update weights
                for i in range(self.input_size):
                    self.weights[i] -= self.learning_rate * (input_vector[i] * activation_gradient * 2 * (activated_output - target[0]))
                
                # Update bias
                self.bias -= self.learning_rate * (activation_gradient  * 2 * (activated_output - target[0]))
            
        print(f"Final loss: {loss:.7f}, Weights: {self.weights}")
    
    def forward_pass(self, input_vector):
        if len(input_vector) != self.input_size:
            raise ValueError('Input vector size does not match expected input size')
        
        weighted_sum = sum(x * w for x, w in zip(input_vector, self.weights)) + self.bias
        activated_output = self.activation['activation'](weighted_sum)
        return {'weighted_sum': weighted_sum, 'activation': activated_output}
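
The update rule inside `train` follows from the chain rule applied to the squared-error loss. As a sketch, write $z$ for the weighted sum, $a = f(z)$ for the activated output, $t$ for the target and $\eta$ for the learning rate. These symbols are introduced here for illustration; they correspond to `weighted_sum`, `activated_output`, `target[0]` and `learning_rate` in the code.

```latex
z = \sum_i w_i x_i + b, \qquad a = f(z), \qquad L = (a - t)^2

\frac{\partial L}{\partial w_i}
  = \frac{\partial L}{\partial a}\,\frac{\partial a}{\partial z}\,\frac{\partial z}{\partial w_i}
  = 2\,(a - t)\, f'(z)\, x_i,
\qquad
\frac{\partial L}{\partial b} = 2\,(a - t)\, f'(z)

w_i \leftarrow w_i - \eta\,\frac{\partial L}{\partial w_i},
\qquad
b \leftarrow b - \eta\,\frac{\partial L}{\partial b}
```

Here $f'(z)$ is what the code calls `activation_gradient`, and the factor $2\,(a - t)$ is the derivative of the squared error with respect to the output.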

#### Activation Functions and Their Derivatives

In [257]:
# Sigmoid Activation and its derivative

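def sigmoid(x):
    # Branch on the sign so math.exp never gets a large positive argument (avoids OverflowError)
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)

def sigmoid_diff(x):
    s = sigmoid(x)
    return s * (1 - s)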

# Linear Activation and its derivative

def linear(x): return x
def linear_diff(x): return 1

# ReLU Activation and its derivative

def relu(x): return max(0, x)
def relu_diff(x): return 0 if x <= 0 else 1
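
A quick way to sanity-check a derivative function is to compare it with a central finite difference. The sketch below redefines the three activations so that it runs on its own; `numerical_diff` and the step size `h` are illustrative choices, not part of the notebook.

```python
import math

def sigmoid(x): return 1 / (1 + math.exp(-x))
def sigmoid_diff(x): return sigmoid(x) * (1 - sigmoid(x))

def linear(x): return x
def linear_diff(x): return 1

def relu(x): return max(0, x)
def relu_diff(x): return 0 if x <= 0 else 1

def numerical_diff(f, x, h=1e-6):
    # Central difference approximation of f'(x); h is an assumed step size
    return (f(x + h) - f(x - h)) / (2 * h)

# x = 0 is skipped: ReLU has a kink there, so the two estimates disagree
for x in (-2.0, -0.5, 0.5, 2.0):
    for f, f_diff in ((sigmoid, sigmoid_diff), (linear, linear_diff), (relu, relu_diff)):
        assert abs(numerical_diff(f, x) - f_diff(x)) < 1e-6
```

If an analytic derivative were wrong (for example, passing `sigmoid` where `sigmoid_diff` is expected), the assertion would fail at the first test point.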

#### Example 1: Linear Function 2x+10

In [321]:
# Data

X = [[0],[1],[2],[3],[4],[5],[6],[7]]
Y = [[10],[12],[14],[16],[18],[20],[22],[24]]

In [322]:
# Initialize

neuron = Perceptron(1,0.05,{'activation':linear,'activation_diff':linear_diff})

In [323]:
neuron.forward_pass([10])['activation']  # Wrong answer before training (expected 30) :(

8.033793675264269

In [324]:
# Train for 200 epochs

neuron.train(X,Y,200)

Final loss: 0.0000000, Weights: [2.0000562926222174]


In [326]:
neuron.forward_pass([20])['activation']       # Correct answer (expected 50) :)

50.00102008957644

#### Example 2: Function 2a+5b-5

In [327]:
# Data

X = [[0,0],[0,1],[1,1],[2,2],[5,5],[10,10],[1,50],[20,1],[3,4],[9,1],[3,8],[4,0]]
Y = [[-5],[0],[2],[9],[30],[65],[247],[40],[21],[18],[41],[3]]

In [328]:
# Initialize

neuron = Perceptron(2,0.01,{'activation':linear,'activation_diff':linear_diff})

In [329]:
neuron.forward_pass([0,0])['activation']  # Wrong answer before training (expected -5) :(

0.029952426216928174

In [333]:
# Train for 5000 epochs

neuron.train(X,Y,5000,True)

Loss: 123920082453127336591816409875177158956311880733623012984641026425608596112338275847948865917221951286978394353690519024407990290611580637834295902755372007944764088137002768215317700630268992784030826880960757314615542890302845473202118218857352639745719601807066742488675896640553225316408601618895863808.0000000, Weights: [-4.262681643420455e+153, 4.196719318988435e+153]
Loss: 14835877394943382384181384432340824409063052867158561137945380342957930509984125726244430369398548491790146431015378322515493513309421532443839299676925461120869160693611303240428376715191851305849162678442315294677990612873515735699281055203477102135186616253006378541055692243081816626166460074825397305344.0000000, Weights: [-4.262681643420455e+153, 4.196719318988435e+153]
Loss: 31924103320602191815451782043571648428737711685110788696065616457495176470496192319909883988057145682284623544804148290773005938826737306001661839387756489547791320443581451968623452350456760316032957724787489692701520815529739103

OverflowError: (34, 'Numerical result out of range')

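What is seen above is a phenomenon known as **exploding gradients**. Notice how the weights kept "exploding" until the squared loss exceeded the largest value a Python float can represent (about 1.8e308), which raised the `OverflowError`. The learning rate of 0.01 is too large for inputs as big as 50, so each update overshoots the target and the error grows instead of shrinking. The opposite scenario is **vanishing gradients**, where the gradients become so small that the weights barely change. Check out [this](https://youtu.be/2f_45VzKEfE?si=UjNDSgIEBSgOBzhs) video that shows why this happens. Lowering the learning rate, as done below, keeps the updates stable.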
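
The blow-up can be reproduced without the `Perceptron` class. The sketch below is a minimal stand-alone version of the same per-sample update for a linear neuron; `per_sample_step_bound`, `train_linear`, the fixed `seed` and the `limit` cut-off are all names and values assumed for this illustration. The bound is a rule of thumb for a linear activation with squared loss, not a general guarantee.

```python
import random

# Same data as Example 2, with targets flattened to plain numbers
X = [[0,0],[0,1],[1,1],[2,2],[5,5],[10,10],[1,50],[20,1],[3,4],[9,1],[3,8],[4,0]]
Y = [-5, 0, 2, 9, 30, 65, 247, 40, 21, 18, 41, 3]

def per_sample_step_bound(inputs):
    # One update on sample x multiplies that sample's error by
    # 1 - 2 * lr * (|x|^2 + 1), where the "+ 1" comes from the bias.
    # The error shrinks in magnitude only if lr < 1 / (|x|^2 + 1).
    return min(1.0 / (sum(v * v for v in row) + 1.0) for row in inputs)

def train_linear(inputs, targets, learning_rate, epochs, seed=0, limit=1e6):
    # Plain per-sample gradient descent; stops early once a parameter exceeds `limit`
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in inputs[0]]
    bias = rng.uniform(-1, 1)
    for _ in range(epochs):
        for row, target in zip(inputs, targets):
            error = sum(x * w for x, w in zip(row, weights)) + bias - target
            weights = [w - learning_rate * 2 * error * x for w, x in zip(weights, row)]
            bias -= learning_rate * 2 * error
            if abs(bias) > limit or any(abs(w) > limit for w in weights):
                return weights, bias, True    # diverged
    return weights, bias, False               # stayed bounded

print(f"per-sample bound on the learning rate: {per_sample_step_bound(X):.6f}")
for lr in (0.01, 0.0002):
    weights, bias, diverged = train_linear(X, Y, lr, epochs=6000)
    print(lr, "diverged" if diverged else f"weights {weights}, bias {bias}")
```

The sample `[1, 50]` dominates the bound: its squared norm is 2501, so a learning rate of 0.01 overshoots on every visit to that sample, while 0.0002 stays under the limit. Scaling the inputs to a smaller range would raise the bound and is a common alternative to shrinking the learning rate.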

In [335]:
neuron = Perceptron(2,0.0002,{'activation':linear,'activation_diff':linear_diff})

neuron.train(X,Y,6000)

Final loss: 0.0000000, Weights: [1.999999175795818, 4.999999784037885]


In [336]:
neuron.forward_pass([100,100])['activation']        # Almost correct answer (expected 695) :)

694.9999061451219

#### Example 3: XOR

In [508]:
X = [[0,0],[0,1],[1,0],[1,1]]
Y = [[0],[1],[1],[0]]

In [509]:
neuron = Perceptron(2,0.0002,{'activation':sigmoid,'activation_diff':sigmoid_diff})

neuron.train(X,Y,1200)

Final loss: 0.6564842, Weights: [0.530207017370598, 0.5690668017135673]


In [510]:
neuron.forward_pass([1,1])['activation'] # Fail :( (expected 0)

0.8101159779991794

One notable observation is how the perceptron fails on a function that is not linearly separable, such as XOR. Some articles mention that a single perceptron cannot predict XOR because it can only represent linear decision boundaries. But then you might ask, what is the purpose of the activation function? Wasn't it supposed to introduce non-linearity?

The truth is that it depends on the activation function. A monotonic activation such as the sigmoid only rescales the weighted sum, so a threshold on its output is still a single linear boundary; other activation functions are not restricted in this way. [This](https://medium.com/@lucaspereira0612/solving-xor-with-a-single-perceptron-34539f395182) article by Lucas Araújo shows how a parameterized sigmoid function can solve the XOR problem using a single perceptron.
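
As a concrete case of an activation that is not limited to one linear boundary, the sketch below uses a hand-picked non-monotonic "bump" function with hand-picked weights. Both `bump` and the weights are assumptions made for this illustration; this is not the parameterized sigmoid from the article, and nothing here is learned by training.

```python
def bump(z):
    # Hypothetical non-monotonic activation: rises until z = 1, then falls
    return z * (2 - z)

def single_neuron(x1, x2, w1=1.0, w2=1.0, bias=0.0):
    # One weighted sum followed by one activation, i.e. a single perceptron
    return bump(w1 * x1 + w2 * x2 + bias)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), single_neuron(x1, x2))
```

The weighted sum takes the values 0, 1, 1 and 2 on the four XOR inputs. Because `bump` goes up and then down again, it can map the middle value to 1 and both extremes to 0, which no monotonic activation can do.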

Some recently introduced activation functions are also worth a look:
- https://arxiv.org/pdf/2108.12943
- https://arxiv.org/pdf/2409.10821


But the main takeaway is that as we add more layers (that is, make the network deeper), the neural network is able to learn features more effectively, with the initial layers learning low-level features and the later layers handling higher-level features in the data.
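
As a small illustration of what one extra layer buys, the sketch below wires two ReLU hidden neurons into a linear output neuron and computes XOR exactly. The weights are set by hand rather than learned, and the function name `two_layer_xor` is made up for this example.

```python
def relu(x):
    return max(0, x)

def two_layer_xor(x1, x2):
    # Hidden layer: two ReLU neurons that share the same inputs
    h1 = relu(x1 + x2)        # counts how many inputs are on
    h2 = relu(x1 + x2 - 1)    # fires only when both inputs are on
    # Output layer: a linear neuron that cancels the "both on" case
    return h1 - 2 * h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), two_layer_xor(x1, x2))
```

The hidden layer turns the raw inputs into two features, `h1` and `h2`, on which XOR becomes a plain linear function. That is the low-level-to-higher-level idea in miniature.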