# XOR without numpy

This program implements a simple neural network with one hidden layer and trains it to reproduce the XOR function. Let's break down the code step by step.

The script uses stochastic gradient descent to train the network one row of data at a time, so it needs none of the matrix operations (such as transpositions) that mini-batch training would require. Each update minimizes the squared error of a single example, and the training loop reports the mean squared error (MSE) over the four XOR examples.
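For reference, here is the math behind the code, written with the script's own names: inputs $i_1, i_2$, hidden activations $s_k$, output weights $o_k$, output bias $b_o$ (the variable `ob`), target $t$ and learning rate $\alpha$. The symbols $z_k$, $y$, $E$, $\delta_o$ and $\delta_k$ are notation introduced here, and the factor $\tfrac{1}{2}$ in $E$ is an assumption chosen so that the gradients match the updates in `learn` exactly; the script itself never computes $E$.

$$
\begin{aligned}
z_k &= w_{1k}\, i_1 + w_{2k}\, i_2 + b_k, \qquad s_k = \sigma(z_k), \qquad k = 1, 2, 3 \\
y &= \sigma\!\left(o_1 s_1 + o_2 s_2 + o_3 s_3 + b_o\right) \\
E &= \tfrac{1}{2}\,(t - y)^2 \\
\delta_o &= (t - y)\, y\,(1 - y) && \text{(derror)} \\
\delta_k &= \delta_o\, o_k\, s_k\,(1 - s_k) && \text{(ds1, ds2, ds3)} \\
\frac{\partial E}{\partial o_k} &= -\delta_o\, s_k, \qquad \frac{\partial E}{\partial w_{jk}} = -\delta_k\, i_j \\
o_k &\leftarrow o_k + \alpha\, \delta_o\, s_k, \qquad b_o \leftarrow b_o + \alpha\, \delta_o \\
w_{jk} &\leftarrow w_{jk} + \alpha\, \delta_k\, i_j, \qquad b_k \leftarrow b_k + \alpha\, \delta_k \\
\text{MSE} &= \tfrac{1}{4} \sum_{n=1}^{4} \bigl(t_n - y_n\bigr)^2
\end{aligned}
$$

Because every gradient carries a minus sign, the `+=` updates in `learn` are ordinary gradient descent steps, $p \leftarrow p - \alpha\, \partial E / \partial p$.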

This is the simplest possible implementation: two inputs, a hidden layer of three sigmoid neurons, and a single sigmoid output neuron.

[![nn.png](https://i.postimg.cc/Hxf8jKqQ/nn.png)](https://postimg.cc/TLJ34kYw)

Here the neural network is nothing more than a set of individually named variables: every weight and bias is its own scalar, with no arrays or matrices.

## Initialization

```python
# Import the random and math modules.
import random
import math


# Half-width of the interval from which the initial weights are drawn
# (despite its name, this is not the statistical variance).
VARIANCE_W = 0.5

# Initialize the weights (w11, w21, w12, w22, w13, w23) and biases (b1, b2, b3) of the hidden-layer neurons.
# wjk is the weight from input j to hidden neuron k.
w11 = random.uniform(-VARIANCE_W, VARIANCE_W)
w21 = random.uniform(-VARIANCE_W, VARIANCE_W)
b1 = 0

w12 = random.uniform(-VARIANCE_W, VARIANCE_W)
w22 = random.uniform(-VARIANCE_W, VARIANCE_W)
b2 = 0

w13 = random.uniform(-VARIANCE_W, VARIANCE_W)
w23 = random.uniform(-VARIANCE_W, VARIANCE_W)
b3 = 0

# Initialize the weights (o1, o2, o3) and bias (ob) of the output neuron.
o1 = random.uniform(-VARIANCE_W, VARIANCE_W)
o2 = random.uniform(-VARIANCE_W, VARIANCE_W)
o3 = random.uniform(-VARIANCE_W, VARIANCE_W)
ob = 0
```
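The name `VARIANCE_W` is a little loose: the weights are drawn uniformly from $[-a, a]$ with $a = 0.5$, and a uniform distribution on that interval has variance $a^2/3 \approx 0.083$ (standard deviation about 0.29). A minimal sketch that checks this empirically; the fixed seed and sample size are arbitrary choices made only for illustration:

```python
import random

VARIANCE_W = 0.5  # half-width of the initialization interval, as in the script

random.seed(42)  # fixed seed so this sketch is reproducible
samples = [random.uniform(-VARIANCE_W, VARIANCE_W) for _ in range(20_000)]

# Population variance of the sampled weights
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

print("sample variance:              ", var)
print("theoretical VARIANCE_W**2 / 3:", VARIANCE_W ** 2 / 3)  # about 0.0833
```

The sample variance should land close to the theoretical value; only the interval half-width, not the variance, is passed to `random.uniform`.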

Despite its name, `VARIANCE_W` only sets the range of the initial weights. The terms below describe the broader notion of variability in machine learning:

**Variability**: a general term for the degree of spread or changeability of something. In machine learning, it describes how much a model's predictions vary depending on the training data set.

**Sample dependence**: a more technical term emphasizing that this variability arises because the model is too sensitive to the particular training data set it was given.


---


The **bias** (called the offset, `ob`, for the output neuron) is a learnable value that is added to the weighted sum of a neuron's inputs before the activation function is applied.

It plays several roles in the work of a neuron:

1. Baseline output: the bias sets the neuron's output when all of its inputs are zero. For the XOR input [0, 0], the hidden pre-activations are exactly b1, b2 and b3. In a classifier, the output bias can encode a prior preference for one class over another.
2. Greater flexibility: without a bias, every pre-activation is zero at the origin, so a sigmoid neuron would output exactly 0.5 for the input [0, 0] no matter what its weights are. The bias removes this constraint.
3. Shifting the decision boundary: the bias moves a neuron's decision boundary, w·x + b = 0, away from the origin without changing its orientation, which controls the input level at which the neuron switches on.

Like the weights, the bias is learned during training: gradient descent adjusts it automatically to minimize the network's error (in this script, `ob += alpha * derror` for the output neuron).

Depending on the task and the network architecture, the bias can serve other purposes as well.
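The first and third roles of the bias can be checked directly. The sketch below uses a hypothetical standalone `neuron` helper (not part of the script) with hand-picked weights: without a bias the input [0, 0] always yields 0.5, and with a suitable bias the same weights turn the neuron into an AND gate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def neuron(i1, i2, w1, w2, b=0.0):
    """Hypothetical single sigmoid neuron with two inputs and an optional bias."""
    return sigmoid(w1 * i1 + w2 * i2 + b)


# Without a bias, the input [0, 0] gives sigmoid(0) = 0.5 whatever the weights are.
for w1, w2 in [(-3.0, 2.0), (0.1, 0.1), (5.0, -5.0)]:
    print(neuron(0, 0, w1, w2))  # 0.5 every time

# A negative bias moves that baseline: the neuron is almost "off" at [0, 0].
print(round(neuron(0, 0, 5.0, 5.0, b=-4.0), 4))

# The bias shifts the decision boundary 5*i1 + 5*i2 + b = 0. With b = -7.5 the
# output exceeds 0.5 only when i1 + i2 > 1.5, i.e. the neuron acts like AND.
for i1, i2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(i1, i2, neuron(i1, i2, 5.0, 5.0, b=-7.5) > 0.5)
```

The weights (5, 5) are the same in the last two experiments; only the bias changes where the neuron switches on.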

## Activation function

```python
# Sigmoid activation function, used to introduce nonlinearity.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Derivative of the sigmoid, used in backpropagation.
def sigmoid_prime(x):  # x must already be a sigmoid output: if s = sigmoid(z), then sigmoid'(z) = s * (1 - s).
    return x * (1 - x)
```
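Because $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, the derivative can be computed from the sigmoid's output alone, which is why `sigmoid_prime` expects an already-activated value. A quick sketch comparing it with a central finite difference; `numeric_derivative` is a hypothetical helper introduced only for this check:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(s):
    # s is already a sigmoid output, s = sigmoid(x)
    return s * (1 - s)


def numeric_derivative(f, x, h=1e-6):
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


for x in [-2.0, 0.0, 1.5]:
    analytic = sigmoid_prime(sigmoid(x))
    numeric = numeric_derivative(sigmoid, x)
    print(f"x={x:5.2f}  analytic={analytic:.6f}  numeric={numeric:.6f}")

# Passing the raw input instead of the activation gives a wrong value:
print(sigmoid_prime(1.5))  # -0.75, not the true derivative
```

The two columns should agree to about six decimal places, while the last line shows the mistake of feeding the pre-activation to `sigmoid_prime`.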

## Prediction function

```python
# The prediction function calculates the output of the network from the inputs i1 and i2.
def predict(i1, i2):
    # It calculates the activations s1, s2 and s3 for hidden layer neurons using a sigmoid function.
    s1 = w11 * i1 + w21 * i2 + b1
    s1 = sigmoid(s1)
    s2 = w12 * i1 + w22 * i2 + b2
    s2 = sigmoid(s2)
    s3 = w13 * i1 + w23 * i2 + b3
    s3 = sigmoid(s3)

    # It calculates the final result by combining the hidden layer activations with the output weights and applying a sigmoid function.
    output = s1 * o1 + s2 * o2 + s3 * o3 + ob
    output = sigmoid(output)

    return output
```
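The same forward pass can be written with lists instead of named variables, which is the first step toward a network whose size is not hard-coded. This is a sketch, not part of the original script: the `forward` function and the example weights are hypothetical, laid out so that `hidden_w[k]` corresponds to `[w1k, w2k]` and `out_w` to `[o1, o2, o3]`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of a fully connected network with one sigmoid hidden layer.

    hidden_w[k] holds the input weights of hidden neuron k and hidden_b[k] its
    bias; out_w[k] is the weight from hidden neuron k to the output neuron and
    out_b is the output bias.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + b)
        for weights, b in zip(hidden_w, hidden_b)
    ]
    return sigmoid(sum(o * s for o, s in zip(out_w, hidden)) + out_b)


# Hypothetical example weights:
# hidden_w[0] = [w11, w21], hidden_w[1] = [w12, w22], hidden_w[2] = [w13, w23]
hidden_w = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.25]]
hidden_b = [0.0, 0.0, 0.0]   # b1, b2, b3
out_w = [0.2, -0.1, 0.3]     # o1, o2, o3
out_b = 0.0                  # ob

print(forward([1, 0], hidden_w, hidden_b, out_w, out_b))
```

With this layout, changing the number of hidden neurons only changes the lengths of `hidden_w`, `hidden_b` and `out_w`.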

## Learning function

```python
# The learning function updates the weights and biases based on the error between the predicted output and the target output.
def learn(i1, i2, target, alpha=0.2):
    # The parameters are module-level variables, so they must be declared global to be updated.
    global w11, w21, b1, w12, w22, b2, w13, w23, b3
    global o1, o2, o3, ob

    # Forward propagation: compute the activations.
    s1 = w11 * i1 + w21 * i2 + b1
    s1 = sigmoid(s1)
    s2 = w12 * i1 + w22 * i2 + b2
    s2 = sigmoid(s2)
    s3 = w13 * i1 + w23 * i2 + b3
    s3 = sigmoid(s3)

    output = s1 * o1 + s2 * o2 + s3 * o3 + ob
    output = sigmoid(output)

    # Output error and output delta (the error times the sigmoid derivative).
    error = target - output
    derror = error * sigmoid_prime(output)

    # Hidden-layer deltas, computed with the output weights before they are updated.
    ds1 = derror * o1 * sigmoid_prime(s1)
    ds2 = derror * o2 * sigmoid_prime(s2)
    ds3 = derror * o3 * sigmoid_prime(s3)

    # Backpropagation: update the weights and biases using the learning rate alpha.
    o1 += alpha * s1 * derror
    o2 += alpha * s2 * derror
    o3 += alpha * s3 * derror
    ob += alpha * derror

    w11 += alpha * i1 * ds1
    w21 += alpha * i2 * ds1
    b1 += alpha * ds1
    w12 += alpha * i1 * ds2
    w22 += alpha * i2 * ds2
    b2 += alpha * ds2
    w13 += alpha * i1 * ds3
    w23 += alpha * i2 * ds3
    b3 += alpha * ds3
```
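A standard way to confirm that backpropagation code is correct is a gradient check: compare the analytic gradient with a finite-difference estimate. The sketch below assumes the per-sample loss is $E = \tfrac{1}{2}(t - y)^2$ (the script never computes a loss explicitly) and stores the parameters in a dictionary keyed by the script's variable names, which is a hypothetical layout chosen so each parameter can be perturbed in turn. The analytic gradients reuse the `derror` and `ds` formulas from `learn`, so the `+=` updates there equal `p -= alpha * grad`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


HIDDEN = (1, 2, 3)


def forward(p, i1, i2):
    # Hidden activations s1..s3 and the output, same formulas as predict()
    s = {k: sigmoid(p[f"w1{k}"] * i1 + p[f"w2{k}"] * i2 + p[f"b{k}"]) for k in HIDDEN}
    y = sigmoid(sum(p[f"o{k}"] * s[k] for k in HIDDEN) + p["ob"])
    return s, y


def loss(p, i1, i2, t):
    # Assumed per-sample loss E = 1/2 (t - y)^2
    _, y = forward(p, i1, i2)
    return 0.5 * (t - y) ** 2


def analytic_grad(p, i1, i2, t):
    # dE/dp built from the same delta terms as learn():
    # derror = (t - y) * y * (1 - y), ds_k = derror * o_k * s_k * (1 - s_k)
    s, y = forward(p, i1, i2)
    derror = (t - y) * y * (1 - y)
    g = {"ob": -derror}
    for k in HIDDEN:
        ds = derror * p[f"o{k}"] * s[k] * (1 - s[k])
        g[f"o{k}"] = -derror * s[k]
        g[f"w1{k}"] = -ds * i1
        g[f"w2{k}"] = -ds * i2
        g[f"b{k}"] = -ds
    return g


def numeric_grad(p, i1, i2, t, h=1e-6):
    # Central finite differences, perturbing one parameter at a time
    g = {}
    for name in p:
        plus, minus = dict(p), dict(p)
        plus[name] += h
        minus[name] -= h
        g[name] = (loss(plus, i1, i2, t) - loss(minus, i1, i2, t)) / (2 * h)
    return g


random.seed(0)  # arbitrary seed for reproducibility
names = [f"{n}{k}" for k in HIDDEN for n in ("w1", "w2", "b", "o")] + ["ob"]
params = {name: random.uniform(-0.5, 0.5) for name in names}

# Check the XOR example [1, 1] -> 0
ga = analytic_grad(params, 1, 1, 0)
gn = numeric_grad(params, 1, 1, 0)
max_diff = max(abs(ga[n] - gn[n]) for n in params)
print("max |analytic - numeric| =", max_diff)
```

A maximum difference many orders of magnitude below the gradients themselves indicates that the update rules in `learn` follow the negative gradient of the squared error.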

## Training and testing

```python
# The INPUTS and OUTPUTS lists hold the XOR input-output pairs.
INPUTS = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
]

OUTPUTS = [
    [0],
    [1],
    [1],
    [0]
]

# Train for 10,000 epochs.
for epoch in range(1, 10001):
    # At each epoch, shuffle the example indices and train on each example once.
    indexes = [0, 1, 2, 3]
    random.shuffle(indexes)

    for j in indexes:
        learn(INPUTS[j][0], INPUTS[j][1], OUTPUTS[j][0], alpha=0.2)

    # Every 1000 epochs, print the mean squared error over the four examples.
    if epoch % 1000 == 0:
        cost = 0
        for j in range(4):
            o = predict(INPUTS[j][0], INPUTS[j][1])
            cost += (OUTPUTS[j][0] - o) ** 2
        cost /= 4
        print("epoch", epoch, "mean squared error:", cost)

# Print the final predictions.
for i in range(4):
    result = predict(INPUTS[i][0], INPUTS[i][1])
    print("for input", INPUTS[i], "expected", OUTPUTS[i][0], "predicted", f"{result:4.4}", "which is", "correct" if round(result) == OUTPUTS[i][0] else "incorrect")
```

Output of one run (the numbers differ between runs because the weights are initialized randomly):

```
epoch 1000 mean squared error: 0.0008236728994545758
epoch 2000 mean squared error: 0.000710268262431356
epoch 3000 mean squared error: 0.0006234091972789838
epoch 4000 mean squared error: 0.0005548692292482299
epoch 5000 mean squared error: 0.0004994785500849843
epoch 6000 mean squared error: 0.00045383151134952026
epoch 7000 mean squared error: 0.00041559674737267835
epoch 8000 mean squared error: 0.00038312648995585157
epoch 9000 mean squared error: 0.00035522426269496725
epoch 10000 mean squared error: 0.0003310009868735209
for input [0, 0] expected 0 predicted 0.01849 which is correct
for input [0, 1] expected 1 predicted 0.9819 which is correct
for input [1, 0] expected 1 predicted 0.9819 which is correct
for input [1, 1] expected 0 predicted 0.0181 which is correct
```
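The logged cost is the mean squared error, not its square root. Recomputing it from the rounded predictions printed in the run above reproduces the final logged value, and its square root, the root mean square error, is about 0.018, which matches the size of each individual prediction error. The numbers below are copied from that single run:

```python
import math

# Targets and predictions copied from the sample run above (rounded as printed)
targets = [0, 1, 1, 0]
predictions = [0.01849, 0.9819, 0.9819, 0.0181]

mse = sum((t - y) ** 2 for t, y in zip(targets, predictions)) / len(targets)
rmse = math.sqrt(mse)

print(f"MSE  = {mse:.6f}")   # MSE  = 0.000331, as logged at epoch 10000
print(f"RMSE = {rmse:.4f}")  # RMSE = 0.0182
```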



