**Task 3**

The task was to solve the XOR problem using a neural network (NN) built from scratch.

To evaluate how NNs work and whether they can solve the XOR problem, NNs with two inputs, two hidden nodes and one output node were built. Bias nodes (initialized with a value of 1) were connected to all non-input nodes. Each NN was trained for 1000 iterations using gradient descent, with learning rates of 0.001, 0.01, 0.1 and 1.
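
The architecture and training rule described above can be written compactly. The symbols used here ($w_{ji}$, $v_j$, $b_j$, $b_o$, $\theta$, $\eta$, $f$) are notation assumed for illustration and do not come from the assignment text:

$$
h_j = f\left(w_{j1} x_1 + w_{j2} x_2 + b_j\right), \qquad j \in \{1, 2\}
$$

$$
\hat{y} = f\left(v_1 h_1 + v_2 h_2 + b_o\right)
$$

$$
\mathrm{MSE}(\theta) = \frac{1}{4} \sum_{k=1}^{4} \left(\hat{y}^{(k)} - t^{(k)}\right)^2, \qquad \theta \leftarrow \theta - \eta \, \nabla_\theta \mathrm{MSE}(\theta)
$$

Here $f$ is the activation function, $\theta$ collects all nine weights, $\eta$ is the learning rate and $t^{(k)}$ is the XOR target of the $k$-th input pair. Because each bias node has the constant value 1, its weight acts directly as the offset $b_j$ or $b_o$.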

The weights for non-bias nodes were initialized manually using one of five patterns: all 1s, all 0s, alternating 0s and 1s, all 0.5s, or increasing values between 0 and 1.
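
One property of uniform patterns such as the all-ones vector is that both hidden nodes start out identical. The sketch below is a minimal check, not part of the original experiments: it estimates the MSE gradient at the all-ones initialization and compares the entries belonging to hidden node 1 with those of hidden node 2. The helper names (`forward`, `xor_mse`, `numeric_grad`) and the step size `eps` are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Same 9-weight layout as xor_net: [0:2] input->hidden1, [2:4] input->hidden2,
# [4:6] hidden->output, [6:8] hidden biases, [8] output bias.
def forward(x, w):
    h1 = sigmoid(x[0] * w[0] + x[1] * w[1] + w[6])
    h2 = sigmoid(x[0] * w[2] + x[1] * w[3] + w[7])
    return sigmoid(h1 * w[4] + h2 * w[5] + w[8])

def xor_mse(w):
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    return sum((forward(x, w) - t) ** 2 for x, t in data) / 4

def numeric_grad(w, eps=1e-5):  # eps is an assumed finite-difference step
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (xor_mse(w + step) - xor_mse(w - step)) / (2 * eps)
    return grad

w = np.ones(9)              # the all-ones initialization
g = numeric_grad(w)
hidden1_idx = [0, 1, 4, 6]  # weights attached to hidden node 1
hidden2_idx = [2, 3, 5, 7]  # the matching weights of hidden node 2
symmetric = np.allclose(g[hidden1_idx], g[hidden2_idx])
print(symmetric)
```

If the two gradient slices match, a gradient step changes both hidden nodes in the same way, so they stay copies of each other. This may be one contributing factor when a uniform initialization fails; it does not apply to the alternating or increasing patterns, where the two hidden nodes start with different weights.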

With sigmoid, tanh or ReLU activation functions on the hidden and output nodes, none of the manually chosen weights solved the XOR problem. For sigmoid NNs, learning rates of 0.1 or 1 reduced the MSE most. For tanh NNs, the smallest MSE was reached with a learning rate of 0.01, 0.1 or 1, depending on the initial weights. For ReLU NNs, a learning rate of 1 performed worst in terms of MSE.
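
For comparison, it can help to confirm that this 2-2-1 architecture is able to represent XOR at all. The sketch below uses hand-picked weights that are purely illustrative and were not among the initializations tested here: hidden node 1 approximates OR, hidden node 2 approximates NAND, and the output node approximates AND of the two. The function name `xor_net_sketch` and the values in `hand_weights` are assumptions for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xor_net_sketch(inputs, weights):
    # Same weight layout as xor_net: two input->hidden pairs, one hidden->output
    # pair, two hidden bias weights, one output bias weight.
    x1, x2 = inputs
    h1 = sigmoid(x1 * weights[0] + x2 * weights[1] + weights[6])
    h2 = sigmoid(x1 * weights[2] + x2 * weights[3] + weights[7])
    return sigmoid(h1 * weights[4] + h2 * weights[5] + weights[8])

# Hand-picked, illustrative weights (not taken from the original runs):
# hidden1 ~ OR, hidden2 ~ NAND, output ~ AND of the two hidden nodes.
hand_weights = [20, 20, -20, -20, 20, 20, -10, 30, -30]
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
binary = [1 if xor_net_sketch(p, hand_weights) > 0.5 else 0 for p in pairs]
print(binary)
```

Note that this particular solution relies on negative weights and magnitudes well above 1, so training that starts from values in [0, 1] has to move the weights a long way to reach a configuration of this kind.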

Using the lazy approach of randomly initializing the non-bias weights between 0 and 1, 3 of the 20 trained sigmoid NNs (15%) ended with weights that solved the XOR problem. With tanh and ReLU activation functions, no solution to the XOR problem was found.






In [26]:
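import random
import numpy as np

# The random weights below are drawn with np.random, which random.seed() does not affect,
# so NumPy's generator is seeded as well.
random.seed(3311791)
np.random.seed(3311791)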

In [27]:
# Sigmoid activation function
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def xor_net(inputs, weights):
    if len(inputs) != 2 or len(weights) != 9:
        raise ValueError("inputs length has to be 2 and weights length has to be 9.")

    input1, input2 = inputs
    w_input_hidden1 = weights[:2]
    w_input_hidden2 = weights[2:4]
    w_hidden_output1 = weights[4:6]
    w_bias_hidden1 = weights[6:8]
    w_bias_output = weights[8]

    hidden1 = sigmoid(input1 * w_input_hidden1[0] + input2 * w_input_hidden1[1] + 1 * w_bias_hidden1[0])
    hidden2 = sigmoid(input1 * w_input_hidden2[0] + input2 * w_input_hidden2[1] + 1 * w_bias_hidden1[1])
    output = sigmoid(hidden1 * w_hidden_output1[0] + hidden2 * w_hidden_output1[1] + 1 * w_bias_output)
    return output

def mse(weights):
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    targets = [0, 1, 1, 0]
    mse_sum = 0
    network_outputs = []

    for i in range(4):
        output = xor_net(inputs[i], weights)
        network_outputs.append(output)
        mse_sum += (output - targets[i]) ** 2
    return mse_sum / 4, network_outputs

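def grdmse(weights, learning_rate):
    # Central-difference estimate of the MSE gradient; the learning rate doubles as the step size.
    # Cast to float so that integer initial weights are not truncated when perturbed.
    weights = np.asarray(weights, dtype=float)
    grad = np.zeros_like(weights)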

    for i in range(len(weights)):
        weights_plus = np.copy(weights)
        weights_minus = np.copy(weights)

        weights_plus[i] += learning_rate
        weights_minus[i] -= learning_rate

        grad[i] = (mse(weights_plus)[0] - mse(weights_minus)[0]) / (2 * learning_rate)
    return grad

def gradient_descent(weights, learning_rate, num_iterations):
    for _ in range(num_iterations):
        grad = grdmse(weights, learning_rate)
        weights = weights - learning_rate * grad
    return weights

def train(weights):
    for learning_rate in learning_rates:
        trained_weights = gradient_descent(weights, learning_rate, num_iterations)

        # mse() already evaluates all four input pairs, so one call is enough
        mse_val, outputs = mse(trained_weights)
        binary_outputs = [1 if output > 0.5 else 0 for output in outputs]
        print(f"Weights initialization: {weights}, learning rate: {learning_rate} output: {binary_outputs}, MSE: {mse_val:.4f}")

    # Binary outputs of the last learning rate tried
    return binary_outputs

learning_rates = [0.001, 0.01, 0.1, 1]
num_iterations = 1000
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

print("Sigmoid activation function")
train([1] * 9)
train(np.concatenate([[0] * 6, [1] * 3]))
train([0, 1, 0, 1, 0, 1, 1, 1, 1])
train(np.concatenate([[0.5] * 6, [1] * 3]))
train([.25, .33, .44, .55, .65, .70, 1, 1, 1])




Sigmoid activation function
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 0.001 output: [1, 1, 1, 1], MSE: 0.4258
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 0.01 output: [1, 1, 1, 1], MSE: 0.2746
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 0.1 output: [0, 1, 1, 1], MSE: 0.2473
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 1 output: [1, 1, 1, 1], MSE: 0.4388
Weights initialization: [0 0 0 0 0 0 1 1 1], learning rate: 0.001 output: [1, 1, 1, 1], MSE: 0.2858
Weights initialization: [0 0 0 0 0 0 1 1 1], learning rate: 0.01 output: [1, 1, 1, 1], MSE: 0.2505
Weights initialization: [0 0 0 0 0 0 1 1 1], learning rate: 0.1 output: [1, 1, 1, 0], MSE: 0.2500
Weights initialization: [0 0 0 0 0 0 1 1 1], learning rate: 1 output: [1, 1, 1, 1], MSE: 0.3034
Weights initialization: [0, 1, 0, 1, 0, 1, 1, 1, 1], learning rate: 0.001 output: [1, 1, 1, 1], MSE: 0.3524
Weights initialization: [0, 1, 0, 1, 0, 1, 1, 1, 1], lea

[0, 1, 1, 1]
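
In `grdmse` above, the learning rate is also used as the finite-difference step, so a learning rate of 1 perturbs each weight by a full unit when estimating the gradient. A possible alternative, sketched here under the assumption that a small fixed step is preferred, keeps the two quantities separate. The names `central_difference_grad`, `descend` and the default `eps` are hypothetical and not part of the original notebook.

```python
import numpy as np

def central_difference_grad(f, weights, eps=1e-6):
    """Estimate the gradient of f at weights with central differences.

    eps is an assumed step size, chosen independently of the learning rate.
    """
    weights = np.asarray(weights, dtype=float)  # avoid integer truncation
    grad = np.zeros_like(weights)
    for i in range(weights.size):
        step = np.zeros_like(weights)
        step[i] = eps
        grad[i] = (f(weights + step) - f(weights - step)) / (2 * eps)
    return grad

def descend(f, weights, learning_rate, num_iterations, eps=1e-6):
    """Plain gradient descent on f using the numerical gradient above."""
    weights = np.asarray(weights, dtype=float)
    for _ in range(num_iterations):
        weights = weights - learning_rate * central_difference_grad(f, weights, eps)
    return weights

# Toy check on f(w) = sum(w**2), whose exact gradient is 2 * w.
quadratic = lambda w: float(np.sum(w ** 2))
g = central_difference_grad(quadratic, [1, -2, 3])
w_min = descend(quadratic, [1, -2, 3], learning_rate=0.1, num_iterations=200)
print(g, w_min)
```

With the notebook's own loss, the call would look like `descend(lambda w: mse(w)[0], weights, 0.1, 1000)`, since `mse` returns the loss together with the network outputs.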

In [25]:
def lazy(num_trials):
    num_runs = 0
    for trial in range(num_trials):
        for i in range(len(learning_rates)):
            weights = np.concatenate([np.random.rand(6), [1] * 3])
            trained_weights = gradient_descent(weights, learning_rates[i], num_iterations)
            mse_val, outputs = mse(trained_weights)
            binary_outputs = [1 if output > 0.5 else 0 for output in outputs]
            num_runs = num_runs + 1
            if binary_outputs == [0, 1, 1, 0]:
                successful_weights = [trained_weights]
                return successful_weights, num_runs
    return None, num_runs

def lazy_repeat(num_trials):
    c = 0
    num_success = 0
    for trial in range(num_trials):
        successful_weights, num_runs = lazy(1)
        c = c + num_runs
        if successful_weights:
            num_success = num_success + 1
    # Proportion of successful trials among all NNs that were actually trained
    prop = num_success / c
    print(f"Number of trained NNs: {c}, Solutions to XOR: {num_success}, Proportion: {prop}")
    return prop

xor_correct_weight, _ = lazy(5)
lazy_repeat(5)


Number of trained NNs: 20, Solutions to XOR: 3, Proportion: 0.15


0.15
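
Because `lazy` returns as soon as one network solves XOR, the number of trained NNs can vary between trials. A variant that trains a fixed number of networks and counts every success may be easier to interpret. The sketch below is one way to do that, assuming a seeded NumPy `Generator` for reproducibility; the names `outputs`, `xor_mse`, `train_once`, `success_rate` and the small settings in the final call are illustrative choices, not the original experiment.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def outputs(w):
    # Same 9-weight layout as xor_net, evaluated on all four inputs at once.
    h1 = sigmoid(X @ w[0:2] + w[6])
    h2 = sigmoid(X @ w[2:4] + w[7])
    return sigmoid(h1 * w[4] + h2 * w[5] + w[8])

def xor_mse(w):
    return float(np.mean((outputs(w) - T) ** 2))

def train_once(w, learning_rate, num_iterations, eps=1e-5):
    # Gradient descent with a central-difference gradient (eps is assumed).
    for _ in range(num_iterations):
        grad = np.zeros_like(w)
        for i in range(w.size):
            step = np.zeros_like(w)
            step[i] = eps
            grad[i] = (xor_mse(w + step) - xor_mse(w - step)) / (2 * eps)
        w = w - learning_rate * grad
    return w

def success_rate(n_networks, learning_rate, num_iterations, seed):
    rng = np.random.default_rng(seed)  # seeded, so repeated calls agree
    solved = 0
    for _ in range(n_networks):
        # Random non-bias weights in [0, 1), bias weights fixed at 1
        w0 = np.concatenate([rng.random(6), np.ones(3)])
        w = train_once(w0, learning_rate, num_iterations)
        solved += int(np.array_equal(outputs(w) > 0.5, T == 1))
    return solved, n_networks

# Small, fast settings chosen only for illustration.
solved, total = success_rate(n_networks=3, learning_rate=1.0, num_iterations=50, seed=0)
print(f"{solved}/{total} networks solved XOR")
```

Here every network counts exactly once in the denominator, so the reported proportion is a per-network success rate for the chosen learning rate.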

In [28]:
# Tanh activation function
def tanh(x):
    return (np.tanh(x) + 1) / 2  # rescale tanh's [-1, 1] range to [0, 1]

def xor_net(inputs, weights):
    if len(inputs) != 2 or len(weights) != 9:
        raise ValueError("inputs length has to be 2 and weights length has to be 9.")

    input1, input2 = inputs
    w_input_hidden1 = weights[:2]
    w_input_hidden2 = weights[2:4]
    w_hidden_output1 = weights[4:6]
    w_bias_hidden1 = weights[6:8]
    w_bias_output = weights[8]

    hidden1 = tanh(input1 * w_input_hidden1[0] + input2 * w_input_hidden1[1] + 1 * w_bias_hidden1[0])
    hidden2 = tanh(input1 * w_input_hidden2[0] + input2 * w_input_hidden2[1] + 1 * w_bias_hidden1[1])
    output = tanh(hidden1 * w_hidden_output1[0] + hidden2 * w_hidden_output1[1] + 1 * w_bias_output)
    return output

def mse(weights):
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    targets = [0, 1, 1, 0]
    mse_sum = 0
    network_outputs = []

    for i in range(4):
        output = xor_net(inputs[i], weights)
        network_outputs.append(output)
        mse_sum += (output - targets[i]) ** 2
    return mse_sum / 4, network_outputs

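def grdmse(weights, learning_rate):
    # Central-difference estimate of the MSE gradient; the learning rate doubles as the step size.
    # Cast to float so that integer initial weights are not truncated when perturbed.
    weights = np.asarray(weights, dtype=float)
    grad = np.zeros_like(weights)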

    for i in range(len(weights)):
        weights_plus = np.copy(weights)
        weights_minus = np.copy(weights)

        weights_plus[i] += learning_rate
        weights_minus[i] -= learning_rate

        grad[i] = (mse(weights_plus)[0] - mse(weights_minus)[0]) / (2 * learning_rate)
    return grad

def gradient_descent(weights, learning_rate, num_iterations):
    for _ in range(num_iterations):
        grad = grdmse(weights, learning_rate)
        weights = weights - learning_rate * grad
    return weights

def train(weights):
    for learning_rate in learning_rates:
        trained_weights = gradient_descent(weights, learning_rate, num_iterations)

        # mse() already evaluates all four input pairs, so one call is enough
        mse_val, outputs = mse(trained_weights)
        binary_outputs = [1 if output > 0.5 else 0 for output in outputs]
        print(f"Weights initialization: {weights}, learning rate: {learning_rate} output: {binary_outputs}, MSE: {mse_val:.4f}")

print("Tanh activation function")
train([1] * 9)
train(np.concatenate([[0] * 7, [1] * 2]))
train([0, 1, 0, 1, 0, 1, 0, 1, 1])
train(np.concatenate([[0.5] * 7, [1] * 2]))

def lazy(num_trials):
    num_runs = 0
    for trial in range(num_trials):
        for i in range(len(learning_rates)):
            weights = np.concatenate([np.random.rand(6), [1] * 3])
            trained_weights = gradient_descent(weights, learning_rates[i], num_iterations)
            mse_val, outputs = mse(trained_weights)
            binary_outputs = [1 if output > 0.5 else 0 for output in outputs]
            num_runs = num_runs + 1
            if binary_outputs == [0, 1, 1, 0]:
                successful_weights = [trained_weights]
                return successful_weights, num_runs
    return None, num_runs

def lazy_repeat(num_trials):
    c = 0
    num_success = 0
    for trial in range(num_trials):
        successful_weights, num_runs = lazy(1)
        c = c + num_runs
        if successful_weights:
            num_success = num_success + 1
    # Proportion of successful trials among all NNs that were actually trained
    prop = num_success / c
    print(f"Number of trained NNs: {c}, Solutions to XOR: {num_success}, Proportion: {prop}")
    return prop

xor_correct_weight, _ = lazy(5)
lazy_repeat(5)


Tanh activation function
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 0.001 output: [1, 1, 1, 1], MSE: 0.4965
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 0.01 output: [1, 1, 1, 1], MSE: 0.4950
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 0.1 output: [0, 1, 1, 1], MSE: 0.2358
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 1 output: [1, 1, 1, 1], MSE: 0.4968
Weights initialization: [0 0 0 0 0 0 0 1 1], learning rate: 0.001 output: [1, 1, 1, 1], MSE: 0.3127
Weights initialization: [0 0 0 0 0 0 0 1 1], learning rate: 0.01 output: [1, 1, 1, 0], MSE: 0.2500
Weights initialization: [0 0 0 0 0 0 0 1 1], learning rate: 0.1 output: [1, 1, 1, 0], MSE: 0.2500
Weights initialization: [0 0 0 0 0 0 0 1 1], learning rate: 1 output: [1, 1, 1, 1], MSE: 0.3950
Weights initialization: [0, 1, 0, 1, 0, 1, 0, 1, 1], learning rate: 0.001 output: [1, 1, 1, 1], MSE: 0.4701
Weights initialization: [0, 1, 0, 1, 0, 1, 0, 1, 1], learni

0.0
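
The sigmoid and tanh cells differ only in the activation function applied inside `xor_net`. One way to avoid redefining the whole pipeline per activation, sketched here as an assumption rather than as the notebook's design, is to pass the activation in as an argument. `make_xor_net` and `scaled_tanh` are hypothetical names introduced for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scaled_tanh(x):
    # tanh rescaled from [-1, 1] to [0, 1], as in the tanh cell
    return (np.tanh(x) + 1.0) / 2.0

def relu(x):
    return np.maximum(0.0, x)

def make_xor_net(activation):
    """Return a forward function that applies `activation` to every non-input node."""
    def net(inputs, weights):
        if len(inputs) != 2 or len(weights) != 9:
            raise ValueError("inputs length has to be 2 and weights length has to be 9.")
        x1, x2 = inputs
        h1 = activation(x1 * weights[0] + x2 * weights[1] + weights[6])
        h2 = activation(x1 * weights[2] + x2 * weights[3] + weights[7])
        return activation(h1 * weights[4] + h2 * weights[5] + weights[8])
    return net

nets = {
    "sigmoid": make_xor_net(sigmoid),
    "tanh": make_xor_net(scaled_tanh),
    "relu": make_xor_net(relu),
}
zero_weights = np.zeros(9)
for name, net in nets.items():
    print(name, net((0, 1), zero_weights))
```

With all weights at zero every pre-activation is 0, so the printed values show each activation's output at 0, which is a quick sanity check that the right function is wired in.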

In [29]:
# Relu activation function

def relu(x):
    return np.maximum(0, x)

def xor_net(inputs, weights):
    if len(inputs) != 2 or len(weights) != 9:
        raise ValueError("inputs length has to be 2 and weights length has to be 9.")

    input1, input2 = inputs
    w_input_hidden1 = weights[:2]
    w_input_hidden2 = weights[2:4]
    w_hidden_output1 = weights[4:6]
    w_bias_hidden1 = weights[6:8]
    w_bias_output = weights[8]

    hidden1 = relu(input1 * w_input_hidden1[0] + input2 * w_input_hidden1[1] + 1 * w_bias_hidden1[0])
    hidden2 = relu(input1 * w_input_hidden2[0] + input2 * w_input_hidden2[1] + 1 * w_bias_hidden1[1])
    output = relu(hidden1 * w_hidden_output1[0] + hidden2 * w_hidden_output1[1] + 1 * w_bias_output)
    return output

def mse(weights):
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    targets = [0, 1, 1, 0]
    mse_sum = 0
    network_outputs = []

    for i in range(4):
        output = xor_net(inputs[i], weights)
        network_outputs.append(output)
        mse_sum += (output - targets[i]) ** 2
    return mse_sum / 4, network_outputs

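def grdmse(weights, learning_rate):
    # Central-difference estimate of the MSE gradient; the learning rate doubles as the step size.
    # Cast to float so that integer initial weights are not truncated when perturbed.
    weights = np.asarray(weights, dtype=float)
    grad = np.zeros_like(weights)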

    for i in range(len(weights)):
        weights_plus = np.copy(weights)
        weights_minus = np.copy(weights)

        weights_plus[i] += learning_rate
        weights_minus[i] -= learning_rate

        grad[i] = (mse(weights_plus)[0] - mse(weights_minus)[0]) / (2 * learning_rate)
    return grad

def gradient_descent(weights, learning_rate, num_iterations):
    for _ in range(num_iterations):
        grad = grdmse(weights, learning_rate)
        weights = weights - learning_rate * grad
    return weights

def train(weights):
    for learning_rate in learning_rates:
        trained_weights = gradient_descent(weights, learning_rate, num_iterations)

        # mse() already evaluates all four input pairs, so one call is enough
        mse_val, outputs = mse(trained_weights)
        binary_outputs = [1 if output > 0.5 else 0 for output in outputs]
        print(f"Weights initialization: {weights}, learning rate: {learning_rate} output: {binary_outputs}, MSE: {mse_val:.4f}")

print("Relu activation function")
train([1] * 9)
train(np.concatenate([[0] * 7, [1] * 2]))
train([0, 1, 0, 1, 0, 1, 0, 1, 1])
train(np.concatenate([[0.5] * 7, [1] * 2]))

Relu activation function
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 0.001 output: [0, 0, 0, 0], MSE: 0.5000
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 0.01 output: [0, 0, 0, 0], MSE: 0.5000
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 0.1 output: [0, 0, 0, 0], MSE: 0.5000
Weights initialization: [1, 1, 1, 1, 1, 1, 1, 1, 1], learning rate: 1 output: [0, 0, 0, 0], MSE: 0.5000
Weights initialization: [0 0 0 0 0 0 0 1 1], learning rate: 0.001 output: [1, 1, 1, 1], MSE: 0.2501
Weights initialization: [0 0 0 0 0 0 0 1 1], learning rate: 0.01 output: [1, 1, 1, 0], MSE: 0.2500
Weights initialization: [0 0 0 0 0 0 0 1 1], learning rate: 0.1 output: [1, 1, 1, 0], MSE: 0.2500
Weights initialization: [0 0 0 0 0 0 0 1 1], learning rate: 1 output: [0, 0, 0, 0], MSE: 0.5000
Weights initialization: [0, 1, 0, 1, 0, 1, 0, 1, 1], learning rate: 0.001 output: [0, 0, 0, 0], MSE: 0.5000
Weights initialization: [0, 1, 0, 1, 0, 1, 0, 1, 1], learni
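
Rows with output `[0, 0, 0, 0]` and an MSE of 0.5000 are what an all-zero network output produces, since two of the four XOR targets are 1. With a ReLU on the output node this state can also be a dead end: if the output pre-activation is negative for every input, small weight changes leave the loss unchanged and the estimated gradient is zero. The sketch below illustrates this with hypothetical weights that were not taken from the runs above; whether the original runs ended in exactly this state is an assumption, not something the logs confirm.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, w):
    # Same 9-weight layout as xor_net, with ReLU on hidden and output nodes.
    h1 = relu(x[0] * w[0] + x[1] * w[1] + w[6])
    h2 = relu(x[0] * w[2] + x[1] * w[3] + w[7])
    return relu(h1 * w[4] + h2 * w[5] + w[8])

PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = [0, 1, 1, 0]

def xor_mse(w):
    return sum((forward(x, w) - t) ** 2 for x, t in zip(PAIRS, TARGETS)) / 4

def numeric_grad(w, eps=1e-4):  # eps is an assumed finite-difference step
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (xor_mse(w + step) - xor_mse(w - step)) / (2 * eps)
    return grad

# Hypothetical weights: negative hidden->output weights and output bias keep
# the output pre-activation below zero for all four inputs.
w = np.array([1, 1, 1, 1, -1, -1, 1, 1, -1], dtype=float)
outs = [float(forward(x, w)) for x in PAIRS]
dead_mse = xor_mse(w)
dead_grad = numeric_grad(w)
print(outs, dead_mse, dead_grad)
```

Once the gradient is zero in every direction, further gradient-descent iterations leave the weights where they are, regardless of the learning rate.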

In [30]:
def lazy(num_trials):
    num_runs = 0
    for trial in range(num_trials):
        for i in range(len(learning_rates)):
            weights = np.concatenate([np.random.rand(6), [1] * 3])
            trained_weights = gradient_descent(weights, learning_rates[i], num_iterations)
            mse_val, outputs = mse(trained_weights)
            binary_outputs = [1 if output > 0.5 else 0 for output in outputs]
            num_runs = num_runs + 1
            if binary_outputs == [0, 1, 1, 0]:
                successful_weights = [trained_weights]
                return successful_weights, num_runs
    return None, num_runs

def lazy_repeat(num_trials):
    c = 0
    num_success = 0
    for trial in range(num_trials):
        successful_weights, num_runs = lazy(1)
        c = c + num_runs
        if successful_weights:
            num_success = num_success + 1
    # Proportion of successful trials among all NNs that were actually trained
    prop = num_success / c
    print(f"Number of trained NNs: {c}, Solutions to XOR: {num_success}, Proportion: {prop}")
    return prop

xor_correct_weight, _ = lazy(5)
lazy_repeat(5)


Number of trained NNs: 20, Solutions to XOR: 0, Proportion: 0.0


0.0