# Curve fitting - Artificial Neural Networks

In this activity, we will use a simple artificial neural network to fit a model that provides the best decision boundary for the data provided. At this point in the lecture, you will learn that while traditional curve fitting can struggle with non-linear relationships or interactions between multiple variables, neural networks handle these well thanks to their layered architecture and non-linear activation functions. In essence, a neural network performs a highly flexible form of curve fitting. It adjusts its internal parameters (weights and biases) during training to minimize a loss function, analogous to how coefficients are adjusted in polynomial curve fitting. This lets neural networks fit intricate patterns in data, which makes them powerful tools for tasks such as image recognition and natural language processing, where traditional models might fail to capture the complexity of the data.
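
To make the analogy concrete, here is a minimal sketch of polynomial curve fitting, where coefficients are chosen to minimize the squared error. The data below is made up for illustration (noise-free points from an assumed quadratic) and is not part of the activity's dataset.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 2x^2 - 3x + 1
x = np.linspace(-2, 2, 21)
y = 2 * x**2 - 3 * x + 1

# np.polyfit returns the coefficients (highest degree first) that
# minimize the sum of squared errors between the polynomial and y
coeffs = np.polyfit(x, y, deg=2)
print(coeffs)
```

A neural network follows the same idea, only with weights and biases in place of polynomial coefficients and a different loss function.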

This notebook helps you explore the advantages of combining non-linear activations with a more complex model architecture, i.e., a neural network.

## Neural network architecture

The following illustration shows our model architecture. It is a simple neural network with two inputs (i.e., `Feature 0` and `Feature 1`), a single `hidden` layer with two `nodes` or `units`, and an output layer with a single `node`. In binary classification tasks, that is, those that answer "Yes"/"No", "Positive"/"Negative", or "Class 0"/"Class 1" questions, a single output node is enough. The output value is then thresholded: we pick a cutoff (0.5 in this notebook), and outputs above it are assigned to `Class: 1` while outputs below it are assigned to `Class: 0`. The labels in the illustration should guide you when changing the slider values for each parameter in the network.


<center><img src="https://github.com/mikedataCrunch/GMS5204/blob/main/media/nn_activity.jpeg?raw=true"/></center>
<center><b>Figure 1. Simple neural network architecture: 2 input features, 1 hidden layer with 2 nodes, and an output layer with a single node.</b></center>
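
As a rough sketch of how a single example flows through this architecture, the snippet below runs one forward pass and applies the 0.5 threshold. The sample and all parameter values are hypothetical and chosen only to keep the arithmetic easy to follow; `relu` and `sigmoid` are small helpers defined here.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One hypothetical sample: Feature 0 = 1.0, Feature 1 = 2.0
x = np.array([1.0, 2.0])

# Hypothetical parameters (rows of w1: input features, columns: hidden nodes)
w1 = np.array([[1.0, -1.0],
               [0.5,  1.0]])
b1 = np.array([0.0, -2.0])
w2 = np.array([2.0, -1.0])
b2 = -1.0

hidden = relu(x @ w1 + b1)        # pre-activation [2, -1] becomes [2, 0]
z = hidden @ w2 + b2              # 2*2 + 0*(-1) - 1 = 3
p = sigmoid(z)                    # about 0.953
predicted_class = int(p >= 0.5)   # thresholding at 0.5 gives class 1

print(hidden, z, p, predicted_class)
```

Note how ReLU zeroes out the second hidden node, so only the first node contributes to the output for this sample.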

## Data description
The sample data we're using here represents a binary classification problem, where each sample belongs to either `Class: 0` or `Class: 1`. This is quite common in tasks that require identifying examples that are `positive` for a particular condition, disease, diagnosis, or some other classification criterion.

The binary classification task takes in two input `features`. We can think of features as characteristics of an example. If we consider humans as examples, then the two features can be height & weight, age & gender, gender & income, etc. In a medical setting, the two features can be test results A & B, lifestyle & age, or whatever pair of characteristics is available to us.

In our activity, we will refer to these as `Feature 0` and `Feature 1`.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider, Layout, interact
import ipywidgets as widgets
from IPython.display import display

def ReLU(x):
    """ReLU: Rectified linear unit function."""
    return np.maximum(0, x)

# Define the neural network with one hidden layer
def simple_neural_network(inputs, w1, w2, b1, b2):
    # Inputs is expected to be Nx2, w1 is 2x2, b1 is size 2, w2 is size 2, b2 is a scalar
    hidden_layer_input = np.dot(inputs, w1) + b1
    hidden_layer_activation = ReLU(hidden_layer_input)
    output = np.dot(hidden_layer_activation, w2) + b2
    return 1 / (1 + np.exp(-output))  # Sigmoid output activation: maps the output to a probability in (0, 1)

def calculate_bce_loss(y_true, y_pred):
    """
    Calculate the binary cross-entropy loss.

    Parameters:
    -----------
    y_true (array-like): True binary labels (0 or 1).
    y_pred (array-like): Predicted probabilities, between 0 and 1.

    Returns:
    --------
    float: The average binary cross-entropy loss.
    """
    # Ensure that y_pred does not contain values exactly equal to 0 or 1,
    # as log(0) is undefined and can cause computation errors.
    epsilon = 1e-10
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)

    # Calculate binary cross-entropy loss
    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss
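
To build intuition for how the binary cross-entropy loss behaves, the sketch below compares three hypothetical sets of predictions for the same labels. The `bce` helper is a compact stand-in for the function above, and the prediction values are made up for illustration.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-10):
    """Average binary cross-entropy, clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([0, 0, 1, 1])

good = np.array([0.1, 0.2, 0.8, 0.9])   # confident and correct
uncertain = np.full(4, 0.5)             # always predicts 0.5
bad = np.array([0.9, 0.8, 0.2, 0.1])    # confident and wrong

loss_good = bce(y_true, good)            # about 0.164
loss_uncertain = bce(y_true, uncertain)  # ln(2), about 0.693
loss_bad = bce(y_true, bad)              # about 1.956

print(loss_good, loss_uncertain, loss_bad)
```

Confident but wrong predictions are penalized far more heavily than uncertain ones, which is why the loss shown in the plot drops as the decision boundary moves between the two classes.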

In [2]:

# Generate a simple dataset
np.random.seed(42)
# Class 0
feature_0_class_0 = np.random.normal(2, 1, 100)  # Feature 0 for class 0
feature_1_class_0 = np.random.normal(2, 1, 100)  # Feature 1 for class 0
# Class 1
feature_0_class_1 = np.random.normal(5, 1, 100)  # Feature 0 for class 1
feature_1_class_1 = np.random.normal(5, 1, 100)  # Feature 1 for class 1

features = np.vstack((np.column_stack((feature_0_class_0, feature_1_class_0)),
                      np.column_stack((feature_0_class_1, feature_1_class_1))))
y_true = np.array([0]*100 + [1]*100)

In [3]:
# offset = 10
offset = 0
# Grid for decision boundary visualization
feature_0, feature_1 = np.meshgrid(
    np.linspace(
        min(np.concatenate([feature_0_class_0, feature_0_class_1])) - offset,
        max(np.concatenate([feature_0_class_0, feature_0_class_1])) + offset,
        30
    ),
    np.linspace(
        min(np.concatenate([feature_1_class_0, feature_1_class_1])) - offset,
        max(np.concatenate([feature_1_class_0, feature_1_class_1])) + offset,
        30
    ),
)
# initial slider vals
w1_00_init = 2
w1_01_init = 2
w1_10_init = 2
w1_11_init = 2
w2_0_init = 0.5
w2_1_init = -0.4

b1_0_init = -1
b1_1_init = -1
b2_init = -1


# slider range
min_, max_ = -10, 10

slider_style = {'description_width': 'initial', 'handle_color': 'lightblue'}  # Adjust handle color and description width
layout = Layout(width='600px')
def make_slider(description, value):
    """Build a parameter slider that redraws the plot only when the handle is released."""
    # continuous_update is a slider option, so it is set on each FloatSlider
    return FloatSlider(description=description, value=value, min=min_, max=max_, step=0.01,
                       continuous_update=False, style=slider_style, layout=layout)

# Plotting function
@interact(
    w1_00=make_slider("w_x0_h0", w1_00_init),
    w1_01=make_slider("w_x0_h1", w1_01_init),
    w1_10=make_slider("w_x1_h0", w1_10_init),
    w1_11=make_slider("w_x1_h1", w1_11_init),
    b1_0=make_slider("b_h0", b1_0_init),
    b1_1=make_slider("b_h1", b1_1_init),
    w2_0=make_slider("w_h0_out", w2_0_init),
    w2_1=make_slider("w_h1_out", w2_1_init),
    b2=make_slider("b_out", b2_init),
)
def plot_nn_decision_boundary(w1_00, w1_01, w1_10, w1_11, b1_0, b1_1, w2_0, w2_1, b2):
    w1 = np.array([[w1_00, w1_01], [w1_10, w1_11]])
    b1 = np.array([b1_0, b1_1])
    w2 = np.array([w2_0, w2_1])
    b2 = np.array([b2])

    zz = simple_neural_network(np.c_[feature_0.ravel(), feature_1.ravel()], w1, w2, b1, b2)
    zz = zz.reshape(feature_0.shape)

    plt.figure(figsize=(10, 8))
    plt.scatter(
        features[:,0][:y_true.size // 2],
        features[:,1][:y_true.size // 2],
        c='blue',
        label='Class: 0',
        alpha=0.8
    )
    plt.scatter(
        features[:,0][y_true.size // 2:],
        features[:,1][y_true.size // 2:],
        c='red',
        label='Class: 1',
        alpha=0.8
    )

    contour = plt.contourf(
        feature_0,
        feature_1,
        zz,
        levels=[0, 0.5, 1],
        alpha=0.3,
        cmap="coolwarm",
    )
    plt.colorbar(contour)
    # Decision boundary line for the zz = 0.5 threshold
    plt.contour(
        feature_0,
        feature_1,
        zz,
        levels=[0.5],
        colors='k',
        vmin=0,
        vmax=1,
        linestyles='dashed')

    # Calculate loss
    y_pred = simple_neural_network(np.c_[features[:,0].ravel(), features[:,1].ravel()], w1, w2, b1, b2)
    loss = calculate_bce_loss(y_true, y_pred)

    plt.annotate(f"Current BCE Loss: {loss:.4f}", xy=(5,2), fontsize=10)
    plt.title('Neural Network Decision Boundary')
    plt.xlabel('Feature 0')
    plt.ylabel('Feature 1')
    plt.legend()
    plt.grid(True)
    plt.show()
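
In this activity you tune the parameters by hand with the sliders. During training, the same adjustment is done automatically. The following is a minimal sketch of plain gradient descent for this architecture, not part of the original activity: the learning rate and number of steps are assumed values, the helpers are redefined so the block is self-contained, and the data follows the same recipe as the notebook (two Gaussian blobs) but does not reproduce the exact same samples.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(y_true, y_pred, eps=1e-10):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Two Gaussian blobs, 100 samples each (same recipe as the notebook's dataset)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(2, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Start from the notebook's initial slider values
w1 = np.array([[2.0, 2.0], [2.0, 2.0]])
b1 = np.array([-1.0, -1.0])
w2 = np.array([0.5, -0.4])
b2 = -1.0

learning_rate = 0.001  # assumed value
n_steps = 2000         # assumed value
losses = []

for _ in range(n_steps):
    # Forward pass: hidden layer with ReLU, sigmoid output
    hidden_in = X @ w1 + b1
    hidden = relu(hidden_in)
    p = sigmoid(hidden @ w2 + b2)
    losses.append(bce(y, p))

    # Backward pass: gradients of the mean BCE loss
    dz = (p - y) / y.size                              # w.r.t. the pre-sigmoid output
    grad_w2 = hidden.T @ dz
    grad_b2 = dz.sum()
    d_hidden_in = np.outer(dz, w2) * (hidden_in > 0)   # ReLU passes gradient only where active
    grad_w1 = X.T @ d_hidden_in
    grad_b1 = d_hidden_in.sum(axis=0)

    # Gradient descent update
    w1 -= learning_rate * grad_w1
    b1 -= learning_rate * grad_b1
    w2 -= learning_rate * grad_w2
    b2 -= learning_rate * grad_b2

print(f"BCE loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Each step nudges every weight and bias in the direction that lowers the loss, which is what you are doing manually when you move a slider and watch the BCE value.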


## End.