<h1>Building a Neural Network from Scratch</h1>

import numpy as np

<b>Step 1: Initializing Parameters</b>

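First, we initialize the weights and biases for each layer. Weights get small random values. If every weight started at the same value, every unit in a layer would compute the same output and receive the same gradient, so the units could never learn different features. Biases do not need this symmetry breaking and can start at zero.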

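def initialize_parameters(layers_dims):
    """Return a dict with W1..W(L-1) and b1..b(L-1) for the given layer sizes."""
    parameters = {}
    L = len(layers_dims)  # number of layers, including the input layer

    for l in range(1, L):
        # W[l] maps layer l-1 activations to layer l: shape (n_l, n_{l-1})
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * 0.01
        # b[l] is a column vector broadcast across the m examples: shape (n_l, 1)
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

    return parameters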
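
A quick numerical illustration of the symmetry problem, using a hypothetical two-layer network (tanh hidden layer, linear output, squared-error loss, all chosen for brevity and not part of this notebook). With constant weights, every row of `dW1` (one row per hidden unit) comes out identical, so gradient descent keeps the units identical forever; with small random weights, the rows differ.

```python
import numpy as np

def hidden_unit_gradients(W1, W2, X, Y):
    # Hypothetical 2-layer net: tanh hidden layer, linear output, loss 0.5 * sum((Y_hat - Y)^2)
    A1 = np.tanh(W1 @ X)
    Y_hat = W2 @ A1
    dY = Y_hat - Y                 # dLoss/dY_hat
    dA1 = W2.T @ dY                # back through the output layer
    dZ1 = dA1 * (1 - A1 ** 2)      # back through tanh
    return dZ1 @ X.T               # dLoss/dW1, one row per hidden unit

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))    # 2 features, 5 examples
Y = rng.standard_normal((1, 5))

# Constant initialization: all 3 hidden units receive the same gradient
dW1_const = hidden_unit_gradients(np.full((3, 2), 0.5), np.full((1, 3), 0.5), X, Y)
# Small random initialization: each hidden unit gets its own gradient
dW1_rand = hidden_unit_gradients(rng.standard_normal((3, 2)) * 0.01,
                                 rng.standard_normal((1, 3)) * 0.01, X, Y)

print(np.allclose(dW1_const, dW1_const[0]))  # True: every row equals the first row
print(np.allclose(dW1_rand, dW1_rand[0]))
```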

<b>Step 2: Implement Forward Propagation</b>

Forward propagation computes the network's output for a given input, one layer at a time. Each layer first applies a linear step, Z = W·A_prev + b, and then an activation function, A = g(Z). The intermediate values are stored in a cache so that backpropagation can reuse them.

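def linear_forward(A, W, b):
    # Linear step: Z = W·A + b, with b broadcast across the m columns (examples)
    Z = np.dot(W, A) + b
    cache = (A, W, b)  # kept for the backward pass
    return Z, cache

def linear_activation_forward(A_prev, W, b, activation):
    # Linear step followed by the chosen non-linearity
    Z, linear_cache = linear_forward(A_prev, W, b)

    if activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        A, activation_cache = relu(Z)
    elif activation == "tanh":
        A, activation_cache = tanh(Z)
    elif activation == "softmax":
        A, activation_cache = softmax(Z)
    else:
        # Fail loudly instead of hitting an undefined variable below
        raise ValueError(f"Unsupported activation: {activation}")

    cache = (linear_cache, activation_cache)
    return A, cache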
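
The training loop in Step 6 calls `forward_propagation(X, parameters)`. One hypothetical way to write it is sketched below, assuming ReLU in every hidden layer and a sigmoid output layer (a common setup for binary classification, but an assumption here). The helpers are repeated in compact form so the block runs on its own, and the caches keep the same `((A_prev, W, b), Z)` layout as `linear_activation_forward`.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z)), Z

def relu(Z):
    return np.maximum(0, Z), Z

def linear_activation_forward(A_prev, W, b, activation):
    Z = W @ A_prev + b
    A, activation_cache = sigmoid(Z) if activation == "sigmoid" else relu(Z)
    return A, ((A_prev, W, b), activation_cache)

def forward_propagation(X, parameters):
    """Hypothetical helper: [LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID."""
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers that have weights
    for l in range(1, L):
        A, cache = linear_activation_forward(A, parameters["W" + str(l)],
                                             parameters["b" + str(l)], "relu")
        caches.append(cache)
    AL, cache = linear_activation_forward(A, parameters["W" + str(L)],
                                          parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    return AL, caches

# Usage: 4 input features, hidden layers of 5 and 3 units, 1 output unit, 10 examples
rng = np.random.default_rng(0)
dims = [4, 5, 3, 1]
parameters = {}
for l in range(1, len(dims)):
    parameters["W" + str(l)] = rng.standard_normal((dims[l], dims[l - 1])) * 0.01
    parameters["b" + str(l)] = np.zeros((dims[l], 1))
X = rng.standard_normal((4, 10))
AL, caches = forward_propagation(X, parameters)
print(AL.shape, len(caches))  # (1, 10) 3
```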


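<b>Activation functions:</b> Activation functions introduce non-linearities into the network, enabling it to learn complex patterns in the data. Without them, any stack of linear layers collapses into a single linear transformation, no matter how many layers there are.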
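
To see why the non-linearity matters, the sketch below composes two linear layers with arbitrary random weights and confirms they equal a single linear layer with W = W2·W1 and b = W2·b1 + b2. Inserting a ReLU between the two layers breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))                       # 3 features, 4 examples
W1, b1 = rng.standard_normal((5, 3)), rng.standard_normal((5, 1))
W2, b2 = rng.standard_normal((2, 5)), rng.standard_normal((2, 1))

# Two linear layers with no activation in between...
two_layers = W2 @ (W1 @ X + b1) + b2
# ...equal one linear layer with combined weights and bias
W, b = W2 @ W1, W2 @ b1 + b2
print(np.allclose(two_layers, W @ X + b))             # True

# With a ReLU in between, the result is in general no longer a single linear map
with_relu = W2 @ np.maximum(0, W1 @ X + b1) + b2
print(np.allclose(with_relu, W @ X + b))
```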

<b>a. Sigmoid Function</b>

The sigmoid function σ(z) = 1 / (1 + e^(-z)) maps any real number into (0, 1), so its output can be read as a probability. It is traditionally used in the output layer for binary classification.

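def sigmoid(Z):
    # Element-wise logistic function: maps any real number into (0, 1)
    A = 1 / (1 + np.exp(-Z))
    cache = Z  # stored for the backward pass
    return A, cache

def sigmoid_backward(dA, cache):
    # Chain rule with the sigmoid derivative s'(z) = s(z) * (1 - s(z))
    Z = cache
    S = 1 / (1 + np.exp(-Z))
    dZ = dA * S * (1 - S)
    return dZ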
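
One way to confirm that `sigmoid_backward` implements σ'(z) = σ(z)(1 - σ(z)) is a central finite-difference check. The sketch repeats both functions so it runs standalone; the step size `eps` and the tolerance are arbitrary choices.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z)), Z

def sigmoid_backward(dA, cache):
    S = 1 / (1 + np.exp(-cache))
    return dA * S * (1 - S)

Z = np.array([[-3.0, -0.5, 0.0, 0.5, 3.0]])
eps = 1e-6
# Central difference (s(z + eps) - s(z - eps)) / (2 * eps) approximates s'(z)
numeric = (sigmoid(Z + eps)[0] - sigmoid(Z - eps)[0]) / (2 * eps)
# With dA = 1 the backward function returns s'(z) itself
analytic = sigmoid_backward(np.ones_like(Z), Z)
print(np.max(np.abs(numeric - analytic)))  # should be close to zero
print(analytic[0, 2])                      # s'(0) = 0.25
```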

<b>b. ReLU Function</b>

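ReLU (Rectified Linear Unit), defined as max(0, z), is widely used in hidden layers because it is cheap to compute and its gradient is exactly 1 for positive inputs, so it does not saturate the way sigmoid and tanh do. This helps mitigate the vanishing gradient problem.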

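def relu(Z):
    # Element-wise max(0, z)
    A = np.maximum(0, Z)
    cache = Z
    return A, cache

def relu_backward(dA, cache):
    # Gradient passes through where z > 0 and is zeroed elsewhere
    # (the derivative at z = 0 is undefined; 0 is used by convention here)
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ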
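
A small worked example makes the backward rule concrete: an upstream gradient of 0.5 survives only at the position where z > 0. The functions are repeated so the snippet runs on its own.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z), Z

def relu_backward(dA, cache):
    dZ = np.array(dA, copy=True)
    dZ[cache <= 0] = 0   # z = 0 is treated as "inactive"
    return dZ

Z = np.array([[-2.0, 0.0, 3.0]])
A, cache = relu(Z)
dZ = relu_backward(np.array([[0.5, 0.5, 0.5]]), cache)
print(A)   # values 0, 0, 3
print(dZ)  # values 0, 0, 0.5
```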

<b>c. Tanh Function</b>

The tanh function is a rescaled and shifted sigmoid. Its outputs lie in (-1, 1) and are zero-centered, which can make optimizing hidden layers easier than with sigmoid.

def tanh(Z):
    # Element-wise hyperbolic tangent: outputs lie in (-1, 1)
    A = np.tanh(Z)
    cache = Z
    return A, cache

def tanh_backward(dA, cache):
    # d/dz tanh(z) = 1 - tanh(z)^2
    Z = cache
    dZ = dA * (1 - np.tanh(Z)**2)
    return dZ
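
The relationship between tanh and the sigmoid is exactly tanh(z) = 2σ(2z) - 1. This short sketch checks the identity numerically on a grid of points.

```python
import numpy as np

z = np.linspace(-5, 5, 11)
sigma_2z = 1 / (1 + np.exp(-2 * z))   # sigmoid evaluated at 2z
# Stretch the sigmoid's (0, 1) range to (-1, 1) and compress it horizontally by 2
print(np.allclose(np.tanh(z), 2 * sigma_2z - 1))  # True
```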

<b>d. Softmax Function</b>

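Softmax is typically used in the output layer of a multiclass classification network. It converts each column of logits into a probability distribution: every entry is positive and each column sums to 1. Subtracting the column-wise maximum before exponentiating leaves the result unchanged but prevents overflow. Note that `softmax_backward` takes the one-hot labels `Y` rather than `dA`: it returns A - Y, the gradient of the categorical cross-entropy loss with respect to Z, so it is only valid when softmax is paired with that loss.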

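def softmax(Z):
    # Subtract the column-wise max for numerical stability (does not change the result)
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    # Normalize each column (one example) so it sums to 1
    A = expZ / np.sum(expZ, axis=0, keepdims=True)
    cache = Z
    return A, cache

def softmax_backward(Y, cache):
    # Combined gradient of softmax + categorical cross-entropy w.r.t. Z: dZ = A - Y
    # Y must be one-hot with the same shape as Z
    Z = cache
    A, _ = softmax(Z)
    dZ = A - Y
    return dZ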
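
The sketch below (self-contained, with `softmax` repeated) shows that subtracting the column maximum keeps softmax finite on logits large enough to overflow `np.exp`, then checks by finite differences that A - Y, the value `softmax_backward` returns, is the gradient of the cross-entropy -Σ y·log(a) with respect to the logits for a single example. The logits and label are arbitrary test values.

```python
import numpy as np

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True), Z

# Large logits: np.exp(1000.0) on its own would overflow to inf
Z_big = np.array([[1000.0], [1001.0], [1002.0]])
A_big, _ = softmax(Z_big)
print(A_big.ravel(), A_big.sum())  # finite probabilities that sum to 1

# Finite-difference check that dZ = A - Y for one example (one column)
z = np.array([[0.2], [-1.0], [0.5]])
y = np.array([[0.0], [0.0], [1.0]])  # one-hot label: class 2

def cross_entropy(z):
    A, _ = softmax(z)
    return -np.sum(y * np.log(A))

eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    step = np.zeros_like(z)
    step[i, 0] = eps
    numeric[i, 0] = (cross_entropy(z + step) - cross_entropy(z - step)) / (2 * eps)

analytic = softmax(z)[0] - y  # what softmax_backward(y, z) returns
print(np.allclose(numeric, analytic, atol=1e-7))  # True
```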

<b>Step 3: Compute the Loss</b>

The loss function measures how far the network's predictions are from the true targets; training adjusts the parameters to minimize it. The right choice depends on the task (e.g., cross-entropy for classification, mean squared error for regression).

<b>a. Mean Squared Error (MSE) - For Regression</b>

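Used primarily in regression tasks, where the goal is to predict continuous values. The implementation below averages the squared errors over the m examples and includes an extra factor of 1/2, a common convention that cancels the 2 produced when the square is differentiated.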

def mean_squared_error(Y, Y_hat):
    # Y and Y_hat have shape (n_outputs, m); m is the number of examples
    m = Y.shape[1]
    cost = (1 / (2 * m)) * np.sum(np.square(Y_hat - Y))
    return cost
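
A quick arithmetic check: with targets 1, 2, 3 and predictions 1, 2, 5, only the last example is off (by 2), so the sum of squared errors is 4 and the cost is 4 / (2·3) ≈ 0.667.

```python
import numpy as np

def mean_squared_error(Y, Y_hat):
    m = Y.shape[1]
    return (1 / (2 * m)) * np.sum(np.square(Y_hat - Y))

Y = np.array([[1.0, 2.0, 3.0]])      # 1 output, 3 examples
Y_hat = np.array([[1.0, 2.0, 5.0]])
print(mean_squared_error(Y, Y_hat))  # about 0.667
```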

<b>b. Binary Cross-Entropy Loss - For Binary Classification</b>

Commonly used in binary classification tasks. It measures the performance of a classification model whose output is a probability value between 0 and 1.

def binary_cross_entropy(Y, Y_hat, eps=1e-12):
    m = Y.shape[1]
    # Clip predictions away from exactly 0 and 1 so np.log never returns -inf
    Y_hat = np.clip(Y_hat, eps, 1 - eps)
    cost = (-1 / m) * np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    cost = np.squeeze(cost)  # To ensure the cost is the proper shape (e.g., turns [[17]] into 17).
    return cost
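
For a concrete value: with labels (1, 0) and predicted probabilities (0.9, 0.1), each example assigns probability 0.9 to its true class, so the cost is -ln(0.9) ≈ 0.105. The sketch also shows that clipping keeps the loss finite when a prediction is exactly 0; the `eps` value is an assumption.

```python
import numpy as np

def binary_cross_entropy(Y, Y_hat, eps=1e-12):
    m = Y.shape[1]
    Y_hat = np.clip(Y_hat, eps, 1 - eps)  # avoid log(0)
    return np.squeeze((-1 / m) * np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)))

Y = np.array([[1.0, 0.0]])
print(binary_cross_entropy(Y, np.array([[0.9, 0.1]])))  # about 0.105, i.e. -ln(0.9)

# A confidently wrong prediction of exactly 0 would give -log(0) = inf without clipping;
# with clipping the cost is large but finite
print(binary_cross_entropy(Y, np.array([[0.0, 0.1]])))
```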

<b>c. Categorical Cross-Entropy Loss - For Multiclass Classification</b>

Used in multiclass classification settings, where the goal is to categorize instances into more than two classes.

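def categorical_cross_entropy(Y, Y_hat, eps=1e-12):
    # Y is one-hot with shape (n_classes, m); Y_hat holds softmax probabilities
    m = Y.shape[1]
    Y_hat = np.clip(Y_hat, eps, 1.0)  # avoid log(0)
    cost = (-1 / m) * np.sum(Y * np.log(Y_hat))
    cost = np.squeeze(cost)
    return cost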
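
Categorical cross-entropy expects one-hot labels. The sketch below builds them from integer class indices with `np.eye` (one convenient choice) and computes the cost for two made-up examples whose true classes receive probabilities 0.7 and 0.6, giving -(ln 0.7 + ln 0.6) / 2 ≈ 0.434.

```python
import numpy as np

def categorical_cross_entropy(Y, Y_hat, eps=1e-12):
    m = Y.shape[1]
    return np.squeeze((-1 / m) * np.sum(Y * np.log(np.clip(Y_hat, eps, 1.0))))

labels = np.array([0, 2])            # integer class per example
Y = np.eye(3)[labels].T              # one-hot, shape (n_classes, m) = (3, 2)
Y_hat = np.array([[0.7, 0.1],
                  [0.2, 0.3],
                  [0.1, 0.6]])       # each column sums to 1
print(categorical_cross_entropy(Y, Y_hat))  # about 0.434
```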

<b>Step 4: Backward Propagation</b>

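Backward propagation applies the chain rule from the output layer back to the input, computing the gradient of the loss with respect to every weight and bias so they can be updated. For the linear step of a single layer, given the upstream gradient dZ, the function below computes dW = (1/m)·dZ·A_prevᵀ, db as the mean of dZ across the m examples, and dA_prev = Wᵀ·dZ, which is passed on to the previous layer.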

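def linear_backward(dZ, cache):
    # Gradients of the linear step Z = W·A_prev + b, averaged over the m examples
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)  # gradient passed on to the previous layer

    return dA_prev, dW, db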
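
Step 6 also calls `backward_propagation(AL, Y, caches)`. The sketch below is one hypothetical way to write it, assuming ReLU hidden layers, a sigmoid output layer, binary cross-entropy loss, and caches of the form `((A_prev, W, b), Z)` as produced by `linear_activation_forward`. It includes a compact forward pass so it runs standalone, and ends with a finite-difference check on a single weight.

```python
import numpy as np

def forward(X, params):
    # Compact forward pass: [LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID
    caches, A, L = [], X, len(params) // 2
    for l in range(1, L + 1):
        W, b = params["W" + str(l)], params["b" + str(l)]
        Z = W @ A + b
        caches.append(((A, W, b), Z))
        A = 1 / (1 + np.exp(-Z)) if l == L else np.maximum(0, Z)
    return A, caches

def bce(AL, Y):
    return -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / Y.shape[1]

def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    return W.T @ dZ, dZ @ A_prev.T / m, np.sum(dZ, axis=1, keepdims=True) / m

def backward_propagation(AL, Y, caches):
    """Hypothetical helper: returns dW1, db1, ..., dWL, dbL for the network above."""
    grads = {}
    L = len(caches)
    # Per-example derivative of BCE w.r.t. AL (the 1/m is applied in linear_backward)
    dA = -(Y / AL - (1 - Y) / (1 - AL))
    for l in reversed(range(1, L + 1)):
        linear_cache, Z = caches[l - 1]
        if l == L:
            S = 1 / (1 + np.exp(-Z))
            dZ = dA * S * (1 - S)      # sigmoid_backward
        else:
            dZ = np.array(dA, copy=True)
            dZ[Z <= 0] = 0             # relu_backward
        dA, grads["dW" + str(l)], grads["db" + str(l)] = linear_backward(dZ, linear_cache)
    return grads

# Tiny random network and data for the check
rng = np.random.default_rng(1)
dims = [3, 4, 1]
params = {}
for l in range(1, len(dims)):
    params["W" + str(l)] = rng.standard_normal((dims[l], dims[l - 1])) * 0.5
    params["b" + str(l)] = rng.standard_normal((dims[l], 1)) * 0.1
X = rng.standard_normal((3, 6))
Y = (rng.random((1, 6)) > 0.5).astype(float)

AL, caches = forward(X, params)
grads = backward_propagation(AL, Y, caches)

# Central finite difference on W1[0, 0]
eps = 1e-6
params["W1"][0, 0] += eps
cost_plus = bce(forward(X, params)[0], Y)
params["W1"][0, 0] -= 2 * eps
cost_minus = bce(forward(X, params)[0], Y)
params["W1"][0, 0] += eps  # restore the original value
numeric = (cost_plus - cost_minus) / (2 * eps)
print(numeric, grads["dW1"][0, 0])  # the two values should agree closely
```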

<b>Step 5: Update Parameters</b>

Using the gradients computed during backpropagation, we update each weight and bias with gradient descent, W := W - α·dW and b := b - α·db, where the learning rate α controls the step size.

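def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2  # number of layers with weights

    # Gradient descent step for every layer
    for l in range(L):
        parameters["W" + str(l+1)] -= learning_rate * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] -= learning_rate * grads["db" + str(l+1)]

    # Note: -= modifies the arrays in place, so the caller's dict is updated too
    return parameters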
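
A one-step numeric example with made-up values: with a learning rate of 0.1, a weight of 1.0 with gradient 0.5 moves to 0.95. The sketch also shows a side effect of `-=`: the arrays are modified in place, so the dictionary passed in changes even if the return value is ignored.

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2
    for l in range(L):
        parameters["W" + str(l + 1)] -= learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] -= learning_rate * grads["db" + str(l + 1)]
    return parameters

params = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.0]])}
grads = {"dW1": np.array([[0.5, -0.5]]), "db1": np.array([[1.0]])}
W1_before = params["W1"]             # reference to the same array object
update_parameters(params, grads, learning_rate=0.1)
print(params["W1"], params["b1"])    # W1 becomes 0.95, 2.05 and b1 becomes -0.1
print(W1_before is params["W1"])     # True: updated in place
```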

<b>Step 6: Training the Neural Network</b>

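With all the components defined, we combine them to train the network. Training repeatedly feeds the data through the network (forward propagation), measures the cost, computes gradients (backward propagation), and updates the weights and biases. The `model` function below relies on three helpers that are not defined in this notebook: `forward_propagation(X, parameters)`, which chains `linear_activation_forward` across all layers; `compute_cost(AL, Y)`, which must wrap one of the Step 3 loss functions (note that those take `(Y, Y_hat)`, the reverse argument order); and `backward_propagation(AL, Y, caches)`, which chains the backward functions from the output layer back to the input.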

import matplotlib.pyplot as plt

def model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    np.random.seed(1)  # reproducible initialization
    costs = []

    # Parameters initialization
    parameters = initialize_parameters(layers_dims)

    # Loop (gradient descent)
    for i in range(num_iterations):

        # Forward propagation
        AL, caches = forward_propagation(X, parameters)

        # Compute cost
        cost = compute_cost(AL, Y)

        # Backward propagation
        grads = backward_propagation(AL, Y, caches)

        # Update parameters
        parameters = update_parameters(parameters, grads, learning_rate)

        # Record (and optionally print) the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))

    # Plot the learning curve
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title(f"Learning rate = {learning_rate}")
    plt.show()

    return parameters
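
For an end-to-end sanity check that does not depend on any undefined helper, the sketch below trains a small two-layer network (tanh hidden layer, sigmoid output, binary cross-entropy) on a synthetic dataset. The data, layer sizes, learning rate, and iteration count are illustrative assumptions; the point is only that the forward, cost, backward, update cycle drives the cost down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary task: label is 1 when the two features have the same sign
X = rng.standard_normal((2, 200))
Y = (X[0] * X[1] > 0).astype(float).reshape(1, -1)

n_x, n_h, n_y = 2, 8, 1
W1 = rng.standard_normal((n_h, n_x)) * 0.5
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.5
b2 = np.zeros((n_y, 1))
m = X.shape[1]
learning_rate = 0.5
costs = []

for i in range(3000):
    # Forward propagation
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))

    # Binary cross-entropy cost (clipped to avoid log(0))
    A2c = np.clip(A2, 1e-12, 1 - 1e-12)
    cost = -np.mean(Y * np.log(A2c) + (1 - Y) * np.log(1 - A2c))

    # Backward propagation (sigmoid + cross-entropy gives dZ2 = A2 - Y)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    # Gradient descent update
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

    if i % 100 == 0:
        costs.append(cost)

# Predictions from the last forward pass
predictions = (A2 > 0.5).astype(float)
print("first cost %.3f, last recorded cost %.3f" % (costs[0], costs[-1]))
print("training accuracy:", np.mean(predictions == Y))
```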

<b>Step 7: Evaluation and Prediction</b>

Once the model is trained, we can use it to make predictions on new data. To do this, we simply perform forward propagation with the learned parameters and interpret the output.

<b>a. Binary Classification (Sigmoid Activation)</b>

This is the simplest case, typically used when the output layer of the neural network uses a sigmoid activation function for binary classification.

def predict_binary(X, parameters):
    AL, _ = forward_propagation(X, parameters)
    predictions = AL > 0.5  # Default threshold is 0.5
    return predictions

<b>b. Binary Classification with Custom Threshold</b>

In some applications, especially where there is class imbalance or different costs associated with false positives and false negatives, we may want to adjust the decision threshold.

def predict_binary_threshold(X, parameters, threshold=0.5):
    AL, _ = forward_propagation(X, parameters)
    predictions = AL > threshold
    return predictions
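
How the threshold trades off the two error types is easiest to see on fixed probabilities. In the sketch below, `AL` and `Y` are made-up values standing in for the output of `forward_propagation(X, parameters)` and the true labels; `confusion_counts` is a hypothetical helper. Lowering the threshold from 0.5 to 0.3 turns one false negative into a true positive and introduces one false positive.

```python
import numpy as np

AL = np.array([[0.2, 0.35, 0.45, 0.6, 0.9]])  # hypothetical sigmoid outputs for 5 examples
Y = np.array([[0, 1, 0, 1, 1]])                # hypothetical true labels

def confusion_counts(predictions, Y):
    # Returns (true positives, false positives, false negatives)
    tp = int(np.sum((predictions == 1) & (Y == 1)))
    fp = int(np.sum((predictions == 1) & (Y == 0)))
    fn = int(np.sum((predictions == 0) & (Y == 1)))
    return tp, fp, fn

for threshold in (0.5, 0.3):
    predictions = (AL > threshold).astype(int)
    print(threshold, confusion_counts(predictions, Y))
# 0.5 (2, 0, 1)
# 0.3 (3, 1, 0)
```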

<b>c. Multiclass Classification (Softmax Activation)</b>

For multiclass classification, the output layer typically uses the softmax activation. The predicted class for each example (each column of `AL`) is the index of its largest probability.

def predict_multiclass(X, parameters):
    AL, _ = forward_propagation(X, parameters)
    # AL has shape (n_classes, m): pick the most probable class in each column
    predictions = np.argmax(AL, axis=0)
    return predictions
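
A short usage sketch with made-up softmax outputs: `np.argmax(AL, axis=0)` picks the row (class) with the largest probability in each column, and comparing the result against integer labels gives the accuracy. `AL` and `labels` are hypothetical values, not model output.

```python
import numpy as np

AL = np.array([[0.7, 0.1, 0.2],
               [0.2, 0.3, 0.5],
               [0.1, 0.6, 0.3]])  # shape (n_classes, m): 3 classes, 3 examples
labels = np.array([0, 2, 2])     # hypothetical true classes

predictions = np.argmax(AL, axis=0)        # classes 0, 2, 1
accuracy = np.mean(predictions == labels)  # 2 of 3 correct
print(predictions, accuracy)               # accuracy is about 0.667
```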