# Logistic Regression from scratch (Using only NumPy)

In this notebook we are going to implement **logistic regression** from scratch, using only NumPy. Logistic regression is equivalent to a **single neuron** of a neural network with a **sigmoid activation function**.

This notebook gives a general idea of what we will work on in the upcoming notebooks of the series.

In this notebook we will implement the following functions:

- **`Initialize_params`** → set the initial values of weights `w` and bias `b`  **(for now, we initialize them to zeros)**
- **`sigmoid`** → the activation function that maps inputs into probabilities  
- **`propagate`** → perform forward propagation (compute predictions & cost) and backward propagation (compute gradients)  
- **`optimize`** → update parameters `(w, b)` iteratively using gradient descent  
- **`predict`** → use the learned parameters `(w, b)` to predict labels on new data
- **`LR_Model`** → build a logistic regression model using the functions above
- **`plot_learning_curve`** → plot the learning curve of the trained model from its recorded costs, using matplotlib

By the end of this notebook, you’ll see how logistic regression works end-to-end:
1. Initialize parameters  
2. Learn them through optimization  
3. Predict outcomes on unseen examples  

This workflow is the **blueprint** for training more complex neural networks in the upcoming notebooks.

The main steps for building a neural network are:
1. Define the model structure (such as the number of input features)
2. Initialize the model's parameters
3. Loop:
    - Calculate the current loss (forward propagation)
    - Calculate the current gradients (backward propagation)
    - Update the parameters (gradient descent)

We build each of these separately and then combine them in a model function.

In [None]:
# Import requirements
import numpy as np
import copy
import matplotlib.pyplot as plt

# 1 - Initialization of W and b

In [None]:
def Initialize_params(dim):
    """
    Initialize w -> zero vector of shape (dim, 1) and bias b to 0.0

    Args:
    dim(int): size of the w vector we want (or number of parameters in this case -- number of input features)

    Returns:
    params: dictionary with
        'w': zero vector of shape (dim, 1)
        'b': float set equal to zero
    """

    w = np.zeros((dim, 1), dtype=float)
    b = 0.0

    return {'w': w, 'b': b}
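With all-zero parameters, $z = 0$ for every example, so every initial prediction is $\sigma(0) = 0.5$ and the initial cost is $\ln 2 \approx 0.693$ whatever the data. Zero initialization works for logistic regression because its cost is convex and, with a single neuron, there is no symmetry between hidden units that needs breaking (this changes for deeper networks). A quick check on random data; the shapes and labels below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, m = 4, 10                        # hypothetical: 4 features, 10 examples
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m))   # random 0/1 labels

W = np.zeros((n_x, 1))                # zero-initialized weights
b = 0.0                               # zero-initialized bias

# sigmoid(W^T X + b) with W = 0 and b = 0 is sigmoid(0) = 0.5 everywhere
A = 1 / (1 + np.exp(-(W.T @ X + b)))
cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

print(A.min(), A.max())  # 0.5 0.5
print(cost)              # approximately 0.6931 (ln 2)
```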

# 2 - Sigmoid Activation Function
We need this function to compute $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z = w^T x + b$, in order to make predictions.

In [None]:
def sigmoid(z):
    """
    Compute sigmoid of z

    Args:
    z: numpy array

    Returns:
    s: sigmoid of z
    """

    s = 1/(1+np.exp(-z))

    return s
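`sigmoid` as written evaluates `np.exp(-z)`, which overflows (NumPy emits a `RuntimeWarning`) when `z` is a large negative number such as $-1000$. The result is still 0.0, but the warning can be noisy during training. Below is a minimal sketch of a numerically stable variant, assuming 1-D or higher-dimensional array inputs; `stable_sigmoid` is a hypothetical helper, not part of this notebook. SciPy's `scipy.special.expit` is an existing library implementation of the same function.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in np.exp for large |z| (illustrative sketch, array input)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0: exp(-z) <= 1, so the usual formula cannot overflow.
    out[pos] = 1 / (1 + np.exp(-z[pos]))
    # For z < 0: use the equivalent form exp(z) / (1 + exp(z)); exp(z) < 1 cannot overflow.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1 + ez)
    return out

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
with np.errstate(over="raise"):  # turn any overflow into an error instead of a warning
    s = stable_sigmoid(z)
print(s)  # approximately [0, 0.119, 0.5, 0.881, 1]
```

The two branches are algebraically identical; splitting on the sign of `z` only keeps the argument of `np.exp` non-positive.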

# 3 - Forward and Backward propagation
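Now that the **parameters** are initialized, we implement the `propagate()` function, which computes the cost (forward propagation) and its gradients (backward propagation). These gradients are what `optimize()` will use **to learn the parameters**.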
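For reference, these are the formulas `propagate()` implements. With $A = \sigma(W^T X + b) = (a^{(1)}, \dots, a^{(m)})$, the cross-entropy cost over $m$ examples is

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{(i)} + \left(1-y^{(i)}\right)\log\left(1-a^{(i)}\right)\right]$$

Since $\sigma'(z) = \sigma(z)\left(1-\sigma(z)\right)$, the derivative of each example's loss with respect to its $z^{(i)}$ simplifies to $a^{(i)} - y^{(i)}$. This gives `dZ = A - Y` and

$$\frac{\partial J}{\partial W} = \frac{1}{m} X (A - Y)^T, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)$$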

In [None]:
def propagate(W, b, X, Y):
    """
    Forward and Backward propagation, computing the cost function and its gradients

    Args:
    W: weights, numpy array of size (n_x, 1)
    b: bias, scalar
    X: input data, numpy array of size (n_x, m)
    Y: labels, numpy array of size (1, m)
    NOTE: m is the number of examples and n_x = number of features per input = n_px*n_px*3 (Images)

    Returns:
    cost: computed cost
    grads: dictionary containing dW (gradient of the loss with respect to W, same shape as W)
                                 db (gradient of the loss with respect to b, same shape as b)
    """

    Z = np.dot(W.T, X) + b
    A = sigmoid(Z)

    m = X.shape[1]
    cost = (-1/m) * np.sum(Y*np.log(A) + (1-Y)*np.log(1-A))

    dZ = A - Y
    dW = (1/m) * np.dot(X, dZ.T)
    db = (1/m) * np.sum(dZ)

    cost = np.squeeze(np.array(cost))
    grads = {'dW': dW, 'db': db}

    return cost, grads
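A common way to confirm the backward pass is a numerical gradient check: perturb each parameter by a small $\epsilon$ and compare central differences of the cost with the analytic gradients. This is a minimal sketch; `cost_and_grads` is a hypothetical stand-in that restates the `propagate()` formulas so the block runs on its own, the data is random, and the tolerance mentioned in the comment is a rule of thumb rather than a fixed standard.

```python
import numpy as np

def cost_and_grads(W, b, X, Y):
    """Same formulas as propagate(): cross-entropy cost and its analytic gradients."""
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(W.T @ X + b)))
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dZ = A - Y
    return cost, X @ dZ.T / m, np.sum(dZ) / m

rng = np.random.default_rng(1)
n_x, m = 3, 5                         # hypothetical small problem
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m))
W = rng.normal(size=(n_x, 1))
b = 0.5

_, dW, db = cost_and_grads(W, b, X, Y)

eps = 1e-6
dW_num = np.zeros_like(W)
for j in range(n_x):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[j, 0] += eps
    W_minus[j, 0] -= eps
    # Central difference: (J(W + eps*e_j) - J(W - eps*e_j)) / (2*eps)
    dW_num[j, 0] = (cost_and_grads(W_plus, b, X, Y)[0]
                    - cost_and_grads(W_minus, b, X, Y)[0]) / (2 * eps)
db_num = (cost_and_grads(W, b + eps, X, Y)[0]
          - cost_and_grads(W, b - eps, X, Y)[0]) / (2 * eps)

# Relative difference; values around 1e-7 or smaller suggest the gradients agree.
rel = np.linalg.norm(dW - dW_num) / (np.linalg.norm(dW) + np.linalg.norm(dW_num))
print(rel, abs(db - db_num))
```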

# 4 - Optimization
With forward and backward propagation in place, we implement **optimization**: learning W and b with **gradient descent**, that is, by iteratively **minimizing the cost**.

At each iteration we:
1) Calculate the cost and the gradients for the current parameters using propagate().
2) Update W and b using the gradient descent rule.
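Written out, the gradient descent rule in step 2 is, with $\alpha$ = `learning_rate`:

$$W := W - \alpha \frac{\partial J}{\partial W}, \qquad b := b - \alpha \frac{\partial J}{\partial b}$$

This corresponds to the lines `W -= learning_rate * dW` and `b -= learning_rate * db` in `optimize()`.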

In [None]:
def optimize(W, b, X, Y, num_iterations=1000, learning_rate=0.009, print_cost=True):
    """
    Optimize (update) W and b by using gradient descent

    Args:
    W: weights, numpy array of size (n_x, 1)
    b: bias, scalar
    X: input data, numpy array of size (n_x, m)
    Y: labels, numpy array of size (1, m)
    num_iterations: number of gradient descent iterations to run
    learning_rate: step size used when updating W and b
    print_cost: if True, print the cost every 100 iterations

    Returns:
    params: dictionary containing W and b
    grads: dictionary containing dW and db
    costs: list of all costs computed during the optimization, this can be used to plot the learning curve
    
    """
    costs = []
    W = copy.deepcopy(W)
    b = copy.deepcopy(b)
    
    for i in range(num_iterations):
        cost, grads = propagate(W, b, X, Y)

        dW = grads['dW']
        db = grads['db']

        W -= learning_rate * dW
        b -= learning_rate * db

        if i%100 == 0:
            costs.append(cost)
            if print_cost:
                print(f'iteration {i}/{num_iterations}: loss = {cost}')
        
    params = {'W': W, 'b': b}
    grads = {'dW': dW, 'db': db}
    
    return params, grads, costs
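The `copy.deepcopy(W)` at the top of `optimize()` matters because `W -= learning_rate * dW` updates the NumPy array in place. Without a copy, the caller's weight array would be overwritten. For a Python float like `b`, `b -= ...` simply rebinds the name, so copying `b` is harmless but not strictly needed. A small illustration with made-up values:

```python
import copy
import numpy as np

dW = np.ones((2, 1))
learning_rate = 0.1

W_caller = np.zeros((2, 1))   # array owned by the caller
W = W_caller                  # plain assignment: same object, no copy
W -= learning_rate * dW       # in-place update also changes W_caller
print(W_caller.ravel())       # [-0.1 -0.1]

W_caller = np.zeros((2, 1))
W = copy.deepcopy(W_caller)   # independent copy, as optimize() does
W -= learning_rate * dW       # only the copy changes
print(W_caller.ravel())       # [0. 0.]
```

For NumPy arrays, `W.copy()` gives the same protection as `copy.deepcopy(W)`.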

# 5 - Making Predictions
Finally, with the main training functions in place, we implement the **predict()** function to make predictions.

A logistic regression model makes its **prediction based on $\hat{y}$** ($A = \sigma(Z)$):
- If $\hat{y} \le 0.5$ for an entry, label the entry 0
- If $\hat{y} > 0.5$ for an entry, label the entry 1
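Because $\sigma$ is strictly increasing and $\sigma(0) = 0.5$, the rule $\hat{y} > 0.5$ is the same as $z = w^T x + b > 0$, so the decision boundary is the hyperplane $w^T x + b = 0$. A small check on random, hypothetical data that thresholding $A$ at 0.5 and thresholding $Z$ at 0 give the same labels (up to floating-point rounding for $z$ extremely close to 0):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 8))       # hypothetical: 3 features, 8 examples
W = rng.normal(size=(3, 1))
b = -0.2

Z = W.T @ X + b
A = 1 / (1 + np.exp(-Z))

labels_from_A = (A > 0.5).astype(float)  # the rule above, vectorized
labels_from_Z = (Z > 0).astype(float)    # same rule applied to z directly
print(np.array_equal(labels_from_A, labels_from_Z))  # True
```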

In [None]:
def predict(W, b, X):
    """
    Label entries 0 or 1 based on the value of the activation

    Args:
    W: weights, numpy array of size (n_x, 1)
    b: bias, scalar
    X: input data, numpy array of size (n_x, m)

    Returns:
    Y_prediction: numpy array containing predicted labels to all the examples X
    """

    Z = np.dot(W.T, X) + b
    A = sigmoid(Z)

    # Label 1 where the probability is above 0.5, else 0 (vectorized threshold, shape (1, m))
    Y_prediction = (A > 0.5).astype(float)

    return Y_prediction

# 6 - Building the Model
With all the core functions implemented, we can now combine them into a single LR model.

In [None]:
def LR_Model(X_train, X_test, Y_train, Y_test, num_iterations=1000, learning_rate=0.009, print_cost=True):
    """
    Builds LR model with all the previously implemented functions

    Args:
    X_train: training data, numpy array of size (n_x, m_train)
    X_test: test data, numpy array of size (n_x, m_test)
    Y_train: training labels, numpy array of size (1, m_train)
    Y_test: test labels, numpy array of size (1, m_test)
    num_iterations: number of gradient descent iterations
    learning_rate: step size used by gradient descent
    print_cost: if True, print the cost every 100 iterations

    Returns:
    summary: dictionary containing -- weights('W'), bias('b'), gradients('grads'), costs('costs'), train_accuracy('train_accuracy'),
                                      test_accuracy('test_accuracy'), Y_prediction_train('Y_prediction_train'), Y_prediction_test('Y_prediction_test'),
                                      learning_rate('learning_rate'), and num_iterations('num_iterations').
    """

    # Number of input features (rows of X_train) sets the size of W
    n_x = X_train.shape[0]
    init_params = Initialize_params(n_x)
    W, b = init_params['w'], init_params['b']

    params, grads, costs = optimize(W, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    W = params['W']
    b = params['b']

    Y_prediction_train = predict(W, b, X_train)
    Y_prediction_test = predict(W, b, X_test)

    # Percentage of correctly labelled examples (labels are 0/1)
    train_accuracy = 100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100
    test_accuracy = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100

    summary = {
        'W': W,
        'b': b,
        'grads': grads,
        'costs': costs,
        'train_accuracy': train_accuracy,
        'test_accuracy': test_accuracy,
        'Y_prediction_train': Y_prediction_train,
        'Y_prediction_test': Y_prediction_test,
        'learning_rate': learning_rate,
        'num_iterations': num_iterations
    }

    return summary
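The accuracy lines in `LR_Model` rely on the labels being 0 or 1: $|\hat{y} - y|$ is 1 for a wrong prediction and 0 for a correct one, so its mean is the error rate and `100 - mean * 100` is the percentage of correct predictions. A toy check with made-up labels:

```python
import numpy as np

Y_true = np.array([[1, 0, 1, 1, 0]])
Y_pred = np.array([[1, 0, 0, 1, 1]])    # two mistakes out of five

acc_formula = 100 - np.mean(np.abs(Y_pred - Y_true)) * 100  # formula used in LR_Model
acc_direct = np.mean(Y_pred == Y_true) * 100               # share of exact matches
print(acc_formula, acc_direct)  # 60.0 60.0
```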


# 7 - Plotting the learning curve 

In [None]:
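def plot_learning_curve(trained_model):
    """
    Plot the costs recorded during training (one value every 100 iterations)

    Args:
    trained_model: summary dictionary returned by LR_Model
    """

    costs = np.squeeze(trained_model['costs'])
    learning_rate = trained_model['learning_rate']
    num_iterations = trained_model['num_iterations']

    plt.plot(costs)
    plt.title(f'Learning rate = {learning_rate}')
    plt.ylabel('cost')
    plt.xlabel(f'iterations (per hundreds, {num_iterations} in total)')

    plt.show()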