<a href="https://colab.research.google.com/github/Fawzy-AI-Explorer/X-From-Scratch/blob/main/Logistic_Regression-From_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Logistic Regression From Scratch
## Outline:
- [Goals](#goals)
- [Tools](#tools)
- [Prediction && Accuracy](#pred_acc)
- [Compute Cost](#cost)
- [Gradient Descent](#grd_desc)
- [Final Implementation](#final_implementation)
- [Test the model](#test)
- [Jupyter Notebook License](#license)

## <a name="goals">Goals</a>
In this tutorial, we will:

- Implement the logistic regression model.
- Implement the gradient descent algorithm to train the model.
- Implement the accuracy score to evaluate the model.<br><br>

*All from scratch*

## <a name="tools">Tools</a>
In this project, we will make use of:
- math, a standard-library module that provides access to the mathematical functions defined by the C standard
- pandas, a Python library for working with data sets
- NumPy, a popular library for scientific computing
- seaborn, a Python data visualization library based on Matplotlib
- Matplotlib, a popular library for plotting data

In [111]:
import math, copy
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## <a name="pred_acc">Prediction && Accuracy</a>
### Logistic Regression
A logistic regression model applies the sigmoid to the familiar linear regression model as shown below:

$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g(\mathbf{w} \cdot \mathbf{x}^{(i)} + b ) \tag{1} $$

  where

  $$g(z) = \frac{1}{1+e^{-z}}\tag{2}$$

<br><br>
**Accuracy Score**: This is the proportion of correct predictions among all predictions made by the model.<br><br>
$$Accuracy=\frac{Number\ of\ Correct\ Predictions}{Total\ Number\ of\ Predictions}\tag{3}$$<br>

In [112]:
# sigmoid function implementation
def sigmoid(z):
    """
    Compute the sigmoid of z

    Args:
        z (ndarray): A scalar or numpy array of any size.

    Returns:
        g (ndarray): sigmoid(z), with the same shape as z
    """
    # apply the sigmoid function
    g = 1 / (1 + np.exp(-z))
    return g
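
As a quick sanity check on equation (2), the sketch below evaluates the sigmoid at a few points. It redefines `sigmoid` so that it runs on its own, and the sample inputs are arbitrary values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])   # illustrative inputs, not taken from the dataset
g = sigmoid(z)

# sigmoid(0) is exactly 0.5, which is why 0.5 is a natural decision threshold
print(g[1])
# the function is symmetric around 0: g(-z) = 1 - g(z)
print(np.allclose(g[0], 1 - g[2]))
```

Large negative inputs map close to 0 and large positive inputs map close to 1, so the output can be read as a probability.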

In [113]:
# prediction function implementation
def predict(X, w, b):
    """
    Predict the probability of the positive class using logistic regression
    Args:
      X (ndarray): Shape (n,) a single example with n features
      w (ndarray): Shape (n,) model parameters
      b (scalar):             model parameter

    Returns:
      p (scalar):  predicted probability, between 0 and 1
    """
    # apply the sigmoid to the linear combination (no threshold here)
    p = sigmoid(np.dot(X, w) + b)
    return p

Let's make another function that predicts the exact category.

In [114]:
# predicts the exact value for the class
def predict_class(X, w, b):
    """
    Predict the class label (0 or 1) for a single example
    Args:
      X (ndarray): Shape (n,) a single example with n features
      w (ndarray): Shape (n,) model parameters
      b (scalar):             model parameter

    Returns:
      p (scalar):  predicted class, 1. or 0.
    """
    # apply the sigmoid and threshold at 0.5
    p = 1. if sigmoid(np.dot(X, w) + b) >= 0.5 else 0.
    return p

In [115]:
# accuracy function implementation
def get_accuracy(X, y, w, b):
    """
    Returns the accuracy of the model
    Args:
    X (ndarray): Shape(m, n) examples with multiple features
    y (ndarray): Shape (m,) the actual target values
    w (ndarray): Shape (n,) model parameters
    b (scalar) : model parameter

    Returns:
      accuracy (scalar): the accuracy of the model
    """
    m = X.shape[0]
    correct = 0
    # count the correct predictions
    for i in range(m):
      prediction = predict_class(X[i], w, b)
      if prediction == y[i]:
        correct += 1
    # proportion of correct predictions among all predictions
    accuracy = round(correct / m, 2)
    return accuracy
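
The loop above can also be written without an explicit Python loop. The sketch below is one possible vectorized variant, not part of the original notebook: the name `accuracy_vectorized`, the `threshold` parameter, and the toy data are assumptions made for illustration.

```python
import numpy as np

def accuracy_vectorized(X, y, w, b, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches y."""
    probs = 1 / (1 + np.exp(-(X @ w + b)))       # probabilities, shape (m,)
    preds = (probs >= threshold).astype(float)   # 1.0 or 0.0 per example
    return np.mean(preds == y)

# toy data (made up): one feature, decision boundary at x = 0
X_toy = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_toy = np.array([0.0, 0.0, 1.0, 0.0])   # the last label deliberately disagrees
w_toy = np.array([1.0])
b_toy = 0.0

acc = accuracy_vectorized(X_toy, y_toy, w_toy, b_toy)
print(acc)  # 3 of the 4 predictions are correct -> 0.75
```

Unlike `get_accuracy`, this sketch does not round the result, which is a design choice rather than a requirement.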

## <a name="cost">Compute Cost</a>
The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = -\frac{1}{m}\sum\limits_{i=0}^{m-1}\left[y^{(i)} \log(f_{\mathbf{w},b}(\mathbf{x}^{(i)})) + (1 - y^{(i)})\log(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)}))\right]\tag{1}$$

In [116]:
# cost function for logistic regression
def compute_cost(X, y, w, b):
    """
    Computes the cost function for logistic regression.

    Args:
      X (ndarray (m,n)): Data, m examples and n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar)    : model parameter

    Returns:
        total_cost (float): The cost of using w,b as the parameters for logistic regression
               to fit the data points in X and y
    """
    # examples and features
    m, n = X.shape
    total_cost = 0.
    epsilon = 1e-9 # keeps the predictions away from 0 and 1 so log() stays finite

    # compute the loss for each example
    for i in range(m):
      f_wb_i = np.clip(predict(X[i], w, b), epsilon, 1 - epsilon)
      total_cost += (y[i] * np.log(f_wb_i) + (1 - y[i]) * np.log(1 - f_wb_i))

    total_cost /= -m

    return total_cost
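
A useful reference point for the cost: with all parameters at zero, every prediction is 0.5, so the cost equals $\log 2 \approx 0.693$ regardless of the labels. The sketch below checks this with a vectorized variant of the cost; the name `compute_cost_vectorized`, the `eps` parameter, and the toy data are assumptions made for illustration.

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b, eps=1e-9):
    """Mean binary cross-entropy, with predictions clipped away from 0 and 1."""
    probs = 1 / (1 + np.exp(-(X @ w + b)))
    probs = np.clip(probs, eps, 1 - eps)      # avoid log(0)
    return -np.mean(y * np.log(probs) + (1 - y) * np.log(1 - probs))

# toy data (made up): three examples with two features each
X_toy = np.array([[0.5, 1.5], [1.0, 1.0], [3.0, 0.5]])
y_toy = np.array([0.0, 0.0, 1.0])

cost_at_zero = compute_cost_vectorized(X_toy, y_toy, np.zeros(2), 0.0)
print(cost_at_zero, np.log(2))   # the two values agree
```

A cost close to 0.693 early in training therefore means the model is still near its uninformed starting point.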

## <a name="grd_desc">Gradient Descent</a>
Gradient descent for multiple variables:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$

where $n$ is the number of features, the parameters $w_j$ and $b$ are updated simultaneously, and

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{2}  \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{3}
\end{align}
$$
* m is the number of training examples in the data set

    
*  $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value
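
To see where equations (2) and (3) come from, the short derivation below applies the chain rule to the loss of a single example. The shorthand $z^{(i)}$, $f^{(i)}$ and $L^{(i)}$ is introduced here for convenience and is not part of the notation used elsewhere in this notebook.

$$
\begin{align*}
z^{(i)} &= \mathbf{w} \cdot \mathbf{x}^{(i)} + b, \qquad f^{(i)} = g(z^{(i)}), \qquad g'(z) = g(z)\,(1 - g(z)) \\
L^{(i)} &= -\left[ y^{(i)} \log f^{(i)} + (1 - y^{(i)}) \log(1 - f^{(i)}) \right] \\
\frac{\partial L^{(i)}}{\partial z^{(i)}} &= -\left[ \frac{y^{(i)}}{f^{(i)}} - \frac{1 - y^{(i)}}{1 - f^{(i)}} \right] f^{(i)} (1 - f^{(i)}) = f^{(i)} - y^{(i)} \\
\frac{\partial L^{(i)}}{\partial w_j} &= (f^{(i)} - y^{(i)})\, x_j^{(i)}, \qquad \frac{\partial L^{(i)}}{\partial b} = f^{(i)} - y^{(i)}
\end{align*}
$$

Averaging these per-example derivatives over the $m$ training examples gives equations (2) and (3).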




---


Let's start by computing the gradients.

In [117]:
# function to compute the gradients
def compute_gradient(X, y, w, b):
    """
    Computes the gradient for logistic regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter

    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.
    """
    # number of examples
    m, n = X.shape

    # initial gradients
    dj_dw = np.zeros((n, ))
    dj_db = 0.

    # compute the actual gradient for each parameter
    for i in range(m):
      f_wb_i = predict(X[i], w, b)
      err = f_wb_i - y[i]
      dj_dw += err * X[i]
      dj_db += err

    dj_dw /= m
    dj_db /= m

    return dj_dw, dj_db
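
One way to gain confidence in a hand-written gradient is to compare it with a finite-difference approximation of the cost. The sketch below does this on random toy data. All function names, the step size `h`, and the data are assumptions made for illustration, and the gradient is written in vectorized form rather than with the loop used above.

```python
import numpy as np

def cost(X, y, w, b, eps=1e-9):
    """Mean binary cross-entropy with clipped predictions."""
    p = np.clip(1 / (1 + np.exp(-(X @ w + b))), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(X, y, w, b):
    """Gradient from equations (2) and (3), vectorized over the examples."""
    err = 1 / (1 + np.exp(-(X @ w + b))) - y   # f - y, shape (m,)
    return X.T @ err / len(y), err.mean()

def numerical_gradient(X, y, w, b, h=1e-6):
    """Central finite differences of the cost, one parameter at a time."""
    dj_dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        dj_dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * h)
    dj_db = (cost(X, y, w, b + h) - cost(X, y, w, b - h)) / (2 * h)
    return dj_dw, dj_db

rng = np.random.default_rng(0)                 # fixed seed for repeatability
X_toy = rng.normal(size=(20, 3))
y_toy = (rng.random(20) > 0.5).astype(float)
w_toy = rng.normal(size=3)
b_toy = 0.1

dw_a, db_a = analytic_gradient(X_toy, y_toy, w_toy, b_toy)
dw_n, db_n = numerical_gradient(X_toy, y_toy, w_toy, b_toy)
print(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))   # both differences are tiny
```

If the two gradients disagreed by more than a small numerical tolerance, that would point to a mistake in either the cost or the gradient code.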

Now let's apply the algorithm

In [118]:
# apply gradient descent algorithm
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking
    num_iters gradient steps with learning rate alpha

    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent

    Returns:
      w (ndarray (n,)) : Updated values of parameters
      b (scalar)       : Updated value of parameter
      J_history (list) : Cost recorded after each iteration
    """
    # initialize some variables
    J_history = []
    w = copy.deepcopy(w_in)
    b = b_in

    # run gradient descent for `num_iters` iterations
    for i in range(num_iters):
      # compute gradient
      dj_dw, dj_db = gradient_function(X, y , w, b)
      # simultaneous update
      w = w - alpha * dj_dw
      b = b - alpha * dj_db

      # save the history (capped so the list cannot grow without bound)
      if i < 100000:
        J_history.append(cost_function(X, y, w, b))

      # print the cost 10 times at regular intervals, or every iteration if num_iters < 10
      if i % math.ceil(num_iters / 10) == 0:
        print("{:>8} {:>11.5e}".format(i, cost_function(X, y, w, b)))

    return w, b, J_history
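
To illustrate the update rule in isolation, the sketch below runs a compact, vectorized version of the same loop on a small toy dataset and records the cost at every step. The data, the learning rate of 0.1, and the helper names are assumptions chosen for illustration; they are not the settings used later in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(X, y, w, b, eps=1e-9):
    """Mean binary cross-entropy with clipped predictions."""
    p = np.clip(sigmoid(X @ w + b), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# small toy dataset chosen for illustration: six examples, two features
X_toy = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
                  [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y_toy = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b = np.zeros(2), 0.0
alpha, num_iters = 0.1, 1000
history = [cost(X_toy, y_toy, w, b)]             # cost before any update

for _ in range(num_iters):
    err = sigmoid(X_toy @ w + b) - y_toy         # f - y for every example
    dj_dw = X_toy.T @ err / len(y_toy)           # equation (2)
    dj_db = err.mean()                           # equation (3)
    w, b = w - alpha * dj_dw, b - alpha * dj_db  # simultaneous update
    history.append(cost(X_toy, y_toy, w, b))

print(history[0], history[-1])   # the cost starts at log(2) and ends lower
```

Plotting `history` against the iteration number is a common way to check that the learning rate is small enough for the cost to keep decreasing.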


## <a name="final_implementation">Final Implementation</a>
Now we will put all of this together into a complete model class that can be reused easily later.<br><br>
*NOTE*: we will rename some methods just to keep up with the original model

In [119]:
"""Define a logistic regression class."""


class LogisticRegression():
    """Representation of a logistic regression model.
    the model predicts categories from small number of possible outputs.
    """

    def __init__(self, learning_rate=1e-3, n_iters=1000):
      """Initialize the logistic regression model

      Args:
        learning_rate (scalar) : The step size used in gradient descent.
        n_iters (scalar) : The maximum number of passes over the training data.
      """
      self.lr = learning_rate
      self.n_iters = n_iters
      self.weights = None
      self.bias = None

    def fit(self, X, y):
      """Fit the model according to the given training data.

      Args:
        X (ndarray (m,n))   : Data, m examples with n features
        y (ndarray (m,))    : target values

      Returns:
        self (object) : Fitted model estimator.
      """
      # initialize some variables
      # the cost history is kept on the instance so it can be inspected after training
      self.J_history = []
      self.weights = np.zeros((X.shape[1],))
      self.bias = 0.0

      # run gradient descent for `n_iters` iterations
      for i in range(self.n_iters):
        # compute gradient
        dj_dw, dj_db = self.compute_gradient(X, y)
        # simultaneous update
        self.weights = self.weights - self.lr * dj_dw
        self.bias = self.bias - self.lr * dj_db

        # save the history (capped so the list cannot grow without bound)
        if i < 100000:
          self.J_history.append(self.compute_cost(X, y))

        # print the cost 10 times at regular intervals, or every iteration if n_iters < 10
        if i % math.ceil(self.n_iters / 10) == 0:
          print("{:>8} {:>11.5e}".format(i, self.compute_cost(X, y)))

      return self


    def compute_gradient(self, X, y):
      """
      Computes the gradient for logistic regression
      Args:
        X (ndarray (m,n)): Data, m examples with n features
        y (ndarray (m,)) : target values

      Returns:
        dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
        dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.
      """
      # number of examples
      m, n = X.shape

      # initial gradients
      dj_dw = np.zeros((n, ))
      dj_db = 0.

      # compute the actual gradient for each parameter
      for i in range(m):
        f_wb_i = self.predict(X[i])
        err = f_wb_i - y[i]
        dj_dw += err * X[i]
        dj_db += err

      dj_dw /= m
      dj_db /= m

      return dj_dw, dj_db

    def predict(self, X):
      """
      Predict the probability of the positive class using logistic regression
      Args:
        X (ndarray): Shape (n,) a single example with n features

      Returns:
        p (scalar):  predicted probability, between 0 and 1
      """
      # apply the sigmoid to the linear combination (no threshold here)
      p = self.sigmoid(np.dot(X, self.weights) + self.bias)
      return p

    def predict_class(self, X):
      """
      Predict the class label (0 or 1) for a single example
      Args:
        X (ndarray): Shape (n,) a single example with n features

      Returns:
        p (scalar):  predicted class, 1. or 0.
      """
      # threshold the predicted probability at 0.5
      p = 1. if self.predict(X) >= 0.5 else 0.
      return p

    def sigmoid(self, z):
      """
      Compute the sigmoid of z

      Args:
          z (ndarray): A scalar, numpy array of any size.

      Returns:
          g (ndarray): sigmoid(z), with the same shape as z
      """
      # apply the sigmoid function
      g = 1 / (1 + np.exp(-z))
      return g

    def compute_cost(self, X, y):
      """
      Computes the cost function for logistic regression.

      Args:
        X (ndarray (m,n)): Data, m examples and n features
        y (ndarray (m,)): target values

      Returns:
          total_cost (float): The cost of using w,b as the parameters for logistic regression
                to fit the data points in X and y
      """
      # examples and features
      m, n = X.shape
      total_cost = 0.
      epsilon = 1e-9 # keeps the predictions away from 0 and 1 so log() stays finite

      # compute the loss for each example
      for i in range(m):
        f_wb_i = np.clip(self.predict(X[i]), epsilon, 1 - epsilon)
        total_cost += (y[i] * np.log(f_wb_i) + (1 - y[i]) * np.log(1 - f_wb_i))

      total_cost /= -m

      return total_cost

    def score(self, X, y):
      """
      Returns the accuracy of the model
      Args:
      X (ndarray): Shape(m, n) examples with multiple features
      y (ndarray): Shape (m,) the actual target values

      Returns:
        accuracy (scalar): the accuracy of the model
      """
      m = X.shape[0]
      correct = 0
      # count the correct predictions
      for i in range(m):
        prediction = self.predict_class(X[i])
        if prediction == y[i]:
          correct += 1
      # proportion of correct predictions among all predictions
      accuracy = round(correct / m, 2)
      return accuracy


## <a name="test">Test the model</a>
Now let's test the model on the [Breast Cancer Wisconsin (Diagnostic) Data Set](https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data).

I will not go through the data exploration and analysis, but if you want to know more you can [check this](https://www.kaggle.com/code/adham3lam/logistic-regression-from-scratch).

In [120]:
# load the data
url = "https://raw.githubusercontent.com/Ad7amstein/Logistic_Regression-Breast_Cancer_Diagnostic/main/data.csv"
df = pd.read_csv(url)

# drop id and Unnamed: 32 columns
df.drop(["id", "Unnamed: 32"], axis=1, inplace=True)

# convert target to numerical values
df.diagnosis = [1 if value == "M" else 0 for value in df.diagnosis]

# divide into target variables and predictors
y = df["diagnosis"] # our target variable
X = df.drop(["diagnosis"], axis=1) # our predictors
X = X.values

# Standardization (zero mean and unit variance for each feature)
from sklearn.preprocessing import StandardScaler

# Create a scaler object
scaler = StandardScaler()

# Fit the scaler to the data and transform it
X_norm = scaler.fit_transform(X)

# Split the data
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3, random_state=42)
X_train, X_test, y_train, y_test = np.array(X_train), np.array(X_test), np.array(y_train), np.array(y_test)
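
For readers curious about what the scaling step does, the sketch below reproduces the same standardization by hand with NumPy on a tiny made-up array. The array values and variable names are assumptions made for illustration.

```python
import numpy as np

# two features on very different scales (made-up values)
X_raw = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])

mean = X_raw.mean(axis=0)        # per-feature mean
std = X_raw.std(axis=0)          # population std (ddof=0), as StandardScaler uses
X_std = (X_raw - mean) / std     # z-score for every entry

print(X_std.mean(axis=0))        # approximately 0 for each feature
print(X_std.std(axis=0))         # approximately 1 for each feature
```

Bringing the features onto a comparable scale generally helps gradient descent converge with a single learning rate. One design choice worth considering, which the cell above does not make, is to compute the mean and standard deviation on the training split only and reuse them on the test split.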

Now let's build the model and train it.

In [122]:
lr = LogisticRegression()

lr.fit(X_train, y_train)

print("The score of our model is {}%".format(lr.score(X_test, y_test) * 100))

       0 6.91224e-01
     100 5.46879e-01
     200 4.61833e-01
     300 4.06128e-01
     400 3.66671e-01
     500 3.37111e-01
     600 3.14027e-01
     700 2.95418e-01
     800 2.80035e-01
     900 2.67062e-01
The score of our model is 95.0%


## <a name="license">Jupyter Notebook License</a>
### Author: Adham Allam
### How to reach me?
- <a href="https://www.kaggle.com/adham3lam">kaggle</a>
- <a href="https://www.linkedin.com/in/adham-allam-284486254/">Linkedin</a>
- <a href="https://linktr.ee/Adham.3llam">Linktree</a>

### Terms:
1. Free to use for learning purposes.
2. Attribution to the author, Adham Allam, is required.
3. Permission to edit and explore, but derivative works must reference the original notebook and author.
4. No warranties provided.<br>

By using this notebook, you agree to these terms.