# Introduction
In this guide, we'll implement **logistic regression** from scratch using **gradient descent**. Starting with **dataset loading**, we'll cover the **mathematical foundations** and step-by-step code implementation.

The goal is to understand how **logistic regression** works, how **gradient descent** optimizes model parameters, and how to build it without **high-level machine learning libraries**.

### Table of Contents

1. **Importing Libraries**
   - Setting up the necessary libraries for data manipulation, model implementation, and visualization.
2. **Loading and Exploring the Dataset**
   - Understanding the structure of the dataset and initial data exploration.
3. **Preparing the Data**
   - Preprocessing the data by scaling features and splitting into training and testing sets.
4. **Initializing Parameters**
   - Defining the initial parameters for the model, including weights and bias.
5. **Defining the Sigmoid Function**
   - Implementing the model's prediction function using the sigmoid activation.
6. **Defining the Cost Function**
   - Formulating the cost function to measure the accuracy of predictions against actual values using the log loss formula.
7. **Computing the Gradients**
   - Calculating the gradients for weights and bias to optimize the cost function.
8. **Updating Parameters Using Gradient Descent**
   - Applying gradient descent to adjust parameters and minimize the cost function.
9. **Training the Model**
   - Training the model using the data and updating parameters through iterative optimization.
10. **Evaluating Model Performance with Test Data**
    - Assessing the model's performance using test data and relevant metrics.
11. **Conclusion**
    - Summarizing the key findings and insights from the model implementation.

# 1. Importing Libraries

The following code imports the libraries needed for logistic regression and dataset loading:

- **numpy**: For numerical computing and array manipulation.
- **load_breast_cancer**: Loads the Breast Cancer dataset for classification tasks.
- **matplotlib.pyplot**: For visualizations such as loss curves and predictions.

In [26]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

# 2. Loading and Exploring the Dataset

1. **`data = load_breast_cancer()`**:
   - Loads the Breast Cancer dataset from `sklearn.datasets`.

2. **`X = data.data`**:
   - Extracts feature matrix (e.g., mean radius, texture, area) for each sample.

3. **`y = data.target`**:
   - Extracts target labels:
     - `0`: Malignant
     - `1`: Benign

In [27]:
data = load_breast_cancer()
X = data.data
y = data.target
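
A quick look at the loaded arrays helps confirm the structure described above. The following is a minimal sketch of such an exploration step; the variable names `n_samples`, `n_features`, and `class_counts` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

# Shape of the feature matrix: (number of samples, number of features)
n_samples, n_features = X.shape

# Count how many samples carry each label (index 0 = malignant, index 1 = benign)
class_counts = np.bincount(y)

print(f"Samples: {n_samples}, features: {n_features}")
for label, name in enumerate(data.target_names):
    print(f"Class {label} ({name}): {class_counts[label]} samples")
print("First five feature names:", list(data.feature_names[:5]))
```

The class counts show that the two labels are not evenly balanced, which is worth keeping in mind when reading an accuracy figure later on.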

# 3. Preparing the Data

1. **`from sklearn.preprocessing import StandardScaler`**:
   - Imports the `StandardScaler` for feature standardization.

2. **`from sklearn.model_selection import train_test_split`**:
   - Imports the `train_test_split` function to split the dataset into training and testing sets.

3. **`scaler = StandardScaler()`**:
   - Initializes a scaler to standardize features by removing the mean and scaling to unit variance.

4. **`X = scaler.fit_transform(X)`**:
   - Standardizes the feature matrix `X`.

5. **`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)`**:
   - Splits the dataset into training and testing sets:
     - 80% for training (`X_train`, `y_train`).
     - 20% for testing (`X_test`, `y_test`).
   - Ensures reproducibility with `random_state=42`.

In [28]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
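
The cell above fits the scaler on all samples before splitting, so the test rows contribute to the mean and variance used for scaling. A common alternative is to compute the scaling statistics from the training rows only and reuse them for the test rows. The sketch below illustrates that idea with plain NumPy on synthetic data; `X_demo`, the 80/20 index split, and the other names are assumptions made for this example, not the notebook's own variables.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 4))  # synthetic features

# Hypothetical 80/20 split using shuffled row indices
indices = rng.permutation(len(X_demo))
train_idx, test_idx = indices[:80], indices[80:]
X_train_demo, X_test_demo = X_demo[train_idx], X_demo[test_idx]

# Scaling statistics come from the training rows only
train_mean = X_train_demo.mean(axis=0)
train_std = X_train_demo.std(axis=0)

# The same statistics are applied to both subsets
X_train_scaled = (X_train_demo - train_mean) / train_std
X_test_scaled = (X_test_demo - train_mean) / train_std

print("Train column means:", np.round(X_train_scaled.mean(axis=0), 6))
print("Test column means: ", np.round(X_test_scaled.mean(axis=0), 3))
```

With scikit-learn, the equivalent pattern would be `scaler.fit_transform(X_train)` followed by `scaler.transform(X_test)`, applied after the split.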

# 4. Initializing Parameters

1. **`weights = np.zeros(X_train.shape[1])`**:
   - Initializes the weight vector for the model.
   - The weights have the same number of elements as the features (`X_train.shape[1]`).
   - All weights are initially set to `0`.

2. **`bias = 0`**:
   - Initializes the bias term to `0`.
   - This is a scalar added to the weighted sum of the features before the sigmoid is applied.

In [29]:
X_train.shape[1]

30

In [30]:
weights = np.zeros(X_train.shape[1])
bias = 0

# 5. Defining the Sigmoid Function

1. **`def sigmoid(z):`**:
   - Defines the sigmoid activation function.

2. **`return 1 / (1 + np.exp(-z))`**:
   - Computes the sigmoid of `z` using the formula:

   $$
   \large \sigma(z) = \frac{1}{1 + e^{-z}}
   $$

The sigmoid function maps input `z` to a range between 0 and 1, representing probabilities in binary classification tasks.


In [31]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
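
For inputs with a large negative value, `np.exp(-z)` in the direct formula can overflow and emit a runtime warning, even though the final result is still usable. One possible way to avoid this is to evaluate the sigmoid piecewise so that the exponent is never positive. The function `stable_sigmoid` below is a hypothetical helper sketched for illustration; it is not part of the original notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid evaluated piecewise so np.exp never receives a positive argument."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so it cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z))
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

values = stable_sigmoid([-1000.0, 0.0, 1000.0])
print(values)
```

Both branches compute the same mathematical function; only the order of operations differs.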

# 6. Defining the Cost Function

1. **`def compute_cost(X, y, weights, bias):`**:
   - Defines a function to compute the cost (log loss) for logistic regression.

2. **`m = len(y)`**:
   - Determines the number of training samples (`m`).

3. **`predictions = sigmoid(np.dot(X, weights) + bias)`**:
   - Calculates the predicted probabilities:
     - Uses the sigmoid function on the linear combination of features, weights, and bias.
     - Formula:

  $$
  \large z^{(i)} = \sum_{j=1}^{n} w_j x_j^{(i)} + b
  $$

  $$
  \large \hat{y}_{(i)} = \sigma \left( z^{(i)} \right)
  $$


4. **`cost = -1 / m * np.sum(y * np.log(predictions) + (1 - y) * np.log(1 - predictions))`**:
   - Computes the cost function (log loss):
     - Formula:

  $$
  \large J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}_{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}_{(i)}) \right]
  $$

5. **`return cost`**:
   - Returns the computed cost value.


`compute_cost` calculates the log loss, which measures the difference between predicted probabilities and actual labels. A lower cost means the predicted probabilities are closer to the true labels.


In [32]:
def compute_cost(X, y, weights, bias):
    m = len(y)
    predictions = sigmoid(np.dot(X, weights) + bias)
    cost = -1 / m * np.sum(y * np.log(predictions) + (1 - y) * np.log(1 - predictions))
    return cost
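
If a predicted probability reaches exactly 0 or 1, `np.log` returns `-inf` and the cost is no longer finite. A common safeguard is to clip the probabilities slightly away from those endpoints. The sketch below shows one way to do that; `compute_cost_clipped` and its `eps` parameter are assumptions introduced for this example. It also checks a useful sanity value: with all-zero weights and bias, every prediction is 0.5, so the cost equals ln 2, which matches the first cost reported during training.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost_clipped(X, y, weights, bias, eps=1e-15):
    """Log loss with probabilities clipped away from exactly 0 and 1."""
    m = len(y)
    predictions = sigmoid(np.dot(X, weights) + bias)
    predictions = np.clip(predictions, eps, 1 - eps)  # avoids log(0)
    return -np.sum(y * np.log(predictions) + (1 - y) * np.log(1 - predictions)) / m

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 3))          # synthetic features
y_demo = np.array([0, 1, 1, 0, 1, 0])     # synthetic labels

# Zero weights and bias give a prediction of 0.5 for every sample
initial_cost = compute_cost_clipped(X_demo, y_demo, np.zeros(3), 0.0)
print(f"{initial_cost:.4f}")  # prints 0.6931, i.e. ln(2)

# Large weights push predictions toward 0 or 1; clipping keeps the cost finite
saturated_cost = compute_cost_clipped(X_demo, y_demo, np.full(3, 50.0), 0.0)
print(np.isfinite(saturated_cost))
```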

# 7. Computing the Gradients

1. **`def compute_gradients(X, y, predictions):`**:
   - Defines a function to compute gradients of the cost function with respect to weights and bias.

2. **`m = X.shape[0]`**:
   - Retrieves the number of training samples (`m`).

3. **`dz = predictions - y`**:
   - Computes the error between predicted probabilities (`predictions`) and true labels (`y`).

4. **`dw = 1 / m * np.dot(X.T, dz)`**:
   - Computes the gradient of the cost function with respect to weights (`dw`).
   - Formula:

  $$
  \large \frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_{(i)} - y^{(i)} \right) x_j^{(i)}
  $$

5. **`db = 1 / m * np.sum(dz)`**:
   - Computes the gradient of the cost function with respect to the bias (`db`).
   - Formula:

  $$
  \large \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_{(i)} - y^{(i)} \right)
  $$

6. **`return dw, db`**:
   - Returns the computed gradients for weights (`dw`) and bias (`db`).

`compute_gradients` returns the gradients needed to update the weights and bias during training with gradient descent.
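
The gradient formulas above follow from the chain rule. As a brief sketch of the derivation, using the same symbols as the cost function and writing $z^{(i)} = \sum_{j} w_j x_j^{(i)} + b$ for the linear combination, start from the derivative of the sigmoid:

$$
\large \frac{d\sigma}{dz} = \sigma(z)\left(1 - \sigma(z)\right)
$$

Differentiating the per-sample loss with respect to the prediction and multiplying by the sigmoid derivative cancels the denominator:

$$
\large \frac{\partial J}{\partial z^{(i)}} = \frac{1}{m} \cdot \frac{\hat{y}_{(i)} - y^{(i)}}{\hat{y}_{(i)}\left(1 - \hat{y}_{(i)}\right)} \cdot \hat{y}_{(i)}\left(1 - \hat{y}_{(i)}\right) = \frac{1}{m}\left(\hat{y}_{(i)} - y^{(i)}\right)
$$

Since $\partial z^{(i)} / \partial w_j = x_j^{(i)}$ and $\partial z^{(i)} / \partial b = 1$, summing over all samples gives the expressions for $\partial J / \partial w_j$ and $\partial J / \partial b$ used in the code.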


In [33]:
def compute_gradients(X, y, predictions):

    m = X.shape[0]
    dz = predictions - y

    dw = 1 / m * np.dot(X.T, dz)
    db = 1 / m * np.sum(dz)

    return dw, db
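
One way to gain confidence in gradient formulas like these is a finite-difference check: nudge each parameter by a small amount and compare the change in cost with the analytic gradient. The sketch below does this on synthetic data; the helper names (`cost`, `analytic_gradients`), the step size `h`, and the data are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(X, y, weights, bias):
    p = sigmoid(X @ weights + bias)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradients(X, y, weights, bias):
    dz = sigmoid(X @ weights + bias) - y
    return X.T @ dz / len(y), np.mean(dz)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))            # synthetic features
y_demo = rng.integers(0, 2, size=20)         # synthetic binary labels
w_demo = rng.normal(scale=0.1, size=3)
b_demo = 0.05
h = 1e-6                                     # finite-difference step

dw, db = analytic_gradients(X_demo, y_demo, w_demo, b_demo)

# Central differences: (J(w + h) - J(w - h)) / (2h), one weight at a time
numeric_dw = np.zeros_like(w_demo)
for j in range(len(w_demo)):
    step = np.zeros_like(w_demo)
    step[j] = h
    numeric_dw[j] = (cost(X_demo, y_demo, w_demo + step, b_demo)
                     - cost(X_demo, y_demo, w_demo - step, b_demo)) / (2 * h)
numeric_db = (cost(X_demo, y_demo, w_demo, b_demo + h)
              - cost(X_demo, y_demo, w_demo, b_demo - h)) / (2 * h)

max_weight_gap = np.max(np.abs(dw - numeric_dw))
bias_gap = abs(db - numeric_db)
print(f"Largest weight-gradient gap: {max_weight_gap:.2e}")
print(f"Bias-gradient gap: {bias_gap:.2e}")
```

Gaps that are many orders of magnitude smaller than the gradients themselves suggest the analytic formulas and the cost function agree.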

# 8. Updating Parameters Using Gradient Descent

1. **`def update_parameters(weights, bias, dw, db, learning_rate):`**:
   - Defines a function to update the model's weights and bias using gradients and the learning rate.

2. **`weights -= learning_rate * dw`**:
   - Updates the weights by subtracting the product of the learning rate and the gradient of weights (`dw`).
   - Formula:
  $$
  \large w_j = w_j - \alpha \frac{\partial J}{\partial w_j}
  $$
     Where:

  $$
  \alpha = \text{learning rate}
  $$


3. **`bias -= learning_rate * db`**:
   - Updates the bias by subtracting the product of the learning rate and the gradient of the bias (`db`).
   - Formula:
  $$
  \large b = b - \alpha \frac{\partial J}{\partial b}
  $$

4. **`return weights, bias`**:
   - Returns the updated weights and bias.

`update_parameters` applies one gradient descent step to the weights and bias; repeating this step reduces the cost function and improves the model's fit.


In [34]:
def update_parameters(weights, bias, dw, db, learning_rate):

    weights -= learning_rate * dw
    bias -= learning_rate * db
    return weights, bias
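
A detail worth knowing about `weights -= learning_rate * dw`: when `weights` is a NumPy array, this modifies the array in place, so the array passed in by the caller changes as well. That is harmless in this notebook, but it can be surprising if the initial weights are reused. The sketch below contrasts the in-place form with a non-mutating variant; `update_in_place` and `update_copy` are hypothetical names used only for this comparison.

```python
import numpy as np

def update_in_place(weights, bias, dw, db, learning_rate):
    weights -= learning_rate * dw   # modifies the caller's array
    bias -= learning_rate * db      # rebinds a local name only
    return weights, bias

def update_copy(weights, bias, dw, db, learning_rate):
    # Builds new objects and leaves the inputs untouched
    return weights - learning_rate * dw, bias - learning_rate * db

dw = np.ones(3)

w_original = np.zeros(3)
new_w, new_b = update_copy(w_original, 0.0, dw, 1.0, 0.1)
copy_left_original_untouched = bool(np.all(w_original == 0))

w_shared = np.zeros(3)
update_in_place(w_shared, 0.0, dw, 1.0, 0.1)
in_place_changed_original = bool(np.allclose(w_shared, -0.1))

print(copy_left_original_untouched, in_place_changed_original)
```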

# 9. Training the Model

1. **`def train(X, y, weights, bias, learning_rate, iterations):`**
   - Trains a logistic regression model using gradient descent over a given number of iterations.

2. **`cost_history = []`**
   - Initializes a list that stores the cost value every 100 iterations for monitoring training progress.

3. **`for i in range(iterations):`**
   - Iteratively updates model parameters using the following steps:

   - **Predictions**:
     - `predictions = sigmoid(np.dot(X, weights) + bias)`
     - Calculates the predicted probabilities for the current weights and bias.

   - **Cost Calculation**:
     - `cost = compute_cost(X, y, weights, bias)`
     - Computes the cost function (log loss) to evaluate the model's performance.

   - **Gradients**:
     - `dw, db = compute_gradients(X, y, predictions)`
     - Calculates the gradients of the cost function with respect to weights (`dw`) and bias (`db`).

   - **Parameter Updates**:
     - `weights, bias = update_parameters(weights, bias, dw, db, learning_rate)`
     - Updates weights and bias using the learning rate and computed gradients.

   - **Cost Logging**:
     - Every 100 iterations (`if i % 100 == 0`):
       - Appends the current cost to `cost_history`.
       - Prints the cost value for monitoring progress.

4. **`return weights, bias, cost_history`**
   - Returns the optimized weights, bias, and the cost history.

**Model Training**

1. **`weights, bias, _ = train(X_train, y_train, weights, bias, 0.01, 1000)`**
   - Trains the logistic regression model using the training data (`X_train`, `y_train`).
   - Parameters:
     - Learning rate: `0.01`
     - Iterations: `1000`
   - Outputs:
     - Optimized weights and bias after training.
     - Logs cost values every 100 iterations.

In short, this step trains the model by minimizing the cost function with gradient descent and returns the optimized parameters used to make predictions.

In [35]:
def train(X, y, weights, bias, learning_rate, iterations):

    cost_history = []

    for i in range(iterations):

        predictions = sigmoid(np.dot(X, weights) + bias)
        cost = compute_cost(X, y, weights, bias)

        dw, db = compute_gradients(X, y, predictions)

        weights, bias = update_parameters(weights, bias, dw, db, learning_rate)

        if i % 100 == 0:
            cost_history.append(cost)
            print(f'Cost after iteration {i}: {cost:.4f}')
    return weights, bias, cost_history

weights, bias, _ = train(X_train, y_train, weights, bias, 0.01, 1000)

Cost after iteration 0: 0.6931
Cost after iteration 100: 0.2522
Cost after iteration 200: 0.1897
Cost after iteration 300: 0.1615
Cost after iteration 400: 0.1448
Cost after iteration 500: 0.1336
Cost after iteration 600: 0.1253
Cost after iteration 700: 0.1190
Cost after iteration 800: 0.1139
Cost after iteration 900: 0.1097
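
The log above shows the cost still decreasing at iteration 900, so a fixed iteration count is a judgment call. An alternative stopping rule, sketched below, ends training once the cost improves by less than a small tolerance between iterations. This is a different technique from the fixed-length loop used in this guide; `train_with_tolerance`, `tol`, `max_iterations`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_with_tolerance(X, y, learning_rate=0.1, max_iterations=5000, tol=1e-6):
    weights = np.zeros(X.shape[1])
    bias = 0.0
    cost_history = []
    for i in range(max_iterations):
        predictions = sigmoid(X @ weights + bias)
        cost_history.append(log_loss(y, predictions))
        # Stop once the cost improves by less than tol between iterations
        if i > 0 and cost_history[-2] - cost_history[-1] < tol:
            break
        dz = predictions - y
        weights = weights - learning_rate * (X.T @ dz) / len(y)
        bias = bias - learning_rate * np.mean(dz)
    return weights, bias, cost_history

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))  # synthetic features
# Synthetic labels from a noisy linear rule, so the classes overlap slightly
noise = rng.normal(scale=0.5, size=200)
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] + noise > 0).astype(int)

w, b, history = train_with_tolerance(X_demo, y_demo)
print(f"Stopped after {len(history)} iterations")
print(f"Cost went from {history[0]:.4f} to {history[-1]:.4f}")
```

The tolerance trades training time against how close the parameters get to the minimum; a smaller `tol` runs longer.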


# 10. Evaluating Model Performance with Test Data

1. **`final_predictions = sigmoid(np.dot(X_test, weights) + bias)`**:
   - Computes the predicted probabilities for the test set (`X_test`) using the trained model.

2. **`accuracy = np.mean((final_predictions > 0.5) == y_test)`**:
   - Calculates the accuracy of the model:
     - Converts predicted probabilities to binary labels using a threshold of `0.5`.
     - Compares predicted labels with actual labels (`y_test`) to calculate the proportion of correct predictions.

3. **`print(f'Test accuracy: {accuracy * 100:.2f}%')`**:
   - Prints the test accuracy as a percentage.

- Applies the trained weights and bias to the test set (`X_test`); no further training takes place in this step.
- Evaluates the model's performance on unseen test data, reporting its classification accuracy.


**Formula Summary**


- Prediction for Test Data:
  $$
  \large \hat{y}_{(i)} = \sigma \left( \sum_{j=1}^{n} w_j x_j^{(i)} + b \right)
  $$

- Accuracy:
  $$
  \small \text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}} \times 100
  $$


- The cells below show the per-sample comparison of predicted and actual labels, the actual test labels, and the resulting test accuracy:

In [49]:
(sigmoid(np.dot(X_test, weights) + bias) > .5) == y_test

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True])

In [48]:
y_test

array([1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1,
       0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 0])

In [36]:
final_predictions = sigmoid(np.dot(X_test, weights) + bias)

accuracy = np.mean((final_predictions > 0.5) == y_test)

print(accuracy)

print(f'Test accuracy: {accuracy * 100:.2f}%')

0.9912280701754386
Test accuracy: 99.12%
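
Accuracy summarizes all predictions in one number and does not show which class the errors fall in. A confusion-matrix breakdown can complement it. The sketch below computes the four counts, plus precision and recall, with NumPy on a small set of hypothetical labels and probabilities; these values are made up for illustration and are not the notebook's test results. Label `1` is treated as the positive class, which in this dataset corresponds to benign.

```python
import numpy as np

# Hypothetical labels and predicted probabilities, for illustration only
y_true = np.array([1, 0, 0, 1, 1, 0, 1, 1])
probabilities = np.array([0.9, 0.2, 0.6, 0.8, 0.4, 0.1, 0.7, 0.95])
y_pred = (probabilities > 0.5).astype(int)

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```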


In [54]:
model = {'weights' : weights, 'bias' : bias}

In [63]:
import json
import numpy as np

model = {'weights': weights, 'bias': bias}
model['weights'] = model['weights'].tolist()

with open('sample_data/model.json', 'w') as f:
    json.dump(model, f, indent=4)

In [55]:
model

{'weights': array([-0.37522937, -0.36006938, -0.37107979, -0.37113421, -0.15591433,
        -0.13275051, -0.28928913, -0.39138848, -0.09339357,  0.1497156 ,
        -0.32934913, -0.01345871, -0.27531747, -0.29989143, -0.00308254,
         0.11363218,  0.093067  , -0.04782715,  0.0757411 ,  0.19622723,
        -0.43974159, -0.43492227, -0.41869554, -0.41332874, -0.30778806,
        -0.20946152, -0.29105911, -0.39796251, -0.30886471, -0.09375225]),
 'bias': 0.3265103861459601}

In [53]:
bias

0.3265103861459601
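
Once the parameters are stored as JSON, they can be read back and used for prediction without retraining. The following is a minimal sketch of that round trip. It writes to a temporary directory rather than `sample_data/`, and `weights_demo`, `bias_demo`, `predict_proba`, and the sample input are hypothetical stand-ins for the trained values.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical parameters standing in for the trained weights and bias
weights_demo = np.array([-0.375, 0.15, 0.196])
bias_demo = 0.3265

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.json")
    # Lists and plain floats are JSON-serializable; NumPy arrays are not
    with open(path, "w") as f:
        json.dump({"weights": weights_demo.tolist(), "bias": float(bias_demo)}, f, indent=4)
    with open(path) as f:
        loaded = json.load(f)

# Convert the stored list back to an array before using it
loaded_weights = np.array(loaded["weights"])
loaded_bias = loaded["bias"]

def predict_proba(X, weights, bias):
    return 1 / (1 + np.exp(-(X @ weights + bias)))

sample = np.array([[0.5, -1.0, 2.0]])  # one made-up, already-scaled sample
probability = predict_proba(sample, loaded_weights, loaded_bias)
print(probability)
```

New inputs would need the same standardization that was applied to the training features, so the scaling statistics would have to be saved alongside the weights.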

# 11. Conclusion

In this project, we successfully implemented a **logistic regression model** from scratch using gradient descent.


## Key Findings:
- The logistic regression model was successfully trained and optimized using gradient descent.
- The training cost fell steadily from 0.6931 at iteration 0 to 0.1097 at iteration 900, indicating that gradient descent was converging.
- The model reached 99.12% accuracy on the held-out test set (113 of 114 samples classified correctly).

---

## Benefits of Logistic Regression:
- **Probabilistic Predictions**: Outputs probabilities for binary classification, making it interpretable.
- **Simplicity**: Easy to implement and computationally efficient.
- **Linear Decision Boundary**: Works well when the classes are approximately linearly separable.

---

## Drawbacks of Logistic Regression:
- **Assumption of Linearity**: Assumes linearity in the relationship between input features and the log-odds of the target variable.
- **Limited to Binary Classification**: Needs extensions for multi-class problems (e.g., one-vs-rest, softmax regression).
- **Sensitivity to Outliers**: Outliers can influence the decision boundary significantly.

---

## Conclusion:
Logistic regression is a simple, interpretable model for binary classification tasks where the classes are approximately linearly separable. While its simplicity is an advantage, it may require feature engineering or more advanced techniques for non-linear problems or datasets with significant noise. For more complex scenarios, models like support vector machines or neural networks can provide better performance.