### **Stage 1: Foundational Concepts**

#### **1.1 What is Logistic Regression?**

Logistic Regression is a supervised learning algorithm used for **binary classification** tasks. It predicts the probability that an input belongs to one of two classes (e.g., "Yes" or "No", "Spam" or "Not Spam"). Despite its name, it is not used for regression but for classification.

Key points:

* Output is a probability value between 0 and 1.
* The decision threshold is typically 0.5: if *P*(*y*=1∣*x*)≥0.5, predict class 1; otherwise, predict class 0.
* It assumes a linear relationship between the input features and the log-odds of the output.

#### **1.2 Why Use Logistic Regression?**
   * **Simplicity**: Easy to implement and interpret.
   * **Efficiency**: Fast to train, even on large datasets.
   * **Probabilistic Output**: Provides probabilities, which can be useful for decision-making.
   * **Baseline Model**: Often serves as a baseline for more complex models.


#### **1.3 Key Terminology**

Before diving deeper, let's define some key terms:

   * **Binary Classification**: A task where the output has only two possible classes.
   * **Logit Function**: The natural logarithm of the odds (log(p/(1−p))).
   * **Sigmoid Function**: A function that maps any real number into the open interval (0, 1).
   * **Decision Boundary**: The set of inputs at which the predicted probability equals the classification threshold (0.5 by default); it separates the regions assigned to each class.
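
To make the link between the logit and the sigmoid concrete, here is a minimal sketch showing that the two functions are inverses of each other. The helper names `logit` and `sigmoid` and the sample probabilities are chosen here purely for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map a real number (log-odds) to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Map a probability in (0, 1) to its log-odds, log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

# Illustrative probabilities (assumed values, not from the text)
p = np.array([0.1, 0.5, 0.9])

z = logit(p)          # log-odds: negative below 0.5, zero at 0.5, positive above
p_back = sigmoid(z)   # applying the sigmoid recovers the original probabilities

print("log-odds:", z)
print("recovered probabilities:", p_back)
```

A probability of 0.5 corresponds to log-odds of 0, which is why the default threshold sits where the linear score changes sign.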


#### **1.4 The Sigmoid Function**

The core of logistic regression is the **sigmoid function**, defined as:
*σ*(*z*)=1/(1+*e*^(−*z*))

Where:
   * *z*=*w^T*·*x*+*b* is the linear combination of inputs *x*, weights *w*, and bias *b*.
   * *σ*(*z*) outputs a value between 0 and 1, representing the probability of class 1.

**Properties of the Sigmoid Function**:
   * As *z*→∞, *σ*(*z*)→1.
   * As *z*→−∞, *σ*(*z*)→0.
   * At *z*=0, *σ*(*z*)=0.5.
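
The three properties above can be checked numerically. The following is a small sketch under the assumption that inputs of ±20 are "large enough" to stand in for the limits; it also checks the symmetry *σ*(−*z*)=1−*σ*(*z*), which follows from the definition.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sample points, from strongly negative to strongly positive (assumed values)
z = np.array([-20.0, -2.0, 0.0, 2.0, 20.0])
s = sigmoid(z)

for zi, si in zip(z, s):
    print(f"sigmoid({zi:+.0f}) = {si:.10f}")

# Symmetry around z = 0: sigmoid(-z) + sigmoid(z) = 1
symmetry_gap = np.max(np.abs(sigmoid(-z) + s - 1.0))
print("largest deviation from symmetry:", symmetry_gap)
```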


#### **1.5 Probabilistic Interpretation**

Logistic Regression models the probability of class 1 as:

*P*(*y*=1∣*x*)=*σ*(*w^T*·*x*+*b*)

and the probability of class 0 as:

*P*(*y*=0∣*x*)=1−*P*(*y*=1∣*x*)

The model predicts the class with the higher probability, which with the default threshold means predicting class 1 when *P*(*y*=1∣*x*)≥0.5.
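
As a worked illustration, the sketch below evaluates both class probabilities for a single input. The weights, bias, and input vector are hypothetical values picked so the arithmetic is easy to follow; they do not come from a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input, chosen only for illustration
w = np.array([0.8, -0.4])
b = -0.2
x = np.array([1.5, 2.0])

z = w @ x + b                      # 0.8*1.5 - 0.4*2.0 - 0.2 = 0.2
p1 = sigmoid(z)                    # P(y=1 | x)
p0 = 1.0 - p1                      # P(y=0 | x)
predicted_class = int(p1 >= 0.5)   # class 1 when P(y=1 | x) >= 0.5

print(f"z = {z:.2f}, P(y=1|x) = {p1:.4f}, P(y=0|x) = {p0:.4f}")
print("predicted class:", predicted_class)
```

Because the linear score is slightly positive, the probability of class 1 lands just above 0.5 and the prediction is class 1.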


### **Stage 2: Mathematical Derivation**

#### **2.1 The Logistic Regression Model**

Given a dataset $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$, where:
   * $x^{(i)} \in \mathbb{R}^d$ is the feature vector for the $i$-th sample.
   * $y^{(i)} \in \{0, 1\}$ is the binary label.

The logistic regression model predicts:

$P(y=1|x) = \sigma(w^Tx+b)$

Where:
   * $w \in \mathbb{R}^d$ is the weight vector.
   * $b \in \mathbb{R}$ is the bias term.


#### **2.2 Loss Function**

To train the model, we need a loss function that measures how well the predicted probabilities match the true labels. Logistic Regression uses the **Binary Cross-Entropy Loss**:

$L(w,b) = -\frac{1}{n}\sum_{i=1}^{n}[y^{(i)}\log(P(y=1|x^{(i)})) + (1-y^{(i)})\log(1-P(y=1|x^{(i)}))]$

Substituting $P(y=1|x^{(i)}) = \sigma(w^Tx^{(i)}+b)$, the loss becomes:

$L(w,b) = -\frac{1}{n}\sum_{i=1}^{n}[y^{(i)}\log(\sigma(w^Tx^{(i)}+b)) + (1-y^{(i)})\log(1-\sigma(w^Tx^{(i)}+b))]$
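
To get a feel for how this loss behaves, the sketch below evaluates it on three sets of hand-picked probabilities (all values are assumptions for illustration): mostly correct, confidently wrong, and uninformed. The function name `binary_cross_entropy` is also chosen here, not taken from the text.

```python
import numpy as np

def binary_cross_entropy(y, p):
    """Mean binary cross-entropy for labels y in {0, 1} and probabilities p."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 0])

good = np.array([0.9, 0.1, 0.8, 0.2])   # confident and correct
bad = np.array([0.1, 0.9, 0.2, 0.8])    # confident and wrong
uninformed = np.full(4, 0.5)            # always predicts 0.5

loss_good = binary_cross_entropy(y, good)
loss_bad = binary_cross_entropy(y, bad)
loss_uninformed = binary_cross_entropy(y, uninformed)

print(f"good: {loss_good:.4f}, uninformed: {loss_uninformed:.4f}, bad: {loss_bad:.4f}")
```

The uninformed case equals $\log 2 \approx 0.693$, which is also the cost a model with all-zero weights and bias starts from, since it outputs 0.5 for every sample.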

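Differentiating this loss with respect to each weight $w_j$ and the bias $b$, and using $\sigma'(z) = \sigma(z)(1-\sigma(z))$, gives the gradients:

$\frac{\partial L}{\partial w_j} = \frac{1}{n}\sum_{i=1}^{n}[\sigma(w^Tx^{(i)}+b) - y^{(i)}]x_j^{(i)}$

$\frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}[\sigma(w^Tx^{(i)}+b) - y^{(i)}]$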
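
One way to gain confidence in these gradient formulas is to compare them against central finite differences of the loss. The sketch below does this on random data; the data, seed, and step size `eps` are assumptions, and samples are stored as rows of `X` (shape `(n, d)`) for convenience.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, X, y):
    """Mean binary cross-entropy; X has one sample per row."""
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradients(w, b, X, y):
    """Gradients from the closed-form expressions for dL/dw and dL/db."""
    err = sigmoid(X @ w + b) - y     # sigma(w^T x + b) - y, one entry per sample
    dw = X.T @ err / len(y)
    db = np.mean(err)
    return dw, db

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # 6 samples, 3 features (random, illustrative)
y = np.array([0, 1, 1, 0, 1, 0])
w = rng.normal(size=3)
b = 0.1

dw, db = analytic_gradients(w, b, X, y)

# Central finite differences: perturb one parameter at a time
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(len(w)):
    step = np.zeros_like(w)
    step[j] = eps
    dw_num[j] = (loss(w + step, b, X, y) - loss(w - step, b, X, y)) / (2 * eps)
db_num = (loss(w, b + eps, X, y) - loss(w, b - eps, X, y)) / (2 * eps)

print("max |dw - dw_num|:", np.max(np.abs(dw - dw_num)))
print("|db - db_num|:", abs(db - db_num))
```

If the analytic and numerical gradients disagree by more than a small tolerance, the usual culprits are a sign error or a missing $\frac{1}{n}$ factor.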

#### **2.3 Gradient Descent**


Using these gradients, we update $w$ and $b$ iteratively:

$w_j := w_j - \alpha\frac{\partial L}{\partial w_j}$

$b := b - \alpha\frac{\partial L}{\partial b}$

Where $\alpha$ is the learning rate.
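
The update rule can be sketched in a few lines. The example below runs five steps on a hypothetical one-dimensional dataset with an assumed learning rate of 0.1; samples are stored as rows of `X`, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, X, y):
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical 1-D dataset: negative inputs are class 0, positive inputs class 1
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])   # shape (n, d) = (4, 1)
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(1)
b = 0.0
alpha = 0.1   # learning rate (assumed value)

losses = [loss(w, b, X, y)]
for _ in range(5):
    err = sigmoid(X @ w + b) - y    # sigma(w^T x + b) - y for each sample
    dw = X.T @ err / len(y)         # dL/dw
    db = np.mean(err)               # dL/db
    w = w - alpha * dw              # w := w - alpha * dL/dw
    b = b - alpha * db              # b := b - alpha * dL/db
    losses.append(loss(w, b, X, y))

print("losses:", np.round(losses, 4))
print("w:", w, "b:", b)
```

With a learning rate this small relative to the feature scale, the loss falls at every step and the weight moves in the positive direction, as expected for this dataset.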


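```python
import numpy as np


def sigmoid(z):
    """Map real-valued inputs to probabilities in (0, 1)."""
    return 1 / (1 + np.exp(-z))


def initialize_parameters(dim):
    """Return a zero weight vector of shape (dim, 1) and a zero bias."""
    w = np.zeros((dim, 1))
    b = 0.0
    return w, b


def propagate(w, b, X, Y):
    """Compute the cross-entropy cost and its gradients.

    X has shape (n_features, m) with one sample per column; Y has shape (1, m).
    Here m is the number of samples (written n in the formulas above).
    """
    m = X.shape[1]

    # Forward pass: predicted probabilities and mean cross-entropy cost
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward pass: gradients of the cost with respect to w and b
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)

    grads = {"dw": dw, "db": db}
    return grads, cost


def optimize(w, b, X, Y, num_iterations, learning_rate):
    """Run batch gradient descent and return the learned parameters."""
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)

        # Gradient descent update
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]

        # Log the cost every 100 iterations
        if i % 100 == 0:
            print(f"Cost after iteration {i}: {cost}")

    return {"w": w, "b": b}


def predict(w, b, X):
    """Return 0/1 predictions of shape (1, m) using a 0.5 threshold."""
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)
    # Predict class 1 when P(y=1|x) >= 0.5, matching the rule in section 1.1
    return (A >= 0.5).astype(float)


def model(X_train, Y_train, num_iterations=2000, learning_rate=0.5):
    """Initialize the parameters, train with gradient descent, and return (w, b)."""
    w, b = initialize_parameters(X_train.shape[0])
    parameters = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)
    return parameters["w"], parameters["b"]
```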

Let’s test the implementation on a simple dataset:

```python
# Example usage
if __name__ == "__main__":
    # Example dataset: each row of X_train is one sample with two features
    X_train = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
    Y_train = np.array([[0, 1, 0, 1]])

    # Train the model (the functions expect shape (n_features, m), hence the transpose)
    w, b = model(X_train.T, Y_train, num_iterations=2000, learning_rate=0.5)

    # Make predictions on the training data
    predictions = predict(w, b, X_train.T)
    print("Predictions:", predictions)
```

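```
Cost after iteration 0: 0.6931471805599453
Cost after iteration 100: 2.638934186852725
Cost after iteration 200: 1.0562398391277186
Cost after iteration 300: 0.7594047322538188
Cost after iteration 400: 0.8867141066305116
Cost after iteration 500: 0.6531997430225687
Cost after iteration 600: 1.024434540112927
Cost after iteration 700: 1.0329198779365159
Cost after iteration 800: 0.9518824441374012
Cost after iteration 900: 2.569520318590931
Cost after iteration 1000: 0.7184225918680391
Cost after iteration 1100: 0.7255740571535026
Cost after iteration 1200: 0.6646253753098406
Cost after iteration 1300: 0.752144183333699
Cost after iteration 1400: 2.9093590392223088
Cost after iteration 1500: 1.1320000403016182
Cost after iteration 1600: 2.60781689099062
Cost after iteration 1700: 0.7410914097270515
Cost after iteration 1800: 1.3161173113611984
Cost after iteration 1900: 0.677039916993291
Predictions: [[0. 0. 1. 1.]]
```

The cost does not decrease steadily in this run: after the first iteration the logged values range from roughly 0.65 to 2.9, and the final predictions match only two of the four training labels. Two properties of this toy setup likely contribute. A learning rate of 0.5 is large for unscaled features with values up to 8, and the labels alternate (0, 1, 0, 1) along a single line in feature space, so no linear decision boundary can classify all four points correctly.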
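
One way to check whether the step size is responsible for the erratic cost is to rerun the same gradient descent on the same data with a much smaller learning rate. The sketch below is a compact, self-contained variant of the training loop; the function name `train` and the value 0.01 are assumptions made for this experiment, not part of the implementation above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, num_iterations, learning_rate):
    """Batch gradient descent; X has shape (n_features, m), Y has shape (1, m)."""
    m = X.shape[1]
    w = np.zeros((X.shape[0], 1))
    b = 0.0
    costs = []
    for _ in range(num_iterations):
        A = sigmoid(w.T @ X + b)
        # Record the cost before each update
        costs.append(float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))))
        w = w - learning_rate * (X @ (A - Y).T) / m
        b = b - learning_rate * float(np.mean(A - Y))
    return w, b, costs

# Same toy dataset as in the example above
X_train = np.array([[1, 2], [3, 4], [5, 6], [7, 8]]).T
Y_train = np.array([[0, 1, 0, 1]])

# Assumed smaller step size for comparison with the 0.5 used above
_, _, costs_small = train(X_train, Y_train, num_iterations=2000, learning_rate=0.01)

print("first cost:", costs_small[0])
print("last cost:", costs_small[-1])
print("cost never increased:", bool(np.all(np.diff(costs_small) <= 1e-12)))
```

A smaller step size addresses only the oscillation. Because the alternating labels are not linearly separable, the cost cannot approach zero on this dataset regardless of the learning rate.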
