# Logistic Regression

In this section we implement logistic regression using only NumPy; scikit-learn is used solely to generate the dataset.

In [1]:
import numpy as np

We are dealing with a binary classification problem: given the feature vector $\mathbf{x}$ of a sample, we want to predict the probability that the sample belongs to class **1**. For that purpose we fit the weights $\mathbf{w}$ and the bias $b$ such that $\sigma(z)$ is close to 1 when the true label $y$ is 1 and close to 0 when $y$ is 0.

$\sigma(z) = \dfrac{1}{1 + e^{-z}}$, where $z = {\mathbf{x}\mathbf{w}^T+b}$
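
As a quick illustration of how the sigmoid squashes its input, here is a minimal sketch. The helper name `sigmoid` and the sample inputs are our own choices for illustration and are not used elsewhere in this section.

```python
import numpy as np

def sigmoid(z):
    # hypothetical helper: element-wise logistic function 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
probs = sigmoid(z)
print(probs)  # close to 0 for z = -5, exactly 0.5 for z = 0, close to 1 for z = 5
```

Negative values of $z$ map to probabilities below 0.5 and positive values to probabilities above 0.5, which is why the sign of $z$ decides the predicted class.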

We use the `make_classification` function from sklearn to create a binary classification dataset with 100 samples and 10 features.

In [2]:
import sklearn.datasets as datasets
X, y = datasets.make_classification(n_samples=100, n_features=10)

In [3]:
# reshape the labels from (100,) to (100, 1) so they line up element-wise with the (100, 1) predictions
y = y.reshape(100, 1)
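
The reshape matters because of NumPy broadcasting. The sketch below uses stand-in arrays (the names `y_flat`, `y_col` and `sigma` and their values are illustrative assumptions) to show that multiplying a `(100,)` vector with a `(100, 1)` column silently yields a `(100, 100)` matrix rather than an element-wise product.

```python
import numpy as np

y_flat = np.zeros(100)           # stand-in labels with shape (100,)
sigma = np.full((100, 1), 0.5)   # stand-in predicted probabilities with shape (100, 1)

bad = y_flat * sigma             # broadcasts to (100, 100), not what we want
y_col = y_flat.reshape(100, 1)   # column vector, same reshape as in the cell above
good = y_col * sigma             # element-wise product with shape (100, 1)

print(bad.shape, good.shape)
```

No error is raised in the first case, so a loss computed from such a product would be wrong without any warning.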

We initialize weights $\mathbf{w}$ and the bias $b$ using the standard normal distribution.

In [4]:
w = np.random.randn(1, 10)
b = np.random.randn(1, 1)
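
With these shapes, the linear part $z$ comes out as one value per sample. A small shape check, assuming a random stand-in for the feature matrix and a seeded generator (both are our assumptions, chosen only to make the sketch self-contained):

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded generator, an assumption for repeatability
X = rng.standard_normal((100, 10))   # stand-in for the feature matrix
w = rng.standard_normal((1, 10))     # one weight per feature
b = rng.standard_normal((1, 1))      # a single bias

# (100, 10) @ (10, 1) gives (100, 1); the (1, 1) bias broadcasts over all rows
z = X @ w.T + b
print(z.shape)
```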

Very few hyperparameters are required for logistic regression. The learning rate and the number of epochs are sufficient in our case.

In [5]:
# hyperparameters
alpha = 0.1
epochs = 100

Below we implement logistic regression.

1. We start by calculating $\sigma(\mathbf{z})$, the probability vector. Its $i$-th element is the predicted probability that sample $i$ belongs to class 1.
2. We calculate the partial derivatives based on the cross-entropy loss $H = \dfrac{1}{n} \sum_{i=1}^n H^{(i)}$, where $H^{(i)} = -\Big[y^{(i)} \ln\big(\sigma^{(i)}\big) + \big(1 - y^{(i)}\big)\ln\big(1 - \sigma^{(i)}\big)\Big]$ is the loss of sample $i$ and $\sigma^{(i)} = \sigma(z^{(i)})$.

$\dfrac{\partial H^{(i)}}{\partial \sigma^{(i)}} = - \Big(y^{(i)} \dfrac{1}{\sigma^{(i)}} - (1 - y^{(i)}) \dfrac{1}{1 - \sigma^{(i)}} \Big)$

$\dfrac{\partial \sigma^{(i)}}{\partial z^{(i)}} = \sigma^{(i)}(1 - \sigma^{(i)})$

$\dfrac{\partial z^{(i)}}{\partial w_j} = x_j^{(i)}, \dfrac{\partial z^{(i)}}{\partial b} = 1$

3. And we apply the chain rule
$\dfrac{\partial H}{\partial w_j} = \dfrac{1}{n}\sum_{i=1}^n \dfrac{\partial H^{(i)}}{\partial \sigma^{(i)}} \dfrac{\partial \sigma^{(i)}}{\partial z^{(i)}} \dfrac{\partial z^{(i)}}{\partial w_j}, \quad \dfrac{\partial H}{\partial b} = \dfrac{1}{n}\sum_{i=1}^n \dfrac{\partial H^{(i)}}{\partial \sigma^{(i)}} \dfrac{\partial \sigma^{(i)}}{\partial z^{(i)}} \dfrac{\partial z^{(i)}}{\partial b}$

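4. Finally we apply batch gradient descent $\mathbf{w} := \mathbf{w} - \alpha \nabla_{\mathbf{w}} H$ and $b := b - \alpha \dfrac{\partial H}{\partial b}$, where $\nabla_{\mathbf{w}} H$ is the vector of partial derivatives $\dfrac{\partial H}{\partial w_j}$.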
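
Multiplying the first two factors of the chain rule cancels the denominators: $-\Big(\dfrac{y}{\sigma} - \dfrac{1 - y}{1 - \sigma}\Big)\sigma(1 - \sigma) = \sigma - y$, so the gradient with respect to $w_j$ is simply the mean of $(\sigma^{(i)} - y^{(i)})\,x_j^{(i)}$. The sketch below checks this numerically on random stand-in data; the seed, the data and the variable names ending in `_chain` and `_simple` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                      # seeded, illustrative data only
n, d = 100, 10
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=(n, 1)).astype(float)   # random 0/1 labels
w = rng.standard_normal((1, d))
b = rng.standard_normal((1, 1))

sigma = 1 / (1 + np.exp(-(X @ w.T + b)))

# chain rule, factor by factor
dH_dsigma = -(y / sigma - (1 - y) / (1 - sigma))
dsigma_dz = sigma * (1 - sigma)
grad_w_chain = (dH_dsigma * dsigma_dz * X).mean(axis=0)
grad_b_chain = (dH_dsigma * dsigma_dz).mean()

# simplified form: the first two factors multiply out to (sigma - y)
grad_w_simple = ((sigma - y) * X).mean(axis=0)
grad_b_simple = (sigma - y).mean()

print(np.allclose(grad_w_chain, grad_w_simple))
print(np.allclose(grad_b_chain, grad_b_simple))
```

Besides being shorter, the simplified form avoids dividing by $\sigma$ or $1 - \sigma$, which can become numerically zero when the model is very confident.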

In [6]:
for epoch in range(epochs):
    # 1. calculate predicted probabilities
    z = X @ w.T + b
    sigma = 1 / (1 + np.exp(-z))

    if epoch % 10 == 0:
        cross_entropy = -(y * np.log(sigma) + (1-y) * np.log(1 - sigma)).mean()
        print(f"Epoch: {epoch} | Cross Entropy: {cross_entropy:.4f}")
    
    # 2. calculate the partial derivatives 
    dH_dsigma = -(y * (1 / sigma) - (1 - y) * (1 / (1 - sigma)))
    dsigma_dz = sigma * (1 - sigma)
    dz_dw = X  # derivative of z with respect to the weights, not the inputs
    dz_db = 1
    
    # 3. apply the chain rule
    grad_w = (dH_dsigma * dsigma_dz * dz_dw).mean(axis=0)
    grad_b = (dH_dsigma * dsigma_dz * dz_db).mean()
    
    # 4. apply batch gradient descent
    w = w - alpha * grad_w
    b = b - alpha * grad_b

Epoch: 0 | Cross Entropy: 0.7766
Epoch: 10 | Cross Entropy: 0.6268
Epoch: 20 | Cross Entropy: 0.5355
Epoch: 30 | Cross Entropy: 0.4750
Epoch: 40 | Cross Entropy: 0.4319
Epoch: 50 | Cross Entropy: 0.3996
Epoch: 60 | Cross Entropy: 0.3743
Epoch: 70 | Cross Entropy: 0.3541
Epoch: 80 | Cross Entropy: 0.3375
Epoch: 90 | Cross Entropy: 0.3238


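The cross-entropy decreases steadily over the 100 epochs, from 0.7766 at epoch 0 to 0.3238 at epoch 90. The exact values vary between runs, because neither the dataset nor the initial weights are seeded.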
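
The loss alone does not say how many samples are classified correctly. One common option is to threshold the predicted probabilities at 0.5 and measure accuracy. The following self-contained sketch makes several assumptions that are not part of the section above: it generates its own toy data with NumPy instead of `make_classification`, labels it with hypothetical weights `true_w`, uses a fixed seed, and chooses 0.5 as the threshold.

```python
import numpy as np

rng = np.random.default_rng(0)             # seeded so the sketch is repeatable
n, d = 100, 10
X = rng.standard_normal((n, d))            # toy features, not make_classification
true_w = rng.standard_normal((1, d))       # hypothetical weights used only to create labels
y = (X @ true_w.T > 0).astype(float)       # 0/1 labels with shape (n, 1)

w = rng.standard_normal((1, d))
b = rng.standard_normal((1, 1))
alpha, epochs = 0.1, 100

for epoch in range(epochs):
    sigma = 1 / (1 + np.exp(-(X @ w.T + b)))
    # (sigma - y) is dH/dsigma * dsigma/dz multiplied out
    w = w - alpha * ((sigma - y) * X).mean(axis=0)
    b = b - alpha * (sigma - y).mean()

sigma = 1 / (1 + np.exp(-(X @ w.T + b)))
y_pred = (sigma >= 0.5).astype(float)      # class 1 if the probability is at least 0.5
accuracy = (y_pred == y).mean()
print(f"Training accuracy: {accuracy:.2f}")
```

This measures accuracy on the training data only; a held-out set would be needed to estimate how well the model generalizes.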