Logistic regression addresses binary classification. Given a feature vector $x$, we want to predict which of two classes a sample belongs to. We call the classes "positive" and "negative", i.e. $y=1$ or $y=0$. The loss is the sigmoid cross-entropy loss, which has the following form
$$L(\theta) = -\sum_{i} \big[y_i \log \hat{y}_i(\theta) + (1 - y_i) \log\big(1 - \hat{y}_i(\theta)\big)\big]$$
where in our case $\hat{y} = \sigma (x^T\theta)$, $\sigma(z) = 1 / (1 + e^{-z})$ is the sigmoid function, and $z = x^T\theta$ is called the logit.
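
The gradient of this loss has a compact form, which is worth deriving once because gradient descent relies on it. The following is a short sketch; the matrix $X$ (whose rows are the $x_i^T$) and the vector $\hat{y}$ of all predictions are notation introduced here for convenience. Using $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, we have $\partial \log \hat{y}_i / \partial \theta = (1 - \hat{y}_i)\, x_i$ and $\partial \log (1 - \hat{y}_i) / \partial \theta = -\hat{y}_i\, x_i$, so

$$\frac{\partial L}{\partial \theta} = -\sum_{i} \big[y_i (1 - \hat{y}_i) - (1 - y_i)\, \hat{y}_i\big]\, x_i = \sum_{i} (\hat{y}_i - y_i)\, x_i = X^T(\hat{y} - y)$$

If the loss is averaged over $m$ samples instead of summed, the gradient is simply divided by $m$.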

# Creating a Dataset

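To create a dataset, we'll use scikit-learn's make_classification function, which generates a random n-class classification problem. We'll use two classes (n_classes=2) and generate 100 samples (n_samples=100) with 10 features (n_features=10). We also prepend a column of ones to X so that the first parameter acts as the bias, which is why X ends up with 11 columns.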

In [3]:
from sklearn.datasets import make_classification
import numpy as np

X, y = make_classification(n_samples=100, n_features=10, n_classes=2, class_sep=0.3, random_state=42)
# Add intercept column to X (for bias)
X = np.hstack((np.ones((X.shape[0], 1)), X))
# Reshape y into a column vector of shape (n_samples, 1)
y = y[:, np.newaxis]
print(X.shape, y.shape)

(100, 11) (100, 1)


Next, we define a gradient descent function: at each iteration it computes the gradient of the loss and updates the parameters $\theta$ by a small step in the opposite direction.

In [1]:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_cross_entropy(y, y_pred):
    '''
    y: labels (0 or 1)
    y_pred: predicted probabilities, strictly inside (0, 1) so the logs stay finite
    '''
    assert y.shape == y_pred.shape, "label and prediction shapes should be equal"
    L = -(np.mean(y*np.log(y_pred) + (1-y)*np.log(1-y_pred)))
    return L

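def logistic_regression(X, y, learning_rate=0.01, num_iterations=1000):
    '''
    Fit logistic regression with batch gradient descent.
    X: features of shape (m, n), including the intercept column
    y: labels of shape (m, 1)
    '''
    m, n = X.shape
    theta = np.zeros((n, 1))  # start from all-zero parameters
    for i in range(num_iterations):
        z = np.dot(X, theta)  # logits
        y_pred = sigmoid(z)  # predicted probabilities
        # Gradient of the mean cross-entropy loss with respect to theta
        gradient = np.dot(X.T, (y_pred - y)) / m
        theta = theta - learning_rate * gradient
        # Loss of the predictions made before this step's update
        loss = sigmoid_cross_entropy(y, y_pred)
        print(f"Step {i}, Loss: {loss}")
    return theta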
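
A common sanity check for a hand-written gradient is to compare it against a finite-difference approximation of the loss. The sketch below is one possible way to do this, not part of the original notebook: the helper names (mean_loss, analytic_gradient, numeric_gradient) and the small random dataset are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def mean_loss(theta, X, y):
    # Mean sigmoid cross-entropy loss, same formula as sigmoid_cross_entropy above
    y_pred = sigmoid(X @ theta)
    return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))

def analytic_gradient(theta, X, y):
    # Same expression used inside logistic_regression: X^T (y_pred - y) / m
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

def numeric_gradient(theta, X, y, eps=1e-6):
    # Central finite differences, one parameter at a time
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = eps
        grad[j, 0] = (mean_loss(theta + step, X, y)
                      - mean_loss(theta - step, X, y)) / (2 * eps)
    return grad

# Hypothetical toy data, used only for this check
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = rng.integers(0, 2, size=(50, 1)).astype(float)
theta_demo = rng.normal(scale=0.1, size=(4, 1))

max_diff = np.max(np.abs(analytic_gradient(theta_demo, X_demo, y_demo)
                         - numeric_gradient(theta_demo, X_demo, y_demo)))
print(f"max abs difference: {max_diff:.2e}")
```

If the two gradients agree to within a small tolerance, the update rule is very likely implemented correctly; a large difference usually points to a sign error or a missing division by m.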


Now let's train the model on a training split and evaluate its accuracy on a held-out test split.

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

theta = logistic_regression(X_train, y_train, num_iterations = 1000)
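# Threshold the probabilities at 0.5 and convert booleans to 0/1 labels
y_pred = (sigmoid(np.dot(X_test, theta)) > 0.5).astype(int)
# Flatten the column vectors so accuracy_score receives 1-D label arrays
accuracy = accuracy_score(y_test.ravel(), y_pred.ravel())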
print(f"Accuracy: {accuracy}")

Step 0, Loss: 0.6931471805599453
Step 1, Loss: 0.692322107826435
Step 2, Loss: 0.6915025000368387
Step 3, Loss: 0.6906883095760641
Step 4, Loss: 0.6898794891918815
Step 5, Loss: 0.6890759919945875
Step 6, Loss: 0.6882777714565933
Step 7, Loss: 0.6874847814119394
Step 8, Loss: 0.6866969760557391
Step 9, Loss: 0.6859143099435524
Step 10, Loss: 0.6851367379906931
Step 11, Loss: 0.684364215471468
Step 12, Loss: 0.6835966980183562
Step 13, Loss: 0.6828341416211237
Step 14, Loss: 0.6820765026258794
Step 15, Loss: 0.6813237377340735
Step 16, Loss: 0.6805758040014388
Step 17, Loss: 0.6798326588368785
Step 18, Loss: 0.6790942600013012
Step 19, Loss: 0.6783605656064056
Step 20, Loss: 0.6776315341134146
Step 21, Loss: 0.6769071243317641
Step 22, Loss: 0.6761872954177452
Step 23, Loss: 0.6754720068731019
Step 24, Loss: 0.6747612185435872
Step 25, Loss: 0.6740548906174775
Step 26, Loss: 0.6733529836240489
Step 27, Loss: 0.6726554584320148
Step 28, Loss: 0.6719622762479276
Step 29, Loss: 0.671273398

As homework, try to solve the following problem with logistic regression: given a dimension $n$, detect whether an $n$-dimensional vector lies inside the $n$-dimensional sphere of radius 1.
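
As a possible starting point, here is a sketch of how a dataset for this exercise could be generated. Everything in it is an assumption rather than a prescribed setup: the function name make_sphere_dataset, sampling points uniformly from the cube $[-1, 1]^n$, and the choice to use squared coordinates as features. The squared features are a hint: the condition $\sum_j x_j^2 \le 1$ is not linear in $x$, but it is linear in the $x_j^2$, which is what a logistic regression model needs.

```python
import numpy as np

def make_sphere_dataset(n_dims, n_samples=1000, seed=0):
    """Hypothetical helper: label points by whether they lie inside the unit sphere."""
    rng = np.random.default_rng(seed)
    # Assumption: sample uniformly from the cube [-1, 1]^n
    points = rng.uniform(-1.0, 1.0, size=(n_samples, n_dims))
    # Label 1 if the point is inside (or on) the unit sphere, else 0
    labels = (np.sum(points**2, axis=1) <= 1.0).astype(int)[:, np.newaxis]
    # Intercept column followed by squared coordinates
    features = np.hstack((np.ones((n_samples, 1)), points**2))
    return points, features, labels

points, features, labels = make_sphere_dataset(n_dims=2)
print(features.shape, labels.shape)
print(f"fraction inside the sphere: {labels.mean():.3f}")
```

The features and labels have the same layout as X and y above, so they can be passed to logistic_regression directly. Note that with this sampling scheme the fraction of positive samples shrinks quickly as n grows, so for larger n a different sampling strategy may be needed to keep the classes balanced.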