# Logistic Regression

#### When

If the target variable is binary, it is natural to model the probabilities of the two outcomes.

When modelling probabilities, OLS is not appropriate, because its fitted values are not constrained to lie between 0 and 1.

Models such as logistic regression or probit are usually a better choice, because they pass the linear predictor through a function whose output always lies between 0 and 1.
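To illustrate the problem with OLS, here is a minimal sketch on a made-up toy dataset (the numbers are an assumption for illustration, not part of the breast cancer data used later). It fits a straight line to a binary target and evaluates it at a few points.

```python
import numpy as np

# Toy data (made up for illustration): one feature, binary target.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# OLS fit of y on a constant and x, solved by least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted "probabilities" at the edges of the sample and slightly beyond.
x_new = np.array([-2.0, 0.0, 3.5, 7.0, 9.0])
p_ols = beta[0] + beta[1] * x_new

print(p_ols)
print("below 0:", np.sum(p_ols < 0), "above 1:", np.sum(p_ols > 1))
```

Some of the fitted values fall below 0 and some above 1, so they cannot be read as probabilities.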

#### The Model

$P[Y=1|x]=F(x^T\beta)$, where for probit $F(z)=\Phi(z)$ (the standard normal CDF) and for logistic regression $F(z)=\frac{e^z}{1+e^z}=\frac{1}{e^{-z}+1}$, with $z=h(x)=x^T\beta$.
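As a small sketch of the two link choices, the code below evaluates both $F$ functions at the same linear predictor. The helper names `logistic_cdf` and `probit_cdf`, and the values of `x` and `beta`, are hypothetical and chosen only for illustration.

```python
import math
import numpy as np

def logistic_cdf(z):
    """Logistic F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def probit_cdf(z):
    """Standard normal CDF, written with the error function."""
    z = np.atleast_1d(z)
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])

# Hypothetical feature vector (first entry is the constant) and coefficients.
x = np.array([1.0, 2.0, -1.0])
beta = np.array([0.5, -0.25, 1.0])

z = x @ beta                   # linear predictor x^T beta
p_logit = logistic_cdf(z)      # P[Y=1|x] under logistic regression
p_probit = probit_cdf(z)[0]    # P[Y=1|x] under probit

print(z, p_logit, p_probit)
```

Both functions map any real $z$ into $(0,1)$; they differ mainly in how quickly the tails approach 0 and 1.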

#### Assumptions

Logistic regression does not require distributional assumptions about the features; it only requires that the log-odds function is linear in the features.

Log-odds: $\log(R)=\log\left(\frac{p_1}{p_0}\right)=\log\left(\frac{\frac{e^z}{1+e^z}}{\frac{1}{1+e^z}}\right)=z=x^T\beta$, where $p_1=P[Y=1|x]$ and $p_0=1-p_1$.
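The identity above can be checked numerically. This is only a sanity check on an arbitrary grid of $z$ values (the grid is an assumption made for illustration).

```python
import numpy as np

# Arbitrary grid of linear-predictor values.
z = np.linspace(-5.0, 5.0, 11)

p1 = np.exp(z) / (1.0 + np.exp(z))   # P[Y=1|x] under the logistic model
p0 = 1.0 - p1                        # P[Y=0|x]

log_odds = np.log(p1 / p0)

# The log-odds should recover z up to floating point error.
print(np.max(np.abs(log_odds - z)))
```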

#### From Likelihood to Loss Function

From the model it is easy to see that the likelihood function for logistic regression is given by: $L(\beta)=\prod_{i=1}^n(1-F(\beta^Tx_i))^{(1-y_i)}\,F(\beta^Tx_i)^{y_i}$.

Taking logs, the log-likelihood is given by: $l(\beta)=\sum_{i=1}^n\left[(1-y_i)\log(1-F(\beta^Tx_i))+ y_i \log(F(\beta^Tx_i))\right]$.

A natural approach is to maximize the (log) likelihood. Equivalently, one can minimize a cost function equal to the negative log-likelihood, here averaged over the $n$ observations:

$J(\beta)=-\frac{1}{n}\sum_{i=1}^n\left[(1-y_i)\log(1-F(\beta^Tx_i))+ y_i \log(F(\beta^Tx_i))\right]$
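A minimal sketch of this cost function in NumPy follows. The function names `sigmoid` and `cost` and the tiny dataset are assumptions made for illustration; the first column of `X` is a constant, so the intercept is part of `beta`.

```python
import numpy as np

def sigmoid(z):
    """Logistic function F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(beta, X, y):
    """Average negative log-likelihood J(beta)."""
    p = sigmoid(X @ beta)
    return -np.mean((1 - y) * np.log(1 - p) + y * np.log(p))

# Tiny made-up dataset: a constant column plus one feature.
X = np.array([[1.0, 0.5],
              [1.0, -1.0],
              [1.0, 2.0],
              [1.0, 0.0]])
y = np.array([1, 0, 1, 0])

# At beta = 0 every predicted probability is 0.5, so J equals log(2).
print(cost(np.zeros(2), X, y), np.log(2))
```

A value of $\log 2$ is what a model that always predicts 0.5 achieves, which makes it a convenient reference point when watching the loss during training.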

#### Gradient Descent

Here the loss function is minimized by gradient descent.

The update rule: $\hat{\beta}_j^{new} = \hat{\beta}_j^{old} - \alpha \frac{\partial}{\partial \beta_j}J(\beta)\Big|_{\beta = \hat{\beta}^{old}}=\hat{\beta}_j^{old} - \alpha \frac{1}{n}\sum_{i=1}^n\left(\frac{1}{e^{-x_i^T\beta}+1}-y_i\right)x_{i,j}\Big|_{\beta = \hat{\beta}^{old}}$, where $\alpha$ is the learning rate.

In the notation of the code, with weights $w_j$ and intercept $b$, this is: $w_j = w_j - \alpha \cdot dw_j= w_j - \alpha \cdot \frac{1}{n} \sum_{i=1}^n x_{i,j} (\hat{y}_i-y_i)$ and $b= b-\alpha \cdot db= b - \alpha \cdot \frac{1}{n} \sum_{i=1}^n (\hat{y}_i-y_i)$, where $\hat{y}_i=F(w^Tx_i+b)$ is the predicted probability and $b$ is sometimes called the bias, i.e. the estimated coefficient of the constant.
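One way to gain confidence in the gradient of $J$ is to compare the analytic expression $\frac{1}{n}\sum_i(F(x_i^T\beta)-y_i)x_{i,j}$ with a finite-difference approximation. The sketch below does this on randomly generated data; the function names, the random data, and the step size `eps` are assumptions made for illustration. The constant is included as a column of `X`, so the bias is simply the first entry of `beta`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(beta, X, y):
    """Average negative log-likelihood J(beta)."""
    p = sigmoid(X @ beta)
    return -np.mean((1 - y) * np.log(1 - p) + y * np.log(p))

def gradient(beta, X, y):
    """Analytic gradient: (1/n) * X^T (F(X beta) - y)."""
    return X.T @ (sigmoid(X @ beta) - y) / len(y)

# Random data for the check (not related to the dataset used below).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = (rng.random(50) < 0.5).astype(float)
beta = rng.normal(size=3)

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
numeric = np.array([
    (cost(beta + eps * e, X, y) - cost(beta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
analytic = gradient(beta, X, y)

max_diff = np.max(np.abs(numeric - analytic))
print(max_diff)
```

The two gradients should agree up to a small numerical error, which supports the update rule without any extra constant factor.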

#### Import Libraries

In [1]:
import numpy as np
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

#### Make Function

In [2]:
def Accuracy(y_test, y_pred):
    """Return the fraction of predictions that match the true labels."""
    acc = np.sum(y_test == y_pred)/len(y_test)
    return acc

#### Make Class

In [3]:
class LogisticRegression:
    """Logistic regression fitted by batch gradient descent."""

    def __init__(self, lr = 0.001, max_iters = 1000):
        self.lr = lr                # learning rate (alpha)
        self.max_iters = max_iters  # number of gradient descent steps
        self.weights = None
        self.b = None

    def fit(self, X, y):
        n, p = X.shape
        # Start from zero weights and zero bias.
        self.weights = np.zeros(p)
        self.b = 0

        for _ in range(self.max_iters):
            # Predicted probabilities F(Xw + b) under the current parameters.
            linear_model = np.dot(X, self.weights) + self.b
            y_pred = 1/(1 + np.exp(-linear_model))

            # Gradient of the average negative log-likelihood.
            dw = 1/n * np.dot(X.T, (y_pred - y))
            db = 1/n * np.sum(y_pred - y)

            # Gradient descent step.
            self.weights -= self.lr * dw
            self.b -= self.lr * db

    def predict(self, X):
        linear_model = np.dot(X, self.weights) + self.b
        y_pred = 1/(1 + np.exp(-linear_model))
        # Classify as 1 when the predicted probability is at least 0.5.
        return np.where(y_pred >= 0.5, 1, 0)
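The expression `1/(1 + np.exp(-linear_model))` can trigger overflow warnings when the linear predictor is a large negative number, which can happen with unscaled features. Below is a sketch of one possible alternative, not part of the class above: a hypothetical helper `stable_sigmoid` that evaluates the same function but only ever exponentiates non-positive numbers.

```python
import numpy as np

def stable_sigmoid(z):
    """Logistic function evaluated without overflowing np.exp."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)

    pos = z >= 0
    # For z >= 0 use 1 / (1 + exp(-z)); the exponent is non-positive.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0 use exp(z) / (1 + exp(z)); again the exponent is non-positive.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

# Extreme inputs that would overflow np.exp(-z) in the naive formula.
probs = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(probs)
```

Both branches are algebraically the same function as in the model section; only the order of operations differs.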

#### Load Data

In [4]:
br = load_breast_cancer()
X, y = br.data, br.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

#### Train and Predict

In [5]:
clf = LogisticRegression(lr=0.0001, max_iters=10000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(f"The accuracy is {Accuracy(y_test, y_pred)}")

The accuracy is 0.8596491228070176
