# Logistic Regression

## Summary

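A logistic regression model estimates $P(Y|X)$ for a binary response $Y$. It models the log-odds (the logit) of $P(Y=1|X)$ as a linear function of $X$; inverting the logit gives the logistic (sigmoid) function, which keeps $P(Y=1|X)$ between 0 and 1. The coefficients of the linear function are fit by maximum likelihood estimation (MLE).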


## Detailed summary

$$\log(\frac{p(X)}{1 - p(X)}) = \beta_0 + \beta_1 X$$

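The odds $\frac{p(X)}{1 - p(X)}$, and therefore the log-odds, increase monotonically with $p(X)$.
Because the log-odds equal $\beta_0 + \beta_1 X$, $p(X)$ increases whenever $\beta_0 + \beta_1 X$ increases.
Solving the equation for $p(X)$ shows that it always lies strictly between 0 and 1.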
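
The relationship between the logit and the probability can be checked numerically. The snippet below is a minimal sketch; the function names `sigmoid` and `logit` and the coefficient values are assumptions chosen for illustration, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    """Inverse of the logit: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return np.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.0, 2.0
x = np.linspace(-5.0, 5.0, 11)

# p(x) obtained by inverting the logit of the linear function.
p = sigmoid(beta0 + beta1 * x)
```

Applying `logit` to `p` should recover `beta0 + beta1 * x`, and `p` should stay inside (0, 1) and increase with `x` because `beta1` is positive here.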

We train this via MLE. Conditional on $X = x$, the response $Y$ is distributed as $$Y = \begin{cases} 0 & \text{with probability } 1 - p(x)\\ 1 & \text{with probability } p(x) \end{cases}$$

where $P(Y=1 | X=x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} = p(x)$.

If $Y_i \sim \text{Bernoulli}(p(x_i))$, then $P(Y_i = k) = p(x_i)^k [1 - p(x_i)]^{1 - k}$ for $k \in \{0, 1\}$.

Assuming the samples are independent, the likelihood is the product over all samples of $P(Y_i = y_i)$:

$$L(\beta) = \prod_{i=1}^n P(Y_i = y_i) = \prod_{i=1}^n p(x_i)^{y_i} [1 - p(x_i)]^{1 - y_i} = \prod_{i: y_i = 1} p(x_i) \prod_{i: y_i = 0} [1 - p(x_i)]$$
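
As a rough illustration of the likelihood, the sketch below evaluates it on a small made-up dataset, once as the direct product and once as a log-likelihood. The function name `log_likelihood`, the data, and the coefficient values are all assumptions for the example.

```python
import numpy as np

def log_likelihood(beta0, beta1, x, y):
    """Bernoulli log-likelihood of 0/1 labels y given inputs x."""
    z = beta0 + beta1 * x
    # log p(x) = z - log(1 + e^z) and log(1 - p(x)) = -log(1 + e^z),
    # so each sample contributes y * z - log(1 + e^z).
    return np.sum(y * z - np.logaddexp(0.0, z))

# Toy data, made up for illustration.
x = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([0, 0, 1, 0, 1])

# Direct product form of the likelihood at hypothetical coefficients.
p = 1.0 / (1.0 + np.exp(-(0.1 + 0.8 * x)))
likelihood = np.prod(p**y * (1 - p) ** (1 - y))
```

Taking `np.log(likelihood)` should agree with `log_likelihood(0.1, 0.8, x, y)`. The log form is usually preferred in practice because a product of many probabilities underflows quickly.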

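From here, we proceed as usual: take the log-likelihood and set its derivatives with respect to $\beta_0$ and $\beta_1$ to zero. Unlike linear regression, the resulting equations have no closed-form solution, so they are solved numerically, for example by gradient ascent or Newton's method.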
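
Written out for the single-predictor model, the log-likelihood and its first-order conditions are as follows. The symbol $\ell$ is notation introduced here for the log-likelihood; it does not appear elsewhere in these notes.

$$\ell(\beta) = \log L(\beta) = \sum_{i=1}^n \left[ y_i (\beta_0 + \beta_1 x_i) - \log\left(1 + e^{\beta_0 + \beta_1 x_i}\right) \right]$$

$$\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^n \left[ y_i - p(x_i) \right] = 0, \qquad \frac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^n x_i \left[ y_i - p(x_i) \right] = 0$$

Both conditions are nonlinear in $\beta$ because $p(x_i)$ depends on $\beta$ through the exponential.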

Maximum likelihood estimation is therefore an optimization problem, and its solution is the estimate of $\vec{\beta}$.
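
One way to carry out this optimization is plain gradient ascent on the log-likelihood. The sketch below is only one possible approach, under assumptions: a single predictor, a fixed step size, synthetic data generated from known coefficients, and a hypothetical function name `fit_logistic`.

```python
import numpy as np

def fit_logistic(x, y, learning_rate=0.1, num_iters=5000):
    """Estimate (beta0, beta1) by gradient ascent on the mean log-likelihood."""
    beta0, beta1 = 0.0, 0.0
    for _ in range(num_iters):
        p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
        # Gradient components: mean of (y_i - p_i) and of x_i * (y_i - p_i).
        beta0 += learning_rate * np.mean(y - p)
        beta1 += learning_rate * np.mean(x * (y - p))
    return beta0, beta1

# Synthetic data drawn from assumed "true" coefficients.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
true_beta0, true_beta1 = -0.5, 1.5
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(true_beta0 + true_beta1 * x))))

beta0_hat, beta1_hat = fit_logistic(x, y)
```

At the returned estimates the gradient should be close to zero, and the estimates should land near the coefficients used to generate the data, up to sampling noise.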

The coefficient $\beta_1$ describes the effect of the predictor $X$ on the log-odds of the class $Y = 1$: a one-unit increase in $X$ changes the log-odds by $\beta_1$.
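
Equivalently, a one-unit increase in $X$ multiplies the odds by $e^{\beta_1}$. A small sketch of this, with hypothetical coefficient values and a helper name `odds` that is not from the original:

```python
import numpy as np

def odds(beta0, beta1, x):
    """Odds p(x) / (1 - p(x)) implied by the model, which equal exp(beta0 + beta1 * x)."""
    return np.exp(beta0 + beta1 * x)

beta0, beta1 = -1.0, 0.7  # hypothetical values for illustration

# Ratio of the odds at x = 3 to the odds at x = 2 (a one-unit increase).
ratio = odds(beta0, beta1, 3.0) / odds(beta0, beta1, 2.0)
```

The ratio does not depend on the starting value of `x` or on `beta0`; it equals `np.exp(beta1)`.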



## Multiple classes



$p_k(X) = \Pr(Y=k|X) = \frac{e^{\beta_k^T X}}{\sum_{j=1}^K e^{\beta_j^T X}}$

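Each class $k$ has an unnormalized score $e^{\beta_k^T x}$, and $p_k(x)$ is that score divided by the sum of the scores over all classes.

The probabilities do not change when the same vector is subtracted from every $\beta_k$, so class $K$ can serve as the reference by setting $\beta_k^\star = \beta_k - \beta_K$ (which makes $\beta_K^\star = 0$). The log-odds of class $k$ against class $K$ is then $\log \frac{p_k(x)}{p_K(x)} = {\beta_k^\star}^T x$; for example, with $K = 3$, the log-odds of class 2 against class 3 is $(\beta_2 - \beta_3)^T x$.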
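
The reparameterization $\beta_k^\star = \beta_k - \beta_K$ can be checked numerically. The sketch below assumes the coefficients are stored as the rows of a matrix `B` (one row per class); that layout, the function name `class_probabilities`, and the random values are assumptions for illustration.

```python
import numpy as np

def class_probabilities(B, x):
    """Softmax probabilities for a coefficient matrix B (K rows) and one input x."""
    scores = B @ x
    scores = scores - scores.max()  # stabilizes exp without changing the result
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 4))  # K = 3 classes, 4 features (made-up values)
x = rng.normal(size=4)

B_star = B - B[-1]           # beta_k - beta_K for every k; the last row becomes zero
p = class_probabilities(B, x)
p_star = class_probabilities(B_star, x)
log_odds_vs_K = B_star @ x   # should equal log(p_k / p_K)
```

Both parameterizations should give the same probabilities, and `log_odds_vs_K` should match `np.log(p / p[-1])`.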




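```python
import numpy as np


def softmax(X, axis=0):
    """Normalize scores into probabilities along `axis` (default 0)."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    shifted = X - np.max(X, axis=axis, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=axis, keepdims=True)


class LogisticRegression:
    def __init__(self):
        pass

    def fit(self, X, y, learning_rate=0.01, num_iters=100):
        # Stub: estimate the coefficients from X and y.
        pass

    def forward(self, X):
        # Stub: compute the model output for X.
        pass


def logistic_regression():
    pass
```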
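
The stubs above could be filled in along the following lines. This is a minimal sketch under assumptions, not the intended implementation: it handles only binary labels, uses batch gradient ascent on the mean log-likelihood, and introduces a separate class name `BinaryLogisticRegression`, the attributes `weights` and `bias`, and a `predict` method, none of which appear in the original.

```python
import numpy as np

class BinaryLogisticRegression:
    """Sketch of a binary classifier mirroring the stubbed fit/forward interface."""

    def __init__(self):
        self.weights = None  # assumed attribute name
        self.bias = 0.0      # assumed attribute name

    def forward(self, X):
        """Return P(Y = 1 | X) for each row of X."""
        z = X @ self.weights + self.bias
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y, learning_rate=0.01, num_iters=100):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        for _ in range(num_iters):
            # y - p is the per-sample gradient of the log-likelihood w.r.t. the linear score.
            error = y - self.forward(X)
            self.weights += learning_rate * (X.T @ error) / n_samples
            self.bias += learning_rate * error.mean()
        return self

    def predict(self, X, threshold=0.5):
        """Hypothetical helper: label 1 where the probability reaches the threshold."""
        return (self.forward(X) >= threshold).astype(int)

# Linearly separable toy data, made up for illustration.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = BinaryLogisticRegression().fit(X, y, learning_rate=0.5, num_iters=500)
```

On this toy data `model.predict(X)` should reproduce `y`. For real data, a convergence check and regularization would probably be worth adding, since perfectly separable data drives the weights toward infinity.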