# Logistic Regression (LR)

## Part I. Theory Overview

### Logistic Distribution

If $X$ is a continuous random variable that follows a logistic distribution with location $\mu$ and scale $\gamma > 0$, its CDF and PDF are:

$F(x) = P(X\leq x) = \frac{1}{1+\exp(\frac{-(x-\mu)}{\gamma})}$

$f(x) = F^{'}(x) = \frac{\exp(\frac{-(x-\mu)}{\gamma})}{\gamma(1+\exp(\frac{-(x-\mu)}{\gamma}))^{2}}$

where $F(x)$ is a shifted and scaled sigmoid function (the standard sigmoid used in the binomial case when $\mu=0$ and $\gamma=1$). The graph of $F(x)$ is point-symmetric about $(\mu, \frac{1}{2})$, and $f(x)$ is symmetric about $x=\mu$.
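
As a quick numerical check of the two formulas, the sketch below evaluates the CDF and PDF with the standard library only. The function names and the values of `mu`, `gamma` and `d` are illustrative assumptions, not part of the original notebook.

```python
import math

def logistic_cdf(x, mu=0.0, gamma=1.0):
    """CDF of the logistic distribution with location mu and scale gamma."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / gamma))

def logistic_pdf(x, mu=0.0, gamma=1.0):
    """PDF of the logistic distribution (the derivative of the CDF)."""
    z = math.exp(-(x - mu) / gamma)
    return z / (gamma * (1.0 + z) ** 2)

mu, gamma, d = 2.0, 1.5, 0.7  # arbitrary illustrative values

# The CDF equals 1/2 at x = mu
print(logistic_cdf(mu, mu, gamma))
# F(mu + d) + F(mu - d) = 1, i.e. the CDF is point-symmetric about (mu, 1/2)
print(logistic_cdf(mu + d, mu, gamma) + logistic_cdf(mu - d, mu, gamma))
# The density takes the same value at mu + d and mu - d
print(logistic_pdf(mu + d, mu, gamma), logistic_pdf(mu - d, mu, gamma))
```

With `mu=0` and `gamma=1`, `logistic_cdf` reduces to the usual sigmoid $\frac{1}{1+\exp(-x)}$.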

### Binomial Logistic Regression

#### Conditional Probability

Given input $X$, logistic regression predicts class $Y$ based on the conditional probability $P(Y|X)$. In the binomial case:

$P(Y=1|X) = \frac{\exp(wx+b)}{1+\exp(wx+b)}$

$P(Y=0|X) = \frac{1}{1+\exp(wx+b)}$

#### Odds

Odds describe the ratio between the probability that an event occurs and the probability that it does not. In binomial logistic regression, the log odds (logit) of $P = P(Y=1|X)$ is:

$\hspace{5mm} log(\frac{P}{1-P})$

$=log(\frac{P(Y=1|X)}{1-P(Y=1|X)})$

$=log(\exp(wx+b))$

$=wx+b$

Therefore, 

$P(Y=1|X) \rightarrow 1$ as $wx+b \rightarrow \infty$ 

$P(Y=1|X) \rightarrow 0$ as $wx+b \rightarrow -\infty$
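
The relationship between the probability and the linear score can be checked numerically. This is a minimal sketch for a scalar feature; `predict_proba`, `logit` and the values of `w` and `b` are hypothetical and chosen only for illustration.

```python
import math

def predict_proba(x, w, b):
    """P(Y=1 | x) for a scalar feature x, weight w and bias b."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

w, b = 0.8, -0.5  # illustrative values, not fitted to any data

for x in (-10.0, 0.0, 2.0, 10.0):
    p = predict_proba(x, w, b)
    # logit(p) recovers the linear score w*x + b (up to rounding error)
    print(x, round(p, 4), round(logit(p), 4), w * x + b)
```

Note that $\frac{1}{1+\exp(-(wx+b))}$ is the same quantity as $\frac{\exp(wx+b)}{1+\exp(wx+b)}$, obtained by dividing numerator and denominator by $\exp(wx+b)$.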

#### Model Parameter Estimate

Let: 

$P(Y=1|X)=\pi(x)$ 

$P(Y=0|X)=1-\pi(x)$

The likelihood function thus becomes:

$l(X,Y;\pi) = \prod_{i=1}^{N}\pi(x_{i})^{y_{i}}(1-\pi(x_{i}))^{1-y_{i}}$

The log likelihood function is:

$log(l(X,Y;\pi)) = \sum_{i=1}^{N}\left[y_{i}(wx_{i})-log(1+\exp(wx_{i}))\right]\hspace{1mm}$ (formula simplification is omitted)

where $\pi(x)$ is assumed to be $\frac{\exp(wx+b)}{1+\exp(wx+b)}$ for the logistic regression model, and the bias $b$ is absorbed into $w$ by appending a constant 1 to each $x_{i}$.
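
For readers who want the intermediate steps, the simplification can be sketched as follows. It uses only the definitions above: the log odds $log(\frac{\pi(x)}{1-\pi(x)}) = wx+b$ and $1-\pi(x) = \frac{1}{1+\exp(wx+b)}$.

$log(l(X,Y;\pi)) = \sum_{i=1}^{N}\left[y_{i}log(\pi(x_{i})) + (1-y_{i})log(1-\pi(x_{i}))\right]$

$= \sum_{i=1}^{N}\left[y_{i}log(\frac{\pi(x_{i})}{1-\pi(x_{i})}) + log(1-\pi(x_{i}))\right]$

$= \sum_{i=1}^{N}\left[y_{i}(wx_{i}+b) - log(1+\exp(wx_{i}+b))\right]$

Writing $wx_{i}$ for $wx_{i}+b$ (the bias treated as one more weight on a constant feature) gives the compact form of the log likelihood.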

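The goal now is to find the $w^{*}$ that maximizes the log likelihood $L = log(l(X,Y;\pi))$ by gradient ascent (ascent rather than descent, since we are MAXIMIZING the likelihood). The gradient of $L$ w.r.t. $w$ is:

$\frac{dL}{dw} = \sum_{i=1}^{N}(y_{i}-\pi(x_{i}))x_{i}$

Updating on one sample $(x_{i}, y_{i})$ at a time (stochastic gradient ascent), we update $w$ using:

$w_{t+1} = w_{t} + \alpha(y_{i}-\pi(x_{i}))x_{i}$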

where $\alpha$ is the learning rate.
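
The update rule can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in Part II: `fit_binary_lr`, the toy data, and the default `alpha`, `epochs` and `seed` are all hypothetical, and the bias is handled by appending a constant 1 to each sample.

```python
import math
import random

def sigmoid(z):
    # Logistic function written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_binary_lr(xs, ys, alpha=0.1, epochs=200, seed=0):
    """Stochastic gradient ascent on the log likelihood.

    xs: list of feature lists, ys: list of 0/1 labels.
    Returns the weight list; the last entry is the bias b.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]  # append 1 for the bias
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a random order each epoch
        for x, y in data:
            pi = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            # w <- w + alpha * (y - pi(x)) * x
            w = [wj + alpha * (y - pi) * xj for wj, xj in zip(w, x)]
    return w

# Toy 1-D data: label 1 for positive x, label 0 for negative x
xs = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 0, 1, 1, 1]
w = fit_binary_lr(xs, ys)
preds = [int(sigmoid(w[0] * x[0] + w[1]) >= 0.5) for x in xs]
print(w, preds)
```

Because the toy data is linearly separable, the weights keep growing with more epochs; on real data a fixed iteration budget or regularization is commonly used to keep them bounded.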

### Multinomial Logistic Regression

We can easily extend the binomial case to K-class multinomial logistic regression. Each class $k$ gets its own weight vector $w_{k}$ and bias $b_{k}$, and the conditional probability function becomes:

$P(Y=k|X) = \frac{\exp(w_{k}x+b_{k})}{\sum_{j=1}^{K}\exp(w_{j}x+b_{j})}, k=1,2,...,K$

which is also known as the softmax function. The same optimization strategy used in binomial LR still applies to the multinomial one. Note that here the model implementation becomes more convenient if we use one-hot encoding to represent the label.
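
To make the softmax and the one-hot label concrete, here is a small standard-library sketch. The weight matrix `W`, the sample `x` and the helper names are hypothetical; the bias of each class is stored as the last column of `W`, matching a sample that carries a trailing constant 1.

```python
import math

def softmax(scores):
    """Map K real-valued scores to K probabilities that sum to 1."""
    m = max(scores)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(label, num_classes):
    """Encode an integer label in {0, ..., K-1} as a length-K indicator vector."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

# Hypothetical weights for K = 3 classes and 2 features (bias in the last column)
W = [[0.5, -1.0, 0.1],
     [0.2, 0.3, 0.0],
     [-0.7, 0.8, -0.1]]
x = [1.0, 2.0, 1.0]  # two features plus a constant 1 for the bias

scores = [sum(wkj * xj for wkj, xj in zip(wk, x)) for wk in W]
probs = softmax(scores)
y = one_hot(1, 3)

# Per-sample gradient of the log likelihood w.r.t. row k of W: (y_k - p_k) * x
grad = [[(yk - pk) * xj for xj in x] for yk, pk in zip(y, probs)]
print(probs)
print(grad)
```

With the one-hot label, the gradient has the same $(y - \pi(x))x$ form as in the binomial case, applied once per class, which is why the encoding keeps the implementation compact.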

## Part II. Logistic Regression as Multiclass Classifier

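In this example, we will train a logistic regression model on the Iris dataset. The model performance will be evaluated on ten random 80/20 train/test splits (repeated hold-out validation) and compared with scikit-learn's `LogisticRegression` under the same protocol.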

In [1]:
import numpy as np

In [2]:
class MultinomialLR(object):
    def __init__(self, max_iter=100, learning_rate=0.01):
        self.max_iter = max_iter
        self.learning_rate = learning_rate

    def softmax(self, W, X):
        # Softmax over the class scores of one sample X; returns K probabilities
        scores = np.dot(W, X)
        exp_scores = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
        return exp_scores / np.sum(exp_scores)
    
    def train(self, X, Y):
        # Y must hold integer class labels 0, 1, ..., K-1
        class_num = len(np.unique(Y))
        X = np.append(X, np.ones((len(X), 1)), 1)  # constant column for the bias
        Y_one_hot = np.zeros((len(Y), class_num))
        Y_one_hot[np.arange(len(Y)), Y.astype('int32')] = 1  # column k is 1 for class k
        self.W = np.zeros((class_num, X.shape[1]))
        for _ in range(self.max_iter):
            for i in range(len(X)):
                # Stochastic gradient ascent step on the log likelihood of sample i
                dW = (Y_one_hot[i] - self.softmax(self.W, X[i])).reshape((-1, 1)) * X[i]
                self.W += self.learning_rate * dW
                    
    def predict(self, X):
        if X.ndim <= 1:
            X = X.reshape((1, -1))
        X = np.append(X, np.ones((len(X),1)), 1)
        self.result_list = []
        for i in range(len(X)):
            result = np.dot(self.W, X[i])
            Y_predict = np.argmax(result)
            self.result_list.append(Y_predict)
        return self.result_list
    
    def score(self, Y):
        # Accuracy of the most recent predict() call against the true labels Y
        Y = np.asarray(Y).ravel()
        return np.mean(np.asarray(self.result_list) == Y)
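
The weight update in `train` relies on NumPy broadcasting: a `(K, 1)` column of per-class errors multiplied by a `(D,)` sample yields a `(K, D)` gradient. The sketch below isolates that step; the sizes `K` and `D`, the random sample and the label are assumptions made for illustration.

```python
import numpy as np

K, D = 3, 5                      # classes and features (including the bias column)
rng = np.random.default_rng(0)
x = rng.normal(size=D)           # one augmented sample, shape (D,)
y = np.array([0.0, 1.0, 0.0])    # one-hot label, shape (K,)
W = np.zeros((K, D))             # same zero initialization as in train()

scores = W @ x
p = np.exp(scores - scores.max())
p /= p.sum()                     # softmax probabilities, shape (K,)

# (K, 1) * (D,) broadcasts to (K, D): row k equals (y_k - p_k) * x
dW = (y - p).reshape((-1, 1)) * x
print(dW.shape)
```

The same matrix can be written as `np.outer(y - p, x)`, which some readers may find easier to follow.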

In [3]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

In [4]:
X, Y = load_iris(return_X_y=True)
clf = MultinomialLR(max_iter=150, learning_rate=0.005)

for validate_num in range(10):
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
    clf.train(X_train, Y_train)
    predictions = clf.predict(X_test)
    score = clf.score(Y_test)
    print('For the {}th validation, the accuracy is {:.2f}%'.format(validate_num+1, score*100))

For the 1th validation, the accuracy is 96.67%
For the 2th validation, the accuracy is 100.00%
For the 3th validation, the accuracy is 96.67%
For the 4th validation, the accuracy is 100.00%
For the 5th validation, the accuracy is 96.67%
For the 6th validation, the accuracy is 96.67%
For the 7th validation, the accuracy is 96.67%
For the 8th validation, the accuracy is 93.33%
For the 9th validation, the accuracy is 100.00%
For the 10th validation, the accuracy is 93.33%


In [5]:
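# `multi_class` is omitted: it is deprecated in recent scikit-learn releases, where the lbfgs solver already fits a multinomial model for multiclass targets
sk_clf = LogisticRegression(random_state=0, solver='lbfgs', max_iter=1000)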
for validate_num in range(10):
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
    sk_clf.fit(X_train, Y_train)
    score = sk_clf.score(X_test, Y_test)
    print('For the {}th validation, the accuracy is {:.2f}%'.format(validate_num+1, score*100))

For the 1th validation, the accuracy is 100.00%
For the 2th validation, the accuracy is 100.00%
For the 3th validation, the accuracy is 90.00%
For the 4th validation, the accuracy is 93.33%
For the 5th validation, the accuracy is 96.67%
For the 6th validation, the accuracy is 100.00%
For the 7th validation, the accuracy is 96.67%
For the 8th validation, the accuracy is 96.67%
For the 9th validation, the accuracy is 100.00%
For the 10th validation, the accuracy is 96.67%
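
To compare the two runs at a glance, the printed accuracies can be summarized as below. The numbers are copied from the outputs above; since `train_test_split` is called without a fixed `random_state`, the two models were scored on different random splits and a rerun will print different values, so this is only a rough comparison.

```python
from statistics import mean, stdev

# Accuracies (%) copied from the two printed runs
custom = [96.67, 100.00, 96.67, 100.00, 96.67, 96.67, 96.67, 93.33, 100.00, 93.33]
sklearn_lr = [100.00, 100.00, 90.00, 93.33, 96.67, 100.00, 96.67, 96.67, 100.00, 96.67]

for name, accs in (("MultinomialLR", custom), ("sklearn LogisticRegression", sklearn_lr)):
    print('{}: mean {:.2f}%, std {:.2f}%'.format(name, mean(accs), stdev(accs)))
```

Both runs average about 97% accuracy on their respective splits.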
