### Imports

In [1]:
# Data Imports
import numpy as np
import pandas as pd 
from pandas import Series, DataFrame

# Math
import math

# Data visualization imports
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

# Machine Learning imports
from sklearn.linear_model import LogisticRegression
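# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split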

# For evaluating our machine learning results
from sklearn import metrics

# Dataset import 
import statsmodels.api as sm

### Mathematical Derivation of the Algorithm

Logistic regression is a regression model in which the response (target) variable is categorical and the
explanatory variables are either categorical or continuous. Because the dependent variable is binary, the
conditional distribution $p(y \mid x)$ is a Bernoulli distribution rather than a Gaussian distribution as in
linear regression. The model predicts the probability that an instance is positive, so the logistic function is
used to restrict the estimated probabilities to the interval $[0, 1]$. The logistic function for a simple
logistic regression is given by:
$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

$\beta_0$ and $\beta_1$ are the model parameters, estimated by maximizing the log-likelihood (maximum likelihood
estimation). The logistic function can equivalently be written as:
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
This form is obtained by multiplying the numerator and denominator of the first expression by $e^{-(\beta_0 + \beta_1 x)}$.
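The equivalence of the two forms can be checked numerically. The following is a minimal sketch; the function names and the coefficient values are hypothetical and chosen only for illustration.

```python
import numpy as np


def logistic_exp_form(x, beta0, beta1):
    """First form: e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = beta0 + beta1 * x
    return np.exp(z) / (1.0 + np.exp(z))


def logistic_inverse_form(x, beta0, beta1):
    """Second form: 1 / (1 + e^-(b0 + b1*x))."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical coefficients, not estimated from any data
beta0, beta1 = -1.5, 0.8
x = np.linspace(-5.0, 5.0, 11)

p_exp = logistic_exp_form(x, beta0, beta1)
p_inv = logistic_inverse_form(x, beta0, beta1)

# Both forms should agree up to floating-point rounding
forms_agree = np.allclose(p_exp, p_inv)
print(forms_agree)
```

In practice the second form is usually preferred for positive arguments, since it avoids computing a large exponential in both the numerator and the denominator.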

The logistic function is bounded between 0 and 1. Assuming $\beta_1 > 0$:
$$\lim_{x\to -\infty} \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} = 0$$
and
$$\lim_{x\to \infty} \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} = 1$$
since, with $u = \beta_0 + \beta_1 x$, $\lim_{x\to -\infty} e^{-u} = \infty$ and $\lim_{x\to \infty} e^{-u} = 0$.
For $\beta_1 < 0$ the two limits are swapped, so in either case $p$ stays between 0 and 1.
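The limiting behaviour can be illustrated by evaluating the function at inputs of large magnitude. This is a small sketch with hypothetical coefficients; it also evaluates a negative slope, for which the two limits trade places.

```python
import numpy as np


def logistic(x, beta0, beta1):
    """Logistic function 1 / (1 + e^-(beta0 + beta1*x))."""
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))


# Hypothetical intercept; inputs far to the left, at zero, and far to the right
beta0 = 0.5
xs = np.array([-50.0, 0.0, 50.0])

p_pos = logistic(xs, beta0, 2.0)   # positive slope: approaches 0 on the left, 1 on the right
p_neg = logistic(xs, beta0, -2.0)  # negative slope: approaches 1 on the left, 0 on the right

print(p_pos)
print(p_neg)
```

In floating-point arithmetic the extreme values may round to exactly 1.0, even though the mathematical function never reaches either bound.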

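Thus, the logistic function always returns a valid probability, which serves as the parameter of the Bernoulli
distribution $p(y \mid x)$. Viewed as a function of its argument, it is the cumulative distribution function of
the standard logistic distribution.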

Since $1 - p = \frac{1}{1 + e^{\beta_0 + \beta_1 x}}$, the odds are: $$\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$$

The odds ratio for a single predictor is $e^{\beta_1}$, i.e. the odds are multiplied by $e^{\beta_1}$ for every
1-unit increase in $x$.

**Note**: The odds ratio is given by: $$OR = \frac{\text{odds}(x+1)}{\text{odds}(x)} = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}$$
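As a quick numerical check of the odds and the odds ratio, the sketch below computes the odds from the predicted probability and compares the ratio at $x+1$ and $x$ with $e^{\beta_1}$. The coefficients and the helper name `odds` are hypothetical.

```python
import numpy as np


def odds(x, beta0, beta1):
    """Odds p / (1 - p), with p given by the logistic function."""
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    return p / (1.0 - p)


# Hypothetical coefficients, not estimated from any data
beta0, beta1 = -1.5, 0.8
x = np.array([-2.0, 0.0, 1.0, 3.5])

# Ratio of the odds after a 1-unit increase in x to the odds at x
odds_ratio = odds(x + 1.0, beta0, beta1) / odds(x, beta0, beta1)

# The ratio should not depend on x and should match exp(beta1)
print(np.allclose(odds_ratio, np.exp(beta1)))
```

The ratio is the same at every value of `x`, which is why a single coefficient summarizes the effect of the predictor on the odds scale.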

#### Deviance

Deviance is used to measure the lack of fit to the data in a logistic regression model. Deviance is given as: 

$$D = -2 \ln \frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}}$$

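Smaller values of $D$ indicate a better fit, as the fitted model deviates less from the saturated model. When the
likelihood of the saturated model equals 1, as it does for ungrouped binary responses where the saturated model
reproduces each observation exactly, the deviance reduces to:

$$D = -2 \ln(\text{likelihood of the fitted model})$$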
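The reduced form of the deviance can be computed directly from the Bernoulli log-likelihood. The following is a minimal sketch that assumes ungrouped binary labels; the function name `binomial_deviance`, the labels, and the predicted probabilities are made up for illustration and do not come from a fitted model.

```python
import numpy as np


def binomial_deviance(y, p):
    """Return -2 * log-likelihood of binary labels y under predicted probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    # Bernoulli log-likelihood: sum of y*ln(p) + (1 - y)*ln(1 - p)
    log_likelihood = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return -2.0 * log_likelihood


# Hypothetical labels and predicted probabilities
y = np.array([0, 0, 1, 1, 1])
p_informative = np.array([0.1, 0.2, 0.8, 0.9, 0.7])
p_constant = np.full(y.shape, y.mean())  # predicts the base rate for every instance

d_informative = binomial_deviance(y, p_informative)
d_constant = binomial_deviance(y, p_constant)

print(d_informative, d_constant)
```

The probabilities that track the labels more closely give the smaller deviance, consistent with smaller values of $D$ indicating a better fit. Predicted probabilities of exactly 0 or 1 would need clipping before taking logarithms, which this sketch omits.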