# Logistic Regression

Logistic regression is the go-to linear classification algorithm for two-class problems (i.e. binary classification). It is easy to implement, easy to understand, and gets good results on a wide variety of problems, even when the assumptions the method makes about your data are violated.


## Resources:

[Logistic Regression Tutorial for Machine Learning](http://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/)

[Logistic Regression for Machine Learning](http://machinelearningmastery.com/logistic-regression-for-machine-learning/)

[How To Implement Logistic Regression With Stochastic Gradient Descent From Scratch With Python](http://machinelearningmastery.com/implement-logistic-regression-stochastic-gradient-descent-scratch-python/)

### Description

#### Logistic Regression

Logistic regression is named for the function used at the core of the method, the [logistic function](https://en.wikipedia.org/wiki/Logistic_function).

The logistic function, also called the **sigmoid function**, was developed by statisticians to describe properties of population growth in ecology: it rises quickly and levels off at the carrying capacity of the environment. It’s an S-shaped curve that maps any real-valued number to a value between 0 and 1, but never exactly to those limits.

$$\frac{1}{1 + e^{-x}}$$

$e$ is the base of the natural logarithm and $x$ is the value that you want to transform via the logistic function.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn
%matplotlib inline

In [None]:
# plotting the sigmoid function
x = np.linspace(-6, 6, num = 100)
plt.figure(figsize = (10,10))
plt.plot(x, 1 / (1 + np.exp(-x))); # Sigmoid Function
plt.title("Sigmoid Function");

***

The logistic regression equation has a representation very similar to that of linear regression. The difference is that the output value being modelled is binary in nature.

$$\hat{y}=\frac{e^{\beta_0+\beta_1x_1}}{1+e^{\beta_0+\beta_1x_1}}$$

or

$$\hat{y}=\frac{1.0}{1.0+e^{-\beta_0-\beta_1x_1}}$$
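Both forms describe the same function. Writing $z=\beta_0+\beta_1x_1$ as a shorthand (a symbol introduced here only for brevity), multiplying the numerator and denominator of $\frac{e^{z}}{1+e^{z}}$ by $e^{-z}$ gives:

$$\frac{e^{z}}{1+e^{z}}=\frac{e^{z}\,e^{-z}}{e^{-z}+e^{z}\,e^{-z}}=\frac{1}{1+e^{-z}}=\frac{1}{1+e^{-\beta_0-\beta_1x_1}}$$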

$\beta_0$ is the intercept term

$\beta_1$ is the coefficient for $x_1$

$\hat{y}$ is the predicted output, a real value between 0 and 1. To convert it to a binary output of 0 or 1, either round it to the nearest integer or choose a cutoff point that separates the two classes.
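As a minimal sketch of the cutoff idea, the helper below maps probabilities to class labels. The function name `to_class`, the `cutoff` parameter, and the sample probabilities are illustrative assumptions, not part of the original material.

```python
import numpy as np

def to_class(prob, cutoff=0.5):
    """Map probabilities to class labels: 1 if prob >= cutoff, else 0.

    Hypothetical helper for illustration only.
    """
    return (np.asarray(prob) >= cutoff).astype(int)

probs = [0.21, 0.50, 0.76]          # illustrative values, not model output
print(to_class(probs))              # default cutoff of 0.5
print(to_class(probs, cutoff=0.8))  # stricter cutoff for class 1
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer, so `round(0.5)` is 0, whereas the `>=` comparison in this sketch assigns a probability of exactly 0.5 to class 1. An explicit cutoff makes that choice visible.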

***

# Making Predictions with Logistic Regression

In [None]:
# our toy dataset: each row is [x, y]
# y is the binary class label (0 or 1)
dataset = [[-2.0011, 0],
           [-1.4654, 0],
           [0.0965, 0],
           [1.3881, 0],
           [3.0641, 0],
           [7.6275, 1],
           [5.3324, 1],
           [6.9225, 1],
           [8.6754, 1],
           [7.6737, 1]]

Suppose you have been given coefficients, computed by some machine learning algorithm, for predicting the y value from the x values.

In other words, we were given $\beta_0$ and $\beta_1$.

In [None]:
coef = [-0.806605464, 0.2573316]

In [None]:
# check how well the precomputed coefficients work (using the logistic/sigmoid function)
for row in dataset:
    yhat = 1.0 / (1.0 + np.exp(- coef[0] - coef[1] * row[0]))
    print("yhat {0:.4f}, predicted class {1}".format(yhat, round(yhat)))

***

In practice, we are not given the $\beta_0$ and $\beta_1$ coefficients; we have to estimate them from the data. The sections below cover approaches for finding them, such as maximum-likelihood estimation and gradient descent.

# Learning the Logistic Regression Model

The coefficients (Beta values b) of the logistic regression algorithm must be estimated from your training data. This is done using [maximum-likelihood estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation).

Maximum-likelihood estimation is a common learning algorithm used by a variety of machine learning algorithms, although it does make assumptions about the distribution of your data (more on this when we talk about preparing your data).

The best coefficients would result in a model that predicts a value very close to 1 (e.g. male) for the default class and a value very close to 0 (e.g. female) for the other class. The intuition for maximum likelihood in logistic regression is that a search procedure seeks values for the coefficients (Beta values) that minimize the error between the probabilities predicted by the model and those in the data (e.g. a probability of 1 if the data point belongs to the primary class).

We are not going to go into the math of maximum likelihood. It is enough to say that a minimization algorithm is used to search for the best values of the coefficients for your training data. In practice this is often implemented with an efficient numerical optimization algorithm (such as a quasi-Newton method).
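Without going into the derivation, the quantity such a minimization algorithm typically works on is the negative log-likelihood of the training labels. The sketch below evaluates it for the toy dataset from the prediction section, once with all-zero coefficients and once with the coefficients given earlier. The function name `negative_log_likelihood` is a hypothetical helper chosen for illustration.

```python
import numpy as np

# Same toy data as in the prediction section: [x, y] pairs.
dataset = [[-2.0011, 0], [-1.4654, 0], [0.0965, 0], [1.3881, 0], [3.0641, 0],
           [7.6275, 1], [5.3324, 1], [6.9225, 1], [8.6754, 1], [7.6737, 1]]

def negative_log_likelihood(coef, data):
    """Negative log-likelihood of the labels under a logistic model.

    coef is [beta_0, beta_1]; lower values mean a better fit.
    """
    data = np.asarray(data, dtype=float)
    x, y = data[:, 0], data[:, 1]
    p = 1.0 / (1.0 + np.exp(-(coef[0] + coef[1] * x)))  # predicted P(y = 1)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

nll_zero = negative_log_likelihood([0.0, 0.0], dataset)
nll_given = negative_log_likelihood([-0.806605464, 0.2573316], dataset)
print("all-zero coefficients:", nll_zero)
print("given coefficients:   ", nll_given)
```

With all-zero coefficients every prediction is 0.5, so the value is simply 10 times ln 2. The given coefficients produce a lower value, which is what "a better fit" means under this criterion.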

When you are learning logistic regression, you can implement it yourself from scratch using the much simpler gradient descent algorithm.

# Learning with Stochastic Gradient Descent

Logistic regression can use stochastic gradient descent to update the coefficients one training example at a time.

At each gradient descent iteration, the coefficients are updated using the equation:

$$\beta=\beta+\textrm{learning rate}\times (y-\hat{y}) \times \hat{y} \times (1-\hat{y}) \times x $$
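The update rule above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names, the learning rate of 0.3, the 10 epochs, and the all-zero starting point are assumptions, and the rows are visited in their stored order rather than shuffled. For the intercept $\beta_0$, the input $x$ is treated as the constant 1.

```python
import numpy as np

# Same toy data as in the prediction section: [x, y] pairs.
dataset = [[-2.0011, 0], [-1.4654, 0], [0.0965, 0], [1.3881, 0], [3.0641, 0],
           [7.6275, 1], [5.3324, 1], [6.9225, 1], [8.6754, 1], [7.6737, 1]]

def predict(x, coef):
    """Logistic prediction for a single input x with coef = [beta_0, beta_1]."""
    return 1.0 / (1.0 + np.exp(-(coef[0] + coef[1] * x)))

def sgd_step(coef, x, y, learning_rate):
    """Apply the update rule once, for a single training example."""
    yhat = predict(x, coef)
    scale = learning_rate * (y - yhat) * yhat * (1.0 - yhat)
    # The intercept behaves like a coefficient whose input is always 1.
    return [coef[0] + scale * 1.0, coef[1] + scale * x]

def train_sgd(data, learning_rate=0.3, n_epochs=10):
    """Sweep over the data n_epochs times, updating after every row."""
    coef = [0.0, 0.0]  # assumed starting point
    for _ in range(n_epochs):
        for x, y in data:
            coef = sgd_step(coef, x, y, learning_rate)
    return coef

learned = train_sgd(dataset)
print("beta_0 = {0:.4f}, beta_1 = {1:.4f}".format(learned[0], learned[1]))
```

The learned values depend on the learning rate, the number of epochs, and the order of the rows, so they should not be expected to match the precomputed coefficients used earlier.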


***

# Using scikit-learn to Estimate the Coefficients

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
# display the dataset again
dataset

In [None]:
# dividing the dataset into features (X) and labels (y)
X = np.array(dataset)[:, 0:1]
y = np.array(dataset)[:, 1]

In [None]:
# what do the X values look like?
X

In [None]:
# and the y values
y

In [None]:
# instantiating the regression module
clf_LR = LogisticRegression(C=1.0, penalty='l2', tol=0.0001)

In [None]:
# fitting the model
clf_LR.fit(X,y)

In [None]:
# predict y for the X values using the model fitted above
clf_LR.predict(X)
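To see the coefficients that scikit-learn estimated, you can read the fitted model's `intercept_` and `coef_` attributes, and `predict_proba` returns the class probabilities behind the predictions. The sketch below refits the model on the same toy data so that it runs on its own. It uses the default settings of `LogisticRegression` with `C=1.0`, which is an assumption made for brevity, and the variable name `clf` is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy data as in the prediction section: [x, y] pairs.
dataset = [[-2.0011, 0], [-1.4654, 0], [0.0965, 0], [1.3881, 0], [3.0641, 0],
           [7.6275, 1], [5.3324, 1], [6.9225, 1], [8.6754, 1], [7.6737, 1]]
X = np.array(dataset)[:, 0:1]
y = np.array(dataset)[:, 1]

clf = LogisticRegression(C=1.0)  # default (L2-regularized) settings
clf.fit(X, y)

# intercept_ corresponds to beta_0 and coef_ to beta_1
print("beta_0:", clf.intercept_[0])
print("beta_1:", clf.coef_[0, 0])

# column 1 of predict_proba is the estimated probability of class 1
proba = clf.predict_proba(X)
print(proba[:, 1])
```

Because scikit-learn applies regularization by default, these values will generally differ from the precomputed coefficients used earlier.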

***

# Classification Exercise

In [None]:
dataset2 = [[ 0.2,  0. ],
            [ 0.2,  0. ],
            [ 0.2,  0. ],
            [ 0.2,  0. ],
            [ 0.2,  0. ],
            [ 0.4,  0. ],
            [ 0.3,  0. ],
            [ 0.2,  0. ],
            [ 0.2,  0. ],
            [ 0.1,  0. ],
            [ 1.4,  1. ],
            [ 1.5,  1. ],
            [ 1.5,  1. ],
            [ 1.3,  1. ],
            [ 1.5,  1. ],
            [ 1.3,  1. ],
            [ 1.6,  1. ],
            [ 1. ,  1. ],
            [ 1.3,  1. ],
            [ 1.4,  1. ]]

In [None]:
X = np.array(dataset2)[:, 0:1]
y = np.array(dataset2)[:, 1]

In [None]:
clf_LR = LogisticRegression(C=1.0, penalty='l2', tol=0.0001)

clf_LR.fit(X,y)

In [None]:
# predict once, keep the result, and display it
y_pred = clf_LR.predict(X)
y_pred

In [None]:
np.column_stack((y_pred, y))

***

As we can see, the model classified every example in the training data correctly.
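Rather than comparing the two columns by eye, the agreement between predictions and labels can be summarized as an accuracy score. The sketch below rebuilds the exercise data in a compact form so that it runs on its own. The variable names are illustrative, and the model uses the default settings of `LogisticRegression` with `C=1.0`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Exercise data: the first ten x values belong to class 0, the last ten to class 1.
x_values = [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1,
            1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4]
X = np.array(x_values).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 10)

clf = LogisticRegression(C=1.0).fit(X, y)
y_pred = clf.predict(X)

# fraction of examples whose predicted label matches the true label
acc = accuracy_score(y, y_pred)
print("training accuracy:", acc)
```

This is accuracy on the same data the model was fitted to, so it says how well the model reproduces the training labels, not how it would perform on new data.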

***