# <center> Logistic Regression </center>

## Overview

Logistic regression is a supervised learning algorithm for classification. The model estimates the probability that an instance belongs to a class. It is also a typical Generalized Linear Model (GLM), obtained by relaxing the assumptions of the linear regression model.

While in a linear regression model we look for the parameters that best fit a straight line to the data, in logistic regression we look for the parameters that best fit a sigmoid curve to the data.

* #### Motivation

Could we use linear regression for a classification problem where the target variable has two classes? It would produce a result, but a problematic one.

Linear regression models do not handle classification problems well for two main reasons:

* The estimated line does not fit the binary data well
* The outputs can be less than 0 or greater than 1, so they cannot be read as probabilities

The sigmoid function addresses both problems: it is bounded between 0 and 1, and its 'S' shape fits binary data better.
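
The two problems can be seen on a tiny example. The sketch below assumes NumPy and uses a made-up one-feature data set; the sigmoid's centre and steepness are hand-picked for illustration, not fitted.

```python
import numpy as np

# Toy one-feature data set (made up for illustration): class 1 for larger x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Ordinary least-squares line fitted directly to the 0/1 labels.
slope, intercept = np.polyfit(x, y, deg=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Evaluate both curves on points inside and outside the observed range.
x_new = np.array([-5.0, 0.0, 4.5, 9.0, 15.0])
line_values = slope * x_new + intercept

# Sigmoid centred at x = 4.5 with a hand-picked steepness (an assumption).
sigmoid_values = sigmoid(2.0 * (x_new - 4.5))

print("line:   ", np.round(line_values, 3))
print("sigmoid:", np.round(sigmoid_values, 3))
```

The straight line leaves the interval [0, 1] as soon as `x_new` moves away from the training range, while the sigmoid values stay strictly between 0 and 1.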

## Some math on Logistic Regression


The model is estimated by finding the parameters of the sigmoid curve that best fits the data, that is, the parameters that minimize the prediction errors.

Like linear regression, the logistic regression model computes a weighted sum of the input features plus a bias term. However, instead of returning this sum directly as a linear regression model does, it returns the logistic (sigmoid) function of that sum.

First let's see what we want to get and then build the path.

## $ \frac{\partial J(\beta)}{\partial \beta_{j}} = \frac{1}{m}\sum\limits_{i=1}^{m}(\sigma(\beta^{T}x_{i}) - y_{i})x_{ij} = 0 $  

or

## $ \frac{\partial J(\beta)}{\partial \beta_{j}} = \frac{1}{m}\sum\limits_{i=1}^{m}(\frac{e^{\beta^{T}x_{i}}}{1 + e^{\beta^{T}x_{i}}} - y_{i})x_{ij} = 0 $  

where $ J(\beta) = -\frac{1}{m}\log L(\beta) $ is the cost function, that is, the negative mean log-likelihood over the $ m $ observations.

Setting the partial derivative with respect to each parameter $ \beta_j $ to zero gives the conditions for a minimum of the cost function.
However, there is no known closed-form solution for the value of $ \beta $ that satisfies them. Because the cost function is convex, gradient descent with a suitable learning rate converges toward the global minimum whenever one exists.
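
A minimal sketch of batch gradient descent on this cost function, assuming NumPy and a small made-up data set. The function name `fit_logistic_gd`, the learning rate and the iteration count are illustrative choices, not the implementation found in `my_LinearModels`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, learning_rate=0.5, n_iters=5000):
    """Hypothetical helper: minimise the mean negative log-likelihood."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend a column of ones for the bias
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        # Gradient of the cost: (1/m) * sum_i (sigma(beta^T x_i) - y_i) * x_i
        gradient = Xb.T @ (sigmoid(Xb @ beta) - y) / m
        beta -= learning_rate * gradient
    return beta

# Toy data (made up); the classes overlap, so a finite minimum exists.
X_toy = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y_toy = np.array([0, 0, 1, 0, 1, 0, 1, 1])

beta_hat = fit_logistic_gd(X_toy, y_toy)

# At the minimum the gradient should be (numerically) zero.
Xb_toy = np.hstack([np.ones((len(X_toy), 1)), X_toy])
grad_at_fit = Xb_toy.T @ (sigmoid(Xb_toy @ beta_hat) - y_toy) / len(y_toy)
print(beta_hat, grad_at_fit)
```

The final gradient check mirrors the condition above: the fitted parameters are those for which every partial derivative vanishes.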

### How to get here?

1 - Sigmoid function

## $ \sigma(x) = \frac{e^x}{1 + e^x} $

---
2 - Logistic function

### $ p(y=1|x_{i}) = \frac{e^{\beta^{T}x_{i}}}{1 + e^{\beta^{T}x_{i}}} = \frac{e^{(\beta_{0} + \beta_{1}x_{i1} + ... + \beta_{n}x_{in})}}{1 + e^{(\beta_{0} + \beta_{1}x_{i1} + ... + \beta_{n}x_{in})}}$ 

### $ p(y=0|x_{i}) = \frac{1}{1 + e^{\beta^{T}x_{i}}} = \frac{1}{1 + e^{(\beta_{0} + \beta_{1}x_{i1} + ... + \beta_{n}x_{in})}} $
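
The two class probabilities can be computed directly from these formulas. This is a small sketch assuming NumPy; the name `predict_proba` and the parameter values are hypothetical, and `beta[0]` is taken to be the intercept.

```python
import numpy as np

def predict_proba(X, beta):
    """Return one row per observation with columns [p(y=0|x), p(y=1|x)]."""
    z = beta[0] + X @ beta[1:]             # linear predictor beta^T x
    p1 = np.exp(z) / (1.0 + np.exp(z))     # p(y=1|x)
    p0 = 1.0 / (1.0 + np.exp(z))           # p(y=0|x)
    return np.column_stack([p0, p1])

# Made-up observations and parameters, for illustration only.
X_demo = np.array([[0.0, 0.0],
                   [2.0, 1.0]])
beta_demo = np.array([0.5, 1.0, -0.5])

proba_demo = predict_proba(X_demo, beta_demo)
print(proba_demo)
```

Each row sums to 1, since the two expressions share the same denominator.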

--- 

3 - Recall the Bernoulli distribution

### $ P(Y=y_{i}) = p^{y_{i}}(1-p)^{1-y_{i}}  $
If $ y_i = 1 $, this reduces to $ p $.

If $ y_i = 0 $, this reduces to $ 1 - p $.

---
4 - Write the Bernoulli likelihood over all observations

### $ L(\beta) = \prod\limits_{i=1}^{m}p_{i}^{y_i}(1-p_{i})^{1 - y_i} $

where $ p_i = p(y=1|x_i) $ and $ m $ is the number of observations.

Remember that we are looking for the cost function and will need to differentiate it, so it is easier to turn the product into a sum by taking the logarithm.

---
5 - Taking the log
### $ \log L(\beta) = \sum\limits_{i=1}^{m} \left[ y_{i}\log(p_{i}) + (1-y_{i})\log(1-p_{i}) \right] $
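
A short numerical check that the sum of logs equals the log of the Bernoulli product, assuming NumPy. The probabilities and labels below are made up for illustration.

```python
import numpy as np

# Made-up predicted probabilities p_i = p(y=1|x_i) and observed labels y_i.
p = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 0])

# Likelihood: product of one Bernoulli term per observation.
likelihood = np.prod(p**y * (1 - p)**(1 - y))

# Log-likelihood: the same quantity written as a sum of logs.
log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(likelihood, np.log(likelihood), log_likelihood)
```

With many observations the product underflows toward zero, which is a second practical reason to work with the sum of logs.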

---
6 - Some algebra
### $ \log L(\beta) = \sum\limits_{i=1}^{m} \left[ y_{i}\log\left(\frac{e^{\beta^{T}x_{i}}}{1 + e^{\beta^{T}x_{i}}}\right) + (1-y_{i})\log\left(\frac{1}{1 + e^{\beta^{T}x_{i}}}\right) \right] $

### $ \log L(\beta) = \sum\limits_{i=1}^{m} y_{i}\log\frac{e^{\beta^{T}x_{i}}}{1 + e^{\beta^{T}x_{i}}} + \sum\limits_{i=1}^{m} (1-y_{i})\log\frac{1}{1 + e^{\beta^{T}x_{i}}}$

### $ \log L(\beta) = \sum\limits_{i:\, y_{i}=1} \log\frac{e^{\beta^{T}x_{i}}}{1 + e^{\beta^{T}x_{i}}} + \sum\limits_{i:\, y_{i}=0} \log\frac{1}{1 + e^{\beta^{T}x_{i}}}$

### $ \log L(\beta) = \sum\limits_{i:\, y_{i}=1} \left[ \log{e^{\beta^{T}x_{i}}} - \log{(1 + e^{\beta^{T}x_{i}})} \right] + \sum\limits_{i:\, y_{i}=0} \left[ \log{1} - \log{(1 + e^{\beta^{T}x_{i}})} \right] $

### $ \log L(\beta) = \sum\limits_{i:\, y_{i}=1} \left[ \beta^{T}x_{i} - \log{(1 + e^{\beta^{T}x_{i}})} \right] + \sum\limits_{i:\, y_{i}=0} \left[ 0 - \log{(1 + e^{\beta^{T}x_{i}})} \right] $

### $ \log L(\beta) = \sum\limits_{i:\, y_{i}=1} \left[ \beta^{T}x_{i} - \log{(1 + e^{\beta^{T}x_{i}})} \right] - \sum\limits_{i:\, y_{i}=0} \log{(1 + e^{\beta^{T}x_{i}})}$


---
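7 - Taking partial derivatives

## $ \frac{\partial \log L(\beta)}{\partial \beta} = \sum\limits_{i:\, y_{i}=1} \left( x_{i} - \frac{e^{\beta^{T}x_{i}}}{1 + e^{\beta^{T}x_{i}}}x_{i} \right) - \sum\limits_{i:\, y_{i}=0} \frac{e^{\beta^{T}x_{i}}}{1 + e^{\beta^{T}x_{i}}}x_{i} $

Since $ y_i = 1 $ in the first sum and $ y_i = 0 $ in the second, both can be merged into a single sum over all observations:

## $ \frac{\partial \log L(\beta)}{\partial \beta} = \sum\limits_{i=1}^{m}(y_{i} - \frac{e^{\beta^{T}x_{i}}}{1 + e^{\beta^{T}x_{i}}})x_{i} = 0 $  

Multiplying both sides by $ -\frac{1}{m} $ gives the averaged form with $ (\sigma(\beta^{T}x_{i}) - y_{i}) $ shown at the start of this section.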
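
One way to gain confidence in this derivative is to compare it with a finite-difference approximation of the log-likelihood. The sketch below assumes NumPy and randomly generated data; the function names are hypothetical and the first column of `X_check` is a column of ones for the intercept.

```python
import numpy as np

def log_likelihood(beta, X, y):
    # Final form from step 6: sum_i [ y_i * beta^T x_i - log(1 + e^{beta^T x_i}) ]
    z = X @ beta
    return np.sum(y * z - np.log1p(np.exp(z)))

def analytic_gradient(beta, X, y):
    # sum_i (y_i - sigma(beta^T x_i)) * x_i
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)

def numeric_gradient(beta, X, y, eps=1e-6):
    # Central finite differences, one coordinate of beta at a time.
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (log_likelihood(beta + step, X, y)
                   - log_likelihood(beta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X_check = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y_check = rng.integers(0, 2, size=20)
beta_check = rng.normal(size=3)

grad_analytic = analytic_gradient(beta_check, X_check, y_check)
grad_numeric = numeric_gradient(beta_check, X_check, y_check)
print(grad_analytic)
print(grad_numeric)
```

The two gradients should agree to several decimal places for any choice of `beta_check`.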


The linear predictor $ \beta^{T}x_{i} $ represents the log odds, $ \log\frac{p}{1-p} $. Therefore, a change of 1 unit in a feature $ x_{ij} $ shifts the log odds by the corresponding parameter $ \beta_j $. If, for example, the parameter is 2, a 1-unit increase in $ x_{ij} $ adds 2 to the log odds, which multiplies the odds by $ e^{2} \approx 7.39 $.
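
The effect of a coefficient on the odds can be checked numerically. This sketch assumes NumPy and a hypothetical one-feature model with intercept -1 and slope 2; the two feature values are arbitrary and one unit apart.

```python
import numpy as np

beta0, beta1 = -1.0, 2.0   # hypothetical parameters, for illustration only

def probability(x):
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

def odds(x):
    p = probability(x)
    return p / (1.0 - p)

# Compare two feature values that differ by exactly one unit.
odds_ratio = odds(1.5) / odds(0.5)
log_odds_change = np.log(odds(1.5)) - np.log(odds(0.5))

print(odds_ratio, np.exp(beta1))   # both equal e^2
print(log_odds_change, beta1)      # both equal 2
```

The log odds move additively by `beta1`, while the odds themselves move multiplicatively by `exp(beta1)`.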

Applying the sigmoid function to the log odds gives directly the probability that an instance belongs to the class. If we define a threshold (for example 0.5) to reduce the outputs to 0 and 1, we have a binary classifier.
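
Turning probabilities into class labels is a one-line comparison. A minimal sketch assuming NumPy; the function name `classify` and the default cut-off of 0.5 are illustrative choices.

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    """Map probabilities to labels: 1 when p >= threshold, otherwise 0."""
    return (np.asarray(probabilities) >= threshold).astype(int)

labels = classify([0.1, 0.5, 0.8])
strict_labels = classify([0.1, 0.5, 0.8], threshold=0.7)
print(labels, strict_labels)
```

Raising the threshold makes the classifier more conservative about predicting class 1, trading recall for precision.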

---

## Imports

In [1]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

import numpy as np
from sklearn.model_selection import train_test_split

___

## Data

In [2]:
# imports
from sklearn import datasets

In [3]:
# Load data
bc = datasets.load_breast_cancer()

# Define X and y
X, y = bc.data, bc.target

# Separate train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

## From scratch

In [4]:
# imports
from my_LinearModels import my_LogisticRegression, accuracy, confusionMatrix

In [5]:
# Define model
clf = my_LogisticRegression()

# Fit model
clf.fit(X_train, y_train)

# Predictions
predictions = clf.predict(X_test)

# Accuracy
accuracy(predictions, y_test)

0.9210526315789473

In [6]:
# confusion matrix
confusionMatrix(y_test, predictions)

| Actual \ Predicted | 0 | 1 | All |
|---|---|---|---|
| 0 | 39 | 6 | 45 |
| 1 | 3 | 66 | 69 |
| All | 42 | 72 | 114 |
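
As a cross-check, the accuracy reported above can be recomputed from the confusion matrix counts. This is a small sketch assuming NumPy, with the four counts copied from the output (rows are actual classes, columns are predicted classes).

```python
import numpy as np

# Counts copied from the confusion matrix output: rows = actual, columns = predicted.
cm = np.array([[39, 6],
               [3, 66]])

# Correct predictions lie on the diagonal.
accuracy_from_cm = np.trace(cm) / cm.sum()

# Per-class recall (row-wise) and precision (column-wise).
recall_per_class = np.diag(cm) / cm.sum(axis=1)
precision_per_class = np.diag(cm) / cm.sum(axis=0)

print(accuracy_from_cm)
print(recall_per_class, precision_per_class)
```

The accuracy works out to 105 / 114, which matches the value returned by `accuracy(predictions, y_test)`.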


___

## From sklearn

In [7]:
# Imports
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

In [8]:
# Define model
clf = LogisticRegression()

# Fit model
clf.fit(X_train, y_train)

# Predictions
predict = clf.predict(X_test)

# Accuracy
clf.score(X_test, y_test)

0.9385964912280702

In [9]:
# Report 
print(classification_report(y_test, predict))

              precision    recall  f1-score   support

           0       0.97      0.87      0.92        45
           1       0.92      0.99      0.95        69

    accuracy                           0.94       114
   macro avg       0.95      0.93      0.93       114
weighted avg       0.94      0.94      0.94       114



___