# Logistic Regression
Logistic regression is a classification algorithm (yes, its name can be confusing), which means that it predicts discrete values (classes) rather than continuous ones.

A logistic regression is a linear regression whose output is passed through an activation function (the sigmoid), which turns it into a probability between zero and one. So the two models are almost the same!
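
As a minimal sketch of that idea, the snippet below fits a model on a tiny made-up dataset (the numbers are invented for illustration) and checks that applying a hand-written `sigmoid` to the linear part gives the same probabilities as `predict_proba`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy, made-up data: one feature, class 1 becomes more likely as x grows.
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Linear part: z = intercept + coefficient * x
z = model.intercept_ + X @ model.coef_.ravel()

# Squashing z with the sigmoid gives the probability of class 1.
manual_proba = sigmoid(z)
sklearn_proba = model.predict_proba(X)[:, 1]

print(np.allclose(manual_proba, sklearn_proba))
```

The linear part `z` can be any real number; only after the sigmoid does it become a probability.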

There are 3 kinds of logistic regression:
1. Binary Logistic Regression: two categories, for example whether an email is 'Spam' or 'Ham'.
2. Multinomial Logistic Regression: more than two categories without order, for example whether an image shows a dog, a cat or a python.
3. Ordinal Logistic Regression: more than two categories with order, for example a rating from 1 to 5.
![linear_regression](https://www.saedsayad.com/images/LogReg_1.png)
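
To make the multinomial case concrete, here is a small sketch on the iris dataset that ships with scikit-learn (it is not the dataset used in this notebook). With three unordered classes, the model returns one probability per class. The variable names are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris has three unordered classes (setosa, versicolor, virginica).
X_iris, y_iris = load_iris(return_X_y=True)

# max_iter is raised from the default so the solver has room to converge.
clf = LogisticRegression(max_iter=1000).fit(X_iris, y_iris)

proba = clf.predict_proba(X_iris[:3])
print(clf.classes_)        # the three class labels
print(proba.round(3))      # one column of probability per class
print(proba.sum(axis=1))   # each row sums to 1
```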

## Scikit Learn Logistic Regression
Now we are going to use the logistic regression of sklearn to predict data and see all the possibilities that this package offers us.

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score, confusion_matrix, classification_report

%load_ext version_information
%matplotlib inline
%version_information pandas, matplotlib, sklearn


| Software | Version |
|---|---|
| Python | 3.7.4 64bit [GCC 7.3.0] |
| IPython | 7.8.0 |
| OS | Linux 5.0.0 27 generic x86_64 with debian buster sid |
| pandas | 0.25.1 |
| matplotlib | 3.1.1 |
| sklearn | 0.21.2 |

Sun Sep 15 18:52:42 2019 CEST


### 1. Prepare the data for the model
The dataset is already defined in the notebook, so here we only load it, drop the index column and rescale `Sales`.

In [4]:
df_ad = pd.read_csv('data/Advertising.csv')
df_ad.drop(columns='Unnamed: 0', inplace=True)
df_ad['Sales'] = df_ad['Sales'] * 100
df_ad

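| | TV | Radio | Newspaper | Sales | City_size |
|---|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 2210.0 | big |
| 1 | 44.5 | 39.3 | 45.1 | 1040.0 | small |
| 2 | 17.2 | 45.9 | 69.3 | 930.0 | small |
| 3 | 151.5 | 41.3 | 58.5 | 1850.0 | big |
| 4 | 180.8 | 10.8 | 58.4 | 1290.0 | small |
| ... | ... | ... | ... | ... | ... |
| 195 | 38.2 | 3.7 | 13.8 | 760.0 | small |
| 196 | 94.2 | 4.9 | 8.1 | 970.0 | small |
| 197 | 177.0 | 9.3 | 6.4 | 1280.0 | small |
| 198 | 283.6 | 42.0 | 66.2 | 2550.0 | big |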


In [None]:
train, test = train_test_split(df_ad)
X_train = train.drop(columns=['City_size'])
y_train = train['City_size']
X_test = test.drop(columns=['City_size'])
y_test = test['City_size']
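
The split above is random, so the accuracies change on every run. As a hedged sketch, the snippet below shows one way to make the split reproducible with `random_state` and to keep the big/small proportion similar in both parts with `stratify`. The `df_demo` frame is a made-up stand-in with the same column names, since the real CSV is not available here, and the `_demo` names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for df_ad: same column names, random values.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, size=200),
    "Radio": rng.uniform(0, 50, size=200),
    "Newspaper": rng.uniform(0, 100, size=200),
    "Sales": rng.uniform(500, 2500, size=200),
    "City_size": rng.choice(["big", "small"], size=200, p=[0.3, 0.7]),
})

# random_state fixes the shuffle; stratify keeps the big/small ratio
# similar in the train and test parts.
train_demo, test_demo = train_test_split(
    df_demo, test_size=0.25, random_state=42, stratify=df_demo["City_size"]
)

X_train_demo = train_demo.drop(columns=["City_size"])
y_train_demo = train_demo["City_size"]
X_test_demo = test_demo.drop(columns=["City_size"])
y_test_demo = test_demo["City_size"]

print(X_train_demo.shape, X_test_demo.shape)
print(y_train_demo.value_counts(normalize=True).round(2))
print(y_test_demo.value_counts(normalize=True).round(2))
```

With 200 rows and `test_size=0.25` (also the default), the test part has 50 rows, which matches the support shown in the reports of this notebook.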

### 2. Train the model and test it
First of all, let's look at the hyperparameters that `LogisticRegression` has:

* fit_intercept: True or False, whether to add a constant term (the intercept $\beta_0$) to the model
* penalty: the regularization to apply, one of l1/l2/elasticnet/none:
    * L1: drives some $\beta$ to exactly zero, so those variables no longer affect the prediction; this is a way of forcing variable selection
    * L2 or Ridge: shrinks the weights of the model (the $\beta$) towards zero without making them exactly zero
    * elasticnet: combines the L1 and L2 penalties, and you can choose the proportion of each one
* tol: the tolerance for the stopping criterion; it tells the algorithm to stop searching for a minimum once this tolerance is achieved
* C: the inverse of the regularization strength; smaller values specify stronger regularization, which helps to reduce overfitting
* class_weight: balanced/None (or a dictionary with one weight per class), adds weights to the classes in order to deal with unbalanced classes
* solver: newton-cg/lbfgs/liblinear/sag/saga, the algorithm that will solve the optimization problem. Explaining all of these methods would need a long article of its own; if you want to know more you can read [this brilliant answer on Stack Overflow](https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-defintions)
* max_iter: maximum number of iterations the solver may take to converge
* multi_class: ovr/multinomial/auto, the way we want to solve the multiclass problem
* warm_start: True/False, when True the solution of the previous call to fit is reused as the starting point of the next one, which can reduce the training time. It works with the lbfgs, newton-cg, sag and saga solvers
* l1_ratio: only used with the elasticnet penalty; a number between 0 and 1 that sets the mix of L1 and L2 regularization, where 0 is equal to L2 regularization and 1 is equal to L1 regularization
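
The effect of `penalty` and `C` is easier to see on data. The following is a rough sketch on synthetic data from `make_classification` (not the advertising dataset), where only 3 of the 20 features are informative. It counts how many coefficients each penalty sets to exactly zero and compares the size of the weights for two values of `C`. The argument names follow the scikit-learn version used in this notebook; newer releases may emit deprecation warnings for some of them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, only 3 of them informative.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=20, n_informative=3, n_redundant=0, random_state=0
)
X_syn = StandardScaler().fit_transform(X_syn)

# liblinear supports both the l1 and the l2 penalty.
l1_model = LogisticRegression(penalty="l1", C=0.05, solver="liblinear").fit(X_syn, y_syn)
l2_model = LogisticRegression(penalty="l2", C=0.05, solver="liblinear").fit(X_syn, y_syn)

n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
print("coefficients set to zero with l1:", n_zero_l1)
print("coefficients set to zero with l2:", n_zero_l2)

# Smaller C means stronger regularization, so the l2 weights shrink.
weak_model = LogisticRegression(penalty="l2", C=10.0, solver="liblinear").fit(X_syn, y_syn)
norm_strong = float(np.linalg.norm(l2_model.coef_))
norm_weak = float(np.linalg.norm(weak_model.coef_))
print("weight norm with C=0.05:", round(norm_strong, 3))
print("weight norm with C=10:  ", round(norm_weak, 3))
```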

In [82]:
LogisticRegression?

In [103]:
lr = LogisticRegression(fit_intercept=True, penalty='l2', tol=1e-5, C=0.8, solver='lbfgs', max_iter=60,
                       warm_start=True)
lr_default = LogisticRegression()

In [104]:
lr.fit(X_train, y_train)
preds_train = lr.predict(X_train)
preds_test = lr.predict(X_test)
print('accuracy in train:', accuracy_score(y_train, preds_train))
print('accuracy in test:', accuracy_score(y_test, preds_test))

accuracy in train: 0.9466666666666667
accuracy in test: 0.92




This fit raises a ConvergenceWarning, which means that the algorithm stopped because it reached our max_iter limit before converging, so if we allow more iterations we will probably obtain a better result.
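
A small sketch of how this warning can be detected in code rather than read from the cell output. It uses synthetic data and a hypothetical helper, `fit_and_check`, that records warnings while fitting; the fitted model's `n_iter_` attribute shows how many iterations the solver actually ran.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic data, only for illustration.
X_cw, y_cw = make_classification(n_samples=300, n_features=10, random_state=0)

def fit_and_check(max_iter):
    """Fit with the given max_iter and report whether a ConvergenceWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model = LogisticRegression(solver="lbfgs", max_iter=max_iter).fit(X_cw, y_cw)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model, warned

model_short, warned_short = fit_and_check(max_iter=2)
model_long, warned_long = fit_and_check(max_iter=1000)

# n_iter_ holds the number of iterations the solver actually ran.
print("max_iter=2    -> warning:", warned_short, "iterations:", model_short.n_iter_)
print("max_iter=1000 -> warning:", warned_long, "iterations:", model_long.n_iter_)
```

If the warning persists even with many iterations, scaling the features (for example with `StandardScaler`) is a common remedy, since the solvers tend to converge faster on features with similar ranges.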

In [106]:
lr2 = LogisticRegression(fit_intercept=True, penalty='l2', tol=1e-5, C=0.8, solver='lbfgs', max_iter=75,
                       warm_start=True)

lr2.fit(X_train, y_train)
preds_train2 = lr2.predict(X_train)
preds_test2 = lr2.predict(X_test)
print('accuracy in train:', accuracy_score(y_train, preds_train2))
print('accuracy in test:', accuracy_score(y_test, preds_test2))

accuracy in train: 0.9733333333333334
accuracy in test: 0.96


We have improved the accuracy by almost 3 percentage points in train and 4 in test, which is great. Now we are going to train the default logistic regression of sklearn.

In [108]:
lr_default.fit(X_train, y_train)
preds_train_default = lr_default.predict(X_train)
preds_test_default = lr_default.predict(X_test)
print('accuracy in train:', accuracy_score(y_train, preds_train_default))
print('accuracy in test:', accuracy_score(y_test, preds_test_default))

accuracy in train: 0.9066666666666666
accuracy in test: 0.92




Here we can see the power of the hyperparameters: if we understand our dataset and what each hyperparameter does, we can get much better results from our models. Compared with the default model, ours improves the accuracy by about 7 percentage points in train and 4 in test.
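
With a test set of only 50 rows, each sample is worth 2 percentage points of accuracy, so a comparison on a single split can be noisy. One possible way to compare two configurations more robustly is cross-validation. The sketch below uses `cross_val_score` on synthetic data with the same shape as our features (200 rows, 4 columns); the data and the `candidates` dictionary are illustrative assumptions, not part of the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with 200 rows and 4 features, like the advertising table.
X_cv, y_cv = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0, random_state=0
)

candidates = {
    "tuned": LogisticRegression(C=0.8, tol=1e-5, solver="lbfgs", max_iter=1000),
    "default": LogisticRegression(),
}

cv_scores = {}
for name, model in candidates.items():
    # 5-fold cross-validation: five train/test splits instead of one.
    cv_scores[name] = cross_val_score(model, X_cv, y_cv, cv=5, scoring="accuracy")
    print(name, "mean accuracy:", cv_scores[name].mean().round(3),
          "std:", cv_scores[name].std().round(3))
```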

Now let's use the classification report to see in more detail how the hyperparameters have improved the model.

In [127]:
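print('Classification report of our model\n')
# classification_report expects (y_true, y_pred). The predictions are passed first here,
# so the precision and recall columns are swapped and support counts the predicted labels.
print(classification_report(preds_test2, y_test))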

Classification report of our model

              precision    recall  f1-score   support

         big       0.88      1.00      0.94        15
       small       1.00      0.94      0.97        35

    accuracy                           0.96        50
   macro avg       0.94      0.97      0.95        50
weighted avg       0.96      0.96      0.96        50



In [128]:
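print('Classification report of sklearn default model \n')
# The predictions are passed first here, so relative to the (y_true, y_pred) order that
# classification_report expects, the precision and recall columns are swapped.
print(classification_report(preds_test_default, y_test))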

Classification report of sklearn default model 

              precision    recall  f1-score   support

         big       0.82      0.93      0.87        15
       small       0.97      0.91      0.94        35

    accuracy                           0.92        50
   macro avg       0.90      0.92      0.91        50
weighted avg       0.93      0.92      0.92        50
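
The two reports above were produced by passing the predictions as the first argument, while scikit-learn documents the order as `(y_true, y_pred)`. Accuracy and f1-score are not affected by the order, but precision and recall trade places. The toy labels below are made up to show the effect.

```python
from sklearn.metrics import classification_report, precision_score, recall_score

# Made-up labels: two true "big" cities, one of them predicted as "small".
y_true_demo = ["big", "big", "small", "small", "small"]
y_pred_demo = ["big", "small", "small", "small", "small"]

# Documented order: true labels first, predictions second.
print(classification_report(y_true_demo, y_pred_demo))

precision_ok = precision_score(y_true_demo, y_pred_demo, pos_label="big")
recall_ok = recall_score(y_true_demo, y_pred_demo, pos_label="big")

# Swapping the arguments swaps precision and recall.
precision_swapped = precision_score(y_pred_demo, y_true_demo, pos_label="big")
recall_swapped = recall_score(y_pred_demo, y_true_demo, pos_label="big")

print("correct order:", precision_ok, recall_ok)
print("swapped order:", precision_swapped, recall_swapped)
```

So when reading the reports above, the column labelled precision is the recall of the model and vice versa.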

