# Logistic Regression

- It's a classification technique based on a linear combination of the input variables $x_1, x_2, \dots, x_n$.
- Also called the _logit_ model.

Example: $f(X) = \alpha + b_1x_1 + b_2x_2 +\dots + b_nx_n$.

## Sigmoid Function
- The sigmoid is the function logistic regression uses to map any real number $z$ to a value between $0$ and $1$: $\sigma(z) = \frac{1}{1 + e^{-z}}$
- The goal of logistic regression is to find a function $p($**x**$)$ whose predictions $p(x_i)$ stay as close as possible to the real values $y_i$ for each sample $i = 1, 2, \dots, n$
- Here, we deal with binary classification first
- The real values $y_i$ are therefore only $0$ or $1$, while $p(x_i)$ is a probability between $0$ and $1$
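
The sigmoid described above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original notebook; the helper name `sigmoid` and the sample inputs are assumptions chosen only to show that large negative inputs go towards 0, large positive inputs go towards 1, and an input of 0 gives exactly 0.5.

```python
import numpy as np


def sigmoid(z):
    """Map any real number (or array of numbers) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical inputs, chosen only for illustration
z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))
```

Because the output always lies strictly between 0 and 1, it can be read as the probability of a sample belonging to class 1.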


## Linear combination and Sigmoid function

- Logistic regression determines the best values for the bias $\alpha$ and the weights $b_1, b_2, \dots, b_n$
- With those values, the function $p($**x**$)$ should be as close as possible to the real values
- _Fitting_ is the optimization process that finds the best parameters (the bias and the weights)

In this case, the input to the Sigmoid function ($p$) is the output of the linear combination $f(X) = \alpha + b_1x_1 + b_2x_2 +\dots + b_nx_n$, that is, $p(X) = \frac{1}{1 + e^{-f(X)}}$.

## How to tune the bias and the weights?

- The best weights are the ones that maximize the _log-likelihood function_ (LLF)
    - This method is called _maximum likelihood estimation_ (MLE), and the function to maximize is:

\begin{equation}
LLF = \sum_{i=1}^n \left( y_i \log(p(x_i)) + (1 - y_i) \log(1 - p(x_i)) \right).
\end{equation}
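
To make the sum above concrete, here is a minimal sketch that evaluates it with NumPy. The function name `log_likelihood` and the two sets of predicted probabilities are hypothetical, made up only to show that predictions closer to the real classes give a higher (less negative) value.

```python
import numpy as np


def log_likelihood(y, p):
    """Sum of y_i * log(p_i) + (1 - y_i) * log(1 - p_i) over all samples."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


# Hypothetical real classes and two hypothetical sets of predicted probabilities
y_true = np.array([0, 0, 1, 1])
good_predictions = np.array([0.1, 0.2, 0.8, 0.9])   # close to the real classes
bad_predictions = np.array([0.6, 0.7, 0.3, 0.4])    # far from the real classes

print(f'good = {log_likelihood(y_true, good_predictions)}')
print(f'bad  = {log_likelihood(y_true, bad_predictions)}')
```

The value is always negative or zero, since it sums logarithms of probabilities; fitting searches for the bias and weights that push it as close to zero as possible.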

- `Scikit-learn` offers several numerical solvers to optimize the parameters shown above, selected with the `solver` argument of `LogisticRegression`:
   - solver : {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default='lbfgs'
   - It sets the algorithm to use in the optimization problem:
      - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
      - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.
      - 'newton-cg', 'lbfgs', 'sag' and 'saga' handle L2 or no penalty
      - 'liblinear' and 'saga' also handle L1 penalty
      - 'saga' also supports 'elasticnet' penalty
      - 'liblinear' does not support setting ``penalty='none'``
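
As a rough illustration of the `solver` argument, the sketch below fits the same toy data with each solver listed above and stores the training accuracy. The names `x_demo`, `y_demo` and `scores` are assumptions for this example, and `max_iter=10000` is an arbitrary, generous value chosen so the iterative solvers have room to converge. The penalty is left at its default (L2), which all five solvers accept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one input variable, two classes (hypothetical, for illustration only)
x_demo = np.arange(10).reshape(-1, 1)
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

scores = {}
for solver in ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']:
    # Same model, same data; only the optimization algorithm changes
    clf = LogisticRegression(solver=solver, max_iter=10000, random_state=0)
    clf.fit(x_demo, y_demo)
    scores[solver] = clf.score(x_demo, y_demo)

for solver, accuracy in scores.items():
    print(f'{solver:>10}: accuracy = {accuracy:.2f}')
```

On a dataset this small the choice of solver matters little; the differences listed above show up with larger datasets, multiclass problems, or other penalties.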

## Implementation:
### 1) Binary classification with one input variable:

In [1]:
# Importing the necessary libs
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

In [2]:
# Generating the input values (the integers from 0 to 9)
x = np.arange(10)
print(x)
x.shape      # Note the 1-D shape here: LogisticRegression expects a 2-D array, so we need to reshape it

[0 1 2 3 4 5 6 7 8 9]


(10,)

In [4]:
x = x.reshape(-1, 1)       # one column, as many rows as needed: the 2-D shape that fit() expects

# Generating the classes [0, 1] for each sample
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

print(f'x = {x}')
print(f'y = {y}')

x = [[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
y = [0 0 0 0 1 1 1 1 1 1]


In [7]:
# Instantiating the LogisticRegression model

# solver: the algorithm used to optimize the parameters
#    - liblinear: a good option for SMALL, BINARY datasets.
# random_state: any fixed integer works; it just makes sure the result is always the same
model = LogisticRegression(solver='liblinear', random_state=0)

# Training the model with our data
model.fit(x, y)

# Extracting information from the model
print(f'classes = {model.classes_}')
print(f'bias = {model.intercept_}')
print(f'weights = {model.coef_}')

classes = [0 1]
bias = [-1.04608067]
weights = [[0.51491375]]


In [16]:
# Evaluating the model

# Let's look at the probabilities of each class
probabilities = model.predict_proba(x)
estimated_class = model.predict(x)

print(f'probabilities = \n{probabilities}\n')
print(f'estimated_class = \n {estimated_class}\n')
print(f'real_class = \n {y}')

probabilities = 
[[0.74002157 0.25997843]
 [0.62975524 0.37024476]
 [0.5040632  0.4959368 ]
 [0.37785549 0.62214451]
 [0.26628093 0.73371907]
 [0.17821501 0.82178499]
 [0.11472079 0.88527921]
 [0.07186982 0.92813018]
 [0.04422513 0.95577487]
 [0.02690569 0.97309431]]

estimated_class = 
 [0 0 0 1 1 1 1 1 1 1]

real_class = 
 [0 0 0 0 1 1 1 1 1 1]
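
The probabilities printed above can be reproduced by hand, which shows how the linear combination and the sigmoid fit together. This sketch hardcodes the bias and weight printed by the fitted model earlier; the variable names are assumptions for this example, and the 0.5 cut-off is assumed here as the rule that turns a probability into a class.

```python
import numpy as np

# Values copied from the fitted model's printed output
bias = -1.04608067      # model.intercept_
weight = 0.51491375     # model.coef_

x_values = np.arange(10)

# Linear combination, then sigmoid: the probability of class 1
z = bias + weight * x_values
p_class1 = 1.0 / (1.0 + np.exp(-z))
p_class0 = 1.0 - p_class1

# Assumed rule: predict class 1 when its probability reaches 0.5
manual_class = (p_class1 >= 0.5).astype(int)

print(np.column_stack([p_class0, p_class1]))
print(manual_class)
```

The two columns should match the output of `predict_proba` up to rounding, and `manual_class` should match the output of `predict`.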


## If we compare the probabilities with the predicted values..
- Sample 1: [0.74002157 0.25997843] = [class 0, class 1]   --->  class 0 has the higher probability of being the right class
- Sample 3: [0.5040632 0.4959368] = [class 0, class 1]   --->  the values are really close.

- So, we have a probability associated with each predicted class and, depending on the problem, we may need that value to make confident decisions.

#### Example: 
   - If we are predicting whether a person has cancer based on their exams, should the doctor tell the patient they have cancer when the chance is close to 50/50? Probably not: the doctor may prefer to repeat the exam and check other indicators first.
   - On the other hand, if the probability of the patient having cancer is 95%, the doctor should be alerted and investigate the patient's medical record to confirm it.
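
One way to act on this idea is to choose the class with a custom probability threshold instead of relying on `predict`. The sketch below is only an illustration: the class-1 probabilities are copied from the output printed earlier, and the threshold of 0.7 is a hypothetical value, not a recommendation.

```python
import numpy as np

# Class-1 probabilities copied from the predict_proba output shown earlier
p_class1 = np.array([0.25997843, 0.37024476, 0.4959368, 0.62214451, 0.73371907,
                     0.82178499, 0.88527921, 0.92813018, 0.95577487, 0.97309431])

# Hypothetical threshold: only say "class 1" when the model is at least 70% sure
threshold = 0.7
cautious_class = (p_class1 >= threshold).astype(int)

print(cautious_class)
# prints [0 0 0 0 1 1 1 1 1 1]
```

With this stricter threshold the fourth sample (probability 0.62) is no longer assigned to class 1, which happens to match the real classes in this toy dataset. The right threshold depends on the cost of each kind of mistake in the problem at hand.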
   

In [14]:
# Accuracy is another important metric to evaluate the model
# - it ranges from 0 to 1 and is computed as (correct answers / total)

print('Having 10 samples, the model correctly predicted 9, so the accuracy is:')
model.score(x, y)

Having 10 samples, the model correctly predicted 9, so the accuracy is:


0.9

In [18]:
# Another way of computing the accuracy metric:
from sklearn.metrics import accuracy_score
accuracy_score(y, estimated_class)

0.9

## Confusion Matrix

- With the confusion matrix we can retrieve more detailed information from the classification model
- It splits the predictions into four counts:

    - (True negatives - **TN**): zeros correctly predicted as zeros
    - (True positives - **TP**): ones correctly predicted as ones
    - (False negatives - **FN**): ones wrongly predicted as zeros
    - (False positives - **FP**): zeros wrongly predicted as ones

![confusion matrix](https://miro.medium.com/max/2102/1*fxiTNIgOyvAombPJx5KGeA.png)

In [19]:
cm = confusion_matrix(y, estimated_class)
print('CM = ')
print(cm)

CM = 
[[3 1]
 [0 6]]
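
To read the matrix above, it can help to unpack its four cells by name. In scikit-learn the rows are the real classes and the columns are the predicted classes, so for a binary problem `ravel()` returns the counts in the order TN, FP, FN, TP. The sketch below hardcodes the real and estimated classes from the earlier output; `y_true`, `y_pred`, `precision` and `recall` are names assumed for this example.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Real and estimated classes copied from the output shown earlier
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

# Rows = real class, columns = predicted class
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f'TN = {tn}, FP = {fp}, FN = {fn}, TP = {tp}')
# prints TN = 3, FP = 1, FN = 0, TP = 6

# Two metrics derived from the counts, for class 1
precision = tp / (tp + fp)   # of the samples predicted as 1, how many really are 1
recall = tp / (tp + fn)      # of the samples that really are 1, how many were found
print(f'precision = {precision:.3f}, recall = {recall:.3f}')

# The same metrics, per class, in a single report
print(classification_report(y_true, y_pred))
```

The single mistake is a false positive: the fourth sample is really class 0 but was predicted as class 1.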
