# Logistic Regression

## Types of Logistic Regression

### Binomial Logistic Regression (binary)

As the name suggests, binomial (binary) logistic regression applies when the target has just two classes, e.g. "Yes" or "No", "win" or "loss", "dead" or "alive".

The logistic regression model takes the continuous output of a linear function and passes it through a sigmoid function. The result is a probability, which is then thresholded to obtain a categorical output.

![image.png](attachment:image.png)


The sigmoid maps any real-valued input to a value between 0 and 1.

### From Linear Regression to Logistic Regression


You have your data as ![image.png](attachment:image-2.png)

And for y you have binary values (e.g. 0 and 1).

When you apply the multiple linear function to the input variables X, you have:

![image.png](attachment:image.png)

Here xi is the i-th input value, wi is its coefficient (weight), and b is the intercept (bias).

What does that mean? Simplified, z is the dot product of the weights and the inputs, plus the intercept:

    z = w · X + b

Does that ring a bell?

![image.png](attachment:image-3.png)


Up to this point, this is just the principle of linear regression.

Then we use z as the input to the sigmoid function, which returns a value between 0 and 1.

![image.png](attachment:image.png)
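The two steps above (linear combination, then sigmoid) can be sketched in a few lines of NumPy. This is only an illustration: the values of `X`, `w` and `b` are made up, and the 0.5 threshold is the usual convention rather than something fixed by the model.

```python
import numpy as np


def sigmoid(z):
    """Map any real value (or array of values) into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical values, chosen only for illustration
X = np.array([[0.5, 1.2], [-1.0, 0.3], [2.0, -0.7]])  # 3 observations, 2 features
w = np.array([0.8, -0.4])                              # one weight per feature
b = 0.1                                                # intercept (bias)

z = X @ w + b                     # linear part: z = w · X + b
p = sigmoid(z)                    # probabilities between 0 and 1
labels = (p >= 0.5).astype(int)   # threshold the probabilities to get classes

print("z      =", z)
print("p      =", p)
print("labels =", labels)
```

Note that `sigmoid(0)` is exactly 0.5, so thresholding the probability at 0.5 is the same as checking whether z is positive.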


### Practical example of binomial Logistic Regression

Here we will develop a Logistic Regression model for breast cancer.



In [None]:
# import the necessary libraries



We will use this data: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic

You can load it from `sklearn.datasets` by importing `load_breast_cancer`.

The website also offers other ways to download the data; use whichever you prefer.

In [None]:
from sklearn.datasets import load_breast_cancer
# load the breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)

In [None]:
X

In [None]:
y

Split your data into training and test sets.

Import `LogisticRegression` and then fit it to your training data.

In [None]:
# LogisticRegression


Predict on your test set.

In [None]:
# Prediction


Look at your predictions.

- Has the sigmoid function already been applied?
- What do you need to do?
- Are they ready to be compared with your target?
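One way to explore these questions is to compare the three kinds of output a fitted scikit-learn `LogisticRegression` can give. The sketch below is self-contained; the split size, `random_state` and `max_iter` are arbitrary choices made for the illustration (the large `max_iter` is only there to help the solver converge on unscaled features).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed settings for the illustration, not tuned values
model = LogisticRegression(max_iter=10000).fit(X_train, y_train)

sample = X_test[:5]
z = model.decision_function(sample)        # raw linear scores z = w · x + b
proba = model.predict_proba(sample)[:, 1]  # sigmoid(z): probability of class 1
labels = model.predict(sample)             # class labels, comparable with y_test

print("scores        :", z)
print("probabilities :", proba)
print("labels        :", labels)
print("true values   :", y_test[:5])
```

In this binary setting, `predict` returns class labels directly, while `predict_proba` exposes the sigmoid output if you want to apply your own threshold.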

In [None]:
y_pred

In [None]:
y_test

Calculate the accuracy of your model.
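If you get stuck, here is one possible end-to-end sketch of the steps above: split, fit, predict and compute the accuracy. The test size, `random_state`, `stratify` and `max_iter` are assumptions made for this example; your own choices may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the data for testing, keeping the class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# max_iter is raised so the default solver can converge on unscaled data
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Accuracy is the fraction of test samples predicted correctly
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
```

`model.score(X_test, y_test)` returns the same number, since accuracy is the default score of scikit-learn classifiers.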

### Multinomial Logistic Regression

Now your target can have more than two classes, so you use Multinomial Logistic Regression. One example could be “disease A” vs “disease B” vs “disease C”.

#### What changes from Binomial Logistic Regression?


Instead of using the **sigmoid function** we will use the **softmax function**!

![image.png](attachment:image.png)

Need more information about this? https://deepai.org/machine-learning-glossary-and-terms/softmax-layer
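As a rough sketch, softmax turns a vector of scores (one per class) into probabilities that sum to 1. The scores below are made up for the example, and subtracting the maximum is a common numerical-stability trick that does not change the result.

```python
import numpy as np


def softmax(z):
    """Convert a vector (or rows) of scores into probabilities that sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # avoid overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)


# Hypothetical scores for three classes, e.g. disease A, B and C
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

print("probabilities  :", probs)
print("sum            :", probs.sum())
print("predicted class:", np.argmax(probs))
```

The predicted class is simply the one with the highest probability, which is also the one with the highest score.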


For this problem we will use a data set based on handwritten digits.

The handwritten digits are 32x32 bitmaps. They are divided into non-overlapping blocks of 4x4, and the number of "on" pixels is counted in each block.

This generates an input matrix of 8x8 where each element is an integer in the range 0..16. This reduces dimensionality and gives invariance to small distortions.

What does that mean?

![image.png](attachment:image.png)

As you can see in the image above, we have 32x32 small squares in the image. Each 4x4 block of squares was then merged to represent one data point in our data.

You may be wondering: how?

Each block is represented by an integer: the number of "on" pixels in the block, which you can read as the "darkness" of that area of the image. Since a block contains 16 pixels, the value ranges from 0 to 16.
So if you look at the data, each sample is an 8x8 matrix with integers from 0 to 16.

Let's load the data.

In [None]:
# Look at the way we import the data and the models; there are several ways to do it.
# You do not need to know all of them, use the one you like the most.

# Import what is necessary
from sklearn import datasets, linear_model, metrics
#from sklearn.datasets import load_digits

digits = datasets.load_digits()
 
# defining feature matrix(X) and response vector(y)
X = digits.data
y = digits.target
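To see the 8x8 structure described above, you can inspect a single sample. The sketch below reloads the data so it can run on its own; `digits.images` holds the same values as `digits.data`, just kept as 8x8 matrices instead of flattened rows.

```python
from sklearn.datasets import load_digits

digits = load_digits()

print(digits.data.shape)    # flattened: one row of 64 values per image
print(digits.images.shape)  # the same values as 8x8 matrices

first = digits.images[0]
print(first)                       # block counts of the first image
print(first.min(), first.max())    # values stay within the range 0..16
print("label:", digits.target[0])  # the digit this image represents
```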

Split the data into training and test sets.

In [None]:
# splitting X and y into training and testing sets


Model

In [None]:
# create logistic regression object

 
# train the model using the training sets


Predict

In [None]:
# making predictions on the testing set


Accuracy

In [None]:
#Add your code here
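For reference, one possible version of the four steps above (split, model, predict, accuracy) is sketched here. The test size, `random_state`, `stratify` and `max_iter` are assumptions for the example, not recommended values.

```python
import numpy as np
from sklearn import datasets, linear_model, metrics
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X, y = digits.data, digits.target

# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# create logistic regression object (max_iter raised to help convergence)
reg = linear_model.LogisticRegression(max_iter=5000)

# train the model using the training sets
reg.fit(X_train, y_train)

# making predictions on the testing set
y_pred = reg.predict(X_test)
proba = reg.predict_proba(X_test)  # one probability per class, per sample

accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print("Probabilities for the first test sample:", np.round(proba[0], 3))
```

Each row of `predict_proba` has ten entries, one per digit, and the predicted label is the class with the highest probability.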

If you want more details and more mathematical intuition about Logistic Regression, you can check this website: https://www.geeksforgeeks.org/understanding-logistic-regression/


### And what about the parameters of the model?

Logistic Regression from Scikit-Learn has the following parameters:

**penalty**='l2', 

**dual**=False, 

**tol**=0.0001, 

**C**=1.0, 

**fit_intercept**=True, 

**intercept_scaling**=1, 

**class_weight**=None, 

**random_state**=None, 

**solver**='lbfgs', 

**max_iter**=100, 

**multi_class**='auto', 

**verbose**=0, 

**warm_start**=False, 

**n_jobs**=None, 

**l1_ratio**=None

Each parameter is shown with its default value.

### Do you know what they mean?
Write your answer here: 

After you have written your answer, please read the documentation:

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html


And this one (just section 1.1.11 and its subsections):

https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Note: the two links do not explain the penalty in detail. Below you can find more information about the penalty (also called regularization).

- L2 adds the “squared magnitude” of the coefficients as a penalty term to the loss function. The highlighted part represents the L2 regularization element.
![image.png](attachment:image.png)

If lambda is zero, the penalty term vanishes and we get back the unregularized model (for linear regression, that is OLS, Ordinary Least Squares). However, if lambda is very large, the penalty dominates and the model will under-fit. So it matters how lambda is chosen. Used well, this technique is effective at avoiding over-fitting. Note that in scikit-learn, `C` is the inverse of the regularization strength, so a smaller `C` means a stronger penalty.

_________________________________________________________________________


- L1 adds the “absolute value of the magnitude” of the coefficients as a penalty term to the loss function.
![image.png](attachment:image-2.png)

Again, if lambda is zero we get back the unregularized model, whereas a very large value will push the coefficients to zero, so the model will under-fit.

____________________________________________________

**The key difference between these techniques is that L1 shrinks the coefficients of the less important features to zero, thus removing some features altogether. So L1 works well for feature selection when we have a huge number of features.**
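A quick way to see this difference is to fit the same model with each penalty and count the coefficients that are exactly zero. The sketch below is an illustration under assumptions: `C=0.1`, the `liblinear` solver (one of the solvers that supports L1) and standardized features are choices made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Standardize so the penalty treats all 30 features on the same scale
X_scaled = StandardScaler().fit_transform(X)

# Same C (inverse regularization strength) for both, only the penalty differs
l2_model = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X_scaled, y)
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X_scaled, y)

n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))

print("Coefficients set to zero with L2:", n_zero_l2)
print("Coefficients set to zero with L1:", n_zero_l1)
```

With L2 the coefficients get smaller but generally stay non-zero, while L1 drops part of the features entirely.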

Answer the following questions:

### Question 1: Which parameters would you use for the breast cancer dataset?
(Try to think which ones would be the best, do not change this answer afterwards)

### Question 2: Perform a search (Random Search, Grid Search, ...) over some parameters to tune the model for the breast cancer dataset.
(Write here the best parameters found by the search)

In [None]:
#Add your code here
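As a starting point, a grid search could look like the sketch below. The grid values, the `liblinear` solver, the scaling step and the 5-fold cross-validation are all assumptions for the example; the parameters you choose to search may be different.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling lives inside the pipeline so it is refit on each cross-validation fold
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="liblinear", max_iter=1000),
)

# Hypothetical grid: parameter names are prefixed with the pipeline step name
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1, 10, 100],
    "logisticregression__penalty": ["l1", "l2"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters  :", search.best_params_)
print("Best CV accuracy :", round(search.best_score_, 3))
print("Test accuracy    :", round(search.score(X_test, y_test), 3))
```

After fitting, `search` behaves like the best model refit on the whole training set, so it can be used directly for `predict` and `score`.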

### Question 3: Check the model performance with the new parameters. Is there a difference?
(Try to think why there is or there isn't a difference, then write down your thoughts)

In [None]:
#Add your code here

### Question 4: Check the Digits dataset, for this one which parameters would you use?
(Write which ones you believe to be the best, do not change this answer afterwards)

### Question 5: Perform a search (Random Search, Grid Search, ...) over some parameters to tune the model for the Digits dataset.
(Write here the best parameters found by the search)

In [None]:
#Add your code here
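For the Digits dataset, a random search is sketched below as an alternative to the grid search. The range for `C`, the number of sampled candidates (`n_iter=8`), the 3-fold cross-validation and the scaling step are assumptions made to keep the example short and fast.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))

# Sample C on a log scale instead of listing fixed values (hypothetical range)
param_distributions = {"logisticregression__C": loguniform(1e-3, 1e2)}

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=8,          # number of random candidates to try
    cv=3,
    scoring="accuracy",
    random_state=0,    # makes the sampled candidates reproducible
)
search.fit(X_train, y_train)

print("Best parameters  :", search.best_params_)
print("Best CV accuracy :", round(search.best_score_, 3))
print("Test accuracy    :", round(search.score(X_test, y_test), 3))
```

A random search samples a fixed number of candidates, so its cost is controlled by `n_iter` rather than by the size of a grid.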

### Question 6: Check the model performance with the new parameters. Is there a difference?
(Try to think why there is or there isn't a difference, then write down your thoughts)

In [None]:
#Add your code here