In [10]:
from IPython.display import HTML
css_file = './custom.css'
HTML(open(css_file, "r").read())

# Logistic Regression

© 2018 Daniel Voigt Godoy

## 1. Definition

Logistic Regression is used for ***classification***, even though it's called ***regression***.

Therefore, it works on ***categorical labels***, namely, 0 and 1 for ***binary classification***. 

***Logistic Regression*** is a model that makes predictions in the [0, 1] interval, denoting ***probabilities***. Labels of the ***negative class*** are associated with 0, while labels of the ***positive class*** are associated with 1. So, the output is the ***probability that a sample belongs to the positive class***.

Why is it called ***regression*** then? It computes a ***linear combination*** of the features, just like a ***linear regression*** does, and ***squishes*** the result using a ***Logistic / Sigmoid*** function.

$$
\hat{p} = \sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(b + w_1x_1 + w_2x_2 + \dots + w_nx_n)}}
$$

![sigmoid](https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/320px-Logistic-curve.svg.png)
<center>Source: Wikipedia</center>
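
The formula above can be sketched in a few lines of NumPy. This is only an illustration: the function name `sigmoid` and the values chosen for the bias and the weight are assumptions, not part of the `intuitiveml` package.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values (assumptions): one feature, bias b and weight w1
b, w1 = 0.5, 2.0
x1 = np.array([-3.0, 0.0, 3.0])

z = b + w1 * x1        # the linear part
p_hat = sigmoid(z)     # squished into probabilities
print(p_hat)
```

The larger the value of $z$, the closer the output gets to 1; at $z = 0$ the output is exactly 0.5.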

Since its output is a ***probability***, we need to ***threshold*** it to get the predicted class. The default threshold is 0.5:

$$
\hat{y} = 
\begin{cases} 0 &\mbox{if } \hat{p} \lt 0.5 \\
1 & \mbox{if } \hat{p} \geq 0.5
\end{cases}
$$

It is possible to ***change the threshold*** to achieve different goals, such as reducing ***false positives*** or ***false negatives***. The next lesson on evaluation metrics will cover this topic in more depth.
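
As a minimal sketch of thresholding, assuming some made-up probabilities and a hypothetical helper named `predict_class`:

```python
import numpy as np

def predict_class(p_hat, threshold=0.5):
    """Return 1 where the probability is at or above the threshold, else 0."""
    return (np.asarray(p_hat) >= threshold).astype(int)

p_hat = np.array([0.10, 0.45, 0.50, 0.80])  # made-up probabilities

default_labels = predict_class(p_hat)                 # threshold = 0.5
strict_labels = predict_class(p_hat, threshold=0.7)   # fewer positive predictions
print(default_labels, strict_labels)
```

Raising the threshold makes the model more reluctant to predict the positive class, which tends to trade false positives for false negatives.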

You can observe this behavior in the ***interactive example*** below. The sliders allow you to control the ***bias*** and ***coefficient*** of the simple linear regression that is going to be ***squished*** by the ***sigmoid function***.

In [2]:
from intuitiveml.supervised.classification.LogisticRegression import *

In [3]:
plotLogistic.plot_sigmoid_curve(x=np.linspace(-3, 3, 100))

[Interactive figure: sigmoid curve with sliders for the bias and the coefficient]

### 1.1 Loss Function

How do we train the model? Unlike linear regression, ***Logistic Regression*** uses ***binary cross entropy*** (also called ***log loss***) as its loss function.

What does it mean? For each sample, it takes the ***negative log*** of the probability the model assigned to the sample's ***true class*** (positive or negative), and then averages it over all samples. For a single instance:

$$
loss = 
\begin{cases} -\log(\hat{p}) &\mbox{if } y = 1 \\
-\log(1-\hat{p}) & \mbox{if } y = 0
\end{cases}
$$

And, for all $m$ instances, where $\hat{p}^{(i)}$ is the predicted probability for the $i$-th instance:

$$
J = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \cdot \log(\hat{p}^{(i)}) + (1-y^{(i)}) \cdot \log(1-\hat{p}^{(i)}) \right]
$$
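
The loss over all instances can be sketched in NumPy as follows. The function name `binary_cross_entropy`, the clipping constant `eps` and the example values are assumptions made for illustration only.

```python
import numpy as np

def binary_cross_entropy(y, p_hat, eps=1e-12):
    """Average log loss over all samples; eps avoids taking log(0)."""
    y = np.asarray(y, dtype=float)
    p_hat = np.clip(np.asarray(p_hat, dtype=float), eps, 1 - eps)
    per_sample = -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    return per_sample.mean()

y = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.9, 0.1])  # high probability on the true class
confident_wrong = np.array([0.1, 0.9, 0.1, 0.9])  # high probability on the wrong class

loss_right = binary_cross_entropy(y, confident_right)
loss_wrong = binary_cross_entropy(y, confident_wrong)
print(loss_right, loss_wrong)
```

Confident predictions on the wrong class are penalized much more heavily than confident predictions on the true class.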

I've written a very thorough explanation of this loss function on Towards Data Science: [Understanding binary cross-entropy / log loss: a visual explanation](https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a)

## 2. Experiment

Time to try it yourself!

You have 8 data points, either ***green (positive)*** or ***red (negative)***.

There is only ***one feature***, represented on the horizontal axis. The ***y axis*** is the probability output of your ***Logistic Regression***.

You want to start training your logistic regression, so you need to find both the ***bias*** (intercept) $b$ and the ***single weight*** $w_1$ that minimize the ***log loss***.

The sliders below allow you to change both values, and you can observe the effect they have on the distribution of losses (on the upper right plot), as well as the ***log loss***.

Use the sliders to play with different configurations and answer the ***questions*** below.

In [4]:
mylr = plotLogistic(x=(-1, 0), n_samples=8, betas=(2, 1))
vb1 = VBox(build_figure_fit(mylr))
vb1.layout.align_items = 'center'

In [5]:
vb1

[Interactive figure: data points with the fitted sigmoid curve, the distribution of losses, and sliders for $b$ and $w_1$]

#### Questions

1. What happens to the probabilities as you increase $w_1$? What about the losses?

2. What happens to the probabilities as you increase $b$? What about the losses?

3. Try to ***minimize*** the log loss.

## 3. Scikit-Learn

[Logistic Regression](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression)

Please check Aurélien Géron's "Hands-On Machine Learning with Scikit-Learn and TensorFlow" notebook on Linear Models [here](http://nbviewer.jupyter.org/github/ageron/handson-ml/blob/master/04_training_linear_models.ipynb).
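
As a quick orientation, here is a minimal sketch of fitting Scikit-Learn's `LogisticRegression` on a single feature. The dataset is made up for illustration and is not the data used in this lesson.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up one-feature dataset (assumption, not the data used in this lesson)
X = np.array([[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

clf = LogisticRegression()   # L2 regularization with C=1.0 by default
clf.fit(X, y)

b = clf.intercept_[0]                # bias
w1 = clf.coef_[0, 0]                 # single weight
p_hat = clf.predict_proba(X)[:, 1]   # probability of the positive class
y_hat = clf.predict(X)               # predicted classes (0.5 threshold)
print(b, w1)
```

The parameter `C` is the inverse of the regularization strength, so a larger `C` means weaker regularization.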

## 4. More Resources

[InfoGraphic](https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/Info-graphs/Day%204.jpg)

## 5. Keras

Just like we did with Linear Regression, we can also build a simple one-neuron network to train a ***Logistic Regression***. The model differs in ***two*** ways:

1. It has a ***sigmoid activation*** (instead of linear)
2. It uses ***binary cross-entropy*** as loss (instead of MSE)

Effectively, it computes:

$$
\hat{p} = \sigma(z) = \sigma(b + w_1x_1)
$$

If you compare the ***weights*** of a model trained using Keras versus a model trained using Scikit-Learn, you'll see they are somewhat different.

This is because Scikit-Learn uses a different optimizer and applies L2 regularization by default. This is ***not*** the case for the simple neural network we built using Keras.

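```python
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD

# A single neuron with a sigmoid activation is a logistic regression
model = Sequential()
model.add(Dense(input_dim=1, units=1, activation='sigmoid', kernel_initializer='glorot_uniform'))
# Binary cross-entropy (log loss), optimized with stochastic gradient descent
model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.1))
# mylr is the plotLogistic object created in the Experiment section
model.fit(mylr.x, mylr.y, epochs=100)
```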

In [6]:
import warnings
warnings.filterwarnings("ignore")
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD

Using TensorFlow backend.


In [7]:
model = Sequential()
model.add(Dense(input_dim=1, units=1, activation='sigmoid', kernel_initializer='glorot_uniform'))
model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.1))
model.fit(mylr.x, mylr.y, epochs=100, verbose=False)
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 1)                 2         
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________


```python
model.get_weights()
```

In [8]:
print(model.get_weights())

[array([[1.2090513]], dtype=float32), array([-0.45385373], dtype=float32)]
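
The two arrays above are the weight and the bias of the single neuron. As a rough sketch of what training does under the hood, the NumPy code below fits the same kind of model with plain batch gradient descent and no regularization. The dataset is made up (it is not `mylr`'s data), and Keras uses mini-batches and a random initialization by default, so the numbers are not expected to match the output above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up one-feature dataset (assumption, not the lesson's data)
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)

def log_loss(b, w1):
    p_hat = sigmoid(b + w1 * x)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

b, w1 = 0.0, 0.0          # initial parameters
learning_rate = 0.1       # same value as the SGD optimizer above
initial_loss = log_loss(b, w1)

for epoch in range(100):
    p_hat = sigmoid(b + w1 * x)
    error = p_hat - y                          # derivative of the loss w.r.t. z
    b -= learning_rate * error.mean()          # gradient w.r.t. the bias
    w1 -= learning_rate * (error * x).mean()   # gradient w.r.t. the weight

final_loss = log_loss(b, w1)
print(b, w1, initial_loss, final_loss)
```

With both parameters at zero every prediction is 0.5, so the initial loss is $\log(2)$; each update then moves the parameters in the direction that lowers the log loss.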


#### This material is copyright Daniel Voigt Godoy and made available under the Creative Commons Attribution (CC-BY) license ([link](https://creativecommons.org/licenses/by/4.0/)). 

#### Code is also made available under the MIT License ([link](https://opensource.org/licenses/MIT)).
