# SLU08 - Classification With Logistic Regression: Exercise notebook

In [None]:
import pandas as pd 
import numpy as np 
import hashlib

In this notebook you will practice the following: 

    - What classification is for
    - Logistic regression
    - Cost function
    - Binary classification
    
You thought that you would get away without implementing your own little Logistic Regression? Hah!


# Exercise 1: Implement the Exponential Part of the Sigmoid Function


In the first exercise you will implement a bit of the sigmoid function. 

Here's a quick reminder of the formula:

$$\hat{p} = \frac{1}{1 + e^{-z}}$$

In this exercise we only want you to complete this bit: $$e^{-z}$$

Where $z$ is a linear equation with two variables plus the intercept: 

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

**Hint: Divide your z into pieces by Betas, I've left the placeholders in there!**
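To make the hint concrete, here is a minimal sketch of splitting $z$ into one piece per Beta before exponentiating. The function name `exp_neg_z_sketch` and its argument order are assumptions made for illustration only; they are not the signature the graded cell expects.

```python
import numpy as np

def exp_neg_z_sketch(x1, x2, beta_0, beta_1, beta_2):
    # Hypothetical helper: name and argument order are assumptions.
    intercept_part = beta_0        # beta_0
    first_part = beta_1 * x1       # beta_1 * x_1
    second_part = beta_2 * x2      # beta_2 * x_2
    z = intercept_part + first_part + second_part
    return np.exp(-z)

# z = 0.5 + 1.0 * 1.0 + (-0.25) * 2.0 = 1.0, so the result is e^(-1)
example_value = exp_neg_z_sketch(x1=1.0, x2=2.0, beta_0=0.5, beta_1=1.0, beta_2=-0.25)
print(round(example_value, 4))
```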

**Complete here:**

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
value_arr = [1, 1, 2, 2, 0.4]

exponential= exponential_z(
    value_arr[0], value_arr[1], value_arr[2], value_arr[3], value_arr[4])


expected_hash_1 = 'dde84a2ea5e05e536408c4ae10402420ebd7d8ee1bce6028ac9e0752914891f5'
assert hashlib.sha256(str(exponential).encode('utf-8')).hexdigest() == expected_hash_1

Expected output:

    Exponential part: 0.02

# Exercise 2: Make a Prediction

The next step is to implement a function that receives an observation and returns the predicted probability.

For instance:

$$\hat{p} = \frac{1}{1 + e^{-z}}$$

Where $z$ is the linear equation. You can't reuse the function from Exercise 1 for the $z$ part, because the inputs are now two arrays: one with the training example and another with the coefficients (intercept first).
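As a rough illustration of working with two arrays, the sketch below computes $z$ with a dot product and then applies the sigmoid. The name `sigmoid_probability_sketch` is hypothetical, and it assumes `coefficients[0]` is the intercept and the remaining entries line up with the observation.

```python
import numpy as np

def sigmoid_probability_sketch(observation, coefficients):
    # Assumption: coefficients[0] is the intercept, coefficients[1:] match the variables.
    observation = np.asarray(observation, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    z = coefficients[0] + np.dot(coefficients[1:], observation)
    return 1.0 / (1.0 + np.exp(-z))

# z = 0 + 1 * 2 + (-2) * 1 = 0, and the sigmoid of 0 is 0.5
p_example = sigmoid_probability_sketch([2.0, 1.0], [0.0, 1.0, -2.0])
print(p_example)
```

Using `np.dot` means the same function works for any number of variables, as long as the coefficient array has one extra entry for the intercept.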

**Complete here:**

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
x = np.array([-1, -1])
coefficients = np.array([0 ,3.2, -1])

expected_hash_2 = 'aef234a36241b1803c64b1e12a9c77eb4934592967c002eafb0802c2f4e3abf7'
assert hashlib.sha256(str(predict_proba(x, coefficients)).encode('utf-8')).hexdigest() == expected_hash_2

x_1 = np.array([-1, -1, 2, 0])
coefficients_1 = np.array([0 ,2, -1, 0.2, 0])

expected_hash_3 = '15430ee47ba42ef9fd64b5a3aab25407a23c1545c6479b91751f7761e809db7f'
assert hashlib.sha256(str(predict_proba(x_1, coefficients_1)).encode('utf-8')).hexdigest() == expected_hash_3

Expected output:

    Predicted probabilities for example with 2 variables:  0.0998
    
    Predicted probabilities for example with 4 variables:  0.3543

# Exercise 3: Compute the Maximum Log-Likelihood Cost Function

Because you will implement stochastic gradient descent, you only have to compute the following for each prediction: 

$$H_{\hat{p}}(y) =  - (y \log(\hat{p}) + (1-y) \log (1-\hat{p}))$$

In the next exercise you will loop through some examples stored in an array and calculate the cost function for the full dataset. Recall that the formula to generalize the cost function across several examples is: 

$$H_{\hat{p}}(y) = - \frac{1}{N}\sum_{i=1}^{N} \left [{ y_i \ \log(\hat{p}_i) + (1-y_i) \ \log (1-\hat{p}_i)} \right ]$$

You will basically simulate what stochastic gradient descent does: compute the log-loss for each example, sum the log-losses, and then average the result across the number of observations in the x dataset/array.
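The loop-sum-average pattern can be sketched as follows. To keep it separate from the graded function, this hypothetical helper `average_log_loss_sketch` takes already-computed probabilities instead of `x` and `coefficients`; it assumes natural logarithms and probabilities strictly between 0 and 1.

```python
import numpy as np

def average_log_loss_sketch(probabilities, targets):
    # Hypothetical helper: works on predicted probabilities, not raw observations.
    probabilities = np.asarray(probabilities, dtype=float)
    targets = np.asarray(targets, dtype=float)
    total = 0.0
    for p_i, y_i in zip(probabilities, targets):
        # Log-loss of a single example
        total += -(y_i * np.log(p_i) + (1 - y_i) * np.log(1 - p_i))
    # Average over the number of observations
    return total / len(targets)

loss_example = average_log_loss_sketch([0.9, 0.2, 0.6], [1, 0, 1])
print(loss_example)
```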

**Complete here:**

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
x = np.array([[-1, -1], [3, 0], [3, 2]])
coefficients = np.array([[0 ,2, -1]])
y = np.array([[1],[1],[0]])
expected_hash_4 = '5f34754c780fd8925a4f86ebcb782c31eb6af9b97ce71bce746c43df2fea1c4d'
assert hashlib.sha256(str(cost_function(x, coefficients, y)).encode('utf-8')).hexdigest() == expected_hash_4

x_1 = np.array([[-1, -1], [3, 0], [3, 2], [1, 0]])
y_1 = np.array([[1],[1],[0],[1]])

expected_hash_5 = '8aedf0c2ad0b89e903585870d0c372fd485d84e2464f1ea01191f9a0ddc49a9c'
assert hashlib.sha256(str(cost_function(x_1, coefficients, y_1)).encode('utf-8')).hexdigest() == expected_hash_5




Expected output:
    
    Computed log loss for first training set:  1.77796243
    
    Computed log loss for second training set:  1.36520383

# Exercise 4: Compute a first pass on Stochastic Gradient Descent

Now that the warmup is done, let's do the most interesting exercise: computing the derivatives and updating our coefficients. Here you will do a full pass over a bunch of examples, computing the gradient descent update for each one of them.

In this exercise, you should compute a single iteration of the gradient descent! 

## Quick reminders:

Remember our formulas for the gradient:

$$\beta_{0(t+1)} = \beta_{0(t)} - learning\_rate \frac{\partial H_{\hat{p}}(y)}{\partial \beta_{0(t)}}$$

$$\beta_{t+1} = \beta_t - learning\_rate \frac{\partial H_{\hat{p}}(y)}{\partial \beta_t}$$

which can be simplified to

$$\beta_{0(t+1)} = \beta_{0(t)} + learning\_rate \left [(y - \hat{p}) \ \hat{p} \ (1 - \hat{p})\right]$$

$$\beta_{t+1} = \beta_t + learning\_rate \left [(y - \hat{p}) \ \hat{p} \ (1 - \hat{p}) \ x \right]$$

You will have to initialize the coefficients in some way. If you have a training set $X$, you can initialize them to zero like this:
```python
coefficients = np.zeros(X.shape[1]+1)
```

where the $+1$ is adding the intercept.

Note: We are doing stochastic gradient descent, so don't forget to go observation by observation, updating the coefficients every time!
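To see what one update looks like in numbers, here is a minimal sketch of a single step for a single observation, using the simplified formulas from the reminders. The helper name `sgd_single_update_sketch` is hypothetical, and it assumes the intercept is stored in `coefficients[0]`.

```python
import numpy as np

def sgd_single_update_sketch(coefficients, observation, target, learning_rate=0.1):
    # Hypothetical helper: applies one update for one observation.
    coefficients = np.asarray(coefficients, dtype=float).copy()
    observation = np.asarray(observation, dtype=float)
    z = coefficients[0] + np.dot(coefficients[1:], observation)
    p_hat = 1.0 / (1.0 + np.exp(-z))
    # Shared term: learning_rate * (y - p) * p * (1 - p)
    step = learning_rate * (target - p_hat) * p_hat * (1.0 - p_hat)
    coefficients[0] += step                  # intercept update
    coefficients[1:] += step * observation   # one update per variable
    return coefficients

# With zero coefficients p_hat = 0.5, so step = 0.1 * (0 - 0.5) * 0.5 * 0.5 = -0.0125
updated = sgd_single_update_sketch(np.zeros(4), [1, 2, 3], target=0)
print(updated)  # [-0.0125 -0.0125 -0.025 -0.0375]
```

A full pass repeats this step for every observation in order, always starting from the coefficients produced by the previous step.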

**Complete here:**

In [None]:
def compute_coefficients(x_train, y_train, learning_rate = 0.1, verbose = False):
    """ 
    Implementation of a function that returns the coefficients after a 
    first pass of stochastic gradient descent over the training set

    Args:
        x_train (np.array): a numpy array of shape (m, n)
            m: number of training observations
            n: number of variables
        y_train (np.array): a numpy array of shape (m,) with 
            the real value of the target
        learning_rate (np.float64): the step size of each update
        verbose (bool): currently unused

    Returns:
        coefficients (np.array): a numpy array of shape (n+1,)

    """
    
    # Number of observations
    m = x_train.shape[0]
    
    # Number of variables 
    n = x_train.shape[1]

    # initialize the coefficients array with zeros
    # hint: use np.zeros()
    coefficients = np.zeros(x_train.shape[1]+1)
    
    # run the stochastic gradient descent and update the coefficients after 
    # each observation
    for i in range(m):                  
        # compute the predicted probability - you can use a function we have done previously - don't forget about
        # intercept!
        
        observation_i = x_train[i]
        proba = predict_proba(observation_i, coefficients)                    
        # Update intercept:
        coefficients[0] += learning_rate * (y_train[i]-proba)*proba*(1-proba)        
        # Update the rest of the coefficients by looping through the variables
        for col in range(n):
            coefficients[col+1] += learning_rate * (y_train[i]-proba)*proba*(1-proba)*x_train[i, col]    
    
    return coefficients

In [None]:
x_train = np.array([[1,2,3], [2,5,9], [3,1,4], [8,2,9]])
y_train = np.array([0,1,0,1])
learning_rate = 0.1

expected_hash_6 = 'f95e17d924a3e26cae40f026a07c5060de2de80029c1942ee1f3c8a3a9f13d20'
assert hashlib.sha256(str(compute_coefficients(x_train, y_train, learning_rate)[0]).encode('utf-8')).hexdigest() == expected_hash_6

expected_hash_7 = '7e40fa32e8fbfd16907a8d5e2c388a88515cd95a38e4e1d1f15f8620ffdcb077'
assert hashlib.sha256(str(compute_coefficients(x_train, y_train, learning_rate)[1]).encode('utf-8')).hexdigest() == expected_hash_7

expected_hash_8 = 'f45991f62e49913d9887d333bdf7d2c973ba74d32ed440b64224acb7f5a2de36'
assert hashlib.sha256(str(compute_coefficients(x_train, y_train, learning_rate)[2]).encode('utf-8')).hexdigest() == expected_hash_8

expected_hash_9 = '1882ce141295f7bdf9685a4243521842d835dba9485b503da4a1a22a3a4b2384'
assert hashlib.sha256(str(compute_coefficients(x_train, y_train, learning_rate)[3]).encode('utf-8')).hexdigest() == expected_hash_9

x_train_1 = np.array([[4,5,2,7], [2,5,7,2], [3,1,2,1], [8,2,9,5], [1,2,9,4]])
y_train_1 = np.array([0,1,0,1,1])

expected_array = '9bf566f2c5521b165b3cb8bc67b7b2bea3c36f791086a0ac7134e82e8a6a2b13'
assert hashlib.sha256(compute_coefficients(x_train_1, y_train_1, learning_rate)).hexdigest() == expected_array


# Exercise 5: Normalize Data

To get this concept in your head, let's write a quick and easy function to normalize the data using a MinMax approach. It is crucial that your variables are either normalized (rescaled to $[0, 1]$) or standardized so that you can correctly interpret and compare logistic regression coefficients for your possible future employer.

You only have to implement this formula

$$ x_{normalized} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

Don't forget that the `axis` argument is critical when obtaining the maximum, minimum and mean values! As you want to obtain the maximum and minimum values of each individual feature, you have to specify `axis=0`. Thus, if you wanted to obtain the maximum values of each feature of data $X$, you would do the following:

```python
X_max = np.max(X, axis=0)
```
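Putting the formula and the `axis=0` tip together, a minimal sketch could look like the one below. The name `min_max_scale_sketch` is hypothetical, and the sketch assumes every feature has at least two distinct values (otherwise the denominator would be zero).

```python
import numpy as np

def min_max_scale_sketch(X):
    # Hypothetical helper: assumes no feature column is constant.
    X = np.asarray(X, dtype=float)
    X_min = np.min(X, axis=0)   # minimum of each feature (column)
    X_max = np.max(X, axis=0)   # maximum of each feature (column)
    return (X - X_min) / (X_max - X_min)

scaled = min_max_scale_sketch([[1, 10], [3, 20], [5, 40]])
print(scaled)
```

Each column is rescaled independently, so the first column maps to 0, 0.5 and 1 while the second maps to 0, 1/3 and 1.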

This is not an asserted question, but can you remember why it is important to normalize data for Logistic Regression?

**Complete here:**

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
data = np.array([[7,7,3], [2,2,11], [9,5,2], [0,9,5], [10,1,3], [1,5,2]])
normalized_data = normalize_data(data)
print('Before normalization:')
print(data)
print('\n-------------------\n')
print('After normalization:')
print(normalized_data)

Expected output:
    
    Before normalization:
    [[ 7  7  3]
     [ 2  2 11]
     [ 9  5  2]
     [ 0  9  5]
     [10  1  3]
     [ 1  5  2]]

    -------------------

    After normalization:
    [[0.7        0.75       0.11111111]
     [0.2        0.125      1.        ]
     [0.9        0.5        0.        ]
     [0.         1.         0.33333333]
     [1.         0.         0.11111111]
     [0.1        0.5        0.        ]]

In [None]:
data = np.array([[2,2,11,1], [7,5,1,3], [9,5,2,6]])
normalized_data = normalize_data(data)
assert hashlib.md5(normalized_data).hexdigest() == '277af6f0e6721a66ca19931c726aeb86'

data = np.array([[1,3,1,3], [9,5,3,1], [2,2,4,6]])
normalized_data = normalize_data(data)
assert hashlib.md5(normalized_data).hexdigest() == '2548d399591b9f950ab5fed9cc89a4e5'

# Exercise 6: Training a Logistic Regression with Sklearn

In this exercise we will load the Titanic dataset and use the available numerical variables to predict the probability of a person surviving the sinking of the Titanic.

Prepare to use your sklearn skills!

In [None]:
# We will load the dataset for you
titanic = pd.read_csv('data/titanic.csv')

In this exercise you need to do the following: 

    - Create an array with the target variable (Survived)
    - Create an array with the X numeric variables (Pclass, Age, Siblings/Spouses Aboard, Parents/Children Aboard and Fare)
    - Scale all the X variables
    - Fit a logistic regression for a maximum of 100 epochs with random state = 100
    - Return an array of the predicted probas along with the coefficients
    
After this, feel free to explore your predictions! As a bonus, why don't you construct a decision boundary using two variables, eh? ;) 

In [None]:
from sklearn.linear_model import LogisticRegression

def train_model(dataset):
    '''
    Returns the predicted probas and coefficients 
    of a trained logistic regression on dataset.
    
    Args:
        dataset(pd.DataFrame): dataset to train on.
    
    Returns:
        probas (np.array): Array of floats with the probability 
        of surviving for each passenger
        coefficients (np.array): Returned coefficients of the 
        trained logistic regression.
    '''
    
    # Get the Survived variable for y (use the dataset argument, not the global)
    y = dataset.Survived
    
    # Select the numerical variables for X 
    X = dataset[['Pclass','Age','Siblings/Spouses Aboard',
       'Parents/Children Aboard', 'Fare']]
    
    # Scale the X dataset - you can use a function we have already
    # constructed or resort to the sklearn implementation
    X_norm = normalize_data(X)
    
    # Define logistic regression from sklearn with the hyperparameters 
    # defined above - also add random_state = 100
    
    # Hint: for epochs look at the max_iter hyper param!
    lr = LogisticRegression(max_iter=100, random_state = 100)
    
    # Fit logistic
    lr.fit(X_norm, y)
    
    # Obtain probability of surviving
    probas = lr.predict_proba(X_norm)[:,1]
    
    # Obtain Coefficients from logistic regression
    # Hint: see the sklearn logistic regression documentation
    # if you do not know how to do this
    # No need to return the intercept, just the variable coefficient!
    coef = lr.coef_
    
    return probas, coef
    

In [None]:
probas, coef = train_model(titanic)

In [None]:
lr_1_hash = 'fed49f7123774be4f63b89f8f05d78e2a37b813a1fcc4e594f8349589a156b75'
assert hashlib.sha256(probas.round(2)).hexdigest() == lr_1_hash

lr_2_hash = '3f90397ddc3abdec0374caf926d93d8377ec2411e05c6f6da5b17ce0d430c1c5'
assert hashlib.sha256(coef.round(2)).hexdigest() ==  lr_2_hash