# Deep Learning

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

In [None]:
!pip install tensorflow

In [None]:
# get dataset
loans_df = pd.read_csv('../../data/loans_day3.csv', index_col=0)

In [None]:
# Show the first 5 rows
loans_df.head()

## Regression

Let's solve the same regression problem as yesterday (predicting the loan amount) with a neural network!
First we need to split the data into train and test sets and **scale** it. NNs tend to train poorly on unscaled data, because gradient-based optimization struggles when features live on very different scales.

In [None]:
from sklearn.model_selection import train_test_split

# Features
X = loans_df.drop(columns=['loan_amnt'])

#Target
y = loans_df.loan_amnt

#train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [None]:
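# Scaling
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)                # learn mean and std from the training set only
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)  # reuse the training statistics to avoid data leakage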
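As a minimal sketch of what `StandardScaler` computes, assuming a single toy column (the values below are hypothetical, not taken from `loans_day3.csv`): each feature is standardized as z = (x - mean) / std, where the mean and the population standard deviation are learned on the training data only and then reused for the test data.

```python
import statistics

# Hypothetical toy column, split into train and test values
train_col = [1000.0, 2000.0, 3000.0, 4000.0]
test_col = [2500.0, 5000.0]

# StandardScaler uses the population standard deviation (ddof=0)
mean = statistics.fmean(train_col)
std = statistics.pstdev(train_col)


def standardize(values, mean, std):
    """Apply z = (x - mean) / std with statistics learned on the training set."""
    return [(x - mean) / std for x in values]


train_scaled = standardize(train_col, mean, std)
test_scaled = standardize(test_col, mean, std)  # same mean/std as the train set
print(train_scaled)
print(test_scaled)
```

The training column ends up with mean 0 and standard deviation 1, while the test values are shifted and scaled with the training statistics, which is why `scaler` is fit on `X_train` alone.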

### Model

We'll use the [TensorFlow Keras API](https://www.tensorflow.org/guide/keras/sequential_model), a very common choice for DL in Python.

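We start by defining the model architecture. Don't forget that the last (output) layer must make sense for the task at hand! Always think about its number of neurons and the proper activation function: for example, one neuron with a linear activation for regression, one neuron with a sigmoid for binary classification, and one neuron per class with a softmax for multi-class classification.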
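The pairing between task, output layer and loss can be summarized in a small lookup table. This is a sketch: `OUTPUT_CONFIG` and `output_layer_config` are hypothetical helpers, and the pairings are the common defaults used in this notebook rather than the only valid choices.

```python
# Hypothetical helper: common output-layer/loss pairings used in this notebook
OUTPUT_CONFIG = {
    "regression": {"units": 1, "activation": "linear", "loss": "mse"},
    "binary": {"units": 1, "activation": "sigmoid", "loss": "binary_crossentropy"},
    "multiclass": {"activation": "softmax", "loss": "categorical_crossentropy"},
}


def output_layer_config(task, n_classes=None):
    """Return the units, activation and loss for the final Dense layer of a task."""
    if task not in OUTPUT_CONFIG:
        raise ValueError(f"unknown task: {task!r}")
    config = dict(OUTPUT_CONFIG[task])  # copy so the table itself is never mutated
    if task == "multiclass":
        if n_classes is None:
            raise ValueError("multiclass tasks need n_classes")
        config["units"] = n_classes  # one neuron per class
    return config


print(output_layer_config("regression"))
print(output_layer_config("multiclass", n_classes=10))
```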

In [None]:
## Deep Learning library tensorflow.keras
from tensorflow.keras.models import Sequential #Standard Model
from tensorflow.keras.layers import Dense #Standard Layers

In [None]:
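# Designing model
model = Sequential() # Start model
# Adding layers: ReLU activations make the hidden layers non-linear
# (without them, stacked Dense layers collapse into a single linear model)
model.add(Dense(10, input_dim = X_train.shape[1], activation = 'relu'))
model.add(Dense(5, activation = 'relu'))
model.add(Dense(1, activation = 'linear')) # Output layer - Activation must be task appropriate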

In [None]:
# Checking parameters
model.summary()
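The parameter counts reported by `model.summary()` follow from simple arithmetic: a Dense layer with `n_in` inputs and `units` neurons has `n_in * units` weights plus `units` biases. A sketch, assuming 20 input features purely for illustration (in the notebook the real value is `X_train.shape[1]`); `dense_param_counts` is a hypothetical helper.

```python
def dense_param_counts(n_inputs, layer_units):
    """Parameters of each Dense layer in a stack: weights plus one bias per unit."""
    counts = []
    for units in layer_units:
        counts.append(n_inputs * units + units)
        n_inputs = units  # this layer's output feeds the next layer
    return counts


# Hypothetical 20 input features, with the 10 -> 5 -> 1 architecture used above
counts = dense_param_counts(20, [10, 5, 1])
print(counts, sum(counts))  # [210, 55, 6] 271

# The MNIST model further below: 784 flattened pixels -> 100 -> 50 -> 10
print(sum(dense_param_counts(784, [100, 50, 10])))  # 84060
```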

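When specifying the loss, we need to think about the task at hand. For regression, the mean squared error (`mse`) is a common loss; we also track the mean absolute error (`mae`), which is in the same units as the target and therefore easier to interpret.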
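To see why MSE and MAE behave differently, here is a hand computation on three hypothetical loan amounts: MSE averages squared errors (so large errors dominate and the result is in squared units), while MAE averages absolute errors in the same units as the target.

```python
# Hypothetical true and predicted loan amounts
y_true = [10000.0, 15000.0, 20000.0]
y_pred = [12000.0, 14000.0, 20000.0]

errors = [t - p for t, p in zip(y_true, y_pred)]  # [-2000.0, 1000.0, 0.0]
mse = sum(e ** 2 for e in errors) / len(errors)  # squared units
mae = sum(abs(e) for e in errors) / len(errors)  # same units as the target

print(f"MSE = {mse:.1f}, MAE = {mae:.1f}")
```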

In [None]:
# Define specifications
model.compile(optimizer = 'adam',
              loss = 'mse',
              metrics = ['mae'])

In [None]:
# Training Model
history = model.fit(X_train,
                    y_train,
                    batch_size = 32,
                    epochs = 50,
                    validation_split = 0.3,
                    verbose = 0)

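Let's look at how the loss changed over time! We want to avoid **overfitting** to the training set, which typically shows up as the validation loss rising while the training loss keeps decreasing. ![](https://i.imgur.com/eP0gppr.png)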
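One way to read the loss curves is to find the epoch with the lowest validation loss; after that point, a rising validation loss alongside a falling training loss suggests overfitting. The sketch below uses a hypothetical dictionary shaped like `history.history` (the numbers are made up for illustration), and `best_epoch` is a hypothetical helper.

```python
# Hypothetical loss values, shaped like history.history returned by model.fit
history_dict = {
    "loss":     [0.90, 0.60, 0.45, 0.38, 0.33, 0.30],
    "val_loss": [0.95, 0.70, 0.55, 0.52, 0.54, 0.58],
}


def best_epoch(history_dict):
    """Return the 1-based epoch with the lowest validation loss."""
    val_loss = history_dict["val_loss"]
    return min(range(len(val_loss)), key=lambda i: val_loss[i]) + 1


epoch = best_epoch(history_dict)
train_after = history_dict["loss"][epoch - 1:]
val_after = history_dict["val_loss"][epoch - 1:]

# Training loss keeps falling while validation loss rises: a typical overfitting pattern
train_falling = all(a > b for a, b in zip(train_after, train_after[1:]))
val_rising = all(a < b for a, b in zip(val_after, val_after[1:]))
print(epoch, train_falling, val_rising)
```

Keras can automate this idea with the `tensorflow.keras.callbacks.EarlyStopping` callback (for example with `monitor='val_loss'` and `restore_best_weights=True`), passed to `model.fit` through its `callbacks` argument.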

In [None]:
plt.plot(history.history['loss'], label = 'train loss')
plt.plot(history.history['val_loss'], label = 'validation loss')
plt.xlabel('epoch')
plt.legend();

In [None]:
model.evaluate(X_test, y_test)  # returns [loss (MSE), MAE] on the test set

In [None]:
y_pred = model.predict(X_test)

## Classification

Now let's try using NNs to solve our classification problem: predicting whether a loan will be good (1) or bad (0).

In [None]:
# Features
X = loans_df.drop(columns=['loan_status'])

#Target
y = loans_df.loan_status

#train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [None]:
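# Scaling
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)                # learn mean and std from the training set only
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)  # reuse the training statistics to avoid data leakage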

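Our architecture needs to change in the final layer! For binary classification we use a single neuron with a `sigmoid` activation, which outputs the probability that the loan is good (1).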
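A minimal sketch of what the sigmoid output and the `binary_crossentropy` loss compute for a single output neuron, in plain Python: the sigmoid squashes the neuron's raw score into the (0, 1) range, and the loss averages -[y log(p) + (1 - y) log(1 - p)]. The helper names are hypothetical, and the `eps` clipping is an assumption added here to avoid log(0).

```python
import math


def sigmoid(z):
    """Map a raw score into the (0, 1) range, written in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in math.exp(-z) for very negative z
    return e / (1.0 + e)


def binary_crossentropy(y_true, probs, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped to [eps, 1-eps]."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(sigmoid(0.0))                             # 0.5: maximally uncertain
print(binary_crossentropy([1, 0], [0.9, 0.2]))  # small loss: confident and correct
```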

In [None]:
# Designing model
model_2 = Sequential() # Start model
# Adding layers (ReLU makes the hidden layer non-linear)
model_2.add(Dense(10, input_dim = X_train.shape[1], activation = 'relu'))
#model_2.add(Dense(5, activation = 'relu'))  # optional extra hidden layer
model_2.add(Dense(1, activation = 'sigmoid')) # Output layer - Activation must be task appropriate

In [None]:
model_2.summary()

In `model.compile()` we can also pass metrics appropriate to classification, such as accuracy, precision and recall, which Keras tracks during training and reports in `evaluate()`.
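As a sketch of what these metrics measure, here they are computed by hand on hypothetical labels (1 = good loan, 0 = bad loan) after the probabilities have been thresholded: precision is the share of loans predicted good that really are good, and recall is the share of truly good loans that the model predicts as good. Keras's `Precision()` and `Recall()` use a 0.5 threshold on the probabilities by default. `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical true labels and thresholded predictions
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 1, 1]
print(classification_metrics(y_true, y_pred))
```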

In [None]:
from tensorflow.keras.metrics import Recall, Precision
model_2.compile(optimizer = 'adam',
                loss = 'binary_crossentropy',
                metrics = ['accuracy',Precision(), Recall()])

In [None]:
# Training Model
history_2 = model_2.fit(X_train,
                        y_train,
                        batch_size = 32,
                        epochs = 50,
                        validation_split = 0.3,
                        verbose = 0)

We can plot any of the metrics we chose in `compile`. Let's see how the accuracy evolved during training.

In [None]:
plt.plot(history_2.history['accuracy'], label = 'train accuracy')
plt.plot(history_2.history['val_accuracy'], label = 'validation accuracy')
plt.xlabel('epoch')
plt.legend();

Now let's check the performance on the test set.

In [None]:
model_2.evaluate(X_test, y_test)

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
# Rows: true class (0, 1); columns: predicted class, using a 0.5 threshold
confusion_matrix(y_test, model_2.predict(X_test).ravel() >= 0.5)

Remember that the default **threshold** for predicting 1 is **0.5**. This is not necessarily the right choice for every task. Let's look at the distribution of the output probabilities.
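Changing the threshold only changes how probabilities are turned into labels. A sketch on hypothetical probabilities, using the same `>=` rule as `pred_df.prob >= 0.5`; `predict_labels` is a hypothetical helper.

```python
def predict_labels(probs, threshold=0.5):
    """Label a loan as good (1) when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


probs = [0.95, 0.72, 0.61, 0.55, 0.40, 0.10]  # hypothetical model outputs

print(predict_labels(probs))       # [1, 1, 1, 1, 0, 0]
print(predict_labels(probs, 0.7))  # [1, 1, 0, 0, 0, 0]
```

Raising the threshold makes the model predict "good" less often, trading fewer false positives for more false negatives.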

In [None]:
pred_df = pd.DataFrame({'prob': model_2.predict(X_test).ravel(),  # predict returns shape (n, 1)
                        'target': y_test})

In [None]:
sns.boxplot(data = pred_df, x = 'target', y = 'prob')

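We can even define a custom cost metric that reflects our business scenario and find the threshold that **minimizes** our actual cost. Here a false positive (a bad loan predicted as good) is assumed to cost 30,000 and a false negative (a good loan predicted as bad) 10,000.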

In [None]:
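# Custom metric
fp_cost = 30000  # cost of a false positive: a bad loan (0) predicted as good (1)
fn_cost = 10000  # cost of a false negative: a good loan (1) predicted as bad (0)

def get_cost(thresh):
    cm = confusion_matrix(y_test, pred_df.prob >= thresh)
    # cm[0][1] counts false positives, cm[1][0] counts false negatives
    return fp_cost*cm[0][1] + fn_cost*cm[1][0]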

In [None]:
thresholds = np.linspace(0.5, 0.99)  # 50 candidate thresholds (default num=50)
plt.plot(thresholds, [get_cost(t) for t in thresholds]);
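The same threshold search can be sketched without scikit-learn on a handful of hypothetical labels and probabilities, reusing the notebook's costs (30,000 per false positive, 10,000 per false negative). `total_cost` is a hypothetical stand-in for `get_cost`, and the threshold grid is an assumption chosen for readability.

```python
FP_COST = 30000  # a bad loan (0) predicted as good (1)
FN_COST = 10000  # a good loan (1) predicted as bad (0)


def total_cost(y_true, probs, threshold):
    """Business cost of the errors made when predicting 1 for probs >= threshold."""
    cost = 0
    for t, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if t == 0 and pred == 1:
            cost += FP_COST
        elif t == 1 and pred == 0:
            cost += FN_COST
    return cost


# Hypothetical test labels and predicted probabilities
y_true = [1, 1, 0, 1, 0, 1, 0, 1]
probs = [0.97, 0.91, 0.83, 0.78, 0.68, 0.63, 0.57, 0.42]

thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # 0.5, 0.55, ..., 0.95
costs = [total_cost(y_true, probs, t) for t in thresholds]
best_threshold = thresholds[costs.index(min(costs))]  # first threshold with the lowest cost
print(best_threshold, min(costs))  # 0.85 30000
```

Because false positives cost three times as much as false negatives here, the cheapest threshold in this toy example lies well above 0.5.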

## BONUS: Image classification

We're going to use DL to solve a classic image recognition task in ML and DL: handwritten digit recognition, using the famous [MNIST](http://yann.lecun.com/exdb/mnist/) dataset.

In [None]:
from tensorflow.keras import datasets

(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data(path="mnist.npz")

In [None]:
plt.imshow(X_train[0], cmap='gray')
plt.show()

A classic way to scale image data is to simply divide the pixel values by 255 (the maximum intensity of an 8-bit pixel), which maps them to the range [0, 1].
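A tiny sketch of this scaling on hypothetical 8-bit pixel intensities: every value in [0, 255] lands in [0, 1], with 0 staying 0 and 255 becoming 1.

```python
pixels = [0, 51, 128, 255]  # hypothetical 8-bit grayscale intensities
scaled = [p / 255 for p in pixels]  # same operation as X_train / 255.
print(scaled)
```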

In [None]:
# Preprocessing (Scaling)
X_train = X_train / 255.
X_test = X_test / 255.

In [None]:
print(X_train.shape)
print(X_test.shape)

Notice that each image has shape (28 x 28). A standard fully connected network expects each input to be a single vector, so we use a `Flatten()` layer to turn each image into a vector of 28 x 28 = 784 values.
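`Flatten()` reshapes each sample without changing any values: a 28 x 28 image becomes a vector of 784 pixels laid out row by row. A plain-Python sketch of the same reshaping on a hypothetical image whose pixel values encode their own position:

```python
# Hypothetical 28 x 28 image: each pixel stores its row-major position
image = [[row * 28 + col for col in range(28)] for row in range(28)]

# Concatenate the rows one after another into a single vector
flat = [pixel for row in image for pixel in row]

print(len(flat))                 # 784
print(flat[28] == image[1][0])   # True: the second row starts at index 28
```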

The output layer uses `softmax` as the activation function on the 10 final neurons (one for each digit class): each neuron outputs the probability that the input belongs to its class, and the 10 probabilities sum to 1.
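A plain-Python sketch of the softmax: exponentiate the 10 raw scores and divide by their sum. Subtracting the maximum score first does not change the result but avoids overflow. The scores below are hypothetical, and `softmax` here is a hand-written helper, not the Keras function.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the digits 0-9
logits = [2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]
probs = softmax(logits)
predicted_digit = probs.index(max(probs))  # class with the highest probability
print(predicted_digit, round(sum(probs), 6))
```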

In [None]:
from tensorflow.keras.layers import Flatten

In [None]:
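model_3 = Sequential()
model_3.add(Flatten())                          # 28 x 28 image -> vector of 784 pixels
model_3.add(Dense(100, activation = 'relu'))    # ReLU hidden layers make the model non-linear
model_3.add(Dense(50, activation = 'relu'))
model_3.add(Dense(10, activation = 'softmax'))  # one probability per digit class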

In [None]:
model_3.compile(optimizer = 'adam',
                loss = 'categorical_crossentropy',
                metrics = ['accuracy'])

In [None]:
from tensorflow.keras.utils import to_categorical
y_train_cat = to_categorical(y_train)
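`categorical_crossentropy` expects one-hot targets, which is why the labels go through `to_categorical`: digit 3 becomes a length-10 vector with a single 1 at index 3. The loss for one image is then -log of the probability the model assigns to the true class. A sketch with hypothetical helpers (`one_hot`, `categorical_crossentropy`) and made-up probabilities; the `eps` guard is an assumption to avoid log(0).

```python
import math


def one_hot(label, num_classes=10):
    """Length-num_classes vector with 1.0 at position `label`, 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def categorical_crossentropy(target, probs, eps=1e-7):
    """-sum(t * log(p)); only the true class contributes for a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


target = one_hot(3)
probs = [0.02] * 10
probs[3] = 0.82  # hypothetical softmax output: 82% confidence in digit 3

print(target)
print(categorical_crossentropy(target, probs))  # equals -log(0.82)
```

Keras also offers `sparse_categorical_crossentropy`, which accepts the integer labels directly and skips the one-hot step.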

In [None]:
history_3 = model_3.fit(X_train,
                        y_train_cat,
                        batch_size = 16,
                        epochs = 10,
                        verbose = 0,
                        validation_split = 0.3)

In [None]:
model_3.evaluate(X_test, to_categorical(y_test))

Note that with a very simple network, looping over the data only **10** times (10 epochs), we got an accuracy of over 90%! 🤯🤯🤯