# Improving Your Model Performance

In the previous chapters, you've trained a lot of models! You will now learn how to interpret learning curves to understand your models as they train. You will also visualize the effects of activation functions, batch sizes, and batch normalization. Finally, you will learn how to perform automatic hyperparameter optimization on your Keras models using sklearn.

# (1) Learning curves

<img src="image/Screenshot 2021-01-29 142344.png">
<img src="image/Screenshot 2021-01-29 142401.png">
<img src="image/Screenshot 2021-01-29 142416.png">
<img src="image/Screenshot 2021-01-29 142445.png">
<img src="image/Screenshot 2021-01-29 142506.png">
<img src="image/Screenshot 2021-01-29 142536.png">

## Learning curves

```
# Store initial model weights
init_weights = model.get_weights()
# Lists for storing accuracies
train_accs = []
test_accs = []
```

```
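from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping

for train_size in train_sizes:
    # Split off a fraction of the training data according to train_size
    X_train_frac, _, y_train_frac, _ = train_test_split(X_train, y_train, train_size=train_size)
    # Reset the model to its initial weights
    model.set_weights(init_weights)
    # Fit the model on the training set fraction, stopping early once the loss stops improving
    model.fit(X_train_frac, y_train_frac, epochs=100, verbose=0, callbacks=[EarlyStopping(monitor='loss', patience=1)])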
    # Get the accuracy for this training set fraction
    train_acc = model.evaluate(X_train_frac, y_train_frac, verbose=0)[1]
    train_accs.append(train_acc)
    # Get the accuracy on the whole test set
    test_acc = model.evaluate(X_test, y_test, verbose=0)[1]
    test_accs.append(test_acc)
    print("Done with size: ", train_size)
```
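
The `EarlyStopping` callback with `patience=1` used in the loop above ends training as soon as the training loss fails to improve for one epoch. The following is a simplified, framework-free sketch of that patience rule, not Keras's actual implementation; `early_stop_epoch` is a hypothetical helper name and the loss values are made up for illustration.

```python
def early_stop_epoch(losses, patience=1):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(losses):
        if loss < best:
            # The loss improved: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up loss values: the loss rises at epoch 2, so patience=1 stops there
stop_at = early_stop_epoch([1.0, 0.8, 0.9, 0.7], patience=1)
print(stop_at)
```

With a larger `patience`, the same sequence would be allowed to continue, because the loss recovers at the next epoch.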

# Exercise I: Learning the digits

You're going to build a model on the **digits dataset**, a sample dataset that comes pre-loaded with scikit-learn. The **digits dataset** consists of **8x8 pixel handwritten digits from 0 to 9**:

<img src="image/digits_dataset_sample.png">

You want to distinguish between each of the 10 possible digits given an image, so we are dealing with multi-class classification.
The dataset has already been partitioned into `X_train`, `y_train`, `X_test`, and `y_test`, using 30% of the data as testing data. The labels are already one-hot encoded vectors, so you don't need to use Keras's `to_categorical()` function.
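
The exercise environment prepares these arrays for you. As a rough sketch of how a comparable split could be reproduced locally, assuming scikit-learn's `load_digits()` as the data source, NumPy for the one-hot encoding, and an arbitrary `random_state` (the course's exact preprocessing is not shown):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the 8x8 digit images, already flattened to 64 features per sample
digits = load_digits()
X, y = digits.data, digits.target

# One-hot encode the integer labels 0-9 (row i of the identity matrix encodes digit i)
y_one_hot = np.eye(10)[y]

# Hold out 30% of the data for testing; random_state is an arbitrary choice
X_train, X_test, y_train, y_test = train_test_split(
    X, y_one_hot, test_size=0.3, random_state=42
)

print(X_train.shape, y_train.shape)
```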

Let's build this new `model`!

### Instructions

- Add a `Dense` layer of 16 neurons with `relu` activation and an `input_shape` that takes the total number of pixels of the 8x8 digit image.
- Add a `Dense` layer with 10 outputs and `softmax` activation.
- Compile your model with `adam`, `categorical_crossentropy`, and `accuracy` metrics.
- Make sure your model works by predicting on `X_train`.

```
# Instantiate a Sequential model
model = Sequential()

# Input and hidden layer with input_shape, 16 neurons, and relu
model.add(Dense(16, input_shape = (64,), activation = 'relu'))

# Output layer with 10 neurons (one per digit) and softmax
model.add(Dense(10, activation='softmax'))

# Compile your model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Test if your model is well assembled by predicting before training
print(model.predict(X_train))
```
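
Predicting before training is only a sanity check: the values are arbitrary, but the output should have one row per image and 10 probabilities per row that sum to 1. The NumPy sketch below mimics the same 64 → 16 → 10 forward pass with random stand-in weights (not Keras's initialization) and fake input data, purely to illustrate what a well-assembled model's output looks like.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Subtract the row-wise max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Random stand-ins for the untrained layer weights (64 -> 16 -> 10)
W1, b1 = rng.normal(scale=0.1, size=(64, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(16, 10)), np.zeros(10)

# Five fake flattened 8x8 images
X_fake = rng.uniform(0, 16, size=(5, 64))

hidden = np.maximum(0, X_fake @ W1 + b1)  # Dense(16, relu)
probs = softmax(hidden @ W2 + b2)         # Dense(10, softmax)

print(probs.shape)        # (5, 10)
print(probs.sum(axis=1))  # each row sums to 1 (up to floating point)
```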

# Exercise II: Is the model overfitting?

Let's train the `model` you just built and plot its learning curve to check whether it's overfitting! You can use the pre-loaded function `plot_loss()` to plot training loss against validation loss; both are available from the history callback.

If you want to inspect the `plot_loss()` function code, paste this in the console: `show_code(plot_loss)`
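
The helper's source is not reproduced in these notes. A minimal stand-in, assuming it simply draws the two loss lists against the epoch index with matplotlib (the title, labels, and returned `Axes` are illustrative choices, not the course's code), might look like this:

```python
import matplotlib.pyplot as plt

def plot_loss(loss, val_loss):
    """Plot training and validation loss per epoch and return the Axes."""
    fig, ax = plt.subplots()
    ax.plot(loss, label="Train")
    ax.plot(val_loss, label="Test")
    ax.set_title("Model loss")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")
    ax.legend(loc="upper right")
    return ax

# Example call with placeholder values (not real training results)
ax = plot_loss([0.9, 0.6, 0.4], [1.0, 0.7, 0.5])
# plt.show()  # uncomment to display the figure interactively
```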

### Instructions

- Train your model for 60 `epochs`, using `X_test` and `y_test` as validation data.
- Use `plot_loss()` passing `loss` and `val_loss` as extracted from the history attribute of the `h_callback` object.

```
# Train your model for 60 epochs, using X_test and y_test as validation data
h_callback = model.fit(X_train, y_train, epochs = 60, validation_data = (X_test, y_test), verbose=0)

# Extract from the h_callback object loss and val_loss to plot the learning curve
plot_loss(h_callback.history['loss'], h_callback.history['val_loss'])
```

## Question

Just by looking at the picture, do you think the learning curve shows this model is overfitting after having trained for 60 epochs?

### Possible Answers

- Yes, it started to overfit since the test loss is higher than the training loss.

- No, the test loss is not getting higher as the epochs go by. (T)
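
The reasoning behind the answer can be turned into a rough check: overfitting shows up as a validation loss that bottoms out and then climbs, not merely as a validation loss that sits above the training loss. The helper below is a hypothetical heuristic; its name, the `tolerance` threshold, and the sample curves are illustrative assumptions, and it presumes positive loss values.

```python
def val_loss_is_rising(val_loss, tolerance=0.05):
    """Return True if the final validation loss exceeds its minimum by more than `tolerance` (relative)."""
    best = min(val_loss)
    return val_loss[-1] > best * (1 + tolerance)

# Made-up curves: one keeps decreasing, the other turns upward after epoch 2
flat_curve = [1.2, 0.8, 0.5, 0.4, 0.38, 0.37]
rising_curve = [1.2, 0.8, 0.5, 0.6, 0.8, 1.0]

print(val_loss_is_rising(flat_curve))    # False
print(val_loss_is_rising(rising_curve))  # True
```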

# Exercise III: Do we need more data?

It's time to check whether the **digits dataset** `model` you built benefits from more training examples!

In order to keep code to a minimum, various things are already initialized and ready to use:

- The `model` you just built.
- `X_train`, `y_train`, `X_test`, and `y_test`.
- The `initial_weights` of your model, saved after using `model.get_weights()`.
- A pre-defined list of training sizes: `training_sizes`.
- A pre-defined early stopping callback monitoring loss: `early_stop`.
- Two empty lists to store the evaluation results: `train_accs` and `test_accs`.

Train your model on the different training sizes and evaluate the results on `X_test`. End by plotting the results with `plot_results()`.

The full code for this exercise can be found on the slides!

### Instructions

- Get a fraction of the training data determined by the `size` we are currently evaluating in the loop.
- Set the model weights to the `initial_weights` with `set_weights()` and train your model on the fraction of training data using `early_stop` as a callback.
- Evaluate and store the accuracy for the training fraction and the test set.
- Call `plot_results()` passing in the training and test accuracies for each training size.

```
for size in training_sizes:
    # Get a fraction of training data (we only care about the training data)
    X_train_frac, y_train_frac = X_train[:size], y_train[:size]

    # Reset the model to the initial weights and train it on the new training data fraction
    model.set_weights(initial_weights)
    model.fit(X_train_frac, y_train_frac, epochs = 50, callbacks = [early_stop])

    # Evaluate and store both: the training data fraction and the complete test set results
    train_accs.append(model.evaluate(X_train_frac, y_train_frac)[1])
    test_accs.append(model.evaluate(X_test, y_test)[1])

# Plot train vs test accuracies
plot_results(train_accs, test_accs)
```
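
`plot_results()` is supplied by the exercise environment and its code is not shown here. As a sketch of what such a helper could do, assuming matplotlib and an optional third argument for the x-axis values (that argument, the labels, and the sample numbers are illustrative, not the course's implementation):

```python
import matplotlib.pyplot as plt

def plot_results(train_accs, test_accs, training_sizes=None):
    """Plot training and test accuracy for each training size and return the Axes."""
    if training_sizes is None:
        # Fall back to the position in the list when no sizes are given
        x, xlabel = range(len(train_accs)), "Training-size index"
    else:
        x, xlabel = training_sizes, "Training samples"
    fig, ax = plt.subplots()
    ax.plot(x, train_accs, "o-", label="Training accuracy")
    ax.plot(x, test_accs, "o-", label="Test accuracy")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Accuracy")
    ax.set_title("Accuracy vs. training size")
    ax.legend(loc="best")
    return ax

# Example call with placeholder values (not real training results)
ax = plot_results([0.99, 0.97, 0.96], [0.80, 0.90, 0.93], training_sizes=[100, 500, 1000])
# plt.show()  # uncomment to display the figure interactively
```

If the test accuracy is still climbing at the largest training size, more data is likely to help; if both curves have flattened, it probably is not.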

# (2) Activation functions

<img src="image/Screenshot 2021-01-29 153515.png">

## Sigmoid & Tanh function
<img src="image/Screenshot 2021-01-29 154557.png">

## ReLU & Leaky ReLU
<img src="image/Screenshot 2021-01-29 154805.png">

## Effects of activation functions

<img src="image/Screenshot 2021-01-29 154916.png">

## Effects of Sigmoid & Tanh
<img src="image/Screenshot 2021-01-29 154938.png">

## Effects of ReLU & Leaky ReLU
<img src="image/Screenshot 2021-01-29 155020.png">

## Which activation function to use?
- No magic formula
- Different properties
- Depends on our problem
- Goal to achieve in a given layer
- ReLUs are a good first choice
- Sigmoid not recommended for deep models

## Comparing activation functions

```
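import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Set a random seed
np.random.seed(1)

# Return a new model with the given activation in its hidden layer
def get_model(act_function):
    model = Sequential()
    model.add(Dense(4, input_shape=(2,), activation=act_function))
    model.add(Dense(1, activation='sigmoid'))
    # Compile so the model can be fit; the optimizer and loss are assumed here
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model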
```

## Comparing activation functions

```
# Activation functions to try out
activations = ['relu', 'sigmoid', 'tanh']

# Dictionary to store results
activation_results = {}
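for funct in activations:
    # Build a fresh model for this activation function
    model = get_model(act_function=funct)
    # Train it and keep the History object returned by fit()
    history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, verbose=0)
    activation_results[funct] = history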
```

```
import pandas as pd

# Extract val_loss history of each activation function
val_loss_per_funct = {k: v.history['val_loss'] for k, v in activation_results.items()}

# Turn the dictionary into a pandas dataframe
val_loss_curves = pd.DataFrame(val_loss_per_funct)

# Plot the curves
val_loss_curves.plot(title='Loss per Activation function')
```

# Exercise IV: Different activation functions

The `sigmoid()`, `tanh()`, `ReLU()`, and `leaky_ReLU()` functions have been defined and are ready for you to use. Each function receives an input number X and returns its corresponding Y value.

Which of the statements below is **false**?

### Possible Answers

- The `sigmoid()` takes a value of 0.5 when X = 0 whilst `tanh()` takes a value of 0.

- The `leaky_ReLU()` takes a value of -0.01 when X = -1 whilst `ReLU()` takes a value of 0.

- The `sigmoid()` and `tanh()` both take values close to -1 for big negative numbers. (T)
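
The statements above can be checked numerically. The definitions below are a NumPy sketch of the four functions, not the pre-loaded course versions; the leaky slope `alpha=0.01` is an assumption chosen to match the value quoted in the second statement.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def ReLU(x):
    return np.maximum(0, x)

def leaky_ReLU(x, alpha=0.01):
    # alpha is the slope applied to negative inputs (assumed value)
    return np.where(x > 0, x, alpha * x)

print(sigmoid(0), tanh(0))        # sigmoid(0) is 0.5, tanh(0) is 0
print(leaky_ReLU(-1), ReLU(-1))   # leaky ReLU gives -0.01, ReLU gives 0
print(sigmoid(-100), tanh(-100))  # sigmoid approaches 0, tanh approaches -1
```

For large negative inputs only `tanh()` approaches -1; `sigmoid()` approaches 0, which is why the third statement is the false one.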

# Exercise V: Comparing activation functions

Comparing activation functions involves a bit of coding, but nothing you can't do!

You will try out different activation functions on the multi-label model you built for your farm irrigation machine in chapter 2. The function `get_model('relu')` returns a copy of this model and applies the `'relu'` activation function to its hidden layer.

You will loop through several activation functions, generate a new model for each and train it. By storing the history callback in a dictionary you will be able to visualize which activation function performed best in the next exercise!

`X_train`, `y_train`, `X_test`, `y_test` are ready for you to use when training your models.

### Instructions

- Fill up the activation functions array with `relu`, `leaky_relu`, `sigmoid`, and `tanh`.
- Get a new model for each iteration with `get_model()` passing the current activation function as a parameter.
- Fit your model, providing the training data and `validation_data`; use 20 `epochs` and set `verbose` to 0.


```
# Activation functions to try
activations = ['relu', 'leaky_relu', 'sigmoid', 'tanh']

# Loop over the activation functions
activation_results = {}

for act in activations:
  # Get a new model with the current activation
  model = get_model(act)
  # Fit the model and store the history results
  h_callback = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, verbose=0)
  activation_results[act] = h_callback
```

# Exercise VI: Comparing activation functions II

The code from the previous exercise has been executed to obtain the `activation_results` variable, this time using 100 epochs instead of 20. This gives you more epochs over which to compare how training evolves for each activation function.

For every `h_callback` of each activation function in `activation_results`:

- The `h_callback.history['val_loss']` has been extracted.
- The `h_callback.history['val_acc']` has been extracted.

Both are saved into two dictionaries: `val_loss_per_function` and `val_acc_per_function`.

Pandas is also loaded as pd for you to use. Let's plot some quick validation loss and accuracy charts!

### Instructions

- Use `pd.DataFrame()` to create a new DataFrame from the `val_loss_per_function` dictionary.
- Call `plot()` on the DataFrame.
- Create another pandas DataFrame from `val_acc_per_function`.
- Once again, plot the DataFrame.

```
# Create a dataframe from val_loss_per_function
val_loss = pd.DataFrame(val_loss_per_function)

# Call plot on the dataframe
val_loss.plot()
plt.show()

# Create a dataframe from val_acc_per_function
val_acc = pd.DataFrame(val_acc_per_function)

# Call plot on the dataframe
val_acc.plot()
plt.show()
```
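
Besides reading the charts, the same DataFrame can be used to rank the activation functions by their final-epoch validation loss. The sketch below uses placeholder curves invented for illustration; they are not results from the course, and the three-epoch length is arbitrary.

```python
import pandas as pd

# Placeholder curves for illustration only; these are not results from the course
val_loss_per_function = {
    "relu": [0.70, 0.50, 0.40],
    "sigmoid": [0.72, 0.65, 0.60],
    "tanh": [0.71, 0.55, 0.45],
}
val_loss = pd.DataFrame(val_loss_per_function)

# Final-epoch validation loss per activation function, lowest first
final_losses = val_loss.iloc[-1].sort_values()
print(final_losses)
print("Lowest final validation loss:", final_losses.index[0])
```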

## Validation Loss per function
<img src="image/2021-29-01 163353.svg">

## Validation Accuracy per function
<img src="image/2021-29-01 163523.svg">