# Optimisation

Optimisation is hard to get right because:
1. In neural networks, we optimise thousands of weights simultaneously.
2. A learning rate that is too small may barely improve the model.
3. A learning rate that is too large may overshoot the minimum and take us too far.


The easiest way to see the effect of different learning rates is to use the simplest optimiser: Stochastic Gradient Descent (SGD).
SGD uses a fixed learning rate (the Keras default is 0.01), and we can set our own by passing the `lr` argument to the optimiser (`learning_rate` in newer Keras versions).

![](Image/Image1.jpg)
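To make the effect concrete without Keras, here is a minimal sketch of fixed-learning-rate gradient descent on a toy one-weight loss `(w - 3)**2`. The loss, the starting point `w = 0`, the step count, and the helper names `loss`, `grad` and `sgd` are all assumptions chosen for illustration. The update `w -= lr * grad(w)` is the same rule SGD applies to every weight, here with an exact gradient instead of a mini-batch estimate.

```python
def loss(w):
    """Toy one-weight loss with its minimum at w = 3 (assumed for illustration)."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy loss: d/dw (w - 3)**2 = 2 * (w - 3)."""
    return 2.0 * (w - 3.0)


def sgd(lr, steps=20, w=0.0):
    """Apply `steps` gradient-descent updates with a fixed learning rate."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


# Compare a too-small, a reasonable, and a too-large learning rate
for lr in (0.001, 0.1, 1.1):
    w = sgd(lr)
    print(f"lr={lr}: w={w:.4f}, loss={loss(w):.4f}")
```

With `lr=0.001` the weight barely moves in 20 steps, `lr=0.1` ends up close to the minimum at 3, and `lr=1.1` overshoots further on every step, so the loss grows instead of shrinking.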

Even with a well-chosen learning rate, we may run into the "dying neuron" problem.

## Dying Neuron Problem

This occurs when a neuron's input (the weighted sum it receives) is negative for every row of the data.


For the ReLU function, a negative input produces an output of 0, and the slope at that point is 0 as well.

![](Image/Image2.jpg)

Because that slope is 0, back-propagation gives a slope (gradient) of 0 for every weight flowing into this node, so those weights never get updated.

Therefore, if the node's input is negative for every row during forward propagation, the node stops contributing anything to the model (a dead neuron).
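A minimal pure-Python sketch of a single ReLU neuron shows why its weights stop updating. The data, the weights, and the helper names (`pre_activation`, `relu_slope`, `weight_gradients`) are hypothetical; `upstream` stands in for whatever gradient arrives from later layers. When the input `z` is negative for every row, the chain-rule gradient for each weight is a sum of zeros.

```python
def pre_activation(x, weights, bias):
    """The neuron's input z = w . x + b for one row."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def relu(z):
    return max(0.0, z)


def relu_slope(z):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 at z = 0 by convention)."""
    return 1.0 if z > 0 else 0.0


def weight_gradients(rows, weights, bias, upstream=1.0):
    """Gradient of the neuron's output w.r.t. each weight, summed over rows.

    Chain rule per row: d(out)/d(w_j) = upstream * relu_slope(z) * x_j.
    """
    grads = [0.0] * len(weights)
    for x in rows:
        slope = relu_slope(pre_activation(x, weights, bias))
        for j, xj in enumerate(x):
            grads[j] += upstream * slope * xj
    return grads


# Hypothetical data (3 rows, 2 features) and weights chosen so z < 0 for every row
rows = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.5]]
dead_weights, dead_bias = [-1.0, -0.5], -0.1

print([relu(pre_activation(x, dead_weights, dead_bias)) for x in rows])  # [0.0, 0.0, 0.0]
print(weight_gradients(rows, dead_weights, dead_bias))                   # [0.0, 0.0]
```

With weights such as `[1.0, 0.5]` and bias `0.0`, every `z` is positive and the same function returns non-zero gradients, so the difference comes entirely from the zero slope.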

To avoid this problem, we can use an activation function whose slope is never 0, for example the tanh() function:
![](Image/Image3.jpg)

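However, the figure shows that far from the origin the slope of tanh is very small (close to 0). When back-propagation passes through many layers, many of these small slopes are multiplied together, so the weight updates become close to 0.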

#### --> This is called the Vanishing Gradient Problem
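To see the numbers, the sketch below multiplies tanh slopes across several layers. It is a deliberate simplification, not a full network: it assumes every layer receives the same input `z` and all weights are 1, so only the activation slopes enter the chain rule. At `z = 0` the slope is 1 and the product stays at 1, but at `z = 2` each slope is about 0.07, so after 10 layers the product is vanishingly small.

```python
import math


def tanh_slope(z):
    """Derivative of tanh: 1 - tanh(z)**2. Always positive, but tiny for large |z|."""
    return 1.0 - math.tanh(z) ** 2


def chained_slope(z, n_layers):
    """Product of tanh slopes over n_layers, as back-propagation would multiply them.

    Assumption: every layer sees the same input z and all weights are 1,
    so the weight factors in the chain rule drop out.
    """
    product = 1.0
    for _ in range(n_layers):
        product *= tanh_slope(z)
    return product


for z in (0.0, 2.0):
    print(f"z={z}:", [f"{chained_slope(z, n):.2e}" for n in (1, 5, 10)])
```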

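These problems mean the activation function we use should not have slopes that are too small. Also, if training results are worse than expected, it is worth trying a different activation function.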

# Model Validation

1. Few people run k-fold cross-validation in deep learning: deep learning is typically used on large datasets, where training the model k times makes CV too time-consuming.
2. A single validation split is usually reliable enough because, on large datasets, the validation set itself is reasonably large.

Specify the validation split (the fraction of the data held out for validation) in the fitting stage:

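```python
# Assumes model, predictors and target are already defined
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Hold out the last 30% of the rows for validation
model.fit(predictors, target, validation_split=0.3)
```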


Example output:

![](Image/Image4.jpg)
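The Keras documentation states that `validation_split` takes the validation data from the last samples in `x` and `y`, before shuffling. The pure-Python sketch below mimics that split on row indices; the helper name `validation_split_indices` and its exact rounding are assumptions, not Keras's source.

```python
def validation_split_indices(n_rows, validation_split):
    """Split row indices so the LAST `validation_split` fraction is used for validation.

    Mirrors the documented behaviour of Keras' `validation_split`;
    the rounding (int of the training share) is an assumption.
    """
    split_at = int(n_rows * (1.0 - validation_split))
    train_idx = list(range(split_at))
    val_idx = list(range(split_at, n_rows))
    return train_idx, val_idx


# 10 hypothetical rows with validation_split=0.3
train_idx, val_idx = validation_split_indices(10, 0.3)
print(len(train_idx), val_idx)  # 7 [7, 8, 9]
```

Because the held-out rows are simply the last ones, data that is ordered (for example sorted by the target) should be shuffled before calling `fit`, otherwise the validation set may not resemble the training set.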

Ideally, we stop training once the validation score stops improving.

## Early Stopping


![](Image/Image5.jpg)

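Pass a `patience` argument **(how many more epochs to keep training after the validation score stops improving, before training is stopped)** to `EarlyStopping`, which by default monitors the validation loss. A patience of 2 or 3 is usually enough; a much larger value means running many epochs of pointless computation.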

We then pass the `EarlyStopping` object to the `callbacks` argument of `fit()`.

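**Key point:** `callbacks` takes a list, so other objects that control training (a more advanced topic) can be passed at the same time. It also means the `EarlyStopping` object must be placed inside a list (`[]`).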

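Also, training runs for a set number of epochs unless a callback stops it sooner. In Keras 1 this was the `nb_epoch` argument (default 10); in Keras 2 and later it is `epochs` (default 1), so `epochs=20` trains for at most 20 epochs and then stops.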
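The patience rule itself is simple enough to sketch in plain Python. This is a simplified illustration of the idea behind `keras.callbacks.EarlyStopping`, not its actual source; the function name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch (1-based) after which a patience rule would stop training.

    Tracks the best validation loss so far; if it fails to improve for
    `patience` consecutive epochs, training stops. Otherwise all epochs run.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1  # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses, one per epoch
history = [0.70, 0.62, 0.58, 0.59, 0.60, 0.55]
print(early_stopping_epoch(history, patience=2))  # 5
print(early_stopping_epoch(history, patience=3))  # 6
```

With `patience=2` the rule stops after epoch 5 and misses the later improvement at epoch 6; with `patience=3` it runs all 6 epochs. That is the trade-off behind choosing a patience of 2 or 3.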

# Finding the Best Model

![](Image/Image6.jpg)

## Model Comparison Example

### model_2: 2 hidden layers (100 units each) + 1 output layer

```python
# Imports needed by this example
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

# Assumes predictors, target, input_shape and a compiled baseline model_1 already exist

# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Create the new model: model_2
model_2 = Sequential()

# Add the first and second hidden layers
model_2.add(Dense(100, activation='relu', input_shape=input_shape))
model_2.add(Dense(100, activation='relu'))

# Add the output layer
model_2.add(Dense(2, activation='softmax'))

# Compile model_2
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit model_1
model_1_training = model_1.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Fit model_2
model_2_training = model_2.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Plot validation loss per epoch: model_1 in red, model_2 in blue
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.show()
```

### model_2: 3 hidden layers (50 units each) + 1 output layer

```python
# Continues from the previous example: reuses its imports and early_stopping_monitor,
# and assumes predictors, target, n_cols and a compiled baseline model_1 already exist

# The input shape to use in the first hidden layer
input_shape = (n_cols,)

# Create the new model: model_2
model_2 = Sequential()

# Add the first, second, and third hidden layers
model_2.add(Dense(50, activation='relu', input_shape=input_shape))
model_2.add(Dense(50, activation='relu'))
model_2.add(Dense(50, activation='relu'))

# Add the output layer
model_2.add(Dense(2, activation='softmax'))

# Compile model_2
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit model_1
model_1_training = model_1.fit(predictors, target, epochs=20, validation_split=0.4, callbacks=[early_stopping_monitor], verbose=False)

# Fit model_2
model_2_training = model_2.fit(predictors, target, epochs=20, validation_split=0.4, callbacks=[early_stopping_monitor], verbose=False)

# Plot validation loss per epoch: model_1 in red, model_2 in blue
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.show()
```

# Model Capacity (Network Capacity)

![](Image/Image7.jpg)

Adding layers, or adding neurons to existing layers, increases model capacity.

A good procedure:
1. Start with a simple, small network and get its validation score.
2. Keep adding capacity while the validation score improves.
3. Stop adding capacity once the validation score stops improving (starts to get worse).
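The procedure above can be sketched as a small greedy loop. Here `evaluate` is a hypothetical callable that would build and train a model of a given capacity and return its validation score (higher is better, e.g. accuracy); the capacities and scores below are made up purely to exercise the loop and are not real results.

```python
def grow_until_no_improvement(evaluate, capacities):
    """Try capacities in increasing order; stop at the first one that doesn't beat the best.

    `evaluate(capacity)` is assumed to return a validation score where higher is better.
    Returns (best_capacity, best_score).
    """
    best_capacity, best_score = None, float("-inf")
    for capacity in capacities:
        score = evaluate(capacity)
        if score <= best_score:
            break  # the score stopped improving: stop adding capacity
        best_capacity, best_score = capacity, score
    return best_capacity, best_score


# Made-up validation accuracies keyed by number of hidden units (illustration only)
fake_scores = {10: 0.70, 50: 0.74, 100: 0.75, 200: 0.73, 400: 0.76}
print(grow_until_no_improvement(fake_scores.get, [10, 50, 100, 200, 400]))  # (100, 0.75)
```

Note that 400 units is never tried even though its made-up score is higher: the procedure is greedy and stops at the first drop, which keeps the search cheap but means it can miss a better, larger model.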

**Example:**

![](Image/Image8.jpg)


___