<font color="green">
    
## 1. Fighting overfitting   
### Here are the most common ways to prevent overfitting in neural networks
- Getting more training data.
- Reducing the capacity of the network.
- Adding weight regularization.
    - `network.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001), activation='relu'))`
    - `network.add(layers.Dense(16, kernel_regularizer=regularizers.l1_l2(l1=0.001, l2=0.001), activation='relu'))`
- Adding dropout.
    - `network.add(layers.Dropout(0.5))`
</font>
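
To make the weight-regularization item concrete, here is a minimal NumPy sketch of the penalty that `regularizers.l2(0.001)` and `regularizers.l1_l2(l1=0.001, l2=0.001)` add to the loss for one layer's weight matrix. The helper names, the example weights, and the `data_loss` value are hypothetical and chosen only for illustration; the formulas (a coefficient times the sum of squared weights, plus a coefficient times the sum of absolute weights for L1) are the standard definitions.

```python
import numpy as np


def l2_penalty(weights, l2=0.001):
    """L2 penalty: coefficient times the sum of squared weights."""
    return l2 * np.sum(np.square(weights))


def l1_l2_penalty(weights, l1=0.001, l2=0.001):
    """Combined penalty: L1 term (absolute values) plus L2 term (squares)."""
    return l1 * np.sum(np.abs(weights)) + l2 * np.sum(np.square(weights))


# Hypothetical weight matrix of a tiny layer (illustrative values only).
weights = np.array([[0.5, -1.0],
                    [2.0, 0.0]])

# Hypothetical value of the unregularized loss on one batch.
data_loss = 4.0

# During training, the optimizer minimizes the data loss plus the penalty,
# so large weights are discouraged.
total_loss_l2 = data_loss + l2_penalty(weights)
total_loss_l1_l2 = data_loss + l1_l2_penalty(weights)

print(total_loss_l2, total_loss_l1_l2)
```

Because the penalty grows with the size of the weights, the network is pushed toward smaller weight values, which is what limits its ability to memorize the training set.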

In [2]:
import numpy as np
from keras import layers, models, regularizers
from keras.datasets import boston_housing

(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()

# Standardize every feature using the training-set mean and standard deviation.
mean = np.mean(train_data, axis=0)
std = np.std(train_data, axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std


def build_model():
    """Two hidden layers of 16 units, each with an L2 penalty on its weights."""
    network = models.Sequential()
    network.add(layers.Input(shape=(train_data.shape[1],)))  # One input per feature
    network.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    network.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    network.add(layers.Dense(1))  # Single linear output for regression
    network.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return network


network = build_model()
network.fit(train_data, train_labels, epochs=10, batch_size=10, verbose=0)
test_loss, test_mae = network.evaluate(test_data, test_labels)
print(f"The mean absolute error: {test_mae}")

4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 49.6468 - mae: 5.4789
The mean absolute error: 5.478940486907959


In [3]:
import numpy as np
from keras import layers, models, regularizers
from keras.datasets import boston_housing

(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()

# Standardize every feature using the training-set mean and standard deviation.
mean = np.mean(train_data, axis=0)
std = np.std(train_data, axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std


def build_model():
    """Two hidden layers of 16 units, each with a combined L1 and L2 penalty."""
    network = models.Sequential()
    network.add(layers.Input(shape=(train_data.shape[1],)))  # One input per feature
    network.add(layers.Dense(16, kernel_regularizer=regularizers.l1_l2(l1=0.001, l2=0.001), activation='relu'))
    network.add(layers.Dense(16, kernel_regularizer=regularizers.l1_l2(l1=0.001, l2=0.001), activation='relu'))
    network.add(layers.Dense(1))  # Single linear output for regression
    network.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return network


network = build_model()
network.fit(train_data, train_labels, epochs=10, batch_size=10, verbose=0)
test_loss, test_mae = network.evaluate(test_data, test_labels)
print(f"The mean absolute error: {test_mae}")

4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - loss: 32.6508 - mae: 4.5953
The mean absolute error: 4.595333576202393


<font color="green">
Dropout is one of the most effective and most commonly used regularization techniques
for neural networks, developed by Hinton and his students at the University of Toronto.
Dropout, applied to a layer, consists of randomly "dropping out" (i.e. setting to zero) a
number of output features of the layer during training. Let's say a given layer would
normally have returned the vector [0.2, 0.5, 1.3, 0.8, 1.1] for a given input sample
during training; after applying dropout, this vector will have a few zero entries distributed
at random, e.g. [0, 0.5, 1.3, 0, 1.1]. The "dropout rate" is the fraction of the
features that are being zeroed out; it is usually set between 0.2 and 0.5. At test time, no
units are dropped out; instead, the layer's output values are scaled down by a factor
equal to the fraction of units kept (1 minus the dropout rate, which is 0.5 for a dropout
rate of 0.5), to balance for the fact that more units are active than at training time.
    
</font>
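
The mechanism described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the Keras implementation: the function names and the fixed random seed are assumptions made for the example. The third function shows the "inverted" variant, in which the kept values are scaled up by `1 / (1 - rate)` during training so that nothing needs to change at test time; this is the behaviour the Keras `Dropout` layer documents.

```python
import numpy as np


def dropout_train(outputs, rate, rng):
    """Training time: zero each feature independently with probability `rate`."""
    keep_mask = rng.random(outputs.shape) >= rate
    return outputs * keep_mask


def dropout_test(outputs, rate):
    """Test time: keep every unit and scale by the keep probability (1 - rate)."""
    return outputs * (1.0 - rate)


def inverted_dropout_train(outputs, rate, rng):
    """Inverted variant: scale the kept units up during training instead,
    so the test-time output can be used unchanged."""
    keep_mask = rng.random(outputs.shape) >= rate
    return outputs * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)  # Seed chosen arbitrarily, for reproducibility
layer_output = np.array([0.2, 0.5, 1.3, 0.8, 1.1])

dropped = dropout_train(layer_output, rate=0.5, rng=rng)
scaled = dropout_test(layer_output, rate=0.5)
inverted = inverted_dropout_train(layer_output, rate=0.5, rng=rng)

print("training (plain):   ", dropped)
print("test time (scaled): ", scaled)
print("training (inverted):", inverted)
```

Which entries are zeroed depends on the random draw, so the training-time vectors differ from run to run unless the seed is fixed.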

In [6]:
import numpy as np
from keras import layers, models
from keras.datasets import boston_housing

(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()

# Standardize every feature using the training-set mean and standard deviation.
mean = np.mean(train_data, axis=0)
std = np.std(train_data, axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std


def build_model():
    """Two hidden layers of 16 units, each followed by dropout with rate 0.5."""
    network = models.Sequential()
    network.add(layers.Input(shape=(train_data.shape[1],)))  # One input per feature
    network.add(layers.Dense(16, activation='relu'))
    network.add(layers.Dropout(0.5))
    network.add(layers.Dense(16, activation='relu'))
    network.add(layers.Dropout(0.5))
    network.add(layers.Dense(1))  # Single linear output for regression
    network.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return network


network = build_model()
network.fit(train_data, train_labels, epochs=10, batch_size=10, verbose=0)
test_loss, test_mae = network.evaluate(test_data, test_labels)
print(f"The mean absolute error: {test_mae}")

4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - loss: 130.4847 - mae: 9.5264
The mean absolute error: 9.526391983032227


<font color="green">
    
## 2. Here is a table to help you pick a last-layer activation and a loss function for a few common problem types 
| Problem type      |Last-layer activation| Loss function |
| :---------------- | :------: | ----: |
|Binary classification                   |   sigmoid  | binary_crossentropy |
|Multi-class, single-label classification|   softmax  | categorical_crossentropy |
|Multi-class, multi-label classification |  sigmoid   | binary_crossentropy |
|Regression to arbitrary values          |  None      | mse |
|Regression to values between 0 and 1    |  sigmoid   | mse or binary_crossentropy |

</font>
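
As a rough illustration of how such a table might be used, the sketch below stores the pairs in a lookup and compares the two activations on the same raw scores. The dictionary name, its keys, and the example scores are hypothetical. Softmax produces probabilities that sum to 1, which suits the case where exactly one class applies; sigmoid scores each output independently, which suits binary and multi-label problems where several labels can be true at once.

```python
import numpy as np

# Hypothetical lookup: problem type -> (last-layer activation, loss function).
# For regression to values between 0 and 1, binary_crossentropy is also an option.
OUTPUT_CHOICES = {
    "binary": ("sigmoid", "binary_crossentropy"),
    "multiclass_single_label": ("softmax", "categorical_crossentropy"),
    "multiclass_multi_label": ("sigmoid", "binary_crossentropy"),
    "regression": (None, "mse"),
    "regression_0_1": ("sigmoid", "mse"),
}


def sigmoid(x):
    """Squash each score independently into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    """Turn a vector of scores into probabilities that sum to 1."""
    exps = np.exp(x - np.max(x))  # Subtract the max for numerical stability
    return exps / np.sum(exps)


# Hypothetical raw scores from a last layer with three outputs.
scores = np.array([2.0, 1.0, -1.0])

single_label_probs = softmax(scores)  # Competing classes, sums to 1
multi_label_probs = sigmoid(scores)   # Independent labels, need not sum to 1

print(single_label_probs, single_label_probs.sum())
print(multi_label_probs, multi_label_probs.sum())
```

The selected pair would then be passed on as in the cells above: the activation to the final `layers.Dense(...)` call and the loss to `network.compile(...)`.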