# Keras, overfitting and regularization

## Regularization - underfitting and overfitting

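**L-norm regularization (L1 = Lasso, L2 = Ridge)**: "introduce a cost for large weights"

$$C = \text{Loss} + \text{Regularization term}$$

**L1:** $ \text{Loss} + \lambda\sum_{l=1}^{L}\|W_l\|_1 $

**L2:** $ \text{Loss} + \lambda\sum_{l=1}^{L}\|W_l\|_2^2 $

Here $W_l$ is the weight matrix of layer $l$, $L$ is the number of layers and $\lambda$ sets the strength of the penalty.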
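
As a minimal sketch of the two penalty terms, assuming the layer weights are available as a list of NumPy arrays. The names `weights`, `lam`, `l1_penalty` and `l2_penalty` and all numbers are made up for illustration and are not part of Keras.

```python
import numpy as np

def l1_penalty(weights, lam):
    """lam * sum over layers of the sum of absolute weight values."""
    return lam * sum(np.abs(W).sum() for W in weights)

def l2_penalty(weights, lam):
    """lam * sum over layers of the sum of squared weight values."""
    return lam * sum((W ** 2).sum() for W in weights)

# Hypothetical weights of a network with two layers
weights = [np.array([[1.0, -2.0], [0.5, 0.0]]), np.array([[3.0], [-1.0]])]
lam = 0.01   # regularization strength (illustrative value)
loss = 0.25  # placeholder for the unregularized loss

cost_l1 = loss + l1_penalty(weights, lam)
cost_l2 = loss + l2_penalty(weights, lam)
```

Because the L2 term squares each weight, a single large weight (here 3.0) dominates it, while the L1 term charges every weight in proportion to its magnitude.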


**Dropout**

In each SGD step, randomly ignore (set to zero) a fraction $p$ of the neurons
- $p$ can be chosen from a wide range; 0.2-0.8 is typical, depending on the size of the ANN
- dropout can be applied to specific layers only; it is typical to use a single designated "dropout layer" somewhere close to the output
> dropout stops individual neurons from specializing on specific patterns, which helps the model generalize better
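
The masking step can be sketched with NumPy as follows. This is an illustration, not Keras internals: the function name `dropout` and its parameters are hypothetical, and the rescaling by $1/(1-p)$ ("inverted dropout") is an added assumption that keeps the expected activation unchanged, so nothing needs to be rescaled at prediction time.

```python
import numpy as np

def dropout(activations, p, rng, training=True):
    """Zero a random fraction p of the activations (inverted dropout)."""
    if not training or p == 0.0:
        return activations  # at prediction time all neurons are used
    keep_mask = rng.random(activations.shape) >= p  # True = neuron is kept
    # Rescale the surviving activations so the expected value is unchanged
    return activations * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones((1000, 10))               # dummy activations of one layer
dropped = dropout(a, p=0.5, rng=rng)  # entries are now either 0.0 or 2.0
```

A fresh mask is drawn on every call, so each SGD step trains a different random sub-network.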

**Data augmentation**

Shear, shift, scale and/or rotate the input data to create additional training examples
> the model not only generalizes better, it also gets more data points to train on
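
A small sketch of the idea on 2-D points, assuming that a small rotation, scaling and shift does not change a point's label. Whether that holds depends on the dataset. The function `augment_points` and its parameters are hypothetical; for images one would apply the analogous transforms to the pixel grid instead.

```python
import numpy as np

def augment_points(X, rng, max_angle=0.2, max_shift=0.1, max_scale=0.1):
    """Return a randomly rotated, scaled and shifted copy of 2-D points X."""
    angle = rng.uniform(-max_angle, max_angle)        # rotation in radians
    scale = 1.0 + rng.uniform(-max_scale, max_scale)  # scale factor near 1
    shift = rng.uniform(-max_shift, max_shift, size=2)
    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle),  np.cos(angle)]])
    return scale * X @ rotation.T + shift

rng = np.random.default_rng(0)
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # toy inputs
y = np.array([0, 1, 1])                              # toy labels

# Stack the originals with 3 augmented copies; labels are repeated unchanged
X_aug = np.vstack([X] + [augment_points(X, rng) for _ in range(3)])
y_aug = np.tile(y, 4)
```

The augmented set is four times the size of the original, which is the "more data points" effect mentioned above.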

**Early stopping**

Stop training when performance on the validation dataset starts to worsen
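
The stopping rule can be sketched in plain Python. The `patience` parameter is an assumption added here: validation loss is usually noisy, so it is common to wait a few epochs without improvement before stopping. The function name and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1

# Made-up validation losses: improving until epoch 3, then worsening
val_losses = [0.90, 0.70, 0.60, 0.55, 0.58, 0.61, 0.65, 0.70]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
```

In practice one would also keep the weights from the best epoch (epoch 3 in this example) rather than those from the epoch where training stopped.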


## Vanishing gradients - a problem in deep neural nets

**Problem:**
- During backpropagation, gradients tend to get smaller and smaller the closer a layer is to the input
- This leads to small weight updates near the input and larger weight updates near the output, so the first layers learn very slowly

**Solution:**
- Use an activation function whose gradient does not become small for large input values (i.e. one that does not saturate)
- Candidate activation function: ReLU
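
A back-of-the-envelope illustration of why the choice of activation matters. Backpropagation multiplies one activation derivative per layer; the sketch below ignores the weights to isolate that effect and assumes the same hypothetical pre-activation `z` in every layer. The sigmoid is used only as an example of a saturating activation.

```python
import math

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # at most 0.25, reached at z = 0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

n_layers = 10
z = 1.0  # hypothetical pre-activation, identical in every layer

# Product of the activation derivatives across all layers
sigmoid_factor = sigmoid_grad(z) ** n_layers  # shrinks rapidly with depth
relu_factor = relu_grad(z) ** n_layers        # stays 1 for positive inputs
```

Since the sigmoid derivative never exceeds 0.25, ten layers scale the gradient by at most $0.25^{10}$, while ReLU passes it through unchanged for positive inputs.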

**Problems with ReLU:**
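- Exploding gradients: ReLU's gradient is 1 for every positive input, so it does nothing to dampen gradients that grow from layer to layer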

**Solution:**
- Batch normalization, gradient clipping, weight regularization
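
Of the three remedies, gradient clipping is the simplest to sketch. The version below clips by L2 norm, which is one common variant (clipping each component to a fixed range is another); the function name `clip_by_norm` and the numbers are illustrative assumptions.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm; direction is preserved."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

grad = np.array([30.0, -40.0])              # L2 norm 50: an "exploded" gradient
clipped = clip_by_norm(grad, max_norm=5.0)  # same direction, L2 norm 5
```

Gradients that are already within the limit pass through unchanged, so clipping only affects the occasional very large update.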


## Keras

In [1]:
# generate data
import numpy as np
import matplotlib.pyplot as plt

def generate_X_linear(N=200):
    """Two Gaussian blobs centred at (-2, -2) and (2, 2), one per class."""
    half = N // 2
    X = np.vstack([
        np.random.normal([-2, -2], 1, size=(half, 2)),
        np.random.normal([2, 2], 1, size=(half, 2))
    ])

    # first half is class 0, second half is class 1
    y = np.array([0] * half + [1] * half).reshape(-1, 1)

    return X, y

def generate_X_nonlinear(N=200, R=5):
    """Gaussian blob at the origin (class 0) inside a noisy ring of radius R (class 1)."""
    half = N // 2
    X_inner = np.random.normal([0, 0], 1, size=(half, 2))

    # points on a circle of radius R, plus unit Gaussian noise
    X_outer = np.array([
        [R*np.cos(theta), R*np.sin(theta)]
        for theta in np.linspace(0, 2 * np.pi, half)
    ]) + np.random.randn(half, 2)

    X = np.vstack([X_inner, X_outer])
    y = np.array([0] * half + [1] * half).reshape(-1, 1)

    return X, y



x, y = generate_X_nonlinear(1000)
plt.title("Non-linear", fontsize=12)
plt.scatter(x[:, 0], x[:, 1], c=y.ravel())

plt.show()

[Figure: scatter plot "Non-linear" of the generated points, colored by class]

In [2]:
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


In [3]:
model = Sequential()

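# building layers: one hidden layer with 3 ReLU units, then a single output unit
model.add(Dense(units=3, activation='relu', input_dim=2))
# sigmoid keeps the output in (0, 1), matching the 0/1 labels;
# input_dim is only needed on the first layer
model.add(Dense(units=1, activation='sigmoid'))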



model.compile(loss='mse',
              optimizer='sgd',
              metrics=['accuracy','mse'])

In [10]:
hist = model.fit(x, y, epochs=50, batch_size=50)

Epoch 1/50
...
Epoch 50/50
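
The regularization techniques from the first section could be combined with this kind of model roughly as follows. This is a sketch, not the model trained above: the layer width, `l2_strength`, `dropout_rate`, `patience` and the use of `binary_crossentropy` are all illustrative assumptions, and `build_regularized_model` is a hypothetical helper.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers
from keras.callbacks import EarlyStopping

def build_regularized_model(l2_strength=1e-3, dropout_rate=0.2):
    model = Sequential()
    # L2 penalty on the hidden layer's weights
    model.add(Dense(units=8, activation='relu', input_dim=2,
                    kernel_regularizer=regularizers.l2(l2_strength)))
    # dropout layer between the hidden layer and the output
    model.add(Dropout(dropout_rate))
    model.add(Dense(units=1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='sgd',
                  metrics=['accuracy'])
    return model

model = build_regularized_model()

# stop once val_loss has not improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5)

# With x, y from the data-generation cell. validation_split takes the LAST
# 20% of the samples, and the generated data is ordered by class, so shuffle first:
# idx = np.random.permutation(len(x)); x, y = x[idx], y[idx]
# hist = model.fit(x, y, epochs=50, batch_size=50,
#                  validation_split=0.2, callbacks=[early_stop])
```

Early stopping needs a validation set to monitor, which is why the `fit` call in this sketch holds out part of the data.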


In [5]:
import tensorflow as tf

tf.__version__

'2.0.0-rc0'