***
- Created by: Adipta Martulandi
- Email : adipta.martulandi@gmail.com
- LinkedIn : https://www.linkedin.com/in/adiptamartulandi/
***

## 2.0 Setting Up the Environment

- To follow this chapter, you need Python and the following packages installed on your machine:
    1. Python 3.x
    2. Keras
    3. Tensorflow

## 2.1 Getting Started with DL in Keras
- Let’s start by studying the DNN and its logical components, understanding what each component is used for and how these building blocks are mapped in the Keras framework.

### 2.1.1 First Component 'Input Data'

- Input data for a DL algorithm can take many forms. Essentially, the model understands data as “tensors”. A tensor generalizes vectors and matrices to n dimensions; in computer engineering terms, it is simply an n-dimensional array.

- Additionally, DL models can interpret only numeric data. If the dataset has any categorical data like “gender” with values of “male” and “female,” we will need to convert them to one-hot encoded variables (0 and 1).
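
- As a minimal sketch of the one-hot idea, the snippet below encodes a small list of categories by hand. The helper name `one_hot` and the sample values are assumptions made for illustration; in practice a library utility would normally do this.

```python
# Hypothetical helper: turn a list of categorical values into one-hot rows.
def one_hot(values):
    categories = sorted(set(values))                  # fixed, repeatable column order
    index = {c: i for i, c in enumerate(categories)}  # category -> column position
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1                             # exactly one 1 per row
        encoded.append(row)
    return categories, encoded

categories, encoded = one_hot(["male", "female", "female", "male"])
print(categories)  # column order
print(encoded)     # one row of 0s and 1s per input value
```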

- Image data also needs to be transformed into an n-dimensional tensor. A color image is stored as a three-dimensional tensor: two dimensions index the pixel positions on a 2D plane (height and width), and the third holds the values for the RGB color channels.
- One image is therefore a three-dimensional tensor, and n images form a four-dimensional tensor in which the extra dimension indexes the training samples. For example, 100 images with a 512 × 512-pixel resolution are represented as a 4D tensor holding 512 × 512 × 3 × 100 values. Keras, with its default channels_last image format, places the sample dimension first, giving the shape (100, 512, 512, 3).
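
- The shapes described above can be checked with NumPy. This is only a sketch: the all-zero image is a placeholder, and the samples-first layout assumes the Keras default channels_last image format.

```python
import numpy as np

# One placeholder 512 x 512 RGB image: (height, width, channels)
image = np.zeros((512, 512, 3), dtype=np.uint8)

# Stacking 100 images adds a leading sample axis: (samples, height, width, channels)
batch = np.stack([image] * 100, axis=0)

print(image.ndim, image.shape)
print(batch.ndim, batch.shape)
```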

- Lastly, it is good practice to normalize, standardize, or scale the input values before training. Normalization rescales all values in the input tensor into the 0–1 range, and standardization rescales them so that the mean is 0 and the standard deviation is 1.
- Scaling usually makes training faster and more stable, and it tends to improve performance, because the activation functions (covered below) behave better when the inputs fall within a moderate range.
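
- A small sketch of both operations on a toy list of values (the numbers are made up for illustration). It uses min-max scaling for normalization and the population standard deviation for standardization.

```python
from statistics import mean, pstdev

values = [2.0, 4.0, 6.0, 8.0]  # illustrative feature values

# Normalization (min-max): rescale into the 0-1 range
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Standardization: subtract the mean, divide by the standard deviation
mu, sigma = mean(values), pstdev(values)
standardized = [(v - mu) / sigma for v in values]

print(normalized)
print(standardized)
```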

### 2.1.2 Neuron

- At the core of the DNN, we have neurons where computation for an output is executed. A neuron receives one or more inputs from the neurons in the previous layer. If the neurons are in the first hidden layer, they will receive the data from the input data stream.
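
- The text above does not spell out the computation, so the sketch below assumes the standard formulation: a neuron multiplies each input by a weight, sums the results, and adds a bias. All numbers are illustrative.

```python
# A single neuron: weighted sum of its inputs plus a bias (illustrative values)
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

# z is the combined input that is later passed to an activation function
z = sum(w * x for w, x in zip(weights, inputs)) + bias
print(z)
```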

### 2.1.3 Activation Function

- An activation function takes the combined input values of a neuron, applies a function to them, and passes on the result as the neuron's output, thus mimicking an activate/deactivate switch.

- Put simply, we always need a nonlinear activation function (at least in all hidden layers) for the network to learn properly.
- There are a variety of choices available to use as an activation function. The most common ones are the sigmoid function and the ReLU (rectified linear unit).

#### 2.1.3.1 Sigmoid Function

- The sigmoid function is defined as $f(z) = \frac{1}{1 + e^{-z}}$, which renders the output between 0 and 1.
- The nonlinear, S-shaped output helps the learning process because it closely follows the principle of lower influence, lower output and higher influence, higher output, while also confining the output to the 0-to-1 range.
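
- A minimal sketch of the sigmoid in plain Python, assuming the usual logistic form 1 / (1 + e^(-z)). It is meant for small illustrative inputs and does not guard against overflow for very large negative values.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```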

#### 2.1.3.2 ReLU Function

- ReLU uses the function f(z) = max(0,z): if the input z is positive, the function outputs the same value; otherwise it outputs 0. The output therefore ranges from 0 upward, with no upper bound.
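
- The same function as a short sketch in plain Python, applied to a few illustrative inputs:

```python
def relu(z):
    """Rectified linear unit: passes positive values through, clamps the rest to 0."""
    return max(0.0, z)

outputs = [relu(z) for z in (-2.0, 0.0, 3.5)]
print(outputs)
```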

### 2.1.4 Model

- The overall structure of a DNN is developed using the model object in Keras. This provides a simple way to create a stack of layers by adding new layers one after the other.
- The easiest way to define a model is by using the sequential model, which allows easy creation of a linear stack of layers.

### 2.1.5 Layers

- A layer in the DNN is defined as a group of neurons or a logically separated group in a hierarchical network structure. As DL became more and more popular, there were several experiments conducted with network architectures to improve performance for a variety of use cases. 

- Keras provides us with several types of layers and various means to connect them. We will take a close look at a few layers and also glance through some important layers.

### 2.1.6 Core Layers

#### 2.1.6.1 Dense Layer

- A dense layer is a regular DNN layer that connects every neuron in the defined layer to every neuron in the previous layer. For instance, if Layer 1 has 5 neurons and Layer 2 (dense layer) has 3 neurons, the total number of connections between Layer 1 and Layer 2 would be 15 (5 × 3).
- We also need to define the input shape for the Keras layer. The input shape needs to be defined for only the first layer. 
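
- The connection count can be verified with a hand-rolled sketch of a dense layer. The weight and input values are placeholders, and the extra bias term per neuron is an assumption based on the common default for dense layers.

```python
# Hypothetical dense layer: 5 inputs (Layer 1) feeding 3 neurons (Layer 2)
n_inputs, n_neurons = 5, 3

weights = [[0.1] * n_inputs for _ in range(n_neurons)]  # one row of weights per neuron
biases = [0.0] * n_neurons                              # one bias per neuron

n_connections = n_inputs * n_neurons        # every input connects to every neuron
n_parameters = n_connections + n_neurons    # weights plus biases

def dense_forward(x):
    """Compute each neuron's weighted sum of all inputs plus its bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

outputs = dense_forward([1.0, 2.0, 3.0, 4.0, 5.0])
print(n_connections, n_parameters, outputs)
```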

#### 2.1.6.2 Dropout Layer

- The dropout layer in DL helps reduce overfitting by introducing regularization and generalization capabilities into the model. In the literal sense, the dropout layer drops out a few neurons or sets them to 0 and reduces computation in the training process. The process of arbitrarily dropping neurons works quite well in reducing overfitting.
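
- A rough sketch of what dropout does during training, written in plain Python. It assumes the common "inverted dropout" variant, in which surviving activations are scaled by 1 / (1 - rate) so their expected sum is unchanged; the function name and values are illustrative.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(2018)          # seeded for reproducibility
activations = [1.0] * 10           # illustrative layer outputs
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

- At inference time dropout is switched off and the activations pass through unchanged.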

### 2.1.7 Loss Functions

- The loss function is the metric that helps a network understand whether it is learning in the right direction.
- How does a network know whether it is improving in each iteration? It uses the loss function, which measures how far the predictions are from the target. Say you are developing a model to predict whether a student will pass or fail, with the outcome expressed as a probability: the loss then measures the gap between the predicted probability and the actual result.

- Example of Loss Functions for Regression Problem:
    1. Mean Squared Error (MSE)
    2. Mean Absolute Error (MAE)
    3. Mean Absolute Percentage Error (MAPE)
    4. Mean Squared Logarithmic Error (MSLE)

- Example of Loss Functions for Classification Problem:
    1. Binary Cross-Entropy
    2. Categorical Cross-Entropy
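
- To make a few of these concrete, here is a plain-Python sketch of MSE, MAE, and binary cross-entropy using their standard textbook definitions. The function names, the clipping constant `eps`, and the sample values are assumptions for illustration, not the Keras implementations.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

print(mse([1.0, 2.0], [1.0, 4.0]))
print(mae([1.0, 2.0], [1.0, 4.0]))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```

- A classifier that always predicts a probability of 0.5 scores ln 2 (about 0.693) on binary cross-entropy, which is a useful baseline when reading loss values.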

### 2.1.8 Optimizers

- The most important part of the model training is the optimizer. Up to this point, we have addressed the process of giving feedback to the model through an algorithm called backpropagation; this is actually an optimization algorithm.
- Optimizers are used to minimize the error measured by the loss function.

- Example of Optimizers:
    1. Stochastic Gradient Descent
    2. Adam
    3. RMSProp
    4. Adagrad

- Each optimization technique has its own pros and cons. Two major problems we often face in DL are vanishing gradients and saddle points. You can explore these problems in more detail when choosing the best optimizer for your problem, but for most use cases Adam works well.
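
- The core idea shared by these optimizers can be sketched with plain gradient descent on a one-parameter toy loss. The loss (w - 3)^2, the learning rate, and the step count are all assumptions chosen for illustration; real optimizers such as Adam add momentum and per-parameter step sizes on top of this update.

```python
def grad(w):
    """Gradient of the toy loss (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0                # initial weight
learning_rate = 0.1    # step size

for step in range(100):
    w -= learning_rate * grad(w)   # move against the gradient to reduce the loss

print(w)
```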

### 2.1.9 Model Configuration

- Once you have designed your network, Keras provides you with an easy one-step model configuration process with the ‘compile’ command. To compile a model, we need to provide three parameters: an optimization function, a loss function, and a metric for the model to measure performance on the validation dataset.

*** 

# Let's Practice

## Import Required Packages

In [1]:
import keras
import numpy as np

Using TensorFlow backend.


## Create Dummy Data

In [2]:
#Setting seed for reproducibility
np.random.seed(2018)

#Generate Dummy Data
x_train = np.random.random((6000, 10))
y_train = np.random.randint(2, size=(6000,1))

#Validation Data
x_val = np.random.random((2000,10))
y_val = np.random.randint(2, size=(2000, 1))

#Testing Data
x_test = np.random.random((2000,10))
y_test = np.random.randint(2, size=(2000, 1))

## Create Model Architecture

In [4]:
#Defining the model structure with the required layers, # of neurons, activation function and optimizers
model = keras.models.Sequential()

#First hidden layer
model.add(keras.layers.Dense(64, input_dim=10, activation='relu'))

#Second hidden layer
model.add(keras.layers.Dense(32, activation='relu'))

#Third hidden layer
model.add(keras.layers.Dense(16, activation='relu'))

#Fourth hidden layer
model.add(keras.layers.Dense(8, activation='relu'))

#Fifth hidden layer
model.add(keras.layers.Dense(4, activation='relu'))

#Output layer
model.add(keras.layers.Dense(1, activation='sigmoid'))

#Compile all layers
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

## Training Model and Creating Prediction

In [5]:
#Train the model
model.fit(x_train, y_train, batch_size=64, epochs=3, validation_data=(x_val,y_val));


Train on 6000 samples, validate on 2000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


- We can see that after every epoch, the model prints the mean training loss and accuracy as well as the validation loss and accuracy. We can use these intermediate results to make a judgment on the model performance.

## Model Evaluation

In [6]:
model.evaluate(x_test, y_test)



[0.6932061853408813, 0.493]

In [7]:
model.metrics_names

['loss', 'acc']

***

## Practice with Boston Housing Dataset

In [38]:
from keras.datasets import boston_housing

In [39]:
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

In [40]:
#Explore the data structure using basic python commands

print("Type of the Dataset:",type(y_train))
print("Shape of training data :",x_train.shape)
print("Shape of training labels :",y_train.shape)
#---------------------------------------
print("Type of testing data :",type(x_test))
print("Shape of testing data :",x_test.shape)
print("Shape of testing labels :",y_test.shape)

Type of the Dataset: <class 'numpy.ndarray'>
Shape of training data : (404, 13)
Shape of training labels : (404,)
Type of testing data : <class 'numpy.ndarray'>
Shape of testing data : (102, 13)
Shape of testing labels : (102,)


In [41]:
#First 5 rows of the training data
x_train[:5]

array([[1.23247e+00, 0.00000e+00, 8.14000e+00, 0.00000e+00, 5.38000e-01,
        6.14200e+00, 9.17000e+01, 3.97690e+00, 4.00000e+00, 3.07000e+02,
        2.10000e+01, 3.96900e+02, 1.87200e+01],
       [2.17700e-02, 8.25000e+01, 2.03000e+00, 0.00000e+00, 4.15000e-01,
        7.61000e+00, 1.57000e+01, 6.27000e+00, 2.00000e+00, 3.48000e+02,
        1.47000e+01, 3.95380e+02, 3.11000e+00],
       [4.89822e+00, 0.00000e+00, 1.81000e+01, 0.00000e+00, 6.31000e-01,
        4.97000e+00, 1.00000e+02, 1.33250e+00, 2.40000e+01, 6.66000e+02,
        2.02000e+01, 3.75520e+02, 3.26000e+00],
       [3.96100e-02, 0.00000e+00, 5.19000e+00, 0.00000e+00, 5.15000e-01,
        6.03700e+00, 3.45000e+01, 5.98530e+00, 5.00000e+00, 2.24000e+02,
        2.02000e+01, 3.96900e+02, 8.01000e+00],
       [3.69311e+00, 0.00000e+00, 1.81000e+01, 0.00000e+00, 7.13000e-01,
        6.37600e+00, 8.84000e+01, 2.56710e+00, 2.40000e+01, 6.66000e+02,
        2.02000e+01, 3.91430e+02, 1.46500e+01]])

In [42]:
#Use the first 300 rows for training and the remaining 104 rows for validation
x_train_2 = x_train[:300]
y_train_2 = y_train[:300]
x_val = x_train[300:]
y_val = y_train[300:]

In [46]:
#Model Architecture
model = keras.models.Sequential()

#1 Hidden Layer
model.add(keras.layers.Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))

#2 Hidden Layer
model.add(keras.layers.Dense(6, kernel_initializer='normal', activation='relu'))

#Output Layer (single neuron, linear activation for regression)
model.add(keras.layers.Dense(1, kernel_initializer='normal'))

#Compile Model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_absolute_percentage_error'])

In [44]:
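#Train the model
#Note: x_train still contains the 104 validation rows, so the validation metrics are optimistic;
#pass x_train_2, y_train_2 instead for a clean train/validation split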
model.fit(x_train, y_train, batch_size=32, epochs=3, validation_data=(x_val, y_val))

Train on 404 samples, validate on 104 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x1e2cdf20780>

In [45]:
#Evaluate the model
result = model.evaluate(x_test, y_test)
for i in range(len(model.metrics_names)):
    print(model.metrics_names[i]," : ", result[i])

loss  :  370.2984056659773
mean_absolute_percentage_error  :  65.85465240478516


- We can see that MAPE is around 65%, which is not a good number for model performance. It means that our model's predictions are off by about 65% on average. So, in general, if a house was priced at 10K, our model would have predicted roughly 16.5K (or, erring in the other direction, roughly 3.5K).
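
- The arithmetic behind that reading can be sketched with a hand-written MAPE using the standard definition. The function name and the prices are illustrative, and the Keras metric may differ in small details such as clipping near-zero targets.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# A house priced at 10 (thousand) with a prediction that is 65% too high or too low
too_high = mape([10.0], [16.5])
too_low = mape([10.0], [3.5])
print(too_high, too_low)
```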

In [47]:
#Let's increase the number of epochs to 30
model.fit(x_train, y_train, batch_size=32, epochs=30, validation_data=(x_val, y_val))

Train on 404 samples, validate on 104 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x1e2cede1f28>

In [48]:
#Evaluate again the model
result = model.evaluate(x_test, y_test)
for i in range(len(model.metrics_names)):
    print(model.metrics_names[i]," : ", result[i])

loss  :  62.62122988233379
mean_absolute_percentage_error  :  30.606018216002223


- MAPE has dropped by more than half, from about 65% to about 30%!