<div class="alert alert-block alert-info">
<center><font size="6"><b>Section 3</b></font></center>
<br>
<center><font size="6"><b>Practical Aspects of Building Deep Learning Models</b></font></center>
</div>

**This section compares training and validation losses and illustrates two regularization techniques: dropout and $L^2$-norm regularization.**

<div class="alert alert-block alert-info">
</div>

# Notations for Regularization

Regularized objective function:

$ C = C_0 + \text{Penalty Term} $

where $C_0$ is the unregularized objective function

The $L^2$-norm penalty term typically takes the form $ \frac{1}{2} ||w||^2_2 $, usually scaled by a regularization strength $\lambda \ge 0$.
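
As a minimal sketch of the regularized objective above, the cell below builds $C = C_0 + \text{Penalty Term}$ by hand in PyTorch. The tiny linear model, the random data, and the strength `lam` are illustrative assumptions, not part of the examples that follow; the penalty is applied to the weights only, which is a common but not universal choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny linear model standing in for any network (illustrative only)
model = nn.Linear(3, 1)
x = torch.randn(5, 3)
y = torch.randn(5, 1)

lam = 0.01  # hypothetical regularization strength

# C_0: the unregularized objective (mean squared error here)
c0 = nn.functional.mse_loss(model(x), y)

# Penalty term: (lam / 2) * ||w||_2^2, on the weights only (bias excluded)
penalty = 0.5 * lam * model.weight.pow(2).sum()

# Regularized objective C = C_0 + penalty
c = c0 + penalty
print(float(c0), float(penalty), float(c))
```

Because the penalty is non-negative, $C \ge C_0$ for any weights; minimizing $C$ trades data fit against the size of the weights.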

### Equation by equation

**Inputs to hidden layer 1**

$z_1(\mathbf{x}; \mathbf{w_1},b_1) = \sum_{j=1}^4 w_{1,j}^{(1)} x_j + b_1^{(1)}$


$z_2(\mathbf{x}; \mathbf{w_2},b_2) = \sum_{j=1}^4 w_{2,j}^{(1)} x_j + b_2^{(1)}$

$\vdots$

$z_6(\mathbf{x}; \mathbf{w_6},b_6) = \sum_{j=1}^4 w_{6,j}^{(1)} x_j + b_6^{(1)}$

where $j=1,2,3,4$ (in this example) is the index for the inputs

$h_k(\mathbf{x}; \mathbf{w_k},b_k) = g_1(z_k)$ where $k=1,2,\dots,6$ is the index for the hidden units


**Hidden layer 1 to output**

$y_1 = g_2 \left( \sum_{k=1}^6 w_{1,k}^{(2)} h_k + b_1^{(2)} \right)$

$y_2 = g_2 \left( \sum_{k=1}^6 w_{2,k}^{(2)} h_k + b_2^{(2)} \right)$

### Sequential layerwise architecture

First layer:
$$ \mathbf{h}^{(1)} = g^{(1)} \left(\mathbf{W}^{(1)T} \mathbf{x} + \mathbf{b}^{(1)}  \right) $$

Second layer:
$$ \mathbf{h}^{(2)} = g^{(2)} \left(\mathbf{W}^{(2)T} \mathbf{h}^{(1)} + \mathbf{b}^{(2)}  \right) $$



$\vdots$


$l^{th}$ layer:
$$ \mathbf{h}^{(l)} = g^{(l)} \left(\mathbf{W}^{(l)T} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}  \right) $$
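
The equation-by-equation and layerwise forms describe the same computation. The NumPy sketch below checks this for the 4-input, 6-hidden-unit, 2-output network used in the notation. It assumes each weight matrix is stored with one row per unit (so the product is written `W @ x`), takes $g_1$ to be ReLU and $g_2$ to be the identity, and uses random weights; all of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Elementwise ReLU, used here as the hidden activation g_1."""
    return np.maximum(z, 0.0)

# 4 inputs, 6 hidden units, 2 outputs
W1 = rng.normal(size=(6, 4))   # row k holds the weights of hidden unit k
b1 = rng.normal(size=6)
W2 = rng.normal(size=(2, 6))   # row m holds the weights of output m
b2 = rng.normal(size=2)

x = rng.normal(size=4)

# Layerwise (matrix) form
h = relu(W1 @ x + b1)
y = W2 @ h + b2                # g_2 taken to be the identity

# Equation-by-equation form for the first hidden unit and first output
z1 = sum(W1[0, j] * x[j] for j in range(4)) + b1[0]
y1 = sum(W2[0, k] * h[k] for k in range(6)) + b2[0]

print(np.isclose(relu(z1), h[0]), np.isclose(y1, y[0]))
```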



<div class="alert alert-block alert-info">
</div>

# Example 1: Feedforward Network Without Regularization

## Simulate and Visualize Data

The output is related to the inputs by the following function:

$$y_i = 3x_{i,1} + x_{i,1}^2 \exp(x_{i,1}) + 0.8\,x_{i,2} + 0.5\,x_{i,3} + \epsilon_i$$

where $\epsilon_i$ is an independent and identically distributed (i.i.d.) standard normal random variable, $x_{i,3}$ is a standard normal variable that is not included in the model inputs, and $i = 1,2,\dots,n$ indexes the examples (or observations).

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

n = 200 # number of examples (or observations)

# Generate a set of n random numbers from a standard normal distribution
epsilon = np.random.randn(n)

# Generate a set of n random numbers from a uniform[0,1] distribution; name it x1
# and create another variable, which we name x2, from x1
x1 = np.random.uniform(0,1,n)
x2 = 1. / (1. + np.exp(-np.power(x1,3))) + 0.5*epsilon
# x3 enters y below but is left out of X, so it acts as an unobserved input
x3 = np.random.randn(n)

X = pd.DataFrame({'x1':x1, 'x2':x2})

# Create the data generating mechanism
y = 3*x1 + np.power(x1,2)*np.exp(x1) + 0.8*x2 + 0.5*x3 + epsilon

## Split the data into a Training and Validation Set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5)
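
The split above is random, so the numbers in the rest of the notebook change from run to run. If reproducible results are wanted, one option is to fix the seeds. The sketch below uses separate demo variables (`X_demo`, `y_demo`) and arbitrary seed values so that it does not overwrite the data used in this example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)           # seed value is arbitrary
x1_demo = rng.uniform(0, 1, 200)
X_demo = pd.DataFrame({'x1': x1_demo})
y_demo = 3 * x1_demo + rng.standard_normal(200)

# Passing the same random_state twice yields the same partition
split_a = train_test_split(X_demo, y_demo, test_size=0.5, random_state=0)
split_b = train_test_split(X_demo, y_demo, test_size=0.5, random_state=0)

same = all(np.array_equal(np.asarray(a), np.asarray(b))
           for a, b in zip(split_a, split_b))
print(same)
print(split_a[0].shape, split_a[1].shape)
```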

### Basic Descriptive Analysis

In [None]:
print(X_train.describe().T.round(4), "\n")
print(pd.DataFrame({'y':y_train}).describe().T.round(4))

In [None]:
# Correlation between x1 and x2 in the training set
np.corrcoef(X_train.iloc[:,0], X_train.iloc[:,1])[0][1].round(2)

In [None]:
fig = plt.figure(figsize=(15,10))

plt.subplot(3, 3, 1)
plt.hist(y_train)
plt.title("y (Training Set)")

plt.subplot(3, 3, 2)
plt.scatter(X_train.iloc[:,0], y_train)
plt.title("y vs x1 (Training Set)")

plt.subplot(3, 3, 3)
plt.scatter(X_train.iloc[:,1], y_train)
plt.title("y vs x2 (Training Set)")

plt.subplot(3, 3, 5)
plt.hist(X_train.iloc[:,0])
plt.title("x1 (Training Set)")

plt.subplot(3, 3, 6)
plt.scatter(X_train.iloc[:,0], X_train.iloc[:,1])
plt.title("x2 vs x1 (Training Set)")

plt.subplot(3, 3, 9)
plt.hist(X_train.iloc[:,1])
plt.title("x2 (Training Set)")

plt.tight_layout()

<div class="alert alert-block alert-warning">
Note: Before training, NumPy arrays and pandas DataFrames need to be converted to PyTorch tensors.
</div>

In [None]:
# convert numpy array to tensor in shape of input size
import torch 

X_train_ts = torch.from_numpy(X_train.values.reshape(-1,2)).float()
y_train_ts = torch.from_numpy(y_train.reshape(-1,1)).float()

X_test_ts = torch.from_numpy(X_test.values.reshape(-1,2)).float()
y_test_ts = torch.from_numpy(y_test.reshape(-1,1)).float() # y_test is a numpy array

In [None]:
print("X_train_ts is of %s type" %type(X_train_ts))
print("y_train_ts is of %s type" %type(y_train_ts))
print("X_test_ts is of %s type" %type(X_test_ts))
print("y_test_ts is of %s type" %type(y_test_ts))

In [None]:
print(X_train_ts.shape)
print(y_train_ts.shape)
print(X_test_ts.shape)
print(y_test_ts.shape)

In [None]:
# First 5 rows of tensor X
print("First 5 rows of tensor X_train_ts", "\n",  X_train_ts[:5], "\n")
print("First 5 rows of tensor y_train_ts", "\n",  y_train_ts[:5], "\n")

## Define a Feed-forward network with 1 hidden layer

In [None]:
# Let's confirm the dimensions of the inputs and outputs
print("train_size: ", len(X_train_ts), "\n")
print("X shape:", X_train_ts.shape, "\n")
print("y shape:", y_train_ts.shape, "\n")
print(X_train_ts.shape[1])

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ffNet(nn.Module):
    """
    D_in: input dimension
    D_h1: dimension of hidden layer 1
    D_h2: accepted for a second hidden layer but unused in this one-hidden-layer network
    D_out: output dimension
    """
    
    def __init__(self, D_in, D_h1, D_h2, D_out):
        super(ffNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, D_h1)
        self.linear2 = torch.nn.Linear(D_h1, D_out)

    def forward(self, x):
        h1 = F.relu(self.linear1(x))
        y_pred = self.linear2(h1)
        return y_pred

## Construct the model by instantiating the class defined above

In [None]:
X_train.shape[1]

In [None]:
# Construct the model by instantiating the class defined above
D_in = X_train.shape[1]
D_h1, D_h2 = 8,4
D_out = 1

ffnet = ffNet(D_in, D_h1, D_h2, D_out)
print(ffnet)

## Define loss function and optimization algorithm

In [None]:
# Define Optimizer and Loss Function

optimizer = torch.optim.SGD(ffnet.parameters(), lr=0.01)

loss_func = torch.nn.MSELoss()
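
To make the two objects above less of a black box, the sketch below checks on made-up tensors that `MSELoss` (with its default mean reduction) is the average squared error, and that one `SGD` step without momentum moves a parameter by `-lr * gradient`. The tensors and the toy loss `w**2` are illustrative assumptions only.

```python
import torch

torch.manual_seed(0)

# MSELoss with default settings is the mean of squared differences
y_hat = torch.randn(10, 1)
y_true = torch.randn(10, 1)
loss_builtin = torch.nn.MSELoss()(y_hat, y_true)
loss_manual = ((y_hat - y_true) ** 2).mean()
print(torch.isclose(loss_builtin, loss_manual))

# One plain SGD step on a single parameter: w <- w - lr * grad
w = torch.tensor([2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.01)
toy_loss = (w ** 2).sum()      # gradient is 2w = 4 at w = 2
opt.zero_grad()
toy_loss.backward()
opt.step()
print(w.item())                # approximately 2 - 0.01 * 4 = 1.96
```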

## Model Training

In [None]:
X_data = X_train_ts
y_data = y_train_ts

X_data_val = X_test_ts
y_data_val = y_test_ts

n_epoch = 500

train_loss, val_loss = [],[]

for epoch in range(1, n_epoch + 1):
    y_pred = ffnet(X_data)
    epoch_loss_train = loss_func(y_pred, y_data)

    # Validation pass: no gradients are needed here
    with torch.no_grad():
        y_pred_val = ffnet(X_data_val)
        epoch_loss_val = loss_func(y_pred_val, y_data_val)

    optimizer.zero_grad()
    epoch_loss_train.backward()
    optimizer.step()

    # Store plain floats so the loss curves can be plotted later
    train_loss.append(epoch_loss_train.item())
    val_loss.append(epoch_loss_val.item())
    
    #if epoch <= 5 or epoch % 100 == 0:
    if epoch % 100 == 0:
        print('Epoch {}, Training loss {}, Validation loss {}'.format(epoch, round(float(epoch_loss_train),4), round(float(epoch_loss_val),4)))
        
        #plt.cla()
        
        fig = plt.figure(figsize=(16,6))
        plt.subplot(1, 2, 1)
        plt.scatter(X_train.iloc[:,0], y_train, label="actual")
        plt.scatter(X_train.iloc[:,0], y_pred.detach().numpy(), label="prediction")
        plt.title("Training")
        plt.legend()

        plt.subplot(1, 2, 2)
        plt.scatter(X_test.iloc[:,0], y_test, label="actual")
        plt.scatter(X_test.iloc[:,0], y_pred_val.detach().numpy(),label="prediction")
        plt.title("Validation")
        plt.legend()
        
        #plt.text(0.5, 0, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 10, 'color':  'red'})
        #plt.pause(0.1)

        plt.show()

## Plot loss curves

In [None]:
fig = plt.figure(figsize=(12,8))
plt.plot(range(1,len(train_loss)+1),train_loss,'b',label = 'training loss')
plt.plot(range(1,len(val_loss)+1),val_loss,'g',label = 'validation loss')
plt.legend()
plt.title("Training and Validation Loss Curves")
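
Besides reading the curves by eye, it can help to summarize them numerically. The helper below is a hypothetical sketch, not part of the original exercise: it reports the epoch with the lowest validation loss and the final gap between validation and training loss. The demo lists are made-up numbers; in the notebook one would pass `train_loss` and `val_loss` instead.

```python
def summarize_losses(train_loss, val_loss):
    """Hypothetical helper: summarize two per-epoch loss sequences."""
    train = [float(v) for v in train_loss]   # also accepts 0-d tensors
    val = [float(v) for v in val_loss]
    best_idx = min(range(len(val)), key=val.__getitem__)
    return {
        "best_epoch": best_idx + 1,           # epochs are counted from 1
        "best_val_loss": val[best_idx],
        "final_gap": val[-1] - train[-1],     # positive gap can hint at overfitting
    }

# Made-up loss values, for illustration only
demo_train = [1.0, 0.6, 0.4, 0.3, 0.2]
demo_val = [1.1, 0.7, 0.5, 0.55, 0.7]
summary = summarize_losses(demo_train, demo_val)
print(summary)
```

A validation loss that bottoms out and then rises while the training loss keeps falling is the usual sign of overfitting, which is what the regularization techniques in this section aim to reduce.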

# Example 2: Feedforward Network With a Dropout Layer

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ffNet(nn.Module):
    """
    D_in: input dimension
    D_h1: dimension of hidden layer 1
    D_h2: accepted for a second hidden layer but unused in this one-hidden-layer network
    D_out: output dimension
    """
    
    def __init__(self, D_in, D_h1, D_h2, D_out):
        super(ffNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, D_h1)
        self.linear1_drop = torch.nn.Dropout(p=0.8)  # each hidden activation is zeroed with probability 0.8
        self.linear2 = torch.nn.Linear(D_h1, D_out)

    def forward(self, x):
        h1 = F.relu(self.linear1(x))
        h1 = self.linear1_drop(h1)  # apply dropout to the hidden activations
        y_pred = self.linear2(h1)
        return y_pred
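
A dropout layer behaves differently in training and evaluation mode, which matters when comparing training and validation losses. The sketch below illustrates this on a vector of ones; it uses `p=0.5` rather than the value in the class above purely to keep the arithmetic easy to read.

```python
import torch

torch.manual_seed(0)

drop = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each entry is zeroed with probability p and the
# survivors are scaled by 1 / (1 - p), here 2
drop.train()
out_train = drop(x)

# Evaluation mode: dropout is the identity
drop.eval()
out_eval = drop(x)

print(out_train.unique())         # only the values 0 and 2 appear
print(torch.equal(out_eval, x))   # True
```

For a whole network, calling `model.train()` or `model.eval()` switches every dropout layer inside it at once.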

## Construct the model by instantiating the class defined above

In [None]:
# Construct the model by instantiating the class defined above
D_in = X_train.shape[1]
D_h1, D_h2 = 8,4
D_out = 1

ffnet = ffNet(D_in, D_h1, D_h2, D_out)
print(ffnet)

## Define loss function and optimization algorithm

In [None]:
# Define Optimizer and Loss Function

#optimizer = torch.optim.SGD(ffnet.parameters(), lr=0.01, weight_decay=1e-4)
optimizer = torch.optim.SGD(ffnet.parameters(), lr=0.01)

loss_func = torch.nn.MSELoss()
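
The commented-out optimizer line above hints at $L^2$-norm regularization through the `weight_decay` argument. As a hedged sketch on a toy parameter vector (the loss, `lam`, and `lr` are illustrative assumptions), the cell below checks that for plain `SGD` a `weight_decay` of `lam` gives the same update as adding the penalty $\frac{\lambda}{2}||w||_2^2$ to the loss explicitly.

```python
import torch

lam, lr = 0.1, 0.01   # hypothetical regularization strength and learning rate

w_wd = torch.tensor([1.0, -2.0], requires_grad=True)
w_pen = torch.tensor([1.0, -2.0], requires_grad=True)

opt_wd = torch.optim.SGD([w_wd], lr=lr, weight_decay=lam)
opt_pen = torch.optim.SGD([w_pen], lr=lr)

def base_loss(w):
    """Toy unregularized objective C_0."""
    return (w - 1.0).pow(2).sum()

# (a) Regularization handled by the optimizer
opt_wd.zero_grad()
base_loss(w_wd).backward()
opt_wd.step()

# (b) Penalty (lam / 2) * ||w||^2 added to the loss by hand
opt_pen.zero_grad()
(base_loss(w_pen) + 0.5 * lam * w_pen.pow(2).sum()).backward()
opt_pen.step()

print(w_wd.data, w_pen.data)
```

Note that passing `ffnet.parameters()` with `weight_decay` penalizes the biases as well as the weights; penalizing weights only would require separate parameter groups.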

## Model Training

In [None]:
X_data = X_train_ts
y_data = y_train_ts

X_data_val = X_test_ts
y_data_val = y_test_ts

n_epoch = 500

train_loss, val_loss = [],[]

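for epoch in range(1, n_epoch + 1):
    ffnet.train()  # training mode: any dropout layer used in forward() is active
    y_pred = ffnet(X_data)
    epoch_loss_train = loss_func(y_pred, y_data)

    ffnet.eval()   # evaluation mode: dropout is switched off for validation
    with torch.no_grad():
        y_pred_val = ffnet(X_data_val)
        epoch_loss_val = loss_func(y_pred_val, y_data_val)

    optimizer.zero_grad()
    epoch_loss_train.backward()
    optimizer.step()

    # Store plain floats so the loss curves can be plotted later
    train_loss.append(epoch_loss_train.item())
    val_loss.append(epoch_loss_val.item())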
    
    #if epoch <= 5 or epoch % 100 == 0:
    if epoch % 100 == 0:
        print('Epoch {}, Training loss {}, Validation loss {}'.format(epoch, round(float(epoch_loss_train),4), round(float(epoch_loss_val),4)))
        
        #plt.cla()
        
        fig = plt.figure(figsize=(16,6))
        plt.subplot(1, 2, 1)
        plt.scatter(X_train.iloc[:,0], y_train, label="actual")
        plt.scatter(X_train.iloc[:,0], y_pred.detach().numpy(), label="prediction")
        plt.title("Training")
        plt.legend()

        plt.subplot(1, 2, 2)
        plt.scatter(X_test.iloc[:,0], y_test, label="actual")
        plt.scatter(X_test.iloc[:,0], y_pred_val.detach().numpy(),label="prediction")
        plt.title("Validation")
        plt.legend()
        
        #plt.text(0.5, 0, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 10, 'color':  'red'})
        #plt.pause(0.1)

        plt.show()

## Plot loss curves

In [None]:
fig = plt.figure(figsize=(12,8))
plt.plot(range(1,len(train_loss)+1),train_loss,'b',label = 'training loss')
plt.plot(range(1,len(val_loss)+1),val_loss,'g',label = 'validation loss')
plt.legend()
plt.title("Training and Validation Loss Curves")

# Lab 3

In this lab, you will replicate the above exercise with a different simulated dataset. 

  - You will simply run the code to simulate the data and conduct some descriptive analysis.

  - Then, you will design a network, instantiate it, run the training loop, and evaluate the model by computing and plotting the loss on the training and validation sets.

## Simulate and Visualize Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

n = 200 # number of examples (or observations)

# Generate a set of n random numbers from a standard normal distribution
epsilon = np.random.randn(n)

# Generate a set of n random numbers from a uniform[0,1] distribution; name it x1
x1 = np.random.uniform(0,1,n)

X = pd.DataFrame({'x1':x1})

# Create the data generating mechanism
y = 1.5*x1 + np.power(x1,3)*np.exp(x1) + epsilon

## Split the data into a Training and Validation Set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5)

### Basic Descriptive Analysis

In [None]:
print(X_train.describe().T.round(4), "\n")
print(pd.DataFrame({'y':y_train}).describe().T.round(4))

In [None]:
fig = plt.figure(figsize=(15,10))

plt.subplot(2, 2, 1)
plt.hist(y_train)
plt.title("y (Training Set)")

plt.subplot(2, 2, 2)
plt.scatter(X_train.iloc[:,0], y_train)
plt.title("y vs x1 (Training Set)")

plt.subplot(2, 2, 4)
plt.hist(X_train.iloc[:,0])
plt.title("x1 (Training Set)")

plt.tight_layout()

<div class="alert alert-block alert-warning">
Note: Before training, NumPy arrays and pandas DataFrames need to be converted to PyTorch tensors.
</div>

In [None]:
# convert numpy array to tensor in shape of input size
import torch 

X_train_ts = torch.from_numpy(X_train.values.reshape(-1,1)).float()
y_train_ts = torch.from_numpy(y_train.reshape(-1,1)).float()

X_test_ts = torch.from_numpy(X_test.values.reshape(-1,1)).float()
y_test_ts = torch.from_numpy(y_test.reshape(-1,1)).float() # y_test is a numpy array

In [None]:
print("X_train_ts is of %s type" %type(X_train_ts))
print("y_train_ts is of %s type" %type(y_train_ts))
print("X_test_ts is of %s type" %type(X_test_ts))
print("y_test_ts is of %s type" %type(y_test_ts))

**Question: Do the shapes of these tensors make sense to you?**

In [None]:
print(X_train_ts.shape)
print(y_train_ts.shape)
print(X_test_ts.shape)
print(y_test_ts.shape)

## Define a Feed-forward network with 1 hidden layer

**Question: Do the shapes of the following tensors match your expectations?**

In [None]:
# Let's confirm the dimensions of the inputs and outputs
print("train_size: ", len(X_train_ts), "\n")
print("X shape:", X_train_ts.shape, "\n")
print("y shape:", y_train_ts.shape, "\n")
print(X_train_ts.shape[1])

**Run the following code to define a feedforward network with one hidden layer.**

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ffNet(nn.Module):
    """
    D_in: input dimension
    D_h1: dimension of hidden layer 1
    D_out: output dimension
    """
    
    def __init__(self, D_in, D_h1, D_out):
        super(ffNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, D_h1)
        self.linear2 = torch.nn.Linear(D_h1, D_out)

    def forward(self, x):
        h1 = F.relu(self.linear1(x))
        y_pred = self.linear2(h1)
        return y_pred

## Construct the model by instantiating the class defined above

In [None]:
X_train.shape[1]

**Choose your parameters**

In [None]:
# Construct the model by instantiating the class defined above
D_in  = 0 # Replace 0 with your code here
D_h1  = 0 # Replace 0 with an integer
D_out = 0 # Replace 0 with the dimension of the output

ffnet = ffNet(D_in, D_h1, D_out)
print(ffnet)

## Define loss function and optimization algorithm

**You can specify the learning rate if you want.**

In [None]:
# Define Optimizer and Loss Function

learning_rate = 0.01 #You can specify the learning rate

optimizer = torch.optim.SGD(ffnet.parameters(), lr=learning_rate)

loss_func = torch.nn.MSELoss()

## Model Training

**1. Choose the number of epochs for your training.**

**2. Fill in some details in the code below.**

In [None]:
X_data = X_train_ts
y_data = y_train_ts

X_data_val = X_test_ts
y_data_val = y_test_ts

n_epoch = 0 # Choose the number of epochs for your training

train_loss, val_loss = [],[]

for epoch in range(1, n_epoch + 1):
    y_pred = ffnet(A) # Fill in the blank by replacing A   
    epoch_loss_train = loss_func(y_pred, y_data) 
    
    y_pred_val = ffnet(B) # Fill in the blank by replacing B
    epoch_loss_val = loss_func(C, D) # Fill in the blank by replacing C and D
    
    optimizer.zero_grad()
    epoch_loss_train.backward()        
    optimizer.step()       
    
    # Store plain floats so the loss curves can be plotted later
    train_loss.append(epoch_loss_train.item())
    val_loss.append(epoch_loss_val.item())
    
    #if epoch <= 5 or epoch % 100 == 0:
    if epoch % 100 == 0:
        print('Epoch {}, Training loss {}, Validation loss {}'.format(epoch, round(float(epoch_loss_train),4), round(float(epoch_loss_val),4)))
                
        fig = plt.figure(figsize=(16,6))
        plt.subplot(1, 2, 1)
        plt.scatter(X_train.iloc[:,0], y_train, label="actual")
        plt.scatter(X_train.iloc[:,0], y_pred.detach().numpy(), label="prediction")
        plt.title("Training")
        plt.legend()

        plt.subplot(1, 2, 2)
        plt.scatter(X_test.iloc[:,0], y_test, label="actual")
        plt.scatter(X_test.iloc[:,0], y_pred_val.detach().numpy(),label="prediction")
        plt.title("Validation")
        plt.legend()
        
        plt.show()

## Plot loss curves

In [None]:
fig = plt.figure(figsize=(12,8))
plt.plot(range(1,len(train_loss)+1),train_loss,'b',label = 'training loss')
plt.plot(range(1,len(val_loss)+1),val_loss,'g',label = 'validation loss')
plt.legend()
plt.title("Training and Validation Loss Curves")

# Feedforward Network With a Dropout Layer

**Specify the dropout layer.**

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ffNet(nn.Module):
    """
    D_in: input dimension
    D_h1: dimension of hidden layer 1
    D_out: output dimension
    """
    
    def __init__(self, D_in, D_h1, D_out):
        super(ffNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, D_h1)
        self.linear1_drop = DROPOUT_LAYER # Specify the dropout layer by replacing DROPOUT_LAYER
        self.linear2 = torch.nn.Linear(D_h1, D_out)

    def forward(self, x):
        h1 = F.relu(self.linear1(x))
        h1 = self.linear1_drop(h1)  # apply dropout to the hidden activations
        y_pred = self.linear2(h1)
        return y_pred

## Construct the model by instantiating the class defined above

**Specify the dimensions.**

In [None]:
# Construct the model by instantiating the class defined above
D_in  = 0 # Specify D_in
D_h1  = 0 # Specify D_h1
D_out = 0 # Specify D_out

ffnet = ffNet(D_in, D_h1, D_out)
print(ffnet)

## Define loss function and optimization algorithm

**Change the learning rate.**

In [None]:
# Define Optimizer and Loss Function


optimizer = torch.optim.SGD(ffnet.parameters(), lr=1e-6) # Change the learning rate 

loss_func = torch.nn.MSELoss()

## Model Training

In [None]:
X_data = X_train_ts
y_data = y_train_ts

X_data_val = X_test_ts
y_data_val = y_test_ts

n_epoch = 1000

train_loss, val_loss = [],[]

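for epoch in range(1, n_epoch + 1):
    ffnet.train()  # training mode: any dropout layer used in forward() is active
    y_pred = ffnet(X_data)
    epoch_loss_train = loss_func(y_pred, y_data)

    ffnet.eval()   # evaluation mode: dropout is switched off for validation
    with torch.no_grad():
        y_pred_val = ffnet(X_data_val)
        epoch_loss_val = loss_func(y_pred_val, y_data_val)

    optimizer.zero_grad()
    epoch_loss_train.backward()
    optimizer.step()

    # Store plain floats so the loss curves can be plotted later
    train_loss.append(epoch_loss_train.item())
    val_loss.append(epoch_loss_val.item())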
    
    #if epoch <= 5 or epoch % 100 == 0:
    if epoch % 100 == 0:
        print('Epoch {}, Training loss {}, Validation loss {}'.format(epoch, round(float(epoch_loss_train),4), round(float(epoch_loss_val),4)))
        
        #plt.cla()
        
        fig = plt.figure(figsize=(16,6))
        plt.subplot(1, 2, 1)
        plt.scatter(X_train.iloc[:,0], y_train, label="actual")
        plt.scatter(X_train.iloc[:,0], y_pred.detach().numpy(), label="prediction")
        plt.title("Training")
        plt.legend()

        plt.subplot(1, 2, 2)
        plt.scatter(X_test.iloc[:,0], y_test, label="actual")
        plt.scatter(X_test.iloc[:,0], y_pred_val.detach().numpy(),label="prediction")
        plt.title("Validation")
        plt.legend()
        
        #plt.text(0.5, 0, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 10, 'color':  'red'})
        #plt.pause(0.1)

        plt.show()

## Plot loss curves

In [None]:
fig = plt.figure(figsize=(12,8))
plt.plot(range(1,len(train_loss)+1),train_loss,'b',label = 'training loss')
plt.plot(range(1,len(val_loss)+1),val_loss,'g',label = 'validation loss')
plt.legend()
plt.title("Training and Validation Loss Curves")