# Understanding Random Multi-Model

## Learning to create flexible models

This part of the recipe shows how to design an RMDL-style model in PyTorch. The discussion covers the following topics:


1. Using `nn.Sequential` to create flexible models
2. A flexible model with dense layers
3. A flexible model with RNN layers


Importing requirements

In [0]:
import torch
import torch.nn as nn

**Using `nn.Sequential` to create flexible models:** We must first understand how the model architecture can be varied by changing the parameters of its layers. PyTorch makes it easy to build a model that becomes deeper or shallower just by changing a few parameters.

When building such an ensemble model, it is advisable to collect all the layers in a list and then use **nn.Sequential** to connect them into a model. **nn.Sequential** can be used flexibly, as shown below.

In [0]:
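# layer_1 and layer_2 can be any nn.Module; the ones below are illustrative placeholders
layer_1 = nn.Linear(10, 5)
layer_2 = nn.ReLU()
layers = []
layers.append(layer_1)
layers.append(layer_2)
layers = nn.Sequential(*layers)  # unpack the list into a single sequential model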

**A flexible model with dense layers:** Suppose you want to build a dense network with a variable number of layers and units per layer, with dropout in between. Such a model can be implemented as shown below.

In [0]:
perceptron_in_layers = [200, 100, 50, 25]
dropout = 0.2
activation = torch.nn.ReLU()

In [0]:
layers = []
num_layers = len(perceptron_in_layers)
for i in range(0,num_layers-1):
    layers.append(torch.nn.Linear(in_features = perceptron_in_layers[i], out_features = perceptron_in_layers[i+1]))
    layers.append(activation)
    layers.append(torch.nn.Dropout(dropout))
layers = nn.Sequential(*layers)
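
The loop above can be wrapped in a helper so that each member of an ensemble draws its own depth and width. The following is only a minimal sketch, not the RMDL reference implementation: `build_dense_model` and `random_layer_sizes` are hypothetical names, and the ranges for depth and width are arbitrary assumptions. Like the loop above, it applies the activation and dropout after every linear layer, including the last one.

```python
import random

import torch
import torch.nn as nn


def build_dense_model(layer_sizes, dropout=0.2):
    """Stack Linear -> ReLU -> Dropout blocks for consecutive sizes in layer_sizes."""
    layers = []
    for in_features, out_features in zip(layer_sizes[:-1], layer_sizes[1:]):
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(dropout))
    return nn.Sequential(*layers)


def random_layer_sizes(in_features, min_depth=1, max_depth=4, min_units=16, max_units=128):
    """Draw a random number of layers and a random width for each (assumed ranges)."""
    depth = random.randint(min_depth, max_depth)
    return [in_features] + [random.randint(min_units, max_units) for _ in range(depth)]


sizes = random_layer_sizes(in_features=200)
model = build_dense_model(sizes)
batch = torch.randn(8, 200)      # 8 samples with 200 features each
output = model(batch)
print(sizes, output.shape)       # the last dimension of the output equals sizes[-1]
```

Calling `random_layer_sizes` once per ensemble member gives a different architecture each time while the construction code stays the same.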


One more thing to learn here is that it is always better to declare what the model is being used for. We have not followed this convention so far, but it is required. Some layers, such as dropout and batch normalization, behave differently when the model is being trained and when it is being tested. Calling `model.train()` switches the model to training mode and returns the model itself, so the notebook prints the model's layer structure.


Calling `model.eval()` switches the model to evaluation mode and likewise prints the layer structure. In this mode dropout is disabled and batch normalization uses its running statistics instead of the statistics of the current batch. Note that `model.eval()` does not by itself stop gradient tracking or weight updates; to run only the forward pass without storing information for the backward pass, wrap the call in `torch.no_grad()`.

In [0]:
layers.train()
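
The difference between the two modes is easiest to see on a single dropout layer. The sketch below is illustrative only; the tensor sizes and the dropout probability are arbitrary choices. It also shows that evaluation mode and gradient tracking are separate switches.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()                  # training mode: elements are zeroed at random
train_out = dropout(x)           # surviving elements are scaled by 1 / (1 - p)
zeroed = (train_out == 0).sum().item()

dropout.eval()                   # evaluation mode: dropout passes the input through
eval_out = dropout(x)

print("zeroed in train mode:", zeroed)
print("unchanged in eval mode:", torch.equal(eval_out, x))

# eval() does not turn off autograd; torch.no_grad() does
linear = nn.Linear(4, 2).eval()
tracked = linear(torch.randn(1, 4)).requires_grad
with torch.no_grad():
    untracked = linear(torch.randn(1, 4)).requires_grad
print(tracked, untracked)
```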

**A flexible model with RNN layers:** That covers the feed-forward model. Now let's see which parameters are available in a recurrent network for developing a flexible architecture. GRU, LSTM, and vanilla RNN layers share common parameters that can be set to change the network architecture. For example, with an LSTM one can change the parameters given below:

- `input_size`: number of features in the input; generally equal to the size of the embedding.
- `hidden_size`: size of the hidden state of each RNN unit.
- `num_layers`: number of RNN layers stacked on top of each other. More complex data generally requires more layers.
- `bidirectional`: if true, the RNN runs over the sequence in both directions, as illustrated below.

![](figures/RMDL_bidirectional.png)

Figure. How a bidirectional LSTM works and how the final output is produced. The final output at each time step is the concatenation of the outputs from the forward and backward runs. (Implementation-wise, the RNN runs in only one direction; the sequence is reversed and given to the RNN, and the output so produced is called the reverse-direction output.)

Using these options, various network architectures can be generated randomly. Beyond this, other extensions such as an attention mechanism can be applied to an RNN. `nn.Sequential(*layers)` can also be used to stack LSTM layers, but because `nn.LSTM` returns a tuple `(output, (h_n, c_n))`, each LSTM needs a small wrapper module that passes only `output` on to the next layer.
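
As a rough illustration of generating a recurrent architecture at random, the sketch below samples the LSTM parameters discussed above and checks the resulting output shapes. The search space in `config` is a hypothetical example, and `batch_first=True` is chosen here only so that the input reads as (batch, sequence, features).

```python
import random

import torch
import torch.nn as nn

# Hypothetical search space; the candidate values are assumptions for illustration
config = {
    "input_size": 100,
    "hidden_size": random.choice([64, 128, 256]),
    "num_layers": random.randint(1, 3),
    "bidirectional": random.choice([True, False]),
}
lstm = nn.LSTM(batch_first=True, **config)

batch = torch.randn(4, 12, config["input_size"])   # (batch, sequence length, features)
output, (h_n, c_n) = lstm(batch)

num_directions = 2 if config["bidirectional"] else 1
print(config)
print(output.shape)   # (4, 12, num_directions * hidden_size)
print(h_n.shape)      # (num_layers * num_directions, 4, hidden_size)
```

With `bidirectional=True` the last dimension of `output` doubles, because the forward and backward outputs are concatenated at each time step.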

In [0]:
layer = nn.LSTM(input_size = 100, hidden_size = 256, num_layers = 1 , bidirectional = True)

In [0]:
for name, tensor in layer.state_dict().items():
    print(name, "\t", tensor.size())

In [0]:
layer = nn.LSTM(input_size = 100, hidden_size = 256, num_layers = 2 , bidirectional = True)

In [0]:
for name, tensor in layer.state_dict().items():
    print(name, "\t", tensor.size())
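
The printed names follow a fixed pattern: `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0` and `bias_hh_l0` belong to the first layer, a higher layer index marks a stacked layer, and the `_reverse` suffix marks the backward direction. The short sketch below compares the two configurations used above; `count_parameters` is a hypothetical helper written for this example.

```python
import torch.nn as nn


def count_parameters(module):
    """Total number of elements across all parameters of a module."""
    return sum(p.numel() for p in module.parameters())


one_layer = nn.LSTM(input_size=100, hidden_size=256, num_layers=1, bidirectional=True)
two_layers = nn.LSTM(input_size=100, hidden_size=256, num_layers=2, bidirectional=True)

# Layer 0 sees the 100 input features; layer 1 sees 2 * 256 features because
# the forward and backward outputs of layer 0 are concatenated
print(two_layers.weight_ih_l0.shape)
print(two_layers.weight_ih_l1.shape)
print(two_layers.weight_ih_l1_reverse.shape)
print(count_parameters(one_layer), count_parameters(two_layers))
```

The first dimension of each weight matrix is 4 * `hidden_size` because an LSTM keeps the weights of its four gates in one tensor.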