## Overview
This notebook walks through creating, setting up, and training a Deep Markov Model (DMM) on a synthetic dataset.

In [1]:
#General Purpose Imports
import numpy as np
import glob, os, sys, time
sys.path.append('../')
from utils.misc import getConfigFile, readPickle, displayTime

#Matplotlib imports
%matplotlib inline  
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['lines.linewidth']=2.5
mpl.rcParams['lines.markersize']=8
mpl.rcParams['text.usetex']=True
mpl.rcParams['text.latex.unicode']=True
mpl.rcParams['font.family'] = 'serif'
mpl.rcParams['font.serif'] = 'Times New Roman'
#Raw strings keep the backslashes in the LaTeX commands literal
mpl.rcParams['text.latex.preamble']= [r'\usepackage{amsfonts}',r'\usepackage{amsmath}']
mpl.rcParams['font.size'] = 20
mpl.rcParams['axes.labelsize']=20
mpl.rcParams['legend.fontsize']=20

## A] Data
* The polyphonic music code (which can be run from the `expt` folder) is an example of running the model on binary data.
* The code that creates a synthetic dataset lives in `dmm_data/load.py`. We will load that dataset here (it is created the first time this function is called).

In [2]:
#Import load function to load synthetic data
from dmm_data.load import load
dataset = load('synthetic')
print type(dataset), dataset.keys()

Loading linear matrices
Loading linear matrices
Reloading...
Read  1  objects
<type 'dict'> ['test', 'dim_observations', 'train', 'valid', 'data_type']


### a) Dataset format
Datasets are expected to be in a specific format:
* The type of the variable `dataset` is `dict`
* The keys of this dataset correspond to parameters used by the model as well as raw data
* `dim_observations` is of type `int` and it corresponds to the dimensionality of the raw data at each point in time
* `data_type` can be `binary` or `real`
* `train`, `valid` and `test` are dictionaries that hold the pre-split data. `dataset['train']` contains three items:
    * `dataset['train']['tensor']` is a 3D tensor whose dimensions correspond to `Nsamples x T x dim_observations`
    * `dataset['train']['mask']` is a 2D matrix whose dimensions correspond to `Nsamples x T`. Each entry indicates whether a sample was observed at that point in time. The model uses this mask internally to handle variable-length sequences.
    * `dataset['train']['tensor_Z']` is not used by the model. It corresponds to the states of the **true** underlying latent variables that created this data
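
To make the role of the mask concrete, here is a minimal sketch of how variable-length sequences could be padded into a `tensor` and a matching `mask`. The helper `pack_sequences` is hypothetical and not part of this repository, and the choice of `float32` and of zero padding is an assumption. The snippet is written so that it also runs under Python 3.

```python
import numpy as np

def pack_sequences(sequences, dim_observations):
    """Pad variable-length sequences into (tensor, mask).

    `sequences` is a list of arrays, each of shape (T_i, dim_observations).
    Hypothetical helper; not part of the repository.
    """
    n_samples = len(sequences)
    max_len = max(len(seq) for seq in sequences)
    tensor = np.zeros((n_samples, max_len, dim_observations), dtype='float32')
    mask = np.zeros((n_samples, max_len), dtype='float32')
    for i, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype='float32')
        tensor[i, :len(seq)] = seq   # observed steps come first
        mask[i, :len(seq)] = 1.0     # 1 = observed, 0 = padding
    return tensor, mask

# Two sequences of lengths 3 and 5, each with 3 features per time step
seqs = [np.ones((3, 3)), np.ones((5, 3))]
tensor, mask = pack_sequences(seqs, dim_observations=3)
lengths = mask.sum(axis=1)   # number of observed steps per sequence
print(tensor.shape, mask.shape, lengths)
```

Summing the mask along the time axis recovers each sequence's true length, which is what lets padded positions be excluded from any per-time-step quantity.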

In [3]:
print 'Dimensionality of the observations: ', dataset['dim_observations']
print 'Data type of features:', dataset['data_type']
for dtype in ['train','valid','test']:
    print 'dtype: ',dtype, ' type(dataset[dtype]): ',type(dataset[dtype])
    print [(k,type(dataset[dtype][k]), dataset[dtype][k].shape) for k in dataset[dtype]]
    print '--------\n'

Dimensionality of the observations:  3
Data type of features: real
dtype:  train  type(dataset[dtype]):  <type 'dict'>
[('tensor_Z', <type 'numpy.ndarray'>, (10000, 10, 3)), ('mask', <type 'numpy.ndarray'>, (10000, 10)), ('tensor', <type 'numpy.ndarray'>, (10000, 10, 3))]
--------

dtype:  valid  type(dataset[dtype]):  <type 'dict'>
[('tensor_Z', <type 'numpy.ndarray'>, (1000, 10, 3)), ('mask', <type 'numpy.ndarray'>, (1000, 10)), ('tensor', <type 'numpy.ndarray'>, (1000, 10, 3))]
--------

dtype:  test  type(dataset[dtype]):  <type 'dict'>
[('tensor_Z', <type 'numpy.ndarray'>, (1000, 10, 3)), ('mask', <type 'numpy.ndarray'>, (1000, 10)), ('tensor', <type 'numpy.ndarray'>, (1000, 10, 3))]
--------



### b) Creating your own dataset
* When creating your own data, it should have the following structure: 
    * type: dict
        * `dim_observations`: int
        * `data_type`: `binary` or `real`
        * `train`, `valid`, `test` must be dictionaries with the following keys: 
            * `tensor` : Raw data as a 3D tensor with dimensions `Nsamples x T x dim_observations`
            * `mask`   : Mask for the raw data as a 2D matrix with dimensions `Nsamples x T`
* Now that we have the dataset in the desired format, let's look at setting up the model. We'll first import the modules needed to build it.
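
As an aside, the structure listed above can be checked with a few assertions before a dataset is handed to the model. The sketch below is only an illustration: `check_dataset` and `make_split` are hypothetical names, the toy data is random, and the checks cover just the fields described in this notebook.

```python
import numpy as np

def check_dataset(dataset):
    """Raise AssertionError if `dataset` does not follow the structure above."""
    assert isinstance(dataset, dict)
    assert isinstance(dataset['dim_observations'], int)
    assert dataset['data_type'] in ('binary', 'real')
    for split in ('train', 'valid', 'test'):
        tensor = dataset[split]['tensor']
        mask = dataset[split]['mask']
        assert tensor.ndim == 3 and mask.ndim == 2
        assert tensor.shape[:2] == mask.shape              # Nsamples x T must agree
        assert tensor.shape[2] == dataset['dim_observations']
    return True

def make_split(n_samples, T=10, dim=3, seed=0):
    """Random toy split with every time step marked as observed."""
    rng = np.random.RandomState(seed)
    return {'tensor': rng.randn(n_samples, T, dim).astype('float32'),
            'mask': np.ones((n_samples, T), dtype='float32')}

toy = {'dim_observations': 3,
       'data_type': 'real',
       'train': make_split(100),
       'valid': make_split(10, seed=1),
       'test': make_split(10, seed=2)}
print(check_dataset(toy))
```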

In [4]:
start_time = time.time()
from   model_th.dmm import DMM
import model_th.learning as DMM_learn
import model_th.evaluate as DMM_evaluate
displayTime('importing DMM',start_time, time.time())

		< importing DMM > took  3.5654168129   seconds


## B] Model Hyperparameters
* To set up the model, we need to specify the hyperparameters.
* Normally, if you were running from a script (e.g. `expt/train.py`), you would run the following within the script: `from parse_args import params`.
* This lets you specify hyperparameters for the model via the command line. See the shell scripts in the folder `expt/` for an example of this.
* Since we're in IPython, we'll reload a saved version of `params` and see what the default values currently are. To learn more about how the choice of hyperparameters affects the model, you can run `python parse_args.py -h` in the main directory.
* The `unique_id` is created based on the default parameters or those specified via the command line.
* The parameters of the dataset (`data_type` and `dim_observations`) need to be incorporated into `params` so that the model can set up its weights appropriately.

In [5]:
params = readPickle('../default.pkl')[0]
for k in params:
    print k, '\t',params[k]
params['data_type'] = dataset['data_type']
params['dim_observations'] = dataset['dim_observations']

Read  1  objects
dataset 	mm
epochs 	2000
seed 	1
init_weight 	0.1
dim_stochastic 	100
expt_name 	uid
reg_value 	0.05
reloadFile 	./NOSUCHFILE
reg_spec 	_
dim_hidden 	200
lr 	0.0008
reg_type 	l2
init_scheme 	uniform
optimizer 	adam
use_generative_prior 	approx
maxout_stride 	4
batch_size 	20
savedir 	./chkpt
forget_bias 	-5.0
inference_model 	R
emission_layers 	2
savefreq 	10
rnn_cell 	lstm
rnn_size 	600
paramFile 	./NOSUCHFILE
nonlinearity 	relu
rnn_dropout 	0.1
transition_layers 	2
anneal_rate 	2.0
debug 	False
validate_only 	False
transition_type 	mlp
unique_id 	DMM_lr-0_0008-dh-200-ds-100-nl-relu-bs-20-ep-2000-rs-600-rd-0_1-infm-R-tl-2-el-2-ar-2_0-use_p-approx-rc-lstm-uid
leaky_param 	0.0
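
The printed `unique_id` appears to concatenate abbreviated hyperparameter names with their values, replacing `.` with `_`. The sketch below rebuilds the string on that assumption. The abbreviation table is inferred from the output above, and `make_unique_id` is a hypothetical helper; the authoritative logic lives in `parse_args.py`.

```python
def make_unique_id(params):
    """Rebuild a unique_id-style string from `params`.

    The (abbreviation, key) pairs are inferred from the printed unique_id,
    not taken from parse_args.py.
    """
    fields = [('lr', 'lr'), ('dh', 'dim_hidden'), ('ds', 'dim_stochastic'),
              ('nl', 'nonlinearity'), ('bs', 'batch_size'), ('ep', 'epochs'),
              ('rs', 'rnn_size'), ('rd', 'rnn_dropout'),
              ('infm', 'inference_model'), ('tl', 'transition_layers'),
              ('el', 'emission_layers'), ('ar', 'anneal_rate'),
              ('use_p', 'use_generative_prior'), ('rc', 'rnn_cell')]
    parts = ['%s-%s' % (abbr, str(params[key]).replace('.', '_'))
             for abbr, key in fields]
    return 'DMM_' + '-'.join(parts) + '-' + params['expt_name']

# Default values copied from the listing above
defaults = {'lr': 0.0008, 'dim_hidden': 200, 'dim_stochastic': 100,
            'nonlinearity': 'relu', 'batch_size': 20, 'epochs': 2000,
            'rnn_size': 600, 'rnn_dropout': 0.1, 'inference_model': 'R',
            'transition_layers': 2, 'emission_layers': 2, 'anneal_rate': 2.0,
            'use_generative_prior': 'approx', 'rnn_cell': 'lstm',
            'expt_name': 'uid'}
print(make_unique_id(defaults))
```

A helper of this kind would let the ID be regenerated after a hyperparameter changes, instead of being patched with string replacements.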


## C] Building the DMM

In [None]:
#The dataset is small, so let's change some of the default parameters and the unique ID
params['dim_stochastic'] = 2
params['dim_hidden']     = 40
params['rnn_size']       = 80
params['epochs']         = 10
params['batch_size']     = 200
params['unique_id'] = params['unique_id'].replace('ds-100','ds-2').replace('dh-200','dh-40').replace('rs-600','rs-80')
params['unique_id'] = params['unique_id'].replace('ep-2000','ep-10').replace('bs-20','bs-200')

#Create a temporary directory to save checkpoints
params['savedir']   = params['savedir']+'-ipython/'
os.system('mkdir -p '+params['savedir'])

#Specify the file where the `params` corresponding to this choice of model and data will be saved
pfile= params['savedir']+'/'+params['unique_id']+'-config.pkl'

print 'Checkpoint prefix: ', pfile
dmm  = DMM(params, paramFile = pfile)

Checkpoint prefix:  ./chkpt-ipython//DMM_lr-0_0008-dh-40-ds-2-nl-relu-bs-200-ep-10-rs-80-rd-0_1-infm-R-tl-2-el-2-ar-2_0-use_p-approx-rc-lstm-uid-config.pkl
	<<Sampling biases for LSTM from exponential distribution>>
	<<Nparameters: 56334>>
	<<Anneal = 1 in 2.0 param. updates>>
	<<Building with RNN dropout:0.1>>
	<<In _LSTM_RNN_layer with dropout 0.1000>>
	<<Modifying : [q_W_input_0,q_b_input_0,W_lstm_r,b_lstm_r,U_lstm_r,q_W_st,q_b_st,q_W_mu,q_b_mu,q_W_cov,q_b_cov,p_trans_W_0,p_trans_b_0,p_trans_W_1,p_trans_b_1,p_trans_W_mu,p_trans_b_mu,p_trans_W_cov,p_trans_b_cov,p_emis_W_0,p_emis_b_0,p_emis_W_1,p_emis_b_1,p_emis_W_out,p_emis_b_out]>>
<< Reg:(l2) Reg. Val:(0.05) Reg. Spec.:(_)>>
<<<<<< Adding l2 regularization for q_W_input_0 >>>>>>
<<<<<< Adding l2 regularization for q_b_input_0 >>>>>>
<<<<<< Adding l2 regularization for W_lstm_r >>>>>>
<<<<<< Adding l2 regularization for b_lstm_r >>>>>>
<<<<<< Adding l2 regularization for U_lstm_r >>>>>>
<<<<<< Adding l2 regularization for q_W_st >>>
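
The checkpoint prefix printed above contains a doubled slash because `savedir` already ends with `/` before another `/` is appended. This is harmless on most file systems, but a sketch of a more portable alternative, using only the standard library, is shown below. `checkpoint_paths` and the ID `DMM_example-uid` are hypothetical, and a temporary directory stands in for `./chkpt-ipython/`.

```python
import os
import tempfile

def checkpoint_paths(savedir, unique_id):
    """Create `savedir` if needed and return (config_file, checkpoint_prefix)."""
    if not os.path.isdir(savedir):
        os.makedirs(savedir)                      # replaces the `mkdir -p` shell call
    prefix = os.path.join(savedir, unique_id)     # join avoids a doubled separator
    return prefix + '-config.pkl', prefix

base = tempfile.mkdtemp()
savedir = os.path.join(base, 'chkpt-ipython') + '/'   # trailing slash, as in the cell above
pfile, savef = checkpoint_paths(savedir, 'DMM_example-uid')
print(pfile)
```

The second return value plays the same role as `savef` in the training cell, so both paths come from one place.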

## D] Parameter Estimation

In [None]:
#savef specifies the prefix for the checkpoints - we'll use the same save directory as before 
savef    = os.path.join(params['savedir'],params['unique_id'])
savedata = DMM_learn.learn(dmm, dataset['train'], epoch_start =0 ,
                                epoch_end = 100,
                                batch_size = 200,
                                savefreq   = params['savefreq'],
                                savefile   = savef,
                                dataset_eval=dataset['valid'],
                                shuffle    = True )

	<<Original dim: [3 5 3],[3 5]>>
	<<New dim: [10000    10     3],[10000    10]>>
	<<Bnum: 0, Batch Bound: 27.2308, |w|: 34.3823, |dw|: 1.0000, |w_opt|: 0.0000>>
	<<-veCLL:54461.2360, KL:34.9538, anneal:0.0100>>
	<<Bnum: 10, Batch Bound: 26.6092, |w|: 34.0214, |dw|: 1.0000, |w_opt|: 0.6507>>
	<<-veCLL:53199.3876, KL:19.0197, anneal:1.0000>>
	<<Bnum: 20, Batch Bound: 26.2080, |w|: 33.6336, |dw|: 1.0000, |w_opt|: 0.8759>>
	<<-veCLL:52388.2278, KL:27.8222, anneal:1.0000>>
	<<Bnum: 30, Batch Bound: 25.5067, |w|: 33.2729, |dw|: 1.0000, |w_opt|: 0.9516>>
	<<-veCLL:50922.5333, KL:90.9376, anneal:1.0000>>
	<<Bnum: 40, Batch Bound: 23.4843, |w|: 33.0148, |dw|: 1.0000, |w_opt|: 0.9758>>
	<<-veCLL:46639.7084, KL:328.8080, anneal:1.0000>>
	<<(Ep 0) Bound: 25.3746 [Took 28.9434 seconds] >>
	<<Saving at epoch 0>>
	<<Saved model (./chkpt-ipython/DMM_lr-0_0008-dh-40-ds-2-nl-relu-bs-200-ep-10-rs-80-rd-0_1-infm-R-tl-2-el-2-ar-2_0-use_p-approx-rc-lstm-uid-EP0-params) 
		 opt (./chkpt-ipython/DMM_lr-0_0008

	<<(Ep 10) Bound: 8.4199 [Took 32.7044 seconds] >>
	<<Saving at epoch 10>>
	<<Saved model (./chkpt-ipython/DMM_lr-0_0008-dh-40-ds-2-nl-relu-bs-200-ep-10-rs-80-rd-0_1-infm-R-tl-2-el-2-ar-2_0-use_p-approx-rc-lstm-uid-EP10-params) 
		 opt (./chkpt-ipython/DMM_lr-0_0008-dh-40-ds-2-nl-relu-bs-200-ep-10-rs-80-rd-0_1-infm-R-tl-2-el-2-ar-2_0-use_p-approx-rc-lstm-uid-EP10-optParams) weights>>
	<<Original dim: [10000    10     3],[10000    10]>>
	<<New dim: [1000   10    3],[1000   10]>>
	<<(Evaluate) Validation Bound: 8.4420 [Took 0.4100 seconds]>>
	<<Original dim: [1000   10    3],[1000   10]>>
	<<New dim: [10000    10     3],[10000    10]>>
	<<Bnum: 0, Batch Bound: 8.4367, |w|: 24.5764, |dw|: 1.0000, |w_opt|: 0.1230>>
	<<-veCLL:16870.9188, KL:2.4503, anneal:1.0000>>
	<<Bnum: 10, Batch Bound: 8.4283, |w|: 24.4709, |dw|: 1.0000, |w_opt|: 0.1653>>
	<<-veCLL:16853.9246, KL:2.6411, anneal:1.0000>>
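
The numbers in the log are consistent with the batch bound being the negative conditional log-likelihood plus the annealed KL term, divided by the number of observed time steps in the batch. This relation is inferred from the logged values, not taken from the library code. It also assumes that every mask entry is 1, so that a batch of 200 sequences of length 10 has 2000 observed steps.

```python
def batch_bound(neg_cll, kl, anneal, n_observed):
    """Annealed bound averaged over observed time steps (inferred relation)."""
    return (neg_cll + anneal * kl) / float(n_observed)

n_observed = 200 * 10   # batch_size x T, assuming a mask of all ones

# (neg_cll, kl, anneal, logged batch bound) copied from the log above
rows = [(54461.2360, 34.9538, 0.01, 27.2308),
        (53199.3876, 19.0197, 1.00, 26.6092),
        (46639.7084, 328.8080, 1.00, 23.4843)]

for neg_cll, kl, anneal, logged in rows:
    value = batch_bound(neg_cll, kl, anneal, n_observed)
    print('%.4f (logged: %.4f)' % (value, logged))
```

In the first logged batch the KL term is weighted by `anneal = 0.01`, so it contributes almost nothing. From batch 10 onward the weight is 1 and the KL term enters the bound in full.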
