## Common Use Scenarios for Deep Learning Models

### 1. Building Models on Training Data
1. Customize Network Layout: e.g., number of layers, connections, activation functions, weight initializations
2. Customize Cost Function: special regularization
3. Choose or Customize optimization method, e.g., l-bfgs, cg, batch-gd, sgd, etc.
4. Specify early stopping, train/validation data, performance monitoring
5. Save learned weights and be able to restore the model from them

### 2. Use Models on New Data
1. Restore model from its trained weights and yaml configuration
2. Predict on new data
3. Extract hidden-layer outputs (computed from the learned weights) as features - mostly only useful for models pre-trained on large data, e.g., Caffe, Overfeat, sklearn-theano

[Machine Learning Using Pylearn2](https://blog.safaribooksonline.com/2014/02/10/pylearn2-regression-3rd-party-data/)


**We will try to cover these points in this notebook.**

**Use the [data-downloading script](https://github.com/lisa-lab/DeepLearningTutorials/blob/master/data/download.sh) to download the necessary data into the `./data` folder.**

## Pylearn2 for Deep Learning
1. pylearn2 is to Theano what scipy is to numpy
2. it uses YAML for quick experiment setup, under a unified framework of `dataset`, `algorithm` (optimizer), and `model` (network), which are the Lego blocks of deep learning
3. in the YAML configuration, you can use `!obj:` to create an instance (a composite of both data and methods), `!import` to attach customized functions (e.g., a cost function), and `!pkl:` to load data
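
To make the `!obj:` tag less magical, here is a minimal sketch of the idea behind it: resolve a dotted path to a callable and call it with the mapping as keyword arguments. The helper name `instantiate` is hypothetical, and this is not pylearn2's actual parser; it only illustrates the mechanism with the standard library.

```python
import importlib


def instantiate(dotted_path, **kwargs):
    """Import `package.module.Name` and call it with keyword arguments.

    Hypothetical helper for illustration; not part of pylearn2.
    """
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)(**kwargs)


# Roughly what "!obj:fractions.Fraction {numerator: 3, denominator: 4}" asks for
three_quarters = instantiate("fractions.Fraction", numerator=3, denominator=4)
print(three_quarters)
```

The same mechanism works for plain functions as well as classes, since both are just callables looked up by name.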

In [1]:
!pip show pylearn2

---
Metadata-Version: 1.0
Name: pylearn2
Version: 0.1.dev0
Summary: A machine learning library built on top of Theano.
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: BSD 3-clause license
Location: /home/dola/opt/pylearn2
Requires: numpy, pyyaml, argparse, Theano


In [1]:
from pylearn2.config import yaml_parse
from pylearn2.datasets import DenseDesignMatrix
import cPickle
from sklearn import preprocessing

Couldn't import dot_parser, loading of dot files will not be possible.


### 1. [Basic Example of Building Softmax Regression for MNIST - with YAML](http://nbviewer.ipython.org/github/lisa-lab/pylearn2/blob/master/pylearn2/scripts/tutorials/softmax_regression/softmax_regression.ipynb)

**This exercise focuses on how easy or hard it is to quickly test a model on a numpy.array in pylearn2.**

**Its use feels quite counter-intuitive for exploratory data science: everything has to be forced into YAML and dumped to disk first. Theano has a much friendlier interface when integrated with other Python objects in this case.**

**pylearn2 may have a reason for this design: it is configuration-driven, aimed at training on pre-defined data to get good performance, rather than at an exploratory environment.**


In [3]:
## prepare data - split the single mnist.pkl into training, validation and testing sets and pickle them
## pylearn2.datasets.DenseDesignMatrix accepts an integer array as y, but the SoftmaxRegression model
## only accepts one-hot encoding
mnist_data_config = r"!pkl: '../data/mnist.pkl'"
train_mnist, valid_mnist, test_mnist = yaml_parse.load(mnist_data_config)
## fit the encoder once on the training labels and reuse it, so all three sets share the same columns
coder = preprocessing.OneHotEncoder()
coder.fit(train_mnist[1].reshape((-1, 1)))
train_mnist = DenseDesignMatrix(X = train_mnist[0], y = coder.transform(train_mnist[1].reshape((-1, 1))).toarray())
valid_mnist = DenseDesignMatrix(X = valid_mnist[0], y = coder.transform(valid_mnist[1].reshape((-1, 1))).toarray())
test_mnist = DenseDesignMatrix(X = test_mnist[0], y = coder.transform(test_mnist[1].reshape((-1, 1))).toarray())
cPickle.dump(train_mnist, open("../data/train_mnist.pkl", "w"))
cPickle.dump(valid_mnist, open("../data/valid_mnist.pkl", "w"))
cPickle.dump(test_mnist, open("../data/test_mnist.pkl", "w"))
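
For readers who have not seen one-hot encoding before, the following is a small NumPy-only sketch of what the encoder produces. The labels are toy values rather than real MNIST data, and indexing an identity matrix is just one possible way to do it, not what sklearn does internally.

```python
import numpy as np

labels = np.array([3, 0, 9, 3])   # toy integer labels, not real MNIST data
n_classes = 10                    # stated explicitly instead of inferred from the labels

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes)[labels]

print(one_hot.shape)
print(one_hot.argmax(axis=1))     # recovers the integer labels
```

Fixing `n_classes` up front guarantees ten columns even if a split happens to miss a digit, which an encoder fitted on that split alone could not guarantee.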

In [4]:
!ls ../data

images	   mnist.pkl.gz       test_mnist.pkl   valid_mnist.pkl
mnist.pkl  mnist_py3k.pkl.gz  train_mnist.pkl


In [14]:
## After a while I realized that breaking up the yaml into different pieces is
## a bad idea, as they usually don't hook up with each other until you put them together,
## not to mention how sensitive yaml is to syntax errors

dataset_config = r"""&train !pkl: '../data/train_mnist.pkl'
"""

model_config = r"""!obj:pylearn2.models.softmax_regression.SoftmaxRegression {
  n_classes: 10,
  irange: 0.,
  nvis: 784
}"""


algorithm_config = r"""!obj:pylearn2.training_algorithms.bgd.BGD {
  batch_size: 10000,
  conjugate: 1,
  monitoring_dataset: {
    'train': *train,
    'valid': !pkl: '../data/valid_mnist.pkl'
  },
  termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased {
    channel_name: "valid_y_misclass"
  }
}
"""

driver_config = r"""!obj:pylearn2.train.Train {
  dataset: %s
  , model: %s
  , algorithm: %s
  , extensions: [!obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
             channel_name: 'valid_y_misclass',
             save_path: "../models/softmax_regression_best.pkl"
        },]
  , save_path: "../models/softmax_regression.pkl"
  , save_freq: 1
}
""" % (dataset_config, model_config, algorithm_config)


print driver_config

!obj:pylearn2.train.Train {
  dataset: &train !pkl: '../data/train_mnist.pkl'

  , model: !obj:pylearn2.models.softmax_regression.SoftmaxRegression {
  n_classes: 10,
  irange: 0.,
  nvis: 784
}
  , algorithm: !obj:pylearn2.training_algorithms.bgd.BGD {
  batch_size: 10000,
  conjugate: 1,
  monitoring_dataset: {
    'train': *train,
    'valid': !pkl: '../data/valid_mnist.pkl'
  },
  termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased {
    channel_name: "valid_y_misclass"
  }
}

  , extensions: [!obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
             channel_name: 'valid_y_misclass',
             save_path: "../models/softmax_regression_best.pkl"
        },]
  , save_path: "../models/softmax_regression.pkl"
  , save_freq: 1
}



In [6]:
%%capture log
driver = yaml_parse.load(driver_config)
driver.main_loop()

In [7]:
print log.stdout[-500:]

9996146
	valid_y_mean_max_class: 0.917966512741
	valid_y_min_max_class: 0.248488289737
	valid_y_misclass: 0.0714
	valid_y_nll: 0.261678336848
	valid_y_row_norms_max: 1.99803194598
	valid_y_row_norms_mean: 0.574710145358
	valid_y_row_norms_min: 0.0
Saving to ../models/softmax_regression.pkl...
Saving to ../models/softmax_regression.pkl done. Time elapsed: 0.010190 seconds
Saving to ../models/softmax_regression.pkl...
Saving to ../models/softmax_regression.pkl done. Time elapsed: 0.009725 seconds



In [8]:
## make use of trained model
## You need to do all the geeky stuff here to convert it back to Theano
import theano.tensor as T
import theano
import numpy as np 
softmax_model = cPickle.load(open("../models/softmax_regression.pkl"))
X = softmax_model.get_input_space().make_theano_batch()
y = softmax_model.fprop(X)
ylabel = T.argmax(y, axis = 1)
predict = theano.function([X], ylabel)

yhat = predict(test_mnist.X)
ytarget = cPickle.load(open("../data/mnist.pkl"))[-1][1]
print "classification rate on test data", np.mean(yhat == ytarget)

classification rate on test data 0.924
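
To show what the compiled Theano function computes, here is a rough NumPy sketch of the same forward pass: an affine map, a softmax, and an argmax. The weights and labels are random stand-ins (assumptions for illustration), so the resulting rate is meaningless; only the shape of the computation matters.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(6, 784)                     # stand-in for a batch of flattened images
W = rng.randn(784, 10) * 0.01            # hypothetical weights, not the trained ones
b = np.zeros(10)

logits = X.dot(W) + b
logits -= logits.max(axis=1, keepdims=True)        # stabilise the exponentials
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

yhat = probs.argmax(axis=1)              # predicted class per row
ytarget = rng.randint(0, 10, size=6)     # made-up labels for illustration
print(np.mean(yhat == ytarget))          # fraction of matching predictions
```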


### 2. [stacked autoencoders example by yaml](http://nbviewer.ipython.org/github/lisa-lab/pylearn2/blob/master/pylearn2/scripts/tutorials/stacked_autoencoders/stacked_autoencoders.ipynb)

1. layerwise pre-training (unsupervised) for denoising autoencoder
2. stacking these layers to form an MLP and fine-tuning it with supervised learning
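
As a rough picture of what one denoising autoencoder layer does, here is a NumPy sketch of a single forward pass: corrupt the input, encode with tanh, decode linearly, and compare against the clean input. Tied weights and the exact form of the squared error are assumptions made for brevity; this is not pylearn2's implementation.

```python
import numpy as np

rng = np.random.RandomState(0)
nvis, nhid, corruption_level = 784, 500, 0.2

X = rng.rand(100, nvis)                            # stand-in batch
W = rng.uniform(-0.05, 0.05, size=(nvis, nhid))
b_hid = np.zeros(nhid)
b_vis = np.zeros(nvis)

# Binomial corruption: zero out each input with probability corruption_level
mask = rng.binomial(n=1, p=1 - corruption_level, size=X.shape)
X_corrupt = X * mask

hidden = np.tanh(X_corrupt.dot(W) + b_hid)         # tanh encoder
X_hat = hidden.dot(W.T) + b_vis                    # linear decoder, tied weights (assumption)

# The target is the *clean* input, which is what makes it "denoising"
loss = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(hidden.shape, X_hat.shape)
```

The second layer repeats the same recipe on `hidden` instead of `X`, which is why its input dimension equals the first layer's number of hidden units.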

In [9]:
%%capture log

## layer 1 - unsupervised training
layer1_yaml = r"""
!obj:pylearn2.train.Train {
    dataset: &train !pkl: '../data/train_mnist.pkl',
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 784,
        nhid : 500,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .2,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : 100,
        monitoring_batches : 5,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 10,
        },
    },
    save_path: "../models/dae_l1.pkl",
    save_freq: 1
}
"""
layer1 = yaml_parse.load(layer1_yaml)


%time layer1.main_loop()

In [10]:
print log.stdout[-500:]

 ../models/dae_l1.pkl done. Time elapsed: 0.577441 seconds
Time this epoch: 12.299725 seconds
Monitoring step:
	Epochs seen: 10
	Batches seen: 5000
	Examples seen: 500000
	learning_rate: 0.001
	objective: 11.8870911208
	total_seconds_last_epoch: 16.161059
	training_seconds_this_epoch: 12.299725
Saving to ../models/dae_l1.pkl...
Saving to ../models/dae_l1.pkl done. Time elapsed: 0.574657 seconds
Saving to ../models/dae_l1.pkl...
Saving to ../models/dae_l1.pkl done. Time elapsed: 0.538063 seconds



In [11]:
%%capture log
## second layer training - the 2nd layer takes the output of the 1st layer as its input
layer2_yaml = r"""!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !pkl: '../data/train_mnist.pkl',
        transformer: !pkl: "../models/dae_l1.pkl" # use layer 1 as input
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 500,
        nhid : 500,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : 100,
        monitoring_batches : 5,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 10,
        },
    },
    save_path: "../models/dae_l2.pkl",
    save_freq: 1
}
"""

layer2 = yaml_parse.load(layer2_yaml)


%time layer2.main_loop()

In [12]:
print log.stdout[-500:]

o ../models/dae_l2.pkl done. Time elapsed: 0.407241 seconds
Time this epoch: 12.709962 seconds
Monitoring step:
	Epochs seen: 10
	Batches seen: 5000
	Examples seen: 500000
	learning_rate: 0.001
	objective: 4.3132512472
	total_seconds_last_epoch: 16.045796
	training_seconds_this_epoch: 12.709962
Saving to ../models/dae_l2.pkl...
Saving to ../models/dae_l2.pkl done. Time elapsed: 0.424044 seconds
Saving to ../models/dae_l2.pkl...
Saving to ../models/dae_l2.pkl done. Time elapsed: 0.397842 seconds



In [13]:
%%capture log
## supervised tuning of the stacked network
## 1. stack the two DAEs into an MLP
## 2. train the MLP with supervised learning

mlp_yaml = r"""!obj:pylearn2.train.Train {
    dataset: &train !pkl: '../data/train_mnist.pkl',
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: 100,
        layers: [
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h1',
                     layer_content: !pkl: "../models/dae_l1.pkl"
                 },
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h2',
                     layer_content: !pkl: "../models/dae_l2.pkl"
                 },
                 !obj:pylearn2.models.mlp.Softmax {
                     max_col_norm: 1.9365,
                     layer_name: 'y',
                     n_classes: 10,
                     irange: .005
                 }
                ],
        nvis: 784
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: .05,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .5,
        },
        monitoring_dataset:
            {
                'valid' : !pkl: '../data/valid_mnist.pkl',
            },
        cost: !obj:pylearn2.costs.mlp.Default {},
        termination_criterion: !obj:pylearn2.termination_criteria.And {
            criteria: [
                !obj:pylearn2.termination_criteria.MonitorBased {
                    channel_name: "valid_y_misclass",
                    prop_decrease: 0.,
                    N: 100
                },
                !obj:pylearn2.termination_criteria.EpochCounter {
                    max_epochs: 50
                }
            ]
        },
        update_callbacks: !obj:pylearn2.training_algorithms.sgd.ExponentialDecay {
            decay_factor: 1.00004,
            min_lr: .000001
        }
    },
    extensions: [
        !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 250,
            final_momentum: .7
        }
    ],
    save_path: "../models/mlp.pkl",
    save_freq: 1
}
"""

mlp_yaml = yaml_parse.load(mlp_yaml)


%time mlp_yaml.main_loop()

In [14]:
print log.stdout[-500:]

n: 1.93520071282
	valid_y_max_max_class: 0.999997682992
	valid_y_mean_max_class: 0.98003508451
	valid_y_min_max_class: 0.548868565573
	valid_y_misclass: 0.0203
	valid_y_nll: 0.0668058268363
	valid_y_row_norms_max: 0.545912116268
	valid_y_row_norms_mean: 0.264345579985
	valid_y_row_norms_min: 0.101699705657
Saving to ../models/mlp.pkl...
Saving to ../models/mlp.pkl done. Time elapsed: 0.929557 seconds
Saving to ../models/mlp.pkl...
Saving to ../models/mlp.pkl done. Time elapsed: 0.927043 seconds



In [15]:
## make use of trained model
## You need to do all the geeky stuff here to convert it back to Theano
import theano.tensor as T
import theano
softmax_model = cPickle.load(open("../models/mlp.pkl"))
X = softmax_model.get_input_space().make_theano_batch()
y = softmax_model.fprop(X)
ylabel = T.argmax(y, axis = 1)
predict = theano.function([X], ylabel)

yhat = predict(test_mnist.X)
ytarget = cPickle.load(open("../data/mnist.pkl"))[-1][1]
print "Accuracy on test data: ", np.mean(yhat == ytarget)

Accuracy on test data:  0.98
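
The fine-tuning configuration stops on a `MonitorBased` criterion with `prop_decrease` and `N`. The sketch below is one plausible reading of that idea in plain Python: keep a countdown of `N` epochs and reset it whenever the monitored value improves on the best seen so far by the required proportion. The function name `epochs_until_stop` is hypothetical and the logic is an approximation, not the library's code.

```python
def epochs_until_stop(values, prop_decrease=0.0, N=3):
    """Return the 1-based epoch at which training would stop, or None.

    Hypothetical helper: `values` is the monitored channel, one value per epoch.
    """
    best = float("inf")
    countdown = N
    for epoch, value in enumerate(values, start=1):
        if value < (1.0 - prop_decrease) * best:
            countdown = N            # enough improvement: reset the patience
        else:
            countdown -= 1           # no improvement: one epoch closer to stopping
        best = min(best, value)
        if countdown <= 0:
            return epoch
    return None


misclass = [0.10, 0.08, 0.07, 0.07, 0.071, 0.07, 0.069]
print(epochs_until_stop(misclass, prop_decrease=0.0, N=3))
```

With `N: 100` and `max_epochs: 50` combined through `And`, a countdown like this one can never run out before the epoch counter does, so in that configuration the epoch limit is what actually ends training.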


### 3. [XOR example by api](http://www.arngarden.com/2013/07/29/neural-network-example-using-pylearn2/)

Here we use YAML to reproduce the example

In [6]:
import numpy as np
from pylearn2 import datasets, config
from pylearn2.models import mlp
import cPickle
from sklearn import preprocessing

In [22]:
## dataset
X = np.random.randint(low = 0, high = 2, size = (1000, 2))
y = np.bitwise_xor(X[:, 0], X[:, 1])
y = np.c_[y, 1-y]
xor_data = datasets.DenseDesignMatrix(X = X, y = y)

cPickle.dump(xor_data, open("../data/tmp/xor.pkl", "w"))

In [47]:
%%capture log

## mlp
driver_yaml = r"""!obj:pylearn2.train.Train {
  dataset: !pkl: '../data/tmp/xor.pkl',
  model: !obj:pylearn2.models.mlp.MLP {
    layers: [
      !obj:pylearn2.models.mlp.Sigmoid {
        layer_name: 'hidden',
        dim: 3,
        irange: .1,
      },
      !obj:pylearn2.models.mlp.Softmax {
        n_classes: 2,
        layer_name: 'output',
        irange: .1,
      }
    ],
    nvis: 2
  },
  algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
    learning_rate: .05,
    batch_size: 10,
    termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {max_epochs: 400}
  },
  save_freq: 1,
  save_path: '../models/xor_mlp.pkl'
}"""

driver = config.yaml_parse.load(driver_yaml)
%time driver.main_loop()

In [48]:
import theano
import theano.tensor as T
xor_mlp = cPickle.load(open("../models/xor_mlp.pkl"))
sX = xor_mlp.get_input_space().make_theano_batch()
sy = T.argmax(xor_mlp.fprop(sX), axis = 1)
predict = theano.function([sX], sy)
np.mean(predict(X) == np.argmax(y, axis = 1))

1.0
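
One detail of the target encoding is easy to miss, so here is a small NumPy check on the four possible inputs. Because the targets are built as `np.c_[y, 1-y]`, column 0 is "hot" when the XOR is 1, which means the argmax index is the complement of the XOR value. The accuracy above is unaffected, since predictions and targets go through the same argmax.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor = np.bitwise_xor(X[:, 0], X[:, 1])   # XOR of the two input columns
y = np.c_[xor, 1 - xor]                  # column 0 is XOR, column 1 its complement

print(y)
print(y.argmax(axis=1))                  # index of the hot column, i.e. 1 - xor
```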

### 4. [Customized Model Example in Pylearn2](http://vdumoulin.github.io/articles/extending-pylearn2/)

In [1]:
## MNIST supervised learning examples
from pylearn2.config import yaml_parse

## weird behavior: you cannot import certain modules unless you run a !pkl yaml first
from pylearn2.utils import serial 

train_mnist, valid_mnist, test_mnist = yaml_parse.load("!pkl: '../data/mnist.pkl'")
train = yaml_parse.load("!pkl: '../data/train_mnist.pkl'")

Couldn't import dot_parser, loading of dot files will not be possible.


In [2]:
!cat log_reg.py

## Logistic Regression Model - HOMEMADE
import theano.tensor as T
from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin
from pylearn2.models.model import Model
from pylearn2.utils import sharedX
from pylearn2.space import VectorSpace
import numpy as np 

## The order of DefaultDataSpecsMixin and Cost matters!
class LogisticRegressionCost(DefaultDataSpecsMixin, Cost):
    supervised = True ## specify supervised learning costs
    
    ## need model to map input to output, need data to test with target
    def expr(self, model, data, **kwargs): 
        space, source = self.get_data_specs(model) ## model's data specification
        space.validate(data) ## use model's vector space to validate data
        
        ## All X, y, yhat are theano variables
        X, y = data ## since it is supervised cost, we got both
        yhat = model.logistic_regression(X) ## call model to map X to yhat
        loss = -(y * T.log(yhat)).sum(axis = 1) ## rowwise selection
    

### another way (cartoon way) to look at the same code
- Cost's responsibility:
    1. show its ID so it gets the right type of data (X alone or both X,y)
    2. always check what you get - model and data - match or not
    3. do the chemistry by creating cost expr, which will be used by training algorithm
    
- Model's responsibility:
    1. specify data specification - the space for inputs and outputs (if any)
    2. create params
    3. map input to output estimation

```python

## cost function is the main interface between model, data 
## and training algorithm. cost utilizes model,data, and it is
## utilized by training algorithm
class LogisticRegressionCost(DefaultDataSpecsMixin, Cost):
    
    supervised = True ## show your ID, it decides how you can mix your model and data
    
    ## create chemistry(cost expr) from model and data
    ## before that, make sure the model match the data
    ## by checking they are in the same SPACE
    ## THEN do chemistry
    def expr(self, model, data, **kwargs): 
        space, source = self.get_data_specs(model) 
        space.validate(data) 
        
        X, y = data 
        yhat = model.logistic_regression(X) 
        loss = -(y * T.log(yhat)).sum(axis = 1)
        return loss.mean()
    
class LogisticRegression(Model):

    ## three main things in the constructor:
    ## 1. call the super constructor
    ## 2. construct necessary params as theano shared variable
    ## 3. construct data specification (data space) as input_space and output_space
    def __init__(self, nvis, nclasses):
        super(LogisticRegression, self).__init__()
        
        self.nvis = nvis
        self.nclasses = nclasses
        
        ## one tip for remembering the parameter shapes:
        ## the last dimension should always be nclasses (output dimension)
        ## so that the matrix operation can be broadcast in the right way
        W_value = np.random.uniform(size = (self.nvis, self.nclasses))
        self.W = sharedX(W_value, "W")
        b_value = np.zeros(self.nclasses)
        self.b = sharedX(b_value, "b")

        self._params = [self.W, self.b]
        
        self.input_space = VectorSpace(dim = self.nvis)
        self.output_space = VectorSpace(dim = self.nclasses)
    
    ## do calculation
    def logistic_regression(self, X):
        return T.nnet.softmax(T.dot(X, self.W) + self.b)
```
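
To see numerically what the cost expression does, the following NumPy sketch mirrors `logistic_regression` and the loss on toy data. The data and weights are random stand-ins. With one-hot targets, multiplying by `y` and summing over the row simply picks out the log-probability of the true class.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.RandomState(0)
nvis, nclasses = 4, 3
X = rng.rand(5, nvis)
W = rng.uniform(size=(nvis, nclasses))
b = np.zeros(nclasses)
y = np.eye(nclasses)[[0, 2, 1, 1, 0]]      # one-hot targets for five toy samples

yhat = softmax(X.dot(W) + b)
loss = -(y * np.log(yhat)).sum(axis=1)     # same expression as in the cost
cost = loss.mean()

# Equivalent view: select the true-class probability per row directly
picked = -np.log(yhat[np.arange(5), y.argmax(axis=1)])
print(np.allclose(loss, picked))
```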


### demonstration of broadcasting in matrix manipulation
**and explains why parameters in a neural network should be shaped like that**
```python
M = np.random.random((5, 3))
b = np.array([1, 2, 3])
print M
print b
print M + b 
```

```
[[ 0.56329648  0.84977044  0.61348718]
 [ 0.86771678  0.80962258  0.57615912]
 [ 0.57825582  0.57023821  0.66687874]
 [ 0.83863479  0.00110787  0.39578863]
 [ 0.91545471  0.4787959   0.19133042]]
[1 2 3]
[[ 1.56329648  2.84977044  3.61348718]
 [ 1.86771678  2.80962258  3.57615912]
 [ 1.57825582  2.57023821  3.66687874]
 [ 1.83863479  2.00110787  3.39578863]
 [ 1.91545471  2.4787959   3.19133042]]
```
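
The same rule applied to a layer's computation can be sketched as follows, with toy shapes chosen for illustration: `X.dot(W)` has shape `(n_samples, nclasses)`, and a bias of shape `(nclasses,)` is added to every row.

```python
import numpy as np

n_samples, nvis, nclasses = 5, 4, 3
X = np.ones((n_samples, nvis))
W = np.ones((nvis, nclasses))            # last dimension is the output dimension
b = np.array([1.0, 2.0, 3.0])            # one bias per output unit

out = X.dot(W) + b                       # b is broadcast across the rows
print(out.shape)
print(out[0])
```

Had `W` been stored as `(nclasses, nvis)`, the product would need a transpose before the bias could line up with the last axis.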

In [3]:
%%capture log

log_reg_yaml = r"""!obj:pylearn2.train.Train {
  dataset: &train !pkl: '../data/train_mnist.pkl',
  model: !obj:log_reg.LogisticRegression {
    nvis: 784,
    nclasses: 10,
  },
  algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
    batch_size: 200,
    learning_rate: 1e-3,
    monitoring_dataset: {
      'train': *train,
      'valid': !pkl: '../data/valid_mnist.pkl',
    },
    cost: !obj:log_reg.LogisticRegressionCost {},
    termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
      max_epochs: 100,
    },
  },
  save_freq: 1,
  save_path: '../models/log_regssion.pkl'
}"""

train = yaml_parse.load(log_reg_yaml)
train.main_loop()

In [4]:
import theano.tensor as T
import numpy as np
model = serial.load('../models/log_regssion.pkl')
test_data = serial.load('../data/test_mnist.pkl')
test_yhat = T.argmax(model.logistic_regression(test_data.X), axis = 1).eval()
np.mean(test_yhat == np.argmax(test_data.y, axis = 1))

0.82650000000000001

### some notes on the above yaml
1. `!obj:module.to.Object` is a string without any whitespace
2. `!pkl: 'path/to/data'` usually has a space in between
3. YAML requires spaces for indentation; tabs are not allowed
4. `monitoring_dataset` is a plain dictionary rather than an object, which is why its keys are quoted, even though the syntax looks the same
5. it is less common to customize a training algorithm than a cost or model
6. the most important parts common to training algorithms are `monitoring_dataset`, `cost`, and `termination_criterion`
7. specify `save_path` and `save_freq` to save models for later use
8. functions can be called the same way as object constructors, e.g., `!obj:numpy.random.random {size: [5000, 5]}`


For details about how training monitoring is done, please see this [blog post from a developer](http://daemonmaker.blogspot.ca/2014/12/monitoring-experiments-in-pylearn2.html)

In [5]:
## unsupervised learning by autoencoder
!cat auto_encoder.py

import theano.tensor as T 
from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin
from pylearn2.utils import sharedX
from pylearn2.space import VectorSpace
from pylearn2.models.model import Model 
import numpy as np 

class AutoencoderCost(DefaultDataSpecsMixin, Cost):

	supervised = False

	def expr(self, model, data, **kwargs):
		space, source = self.get_data_specs(model)
		space.validate(data)

		X = data 
		Xhat = model.reconstruct(X)
		loss = -(X*T.log(Xhat) + (1-X)*T.log(1-Xhat)).sum(axis = 1)
		return loss.mean()

class Autoencoder(Model):

	def __init__(self, nvis, nhid):
		
		super(Autoencoder, self).__init__()

		self.nvis = nvis
		self.nhid = nhid

		W_value = np.random.uniform(size = (self.nvis, self.nhid))
		self.W = sharedX(W_value, "W")
		b_value = np.zeros(self.nhid)
		self.b = sharedX(b_value, "b")
		c_value = np.zeros(self.nvis)
		self.c = sharedX(c_value, 'c')
		self._params = [self.W, self.b, self.c]

		self.input_space = Ve

In [7]:
%%capture log

autoencoder_yaml = r"""!obj:pylearn2.train.Train {
  dataset: &train !obj:pylearn2.datasets.DenseDesignMatrix {
    X: !obj:numpy.random.random {size: [5000, 5]},
  },
  model: !obj:auto_encoder.Autoencoder {
    nvis: 5,
    nhid: 100,
  },
  algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
    batch_size: 500,
    learning_rate: 1e-3,
    monitoring_dataset: {
      'train': *train,
    },
    cost: !obj:auto_encoder.AutoencoderCost {},
    termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
      max_epochs: 100,
    },
  },
  save_freq: 1,
  save_path: '../models/autoencoder.pkl'
}"""

driver = yaml_parse.load(autoencoder_yaml)
driver.main_loop()
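
The cost in `AutoencoderCost` is a cross-entropy reconstruction error, which only makes sense when both `X` and `Xhat` lie between 0 and 1. Below is a NumPy sketch of that loss on random data shaped like the YAML dataset. Since the listing of `auto_encoder.py` is cut off before `reconstruct`, the sigmoid encoder and tied-weight sigmoid decoder here are purely hypothetical stand-ins, not a reconstruction of the original file.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.RandomState(0)
nvis, nhid = 5, 100
X = rng.random_sample((500, nvis))             # values in [0, 1), like the YAML dataset
W = rng.uniform(-0.1, 0.1, size=(nvis, nhid))
b = np.zeros(nhid)
c = np.zeros(nvis)

H = sigmoid(X.dot(W) + b)                      # hypothetical encoder
Xhat = sigmoid(H.dot(W.T) + c)                 # hypothetical tied-weight decoder, output in (0, 1)

# Same expression as in AutoencoderCost.expr
loss = -(X * np.log(Xhat) + (1 - X) * np.log(1 - Xhat)).sum(axis=1)
print(loss.mean())
```

A sigmoid on the output keeps `Xhat` strictly inside (0, 1), so both logarithms stay finite.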

## TODO - simplify pylearn2 interface

## Use Pylearn2 to Implement word/image vector fusion
- [source of both idea and code](https://github.com/mganjoo/zslearning)
- [a Theano-based implementation](http://nbviewer.ipython.org/github/renruoxu/data-fusion/blob/master/deprecated/mapping%20(1).ipynb)
- it is a standard 1-hidden layer MLP with customized cost function
- the data used here: X (image vectors from DeCaff), Y (word vectors from word2vec)

**see the [use case notebook](USECASE%20-%20pylearn2%20to%20implement%20zero%20shot%20learning.ipynb)**

## Some Thoughts
- Expect some growing pains in fast-developing packages like pylearn2.
- Once you are past the learning curve, things will not feel so bad anymore : )
- Compared to other libraries, pylearn2 is in a similar position to scipy (an intermediate level between numpy and higher-level packages such as sklearn).
- Pylearn2 is a machine gun: its main developers are from the LISA lab, one of the strongest groups studying deep learning, and it shares its lineage with Theano.
- If you want to customize your own deep learning model, use pylearn2. If you just want to try some well-developed models in an easier way, try other packages like theanets or neurolab.
- You need more than a little deep learning knowledge to use the module, e.g., to understand why `batch_size` can be defined in the SoftmaxRegression `model` instead of in `training_algorithms`.
- To use pylearn2 fluently, you need to speak the deep learning guru language, because it is VERY HARD to debug in pylearn2 unless you are a developer yourself.