## Common Use Scenarios for Deep Learning Models
### 1. Building Models on Training Data
1. Customize Network Layout: e.g., number of layers, connections, activation functions, weight initializations
2. Customize Cost Function: special regularization
3. Choose or Customize optimization method, e.g., l-bfgs, cg, batch-gd, sgd, etc.
4. Specify early stopping, train/validation data, performance monitoring
5. Save the learned weights and be able to restore the model from them
### 2. Use Models on New Data
1. Restore model from its trained weights and yaml configuration
2. Predict on new data
3. Extract hidden-layer outputs (computed from the learned weights) as features; this is mostly useful for models pre-trained on large datasets, e.g., Caffe, Overfeat, sklearn-theano


**We will try to cover these points in this notebook.**
**Use the [data-downloading script](https://github.com/lisa-lab/DeepLearningTutorials/blob/master/data/download.sh) to download the necessary data into the `./data` folder.**

## Pylearn2 for Deep Learning
1. pylearn2 is to Theano what scipy is to numpy
2. It uses YAML for quick experiment setup under a unified framework of `dataset`, `algorithm` (optimizer), and `model` (network), which act as the Lego blocks of deep learning
3. In the YAML configuration, you can use `!obj:` to create an instance (a composite of both data and methods), `!import` to reference customized functions (e.g., a cost function), and `!pkl:` to load data
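
To make the role of `!obj:` more concrete, here is a minimal sketch of the idea behind it: resolve a dotted path to a class and call that class with the mapping as keyword arguments. This is not pylearn2's actual parser. `instantiate` is a hypothetical helper, and `fractions.Fraction` only stands in for a pylearn2 class so that the snippet runs without pylearn2 installed.

```python
import importlib
from fractions import Fraction


def instantiate(dotted_path, kwargs):
    """Hypothetical helper: import `package.module.Class` and call Class(**kwargs)."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**kwargs)


# Roughly what a tag such as `!obj:fractions.Fraction {numerator: 3, denominator: 4}`
# would ask a loader to do (stand-in class, not a pylearn2 one).
obj = instantiate("fractions.Fraction", {"numerator": 3, "denominator": 4})
print(obj)
```

Nested `!obj:` tags follow the same pattern: inner objects are built first and then passed as keyword arguments to the outer one.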

In [1]:
!pip show pylearn2

---
Metadata-Version: 1.0
Name: pylearn2
Version: 0.1.dev0
Summary: A machine learning library built on top of Theano.
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: BSD 3-clause license
Location: /home/dola/opt/pylearn2
Requires: numpy, pyyaml, argparse, Theano


In [2]:
from pylearn2.config import yaml_parse
from pylearn2.datasets import DenseDesignMatrix
import cPickle
from sklearn import preprocessing

Couldn't import dot_parser, loading of dot files will not be possible.


### 1. [Basic Example of Building Softmax Regression for MNIST - with YAML](http://nbviewer.ipython.org/github/lisa-lab/pylearn2/blob/master/pylearn2/scripts/tutorials/softmax_regression/softmax_regression.ipynb)

**This exercise looks at how easy or hard it is to quickly test a model on a numpy.array in pylearn2.**

**Its use feels quite counter-intuitive for exploratory data science: everything has to be forced into YAML and dumped to disk first. Theano has a much friendlier interface in this case because it integrates directly with other Python objects.**

**pylearn2 may have its reasons for this design: it uses configuration to train on pre-defined data and get good performance, which does not necessarily happen in an exploratory environment.**


In [3]:
## prepare data: split the single mnist.pkl into training, validation and test sets and pickle them
## pylearn2.datasets.DenseDesignMatrix accepts an integer array as y, but the SoftmaxRegression model
## only accepts one-hot encoded targets
mnist_data_config = r"!pkl: '../data/mnist.pkl'"
train_mnist, valid_mnist, test_mnist = yaml_parse.load(mnist_data_config)
## fit the encoder once on the training labels so that all splits share the same class-to-column mapping
coder = preprocessing.OneHotEncoder().fit(train_mnist[1].reshape((-1, 1)))
train_mnist = DenseDesignMatrix(X = train_mnist[0], y = coder.transform(train_mnist[1].reshape((-1, 1))).toarray())
valid_mnist = DenseDesignMatrix(X = valid_mnist[0], y = coder.transform(valid_mnist[1].reshape((-1, 1))).toarray())
test_mnist = DenseDesignMatrix(X = test_mnist[0], y = coder.transform(test_mnist[1].reshape((-1, 1))).toarray())
## open pickle files in binary mode
cPickle.dump(train_mnist, open("../data/train_mnist.pkl", "wb"))
cPickle.dump(valid_mnist, open("../data/valid_mnist.pkl", "wb"))
cPickle.dump(test_mnist, open("../data/test_mnist.pkl", "wb"))
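
For reference, the one-hot encoding applied to the labels above can also be written in plain numpy. This is a minimal sketch, assuming the labels are integers in `0..n_classes-1` (true for MNIST digits); `one_hot` is a hypothetical helper name, not part of pylearn2 or scikit-learn.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Return a (n_samples, n_classes) float array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], n_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


y = one_hot([5, 0, 4], n_classes=10)
print(y.shape)
print(y.argmax(axis=1))  # argmax recovers the integer labels
```

Passing `n_classes` explicitly keeps the column layout identical across the training, validation and test splits, even if one split happens to miss a class.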

In [4]:
!ls ../data

atis.fold0.pkl.gz  atis.fold4.pkl.gz  mnist.pkl.gz	 test_mnist.pkl
atis.fold1.pkl.gz  imdb.pkl	      mnist_py3k.pkl.gz  train_mnist.pkl
atis.fold2.pkl.gz  midi.zip	      Nottingham	 valid_mnist.pkl
atis.fold3.pkl.gz  mnist.pkl	      Nottingham.zip


In [5]:
dataset_config = r"""&train !pkl: '../data/train_mnist.pkl'
"""

model_config = r"""!obj:pylearn2.models.softmax_regression.SoftmaxRegression {
  n_classes: 10,
  irange: 0.,
  nvis: 784
}"""


algorithm_config = r"""!obj:pylearn2.training_algorithms.bgd.BGD {
  batch_size: 10000,
  conjugate: 1,
  monitoring_dataset: {
    'train': *train,
    'valid': !pkl: '../data/valid_mnist.pkl'
  },
  termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased {
    channel_name: "valid_y_misclass"
  }
}
"""

driver_config = r"""!obj:pylearn2.train.Train {
  dataset: %s
  , model: %s
  , algorithm: %s
  , extensions: [!obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
             channel_name: 'valid_y_misclass',
             save_path: "../models/softmax_regression_best.pkl"
        },]
  , save_path: "../models/softmax_regression.pkl"
  , save_freq: 1
}
""" % (dataset_config, model_config, algorithm_config)



In [7]:
%%capture log
driver = yaml_parse.load(driver_config)
driver.main_loop()

INFO (theano.gof.compilelock): Refreshing lock /home/dola/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/lock_dir/lock
INFO:theano.gof.compilelock:Refreshing lock /home/dola/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/lock_dir/lock


In [8]:
print log.stdout[-500:]

9996146
	valid_y_mean_max_class: 0.917966512741
	valid_y_min_max_class: 0.248488289737
	valid_y_misclass: 0.0714
	valid_y_nll: 0.261678336848
	valid_y_row_norms_max: 1.99803194598
	valid_y_row_norms_mean: 0.574710145358
	valid_y_row_norms_min: 0.0
Saving to ../models/softmax_regression.pkl...
Saving to ../models/softmax_regression.pkl done. Time elapsed: 0.015478 seconds
Saving to ../models/softmax_regression.pkl...
Saving to ../models/softmax_regression.pkl done. Time elapsed: 0.013851 seconds



In [10]:
## use the trained model
## the model has to be wired back into a Theano function by hand to get predictions
import theano.tensor as T
import theano
import numpy as np
softmax_model = cPickle.load(open("../models/softmax_regression.pkl", "rb"))
X = softmax_model.get_input_space().make_theano_batch()
y = softmax_model.fprop(X)
ylabel = T.argmax(y, axis = 1)
predict = theano.function([X], ylabel)

yhat = predict(test_mnist.X)
ytarget = cPickle.load(open("../data/mnist.pkl", "rb"))[-1][1]
print "classification rate on test data", np.mean(yhat == ytarget)

 classification rate on test data 0.924
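
For softmax regression, the compiled function boils down to class probabilities `softmax(XW + b)` followed by an `argmax`. Here is a numpy sketch with made-up weights; `W`, `b` and the toy inputs are illustrative values, not the trained parameters.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predict_labels(X, W, b):
    """Predicted class index for each row of X."""
    return softmax(X.dot(W) + b).argmax(axis=1)


# Toy problem: 2 features, 3 classes, hand-picked weights.
W = np.array([[2.0, 0.0, -2.0],
              [0.0, 2.0, 0.0]])
b = np.zeros(3)
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
yhat = predict_labels(X, W, b)
ytarget = np.array([0, 1, 2])
print("accuracy:", np.mean(yhat == ytarget))
```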


### 2. [Stacked Autoencoders Example - with YAML](http://nbviewer.ipython.org/github/lisa-lab/pylearn2/blob/master/pylearn2/scripts/tutorials/stacked_autoencoders/stacked_autoencoders.ipynb)

1. Layer-wise (unsupervised) pre-training of denoising autoencoders
2. Stacking these layers to form an MLP and fine-tuning it with supervised learning
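
The computation inside one denoising autoencoder layer can be sketched in numpy: corrupt the input by zeroing a random fraction of its entries, encode with `tanh`, decode linearly, and score the reconstruction against the clean input. This sketch assumes untied encoder and decoder weights, random (untrained) parameters, and a squared error averaged over all entries, which may differ from how pylearn2 aggregates its cost. Names such as `dae_reconstruction_error` are illustrative, not pylearn2 API.

```python
import numpy as np

rng = np.random.RandomState(0)


def binomial_corrupt(X, corruption_level):
    """Zero each entry independently with probability `corruption_level`."""
    mask = rng.binomial(n=1, p=1.0 - corruption_level, size=X.shape)
    return X * mask


def dae_reconstruction_error(X, W_enc, b_enc, W_dec, b_dec, corruption_level):
    """Return hidden activations and the squared error against the clean input."""
    X_noisy = binomial_corrupt(X, corruption_level)
    hidden = np.tanh(X_noisy.dot(W_enc) + b_enc)   # tanh encoder
    X_hat = hidden.dot(W_dec) + b_dec              # linear decoder
    return hidden, np.mean((X - X_hat) ** 2)


nvis, nhid = 784, 500
X = rng.uniform(size=(100, nvis))                  # stand-in for a batch of images
W_enc = rng.uniform(-0.05, 0.05, size=(nvis, nhid))
W_dec = rng.uniform(-0.05, 0.05, size=(nhid, nvis))
hidden, err = dae_reconstruction_error(
    X, W_enc, np.zeros(nhid), W_dec, np.zeros(nvis), corruption_level=0.2)
print(hidden.shape, err)
```

The `hidden` array plays the role of the transformed data that a second layer is trained on, which is why that layer's `nvis` has to equal the first layer's `nhid`.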

In [11]:
%%capture log

## layer 1 - unsupervised training
layer1_yaml = r"""
!obj:pylearn2.train.Train {
    dataset: &train !pkl: '../data/train_mnist.pkl',
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 784,
        nhid : 500,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .2,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : 100,
        monitoring_batches : 5,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 10,
        },
    },
    save_path: "../models/dae_l1.pkl",
    save_freq: 1
}
"""
layer1 = yaml_parse.load(layer1_yaml)


layer1.main_loop()

In [12]:
print log.stdout[-500:]

to ../models/dae_l1.pkl done. Time elapsed: 0.756267 seconds
Time this epoch: 8.580836 seconds
Monitoring step:
	Epochs seen: 10
	Batches seen: 5000
	Examples seen: 500000
	learning_rate: 0.001
	objective: 11.8870911208
	total_seconds_last_epoch: 13.212896
	training_seconds_this_epoch: 8.580836
Saving to ../models/dae_l1.pkl...
Saving to ../models/dae_l1.pkl done. Time elapsed: 0.764984 seconds
Saving to ../models/dae_l1.pkl...
Saving to ../models/dae_l1.pkl done. Time elapsed: 0.751901 seconds



In [205]:
%%capture log
## second layer training: the 2nd layer takes the output of the 1st layer as its input
layer2_yaml = r"""!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !pkl: '../data/train_mnist.pkl',
        transformer: !pkl: "../models/dae_l1.pkl" # use layer 1 as input
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 500,
        nhid : 500,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : 100,
        monitoring_batches : 5,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 10,
        },
    },
    save_path: "../models/dae_l2.pkl",
    save_freq: 1
}
"""

layer2 = yaml_parse.load(layer2_yaml)


layer2.main_loop()

In [199]:
print log.stdout[-500:]

o ../models/dae_l2.pkl done. Time elapsed: 0.415859 seconds
Time this epoch: 12.082077 seconds
Monitoring step:
	Epochs seen: 10
	Batches seen: 5000
	Examples seen: 500000
	learning_rate: 0.001
	objective: 4.3132512472
	total_seconds_last_epoch: 16.162168
	training_seconds_this_epoch: 12.082077
Saving to ../models/dae_l2.pkl...
Saving to ../models/dae_l2.pkl done. Time elapsed: 0.419657 seconds
Saving to ../models/dae_l2.pkl...
Saving to ../models/dae_l2.pkl done. Time elapsed: 0.381002 seconds



In [200]:
%%capture log
## supervised fine-tuning of the stacked network
## 1. stack the two DAEs into an MLP
## 2. train the MLP with supervised learning

mlp_yaml = r"""!obj:pylearn2.train.Train {
    dataset: &train !pkl: '../data/train_mnist.pkl',
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: 100,
        layers: [
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h1',
                     layer_content: !pkl: "../models/dae_l1.pkl"
                 },
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h2',
                     layer_content: !pkl: "../models/dae_l2.pkl"
                 },
                 !obj:pylearn2.models.mlp.Softmax {
                     max_col_norm: 1.9365,
                     layer_name: 'y',
                     n_classes: 10,
                     irange: .005
                 }
                ],
        nvis: 784
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: .05,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .5,
        },
        monitoring_dataset:
            {
                'valid' : !pkl: '../data/valid_mnist.pkl',
            },
        cost: !obj:pylearn2.costs.mlp.Default {},
        termination_criterion: !obj:pylearn2.termination_criteria.And {
            criteria: [
                !obj:pylearn2.termination_criteria.MonitorBased {
                    channel_name: "valid_y_misclass",
                    prop_decrease: 0.,
                    N: 100
                },
                !obj:pylearn2.termination_criteria.EpochCounter {
                    max_epochs: 50
                }
            ]
        },
        update_callbacks: !obj:pylearn2.training_algorithms.sgd.ExponentialDecay {
            decay_factor: 1.00004,
            min_lr: .000001
        }
    },
    extensions: [
        !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 250,
            final_momentum: .7
        }
    ],
    save_path: "../models/mlp.pkl",
    save_freq: 1
}
"""

## keep the parsed Train object under its own name instead of overwriting the YAML string
mlp = yaml_parse.load(mlp_yaml)


mlp.main_loop()
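
The two schedules in this configuration can be sketched in plain Python. This reflects my reading of their behaviour and should be treated as an assumption rather than a copy of pylearn2's code: `ExponentialDecay` divides the base learning rate by `decay_factor` once per update and floors it at `min_lr`, and `MomentumAdjustor` interpolates the momentum linearly between epochs `start` and `saturate`. The function names are hypothetical.

```python
def decayed_learning_rate(base_lr, decay_factor, min_lr, n_updates):
    """Divide the base rate by decay_factor once per update, never going below min_lr."""
    return max(base_lr / (decay_factor ** n_updates), min_lr)


def adjusted_momentum(init_momentum, final_momentum, start, saturate, epoch):
    """Linearly interpolate momentum between epochs `start` and `saturate`."""
    alpha = float(epoch - start) / float(saturate - start)
    alpha = min(max(alpha, 0.0), 1.0)   # clamp to [0, 1] outside the ramp
    return (1.0 - alpha) * init_momentum + alpha * final_momentum


# Values taken from the YAML above; 500 updates per epoch matches the
# "Batches seen: 5000" after 10 epochs reported in the pre-training logs.
for epoch in (1, 10, 50):
    lr = decayed_learning_rate(0.05, 1.00004, 1e-6, n_updates=500 * epoch)
    mom = adjusted_momentum(0.5, 0.7, start=1, saturate=250, epoch=epoch)
    print(epoch, round(lr, 5), round(mom, 4))
```

With `max_epochs: 50` and `saturate: 250`, the momentum under this reading never reaches `final_momentum` during the run.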

In [201]:
print log.stdout[-500:]

n: 1.93520071282
	valid_y_max_max_class: 0.999997682992
	valid_y_mean_max_class: 0.98003508451
	valid_y_min_max_class: 0.548868565573
	valid_y_misclass: 0.0203
	valid_y_nll: 0.0668058268363
	valid_y_row_norms_max: 0.545912116268
	valid_y_row_norms_mean: 0.264345579985
	valid_y_row_norms_min: 0.101699705657
Saving to ../models/mlp.pkl...
Saving to ../models/mlp.pkl done. Time elapsed: 0.958934 seconds
Saving to ../models/mlp.pkl...
Saving to ../models/mlp.pkl done. Time elapsed: 0.955979 seconds



In [202]:
## use the trained model
## the model has to be wired back into a Theano function by hand to get predictions
import theano.tensor as T
import theano
import numpy as np
mlp_model = cPickle.load(open("../models/mlp.pkl", "rb"))
X = mlp_model.get_input_space().make_theano_batch()
y = mlp_model.fprop(X)
ylabel = T.argmax(y, axis = 1)
predict = theano.function([X], ylabel)

yhat = predict(test_mnist.X)
ytarget = cPickle.load(open("../data/mnist.pkl", "rb"))[-1][1]
print "Accuracy on test data: ", np.mean(yhat == ytarget)

Accuracy on test data:  0.98


### 3. So I Need a Wrapper for the Current pylearn2

**It supports an sklearn-like interface and handles data dumping behind the scenes. It is a direct wrapper that still supports the YAML language.**

### It supports two types of pylearn2 models
- single-configuration models that can be trained by a specific algorithm, e.g., MLP
- stacked models that should be trained stage by stage, e.g., Stacked Autoencoders
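
To illustrate the "data dumping behind the scenes" part, here is a hypothetical skeleton. It is not the `sklearn_pylearn2` code, and `ScratchDirModel`, `workdir` and `train_path` are names invented for this sketch. It only shows how a `fit()` call can hide the dump-to-disk step that a `!pkl:` tag relies on, and how a `clean()` call can remove the files afterwards; the actual training is left out.

```python
import os
import pickle
import shutil
import tempfile
import time


class ScratchDirModel(object):
    """Hypothetical skeleton of an sklearn-style wrapper around YAML-driven training."""

    def __init__(self, name, root=None):
        stamp = time.strftime("%b%d%Y-%H%M%S")
        self.workdir = os.path.join(root or tempfile.gettempdir(),
                                    "%s-%s" % (name, stamp))

    def fit(self, X, y):
        # Dump the in-memory data to a file that a `!pkl:` tag could load.
        if not os.path.isdir(self.workdir):
            os.makedirs(self.workdir)
        self.train_path = os.path.join(self.workdir, "train.pkl")
        with open(self.train_path, "wb") as f:
            pickle.dump((X, y), f)
        # A real wrapper would now fill a YAML template with self.train_path
        # and run the training loop; that part is omitted in this sketch.
        return self

    def clean(self):
        # Remove everything the model wrote to disk.
        shutil.rmtree(self.workdir, ignore_errors=True)
        print("deleted", self.workdir)


m = ScratchDirModel("testmodel").fit([[0.0, 1.0]], [1])
dump_existed = os.path.exists(m.train_path)
m.clean()
```

A stage-by-stage model could reuse the same working directory for each stage, so that a later stage loads the pickled output of the previous one.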

In [168]:
import sys
sys.path += ["../python-codes/"]
%load_ext autoreload
%autoreload 2
import sklearn_pylearn2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [228]:
!ls

TUTORIAL - pylearn2 101.ipynb  TUTORIAL - sklearn-theano.ipynb


In [227]:
m = sklearn_pylearn2.Pylearn2Model("testmodel", None, None)
m.clean()

deleted /home/dola/workspace/dola/deeplearning-exploration/notebooks/testmodel-Apr222015-160837
