cuv RBM
Hannes Schulz (schulz@ais.uni-bonn.de)
Andreas Mueller (amueller@ais.uni-bonn.de)
11/4/2010
The implementation is not as short as it could be, but it is built to be very
modular and to offer a lot of functionality. Special emphasis is placed on easy
extensibility.
Functionality:
- Restricted Boltzmann Machine Training
- With n-step Contrastive Divergence
- With persistent Contrastive Divergence
- Weight decay, momentum, batch-learning
- Binary or Gaussian visible nodes
- Restricted Boltzmann Machine Evaluation
- Sampling from the model
- Visualizing Filters
- Annealed Importance Sampling for approximating the partition function
- Calculating the partition function exactly
- Visualization and saving of hidden representations
- Stacking RBMs to Deep Belief Networks
- Sampling from DBNs
- Deep Boltzmann Machine Training
- With n-step Contrastive Divergence
- With persistent Contrastive Divergence
- Deep Boltzmann Machine Evaluation
- Sampling from the model
- Neural Network Training
- Backpropagation of error
- RPROP
- Weight decay, momentum, batch-learning
- Variable number of layers
- Cross entropy training
- Finetuning
- Initializing a Neural Network with an RBM and DBM
- All of the above functionality can be used
- Training on Image Data
- Visualization of input, filters and samples from the model
- On-the-fly modifications to the training set via Gaussian noise or translations
Prerequisites
This program uses the cuv library.
You can download it at http://www.ais.uni-bonn.de/deep_learning/downloads.html.
There you can also find build instructions.
Python Packages:
You need numpy, scipy and matplotlib to run this program.
These can easily be installed on Ubuntu or Debian Linux with
> sudo apt-get install python-numpy python-scipy python-matplotlib
General usage:
Depending on options there are three training phases:
1) Layer wise RBM training
2) DBM training
3) Finetuning
Each training phase can be stopped by pressing Ctrl+C.
This saves the current weights and starts the next phase.
During layer wise RBM training, Ctrl+C finishes training
of one layer and continues with training the next RBM layer.
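The Ctrl+C behaviour described above can be sketched in a few lines of Python.
This is only an illustration under assumptions, not the code in caller.py: the
names run_phases, train_step and save_weights are hypothetical, and the phase
names are plain labels.

```python
def run_phases(phases, train_step, save_weights, max_updates):
    """Run training phases in order; Ctrl+C ends only the current phase.

    All names are hypothetical stand-ins, not functions of this program.
    """
    finished = []
    for phase in phases:
        updates = 0
        try:
            while updates < max_updates:
                train_step(phase, updates)
                updates += 1
        except KeyboardInterrupt:
            # Ctrl+C: stop this phase early and fall through to saving.
            pass
        # Weights are saved whether the phase ended normally or was interrupted.
        save_weights(phase, updates)
        finished.append((phase, updates))
    return finished
```

Catching KeyboardInterrupt inside the loop over phases is what lets an
interrupted phase still save its weights before the next phase starts.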
All parameters are saved in info-0.pickle so that they can be recovered at any time.
By default, weights are saved only after each training phase (indicated by a postfix).
With the --save_every option, weights can be saved every X weight updates, which is handy
for analysing learning.
During RBM and DBM learning, the "reconstruction error" is shown every 100 weight updates.
During MLP learning, training error is shown every epoch and test error is shown every
ten epochs.
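What exactly is reported as "reconstruction error" is not spelled out here. A
common definition for a binary RBM, given as a hedged sketch and not necessarily
the one this program uses, is the mean squared difference between a visible
vector and its reconstruction after one up-down pass. The function names and
the list-based weight layout W[i][j] (visible i, hidden j) are assumptions.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def reconstruction_error(v, W, b_vis, b_hid):
    """Mean squared error between v and its one-step mean-field reconstruction."""
    n_vis, n_hid = len(b_vis), len(b_hid)
    # Upward pass: hidden activation probabilities given v.
    h = [sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(n_vis)))
         for j in range(n_hid)]
    # Downward pass: visible activation probabilities given h.
    v_rec = [sigmoid(b_vis[i] + sum(W[i][j] * h[j] for j in range(n_hid)))
             for i in range(n_vis)]
    return sum((a - b) ** 2 for a, b in zip(v, v_rec)) / n_vis
```

With all weights and biases at zero, every reconstructed unit is 0.5, so a
binary input gives an error of 0.25.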
Command-line Options
All command line options can be printed with
> ./caller.py --help
Each option comes with a brief explanation.
The most important options that are used for both finetuning (MLP) and pretraining (RBM) are:
--l_size
The size of the hidden layers, separated with dots.
The size of the visible and (if any) output layer are determined by the dataset.
Example: Training on a dataset of dimension 100 with 5 classes, --l_size 80.50
will yield a 100-80-50 DBN that is then finetuned as a 100-80-50-5 neural network.
--dataset
Chooses the dataset. There is built-in support for MNIST, Caltech 101 and some toy datasets.
You can download MNIST at http://yann.lecun.com/exdb/mnist/ and Caltech 101 at
http://www.vision.caltech.edu/Image_Datasets/Caltech101/.
--device
Chooses the GPU you want to use. It is not recommended to start more than
one instance on the same GPU.
--workdir
Write all output files (pickles with learning information, weights)
into this directory
--load
Load weights from the directory given with --workdir.
This can be done either for visualization (default) or to continue training.
Can be one of none,pretraining,dbm,finetuning,latest.
"none" (default) means no weights are loaded and everything in workdir is
overwritten. pretraining, dbm and finetuning specify that the weights
saved AFTER the respective training phase are loaded.
"latest" means that the weights of the latest saved training phase are loaded.
(Here "latest" refers not to the timestamp but to the order pretraining, then DBM, then finetuning.)
--continue_learning <Layernum>
Continue RBM learning in layer <Layernum>.
For finetuning, the whole network is learned.
All other options without the prefix "finetune" apply only to the RBM, whereas those
with the prefix "finetune" apply only to the MLP.
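To illustrate how a --l_size value such as 80.50 maps to the architectures
mentioned in the --l_size example above, here is a small sketch. The function
layer_sizes is hypothetical and only mirrors the description; it is not how
caller.py parses its options.

```python
def layer_sizes(l_size, n_visible, n_classes=None):
    """Turn a dot-separated --l_size string into lists of layer sizes.

    n_visible and n_classes stand for the values determined by the dataset.
    """
    hidden = [int(s) for s in l_size.split(".")]
    dbn = [n_visible] + hidden
    # The finetuned network gets an extra output layer, if there are classes.
    mlp = dbn + [n_classes] if n_classes is not None else None
    return dbn, mlp

dbn, mlp = layer_sizes("80.50", n_visible=100, n_classes=5)
print("-".join(map(str, dbn)))  # 100-80-50
print("-".join(map(str, mlp)))  # 100-80-50-5
```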
Datasets
MNIST
Dataset of handwritten digits.
You need to download it from http://yann.lecun.com/exdb/mnist/.
MNIST Validation
50000 randomly chosen examples for training and 10000 for validation (which then serves as the "testing" set),
both drawn from the MNIST training set.
Shifter
Very small toy dataset, comes with the RBM.
Bars and Stripes
Another very small toy dataset.
Caltech
Natural images, reshaped and padded.
You need to download it from http://www.vision.caltech.edu/Image_Datasets/Caltech101/.
Then the "extract_caltech.py" from the datasets folder can be used to generate a file that
can be read by the RBM.
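The MNIST Validation split described above can be sketched as follows. This is
an illustration only: the function name and the fixed seed are assumptions, and
the program may draw its split differently. The 60000 is the size of the
standard MNIST training set.

```python
import random

def split_train_validation(n_total=60000, n_train=50000, seed=0):
    """Randomly partition example indices into a training and a validation part."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]

train_idx, valid_idx = split_train_validation()
print(len(train_idx), len(valid_idx))  # 50000 10000
```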
Using Your Own Dataset
To use your own dataset, just define a new class in dataset.py.
It needs to inherit from the DataSet class.
You need to set at least self.data, cfg.py and cfg.px.
For image data, px times py is the size of a single image.
If your data is not image data, set either one to 1 and the other to the dataset dimensionality.
If you have image data with multiple channels, set cfg.maps_bottom to the number of channels (one image map per channel).
These variables are used for visualization purposes only.
You also need to "register" the dataset in "helper_classes.py" as a member of the enum Dataset.
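A new dataset class might look roughly like the sketch below. Everything in it
is an assumption made for illustration: the DataSet base class is a stand-in
(the real one in dataset.py may take other arguments), cfg is modelled as a
simple namespace, RandomVectors is a hypothetical dataset, and storing one
example per row is a guess about the layout. Only the requirement to set
self.data, cfg.px and cfg.py comes from the text above.

```python
import random
from types import SimpleNamespace

class DataSet(object):
    """Stand-in for the DataSet base class in dataset.py (real interface may differ)."""
    def __init__(self, cfg):
        self.cfg = cfg
        self.data = None

class RandomVectors(DataSet):
    """Hypothetical non-image dataset of random binary vectors."""
    def __init__(self, cfg, n_examples=10, dim=20, seed=0):
        DataSet.__init__(self, cfg)
        rng = random.Random(seed)
        # One example per row here; the layout the program expects may differ.
        self.data = [[rng.randint(0, 1) for _ in range(dim)]
                     for _ in range(n_examples)]
        # Not image data: one of px/py is 1, the other is the dimensionality.
        cfg.px = dim
        cfg.py = 1

cfg = SimpleNamespace()
dataset = RandomVectors(cfg)
print(cfg.px * cfg.py)  # 20
```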
Examples
Training a DBM with 3 hidden layers on MNIST using PCD and generating a video of sampling 100 steps:
> ./caller.py --dataset mnist --cd_type pcd --l_size 500.500.1000 --dbm 1 --video 1 --weight_updates 100000 --eval_steps 100 -w /tmp/dbm
Or shorter (since mnist and PCD are default):
> ./caller.py --l_size 500.500.1000 --dbm 1 --video 1 --weight_updates 100000 --eval_steps 100 -w /tmp/dbm
Training an MLP with one hidden layer on MNIST, using random translations of the training set, RPROP and the maximum entropy error function:
> ./caller.py --pretrain 0 --finetune 1 --l_size 500 --finetune_translation_max 1 --finetune_epochs 200 -w /tmp/mlp_rprop
The same, but using backpropagation with weight decay and the MSE error function:
> ./caller.py --pretrain 0 --finetune 1 --l_size 500 --finetune_translation_max 1 --finetune_epochs 200 --finetune_rpop 0 --finetune_learnrate 0.001 --finetune_cost 0.01 --finetune_softmax 0 -w /tmp/mlp_backprop
The same, but with two hidden layers and pretraining using a DBN trained with PCD:
> ./caller.py --finetune 1 --l_size 500.500 --finetune_translation_max 1 -w /tmp/mlp_pretraining
Annealed Importance Sampling
Annealed importance sampling was proposed for RBM evaluation by Ruslan Salakhutdinov and Iain Murray in "On the Quantitative Analysis of Deep Belief Networks" (http://www.mit.edu/~rsalakhu/papers/dbn_ais.pdf).
This is an implementation of the method described in this paper. It works together with the RBM implementation by
reading parameters from a given work directory.
Usually it can simply be called with:
> python ais/ais.py --dir <directory>
where <directory> is the working directory of the RBM.
The output is the estimated partition function and the resulting data likelihood under the model.
The result of the computation is saved in a pickle in the working directory and the calculation is not repeated
if the script is called again with the same directory.
Calculating The Partition Function
It is possible to calculate the partition function directly by summing out the visible (or hidden) units
analytically and then summing over the others numerically.
This script sums out the visible units analytically and sums over the hidden units numerically.
This means runtime is (nearly) independent of the number of visible units, but exponential in the number of hidden nodes.
For about 25 hidden nodes, it can be computed in reasonable time.
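The idea can be sketched in plain Python for a small binary RBM. This is not
the code of calc_part_func.py, which runs on the GPU; the function name, the
list-based weight layout W[i][j] (visible i, hidden j) and the energy
convention E(v,h) = -b_vis.v - b_hid.h - v'Wh are assumptions of this sketch.
For each hidden configuration h, the sum over all visible states factorizes
into a product of terms 1 + exp(b_vis[i] + sum_j W[i][j] h[j]).

```python
import itertools
import math

def log_partition_exact(W, b_vis, b_hid):
    """Exact log Z of a binary RBM, enumerating hidden states only."""
    n_vis, n_hid = len(b_vis), len(b_hid)
    log_terms = []
    for h in itertools.product((0, 1), repeat=n_hid):  # 2**n_hid states
        # Hidden bias contribution for this configuration.
        log_term = sum(c * hj for c, hj in zip(b_hid, h))
        for i in range(n_vis):
            act = b_vis[i] + sum(W[i][j] * h[j] for j in range(n_hid))
            # Visible unit i summed out analytically: log(1 + exp(act)),
            # written in a numerically stable form.
            log_term += max(act, 0.0) + math.log1p(math.exp(-abs(act)))
        log_terms.append(log_term)
    # Log-sum-exp over all hidden configurations.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))
```

As a sanity check, with all weights and biases at zero every joint state has
weight 1, so Z equals 2 to the power of the total number of units.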
Usage of the script is very simple:
> python analyse/calc_part_func.py --device 0 --basename <directory> --idx pretrain
The result of the computation is saved in a pickle in the working directory and the calculation is not repeated
if the script is called again with the same directory unless "--force-overwrite" is given as a command line option.
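The caching behaviour described above follows a common pattern, sketched here
under assumptions: the function cached_result, its parameters and the file name
result.pickle are hypothetical and do not reflect the names the scripts
actually use.

```python
import os
import pickle

def cached_result(workdir, compute, force_overwrite=False,
                  filename="result.pickle"):
    """Return a pickled result from workdir, computing it only if needed."""
    path = os.path.join(workdir, filename)
    if os.path.exists(path) and not force_overwrite:
        # A previous run already stored the result: reuse it.
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

A second call with the same directory returns the stored value without calling
compute again, unless force_overwrite is set.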
Have fun!