
SINGA-236 memory pool #236

Open
wants to merge 16 commits into master

Conversation

liyuchenmike
Contributor

Implemented the following features:

  1. A memory pool facility to manage Block data allocated from the pool (see the sketch below).
  2. Added relevant test cases.
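
A minimal sketch of the idea, assuming a `Pool` interface with `Malloc`/`Free` and a `Block` that remembers its owning pool; the names and signatures below are illustrative assumptions, not the exact SINGA API.

```cpp
// Illustrative sketch only (assumed names, not the exact SINGA API): each
// Block records the pool it was allocated from, so its memory is returned to
// that pool on destruction instead of going through the system allocator.
#include <cstddef>
#include <cstdlib>

class Pool {
 public:
  virtual ~Pool() = default;
  virtual void* Malloc(std::size_t size) = 0;  // hand out a chunk of `size` bytes
  virtual void Free(void* ptr) = 0;            // give the chunk back to the pool
};

// Fallback pool that simply forwards to malloc/free.
class CppPool : public Pool {
 public:
  void* Malloc(std::size_t size) override { return std::malloc(size); }
  void Free(void* ptr) override { std::free(ptr); }
};

class Block {
 public:
  Block(std::size_t size, Pool* pool) : size_(size), pool_(pool) {
    data_ = pool_->Malloc(size_);
  }
  ~Block() { pool_->Free(data_); }  // memory goes back to the owning pool
  void* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  void* data_ = nullptr;
  std::size_t size_ = 0;
  Pool* pool_ = nullptr;  // non-owning pointer to the pool that owns the memory
};
```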

liyuchenmike and others added 16 commits August 11, 2016 17:29
SINGA-236 Memory Pool
Added the memory pool feature to manage Block data allocated on the CPU side.
Added relevant test cases.
In this ticket, we implemented a batch-normalized VGG model for the CIFAR-10
dataset (refer to http://torch.ch/blog/2015/07/30/cifar.html).

*    +vgg-parallel.cc for parallel training
*    +vgg.py using the Python language
*    fix a bug in the ResetLike() method in tensor.h, which previously did not
     reset the shape.
*    fix a bug in local_updater.cc, which could cause a race condition when
     multiple threads try to initialize mutexes concurrently (see the sketch
     after this list).
*    revise the batch normalization layer to support 2D tensor input
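
The race fix itself lives in local_updater.cc; the sketch below only illustrates the general one-time initialization pattern (std::call_once) that prevents concurrent mutex setup. The class and member names are hypothetical, not the actual SINGA code.

```cpp
// Generic sketch (hypothetical names, not the actual local_updater.cc code):
// guard the one-time creation of the shared mutexes with std::call_once so
// that threads arriving concurrently cannot race on the initialization.
#include <memory>
#include <mutex>
#include <vector>

class LocalUpdaterSketch {
 public:
  void Update(int param_id) {
    // Exactly one thread runs InitMutexes(); the others wait until it finishes.
    std::call_once(init_flag_, [this] { InitMutexes(); });
    std::mutex& mu = *mutexes_[param_id % mutexes_.size()];
    std::lock_guard<std::mutex> lock(mu);
    // ... apply the parameter update while holding the per-parameter lock ...
  }

 private:
  void InitMutexes() {
    for (int i = 0; i < 16; ++i)
      mutexes_.push_back(std::make_unique<std::mutex>());
  }
  std::once_flag init_flag_;
  std::vector<std::unique_ptr<std::mutex>> mutexes_;
};
```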
Implement AlexNet on ImageNet.
1. The model follows the ImageNet paper.
2. The data is created offline, including multiple training files,
  one test file and one mean file. All of them are in binary format.
  This part is implemented via the writer and encoder in SINGA.
3. Loading data in multiple threads. This part needs the reader,
  decoder and transformer in SINGA (see the sketch after this list).
4. This example needs OpenCV support.
5. snapshot, jpgencoder, timer, binfile_reader are slightly modified.
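
The real loading path goes through SINGA's reader, decoder and transformer; the sketch below only illustrates the producer/consumer pattern behind multi-threaded loading, with worker threads decoding binary records and handing samples to the training loop through a thread-safe queue. All names here are hypothetical.

```cpp
// Hypothetical sketch of the multi-threaded loading pattern (not SINGA's
// reader/decoder/transformer API): worker threads decode binary records and
// push samples into a thread-safe queue consumed by the training loop.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

struct Sample {
  std::vector<float> image;  // decoded (and possibly transformed) pixels
  int label = 0;
};

class SampleQueue {
 public:
  void Push(Sample s) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      q_.push(std::move(s));
    }
    cv_.notify_one();
  }
  Sample Pop() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return !q_.empty(); });
    Sample s = std::move(q_.front());
    q_.pop();
    return s;
  }

 private:
  std::queue<Sample> q_;
  std::mutex mu_;
  std::condition_variable cv_;
};
```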
Replace CudnnDropout with Dropout
Add argument "-h"
Merge the training of vgg and alexnet into train.py
The validation accuracy of vgg could reach 0.89
- cuDNN RNN implementation (cudnn_rnn.h, cudnn_rnn.cc, rnn.cc, rnn.h, test_cudnn_rnn.cc).
- The weight shapes are now calculated manually instead of using the API provided by cuDNN.
- Test for RNN_cudnn_Tanh (unidirectional, 1 hidden layer).
Finish the CudnnRNN layer.
Pass the test for the tanh RNN.

RNN forward accepts a vector of input tensors: <x0, x1, ..., x(n-1), hx, cx>.
x(i) is the i-th input tensor; hx is the initial hidden tensor, which could
be a dummy tensor. A dummy tensor is a tensor created without shape/device/data_type;
during computation, the cuDNN RNN uses 0s for this tensor. cx is not necessary
for relu/tanh/gru RNNs. For lstm, it could also be a dummy tensor like hx.
The output is: <y0, y1, ..., y(n-1), hy, cy>.
relu/tanh/gru RNNs do not have cy; lstm has both hy and cy.

RNN backward accepts a vector of input gradient tensors: <dy0, dy1, ..., dy(n-1), dhy, dcy>.
dhy is necessary for all RNNs, but it could be a dummy tensor, in which case
a tensor of 0s would be used for dhy during computation. dcy is used
only for lstm and could also be a dummy tensor.
The output is: <dw, <dx0, dx1, ..., dx(n-1), dhx, dcx>>,
where dhx is a tensor for the gradient of hx; dcx is only used for lstm.

The CudnnRNN layer must be moved onto a CUDA device; otherwise a memory error would occur (the weights would be on the CPU).
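
To make the ordering convention above concrete, here is a small sketch of packing the forward inputs; the Tensor placeholder and the helper function are assumptions for illustration, not the SINGA API.

```cpp
// Sketch of the input ordering described above (illustrative only): a
// default-constructed Tensor stands in for a "dummy" tensor, which the cuDNN
// RNN treats as all zeros during computation.
#include <vector>

struct Tensor {};  // placeholder for singa::Tensor

std::vector<Tensor> PackRnnForwardInputs(const std::vector<Tensor>& x_seq,
                                         const Tensor& hx = Tensor(),
                                         const Tensor& cx = Tensor()) {
  std::vector<Tensor> inputs = x_seq;  // <x0, x1, ..., x(n-1)>
  inputs.push_back(hx);                // initial hidden state (may be a dummy)
  inputs.push_back(cx);                // initial cell state, lstm only (may be a dummy)
  return inputs;
}

// Forward then returns <y0, ..., y(n-1), hy> (plus cy for lstm); Backward takes
// <dy0, ..., dy(n-1), dhy> (plus dcy for lstm) and returns
// <dw, <dx0, ..., dx(n-1), dhx>> (plus dcx for lstm).
```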
Add an example using the char-rnn model.
The trained model (with 2 stacked LSTM layers) over the Linux kernel source code
could generate source code with some meaningful patterns, e.g.,
indentation, comments, variable definitions, and assignments.
In this ticket, we implement a new communication framework for SINGA. We abstract each physical computing node as an endpoint, and add two interfaces, i.e., send and recv, to the endpoint so that users can directly call them to accomplish data transfer.
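
A rough sketch of what the endpoint abstraction could look like; the class names and method signatures below are assumptions, not the committed SINGA interface.

```cpp
// Rough sketch of the endpoint abstraction (assumed names and signatures, not
// the committed SINGA interface): one endpoint per physical node, with send
// and recv as the only two user-facing operations.
#include <string>
#include <utility>

class Message {
 public:
  explicit Message(std::string payload) : payload_(std::move(payload)) {}
  const std::string& payload() const { return payload_; }

 private:
  std::string payload_;
};

class Endpoint {
 public:
  virtual ~Endpoint() = default;
  // Send a message to the peer node this endpoint is connected to;
  // returns false if the transfer fails.
  virtual bool Send(const Message& msg) = 0;
  // Block until a message arrives from the peer node.
  virtual Message Recv() = 0;
};
```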
Add checks for the cuDNN version, which should be >= 5.05 to compile the CudnnRNN code.
Reformat code following google style.
Update cmake files to ignore communication code if ENABLE_DIST is off.
For most layers, we have multiple implementations, e.g., using
cuDNN for NVIDIA GPUs, C++ for CPUs, and OpenCL for other GPUs.

These layers are implemented in different classes and registered with different
identifiers. This ticket unifies the layer identifiers for each
engine:
1. cudnn layers are registered with identifier = cudnn_xxx, e.g.,
cudnn_convolution for the CudnnConvolution layer.
2. singa layers are registered with identifier = singa_xxx, e.g.,
singa_convolution for the Convolution layer.

The cudnn engine must run on CUDA devices, while the singa engine can run on a
cuda-gpu device or a cpp-cpu device depending on the layer type. For
instance, the Convolution layer must run on a cpp-cpu device, whereas the Dense
layer can run on both devices and selects the correct device automatically.
Users need to make sure the engine matches the device of the tensors.

Both the C++ and Python code are updated. Users have to compose the layer
identifier manually in the C++ version; in the Python version, users can set
layer.engine='cudnn' or 'singa'.

All identifiers are case-insensitive.
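
For illustration, a small helper that composes the engine-prefixed, case-insensitive identifier on the C++ side; the helper function and the factory call in the trailing comment are assumptions, not the exact registration API.

```cpp
// Illustrative helper (not the exact registration API): compose the
// engine-prefixed identifier, e.g. "cudnn_convolution" or "singa_dense".
// Identifiers are case-insensitive, so everything is lower-cased first.
#include <algorithm>
#include <cctype>
#include <string>

std::string LayerIdentifier(std::string engine, std::string type) {
  auto lower = [](std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
  };
  return lower(std::move(engine)) + "_" + lower(std::move(type));
}

// Hypothetical usage with a layer factory:
//   auto layer = CreateLayer(LayerIdentifier("cudnn", "convolution"));
```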