
Training Code - New Features #106

Closed
6 of 9 tasks
melgor opened this issue Mar 9, 2016 · 5 comments
melgor commented Mar 9, 2016

Adding new features to the training code, which will clean up the code, reduce memory consumption during training, and enable multi-GPU training.
Based on talk: https://groups.google.com/forum/?hl=en#!topic/cmu-openface/9Be_VMC659c

melgor commented Mar 11, 2016

I tried to add `cudnn.convert`. Unfortunately, there is a problem with GPU memory consumption, even though I have used it in my other projects and it works fine there.
I tested it and traced the problem to
model:float()
Without it, everything works fine...
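For context, the conversion in question looks roughly like this (a sketch; `createModel` is a hypothetical constructor standing in for the actual model-building code in train.lua):

```lua
require 'cudnn'

-- Build the network and move it to the GPU.
local model = createModel()   -- hypothetical; stands in for OpenFace's model definition
model = model:cuda()

-- Swap nn modules for their cudnn counterparts in place.
cudnn.convert(model, cudnn)

-- The problematic step: converting back to float (e.g. before torch.save)
-- appears to interact badly with GPU memory once cudnn.convert has been used.
model:float()
```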

bamos pushed a commit that referenced this issue Mar 16, 2016
Replace nn_to_cudnn by cudnn.convert #106
fmassa commented Mar 17, 2016

@melgor About the shareGradient part, I've started working on generic code that reuses the intermediate buffers of the network without having to define the sharing by hand: https://github.com/fmassa/optimize-net
There are unit tests for common architectures, and I'll soon finish a version that works for training as well (there is already an open PR for that which works; I just need to update the README and add some more tests).
I'll also soon add a more robust option for clearing the gradWeights in inference mode (the current simple approach loses the sharings if there are any), but that's not difficult to add.
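A minimal sketch of what using optnet instead of hand-written sharing could look like (assuming the library's `optimizeMemory` entry point; `createModel` and the input shape are placeholders):

```lua
local optnet = require 'optnet'

local model = createModel()              -- hypothetical model constructor
local input = torch.rand(1, 3, 96, 96)   -- a sample input of the network's expected shape

-- Analyze the graph with a sample input and share intermediate buffers
-- automatically; training mode also handles gradient buffers.
optnet.optimizeMemory(model, input, {mode = 'training'})
```

Note that at the time of the comment the training mode was still in an open PR, so the exact options may differ.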

melgor commented May 9, 2016

I was trying to add the multi-GPU code but I failed. I ran into a problem similar to the one with adding cuDNN: it is exactly about converting the model to float before saving.
https://github.com/cmusatyalab/openface/blob/master/training/train.lua#L91
Do we really need it? Everybody will surely use CUDA for training. It would be better to provide a script that converts the model to float.
But this introduces another problem: we would not be able to use any "clearState" solution, because it would break the memory optimization made by 'optnet'.

To sum up:

  1. It would be easier to save the CUDA model.
  2. The saved model will be bigger than it is now (as we save gradient matrices etc.).
  3. It will need an additional script for "clearState" and "float".
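The additional script in point 3 could be as small as this sketch (file names are hypothetical):

```lua
-- Post-training conversion: load the saved CUDA model, strip training
-- state, and write a compact float model for distribution.
require 'cutorch'
require 'cunn'

local model = torch.load('model_cuda.t7')   -- hypothetical path
model:clearState()    -- drop gradients and intermediate buffers
model:float()         -- convert tensors to CPU float
torch.save('model_float.t7', model)
```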

What do you think about it, @bamos?

bamos commented May 9, 2016

Hi @melgor - it seems reasonable to save the CUDA model to disk and then use a script that converts it to float and clears the state, if that makes adding these features easier.

I have had some issues with running out of disk space, so a larger model isn't ideal here. I wonder how much disk space compressing the model with something like https://groups.google.com/forum/#!topic/torch7/7hCpmhv20KY would save.
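Independent of any Torch-level support, the saved model file can simply be compressed on disk and decompressed before loading. A quick way to check the savings (the `model.t7` file here is a synthetic stand-in, so the ratio is illustrative only):

```shell
# Create a stand-in for a saved Torch model; in practice this file would
# come from torch.save, and real tensor data compresses far less well
# than the all-zero content used here.
head -c 1048576 /dev/zero > model.t7

# Compress to a sibling file; decompress with gunzip before torch.load.
gzip -9 -c model.t7 > model.t7.gz
ls -l model.t7 model.t7.gz
```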

-Brandon.

stale bot commented Nov 18, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 18, 2017
@stale stale bot closed this as completed Nov 25, 2017