[WIP] Out-of-core training on V3 #3762
This PR enables out-of-core training on Chainer v3. Out-of-core training makes it possible to train very large models whose memory usage exceeds GPU memory, by using CPU memory as a swap area. This PR is a successor of #2027 (v1 OoC), and its basic functions are ported from @anaruse's branch for v2 OoC. Credit for those functions should go to @anaruse; our contributions are as follows:
Please see the examples (examples/imagenet/train_imagenet_OOC_ibm.py and examples/imagenet/train_imagenet_data_parallel_OOC_ibm.py) for how to run this feature. As shown in the examples, wrapping the training code in `with chainer.out_of_core_mode():` enables the out-of-core functions. The command-line options `--ooc` and `--insize` are added in the examples. You don't have to modify model files.
I created another PR against CuPy v2 (Swap in/out between GPU and CPU memory #694), and this PR depends on it.
@imaihal Hi, thank you for sending this PR. How's the progress? It seems to be still a "WIP" PR, and it's based on Chainer v3, but the next major version of Chainer is v5. I think a lot of code in this PR can work in v5 as is, but some conflicts have already happened. Could you resolve those conflicts and let us know the status and plans to finish this PR?
@mitmul Sorry for the late reply... Thank you for your comment. We've implemented this PR on Chainer v4, but not on Chainer v5 yet. Unfortunately, as of now, we have no plans to do so. However, we are interested in updating this PR, so we would like to discuss the best way to merge it into the official code at some point.
Should I close this PR for now?