Consolidate train_net and test_net in memory #119
This will bring substantial improvements in both aspects. I will be able to run a large net on my modest GPU. @jeffdonahue, your SplitLayer #114 is wonderful! Do you have any suggestions about this issue?
Not setting test_net in the solver.prototxt removes the test_net_ initialization altogether, and setting test_interval to 0 effectively disables Solver::test() during training. But combining the nets is still very important when studying optimization algorithms such as adaptive learning rates #30 or accelerated momentum #53. Without test results at fixed intervals, it is impossible to compare the effects of different algorithms and parameters.
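To illustrate the point about test_interval, here is a minimal standalone sketch of a training loop that skips the test pass entirely when the interval is 0. The struct and function names are hypothetical stand-ins, not Caffe's actual Solver code:

```cpp
#include <cstdio>

// Hypothetical stand-in for the solver settings discussed above;
// not Caffe's real SolverParameter proto.
struct SolverSettings {
  int max_iter = 1000;
  int test_interval = 0;  // 0 disables the periodic test pass
};

void TrainStep(int iter) { /* one forward/backward/update step */ (void)iter; }
void TestPass(int iter) { std::printf("testing at iter %d\n", iter); }

int main() {
  SolverSettings s;
  s.test_interval = 0;  // as suggested above, effectively disables testing
  for (int iter = 1; iter <= s.max_iter; ++iter) {
    TrainStep(iter);
    // The test pass only runs when a positive interval is configured.
    if (s.test_interval > 0 && iter % s.test_interval == 0) {
      TestPass(iter);
    }
  }
  return 0;
}
```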
This would be a great pull request if done properly -- and it's far from trivial. None of the core Caffe developers are currently working on this, so we would certainly appreciate a contribution!
If #57 (Consolidate network definitions) is not solved first, the solution to this issue would involve merging the NetParameter of the train net and the test net, which is the reverse operation of @jeffdonahue's src/caffe/util/insert_splits.cpp. Once #57 is resolved, that merging step becomes unnecessary. To distinguish the layers that belong to only one of the nets, the LayerParameter proto needs at least one more field flagging which nets use the layer, which is exactly what #57 requires. In summary, a thorough solution should deal with #57 first.
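A minimal sketch of the per-layer phase flag idea, assuming a hypothetical field on each layer definition (these structs are stand-ins, not Caffe's NetParameter/LayerParameter protos): instantiating a net for one phase keeps only the layers flagged for it, so a single merged definition can yield both nets.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for a merged net definition. Each layer carries the
// set of phases ("train", "test") it participates in -- the extra field
// proposed above; an empty set means the layer is shared by every phase.
struct LayerDef {
  std::string name;
  std::set<std::string> phases;
};

struct NetDef {
  std::vector<LayerDef> layers;
};

// Keep only the layers that are flagged for the requested phase.
NetDef FilterByPhase(const NetDef& merged, const std::string& phase) {
  NetDef out;
  for (const LayerDef& layer : merged.layers) {
    if (layer.phases.empty() || layer.phases.count(phase) > 0) {
      out.layers.push_back(layer);
    }
  }
  return out;
}

int main() {
  NetDef merged;
  merged.layers = {{"train_data", {"train"}},
                   {"test_data", {"test"}},
                   {"conv1", {}},            // shared by both phases
                   {"accuracy", {"test"}}};
  NetDef train_net = FilterByPhase(merged, "train");
  NetDef test_net = FilterByPhase(merged, "test");
  return 0;
}
```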
@kloudkl I don't quite understand "which is the reverse operation of @jeffdonahue's src/caffe/util/insert_splits.cpp". |
I don't think this is a big issue. While it is tempting to save duplicated [...]

If memory is an issue, one can always do training only, and write a [...]

Yangqing
@Yangqing at this point using [...]
Why not just separate the data blobs from the layers so that the layers can accept data of any batch size (#166)? After all, we are all used to functions and methods that can process containers such as vector or map of arbitrary sizes.
That is possible and already supported by caffe, but keep in mind that [...]

Also, the data blobs are separated from the layers; they are managed by [...]

My argument is that, if your testing does not fit into memory, don't do [...]

Yangqing
Some layers fix the batch size num_ in their SetUp methods and iterate over num_ rather than over the actual batch sizes of the blobs passed to the Forward* and Backward* methods. If the actual batch size is smaller than num_, there will be out-of-bounds memory accesses and segmentation faults; if it is larger, some data points go unprocessed. The layers that preset the batch sizes they accept are ConvolutionLayer, LRNLayer, FlattenLayer, InnerProductLayer, and PaddingLayer (already killed in the dev branch). The other layers either perform element-wise operations or already permit flexible batch sizes.

As long as the batch sizes of the bottom and top arguments do not exceed the available memory, there is no need for them to equal a fixed batch size, and the batchsize field in the proto can be removed. To decide when memory needs to be allocated on the fly, we should check that the batch sizes of the top blobs are no less than those of the bottom blobs and allocate if necessary. The memory allocated for each layer is then just big enough to hold the largest batch that has run through it.

With regard to the concern about frequent memory deallocation and reallocation, we can permit the blobs to grow in batch size but not to shrink, so memory is allocated lazily and reused once allocated. If memory is scarce and needs to be reclaimed, shrinking can happen only after a relatively long inactive period. In the use case of merging train_net and test_net, the phase with the smaller batch size simply reuses the portion of the pre-allocated memory that it actually requires.
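A minimal sketch of the grow-only allocation policy described above, using a hypothetical blob class (not Caffe's caffe::Blob): Reshape records the new logical batch size but only reallocates when the requested element count exceeds the current capacity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data blob: memory is allocated lazily and only
// grows, implementing the "grow but never shrink" policy described above.
class GrowOnlyBlob {
 public:
  void Reshape(int num, int channels, int height, int width) {
    num_ = num;
    count_ = static_cast<std::size_t>(num) * channels * height * width;
    if (count_ > data_.size()) {
      data_.resize(count_);  // grow; smaller batches reuse this buffer
    }
  }

  int num() const { return num_; }               // current logical batch size
  std::size_t capacity() const { return data_.size(); }

 private:
  int num_ = 0;
  std::size_t count_ = 0;
  std::vector<float> data_;
};

int main() {
  GrowOnlyBlob blob;
  blob.Reshape(256, 3, 32, 32);  // train phase: large batch, allocates
  blob.Reshape(50, 3, 32, 32);   // test phase: smaller batch, reuses memory
  blob.Reshape(256, 3, 32, 32);  // back to train: no reallocation needed
  return 0;
}
```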
Thank you for your work on consolidating the weight blobs between the test and train nets. However, I noticed that caffe still allocates separate memory for both the train and test data blobs.
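For illustration, a hedged sketch of the situation described above, with hypothetical stand-in types rather than Caffe's real classes: the two nets can point at the same weight blobs, but each still owns its own activation/data blobs, which is where the duplicated memory remains.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins, not Caffe's Net/Blob classes.
struct Blob {
  std::vector<float> data;
};

struct Net {
  std::vector<std::shared_ptr<Blob>> params;  // weights, shareable between nets
  std::vector<Blob> activations;              // data blobs, allocated per net
};

int main() {
  Net train_net;
  train_net.params.push_back(std::make_shared<Blob>());
  train_net.activations.resize(5);

  Net test_net;
  // Sharing the weights is cheap: both nets hold pointers to the same blobs.
  test_net.params = train_net.params;
  // The activation/data blobs, however, are separate allocations per net --
  // the duplicated memory noted in the comment above.
  test_net.activations.resize(5);
  return 0;
}
```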
@shaibagon Did you find any solution for swapping memory when switching between the TRAIN and TEST phases?
@Seanberite I'm afraid not. |
Currently, train_net and test_net are constructed separately from two definition files. As pointed out in #57, the definition files can be consolidated so that a single definition file creates both the train_net and the test_net.
The consolidation can also happen at the memory level, namely using the same net for both training and testing. The layers' forward and backward functions can behave differently at run time according to the Phase parameter.
This will save the memory needed by the test_net, and also save time by avoiding the memory copy from the train_net to the test_net.
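A minimal sketch of phase-dependent forward behavior, assuming a dropout-like operation; the Phase enum and function names here are illustrative, not Caffe's actual Layer interface.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// One layer instance serves both phases; its forward pass checks the phase at
// run time, so a single net in memory can be used for training and testing.
enum class Phase { TRAIN, TEST };

// Dropout-like behavior: random masking during TRAIN, identity during TEST.
std::vector<float> DropoutForward(const std::vector<float>& bottom,
                                  Phase phase, float ratio = 0.5f) {
  std::vector<float> top(bottom.size());
  for (std::size_t i = 0; i < bottom.size(); ++i) {
    if (phase == Phase::TRAIN) {
      const bool keep = (std::rand() / static_cast<float>(RAND_MAX)) >= ratio;
      top[i] = keep ? bottom[i] / (1.0f - ratio) : 0.0f;
    } else {
      top[i] = bottom[i];  // TEST: pass activations through unchanged
    }
  }
  return top;
}

int main() {
  std::vector<float> activations = {0.5f, 1.0f, 2.0f};
  std::vector<float> train_out = DropoutForward(activations, Phase::TRAIN);
  std::vector<float> test_out = DropoutForward(activations, Phase::TEST);
  return 0;
}
```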