
Gradient accumulation of several samples #7964

Closed
edmBernard opened this issue Sep 20, 2017 · 7 comments

Comments

@edmBernard

I'm trying to compute a triplet ranking loss.
To achieve this, I need to accumulate gradients over several examples and update the weights with the resulting gradient.

In Chainer we can easily do that because the backward function accumulates gradients by default:

We need to clear the gradients first because the backward() method accumulates gradients instead of overwriting the previous values.

Is there a way to do this with MXNet Gluon and the autograd API?

It's similar to gradient accumulation inside a batch.
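
For reference, what I'd like to write looks roughly like the sketch below. The `grad_req='add'` setting and the manual `zero_grad()` calls are just my guess at how Gluon might expose Chainer-style accumulation, so this is a sketch rather than working code:

```python
# Sketch only: accumulate gradients over several forward/backward passes,
# then apply one update. Assumes setting grad_req='add' makes backward()
# sum gradients into param.grad() instead of overwriting them.
import mxnet as mx
from mxnet import autograd, gluon

net = gluon.nn.Dense(1)                                # placeholder model
net.initialize()
net.collect_params().setattr('grad_req', 'add')        # accumulate instead of overwrite
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
loss_fn = gluon.loss.L2Loss()                          # placeholder loss

accum_steps, batch_size = 4, 8
for _ in range(accum_steps):
    x = mx.nd.ones((batch_size, 16))                   # dummy data
    y = mx.nd.ones((batch_size, 1))
    with autograd.record():
        loss = loss_fn(net(x), y)
    loss.backward()                                    # gradients are summed across passes

trainer.step(batch_size * accum_steps)                 # single update with the accumulated gradient
for param in net.collect_params().values():
    param.zero_grad()                                  # clear before the next accumulation round
```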

@ZiyueHuang
Member

Do you mean collecting only some examples' gradients in a batch to update the weights? You can create a 0-1 mask and use it as the outermost out_grad to backward.
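
Roughly like this (an untested sketch with dummy data; the model and loss are placeholders):

```python
# Sketch: compute a per-sample loss for the whole batch, build a 0-1 mask
# that keeps only the samples we care about, and pass it as out_grad so the
# other samples contribute nothing to the gradients.
import mxnet as mx
from mxnet import autograd, gluon

net = gluon.nn.Dense(1)                          # placeholder model
net.initialize()
loss_fn = gluon.loss.L2Loss()                    # placeholder loss

batch_size, k = 10, 3
x = mx.nd.ones((batch_size, 16))                 # dummy batch
y = mx.nd.ones((batch_size, 1))

with autograd.record():
    per_sample_loss = loss_fn(net(x), y)         # shape (batch_size,)

topk_idx = per_sample_loss.topk(k=k)             # indices of the k largest losses
mask = mx.nd.one_hot(topk_idx, depth=batch_size).sum(axis=0)  # 0-1 mask over the batch
per_sample_loss.backward(out_grad=mask)          # masked-out samples get zero gradient
```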

@edmBernard
Author

A bit more description of the process (it's for an image-retrieval-like task):

  • compute the triplet loss on 10000 (query, relevant, non-relevant) image triplets
  • sort these triplets and keep the 100 with the highest loss
  • compute gradients on these 100 samples
  • aggregate the gradients
  • update the weights

I can't pack these 100 samples into one batch and run the learning process on it, because a batch of 100*3 images takes too much GPU memory.

@ZiyueHuang
Member

If you want to do this at the whole-dataset level, then you should run an extra forward pass over the whole dataset each time to get the indices of the top-100, then create a batch from them, run forward-backward, and update the weights.
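
Roughly like this (a sketch; `triplet_loss`, `iter_batches`, and `take_batch` are hypothetical helpers standing in for your data pipeline):

```python
# Sketch of the two-stage approach: a forward-only pass over the whole
# dataset to rank the triplets, then one forward-backward on the top-100.
import mxnet as mx
from mxnet import autograd

def select_hard_triplets(net, dataset, k=100, batch_size=256):
    """Forward-only pass to find the indices of the k hardest triplets."""
    losses = []
    for batch in iter_batches(dataset, batch_size):   # hypothetical batching helper
        losses.append(triplet_loss(net, batch))       # no autograd.record(): forward only
    losses = mx.nd.concat(*losses, dim=0)
    return losses.topk(k=k)                           # indices of the k largest losses

def train_step(net, trainer, dataset, k=100):
    hard_idx = select_hard_triplets(net, dataset, k=k)
    hard_batch = take_batch(dataset, hard_idx)        # hypothetical gather helper
    with autograd.record():
        loss = triplet_loss(net, hard_batch)
    loss.backward()
    trainer.step(k)
```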

@edmBernard
Author

I tried, but I can't run forward-backward on a batch of 100+ images; it consumes too much memory and crashes.
That's why I'm asking whether there is a way to accumulate gradients outside of a single batch.

@ZiyueHuang
Member

I think you can hack update in the optimizer and store the gradients into its states across batches. For example, to accumulate over 10 batches: in the first 9 batches only aggregate the gradients into the states, and in the 10th batch aggregate the gradients and then update the weights.
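
Something along these lines (just a sketch, untested; I'm subclassing SGD and adding an extra state buffer, and the counter and averaging details are up to you):

```python
# Sketch: an optimizer that sums gradients into its state and only touches
# the weights every `accum` calls to update().
import mxnet as mx

class AccumSGD(mx.optimizer.SGD):
    def __init__(self, accum=10, **kwargs):
        super(AccumSGD, self).__init__(**kwargs)
        self.accum = accum
        self._counts = {}                               # per-parameter call counter

    def create_state(self, index, weight):
        # keep the parent state (momentum, ...) plus a buffer for summed gradients
        return (super(AccumSGD, self).create_state(index, weight),
                mx.nd.zeros_like(weight))

    def update(self, index, weight, grad, state):
        parent_state, grad_buf = state
        grad_buf += grad                                # accumulate this batch's gradient
        self._counts[index] = self._counts.get(index, 0) + 1
        if self._counts[index] % self.accum == 0:
            # every `accum` batches, apply the averaged gradient and reset the buffer
            super(AccumSGD, self).update(index, weight, grad_buf / self.accum, parent_state)
            grad_buf[:] = 0
```

You should then be able to pass an instance of it to gluon.Trainer in place of the 'sgd' string.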

@edmBernard
Author

I was looking for a more standard way :(
I will try hacking the optimizer, thanks.

@ZiyueHuang
Member

ZiyueHuang commented Sep 23, 2017 via email
