
Deform Pool: Multiple Forward-Backward Gives Buffer / Retain_graph Related Bugs #3

Open
chengdazhi opened this issue Dec 22, 2018 · 4 comments


@chengdazhi

Hi,

When I run deformable pooling (modulated or not) two times (forward -> backward -> forward -> backward), the second backward raises the following error:

Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

This doesn't happen with the deform convs, so it seems specific to pooling. What is your opinion?
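
Roughly, my setup looks like the sketch below. The DCNPooling arguments mirror your test.py; the import path, the rois layout (batch_index, x1, y1, x2, y2) and the tensor shapes are assumptions for illustration only:

    import torch
    from dcn_v2 import DCNPooling  # import path assumed; adjust to this repo's layout

    # assumed shapes: 2 images, 32 channels, 64x64 feature map (stride-4 features)
    input = torch.randn(2, 32, 64, 64, requires_grad=True).cuda()

    # rois assumed as (batch_index, x1, y1, x2, y2) in input-image coordinates
    rois = torch.tensor([[0., 0., 0., 60., 60.],
                         [1., 32., 32., 124., 124.]]).cuda()

    dpooling = DCNPooling(spatial_scale=1.0 / 4,
                          pooled_size=7,
                          output_dim=32,
                          no_trans=False,
                          group_size=1,
                          trans_std=0.1,
                          deform_fc_dim=1024).cuda()

    out = dpooling(input, rois)
    out.mean().backward()        # first forward -> backward: fine

    out = dpooling(input, rois)  # second forward builds a fresh graph
    out.mean().backward()        # second backward raises the retain_graph error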

@CharlesShang
Owner

Hi Dazhi,
Could you check whether you are on the latest version?
I just ran the code below, and it works fine.

    # modulated deformable pooling (V2)
    dpooling = DCNPooling(spatial_scale=1.0 / 4,
                          pooled_size=7,
                          output_dim=32,
                          no_trans=False,
                          group_size=1,
                          trans_std=0.1,
                          deform_fc_dim=1024).cuda()

    # run forward -> backward repeatedly (input and rois are defined earlier in the test script)
    for _ in range(10):
        dout = dpooling(input, rois)
        target = dout.new(*dout.size())
        target.data.uniform_(-0.1, 0.1)
        error = (target - dout).mean()
        error.backward()

@chengdazhi
Author

Hi Charles, thanks for your reply. I have to use your implementation for PyTorch 0.4, and the problem is also triggered on PyTorch 0.4.1.

I came up with an ugly but workable solution: replace Function.save_for_backward and Function.saved_tensors with storing the tensors directly as members of the Function object. I have checked that this workaround produces correct results, but I'm not sure whether it would cause other problems.
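
For concreteness, the change looks roughly like this on a generic old-style (PyTorch 0.4) autograd Function; the class names and the placeholder forward body are illustrative, not the actual pooling code:

    import torch
    from torch.autograd import Function

    class PoolWithSaveForBackward(Function):
        # original pattern: tensors go through autograd's save_for_backward
        def forward(self, input, rois):
            output = input.clone()               # placeholder for the real CUDA kernel
            self.save_for_backward(input, rois)  # buffers are freed after the first backward
            return output

        def backward(self, grad_output):
            input, rois = self.saved_tensors     # this is the line that raises for me
            return grad_output, None

    class PoolWithPlainAttributes(Function):
        # workaround: keep the tensors as ordinary attributes on the Function object
        def forward(self, input, rois):
            output = input.clone()               # placeholder for the real CUDA kernel
            self.input, self.rois = input, rois
            return output

        def backward(self, grad_output):
            input, rois = self.input, self.rois  # plain attributes are not freed by autograd
            return grad_output, None

The attribute version bypasses autograd's bookkeeping for saved buffers, which is presumably why it silences the error, but it also keeps the tensors alive for as long as the Function object lives.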

@CharlesShang
Owner

I don't think save_for_backward is the reason for the "buffers have already been freed" error.
I think you might be calling backward twice, like forward -> backward -> backward.
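
For comparison, here is a tiny example, unrelated to DCN, that produces the same message when backward is called twice on the same graph:

    import torch

    x = torch.randn(4, requires_grad=True)
    y = (x * x).sum()

    y.backward()   # first backward frees the graph buffers
    y.backward()   # RuntimeError: Trying to backward through the graph a second time, ...
                   # unless the first call was y.backward(retain_graph=True)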

@chengdazhi
Copy link
Author

I am pretty positive that I didn't call backward twice in a row. If you are not convinced, it should be very easy for you to switch to the PyTorch 0.4 implementation and add a loop to example_mdpooling() in test.py. The error is triggered when calling _, _, ..., _ = self.saved_tensors, and it seems related to your CUDA code. I'm not familiar with PyTorch internals, and I failed to pinpoint the root cause.
