dcn_v2 error RuntimeError: expected scalar type Float but found Half #1

KingWangJL · 2022-02-24T12:46:07Z

When running the network, I encountered this error. Through debugging, I found that the offset in DCN's forward function is of type float16, so I think this might be the cause. Do you have a better idea for solving this problem?

KingWangJL · 2022-02-24T12:46:39Z

class DCN(DCNv2):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, dilation=1, deformable_groups=1, extra_offset_mask=False):
        super(DCN, self).__init__(in_channels, out_channels, kernel_size, stride, padding, dilation, deformable_groups)

        self.extra_offset_mask = extra_offset_mask
        channels_ = self.deformable_groups * 3 * self.kernel_size[0] * self.kernel_size[1]
        self.conv_offset_mask = nn.Conv2d(self.in_channels, channels_, kernel_size=self.kernel_size, stride=self.stride, padding=self.padding, bias=True)
        self.init_offset()

    def init_offset(self):
        self.conv_offset_mask.weight.data.zero_()
        self.conv_offset_mask.bias.data.zero_()

    def forward(self, input, main_path=None):
        if self.extra_offset_mask:
            out = self.conv_offset_mask(input[1])
            input = input[0]
        else:
            out = self.conv_offset_mask(input)
        o1, o2, mask = torch.chunk(out, 3, dim=1)  # each has self.deformable_groups * self.kernel_size[0] * self.kernel_size[1] channels
        offset = torch.cat((o1, o2), dim=1)  # x, y [0-8]: the first group,

ShihuaHuang95 · 2022-02-25T06:07:56Z

@KingWangJL Many thanks for your interest in our work. We also ran into this problem when training our models with Apex mixed precision. We have not yet found a good solution, so for now we simply train the model in full precision.

KingWangJL · 2022-02-26T02:32:07Z

Thanks for your reply. I directly modified the dcn_v2 source code, and model training now runs normally. However, I don't think this is a good approach; it merely gets the code to run.

LeoniusChen · 2022-05-13T08:07:02Z

I have successfully trained the model using apex.amp and got comparable results. You can add the @amp.float_function decorator on top of the forward and backward functions of the modules in DCNv2. For reference, see CharlesShang/DCNv2#50.
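To illustrate the idea behind that workaround, here is a minimal sketch in plain PyTorch of casting float16 arguments to float32 before they reach a float-only op. It is not apex's implementation, and the names `cast_half_to_float` and `float_only_op` are hypothetical stand-ins (the latter only imitates a kernel that rejects Half inputs, as the DCNv2 op appears to do here).

```python
import functools

import torch


def cast_half_to_float(fn):
    """Hypothetical helper: promote float16 tensor arguments to float32 before calling fn."""

    def _promote(x):
        # Only float16 tensors are converted; everything else passes through unchanged.
        if isinstance(x, torch.Tensor) and x.dtype == torch.float16:
            return x.float()
        return x

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        args = tuple(_promote(a) for a in args)
        kwargs = {k: _promote(v) for k, v in kwargs.items()}
        return fn(*args, **kwargs)

    return wrapper


@cast_half_to_float
def float_only_op(input, offset, mask):
    # Stand-in for a float32-only kernel (assumption, not the real DCNv2 op).
    for t in (input, offset, mask):
        if t.dtype != torch.float32:
            raise RuntimeError("expected scalar type Float but found Half")
    return input + offset.sum() + mask.sum()


# Mixed-precision style inputs: some tensors arrive as float16.
x = torch.zeros(1, 2, 4, 4, dtype=torch.float16)
off = torch.zeros(1, 18, 4, 4, dtype=torch.float16)
m = torch.ones(1, 9, 4, 4, dtype=torch.float32)
out = float_only_op(x, off, m)  # runs in float32 instead of raising
```

Without the decorator, the same call would raise the RuntimeError from the issue title. The trade-off is that the wrapped op always runs in float32, so it gains no speed or memory benefit from mixed precision.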

ShihuaHuang95 · 2022-05-13T14:31:42Z

@LeoniusChen Cool! Thanks for sharing!

ShihuaHuang95 · 2022-05-13T14:33:07Z

@LeoniusChen By the way, could you please share the final results when apex is used?

LeoniusChen · 2022-05-14T04:13:09Z

I only used apex.amp to test the Cityscapes Semantic Segmentation (PointRend + FaPN R50) task. Here is the result.

ShihuaHuang95 · 2022-05-14T08:46:48Z

Noted. Thanks again for your interest in our work. By the way, this result is not as good as the one reported in our paper.
