dcn_v2 error RuntimeError: expected scalar type Float but found Half #1

Open
KingWangJL opened this issue Feb 24, 2022 · 8 comments

Comments

@KingWangJL

When running the network, I encountered this error. While debugging, I found that the offset in DCN's forward function is float16, so I think this might be the cause. Do you have a better idea of how to solve it?

@KingWangJL
Author

class DCN(DCNv2):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation=1, deformable_groups=1, extra_offset_mask=False):
        super(DCN, self).__init__(in_channels, out_channels, kernel_size, stride, padding, dilation, deformable_groups)

        self.extra_offset_mask = extra_offset_mask
        channels_ = self.deformable_groups * 3 * self.kernel_size[0] * self.kernel_size[1]
        self.conv_offset_mask = nn.Conv2d(self.in_channels, channels_, kernel_size=self.kernel_size, stride=self.stride, padding=self.padding, bias=True)
        self.init_offset()

    def init_offset(self):
        # Start from zero offsets and masks.
        self.conv_offset_mask.weight.data.zero_()
        self.conv_offset_mask.bias.data.zero_()

    def forward(self, input, main_path=None):
        if self.extra_offset_mask:
            # input is a pair: (features to convolve, features that predict offsets/masks)
            out = self.conv_offset_mask(input[1])
            input = input[0]
        else:
            out = self.conv_offset_mask(input)
        o1, o2, mask = torch.chunk(out, 3, dim=1)       # each has self.deformable_groups * self.kernel_size[0] * self.kernel_size[1] channels
        offset = torch.cat((o1, o2), dim=1)  # x, y [0-8]: the first group,

@ShihuaHuang95
Owner

@KingWangJL Many thanks for your interest in our work. We ran into the same problem when training our models with Apex mixed precision, and we have not yet found a good solution. For now, we just train the model in full precision.

@KingWangJL
Author

KingWangJL commented Feb 26, 2022 via email

@LeoniusChen

I have successfully trained the model using apex.amp and got comparable results. You can add @amp.float_function on top of the forward and backward functions of the modules in DCNv2. You may also want to refer to CharlesShang/DCNv2#50.
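To illustrate the idea behind this suggestion, here is a minimal sketch that does not require Apex. It is not the implementation of `amp.float_function`; it only mimics the behaviour that matters here, namely casting floating-point tensor arguments to float32 before the wrapped function runs. The names `force_float32` and `fake_dcn_kernel` are hypothetical and exist only for this example; the real DCNv2 op is a compiled extension and is not called.

```python
import functools

import torch


def force_float32(fn):
    """Hypothetical helper: call `fn` with floating-point tensor args cast to float32."""

    def _cast(value):
        # Only floating-point tensors are cast; ints, bools and non-tensors pass through.
        if torch.is_tensor(value) and value.is_floating_point():
            return value.float()
        return value

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        args = tuple(_cast(a) for a in args)
        kwargs = {k: _cast(v) for k, v in kwargs.items()}
        return fn(*args, **kwargs)

    return wrapper


@force_float32
def fake_dcn_kernel(input, offset, mask):
    """Stand-in for a float32-only op (assumption: the real kernel rejects Half)."""
    for name, t in (("input", input), ("offset", offset), ("mask", mask)):
        if t.dtype != torch.float32:
            raise RuntimeError(f"expected scalar type Float but found {t.dtype} ({name})")
    # Dummy arithmetic so the function returns a tensor shaped like `input`.
    return input * mask.mean() + offset.mean()


# Half-precision inputs, as they would arrive from a mixed-precision conv layer.
x = torch.randn(1, 4, 8, 8).half()
offset = torch.zeros(1, 18, 8, 8).half()
mask = torch.ones(1, 9, 8, 8).half()

out = fake_dcn_kernel(x, offset, mask)
print(out.dtype)  # torch.float32
```

With Apex, the decorator itself would be `@amp.float_function`, as described above. With PyTorch's native AMP, a comparable option is probably `torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)` on the `forward` of the custom `autograd.Function`, paired with `custom_bwd` on `backward`; I have not verified that against this repository.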

@ShihuaHuang95
Owner

@LeoniusChen Cool! Thanks for sharing!

@ShihuaHuang95
Owner

@LeoniusChen By the way, could you please share the final results when apex is used?

@LeoniusChen

LeoniusChen commented May 14, 2022

I have only tried apex.amp on the Cityscapes semantic segmentation task (PointRend + FaPN R50). Here is the result.
[Image: screenshot of the results; contents not recoverable]

@ShihuaHuang95
Owner

Noted. Thanks again for your interest in our work. By the way, this result is lower than the one reported in our paper.
