Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[BUGFIX] Add check to make sure num_group is non-zero #20186

Merged
merged 2 commits into from Apr 19, 2021

Conversation

Adnios
Copy link
Contributor

@Adnios Adnios commented Apr 17, 2021

Description

Related issue: #19921

When group=0, the mxnet.gluon.nn.Conv1D(Conv2D, Conv3D) will cause crashes(floating point exception)

11396 Floating point exception(core dumped) python test-mxnet.py

MxNet doesn't seem to have checked whether group=0. This pr adds check for group not equal zero in src/operator/nn/convolution.cc. Now the error message will be like this.

Traceback (most recent call last):
  File "test-mxnet.py", line 14, in <module>
    nn.Conv3D(channels=1, kernel_size=1, groups=0)
  File "/mxnet/python/mxnet/gluon/nn/conv_layers.py", line 401, in __init__
    in_channels, activation, use_bias, weight_initializer, bias_initializer, **kwargs)
  File "/mxnet/python/mxnet/gluon/nn/conv_layers.py", line 115, in __init__
    wshapes = _infer_weight_shape(op_name, dshape, self._kwargs)
  File "/mxnet/python/mxnet/gluon/nn/conv_layers.py", line 38, in _infer_weight_shape
    return sym.infer_shape_partial()[0]
  File "/mxnetpython/mxnet/symbol/symbol.py", line 1068, in infer_shape_partial
    return self._infer_shape_impl(True, *args, **kwargs)
  File "/mxnet/python/mxnet/symbol/symbol.py", line 1126, in _infer_shape_impl
    ctypes.byref(complete)))
  File "/PARA/blsc365/wangjian/mxnet-1.4.1-sleep/python/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator conv1_convolution0: [17:48:51] src/operator/nn/convolution.cc:215: Check failed: param_.num_group != 0U (0 vs. 0) num_group must be nonzero

We can use the following code to test.

import mxnet
mxnet.gluon.nn.Conv1D(channels=1, kernel_size=1, groups=0)
import mxnet
mxnet.gluon.nn.Conv2D(channels=1, kernel_size=1, groups=0)
import mxnet
mxnet.gluon.nn.Conv3D(channels=1, kernel_size=1, groups=0)

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mxnet-bot
Copy link

Hey @Adnios , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, edge, unix-gpu, website, centos-gpu, centos-cpu, windows-gpu, unix-cpu, windows-cpu, miscellaneous, sanity]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@Adnios Adnios changed the title [BUGFIX] Add check for group not equal zero [BUGFIX] Add check whether num_group equal to zero Apr 17, 2021
@Adnios Adnios changed the title [BUGFIX] Add check whether num_group equal to zero [BUGFIX] Add check to make sure num_group is non-zero Apr 18, 2021
@Adnios
Copy link
Contributor Author

Adnios commented Apr 18, 2021

@mxnet-bot run ci [all]

@Adnios
Copy link
Contributor Author

Adnios commented Apr 18, 2021

@mxnet-bot run ci [windows-cpu]

1 similar comment
@szha
Copy link
Member

szha commented Apr 19, 2021

@mxnet-bot run ci [windows-cpu]

Copy link
Member

@szha szha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix

src/operator/nn/convolution.cc Outdated Show resolved Hide resolved
Copy link
Member

@szha szha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Adnios
Copy link
Contributor Author

Adnios commented Apr 19, 2021

@mxnet-bot run ci [windows-gpu]

@szha szha merged commit 5da68f7 into apache:master Apr 19, 2021
@szha
Copy link
Member

szha commented Apr 19, 2021

@Adnios thank you for the fix

@Adnios Adnios deleted the group_ne_zero branch April 21, 2021 01:49
chinakook pushed a commit to chinakook/mxnet that referenced this pull request Aug 4, 2021
* add check for group not equal zero

* num_group in convolution must be positive
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants