
everything is good but segmentation fault after program exit #19

Closed
huangenyan opened this issue Aug 28, 2019 · 4 comments

Comments

@huangenyan

I successfully ran the demo, but when the program exits I get:
[1] 12790 segmentation fault (core dumped) ./mask-rcnn_demo ~/mask_rcnn_coco.dat test.jpg

I think it is caused by a problem during resource release or destruction.

Although it is not a big issue, it is a little annoying, and hopefully it can be fixed.

@huangenyan
Author

More on this: the problem only happens on the GPU. If I use the CPU (set gpu_count = 0 in the config), the problem is gone.
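
For reference, the device switch in the libtorch C++ frontend looks roughly like this (a sketch of my understanding, not the project's actual config handling):

#include <torch/torch.h>

#include <iostream>

int main() {
  // Run on CUDA when it is available, otherwise fall back to the CPU
  // (which is what gpu_count = 0 effectively selects, and that path
  // exits cleanly for me).
  torch::Device device = torch::cuda::is_available()
                             ? torch::Device(torch::kCUDA)
                             : torch::Device(torch::kCPU);

  auto x = torch::ones({1, 3, 1024, 1024}).to(device);
  std::cout << "running on " << device << std::endl;
  return 0;
}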

@Kolkir
Owner

Kolkir commented Aug 28, 2019

@huangenyan Hello, could you please provide more details: the system you use, the image you used for evaluation, the type of your GPU, whether a dump was generated, and what type of build (with optimizations or not) you used?
I don't have such a problem in my environment.

@huangenyan
Author

I just forgot to mention that what I use is mask_rcnn_pytorch.
I spent some time on this, and here is some information you may find helpful:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ nvidia-smi
Thu Aug 29 15:51:46 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
| 41%   41C    P2    56W / 260W |   6585MiB / 11016MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1223      G   /usr/lib/xorg/Xorg                           188MiB |
|    0      1464      G   /usr/bin/gnome-shell                         113MiB |
|    0     11545      G   ...quest-channel-token=5961631727014844578   235MiB |
|    0     15818      C   /usr/bin/valgrind.bin                       5890MiB |
|    0     31699      G   ...uest-channel-token=15778414260646414614   153MiB |
+-----------------------------------------------------------------------------+

The GPU is an RTX 2080 Ti.

$ valgrind -v ./mask-rcnn_demo
...
==15818== 1 errors in context 1 of 1:
==15818== Invalid read of size 4
==15818==    at 0x44BC09FE: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==15818==    by 0x44BC596A: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==15818==    by 0x44BDABE1: cudaDeviceSynchronize (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==15818==    by 0x14E26393: cudnnDestroy (in /usr/local/lib/libcaffe2_gpu.so)
==15818==    by 0x109A4CF0: std::unordered_map<int, at::native::(anonymous namespace)::Handle, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, at::native::(anonymous namespace)::Handle> > >::~unordered_map() (in /usr/local/lib/libcaffe2_gpu.so)
==15818==    by 0x447F5614: __cxa_finalize (cxa_finalize.c:83)
==15818==    by 0x107B2FB2: ??? (in /usr/local/lib/libcaffe2_gpu.so)
==15818==    by 0x4010B72: _dl_fini (dl-fini.c:138)
==15818==    by 0x447F5040: __run_exit_handlers (exit.c:108)
==15818==    by 0x447F5139: exit (exit.c:139)
==15818==    by 0x447D3B9D: (below main) (libc-start.c:344)
==15818==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==15818== 
--15818-- 
--15818-- used_suppression:  98231 zlib-1.2.x trickyness (1b): See http://www.zlib.net/zlib_faq.html#faq36 /usr/lib/valgrind/default.supp:516
==15818== 
==15818== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 98231 from 1)
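
Reading the trace, cudnnDestroy is called from the destructor of a static std::unordered_map inside libcaffe2_gpu.so during __cxa_finalize, i.e. after main has already returned. As a hypothetical illustration (not the library's actual code), the general pattern is a static-lifetime object whose destructor calls back into the CUDA runtime; whether that crashes depends on how much of the runtime's own exit handling has already run:

#include <cuda_runtime.h>

#include <cstdio>

// Hypothetical sketch of the pattern shown in the valgrind trace: a
// static-lifetime object whose destructor calls into the CUDA runtime.
// The destructor runs from __cxa_finalize after main returns; if the CUDA
// runtime has already shut down by then, the call can fault.
struct CudaHandleCache {
  ~CudaHandleCache() {
    cudaError_t err = cudaDeviceSynchronize();  // runs during process exit
    std::printf("destructor sync: %s\n", cudaGetErrorString(err));
  }
};

static CudaHandleCache g_cache;  // destroyed by an exit handler

int main() {
  cudaFree(nullptr);  // force lazy initialization of the CUDA runtime
  return 0;
}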

I also tested which statement causes the segmentation fault by adding exit(0) at different locations in the program, and found that the problem occurs in fpn.cpp:

std::tuple<torch::Tensor,
           torch::Tensor,
           torch::Tensor,
           torch::Tensor,
           torch::Tensor>
FPNImpl::forward(at::Tensor x) {
  // no segmentation fault if exit(0) is added here
  x = c1_->forward(x);
  // segmentation fault if exit(0) is added here
  x = c2_->forward(x);
  auto c2_out = x;
  x = c3_->forward(x);
  auto c3_out = x;
  x = c4_->forward(x);
  auto c4_out = x;
  x = c5_->forward(x);
  auto p5_out = p5_conv1_->forward(x);
  auto p4_out =
      p4_conv1_->forward(c4_out) + upsample(p5_out, /*scale_factor*/ 2);
  auto p3_out =
      p3_conv1_->forward(c3_out) + upsample(p4_out, /*scale_factor*/ 2);
  auto p2_out =
      p2_conv1_->forward(c2_out) + upsample(p3_out, /*scale_factor*/ 2);

  p5_out = p5_conv2_->forward(p5_out);
  p4_out = p4_conv2_->forward(p4_out);
  p3_out = p3_conv2_->forward(p3_out);
  p2_out = p2_conv2_->forward(p2_out);

  // P6 is used for the 5th anchor scale in RPN. Generated by subsampling from
  // P5 with stride of 2.
  auto p6_out = p6_->forward(p5_out);

  return {p2_out, p3_out, p4_out, p5_out, p6_out};
}

I'm still working on this and hope to provide more information.

@huangenyan
Author

I created a minimal source file which reproduces the error:

#include <torch/torch.h>

#include <iostream>
#include <memory>

int main(int argc, char** argv) {
  // Move a dummy input to the GPU.
  auto input = torch::ones({1, 3, 1024, 1024});
  input = input.to(torch::DeviceType::CUDA);

  // A single Conv2d on the GPU; the forward pass itself succeeds.
  auto c2 = torch::nn::Conv2d(torch::nn::Conv2dOptions(3, 64, 7).stride(2).padding(3));
  c2->to(torch::DeviceType::CUDA);
  c2->forward(input);

  // The segmentation fault happens after this return, during process exit.
  return 0;
}

The example has nothing to do with your code, so I think it is a bug in the PyTorch C++ frontend, and I'll report an issue there.
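
For what it's worth, here is a sketch of an experiment one could try (assuming CUDA is available; it is not a fix): scope the tensor and the module so they are destroyed before main returns, then synchronize the device. If the segmentation fault still shows up, the crash happens purely in the libraries' exit handlers, not in anything user code still owns:

#include <torch/torch.h>

#include <cuda_runtime.h>

int main() {
  {
    // Same workload as the repro above, but scoped so the tensor and the
    // module are released before main returns.
    auto input = torch::ones({1, 3, 1024, 1024}).to(torch::DeviceType::CUDA);
    auto conv = torch::nn::Conv2d(
        torch::nn::Conv2dOptions(3, 64, 7).stride(2).padding(3));
    conv->to(torch::DeviceType::CUDA);
    conv->forward(input);
  }  // tensor and module destroyed here

  cudaDeviceSynchronize();  // wait for any outstanding GPU work
  return 0;                 // any remaining crash is in exit handlers
}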

Thanks!
