
everything is good but segmentation fault after program exit #19

Closed
huangenyan opened this issue Aug 28, 2019 · 4 comments

Comments

@huangenyan

I successfully ran the demo, but when the program exits I get:
[1] 12790 segmentation fault (core dumped) ./mask-rcnn_demo ~/mask_rcnn_coco.dat test.jpg

I think it is caused by a problem during resource release or destruction.

Although it is not a big issue, it is a little annoying, and hopefully it can be fixed.

@huangenyan
Author

More on this: the problem only happens on the GPU. If I use the CPU (set gpu_count = 0 in the config), the problem is gone.
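
For reference, the device switch in the libtorch C++ frontend looks roughly like this (a sketch of my understanding, not the project's actual config handling):

#include <torch/torch.h>

#include <iostream>

int main() {
  // Run on CUDA when it is available, otherwise fall back to the CPU
  // (which is what gpu_count = 0 effectively selects, and that path
  // exits cleanly for me).
  torch::Device device = torch::cuda::is_available()
                             ? torch::Device(torch::kCUDA)
                             : torch::Device(torch::kCPU);

  auto x = torch::ones({1, 3, 1024, 1024}).to(device);
  std::cout << "running on " << device << std::endl;
  return 0;
}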

@Kolkir
Owner

Kolkir commented Aug 28, 2019

@huangenyan Hello, could you please provide more details: the system you use, the image you used for evaluation, the type of your GPU, whether a dump was generated, and what type of build (with optimizations or not) you used?
I don't have such a problem in my environment.

@huangenyan
Author

I just forgot to mention that what I use is mask_rcnn_pytorch.
I spent some time on this, and here is some information you may find helpful:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ nvidia-smi
Thu Aug 29 15:51:46 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
| 41%   41C    P2    56W / 260W |   6585MiB / 11016MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1223      G   /usr/lib/xorg/Xorg                           188MiB |
|    0      1464      G   /usr/bin/gnome-shell                         113MiB |
|    0     11545      G   ...quest-channel-token=5961631727014844578   235MiB |
|    0     15818      C   /usr/bin/valgrind.bin                       5890MiB |
|    0     31699      G   ...uest-channel-token=15778414260646414614   153MiB |
+-----------------------------------------------------------------------------+

The GPU is an RTX 2080 Ti.

$ valgrind -v ./mask-rcnn_demo
...
==15818== 1 errors in context 1 of 1:
==15818== Invalid read of size 4
==15818==    at 0x44BC09FE: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==15818==    by 0x44BC596A: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==15818==    by 0x44BDABE1: cudaDeviceSynchronize (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==15818==    by 0x14E26393: cudnnDestroy (in /usr/local/lib/libcaffe2_gpu.so)
==15818==    by 0x109A4CF0: std::unordered_map<int, at::native::(anonymous namespace)::Handle, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, at::native::(anonymous namespace)::Handle> > >::~unordered_map() (in /usr/local/lib/libcaffe2_gpu.so)
==15818==    by 0x447F5614: __cxa_finalize (cxa_finalize.c:83)
==15818==    by 0x107B2FB2: ??? (in /usr/local/lib/libcaffe2_gpu.so)
==15818==    by 0x4010B72: _dl_fini (dl-fini.c:138)
==15818==    by 0x447F5040: __run_exit_handlers (exit.c:108)
==15818==    by 0x447F5139: exit (exit.c:139)
==15818==    by 0x447D3B9D: (below main) (libc-start.c:344)
==15818==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==15818== 
--15818-- 
--15818-- used_suppression:  98231 zlib-1.2.x trickyness (1b): See http://www.zlib.net/zlib_faq.html#faq36 /usr/lib/valgrind/default.supp:516
==15818== 
==15818== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 98231 from 1)
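
Reading the trace, cudnnDestroy is called from the destructor of a static std::unordered_map inside libcaffe2_gpu.so during __cxa_finalize, i.e. after main has already returned. As a hypothetical illustration (not the library's actual code), the general pattern is a static-lifetime object whose destructor calls back into the CUDA runtime; whether that crashes depends on how much of the runtime's own exit handling has already run:

#include <cuda_runtime.h>

#include <cstdio>

// Hypothetical sketch of the pattern shown in the valgrind trace: a
// static-lifetime object whose destructor calls into the CUDA runtime.
// The destructor runs from __cxa_finalize after main returns; if the CUDA
// runtime has already shut down by then, the call can fault.
struct CudaHandleCache {
  ~CudaHandleCache() {
    cudaError_t err = cudaDeviceSynchronize();  // runs during process exit
    std::printf("destructor sync: %s\n", cudaGetErrorString(err));
  }
};

static CudaHandleCache g_cache;  // destroyed by an exit handler

int main() {
  cudaFree(nullptr);  // force lazy initialization of the CUDA runtime
  return 0;
}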

I also tested which statement causes the segmentation fault by adding exit(0) at different locations in the program, and found that the problem occurs in fpn.cpp:

std::tuple<torch::Tensor,
           torch::Tensor,
           torch::Tensor,
           torch::Tensor,
           torch::Tensor>
FPNImpl::forward(at::Tensor x) {
  // no segmentation fault if exit(0) is added here
  x = c1_->forward(x);
  // segmentation fault if exit(0) is added here
  x = c2_->forward(x);
  auto c2_out = x;
  x = c3_->forward(x);
  auto c3_out = x;
  x = c4_->forward(x);
  auto c4_out = x;
  x = c5_->forward(x);
  auto p5_out = p5_conv1_->forward(x);
  auto p4_out =
      p4_conv1_->forward(c4_out) + upsample(p5_out, /*scale_factor*/ 2);
  auto p3_out =
      p3_conv1_->forward(c3_out) + upsample(p4_out, /*scale_factor*/ 2);
  auto p2_out =
      p2_conv1_->forward(c2_out) + upsample(p3_out, /*scale_factor*/ 2);

  p5_out = p5_conv2_->forward(p5_out);
  p4_out = p4_conv2_->forward(p4_out);
  p3_out = p3_conv2_->forward(p3_out);
  p2_out = p2_conv2_->forward(p2_out);

  // P6 is used for the 5th anchor scale in RPN. Generated by subsampling from
  // P5 with stride of 2.
  auto p6_out = p6_->forward(p5_out);

  return {p2_out, p3_out, p4_out, p5_out, p6_out};
}

I'm still working on this and hope to provide more information.

@huangenyan
Author

I created a minimal source file which reproduces the error:

#include <torch/torch.h>

#include <iostream>
#include <memory>

int main(int argc, char** argv) {
  // Move a dummy input to the GPU.
  auto input = torch::ones({1, 3, 1024, 1024});
  input = input.to(torch::DeviceType::CUDA);

  // A single Conv2d on the GPU; the forward pass itself succeeds.
  auto c2 = torch::nn::Conv2d(torch::nn::Conv2dOptions(3, 64, 7).stride(2).padding(3));
  c2->to(torch::DeviceType::CUDA);
  c2->forward(input);

  // The segmentation fault happens after this return, during process exit.
  return 0;
}

The example has nothing to do with your code, so I think it is a bug in the PyTorch C++ frontend, and I'll report an issue there.
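
For what it's worth, here is a sketch of an experiment one could try (assuming CUDA is available; it is not a fix): scope the tensor and the module so they are destroyed before main returns, then synchronize the device. If the segmentation fault still shows up, the crash happens purely in the libraries' exit handlers, not in anything user code still owns:

#include <torch/torch.h>

#include <cuda_runtime.h>

int main() {
  {
    // Same workload as the repro above, but scoped so the tensor and the
    // module are released before main returns.
    auto input = torch::ones({1, 3, 1024, 1024}).to(torch::DeviceType::CUDA);
    auto conv = torch::nn::Conv2d(
        torch::nn::Conv2dOptions(3, 64, 7).stride(2).padding(3));
    conv->to(torch::DeviceType::CUDA);
    conv->forward(input);
  }  // tensor and module destroyed here

  cudaDeviceSynchronize();  // wait for any outstanding GPU work
  return 0;                 // any remaining crash is in exit handlers
}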

Thanks!
