
Vulkan raises segmentation fault #2354

Closed

Ca0L opened this issue Nov 26, 2020 · 7 comments

Ca0L commented Nov 26, 2020

Bug

The following code raises a segmentation fault.

#include "net.h"
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    ncnn::Net model;
    model.opt.use_vulkan_compute = true;
    model.set_vulkan_device(1);
    std::cout << "OK" << std::endl;
    return 0;
}

This is the Makefile I used; test-softmax.cpp contains the code above.

repo_home=/data1/home/cailinchao/repos/ncnn/
build_dir=$(repo_home)/build
glslang_build=$(build_dir)/glslang

main: test-softmax.cpp
	g++ test-softmax.cpp -g -o main -lncnnd -lvulkan -lSPIRV -lglslang -lOGLCompiler -lOSDependent -fopenmp -lgomp -I$(repo_home)/src -I$(build_dir)/src -L$(build_dir)/src -L$(glslang_build)/glslang -L$(glslang_build)/OGLCompilersDLL -L$(glslang_build)/SPIRV -L$(glslang_build)/OSDependent/Unix -L$(glslang_build)/glslang/OSDependent/Unix/

.PHONY: clean
clean:
	rm main

This is the output.

[0 GeForce RTX 2080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[0 GeForce RTX 2080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[0 GeForce RTX 2080 Ti]  fp16p=1  fp16s=1  fp16a=1  int8s=1  int8a=1
[0 GeForce RTX 2080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[1 GeForce RTX 2080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[1 GeForce RTX 2080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[1 GeForce RTX 2080 Ti]  fp16p=1  fp16s=1  fp16a=1  int8s=1  int8a=1
[1 GeForce RTX 2080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[2 GeForce GTX 1080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[2 GeForce GTX 1080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[2 GeForce GTX 1080 Ti]  fp16p=1  fp16s=1  fp16a=0  int8s=1  int8a=1
[2 GeForce GTX 1080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[3 GeForce RTX 2080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[3 GeForce RTX 2080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[3 GeForce RTX 2080 Ti]  fp16p=1  fp16s=1  fp16a=1  int8s=1  int8a=1
[3 GeForce RTX 2080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[4 GeForce RTX 2080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[4 GeForce RTX 2080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[4 GeForce RTX 2080 Ti]  fp16p=1  fp16s=1  fp16a=1  int8s=1  int8a=1
[4 GeForce RTX 2080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[5 GeForce GTX 1080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[5 GeForce GTX 1080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[5 GeForce GTX 1080 Ti]  fp16p=1  fp16s=1  fp16a=0  int8s=1  int8a=1
[5 GeForce GTX 1080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
OK
Segmentation fault (core dumped)

This is the backtrace:

Reading symbols from ./main...done.
[New LWP 10147]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./main'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=0x8) at ../nptl/pthread_mutex_lock.c:65
65      ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=0x8) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007effdf454ae5 in ?? () from /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0
#2  0x00007effe23406fb in eglReleaseThread () from /usr/lib/x86_64-linux-gnu/libEGL.so.1
#3  0x00007effe5562337 in ?? () from /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#4  0x00007effe5561149 in ?? () from /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#5  0x00007effe6dbfcf0 in _dl_close_worker (map=map@entry=0x5626a3378480, force=force@entry=false)
    at dl-close.c:293
#6  0x00007effe6dc0afa in _dl_close_worker (force=false, map=0x5626a3378480) at dl-close.c:125
#7  _dl_close (_map=0x5626a3378480) at dl-close.c:842
#8  0x00007effe5b4451f in __GI__dl_catch_exception (exception=exception@entry=0x7ffddeef81a0, 
    operate=operate@entry=0x7effe57da070 <dlclose_doit>, args=args@entry=0x5626a3378480)
    at dl-error-skeleton.c:196
#9  0x00007effe5b445af in __GI__dl_catch_error (objname=objname@entry=0x5626a3289b50, 
    errstring=errstring@entry=0x5626a3289b58, mallocedp=mallocedp@entry=0x5626a3289b48, 
    operate=operate@entry=0x7effe57da070 <dlclose_doit>, args=args@entry=0x5626a3378480)
    at dl-error-skeleton.c:215
#10 0x00007effe57da745 in _dlerror_run (operate=operate@entry=0x7effe57da070 <dlclose_doit>, 
    args=0x5626a3378480) at dlerror.c:162
#11 0x00007effe57da0b3 in __dlclose (handle=<optimized out>) at dlclose.c:46
#12 0x00007effe6b7aa28 in ?? () from /usr/lib/x86_64-linux-gnu/libvulkan.so.1
#13 0x00007effe6b84d3f in vkDestroyInstance () from /usr/lib/x86_64-linux-gnu/libvulkan.so.1
#14 0x00005626a1ae16cb in ncnn::destroy_gpu_instance ()
    at /data1/home/cailinchao/repos/ncnn/src/gpu.cpp:1025
#15 0x00005626a1aebc0d in ncnn::__ncnn_vulkan_instance_holder::~__ncnn_vulkan_instance_holder (
    this=0x5626a23c6768 <ncnn::g_instance>, __in_chrg=<optimized out>)
    at /data1/home/cailinchao/repos/ncnn/src/gpu.cpp:50
#16 0x00007effe5a200f1 in __run_exit_handlers (status=0, listp=0x7effe5dc8718 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#17 0x00007effe5a201ea in __GI_exit (status=<optimized out>) at exit.c:139
#18 0x00007effe59feb9e in __libc_start_main (main=0x5626a1aad64a <main(int, char**)>, argc=1, 
    argv=0x7ffddeef8428, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffddeef8418) at ../csu/libc-start.c:344
#19 0x00005626a1aad56a in _start ()

Environment

OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 10.0.0 
CMake version: version 3.18.2

GPU models and configuration: 
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce GTX 1080 Ti

Nvidia driver version: 450.80.02

ncnn version: commit 60df2740a7a1eb57a2817aca3385c3153d7a5445
ncnn build command: cmake -DCMAKE_BUILD_TYPE=Debug -DNCNN_VULKAN=ON -DNCNN_BUILD_EXAMPLES=ON .. && make -j$(nproc)
@Eleanor456

I also got the same error. How did you solve the problem?


Ca0L commented Apr 16, 2021

> I also got the same error. How did you solve the problem?

Sorry, I haven't solved it yet.


kulicuu commented Sep 20, 2021

I got a segfault; I think/thought it was caused by malformed vertices.

UNASSIGNED-khronos-validation-createinstance-status-message(INFO / SPEC): msgNum: -671457468 - Validation Information: [ UNASSIGNED-khronos-validation-createinstance-status-message ] Object 0: handle = 0x15cce443550, type = VK_OBJECT_TYPE_INSTANCE; | MessageID = 0xd7fa5f44 | Khronos Validation Layer Active:
    Settings File: Found at C:\Users\wylie\AppData\Local\LunarG\vkconfig\override\vk_layer_settings.txt specified by VkConfig application override.
    Current Enables: None.
    Current Disables: VK_VALIDATION_FEATURE_DISABLE_THREAD_SAFETY_EXT.

    Objects: 1
        [0] 0x15cce443550, type: 1, name: NULL
INFO:
GENERAL [Loader Message (0)] : Inserted device layer VK_LAYER_KHRONOS_validation (C:\VulkanSDK\1.2.182.0\Bin\\.\VkLayer_khronos_validation.dll)

INFO:
GENERAL [Loader Message (0)] : Inserted device layer VK_LAYER_OBS_HOOK (C:\ProgramData\obs-studio-hook\.\graphics-hook64.dll)

INFO:
GENERAL [Loader Message (0)] : Inserted device layer VK_LAYER_NV_optimus (C:\WINDOWS\System32\DriverStore\FileRepository\nvltwi.inf_amd64_62c6fe9661e469e3\.\nvoglv64.dll)

error: process didn't exit successfully: `target\debug\peregrine.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
Segmentation fault


JujuDel commented Dec 3, 2021

If this topic is still open, here is how to fix it:

Solution

#include "net.h"
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    ncnn::Net model;
    model.opt.use_vulkan_compute = true;
    model.set_vulkan_device(1);
    ncnn::destroy_gpu_instance(); // <--- Add this
    std::cout << "OK" << std::endl;
    return 0;
}

Short explanation

While this hasn't been fixed, I believe you should explicitly call destroy_gpu_instance() if the code calls create_gpu_instance() at some point (which set_vulkan_device does in your case).

In theory, this is already done inside ~__ncnn_vulkan_instance_holder() (see gpu.h and gpu.cpp) when the static __ncnn_vulkan_instance_holder g_instance is destroyed at program exit.
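
As a further illustration (not part of the original fix), the explicit teardown can be wrapped in a small scope guard so that ncnn::destroy_gpu_instance() also runs on early returns, before the exit handlers where the backtrace above shows the crash. This is only a sketch: VulkanGuard is a hypothetical helper, not part of ncnn, and it assumes destroy_gpu_instance() is visible via gpu.h as mentioned above.

#include "net.h"
#include "gpu.h" // declares ncnn::destroy_gpu_instance()
#include <iostream>

// Hypothetical scope guard: tears down the ncnn Vulkan instance when it
// goes out of scope, i.e. at the end of main(), before atexit handlers run.
struct VulkanGuard
{
    ~VulkanGuard() { ncnn::destroy_gpu_instance(); }
};

int main(int argc, char* argv[])
{
    VulkanGuard guard; // constructed first, so it is destroyed last

    ncnn::Net model;
    model.opt.use_vulkan_compute = true;
    model.set_vulkan_device(1); // creates the gpu instance internally

    std::cout << "OK" << std::endl;
    return 0; // model is destroyed, then guard calls destroy_gpu_instance()
}

With this, the Vulkan instance is destroyed while main() is still on the stack, rather than from the static g_instance destructor during __run_exit_handlers, which is where the dlclose path in the backtrace above segfaults.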

Commit used

I did this while on the tag 20210720.


lblbk commented Oct 27, 2022

@JujuDel Hello, after looking into the issue, I found that this solution addresses a different problem and has no effect on the current one. Do you have any other solution? Thanks.


nihui commented Dec 19, 2023

#5234


nihui commented Dec 20, 2023

ded0b78

nihui closed this as completed Dec 20, 2023