
Vulkan raises segmentation fault #2354

Closed

Ca0L opened this issue Nov 26, 2020 · 7 comments

Ca0L commented Nov 26, 2020

Bug

The following code raises a segmentation fault.

#include "net.h"
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    ncnn::Net model;
    model.opt.use_vulkan_compute = true;
    model.set_vulkan_device(1);
    std::cout << "OK" << std::endl;
    return 0;
}

This is the Makefile I used; test-softmax.cpp contains the code above.

repo_home=/data1/home/cailinchao/repos/ncnn/
build_dir=$(repo_home)/build
glslang_build=$(build_dir)/glslang

main: test-softmax.cpp
	g++ test-softmax.cpp -g -o main -lncnnd -lvulkan -lSPIRV -lglslang -lOGLCompiler -lOSDependent -fopenmp -lgomp -I$(repo_home)/src -I$(build_dir)/src -L$(build_dir)/src -L$(glslang_build)/glslang -L$(glslang_build)/OGLCompilersDLL -L$(glslang_build)/SPIRV -L$(glslang_build)/OSDependent/Unix -L$(glslang_build)/glslang/OSDependent/Unix/

.PHONY: clean
clean:
	rm main

This is the output.

[0 GeForce RTX 2080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[0 GeForce RTX 2080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[0 GeForce RTX 2080 Ti]  fp16p=1  fp16s=1  fp16a=1  int8s=1  int8a=1
[0 GeForce RTX 2080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[1 GeForce RTX 2080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[1 GeForce RTX 2080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[1 GeForce RTX 2080 Ti]  fp16p=1  fp16s=1  fp16a=1  int8s=1  int8a=1
[1 GeForce RTX 2080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[2 GeForce GTX 1080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[2 GeForce GTX 1080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[2 GeForce GTX 1080 Ti]  fp16p=1  fp16s=1  fp16a=0  int8s=1  int8a=1
[2 GeForce GTX 1080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[3 GeForce RTX 2080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[3 GeForce RTX 2080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[3 GeForce RTX 2080 Ti]  fp16p=1  fp16s=1  fp16a=1  int8s=1  int8a=1
[3 GeForce RTX 2080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[4 GeForce RTX 2080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[4 GeForce RTX 2080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[4 GeForce RTX 2080 Ti]  fp16p=1  fp16s=1  fp16a=1  int8s=1  int8a=1
[4 GeForce RTX 2080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
[5 GeForce GTX 1080 Ti]  queueC=2[8]  queueG=0[16]  queueT=1[2]
[5 GeForce GTX 1080 Ti]  bugsbn1=0  bugcopc=0  bugihfa=0
[5 GeForce GTX 1080 Ti]  fp16p=1  fp16s=1  fp16a=0  int8s=1  int8a=1
[5 GeForce GTX 1080 Ti]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
OK
Segmentation fault (core dumped)

This is the backtrace:

Reading symbols from ./main...done.
[New LWP 10147]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./main'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=0x8) at ../nptl/pthread_mutex_lock.c:65
65      ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=0x8) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007effdf454ae5 in ?? () from /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0
#2  0x00007effe23406fb in eglReleaseThread () from /usr/lib/x86_64-linux-gnu/libEGL.so.1
#3  0x00007effe5562337 in ?? () from /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#4  0x00007effe5561149 in ?? () from /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#5  0x00007effe6dbfcf0 in _dl_close_worker (map=map@entry=0x5626a3378480, force=force@entry=false)
    at dl-close.c:293
#6  0x00007effe6dc0afa in _dl_close_worker (force=false, map=0x5626a3378480) at dl-close.c:125
#7  _dl_close (_map=0x5626a3378480) at dl-close.c:842
#8  0x00007effe5b4451f in __GI__dl_catch_exception (exception=exception@entry=0x7ffddeef81a0, 
    operate=operate@entry=0x7effe57da070 <dlclose_doit>, args=args@entry=0x5626a3378480)
    at dl-error-skeleton.c:196
#9  0x00007effe5b445af in __GI__dl_catch_error (objname=objname@entry=0x5626a3289b50, 
    errstring=errstring@entry=0x5626a3289b58, mallocedp=mallocedp@entry=0x5626a3289b48, 
    operate=operate@entry=0x7effe57da070 <dlclose_doit>, args=args@entry=0x5626a3378480)
    at dl-error-skeleton.c:215
#10 0x00007effe57da745 in _dlerror_run (operate=operate@entry=0x7effe57da070 <dlclose_doit>, 
    args=0x5626a3378480) at dlerror.c:162
#11 0x00007effe57da0b3 in __dlclose (handle=<optimized out>) at dlclose.c:46
#12 0x00007effe6b7aa28 in ?? () from /usr/lib/x86_64-linux-gnu/libvulkan.so.1
#13 0x00007effe6b84d3f in vkDestroyInstance () from /usr/lib/x86_64-linux-gnu/libvulkan.so.1
#14 0x00005626a1ae16cb in ncnn::destroy_gpu_instance ()
    at /data1/home/cailinchao/repos/ncnn/src/gpu.cpp:1025
#15 0x00005626a1aebc0d in ncnn::__ncnn_vulkan_instance_holder::~__ncnn_vulkan_instance_holder (
    this=0x5626a23c6768 <ncnn::g_instance>, __in_chrg=<optimized out>)
    at /data1/home/cailinchao/repos/ncnn/src/gpu.cpp:50
#16 0x00007effe5a200f1 in __run_exit_handlers (status=0, listp=0x7effe5dc8718 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#17 0x00007effe5a201ea in __GI_exit (status=<optimized out>) at exit.c:139
#18 0x00007effe59feb9e in __libc_start_main (main=0x5626a1aad64a <main(int, char**)>, argc=1, 
    argv=0x7ffddeef8428, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffddeef8418) at ../csu/libc-start.c:344
#19 0x00005626a1aad56a in _start ()

Environment

OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 10.0.0 
CMake version: version 3.18.2

GPU models and configuration: 
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce GTX 1080 Ti

Nvidia driver version: 450.80.02

ncnn version: commit 60df2740a7a1eb57a2817aca3385c3153d7a5445
ncnn build command: cmake -DCMAKE_BUILD_TYPE=Debug -DNCNN_VULKAN=ON -DNCNN_BUILD_EXAMPLES=ON .. && make -j$(nproc)
@Eleanor456

I also got the same error. How did you solve the problem?


Ca0L commented Apr 16, 2021

> I also got the same error. How did you solve the problem?

Sorry, I haven't solved it yet.


kulicuu commented Sep 20, 2021

I got a segfault; I think/thought it was caused by malformed vertices.

UNASSIGNED-khronos-validation-createinstance-status-message(INFO / SPEC): msgNum: -671457468 - Validation Information: [ UNASSIGNED-khronos-validation-createinstance-status-message ] Object 0: handle = 0x15cce443550, type = VK_OBJECT_TYPE_INSTANCE; | MessageID = 0xd7fa5f44 | Khronos Validation Layer Active:
    Settings File: Found at C:\Users\wylie\AppData\Local\LunarG\vkconfig\override\vk_layer_settings.txt specified by VkConfig application override.
    Current Enables: None.
    Current Disables: VK_VALIDATION_FEATURE_DISABLE_THREAD_SAFETY_EXT.

    Objects: 1
        [0] 0x15cce443550, type: 1, name: NULL
INFO:
GENERAL [Loader Message (0)] : Inserted device layer VK_LAYER_KHRONOS_validation (C:\VulkanSDK\1.2.182.0\Bin\\.\VkLayer_khronos_validation.dll)

INFO:
GENERAL [Loader Message (0)] : Inserted device layer VK_LAYER_OBS_HOOK (C:\ProgramData\obs-studio-hook\.\graphics-hook64.dll)

INFO:
GENERAL [Loader Message (0)] : Inserted device layer VK_LAYER_NV_optimus (C:\WINDOWS\System32\DriverStore\FileRepository\nvltwi.inf_amd64_62c6fe9661e469e3\.\nvoglv64.dll)

error: process didn't exit successfully: `target\debug\peregrine.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
Segmentation fault


JujuDel commented Dec 3, 2021

If this topic is still open, here is how to fix it:

Solution

#include "net.h"
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    ncnn::Net model;
    model.opt.use_vulkan_compute = true;
    model.set_vulkan_device(1);
    ncnn::destroy_gpu_instance(); // <--- Add this
    std::cout << "OK" << std::endl;
    return 0;
}

Short explanation

While this hasn't been fixed, I believe you should explicitly call destroy_gpu_instance() if the code calls create_gpu_instance() at some point (which set_vulkan_device does in your case).

In theory, this is already done inside ~__ncnn_vulkan_instance_holder() (see gpu.h and gpu.cpp) when the static __ncnn_vulkan_instance_holder g_instance is destroyed at program exit.
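
As a further illustration (not part of the original fix), the explicit teardown can be wrapped in a small scope guard so that ncnn::destroy_gpu_instance() also runs on early returns, before the exit handlers where the backtrace above shows the crash. This is only a sketch: VulkanGuard is a hypothetical helper, not part of ncnn, and it assumes destroy_gpu_instance() is visible via gpu.h as mentioned above.

#include "net.h"
#include "gpu.h" // declares ncnn::destroy_gpu_instance()
#include <iostream>

// Hypothetical scope guard: tears down the ncnn Vulkan instance when it
// goes out of scope, i.e. at the end of main(), before atexit handlers run.
struct VulkanGuard
{
    ~VulkanGuard() { ncnn::destroy_gpu_instance(); }
};

int main(int argc, char* argv[])
{
    VulkanGuard guard; // constructed first, so it is destroyed last

    ncnn::Net model;
    model.opt.use_vulkan_compute = true;
    model.set_vulkan_device(1); // creates the gpu instance internally

    std::cout << "OK" << std::endl;
    return 0; // model is destroyed, then guard calls destroy_gpu_instance()
}

With this, the Vulkan instance is destroyed while main() is still on the stack, rather than from the static g_instance destructor during __run_exit_handlers, which is where the dlclose path in the backtrace above segfaults.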

Commit used

I did this while on the tag 20210720.


lblbk commented Oct 27, 2022

@JujuDel Hello, after looking into the issue, I found that this solution addresses a different problem and has no effect on the current one. Do you have any other solution? Thanks.


nihui commented Dec 19, 2023

#5234


nihui commented Dec 20, 2023

ded0b78

nihui closed this as completed Dec 20, 2023