Cuda kernel failed. Error: invalid device function #138

Closed
caijinlong opened this issue Feb 21, 2014 · 31 comments

@caijinlong

I get errors like this when running the code. How can I handle them?

F0221 16:54:21.855986 11564 im2col.cu:49] Cuda kernel failed. Error: invalid device function
*** Check failure stack trace: ***
@ 0x7f2556cc1b4d google::LogMessage::Fail()
@ 0x7f2556cc5b67 google::LogMessage::SendToLog()
@ 0x7f2556cc39e9 google::LogMessage::Flush()
@ 0x7f2556cc3ced google::LogMessageFatal::~LogMessageFatal()
@ 0x463bf2 caffe::im2col_gpu<>()
@ 0x452031 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x41288f caffe::Layer<>::Forward()
@ 0x41c9be caffe::ConvolutionLayerTest_TestSimpleConvolution_Test<>::TestBody()
@ 0x43becd testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x42dab1 testing::Test::Run()
@ 0x42db97 testing::TestInfo::Run()
@ 0x42dcd7 testing::TestCase::Run()
@ 0x432bdf testing::internal::UnitTestImpl::RunAllTests()
@ 0x43ba7d testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x42d0da testing::UnitTest::Run()
@ 0x40f774 main
@ 0x318ae1ecdd (unknown)
@ 0x40f4c9 (unknown)
/bin/sh: line 1: 11564 Aborted (core dumped) $testbin 0

@Yangqing
Member

You might not have the GPU correctly set up, since the kernel call is saying invalid device function.

Yangqing


@caijinlong
Author

Thanks Yangqing. The problem has been solved; it was a GPU setting issue.

Jinlong

@nickjacob

@caijinlong would you mind posting what GPU settings were causing the problem?

Or @Yangqing are there any features (e.g., compute mode, persistence mode) that I should be aware of when configuring the GPU?

I'm having the same issue running on a K20; any code that runs a kernel gives an "Invalid Device Function" error.

Thanks!
Nick

@shelhamer
Member

Can you run any CUDA demo, such as the NVIDIA-bundled samples? When in doubt, updating one's CUDA driver is worth a shot.

@nickjacob

I can run the samples included in CUDA 5.5, and my driver is at 319.37, which from reading other issues on here seems to be correct. Here's the output of deviceQuery (I'm running a K20 on AWS, so I don't get access to fan speed, for example):

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GRID K520"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  ( 8) Multiprocessors, (192) CUDA Cores/MP:     1536 CUDA Cores
  GPU Clock rate:                                797 MHz (0.80 GHz)
  Memory Clock rate:                             2500 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 3
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GRID K520
Result = PASS

and this is the output of nvidia-smi -a:


==============NVSMI LOG==============

Timestamp                           : Mon Mar 10 08:41:44 2014
Driver Version                      : 319.37

Attached GPUs                       : 1
GPU 0000:00:03.0
    Product Name                    : GRID K520
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 128
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-f1f63fae-f245-3463-b8cd-2446df9fd1f3
    VBIOS Version                   : 80.04.D4.00.04
    Inforom Version
        Image Version               : N/A
        OEM Object                  : N/A
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    PCI
        Bus                         : 0x00
        Device                      : 0x03
        Domain                      : 0x0000
        Device Id                   : 0x118A10DE
        Bus Id                      : 0000:00:03.0
        Sub System Id               : 0x101410DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons         : N/A
    Memory Usage
        Total                       : 4095 MB
        Used                        : 9 MB
        Free                        : 4086 MB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        Gpu                         : 27 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 35.32 W
        Power Limit                 : 125.00 W
        Default Power Limit         : 125.00 W
        Enforced Power Limit        : 125.00 W
        Min Power Limit             : 85.00 W
        Max Power Limit             : 130.00 W
    Clocks
        Graphics                    : 797 MHz
        SM                          : 797 MHz
        Memory                      : 2500 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 797 MHz
        SM                          : 797 MHz
        Memory                      : 2500 MHz
    Compute Processes               : None

Thanks so much for the help!

@sguada
Contributor

sguada commented Mar 10, 2014

Can you try the device_query tool included in caffe/tools?
It seems that you are using a GRID K520 on AWS. I've never tried that, so I'm not sure whether this would be helpful:
http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html

@nickjacob

Thanks - I'm applying the information from the Netflix blog, although I think most of their issues were from direct calls to the NVIDIA Performance Primitives library, whereas Caffe for me is getting stuck on custom CUDA kernel calls. This is the output of the Caffe device_query. Really appreciate the help!

Device id:                     0
Major revision number:         3
Minor revision number:         0
Name:                          GRID K520
Total global memory:           4294770688
Total shared memory per block: 49152
Total registers per block:     65536
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension of block:    1024, 1024, 64
Maximum dimension of grid:     2147483647, 65535, 65535
Clock rate:                    797000
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     8
Kernel execution timeout:      No

@ailzhang

ailzhang commented May 5, 2014

@caijinlong Hi, could you share some thoughts on the GPU settings, please? I have exactly the same error, but I can run the CUDA samples successfully, and I have no idea how to solve this. Thank you!

@eendebakpt

On my system (GeForce GTX 750 Ti) I could solve the error by modifying Makefile.config, changing

CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
		-gencode arch=compute_20,code=sm_21 \
		-gencode arch=compute_30,code=sm_30 \
		-gencode arch=compute_35,code=sm_35

into

CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
		-gencode arch=compute_20,code=sm_21 \
		-gencode arch=compute_30,code=sm_30 \
		-gencode arch=compute_35,code=sm_35 \
		-gencode arch=compute_50,code=sm_50
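
For reference, if you are unsure which -gencode line your card needs, here is a minimal sketch (plain CUDA runtime API, not part of Caffe; the file name and build command are only illustrative) that prints each device's compute capability, i.e. the same major/minor numbers shown by deviceQuery and Caffe's device_query above:

// check_compute_capability.cu -- illustrative helper, not part of Caffe.
// Build (assumed invocation): nvcc check_compute_capability.cu -o check_cc
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
    std::printf("No CUDA-capable device found.\n");
    return 1;
  }
  for (int i = 0; i < count; ++i) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    // For example, major=5 and minor=0 means the build needs
    // -gencode arch=compute_50,code=sm_50 (as above for the GTX 750 Ti).
    std::printf("Device %d: %s, compute capability %d.%d\n",
                i, prop.name, prop.major, prop.minor);
  }
  return 0;
}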

@zimenglan-sysu

@eendebakpt Hi, could you tell me how to find the compute capability of my GPU? I don't know whether I need to add '-gencode arch=compute_50,code=sm_50'.

@zimenglan-sysu

@caijinlong Hi, I have the problem below:

Solver scaffolding done.
I0611 18:38:49.181289 26648 solver.cpp:49] Solving XXXNet
F0611 18:38:49.206163 26648 im2col.cu:54] Cuda kernel failed. Error: invalid device function
*** Check failure stack trace: ***
@ 0x7f7a643d8b7d google::LogMessage::Fail()
@ 0x7f7a643dac7f google::LogMessage::SendToLog()
@ 0x7f7a643d876c google::LogMessage::Flush()
@ 0x7f7a643db51d google::LogMessageFatal::~LogMessageFatal()
@ 0x45a59c caffe::im2col_gpu<>()
@ 0x455857 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x4325aa caffe::Net<>::ForwardPrefilled()
@ 0x425568 caffe::Solver<>::Solve()
@ 0x40e9b5 main
@ 0x7f7a61d5076d (unknown)
@ 0x41018d (unknown)
Aborted (core dumped)
Done.

How can I handle this problem?
Thanks.

@ihsanafredi

Hi, I even changed it to -gencode arch=compute_50,code=sm_50, but I still received the error below. Can anybody help with this?
....
debug: (top_id, top_data_id, blob_id, feat_id)=0,119,0,119
[ FAILED ] PowerLayerTest/1.TestPowerGradientGPU, where TypeParam = double (1737 ms)
[----------] 20 tests from PowerLayerTest/1 (5441 ms total)

[----------] 5 tests from ConcatLayerTest/1, where TypeParam = double
[ RUN ] ConcatLayerTest/1.TestSetupNum
[ OK ] ConcatLayerTest/1.TestSetupNum (0 ms)
[ RUN ] ConcatLayerTest/1.TestGPUGradient
[ OK ] ConcatLayerTest/1.TestGPUGradient (102 ms)
[ RUN ] ConcatLayerTest/1.TestCPUGradient
[ OK ] ConcatLayerTest/1.TestCPUGradient (48 ms)
[ RUN ] ConcatLayerTest/1.TestSetupChannels
[ OK ] ConcatLayerTest/1.TestSetupChannels (0 ms)
[ RUN ] ConcatLayerTest/1.TestCPUNum
[ OK ] ConcatLayerTest/1.TestCPUNum (0 ms)
[----------] 5 tests from ConcatLayerTest/1 (150 ms total)

[----------] 3 tests from PaddingLayerUpgradeTest
[ RUN ] PaddingLayerUpgradeTest.TestSimple
[ OK ] PaddingLayerUpgradeTest.TestSimple (1 ms)
[ RUN ] PaddingLayerUpgradeTest.TestTwoTops
[ OK ] PaddingLayerUpgradeTest.TestTwoTops (1 ms)
[ RUN ] PaddingLayerUpgradeTest.TestImageNet
[ OK ] PaddingLayerUpgradeTest.TestImageNet (1 ms)
[----------] 3 tests from PaddingLayerUpgradeTest (3 ms total)

[----------] 1 test from GaussianFillerTest/0, where TypeParam = float
[ RUN ] GaussianFillerTest/0.TestFill
[ OK ] GaussianFillerTest/0.TestFill (0 ms)
[----------] 1 test from GaussianFillerTest/0 (0 ms total)

[----------] 4 tests from TanHLayerTest/1, where TypeParam = double
[ RUN ] TanHLayerTest/1.TestGradientCPU
[ OK ] TanHLayerTest/1.TestGradientCPU (3 ms)
[ RUN ] TanHLayerTest/1.TestForwardGPU
F0723 15:33:51.379904 10297 tanh_layer.cu:30] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
@ 0x2b626d617b7d google::LogMessage::Fail()
@ 0x2b626d619c7f google::LogMessage::SendToLog()
@ 0x2b626d61776c google::LogMessage::Flush()
@ 0x2b626d61a51d google::LogMessageFatal::~LogMessageFatal()
@ 0x64092e caffe::TanHLayer<>::Forward_gpu()
@ 0x48cd82 caffe::TanHLayerTest_TestForwardGPU_Test<>::TestBody()
@ 0x58d25d testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x585081 testing::Test::Run()
@ 0x585166 testing::TestInfo::Run()
@ 0x5852a7 testing::TestCase::Run()
@ 0x5855fe testing::internal::UnitTestImpl::RunAllTests()
@ 0x58cddd testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x5846de testing::UnitTest::Run()
@ 0x4434dd main
@ 0x2b626f98b76d (unknown)
@ 0x4481ad (unknown)
make: *** [runtest] Aborted (core dumped)

@lireagan

@eendebakpt Thank you, your answer also helped me figure out another problem in the DeepNet toolkit.

@empty16

empty16 commented Oct 21, 2015

@caijinlong could you share the solution?
I have changed Makefile.config to gencode both
arch=compute_50,code=sm_50
and
arch=compute_50,code=sm_50
arch=compute_50,code=compute_50
but I still received the error below. Can anybody help with this?
[----------] 9 tests from ConvolutionLayerTest/1, where TypeParam = double
[ RUN ] ConvolutionLayerTest/1.TestGPUGradient
F1021 11:33:59.305110 3138 im2col.cu:54] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
@ 0x2b8793543daa (unknown)
@ 0x2b8793543ce4 (unknown)
@ 0x2b87935436e6 (unknown)
@ 0x2b8793546687 (unknown)
@ 0x5f4e90 caffe::im2col_gpu<>()
@ 0x5df0e3 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x41b110 caffe::Layer<>::Forward()
@ 0x4296ca caffe::GradientChecker<>::CheckGradientExhaustive()
@ 0x476831 caffe::ConvolutionLayerTest_TestGPUGradient_Test<>::TestBody()
@ 0x547a63 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x53e547 testing::Test::Run()
@ 0x53e5ee testing::TestInfo::Run()
@ 0x53e6f5 testing::TestCase::Run()
@ 0x541a38 testing::internal::UnitTestImpl::RunAllTests()
@ 0x541cc7 testing::UnitTest::Run()
@ 0x412ac0 main
@ 0x2b8797095ec5 (unknown)
@ 0x417d57 (unknown)
@ (nil) (unknown)
make: *** [runtest] Aborted (core dumped)
Could you help me figure out this problem?

@empty16

empty16 commented Oct 21, 2015

@ihsanafredi Have you figured out this problem?

@ihsanafredi

My GPU was old.

@dragontas

I did a simple:
rm -r ./build
mkdir build
cd build
cmake ..
make

The problem was a changed GPU; the sources needed to be rebuilt.

@hongzhenwang

I have solved the same problem. It occurs when the CUDA version doesn't match what Caffe is configured to build for.
The trick lies in Makefile.config:

# For CUDA < 6.0, comment the *_50 lines for compatibility.

CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
		-gencode arch=compute_20,code=sm_21 \
		-gencode arch=compute_30,code=sm_30 \
		-gencode arch=compute_35,code=sm_35 \
		-gencode arch=compute_50,code=sm_50 \
		-gencode arch=compute_50,code=compute_50

If your CUDA version is < 6.0, comment out the last two lines.
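
For what it's worth, here is a minimal sketch (plain CUDA, not Caffe code; the file name is only illustrative) of the kind of launch-time check behind this message: if the compiled binary contains no SASS or PTX usable on the current GPU, the launch fails and cudaGetLastError() returns cudaErrorInvalidDeviceFunction (reported as 8 in the older logs above and 98 in newer CUDA releases):

// repro_invalid_device_function.cu -- illustrative only.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop_kernel() {}

int main() {
  noop_kernel<<<1, 1>>>();
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) {
    // When the binary was built with a CUDA_ARCH list that excludes this GPU's
    // architecture (and carries no PTX to JIT from), this prints
    // "invalid device function".
    std::printf("Cuda kernel failed. Error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  cudaDeviceSynchronize();
  std::printf("Kernel launched fine; the CUDA_ARCH list covers this GPU.\n");
  return 0;
}

If in doubt, cuobjdump from the CUDA toolkit can be used to inspect which architectures were actually embedded in the built binaries.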

@Jumabek

Jumabek commented Sep 7, 2016

@dragontas's solution worked for me as well.

@loretoparisi

loretoparisi commented Oct 25, 2016

I'm running into this error with

$ docker run -ti caffe:gpu caffe --version
libdc1394 error: Failed
caffe version 1.0.0-rc3

and

$ nvidia-smi
Tue Oct 25 15:08:35 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
|  0%   48C    P8     7W / 200W |     62MiB /  8105MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
|  0%   38C    P8     7W / 200W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1241    G   /usr/lib/xorg/Xorg                              60MiB |
+-----------------------------------------------------------------------------+

and

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

@kevinhuang06

@dragontas's solution worked for me as well!

@vamsus

vamsus commented Mar 10, 2017

I am facing a similar error, using the latest CUDA version (8.0) with an NVIDIA GeForce 820M GPU. How do I change the CUDA arch?

[ RUN ] TanHLayerTest/2.TestTanH
F0310 07:19:41.605973 3025 tanh_layer.cu:26] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
@ 0x7f5cb33b75cd google::LogMessage::Fail()
@ 0x7f5cb33b9433 google::LogMessage::SendToLog()
@ 0x7f5cb33b715b google::LogMessage::Flush()
@ 0x7f5cb33b9e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f5cb162f2aa caffe::TanHLayer<>::Forward_gpu()
@ 0x481379 caffe::Layer<>::Forward()
@ 0x7b1320 caffe::TanHLayerTest<>::TestForward()
@ 0x8e1cb3 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x8db2ca testing::Test::Run()
@ 0x8db418 testing::TestInfo::Run()
@ 0x8db4f5 testing::TestCase::Run()
@ 0x8dc7cf testing::internal::UnitTestImpl::RunAllTests()
@ 0x8dcaf3 testing::UnitTest::Run()
@ 0x46693d main
@ 0x7f5cb0d3b830 __libc_start_main
@ 0x46dfd9 _start
@ (nil) (unknown)
Makefile:532: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)

@balloch

balloch commented Apr 27, 2017

Has anyone with CUDA 8.0 solved this problem?

@lhk

lhk commented May 6, 2017

I'm having problems with CUDA 8, too:

F0506 09:17:07.199545 19219 parallel.cpp:130] Check failed: error == cudaSuccess (10 vs. 0)  invalid device ordinal
*** Check failure stack trace: ***
    @     0x7f6db75a15cd  google::LogMessage::Fail()
    @     0x7f6db75a3433  google::LogMessage::SendToLog()
    @     0x7f6db75a115b  google::LogMessage::Flush()
    @     0x7f6db75a3e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f6db7e6d75d  caffe::DevicePair::compute()
    @     0x7f6db7e73480  caffe::P2PSync<>::Prepare()
    @     0x7f6db7e73f8e  caffe::P2PSync<>::Run()
    @           0x40ada0  train()
    @           0x407590  main
    @     0x7f6db6512830  __libc_start_main
    @           0x407db9  _start
    @              (nil)  (unknown)
Aborted (core dumped)

@vamsus

vamsus commented May 6, 2017

@lhk @balloch I solved the CUDA 8.0 installation by disabling cuDNN support, since the NVIDIA 820M's compute capability is 2.1 and cuDNN requires a compute capability of at least 3.0.

You can check your GPU's compute capability at https://developer.nvidia.com/cuda-gpus. Disable cuDNN by commenting out the corresponding line in the Makefile.

If you face the same error, then follow this installation guide: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#axzz4ajfl49uf
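
A quick runtime check along the same lines, as a minimal sketch (plain CUDA runtime API, not part of Caffe; the file name is only illustrative and the 3.0 threshold is cuDNN's documented minimum):

// cudnn_capability_check.cu -- illustrative only.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
    std::printf("Could not query device 0.\n");
    return 1;
  }
  // cuDNN needs compute capability 3.0 or higher; a 2.1 part like the 820M
  // does not qualify, so Caffe has to be built with cuDNN support disabled.
  bool cudnn_ok = prop.major >= 3;
  std::printf("Compute capability %d.%d -> cuDNN %ssupported\n",
              prop.major, prop.minor, cudnn_ok ? "" : "not ");
  return 0;
}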

@balloch

balloch commented May 17, 2017 via email

wlandau pushed a commit to wlandau/fbseqCUDA that referenced this issue Sep 30, 2017
I was having problems with Thrust on my Ubuntu box with CUDA 8.0. Then I found BVLC/caffe#138, which contained the solution. Apparently, I just needed to [upgrade the compute capability](BVLC/caffe#138 (comment)).
@sharoseali

I have an NVIDIA NVS 5200M, Windows 10 Pro 64-bit, and CUDA version 8.0. When I run darknet\x64\darknet_web_cam_voc I get this error:
CUDA Error: invalid device function
CUDA Error: invalid device function: No error
What is the issue? Please reply.

@eyildiz-ugoe

eyildiz-ugoe commented Jul 4, 2018

I have already done the following:

# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
CUDA_ARCH := 	-gencode arch=compute_30,code=sm_30 \
		-gencode arch=compute_35,code=sm_35 \
		-gencode arch=compute_50,code=sm_50 \
		-gencode arch=compute_52,code=sm_52 \
		-gencode arch=compute_60,code=sm_60 \
		-gencode arch=compute_61,code=sm_61 \
		-gencode arch=compute_61,code=compute_61

And I still have the problem on CUDA 9.0. I couldn't find any solution. Extremely frustrating.

@dilipv09

dilipv09 commented Nov 1, 2019

Thanks Yangqing. The problem has been solved; it was a GPU setting issue.

Jinlong

Hi Jinlong, good day. Could you please explain which GPU settings you changed, and how?

@HasanBank

I have a similar error. How did you solve it?
Error:
Check failed: error == cudaSuccess (98 vs. 0) invalid device function

@yaofahua

I met a similar problem; the error was:
F0907 15:41:09.264920 202420 im2col.cu:61] Check failed: error == cudaSuccess (8 vs. 0) invalid device function

I solved it by changing --generate-code=arch=compute_20,code=sm_20 to --generate-code=arch=compute_20,code=[compute_20,sm_20] in .\cmake\Cuda.cmake. Embedding the PTX alongside the SASS this way lets the driver JIT-compile the kernels for GPUs whose architecture isn't explicitly listed.

  # Tell NVCC to add binaries for the specified GPUs
  foreach(__arch ${__cuda_arch_bin})
    if(__arch MATCHES "([0-9]+)\\(([0-9]+)\\)")
      # User explicitly specified PTX for the concrete BIN
      # list(APPEND __nvcc_flags -gencode arch=compute_${CMAKE_MATCH_2},code=sm_${CMAKE_MATCH_1})
      list(APPEND __nvcc_flags -gencode arch=compute_${CMAKE_MATCH_2},code=[compute_${CMAKE_MATCH_2},sm_${CMAKE_MATCH_1}])
      list(APPEND __nvcc_archs_readable sm_${CMAKE_MATCH_1})
    else()
      # User didn't explicitly specify PTX for the concrete BIN, we assume PTX=BIN
      # list(APPEND __nvcc_flags -gencode arch=compute_${__arch},code=sm_${__arch})
      list(APPEND __nvcc_flags -gencode arch=compute_${__arch},code=[compute_${__arch},sm_${__arch}])
      list(APPEND __nvcc_archs_readable sm_${__arch})
    endif()
  endforeach()

  # Tell NVCC to add PTX intermediate code for the specified architectures
  foreach(__arch ${__cuda_arch_ptx})
    list(APPEND __nvcc_flags -gencode arch=compute_${__arch},code=compute_${__arch})
    list(APPEND __nvcc_archs_readable compute_${__arch})
  endforeach()

For more information see: https://github.com/yaofahua/InvalidDeviceFunction
