
RuntimeError: Invalid Device #0 #24

Closed
AuroraRAS opened this issue Jan 16, 2023 · 12 comments

Comments

@AuroraRAS

I'm hitting RuntimeError: Invalid Device here, and I don't know why.

python mnist.py --device=ocl:0

Using device: ocl:0
Traceback (most recent call last):
  File "/home/ml/Projects/pytorch_dlprim/mnist.py", line 162, in <module>
    main()
  File "/home/ml/Projects/pytorch_dlprim/mnist.py", line 148, in main
    model = Net().to(device)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1132, in to
    return self._apply(convert)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 784, in _apply
    module._apply(fn)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 807, in _apply
    param_applied = fn(param)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: Invalid Device #0
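The error string suggests the backend's device table is empty at lookup time, so even index 0 is out of range. A hypothetical Python sketch of that lookup (the real logic is in pytorch_dlprim's C++ backend; the function and list here are invented for illustration):

```python
# Hypothetical sketch of the lookup behind "Invalid Device #0".
# The real implementation lives in pytorch_dlprim's C++ code (CLTensor.h);
# lookup_device and enumerated_devices are invented names.

def lookup_device(index, enumerated_devices):
    """Return the OpenCL device registered under `index`.

    `enumerated_devices` is built at startup by walking all OpenCL
    platforms and their devices. If enumeration failed (e.g. the ICD
    loader exposed no usable platform), the list is empty and even
    index 0 is invalid.
    """
    if index < 0 or index >= len(enumerated_devices):
        raise RuntimeError(f"Invalid Device #{index}")
    return enumerated_devices[index]

# With an empty device list, ocl:0 fails exactly like the traceback above:
try:
    lookup_device(0, [])
except RuntimeError as e:
    print(e)  # Invalid Device #0
```

So the question the rest of the thread digs into is why enumeration produced nothing, not whether index 0 is the right choice.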

clinfo

Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 22.3.3
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   rusticl
  Platform Vendor                                 Mesa/X.org
  Platform Version                                OpenCL 3.0 
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             MESA
  Platform Host timer resolution                  0ns

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD Radeon R9 200 Series (pitcairn, LLVM 15.0.6, DRM 3.49, 6.1.5-200.fc37.x86_64)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 22.3.3
  Device Numeric Version                          0x401000 (1.1.0)
  Driver Version                                  22.3.3
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               20
  Max clock frequency                             1050MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple (kernel)     64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              2147483648 (2GiB)
  Error Correction support                        No
  Max memory allocation                           536870912 (512MiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        67108864 (64MiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    ILs with version                              SPIR-V                                                           0x400000 (1.0.0)
  Built-in kernels with version                   (n/a)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_extended_versioning                                       0x400000 (1.0.0)

  Platform Name                                   rusticl
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContext(NULL, ...) [other]              
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD Radeon R9 200 Series (pitcairn, LLVM 15.0.6, DRM 3.49, 6.1.5-200.fc37.x86_64)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD Radeon R9 200 Series (pitcairn, LLVM 15.0.6, DRM 3.49, 6.1.5-200.fc37.x86_64)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD Radeon R9 200 Series (pitcairn, LLVM 15.0.6, DRM 3.49, 6.1.5-200.fc37.x86_64)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.1
  ICD loader Profile                              OpenCL 3.0

conda list

# packages in environment at ~/.conda/envs/pytorch:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
blas                      1.0                         mkl  
brotlipy                  0.7.0           py310h7f8727e_1002  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2022.10.11           h06a4308_0  
certifi                   2022.12.7       py310h06a4308_0  
cffi                      1.15.1          py310h5eee18b_3  
charset-normalizer        2.0.4              pyhd3eb1b0_0  
cpuonly                   2.0                           0    pytorch-nightly
cryptography              38.0.4          py310h9ce1e76_0  
ffmpeg                    4.2.2                h20bf706_0  
flit-core                 3.6.0              pyhd3eb1b0_0  
freetype                  2.12.1               h4a9f257_0  
giflib                    5.2.1                h7b6447c_0  
gmp                       6.2.1                h295c915_3  
gmpy2                     2.1.2           py310heeb90bb_0  
gnutls                    3.6.15               he1e5248_0  
idna                      3.4             py310h06a4308_0  
intel-openmp              2021.4.0          h06a4308_3561  
jpeg                      9e                   h7f8727e_0  
lame                      3.100                h7b6447c_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
lerc                      3.0                  h295c915_0  
libdeflate                1.8                  h7f8727e_5  
libffi                    3.4.2                h6a678d5_6  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libidn2                   2.3.2                h7f8727e_0  
libopus                   1.3.1                h7b6447c_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtasn1                  4.16.0               h27cfd23_0  
libtiff                   4.5.0                hecacb30_0  
libunistring              0.9.10               h27cfd23_0  
libuuid                   1.41.5               h5eee18b_0  
libvpx                    1.7.0                h439df22_0  
libwebp                   1.2.4                h11a3e52_0  
libwebp-base              1.2.4                h5eee18b_0  
lz4-c                     1.9.4                h6a678d5_0  
mkl                       2021.4.0           h06a4308_640  
mkl-service               2.4.0           py310h7f8727e_0  
mkl_fft                   1.3.1           py310hd6ae3a3_0  
mkl_random                1.2.2           py310h00e6091_0  
mpc                       1.1.0                h10f8cd9_1  
mpfr                      4.0.2                hb69a4c5_1  
mpmath                    1.2.1                    pypi_0    pypi
ncurses                   6.3                  h5eee18b_3  
nettle                    3.7.3                hbbd107a_1  
numpy                     1.23.5          py310hd5efca6_0  
numpy-base                1.23.5          py310h8e6c178_0  
openh264                  2.1.1                h4ff587b_0  
openssl                   1.1.1s               h7f8727e_0  
pillow                    9.3.0           py310hace64e9_1  
pip                       22.3.1          py310h06a4308_0  
pycparser                 2.21               pyhd3eb1b0_0  
pyopenssl                 22.0.0             pyhd3eb1b0_0  
pysocks                   1.7.1           py310h06a4308_0  
python                    3.10.8               h7a1cb2a_1  
pytorch                   2.0.0.dev20230116    py3.10_cpu_0    pytorch-nightly
pytorch-mutex             1.0                         cpu    pytorch
readline                  8.2                  h5eee18b_0  
requests                  2.28.1          py310h06a4308_0  
setuptools                65.6.3          py310h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.40.1               h5082296_0  
sympy                     1.11.1          py310h06a4308_0  
tk                        8.6.12               h1ccaba5_0  
torchaudio                2.0.0.dev20230116       py310_cpu    pytorch-nightly
torchvision               0.15.0.dev20230116       py310_cpu    pytorch-nightly
typing_extensions         4.4.0           py310h06a4308_0  
tzdata                    2022g                h04d1e81_0  
urllib3                   1.26.13         py310h06a4308_0  
wheel                     0.37.1             pyhd3eb1b0_0  
x264                      1!157.20191217       h7b6447c_0  
xz                        5.2.8                h5eee18b_0  
zlib                      1.2.13               h5eee18b_0  
zstd                      1.5.2                ha4553b6_0  
@artyom-beilis
Owner

Can you build dlprimitives and run some basic tests? I want to see if there is an issue with the driver or something else.

Run a basic test from https://github.com/artyom-beilis/dlprimitives/blob/master/docs/build.md#benchmarking to see if it runs at all.

@AuroraRAS
Author

> Can you build dlprimitives and run some basic tests? I want to see if there is an issue with the driver or something else.
>
> Run a basic test from https://github.com/artyom-beilis/dlprimitives/blob/master/docs/build.md#benchmarking to see if it runs at all.

cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo
make
make test

Running tests...
Test project /home/ml/Projects/pytorch_dlprim/dlprimitives/build
      Start  1: test_test_case_abs
 1/33 Test  #1: test_test_case_abs ...............   Passed    1.04 sec
      Start  2: test_test_case_activation
 2/33 Test  #2: test_test_case_activation ........   Passed    3.85 sec
      Start  3: test_test_case_batchnorm
 3/33 Test  #3: test_test_case_batchnorm .........   Passed   13.00 sec
      Start  4: test_test_case_concat
 4/33 Test  #4: test_test_case_concat ............   Passed    0.28 sec
      Start  5: test_test_case_conv2d
 5/33 Test  #5: test_test_case_conv2d ............   Passed  303.66 sec
      Start  6: test_test_case_conv2d_dsc
 6/33 Test  #6: test_test_case_conv2d_dsc ........   Passed  325.08 sec
      Start  7: test_test_case_conv2d_gemm
 7/33 Test  #7: test_test_case_conv2d_gemm .......   Passed  342.08 sec
      Start  8: test_test_case_conv2d_win
 8/33 Test  #8: test_test_case_conv2d_win ........***Failed   35.25 sec
      Start  9: test_test_case_elementwise
 9/33 Test  #9: test_test_case_elementwise .......   Passed   28.48 sec
      Start 10: test_test_case_global_pooling
10/33 Test #10: test_test_case_global_pooling ....   Passed   13.58 sec
      Start 11: test_test_case_hardtanh
11/33 Test #11: test_test_case_hardtanh ..........   Passed    1.49 sec
      Start 12: test_test_case_inner_product
12/33 Test #12: test_test_case_inner_product .....   Passed   51.32 sec
      Start 13: test_test_case_log_softmax
13/33 Test #13: test_test_case_log_softmax .......   Passed    1.20 sec
      Start 14: test_test_case_mse_loss
14/33 Test #14: test_test_case_mse_loss ..........   Passed    1.39 sec
      Start 15: test_test_case_nll_loss
15/33 Test #15: test_test_case_nll_loss ..........   Passed    1.01 sec
      Start 16: test_test_case_param
16/33 Test #16: test_test_case_param .............   Passed    0.46 sec
      Start 17: test_test_case_pooling2d
17/33 Test #17: test_test_case_pooling2d .........   Passed  155.98 sec
      Start 18: test_test_case_reduction
18/33 Test #18: test_test_case_reduction .........   Passed   57.60 sec
      Start 19: test_test_case_slice
19/33 Test #19: test_test_case_slice .............   Passed    0.46 sec
      Start 20: test_test_case_softmax
20/33 Test #20: test_test_case_softmax ...........   Passed    1.17 sec
      Start 21: test_test_case_softmax_loss
21/33 Test #21: test_test_case_softmax_loss ......   Passed    1.21 sec
      Start 22: test_test_case_threshold
22/33 Test #22: test_test_case_threshold .........   Passed    1.40 sec
      Start 23: test_test_case_tr_conv2d
23/33 Test #23: test_test_case_tr_conv2d .........   Passed  167.49 sec
      Start 24: test_test_case_tr_conv2d_dsc
24/33 Test #24: test_test_case_tr_conv2d_dsc .....   Passed  180.49 sec
      Start 25: test_test_case_tr_conv2d_gemm
25/33 Test #25: test_test_case_tr_conv2d_gemm ....   Passed  186.64 sec
      Start 26: test_test_case_tr_conv2d_win
26/33 Test #26: test_test_case_tr_conv2d_win .....   Passed  154.16 sec
      Start 27: test_net
27/33 Test #27: test_net .........................   Passed    8.95 sec
      Start 28: test_net_nonopt
28/33 Test #28: test_net_nonopt ..................   Passed    8.94 sec
      Start 29: test_json
29/33 Test #29: test_json ........................   Passed    0.01 sec
      Start 30: test_random
30/33 Test #30: test_random ......................   Passed    0.97 sec
      Start 31: test_context
31/33 Test #31: test_context .....................   Passed    0.29 sec
      Start 32: test_util
32/33 Test #32: test_util ........................   Passed   29.67 sec
      Start 33: test_broadcast_reduce
33/33 Test #33: test_broadcast_reduce ............   Passed   33.51 sec

97% tests passed, 1 tests failed out of 33

Total Test time (real) = 2112.12 sec

The following tests FAILED:
	  8 - test_test_case_conv2d_win (Failed)
Errors while running CTest
Output from these tests are in: /home/ml/Projects/pytorch_dlprim/dlprimitives/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
make: *** [Makefile:71: test] Error 8

@AuroraRAS
Author

Output for failed test 8/33, from the test log:

8/33 Testing: test_test_case_conv2d_win
8/33 Test: test_test_case_conv2d_win
Command: "~/Projects/pytorch_dlprim/dlprimitives/build/test_from_template" "0:0" "~/Projects/pytorch_dlprim/dlprimitives/tests/test_case_conv2d_win.json"
Directory: ~/Projects/pytorch_dlprim/dlprimitives/build
"test_test_case_conv2d_win" start time: Jan 18 00:42 CST
Output:
----------------------------------------------------------
Running tests for operator Convolution2D on AMD Radeon R9 200 Series (pitcairn, LLVM 15.0.6, DRM 3.49, 6.1.6-200.fc37.x86_64) on Clover
- Running test case 0 for options {"activation":"relu","bias":true,"bwd_data_algo":"winograd","bwd_filter_algo":"winograd","channels_in":98,"channels_out":128,"dilate":1,"fwd_algo":"winograd","groups":1,"kernel":5,"pad":2,"stride":1}
-- test for shape [[1,98,7,7]] fwd,bwd
-- test for shape [[1,98,8,8]] fwd,bwd
-- test for shape [[1,98,4,4]] fwd,bwd
- Running test case 1 for options {"bias":false,"bwd_data_algo":"winograd","bwd_filter_algo":"winograd","channels_in":1,"channels_out":1,"dilate":1,"fwd_algo":"winograd","groups":1,"kernel":3,"pad":1,"stride":1}
-- test for shape [[1,1,2,2]] fwd,bwd
-- test for shape [[1,1,7,7]] fwd,bwd
-- test for shape [[1,1,8,8]] fwd,bwd
-- test for shape [[1,1,4,4]] fwd,bwd
-- test for shape [[3,1,4,4]] fwd,bwd
-- test for shape [[2,1,7,7]] fwd,bwd
-- test for shape [[2,1,10,5]] fwd,bwd
-- test for shape [[2,1,10,10]] fwd,bwd
-- test for shape [[2,1,19,19]] fwd,bwd
-- test for shape [[2,1,20,20]] fwd,bwd
-- test for shape [[2,1,32,32]] fwd,bwd
-- test for shape [[64,1,64,64]] fwd,bwd
-- test for shape [[53,1,100,100]] fwd,bwd
- Running test case 2 for options {"bias":false,"bwd_data_algo":"winograd","bwd_filter_algo":"winograd","channels_in":2,"channels_out":1,"dilate":1,"fwd_algo":"winograd","groups":1,"kernel":3,"pad":1,"stride":1}
-- test for shape [[1,2,2,2]] fwd,bwd
-- test for shape [[1,2,7,7]] fwd,bwd
-- test for shape [[1,2,8,8]] fwd,bwd
-- test for shape [[1,2,4,4]] fwd,bwd
-- test for shape [[3,2,4,4]] fwd,bwd
-- test for shape [[2,2,7,7]] fwd,bwd
-- test for shape [[2,2,10,5]] fwd,bwd
-- test for shape [[2,2,10,10]] fwd,bwd
-- test for shape [[2,2,19,19]] fwd,bwd
-- test for shape [[2,2,20,20]] fwd,bwd
-- test for shape [[2,2,32,32]] fwd,bwd
-- test for shape [[64,2,64,64]] fwd,bwd
-- test for shape [[53,2,100,100]] fwd,bwd
- Running test case 3 for options {"bias":false,"bwd_data_algo":"winograd","bwd_filter_algo":"winograd","channels_in":1,"channels_out":2,"dilate":1,"fwd_algo":"winograd","groups":1,"kernel":3,"pad":1,"stride":1}
-- test for shape [[1,1,2,2]] fwd,bwd
-- test for shape [[1,1,7,7]] fwd,bwd
-- test for shape [[1,1,8,8]] fwd,bwd
-- test for shape [[1,1,4,4]] fwd,bwd
-- test for shape [[3,1,4,4]] fwd,bwd
-- test for shape [[2,1,7,7]] fwd,bwd
-- test for shape [[2,1,10,5]] fwd,bwd
-- test for shape [[2,1,10,10]] fwd,bwd
-- test for shape [[2,1,19,19]] fwd,bwd
-- test for shape [[2,1,20,20]] fwd,bwd
-- test for shape [[2,1,32,32]] fwd,bwd
-- test for shape [[64,1,64,64]] fwd,bwd
-- test for shape [[53,1,100,100]] fwd,bwd
- Running test case 4 for options {"bias":false,"bwd_data_algo":"winograd","bwd_filter_algo":"winograd","channels_in":3,"channels_out":8,"dilate":1,"fwd_algo":"winograd","groups":1,"kernel":3,"pad":1,"stride":1}
-- test for shape [[1,3,2,2]] fwd,bwd
-- test for shape [[1,3,7,7]] fwd,bwd
-- test for shape [[1,3,8,8]] fwd,bwd
-- test for shape [[1,3,4,4]] fwd,bwd
-- test for shape [[3,3,4,4]] fwd,bwd
-- test for shape [[2,3,7,7]] fwd,bwd
-- test for shape [[2,3,10,5]] fwd,bwd
-- test for shape [[2,3,10,10]] fwd,bwd
-- test for shape [[2,3,19,19]] fwd,bwd
-- test for shape [[2,3,20,20]] fwd,bwd
-- test for shape [[2,3,32,32]] fwd,bwd
-- test for shape [[64,3,64,64]] fwd,bwd
-- test for shape [[53,3,100,100]] fwd,bwd
- Running test case 5 for options {"bias":false,"bwd_data_algo":"winograd","bwd_filter_algo":"winograd","channels_in":128,"channels_out":64,"dilate":1,"fwd_algo":"winograd","groups":1,"kernel":3,"pad":1,"stride":1}
-- test for shape [[1,128,2,2]] fwd,bwd
-- test for shape [[1,128,7,7]] fwd,bwd
-- test for shape [[1,128,8,8]] fwd,bwd
-- test for shape [[1,128,4,4]] fwd,bwd
-- test for shape [[3,128,4,4]] fwd,bwd
-- test for shape [[2,128,7,7]] fwd,bwd
-- test for shape [[2,128,10,5]] fwd,bwd
-- test for shape [[2,128,10,10]] fwd,bwd
-- test for shape [[2,128,19,19]] fwd,bwd
-- test for shape [[2,128,20,20]] fwd,bwd
-- test for shape [[2,128,32,32]] fwd,bwd
-- test for shape [[64,128,64,64]] fwd,bwd
-- test for shape [[53,128,100,100]] fwd,bwd
- Running test case 6 for options {"activation":"relu","bias":true,"bwd_data_algo":"winograd","bwd_filter_algo":"winograd","channels_in":3,"channels_out":8,"dilate":1,"fwd_algo":"winograd","groups":1,"kernel":3,"pad":1,"stride":1}
-- test for shape [[1,3,2,2]] fwd,bwd
-- test for shape [[1,3,7,7]] fwd,bwd
-- test for shape [[1,3,8,8]] fwd,bwd
-- test for shape [[1,3,4,4]] fwd,bwd
-- test for shape [[3,3,4,4]] fwd,bwd
-- test for shape [[2,3,7,7]] fwd,bwd
-- test for shape [[2,3,10,5]] fwd,bwd
-- test for shape [[2,3,10,10]] fwd,bwd
-- test for shape [[2,3,19,19]] fwd,bwd
-- test for shape [[2,3,20,20]] fwd,bwd
-- test for shape [[2,3,32,32]] fwd,bwd
-- test for shape [[64,3,64,64]] fwd,bwd
-- test for shape [[53,3,100,100]] fwd,bwd
- Running test case 7 for options {"activation":"relu","bias":true,"bwd_data_algo":"winograd","bwd_filter_algo":"winograd","channels_in":3,"channels_out":8,"dilate":1,"fwd_algo":"winograd","groups":1,"kernel":3,"pad":1,"stride":1}
-- test for shape [[1,3,2,2]] fwd,bwd
-- test for shape [[1,3,7,7]] fwd,bwd
-- test for shape [[1,3,8,8]] fwd,bwd
-- test for shape [[1,3,4,4]] fwd,bwd
-- test for shape [[3,3,4,4]] fwd,bwd
-- test for shape [[2,3,7,7]] fwd,bwd
-- test for shape [[2,3,10,5]] fwd,bwd
-- test for shape [[2,3,10,10]] fwd,bwd
-- test for shape [[2,3,19,19]] fwd,bwd
-- test for shape [[2,3,20,20]] fwd,bwd
-- test for shape [[2,3,32,32]] fwd,bwd
-- test for shape [[64,3,64,64]] fwd,bwd
-- test for shape [[53,3,100,100]] fwd,bwd
Comparison failed for tensor data:0 at 95569 expecting -0.378173=-0.378173*1 got -0.253815 for esp=1e-05
Comparison failed for tensor data:0 at 95570 expecting -0.273047=-0.273047*1 got -0.366986 for esp=1e-05
Comparison failed for tensor data:0 at 95571 expecting 0.65315=0.65315*1 got 0.766756 for esp=1e-05
Comparison failed for tensor data:0 at 95669 expecting -0.393208=-0.393208*1 got -0.219796 for esp=1e-05
Comparison failed for tensor data:0 at 95670 expecting -0.223676=-0.223676*1 got -0.373241 for esp=1e-05
Comparison failed for tensor data:0 at 95671 expecting 0.318421=0.318421*1 got 0.451013 for esp=1e-05
Comparison failed for tensor data:0 at 95769 expecting -0.571868=-0.571868*1 got -0.484194 for esp=1e-05
Comparison failed for tensor data:0 at 95770 expecting 0.243095=0.243095*1 got 0.184242 for esp=1e-05
Comparison failed for tensor data:0 at 95771 expecting -0.897846=-0.897846*1 got -0.803248 for esp=1e-05
Comparison failed for tensor data:0 at 105569 expecting -0.419033=-0.419033*1 got -0.250495 for esp=1e-05


FAILED: Computations Failed
<end of output>
Test time =  35.25 sec
----------------------------------------------------------
Test Failed.
"test_test_case_conv2d_win" end time: Jan 18 00:43 CST
"test_test_case_conv2d_win" time elapsed: 00:00:35
----------------------------------------------------------
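The "Comparison failed ... for esp=1e-05" lines are element-wise tolerance checks on the output tensor. A rough sketch of such a check (the exact formula dlprimitives uses is an assumption here) shows why differences of roughly 0.1 fail against eps=1e-05 by several orders of magnitude:

```python
def close(expected, got, eps=1e-5):
    # Mixed absolute/relative tolerance check. This exact formula is an
    # assumption, not dlprimitives' implementation, but any reasonable
    # variant rejects errors this large at eps=1e-5.
    return abs(expected - got) <= eps * max(1.0, abs(expected), abs(got))

# First failing element from the log: expected -0.378173, got -0.253815.
print(close(-0.378173, -0.253815))   # False: error ~0.124, far above 1e-5
print(close(-0.378173, -0.3781731))  # True: within tolerance
```

The failures being localized to the Winograd convolution test, while the GEMM and direct variants pass, points at a numerical problem in that one kernel path rather than a broken driver.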

@AuroraRAS
Author

AuroraRAS commented Jan 17, 2023

./dlprim_benchmark 0:0 ../docs/nets_for_benchmark/resnet18-b16.js

Using: AMD Radeon R9 200 Series (pitcairn, LLVM 15.0.6, DRM 3.49, 6.1.6-200.fc37.x86_64) on Clover
Inputs
- data: (16,3,224,224)
Outputs
- loss: (16,1000)
Step -5    549.446
Step -4    172.131
Step -3    173.069
Step -2    173.027
Step -1    172.077
Step  0    172.158
Step  1    172.775
Step  2    173.315
Step  3    171.667
Step  4    172.834
Step  5    172.461
Step  6    172.799
Step  7    172.202
Step  8    172.687
Step  9    172.011
Step 10    171.831
Step 11    171.646
Step 12    172.743
Step 13    171.402
Step 14    171.953
Step 15    173.605
Step 16    172.521
Step 17    172.836
Step 18    172.724
Step 19    171.633
Time per sample: 10.774 ms
TOT time per batch:  172.390 ms

./dlprim_benchmark -b 0:0 ../docs/nets_for_benchmark/resnet18-b16.js

Using: AMD Radeon R9 200 Series (pitcairn, LLVM 15.0.6, DRM 3.49, 6.1.6-200.fc37.x86_64) on Clover
Inputs
- data: (16,3,224,224)
Outputs
- loss: (16,1000)
Step -5    944.391   562.550   381.841
Step -4    568.292   186.458   381.834
Step -3    569.105   185.943   383.161
Step -2    567.985   185.340   382.645
Step -1    569.030   186.229   382.801
Step  0    567.343   185.491   381.852
Step  1    568.028   185.983   382.045
Step  2    566.410   185.105   381.304
Step  3    568.072   185.484   382.588
Step  4    567.650   184.827   382.823
Step  5    568.139   185.818   382.321
Step  6    568.514   186.141   382.374
Step  7    567.538   185.382   382.156
Step  8    569.284   185.928   383.355
Step  9    568.451   186.001   382.450
Step 10    568.275   185.067   383.208
Step 11    566.267   184.486   381.781
Step 12    568.327   185.258   383.069
Step 13    568.942   186.307   382.635
Step 14    568.047   186.173   381.874
Step 15    566.231   184.442   381.788
Step 16    566.303   185.724   380.578
Step 17    565.987   185.685   380.302
Step 18    567.147   186.259   380.887
Step 19    565.409   184.231   381.178
Time per sample: 35.470 ms
FWD time per batch:  185.490 ms
BWD time per batch:  382.028 ms
TOT time per batch:  567.518 ms
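The per-sample figures in both runs are just the per-batch totals divided by the batch size of 16, which follows from the (16,3,224,224) input shape:

```python
batch = 16  # from the input shape (16,3,224,224)

# Forward-only run: 172.390 ms per batch
print(f"{172.390 / batch:.3f}")  # 10.774

# Forward+backward run: 567.518 ms per batch
print(f"{567.518 / batch:.3f}")  # 35.470
```

So the backward pass costs roughly twice the forward pass here (382 ms vs 185 ms per batch), which is in the usual range for training a convnet.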

@artyom-beilis
Owner

Looks like a nice result for an old GPU.

So it seems that, from the dlprimitives side, it works more or less OK. There is some accuracy failure with Winograd that can be investigated further, but in general it seems to be working.

> pytorch 2.0.0.dev20230116

I see you use 2.0.0. Can you check against 1.13? I just want to make sure it isn't related to the latest development changes.

@AuroraRAS
Copy link
Author

> Looks like a nice result for an old GPU.
>
> So it seems that, from the dlprimitives side, it works more or less OK. There is some accuracy failure with Winograd that can be investigated further, but in general it seems to be working.
>
> pytorch 2.0.0.dev20230116
>
> I see you use 2.0.0. Can you check against 1.13? I just want to make sure it isn't related to the latest development changes.

I just switched PyTorch to version 1.13.1 and rebuilt, but RuntimeError: Invalid Device #0 is still there.
python mnist.py --device ocl:0

Using device: privateuseone:0
Traceback (most recent call last):
  File "/home/ml/Projects/pytorch_dlprim/mnist.py", line 162, in <module>
    main()
  File "/home/ml/Projects/pytorch_dlprim/mnist.py", line 148, in main
    model = Net().to(device)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: Invalid Device #0

One small change: the first line of output used to be Using device: ocl:0; now it's Using device: privateuseone:0.

@artyom-beilis
Owner

Sorry for the delay. Can you please add a few debug lines to CLTensor.h to see why the device/platform wasn't loaded:

diff --git a/src/CLTensor.h b/src/CLTensor.h
index 8e01bc5..90e5dc3 100644
--- a/src/CLTensor.h
+++ b/src/CLTensor.h
@@ -159,7 +159,8 @@ namespace ptdlprim {
             try {
                 cl::Platform::get(&platforms);
             }
-            catch(cl::Error &) {
+            catch(cl::Error &e) {
+                fprintf(stderr,"Failed to get platforms list %s\n",e.what());
                 return;
             }
             for(size_t i=0;i<platforms.size();i++) {
@@ -167,13 +168,15 @@ namespace ptdlprim {
                 try{
                     platforms[i].getDevices(CL_DEVICE_TYPE_ALL, &devices);
                 }
-                catch(cl::Error &)
+                catch(cl::Error &e)
                 {
+                    fprintf(stderr,"Failed to get device list for platform %d: %s\n",int(i),e.what());
                     continue;
                 }
                 for(size_t j=0;j<devices.size();j++) {
                     std::unique_ptr<DevData> d(new DevData());
                     data_.push_back(std::move(d));
+                    fprintf(stderr,"Found platform/device %d:%d\n",int(i),int(j));
                     data_.back()->name = std::to_string(i) + ":" + std::to_string(j);
                 }
             }

@AuroraRAS
Author

Sorry for the delay. Can you please add a few debug lines to CLTensor.h to see why the device/platform wasn't loaded:

diff --git a/src/CLTensor.h b/src/CLTensor.h
index 8e01bc5..90e5dc3 100644
--- a/src/CLTensor.h
+++ b/src/CLTensor.h
@@ -159,7 +159,8 @@ namespace ptdlprim {
             try {
                 cl::Platform::get(&platforms);
             }
-            catch(cl::Error &) {
+            catch(cl::Error &e) {
+                fprintf(stderr,"Failed to get platforms list %s\n",e.what());
                 return;
             }
             for(size_t i=0;i<platforms.size();i++) {
@@ -167,13 +168,15 @@ namespace ptdlprim {
                 try{
                     platforms[i].getDevices(CL_DEVICE_TYPE_ALL, &devices);
                 }
-                catch(cl::Error &)
+                catch(cl::Error &e)
                 {
+                    fprintf(stderr,"Failed to get device list for platform %d: %s\n",int(i),e.what());
                     continue;
                 }
                 for(size_t j=0;j<devices.size();j++) {
                     std::unique_ptr<DevData> d(new DevData());
                     data_.push_back(std::move(d));
+                    fprintf(stderr,"Found platform/device %d:%d\n",int(i),int(j));
                     data_.back()->name = std::to_string(i) + ":" + std::to_string(j);
                 }
             }

That's OK. We have some new output here:
python mnist.py --device=ocl:0

Using device: privateuseone:0
Failed to get platforms list clGetPlatformIDs
Traceback (most recent call last):
  File "/home/ml/Projects/pytorch_dlprim/mnist.py", line 162, in <module>
    main()
  File "/home/ml/Projects/pytorch_dlprim/mnist.py", line 148, in main
    model = Net().to(device)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/home/ml/.conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: Invalid Device #0

@artyom-beilis
Owner

OK, from what I see the query of the platforms list fails...

Looks like some kind of OpenCL driver/runtime installation issue.

Can you check which libOpenCL.so build/libpt_ocl.so is linked against, and which one clinfo is linked against? Check whether several libOpenCL.so libraries exist.

Also look into /etc/OpenCL/vendors directory to see which platforms are installed.

@AuroraRAS
Copy link
Author

OK, from what I see the query of the platforms list fails...

Looks like some kind of OpenCL driver/runtime installation issue.

Can you check which libOpenCL.so build/libpt_ocl.so is linked against, and which one clinfo is linked against? Check whether several libOpenCL.so libraries exist.

Also look into /etc/OpenCL/vendors directory to see which platforms are installed.

strace clinfo 2>&1 |grep libOpenCL.so
openat(AT_FDCWD, "/lib64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = 3

strace python mnist.py --device=ocl:0 2>&1 |grep libOpenCL.so
openat(AT_FDCWD, "/lib64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = 4

ls -l -a /etc/OpenCL/vendors

total 8
drwxr-xr-x. 2 root root 41 Jan 17 00:11 .
drwxr-xr-x. 3 root root 21 Jul 22  2022 ..
-rw-r--r--. 1 root root 19 Jan 12 04:39 mesa.icd
-rw-r--r--. 1 root root 22 Jan 12 04:39 rusticl.icd
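For context, the ICD loader (libOpenCL.so) discovers platforms by reading the `.icd` files in that directory; each file contains a single line naming the vendor's actual OpenCL implementation library. A minimal sketch that lists those entries (`list_icd_entries` is a hypothetical helper for illustration, not part of pytorch_dlprim):

```python
import os

def list_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Return the vendor library names declared in *.icd files.

    Each .icd file holds one line naming the vendor's OpenCL
    implementation, e.g. "libMesaOpenCL.so.1" for mesa.icd.
    """
    entries = {}
    if not os.path.isdir(vendors_dir):
        return entries  # no ICD registry at all
    for name in sorted(os.listdir(vendors_dir)):
        if name.endswith(".icd"):
            with open(os.path.join(vendors_dir, name)) as f:
                entries[name] = f.read().strip()
    return entries

if __name__ == "__main__":
    for icd, lib in list_icd_entries().items():
        print(f"{icd} -> {lib}")
```

A conda environment that ships its own ocl-icd loader may look at a different (empty) registry than the system one, which matches the failure seen here.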

@AuroraRAS
Author

I just built a shared library that calls cl::Platform::get(&platforms) and loaded it with CDLL in Python. It works fine with the system Python, but not with the conda Python.

I think the error is caused by the conda environment configuration, not by pytorch_dlprim.
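A similar check can be done without building anything, by calling clGetPlatformIDs directly through ctypes from whichever interpreter you want to test. A hedged sketch (the library name and error handling are assumptions about a standard ICD loader setup, not code from this repository):

```python
import ctypes

def count_platforms(libname="libOpenCL.so.1"):
    """Return the number of OpenCL platforms, None if the ICD
    loader library cannot be opened, or 0 if the query fails
    (e.g. no platforms registered, as in the broken conda env).
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None  # no OpenCL ICD loader found
    num = ctypes.c_uint(0)
    # cl_int clGetPlatformIDs(cl_uint num_entries,
    #                         cl_platform_id *platforms,
    #                         cl_uint *num_platforms)
    err = lib.clGetPlatformIDs(0, None, ctypes.byref(num))
    if err != 0:
        return 0
    return num.value

if __name__ == "__main__":
    print("platforms:", count_platforms())
```

Running this in both the system Python and the conda Python should show whether the two interpreters resolve different loaders.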

@AuroraRAS
Author

AuroraRAS commented Mar 7, 2023

conda install -c conda-forge ocl-icd-system
conda install -c conda-forge pyopencl

It works fine now.
