Crash with OpenCL backend with cpu on macos #88

Closed
gsobala opened this issue Jun 16, 2018 · 8 comments
Labels
bug Something isn't working

Comments

@gsobala
Contributor

gsobala commented Jun 16, 2018

Trying to use the OpenCL backend with the CPU rather than the GPU causes a crash after lc0 tries to tune for the CPU:

Georges-iMac-Pro-2:pgn george$ DYLD_LIBRARY_PATH=/opt/intel/mkl/lib/ lc0 --backend=opencl "--backend-opts=gpu=0" -t 8
       _
|   _ | |
|_ |_ |_| built Jun 15 2018
isready
Found network file: ./338.txt
Creating backend [opencl]...
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (Mar 15 2018 15:35:11)
Platform profile: FULL_PROFILE
Platform name:    Apple
Platform vendor:  Apple
Device ID:     0
Device name:   Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
Device type:   CPU
Device vendor: Intel
Device driver: 1.1
Device speed:  3200 MHz
Device cores:  16 CU
Device score:  512
Device ID:     1
Device name:   AMD Radeon Pro Vega 56 Compute Engine
Device type:   GPU
Device vendor: AMD
Device driver: 1.2 (May  8 2018 15:49:10)
Device speed:  1250 MHz
Device cores:  56 CU
Device score:  1112
Selected platform: Apple
Selected device: Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
Will try 578 valid configurations.
Failed to find a working configuration.
Check your OpenCL drivers.
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Tuner failed to find working configuration.
Abort trap: 6

The early versions of lczero did not attempt to tune when cpu was the detected device. Later versions did and crashed in the same way.

@frpays
Contributor

frpays commented Jun 19, 2018

Apparently it crashes because all the kernel variants are rejected due to an invalid group size (to be confirmed).
One more data point: I was able to start the OpenCL backend on device 0 (8 virtual cores) on Windows 10. The device driver was OpenCL 2.0.
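
To illustrate the kind of rejection described above, here is a minimal sketch of a group-size check. It is not lc0's tuner code: `DeviceLimits`, `SgemmConfig`, and `FitsDevice` are hypothetical names, and the launch shape `{MDIMC, NDIMC, 1}` is an assumption inferred from the tuner's parameter names. The limits stand in for what a device reports as `CL_DEVICE_MAX_WORK_GROUP_SIZE` and `CL_DEVICE_MAX_WORK_ITEM_SIZES`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the limits an OpenCL device reports
// (CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_ITEM_SIZES).
struct DeviceLimits {
  std::size_t max_work_group_size;
  std::array<std::size_t, 3> max_work_item_sizes;
};

// Hypothetical subset of one tuner configuration.
struct SgemmConfig {
  std::size_t mdimc;
  std::size_t ndimc;
};

// Assumption: the kernel is launched with a local size of {MDIMC, NDIMC, 1}.
// A configuration is usable only if every dimension and the total
// work-group size stay within the device limits.
bool FitsDevice(const SgemmConfig& cfg, const DeviceLimits& dev) {
  assert(cfg.mdimc > 0 && cfg.ndimc > 0);
  const std::array<std::size_t, 3> local = {cfg.mdimc, cfg.ndimc, 1};
  for (std::size_t i = 0; i < local.size(); ++i) {
    if (local[i] > dev.max_work_item_sizes[i]) return false;
  }
  return cfg.mdimc * cfg.ndimc <= dev.max_work_group_size;
}
```

If a device only allowed one work item in its second dimension (a hypothetical limit of `{1024, 1, 1}`), every configuration with `NDIMC` greater than 1 would fail this check, which would match a tuner run where all variants are rejected.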

@frpays added the bug and lc0 labels Jun 25, 2018
@kostya

kostya commented Aug 27, 2018

I have the same bug on my MacBook:
./build/release/lc0 -w 11157.pb

       _
|   _ | |
|_ |_ |_| v0.17.0-rc2 built Aug 27 2018
go nodes 1
Loading weights file from: 11157.pb
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (May 24 2018 20:07:03)
Platform profile: FULL_PROFILE
Platform name:    Apple
Platform vendor:  Apple
Device ID:      0
Device name:    Intel(R) Core(TM) i5 CPU       M 540  @ 2.53GHz
Device type:    CPU
Device vendor:  Intel
Device driver:  1.1
Device speed:   2530 MHZ
Device cores:   4 CU
Device score:   512
Device ID:      1
Device name:    GeForce GT 330M
Device type:    GPU
Device vendor:  NVIDIA
Device driver:  10.4.14 310.90.30.05b27
Device speed:   1100 MHZ
Device cores:   6 CU
Device score:   1112
Selected platform: Apple
Selected device: GeForce GT 330M
with OpenCL 1.2 capability.
Started OpenCL SGEMM tuner with batch size 256.
Will try 578 valid configurations.
Failed to find a working configuration.
Check your OpenCL drivers.
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Tuner failed to find working configuration.
Abort trap: 6

@oscardssmith
Contributor

Is this still an issue?

@KishanBagaria

KishanBagaria commented Dec 30, 2018

I get this too on my Mac Pro and MacBook Pro. Tried both the release/0.19 and master branch.

This seems similar to leela-zero/leela-zero#1632 which was fixed in leela-zero/leela-zero#1633

Lc0 client version 19
2018/12/30 11:14:25 lc0_main.go:666: serverParams: [--visits=800 --cpuct=2.5 --resign-percentage=4.0 --resign-playthrough=20 --temperature=1.2 --temp-endgame=0.45 --temp-cutoff-move=16 --temp-visit-offset=-0.25 --fpu-strategy=absolute]
Args: [/Users/kishan/Downloads/lc0/build/release/lc0 selfplay --backend-opts=backend=opencl,gpu=0 --visits=800 --cpuct=2.5 --resign-percentage=4.0 --resign-playthrough=20 --temperature=1.2 --temp-endgame=0.45 --temp-cutoff-move=16 --temp-visit-offset=-0.25 --fpu-strategy=absolute --training=true --weights=networks/af2027d8113e3a3f5ba50b5756386a7001bbb59be7e8f91f881a07190f4a438d]
       _
|   _ | |
|_ |_ |_| v0.21.0-dev built Dec 30 2018
id name Lc0 v0.21.0-dev
id author The LCZero Authors.
Loading weights file from: networks/af2027d8113e3a3f5ba50b5756386a7001bbb59be7e8f91f881a07190f4a438d
Creating backend [multiplexing]...
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (Oct 11 2018 21:04:03)
Platform profile: FULL_PROFILE
Platform name:    Apple
Platform vendor:  Apple
Device ID:      0
Device name:    Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
Device type:    CPU
Device vendor:  Intel
Device driver:  1.1
Device speed:   3500 MHZ
Device cores:   12 CU
Device score:   512
Device ID:      1
Device name:    AMD Radeon HD - FirePro D500 Compute Engine
Device type:    GPU
Device vendor:  AMD
Device driver:  1.2 (Oct 16 2018 21:18:14)
Device speed:   150 MHZ
Device cores:   24 CU
Device score:   1112
Device ID:      2
Device name:    AMD Radeon HD - FirePro D500 Compute Engine
Device type:    GPU
Device vendor:  AMD
Device driver:  1.2 (Oct 16 2018 21:18:14)
Device speed:   150 MHZ
Device cores:   24 CU
Device score:   1112
Selected platform: Apple
Selected device: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
with OpenCL 1.2 capability.
Started OpenCL SGEMM tuner with batch size 16.
Will try 578 valid configurations.
Failed to find a working configuration.
Check your OpenCL drivers.
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Tuner failed to find working configuration.
2018/12/30 11:14:29 lc0_main.go:515: BestMove channel closed unexpectedly, exiting train loop
2018/12/30 11:14:29 lc0_main.go:515: BestMove channel closed unexpectedly, exiting train loop
2018/12/30 11:14:29 lc0_main.go:515: BestMove channel closed unexpectedly, exiting train loop
2018/12/30 11:14:29 lc0_main.go:515: BestMove channel closed unexpectedly, exiting train loop
2018/12/30 11:14:29 lc0_main.go:526: GameInfo channel closed, exiting train loop
2018/12/30 11:14:29 lc0_main.go:543: Waiting for lc0 to stop
lc0 exited with: signal: abort trap2018/12/30 11:14:29 lc0_main.go:548: lc0 stopped
2018/12/30 11:14:29 lc0_main.go:550: Waiting for uploads to complete
2018/12/30 11:14:29 lc0_main.go:818: Client self-exited without producing any games.
2018/12/30 11:14:29 lc0_main.go:819: Sleeping for 30 seconds...

@frpays
Contributor

frpays commented Jan 8, 2019

I don't think this is an issue.

The OpenCL backend cannot use just any declared OpenCL device; there are some minimal requirements. If the tuning step cannot find any working configuration, it is probably because the device's capabilities are too low (generally a small local work size).

Note that just skipping the tuning step on CPU, as suggested, won't work: we need a working configuration to compile the kernels.

Finally, trying to make the OpenCL backend work on CPU is probably not very productive, as the BLAS backend will certainly get much more nps out of the device.
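
As a rough illustration of why a configuration is needed before the kernels can be compiled, the sketch below turns a parameter list like the ones the tuner prints (`KWG=32 KWI=2 ...`) into compiler options. This assumes the parameters reach the OpenCL compiler as `-D` preprocessor defines, which is an assumption about the backend and not a confirmed detail; `ConfigToDefines` is a hypothetical helper, not an lc0 function.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: turns "KWG=32 KWI=2" into "-DKWG=32 -DKWI=2".
// Assumption: tuner parameters are handed to the OpenCL compiler as
// preprocessor defines when the kernel source is built.
std::string ConfigToDefines(const std::string& config) {
  std::istringstream in(config);
  std::string token;
  std::string out;
  while (in >> token) {
    // Each whitespace-separated token must look like NAME=VALUE.
    assert(token.find('=') != std::string::npos);
    if (!out.empty()) out += ' ';
    out += "-D" + token;
  }
  return out;
}
```

Under that assumption, an empty configuration produces no defines at all, so kernel source that depends on those macros would have nothing to build against.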

@mooskagh removed the lc0 label Jan 21, 2019
@twoplan

twoplan commented May 8, 2019

Is it the same issue here?

maxs-Air:Lc0 max$ ./lc0-021.2 -w weights_run1_42272.pb.gz 
       _
|   _ | |
|_ |_ |_| v0.21.2-rc1 built May  7 2019
go nodes 100
Loading weights file from: weights_run1_42272.pb.gz
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (May 24 2018 22:33:53)
Platform profile: FULL_PROFILE
Platform name:    Apple
Platform vendor:  Apple
Device ID:      0
Device name:    Intel(R) Core(TM) i5-5350U CPU @ 1.80GHz
Device type:    CPU
Device vendor:  Intel
Device driver:  1.1
Device speed:   1800 MHZ
Device cores:   4 CU
Device score:   512
Device ID:      1
Device name:    Intel(R) Iris(TM) Graphics 6100
Device type:    GPU
Device vendor:  Intel Inc.
Device driver:  1.2(Feb 27 2019 02:17:35)
Device speed:   1000 MHZ
Device cores:   48 CU
Device score:   612
Selected platform: Apple
Selected device: Intel(R) Iris(TM) Graphics 6100
with OpenCL 1.2 capability.
Started OpenCL SGEMM tuner with batch size 16.
Will try 578 valid configurations.
(1/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 20428.1 us (26.3 GFLOPS)
(2/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 12921.8 us (41.5 GFLOPS)
(4/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 10767.5 us (49.9 GFLOPS)
(5/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 7628.6 us (70.4 GFLOPS)
(42/578) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=16 NDIMC=16 NWG=64 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 7621.6 us (70.4 GFLOPS)
(69/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 6047.7 us (88.8 GFLOPS)
(70/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 5100.3 us (105.3 GFLOPS)
(91/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=16 NDIMC=16 NWG=64 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 5086.5 us (105.5 GFLOPS)
(116/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 3487.5 us (153.9 GFLOPS)
(199/578) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=2 3434.6 us (156.3 GFLOPS)
(224/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=2 3157.7 us (170.0 GFLOPS)
(267/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=64 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=4 3065.7 us (175.1 GFLOPS)
(272/578) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=4 3015.1 us (178.1 GFLOPS)
(286/578) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=4 2593.4 us (207.0 GFLOPS)
(517/578) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=2 2432.2 us (220.7 GFLOPS)
(575/578) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=4 2386.4 us (225.0 GFLOPS)
Wavefront/Warp size: 8

 workgroup size: 256
 workgroup dimensions: 256 256 256
info depth 1 seldepth 2 time 11464 nodes 4 score cp 20 hashfull 0 nps 0 tbhits 0 pv e2e4 e7e5
info depth 1 seldepth 2 time 16473 nodes 4 score cp 20 hashfull 0 nps 0 tbhits 0 pv e2e4 e7e5 h2h4
Abort trap: 6

@gennaro-tedesco

I am seeing the exact same problem: does this have a known fix by now?

@frpays
Contributor

frpays commented Aug 9, 2019

This issue is most likely a duplicate of #126 and a few others. The pattern is typical: the crash occurs after the search is over. Ending the search triggers a thread_local destructor which, in turn, deallocates OpenCL resources, and that deallocation crashes on some platforms.

This is solved by #516 which was merged into lc0-v0.19.0.

I will now close this case. If you feel you have a similar problem, open a new issue, stating everything (platform, lc0 version, steps leading to the crash, etc.). Thanks in advance.
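
For readers unfamiliar with the mechanism, here is a minimal standalone sketch of the ordering described above: a `thread_local` object is destroyed when its thread exits, that is, after the thread's work (here, the "search") has already finished. `ThreadResources`, `SearchWorker`, and `RunSearchThread` are hypothetical stand-ins, not lc0 code, and no OpenCL is involved.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Events recorded in order. Only the worker thread touches this vector
// while it runs; the caller reads it after join().
std::vector<std::string> g_events;

// Hypothetical stand-in for per-thread OpenCL state.
struct ThreadResources {
  ThreadResources() { g_events.push_back("acquire"); }
  ~ThreadResources() { g_events.push_back("release"); }
};

void SearchWorker() {
  // Constructed the first time this thread reaches the declaration,
  // destroyed when the thread exits.
  thread_local ThreadResources resources;
  (void)resources;
  g_events.push_back("search done");
}

// Runs one worker thread to completion and returns the recorded events.
std::vector<std::string> RunSearchThread() {
  g_events.clear();
  std::thread worker(SearchWorker);
  worker.join();
  assert(g_events.size() == 3);
  return g_events;
}
```

The release step runs after "search done" and before `join()` returns, so any failure inside it shows up as a crash at the very end of a search.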
