Segmented folds crash or give inconsistent results #423
Comments
There seems to be more than one thing going on. On my laptop it produces the correct number, but still crashes on floats. Here is the info from my laptop's GPU:
|
Thanks for the minimal test case! I can reproduce this on my machine, am investigating... |
Can you see if this fix works for you? In your extra-deps:
- git: https://github.com/tmcdonell/accelerate.git
  commit: 442dcbdb8d95407bc650d8f4ce6aa62ab593e484
- git: https://github.com/tmcdonell/accelerate-llvm.git
  commit: 6aacde0ffae37552f5f5bacd1ae3029b14012f8c
  subdirs:
  - 'accelerate-llvm'
  - 'accelerate-llvm-native'
  - 'accelerate-llvm-ptx' |
The only change in the generated assembly (modulo renaming):
Old:
LBB0_25: // %if83.entry
// in Loop: Header=BB0_23 Depth=2
add.s64 %rd30, %rd8, %rd62;
setp.ge.s64 %p13, %rd30, %rd20;
mov.f32 %f68, %f2;
@%p13 bra LBB0_27;
bra.uni LBB0_26;
New:
LBB0_25: // %if83.entry
// in Loop: Header=BB0_23 Depth=2
add.s64 %rd30, %rd8, %rd62;
setp.ge.s64 %p13, %rd30, %rd20;
// implicit-def: %f68
@%p13 bra LBB0_27;
bra.uni LBB0_26; |
It did not fix the bug on my desktop (the box with the Quadro K620). I updated my test repository to reflect pulling these changes. |
Oh, I forgot to mention that you will need to delete the cache directory, |
Deleted the accelerate cache folder; the bug persists. This bug goes farther than it originally seemed. Updated the test repository. |
The tagged commit doesn't entirely fix it, but all the necessary changes were on that branch. |
I am submitting a...
Description
When performing large (not huge) segmented folds with floats, CUDA crashes.
Expected behaviour
Folds should work the same with floats and doubles, within the limits of precision
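To make the expectation concrete, here is a minimal sketch of segmented-sum reference semantics in plain Haskell (base only, no Accelerate or GPU required). The names `foldSegRef` and `relErr`, the sample data, and the `1e-6` tolerance are all assumptions made for illustration; this is not the code from the linked test repository, and it only shows what "the same within the limits of precision" could mean for a result computed once with Float and once with Double.

```haskell
-- Reference semantics for a segmented fold in plain Haskell.
-- All names here are hypothetical and chosen for illustration only.
module Main where

import Data.List (foldl')

-- | Split a flat list into consecutive segments of the given lengths
-- and reduce each segment with the operator and initial value.
foldSegRef :: (a -> a -> a) -> a -> [a] -> [Int] -> [a]
foldSegRef _ _ _  []         = []
foldSegRef f z xs (n : segs) =
  let (seg, rest) = splitAt n xs
  in  foldl' f z seg : foldSegRef f z rest segs

-- | Relative difference between a Float result and a Double result.
relErr :: Float -> Double -> Double
relErr x y = abs (realToFrac x - y) / max 1 (abs y)

main :: IO ()
main = do
  let segs  = [3, 0, 5, 2]                  -- segment lengths (sum = 10)
      valsD = [1 .. 10] :: [Double]
      valsF = map realToFrac valsD :: [Float]
      sumsD = foldSegRef (+) 0 valsD segs
      sumsF = foldSegRef (+) 0 valsF segs
  print sumsD                               -- [6.0,0.0,30.0,19.0]
  print sumsF                               -- [6.0,0.0,30.0,19.0]
  -- Assumed tolerance: the two precisions should agree to about 1e-6.
  print (and (zipWith (\f d -> relErr f d < 1e-6) sumsF sumsD))
```

A GPU result could be checked against such a reference segment by segment, which helps separate a genuinely wrong value from a crash.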
Current behaviour
Floats crash
Steps to reproduce (for bugs)
A minimal project with this bug
https://github.com/electroCutie/AccelerateHs_Bugs
This bug is under the executable foldSegBug.
Your environment
Accelerate version: 1.2.0.0
Accelerate backend(s) used: accelerate-llvm-ptx
GHC version: The Glorious Glasgow Haskell Compilation System, version 8.0.2
Operating system and version: XUbuntu 17.10
Link to your project/example: https://github.com/electroCutie/AccelerateHs_Bugs
If this is a bug with the GPU backend, include the output of nvidia-device-query:
CUDA device query (Driver API, statically linked)
CUDA driver version 9.0
CUDA API version 8.0
Detected 1 CUDA capable device
Device 0: Quadro K620
CUDA capability: 5.0
CUDA cores: 384 cores in 3 multiprocessors (128 cores/MP)
Global memory: 2 GB
Constant memory: 64 kB
Shared memory per block: 48 kB
Registers per block: 65536
Warp size: 32
Maximum threads per multiprocessor: 2048
Maximum threads per block: 1024
Maximum grid dimensions: 2147483647 x 65535 x 65535
Maximum block dimensions: 1024 x 1024 x 64
GPU clock rate: 1.124 GHz
Memory clock rate: 900.0 MHz
Memory bus width: 128-bit
L2 cache size: 2 MB
Maximum texture dimensions
1D: 65536
2D: 65536 x 65536
3D: 4096 x 4096 x 4096
Texture alignment: 512 B
Maximum memory pitch: 2 GB
Concurrent kernel execution: Yes
Concurrent copy and execution: Yes, with 1 copy engine
Runtime limit on kernel execution: Yes
Integrated GPU sharing host memory: No
Host page-locked memory mapping: Yes
ECC memory support: No
Unified addressing (UVA): Yes
PCI bus/location: 3/0
Compute mode: Default
Multiple contexts are allowed on the device simultaneously