Originally reported here: https://groups.google.com/d/msg/thrust-users/X7-FEDtKfBo/4wVMgfGgBgAJ
Here's a self-contained example showing a bug with the latest Thrust (I've tried both the version included with CUDA 7.5 RC and the latest from the master branch of the repo, which includes a recent fix for inclusive_scan): https://gist.github.com/eglaser77/756e5a9234cf0f08a3fb
I build it with the command:
/usr/local/cuda/bin/nvcc -arch=sm_30 thrust_test.cu -o thrust_test -I/usr/local/cuda/include -g -L/usr/local/cuda/lib64/ -lcuda -lcudart
I am trying to get the locations of the 'true' values in a stencil. The first method uses thrust::inclusive_scan followed by thrust::upper_bound. It works with host vectors but gives wrong results when run with device vectors on the GPU. The second method uses thrust::copy_if and works fine. I get the same results on a Quadro K2100M and a GeForce GTX 750 Ti.
Here's the output I get (hindices1 are from the inclusive_scan/upper_bound method; hindices2 are from copy_if):
i: 0 stencil_location: 467508 hindices1: 467508 hindices2: 467508
i: 1 stencil_location: 1326441 hindices1: 1326441 hindices2: 1326441
i: 2 stencil_location: 1541662 hindices1: 1541662 hindices2: 1541662
i: 3 stencil_location: 1679866 hindices1: 1679866 hindices2: 1679866
i: 4 stencil_location: 2234773 hindices1: 2234773 hindices2: 2234773
i: 5 stencil_location: 2387355 hindices1: 2387355 hindices2: 2387355
i: 6 stencil_location: 2653762 hindices1: 2653762 hindices2: 2653762
i: 7 stencil_location: 3159732 hindices1: 3159732 hindices2: 3159732
i: 8 stencil_location: 3226888 hindices1: 3226888 hindices2: 3226888
i: 9 stencil_location: 3828014 hindices1: 3828014 hindices2: 3828014
i: 10 stencil_location: 3887644 hindices1: 3887644 hindices2: 3887644
i: 11 stencil_location: 3909417 hindices1: 3909417 hindices2: 3909417
i: 12 stencil_location: 3924245 hindices1: 3924245 hindices2: 3924245
i: 13 stencil_location: 4042273 hindices1: 4233776 hindices2: 4042273
i: 14 stencil_location: 4150580 hindices1: 4446033 hindices2: 4150580
i: 15 stencil_location: 4233776 hindices1: 4484984 hindices2: 4233776
i: 16 stencil_location: 4425058 hindices1: 4836990 hindices2: 4425058
i: 17 stencil_location: 4446033 hindices1: 5328271 hindices2: 4446033
i: 18 stencil_location: 4484984 hindices1: 5483482 hindices2: 4484984
i: 19 stencil_location: 4565655 hindices1: 5755194 hindices2: 4565655
i: 20 stencil_location: 4629464 hindices1: 5781566 hindices2: 4629464
i: 21 stencil_location: 4703190 hindices1: 5987753 hindices2: 4703190
i: 22 stencil_location: 4836990 hindices1: 8000000 hindices2: 4836990
i: 23 stencil_location: 4903165 hindices1: 8000000 hindices2: 4903165
i: 24 stencil_location: 4910365 hindices1: 8000000 hindices2: 4910365
i: 25 stencil_location: 5328271 hindices1: 8000000 hindices2: 5328271
i: 26 stencil_location: 5483482 hindices1: 8000000 hindices2: 5483482
i: 27 stencil_location: 5755194 hindices1: 8000000 hindices2: 5755194
i: 28 stencil_location: 5781566 hindices1: 8000000 hindices2: 5781566
i: 29 stencil_location: 5966710 hindices1: 8000000 hindices2: 5966710
i: 30 stencil_location: 5987753 hindices1: 8000000 hindices2: 5987753
i: 31 stencil_location: 7870669 hindices1: 8000000 hindices2: 7870669
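For anyone who doesn't want to open the gist, here is a minimal host-only sketch of the two methods being compared. It is not the code from the gist: it uses the C++ standard library counterparts (std::partial_sum in place of thrust::inclusive_scan, plus std::upper_bound and std::copy_if), and the function names indices_via_scan and indices_via_copy_if are made up for illustration. It assumes the stencil holds only 0 and 1.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Method 1 (sketch): inclusive scan of the stencil, then one upper_bound per
// output slot. After the scan, scanned[i] is the number of 'true' entries in
// stencil[0..i], so the k-th true entry (0-based) is at the first i where
// scanned[i] > k. This only works if scanned is non-decreasing.
std::vector<int> indices_via_scan(const std::vector<int>& stencil) {
    std::vector<int> scanned(stencil.size());
    std::partial_sum(stencil.begin(), stencil.end(), scanned.begin());
    assert(std::is_sorted(scanned.begin(), scanned.end()));

    const int count = scanned.empty() ? 0 : scanned.back();
    std::vector<int> indices(count);
    for (int k = 0; k < count; ++k) {
        auto it = std::upper_bound(scanned.begin(), scanned.end(), k);
        indices[k] = static_cast<int>(it - scanned.begin());
    }
    return indices;
}

// Method 2 (sketch): copy every index whose stencil entry is non-zero.
std::vector<int> indices_via_copy_if(const std::vector<int>& stencil) {
    std::vector<int> all(stencil.size());
    std::iota(all.begin(), all.end(), 0);

    std::vector<int> indices;
    std::copy_if(all.begin(), all.end(), std::back_inserter(indices),
                 [&stencil](int i) { return stencil[i] != 0; });
    return indices;
}
```

If the scan result is not non-decreasing, std::upper_bound has no valid sorted range to search and can return the end of the range, which would be consistent with the hindices1 values of 8000000 in the output above.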
The problem appears to be in the inclusive_scan call. When I examine the scanned values, I see that they are not monotonically non-decreasing, as I would expect from a running count of 'true' values: at several points the count drops back. Printing out where the scanned values change, I get the following:
i: 467508 hscanned[i]: 1
i: 1326441 hscanned[i]: 2
i: 1541662 hscanned[i]: 3
i: 1679866 hscanned[i]: 4
i: 2234773 hscanned[i]: 5
i: 2387355 hscanned[i]: 6
i: 2653762 hscanned[i]: 7
i: 3159732 hscanned[i]: 8
i: 3226888 hscanned[i]: 9
i: 3828014 hscanned[i]: 10
i: 3887644 hscanned[i]: 11
i: 3909417 hscanned[i]: 12
i: 3924245 hscanned[i]: 13
i: 4008960 hscanned[i]: 11
i: 4042273 hscanned[i]: 12
i: 4150580 hscanned[i]: 13
i: 4233776 hscanned[i]: 14
i: 4276224 hscanned[i]: 13
i: 4425058 hscanned[i]: 14
i: 4446033 hscanned[i]: 15
i: 4484984 hscanned[i]: 16
i: 4543488 hscanned[i]: 14
i: 4565655 hscanned[i]: 15
i: 4629464 hscanned[i]: 16
i: 4677120 hscanned[i]: 15
i: 4703190 hscanned[i]: 16
i: 4836990 hscanned[i]: 17
i: 4903165 hscanned[i]: 18
i: 4910365 hscanned[i]: 19
i: 4944384 hscanned[i]: 17
i: 5328271 hscanned[i]: 18
i: 5483482 hscanned[i]: 19
i: 5755194 hscanned[i]: 20
i: 5781566 hscanned[i]: 21
i: 5879808 hscanned[i]: 20
i: 5966710 hscanned[i]: 21
i: 5987753 hscanned[i]: 22
i: 6013440 hscanned[i]: 21
i: 7870669 hscanned[i]: 22
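A check like the following is one way to locate the first bad element once the scanned vector has been copied back to the host. This is a sketch, not code from the gist, and the name first_decrease is hypothetical; it relies on std::is_sorted_until, which returns an iterator to the first element that is smaller than its predecessor.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the index of the first element that is smaller than the element
// before it, or scanned.size() if the whole vector is non-decreasing.
std::size_t first_decrease(const std::vector<int>& scanned) {
    auto it = std::is_sorted_until(scanned.begin(), scanned.end());
    return static_cast<std::size_t>(it - scanned.begin());
}
```

In the listing above, every index where the value drops (4008960, 4276224, 4543488, 4677120, 4944384, 5879808, 6013440) is a multiple of 133632. That may point to an issue at the boundaries between chunks of the scan, but I have not confirmed this.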