Wrong CUDA backend results (while interpreter ones are good) #141

Closed
wdanilo opened this issue Dec 19, 2013 · 7 comments
@wdanilo

wdanilo commented Dec 19, 2013

Hi!
I have an example that shows a strange problem. I read a BMP image (Word32 per pixel) and decompose it into 4 channels (r, g, b and a, each Word8). Then I compose the channels back into Word32. The result differs from the original image when using the CUDA backend, but is identical to it when using the interpreter.

Could you look at this example and tell me whether this is a bug in Accelerate, or whether I am using operations that are not allowed in the CUDA backend and only work in the interpreter by accident?

Example: https://github.com/wdanilo/AccelerateHS-tests/tree/master/channels

It is based on the Canny Accelerate example (to parse command-line arguments) and there is not much code: all example-specific code is in the 'src' directory.
A sample image is also attached.
To run the example: ./dist/build/test/test small.bmp small2.bmp

The program prints its values to the screen.
When running on CUDA, we get:
Image {_channels = fromList [("rgba",Raw Array (Z :. 3 :. 3) [0,0,0,33153,8454273,8487168,0,0,0])]})
and with the interpreter:
Right (Image {_channels = fromList [("rgba",Raw Array (Z :. 3 :. 3) [4278255615,4294902015,4294967040,4278222976,4286578816,4286611456,4278190335,4278255360,4294901760])]}) (which is the correct value)
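For reference, the round trip described above can be sketched in plain Haskell, without Accelerate. This is only an illustration under assumptions: the names `unpack` and `pack` are hypothetical and are not taken from the linked repository, and the byte order (lowest byte first) is assumed. It shows that splitting a Word32 into four Word8 values and recombining them should return every pixel of the sample image unchanged, which is what the interpreter output shows.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32, Word8)

-- Split a packed 32-bit pixel into four 8-bit channels.
-- Assumption: the lowest byte is the first channel. The round trip
-- holds regardless of which byte is labelled r, g, b or a.
unpack :: Word32 -> (Word8, Word8, Word8, Word8)
unpack p =
  ( fromIntegral p                -- bits 0..7 (fromIntegral truncates)
  , fromIntegral (p `shiftR` 8)   -- bits 8..15
  , fromIntegral (p `shiftR` 16)  -- bits 16..23
  , fromIntegral (p `shiftR` 24)  -- bits 24..31
  )

-- Recombine four 8-bit channels into one packed 32-bit pixel.
pack :: (Word8, Word8, Word8, Word8) -> Word32
pack (c0, c1, c2, c3) =
      fromIntegral c0
  .|. (fromIntegral c1 `shiftL` 8)
  .|. (fromIntegral c2 `shiftL` 16)
  .|. (fromIntegral c3 `shiftL` 24)

-- Pixel values of the 3x3 sample image, as printed by the interpreter.
expected :: [Word32]
expected =
  [ 4278255615, 4294902015, 4294967040
  , 4278222976, 4286578816, 4286611456
  , 4278190335, 4278255360, 4294901760
  ]

main :: IO ()
main = print (map (pack . unpack) expected == expected)
```

Running this prints True, so a backend that returns different values for the same composition is not computing the same function.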

@wdanilo
Author

wdanilo commented Dec 19, 2013

I've simplified the example. All of the code is now in Main.hs (between lines 49 and 65). The error is still the same.

@tmcdonell
Member

It doesn't print anything for me... does that mean it is correct?

What GPU are you using?

@tmcdonell
Member

Actually, if I add print $ ParseArgs.run backend myimg2 as the last thing in main, then I get

Array (Z :. 3 :. 3) [4278255615,4294902015,4294967040,4278222976,4286578816,4286611456,4278190335,4278255360,4294901760]

for both the CUDA and interpreter backends.

This might be #58 ?

@wdanilo
Author

wdanilo commented Dec 20, 2013

@tmcdonell: I'm sorry, of course the line print $ ParseArgs.run backend myimg2 was missing. Running on CUDA, I get wrong results there (you can look at the small2.bmp file, which contains my results from the CUDA computation).

Issue #58 might be related to this one. I'm surprised that you are getting correct results while I'm getting wrong ones. My card has compute capability 1.2. Maybe this is related?

@tmcdonell
Member

I guess that that is it, as I have a compute capability 3.0 card.

I never tracked down the root of the problem with #58. Perhaps texture references for 8- and 16-bit types need to be handled specially in CUDA? Maybe it is a CUDA bug? Testing whether 8-bit texture references work when using plain CUDA is probably the best first step.

@wdanilo
Author

wdanilo commented Dec 20, 2013

@tmcdonell: I do not know plain CUDA well. I'll see what I can do about this :)

tmcdonell added this to the 0.15 release milestone Aug 16, 2014
tmcdonell modified the milestones: 0.15 release, _|_ Aug 23, 2014
@tmcdonell
Member

I think this was just a bug with the (now dead) accelerate-cuda backend; closing.
