Use Collection for stack allocator backend #169
This eliminates the need for the host allocator, storage class, etc. I refactored the demo interactor detector code as part of it, as well as some of the runner code that needed to change as a consequence. Additional changes were required in the Livermore interactor since the stack allocator now has a template parameter on memory space. I refactored some parts of the Livermore atomic relaxation to reduce propagation of those template parameters and to improve program flow.
Another data point: it does not crash on wc.fnal.gov with nvcc V11.1.105. cuda-gdb with CUDA memcheck enabled does not complain, and cuda-memcheck fails after printing the output JSON with the (seemingly unrelated):
Thanks @pcanal for the check. I spent another hour progressively backing out the changes that might have affected the initialize kernel or that translation unit in general... nothing seems to fix it. :(
I can see how this became a big job, especially integrating everything with the atomic relaxation/photoelectric classes, but the changes look great! Sounds like that error is looking more and more like a compiler bug?
This changes the stack allocator interface to be compatible with both host and device memory using Collections.
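To illustrate the idea of a stack allocator with a template parameter on memory space, here is a minimal host-only sketch. All names (`MemSpace`, `StackStorage`, `StackAllocator`) are hypothetical and chosen for illustration; this is not the Collection-based implementation in this PR, and the device specialization is deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag for where the backing memory lives.
enum class MemSpace { host, device };

// Backing storage for a fixed-capacity stack, selected by memory space.
// Only the host specialization is sketched here.
template<class T, MemSpace M>
struct StackStorage;

template<class T>
struct StackStorage<T, MemSpace::host>
{
    std::vector<T> data;               // fixed-capacity buffer
    std::atomic<std::size_t> size{0};  // number of elements handed out

    explicit StackStorage(std::size_t capacity) : data(capacity) {}
};

// Non-owning allocator view over the storage. The memory space is a
// template parameter, so callers must carry it in their own types.
template<class T, MemSpace M>
class StackAllocator
{
  public:
    explicit StackAllocator(StackStorage<T, M>& storage) : storage_(storage) {}

    // Reserve `count` contiguous elements; return nullptr if they do not fit.
    T* operator()(std::size_t count)
    {
        assert(count > 0);
        std::size_t start = storage_.size.fetch_add(count);
        if (start + count > storage_.data.size())
        {
            // Out of space: undo the reservation so size stays valid.
            storage_.size.fetch_sub(count);
            return nullptr;
        }
        return storage_.data.data() + start;
    }

    std::size_t size() const { return storage_.size.load(); }
    std::size_t capacity() const { return storage_.data.size(); }

  private:
    StackStorage<T, M>& storage_;
};
```

Because the memory space appears in the allocator's type, any class that stores an allocator has to be templated on it too, which is why limiting how far that parameter propagates is worthwhile.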
Runtime error on CUDA 10.1
The errors below only occur on CUDA 10.1, not 10.2 or later. @jefflarkin thinks this may be a bug in NVCC since it only shows up in 10.1.
Currently I'm encountering a really bizarre error when building in debug on wildstyle (with no dependencies aside from JSON):
Inside the ParticleTrackView, "params_" is apparently pointing to invalid memory:
...and at the kernel level the "params" variable doesn't even exist, and the states look corrupted (shifted by 1 maybe)?:
In comparison, on emmet:
The "initialize" kernel in which it's happening should NOT have been affected by any of the changes I made at all. This sort of off-by-one thing makes me wonder if it's a similar error to the corrupted ELF data seen in #118? I've spent two hours staring at this mess and carefully reviewing my code, and have nothing. The changes to the stack allocator were extensive enough that it's going to be difficult or impossible to bisect my changes to see where exactly I went wrong.