-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
groupshared memory compute shader debugging #1468
Conversation
Simulates all threads in a thread group so that changes to groupshared memory from other threads are available in the thread of interest.
Just a few high level comments:
|
Thanks for this PR, I appreciate you putting it together. At the moment though while this is something which is fine to live on your own fork if it suits your needs, it's not something I want to merge upstream. As you've found doing a brute-force simulate on all threads is not too large a change, if that were all that was required I would have implemented it before now. As I mentioned in #1054 though I have some concerns with exposing this ability to shoot yourself in the foot to users, and especially as you've done here where it isn't an opt-in option. It needs more careful investigation to get a robust solution. |
I referred to the CPU threads as waves in the code because I was thinking
of each thread as analagous to a GPU wave, but they are not equivalent to
the GPU waves that the threads ran in. If your CPU has 8 hardware threads
and there are 1024 threads in a thread group, you should get 128 threads in
each CPU wave. If you've got 36 hardware threads, you should get 29
threads in 35 of the waves and 9 in the 36th. Either way, large thread
groups should simulate correctly.
I find that debugging a compute shader that uses group sync instructions is
useless if the other threads in the group aren't simulated, because its
result depends on data from the other threads. This implementation only
simulates other threads in the group if there are group sync instructions,
so I think it only adds cost when you need it. Also, many compute shaders
don't use 1024 threads and don't take as long to simulate. That said, I
agree that a cancel button would be nice, and I would look into adding it
if there were interest.
If dxbc_debug.cpp is going to become API agnostic, it does seem like most
of this DebugThread function could be as well, which could also address
moving the lambda somewhere else.
…On Tue, Jul 23, 2019 at 1:53 PM Steve Karolewics ***@***.***> wrote:
Just a few high level comments:
- It feels like the detection of sync instructions would be better
located in DXBC and then retrieved via dxbc->HasAnySyncInstructions() or
something. This code would be easily reused for D3D12 debugging.
- Clamping the wave count to std::thread::hardware_concurrency means
that very large waves (i.e., 1024 threads) may simulate incorrectly. A
single-threaded lockstep or a pool of waves that worker threads pull from
to get them all to sync might be a better option.
- I think having the thread function pulled out to be separate instead
of being in a lambda would benefit readability.
- Given the potential for long runtimes, having this as an option to
enable before beginning a CS debug would help reduce waiting time when the
user doesn't need it.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1468?email_source=notifications&email_token=AEOTURAR366N7NLEYE3IOB3QA5AS5A5CNFSM4IGEQFOKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2T5PKQ#issuecomment-514316202>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEOTURBDSO2AODONNQ7HBBLQA5AS5ANCNFSM4IGEQFOA>
.
|
Would you be interested in a version with a cancel button? Being able to
opt out doesn't seem useful to me because it already only simulates other
threads in the group when they are necessary to get useful results, but
being able to change your mind if you decide it's taking too long could be
nice.
…On Tue, Jul 23, 2019, 14:03 Baldur Karlsson ***@***.***> wrote:
Thanks for this PR, I appreciate you putting it together. At the moment
though while this is something which is fine to live on your own fork if it
suits your needs, it's not something I want to merge upstream.
As you've found doing a brute-force simulate on all threads is not too
large a change, if that were all that was required I would have implemented
it before now. As I mentioned in #1054
<#1054> though I have some
concerns with exposing this ability to shoot yourself in the foot to users,
and especially as you've done here where it isn't an opt-in option. It
needs more careful investigation to get a robust solution.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1468?email_source=notifications&email_token=AEOTURC3KDJL7MDW5LCEXZDQA5BWPA5CNFSM4IGEQFOKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2T6KQQ#issuecomment-514319682>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEOTURGUOTNSL6HGBZBD323QA5BWPANCNFSM4IGEQFOA>
.
|
No that still wouldn't be something I'd want to merge as-is. Even in that case it's still making the debug operation orders of magnitude slower which might not be directly useful if the user is able to debug what they're interested in without accurate cross-thread communication. |
How about a cancel button and a checkbox next to the debug button to turn
the feature on or off?
…On Wed, Jul 24, 2019, 06:26 Baldur Karlsson ***@***.***> wrote:
No that still wouldn't be something I'd want to merge as-is. Even in that
case it's still making the debug operation orders of magnitude slower which
might not be directly useful if the user is able to debug what they're
interested in without accurate cross-thread communication.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1468?email_source=notifications&email_token=AEOTURHQ2BCPNZNWZYVVRXTQBAU6PA5CNFSM4IGEQFOKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2V42LI#issuecomment-514575661>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEOTURG5TDHDAXCKRSANINTQBAU6PANCNFSM4IGEQFOA>
.
|
If the direction was OK but it needed UI changes or other refactors I would ask for those, as I do with other PRs. In this case the broad concept is not something I want to merge currently. One benefit of open source is that not everything has to be merged upstream immediately to be useful, you can keep local changes and customisations while still keeping up to date as needed. |
Sorry, I didn't intend to be pushy. I was overcome by my own enthusiasm
for this feature! I greatly appreciate the open source nature of the
project.
…On Wed, Jul 24, 2019 at 9:10 AM Baldur Karlsson ***@***.***> wrote:
If the direction was OK but it needed UI changes or other refactors I
would ask for those, as I do with other PRs. In this case the broad concept
is not something I want to merge currently. One benefit of open source is
that not everything has to be merged upstream immediately to be useful, you
can keep local changes and customisations while still keeping up to date as
needed.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1468?email_source=notifications&email_token=AEOTURAMJTWRP3ANASHUWN3QBBIEJA5CNFSM4IGEQFOKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2WI56Y#issuecomment-514625275>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEOTURAEOQC25Y6RZ75JQELQBBIEJANCNFSM4IGEQFOA>
.
|
It's fine! I appreciate you prototyping this and I'm glad it worked out for you, I just have to be careful with things I merge upstream. Features are forever and there's a commitment that they will be maintained indefinitely. I don't want to rush into an implementation until I know I can back that up. |
I have 8x8 pixels tiles for light culling and groupshared memory for min/max depth for each tile. An option allowing me debug this properly would be very useful. |
Is there any update to this feature? It would be helpful to have this when debugging compute shaders with groupshared variables. Currently at v1.27 all groupshared values are not accessable and leads to incorrect control flow. |
Please do not comment on old issues or pull requests that have been closed. If you are encountering a bug or are requesting a feature please open a new issue. You can reference this issue if you wish, but opening a new issue prevents any confusion of accidentally linking two unrelated issues and means that each issue can be handled in a clean process. |
Simulates all threads in a thread group so that changes to groupshared memory from other threads are available in the thread of interest.
Description
This is the implementation of simulating all the threads in a thread group across all available cpu cores that I mentioned in response to issue #1054 "Compute shader results from other threads". I made it about a year ago when I needed to debug a compute shader with groupshared memory. It can take a while with a large group size, but it's time well spent if you're trying to understand what your compute shader is doing!
If it detects group sync instructions, it will simulate all the threads in the thread group of the target thread until the point at which they no longer affect the target's results. If threads in another group can write to a location that the simulated group reads then that won't be reflected. Ideally it would detect cases that won't be simulated correctly, and either warn you or simulate everything it needs to, but it doesn't at the moment.
It works by blocking at each thread sync instruction until all threads have arrived. Atomic instructions are inside critical sections. It uses deferred contexts, which turns out to be a bit of nuisance because I have to turn off the single threaded device bit, but also nice, because I don't have to save and restore state on the immediate context. I assume it's going to conflict with pull request #1457 "Move D3D11 API usage out of dxbc_debug.cpp".
It puts up the cancel dialog if it runs for a certain number of instructions. Ideally I think it would have a cancel button next to the "thinking" indicator that you could press at any time, but I haven't done that. It would also be nice if it indicated how many threads it's simulating, so you can get an idea of what you're in for. I suppose it could also give you the option to override simulating everything it thinks you need.