groupshared memory compute shader debugging #1468

bredbored · 2019-07-23T13:40:27Z

Simulates all threads in a thread group so that changes to groupshared memory from other threads are available in the thread of interest.

Description

This is the implementation of simulating all the threads in a thread group across all available cpu cores that I mentioned in response to issue #1054 "Compute shader results from other threads". I made it about a year ago when I needed to debug a compute shader with groupshared memory. It can take a while with a large group size, but it's time well spent if you're trying to understand what your compute shader is doing!

If it detects group sync instructions, it will simulate all the threads in the thread group of the target thread until the point at which they no longer affect the target's results. If threads in another group can write to a location that the simulated group reads then that won't be reflected. Ideally it would detect cases that won't be simulated correctly, and either warn you or simulate everything it needs to, but it doesn't at the moment.

It works by blocking at each thread sync instruction until all threads have arrived. Atomic instructions are inside critical sections. It uses deferred contexts, which turns out to be a bit of nuisance because I have to turn off the single threaded device bit, but also nice, because I don't have to save and restore state on the immediate context. I assume it's going to conflict with pull request #1457 "Move D3D11 API usage out of dxbc_debug.cpp".

It puts up the cancel dialog if it runs for a certain number of instructions. Ideally I think it would have a cancel button next to the "thinking" indicator that you could press at any time, but I haven't done that. It would also be nice if it indicated how many threads it's simulating, so you can get an idea of what you're in for. I suppose it could also give you the option to override simulating everything it thinks you need.

Simulates all threads in a thread group so that changes to groupshared memory from other threads are available in the thread of interest.

skarolewics · 2019-07-23T17:53:35Z

Just a few high level comments:

It feels like the detection of sync instructions would be better located in DXBC and then retrieved via dxbc->HasAnySyncInstructions() or something. This code would be easily reused for D3D12 debugging.
Clamping the wave count to std::thread::hardware_concurrency means that very large waves (i.e., 1024 threads) may simulate incorrectly. A single-threaded lockstep or a pool of waves that worker threads pull from to get them all to sync might be a better option.
I think having the thread function pulled out to be separate instead of being in a lambda would benefit readability.
Given the potential for long runtimes, having this as an option to enable before beginning a CS debug would help reduce waiting time when the user doesn't need it.

baldurk · 2019-07-23T18:03:04Z

Thanks for this PR, I appreciate you putting it together. At the moment though while this is something which is fine to live on your own fork if it suits your needs, it's not something I want to merge upstream.

As you've found doing a brute-force simulate on all threads is not too large a change, if that were all that was required I would have implemented it before now. As I mentioned in #1054 though I have some concerns with exposing this ability to shoot yourself in the foot to users, and especially as you've done here where it isn't an opt-in option. It needs more careful investigation to get a robust solution.

bredbored · 2019-07-23T18:52:30Z

I referred to the CPU threads as waves in the code because I was thinking of each thread as analagous to a GPU wave, but they are not equivalent to the GPU waves that the threads ran in. If your CPU has 8 hardware threads and there are 1024 threads in a thread group, you should get 128 threads in each CPU wave. If you've got 36 hardware threads, you should get 29 threads in 35 of the waves and 9 in the 36th. Either way, large thread groups should simulate correctly. I find that debugging a compute shader that uses group sync instructions is useless if the other threads in the group aren't simulated, because its result depends on data from the other threads. This implementation only simulates other threads in the group if there are group sync instructions, so I think it only adds cost when you need it. Also, many compute shaders don't use 1024 threads and don't take as long to simulate. That said, I agree that a cancel button would be nice, and I would look into adding it if there were interest. If dxbc_debug.cpp is going to become API agnostic, it does seem like most of this DebugThread function could be as well, which could also address moving the lambda somewhere else.

…

On Tue, Jul 23, 2019 at 1:53 PM Steve Karolewics ***@***.***> wrote: Just a few high level comments: - It feels like the detection of sync instructions would be better located in DXBC and then retrieved via dxbc->HasAnySyncInstructions() or something. This code would be easily reused for D3D12 debugging. - Clamping the wave count to std::thread::hardware_concurrency means that very large waves (i.e., 1024 threads) may simulate incorrectly. A single-threaded lockstep or a pool of waves that worker threads pull from to get them all to sync might be a better option. - I think having the thread function pulled out to be separate instead of being in a lambda would benefit readability. - Given the potential for long runtimes, having this as an option to enable before beginning a CS debug would help reduce waiting time when the user doesn't need it. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1468?email_source=notifications&email_token=AEOTURAR366N7NLEYE3IOB3QA5AS5A5CNFSM4IGEQFOKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2T5PKQ#issuecomment-514316202>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEOTURBDSO2AODONNQ7HBBLQA5AS5ANCNFSM4IGEQFOA> .

bredbored · 2019-07-24T05:24:53Z

Would you be interested in a version with a cancel button? Being able to opt out doesn't seem useful to me because it already only simulates other threads in the group when they are necessary to get useful results, but being able to change your mind if you decide it's taking too long could be nice.

…

On Tue, Jul 23, 2019, 14:03 Baldur Karlsson ***@***.***> wrote: Thanks for this PR, I appreciate you putting it together. At the moment though while this is something which is fine to live on your own fork if it suits your needs, it's not something I want to merge upstream. As you've found doing a brute-force simulate on all threads is not too large a change, if that were all that was required I would have implemented it before now. As I mentioned in #1054 <#1054> though I have some concerns with exposing this ability to shoot yourself in the foot to users, and especially as you've done here where it isn't an opt-in option. It needs more careful investigation to get a robust solution. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1468?email_source=notifications&email_token=AEOTURC3KDJL7MDW5LCEXZDQA5BWPA5CNFSM4IGEQFOKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2T6KQQ#issuecomment-514319682>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEOTURGUOTNSL6HGBZBD323QA5BWPANCNFSM4IGEQFOA> .

baldurk · 2019-07-24T10:26:33Z

No that still wouldn't be something I'd want to merge as-is. Even in that case it's still making the debug operation orders of magnitude slower which might not be directly useful if the user is able to debug what they're interested in without accurate cross-thread communication.

bredbored · 2019-07-24T12:58:29Z

How about a cancel button and a checkbox next to the debug button to turn the feature on or off?

…

On Wed, Jul 24, 2019, 06:26 Baldur Karlsson ***@***.***> wrote: No that still wouldn't be something I'd want to merge as-is. Even in that case it's still making the debug operation orders of magnitude slower which might not be directly useful if the user is able to debug what they're interested in without accurate cross-thread communication. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1468?email_source=notifications&email_token=AEOTURHQ2BCPNZNWZYVVRXTQBAU6PA5CNFSM4IGEQFOKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2V42LI#issuecomment-514575661>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEOTURG5TDHDAXCKRSANINTQBAU6PANCNFSM4IGEQFOA> .

baldurk · 2019-07-24T13:10:14Z

If the direction was OK but it needed UI changes or other refactors I would ask for those, as I do with other PRs. In this case the broad concept is not something I want to merge currently. One benefit of open source is that not everything has to be merged upstream immediately to be useful, you can keep local changes and customisations while still keeping up to date as needed.

bredbored · 2019-07-24T13:52:39Z

Sorry, I didn't intend to be pushy. I was overcome by my own enthusiasm for this feature! I greatly appreciate the open source nature of the project.

…

On Wed, Jul 24, 2019 at 9:10 AM Baldur Karlsson ***@***.***> wrote: If the direction was OK but it needed UI changes or other refactors I would ask for those, as I do with other PRs. In this case the broad concept is not something I want to merge currently. One benefit of open source is that not everything has to be merged upstream immediately to be useful, you can keep local changes and customisations while still keeping up to date as needed. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1468?email_source=notifications&email_token=AEOTURAMJTWRP3ANASHUWN3QBBIEJA5CNFSM4IGEQFOKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2WI56Y#issuecomment-514625275>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEOTURAEOQC25Y6RZ75JQELQBBIEJANCNFSM4IGEQFOA> .

baldurk · 2019-07-24T14:02:42Z

It's fine! I appreciate you prototyping this and I'm glad it worked out for you, I just have to be careful with things I merge upstream. Features are forever and there's a commitment that they will be maintained indefinitely. I don't want to rush into an implementation until I know I can back that up.

WoodyPWX · 2021-03-29T12:36:45Z

I have 8x8 pixels tiles for light culling and groupshared memory for min/max depth for each tile. An option allowing me debug this properly would be very useful.

APassbyDreg · 2023-08-02T07:22:24Z

Is there any update to this feature? It would be helpful to have this when debugging compute shaders with groupshared variables. Currently at v1.27 all groupshared values are not accessable and leads to incorrect control flow.

baldurk · 2023-08-02T09:13:15Z

Please do not comment on old issues or pull requests that have been closed.

If you are encountering a bug or are requesting a feature please open a new issue. You can reference this issue if you wish, but opening a new issue prevents any confusion of accidentally linking two unrelated issues and means that each issue can be handled in a clean process.

bredbored force-pushed the groupshared branch from ac9f2cf to 94fa137 Compare July 23, 2019 14:11

groupshared memory compute shader debugging

4a6b66c

Simulates all threads in a thread group so that changes to groupshared memory from other threads are available in the thread of interest.

bredbored force-pushed the groupshared branch from 94fa137 to 4a6b66c Compare July 23, 2019 14:16

baldurk closed this Jul 23, 2019

Repository owner locked as resolved and limited conversation to collaborators Aug 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

groupshared memory compute shader debugging #1468

groupshared memory compute shader debugging #1468

bredbored commented Jul 23, 2019

skarolewics commented Jul 23, 2019

baldurk commented Jul 23, 2019

bredbored commented Jul 23, 2019 via email

bredbored commented Jul 24, 2019 via email

baldurk commented Jul 24, 2019

bredbored commented Jul 24, 2019 via email

baldurk commented Jul 24, 2019

bredbored commented Jul 24, 2019 via email

baldurk commented Jul 24, 2019

WoodyPWX commented Mar 29, 2021

APassbyDreg commented Aug 2, 2023

baldurk commented Aug 2, 2023

groupshared memory compute shader debugging #1468

groupshared memory compute shader debugging #1468

Conversation

bredbored commented Jul 23, 2019

Description

skarolewics commented Jul 23, 2019

baldurk commented Jul 23, 2019

bredbored commented Jul 23, 2019 via email

bredbored commented Jul 24, 2019 via email

baldurk commented Jul 24, 2019

bredbored commented Jul 24, 2019 via email

baldurk commented Jul 24, 2019

bredbored commented Jul 24, 2019 via email

baldurk commented Jul 24, 2019

WoodyPWX commented Mar 29, 2021

APassbyDreg commented Aug 2, 2023

baldurk commented Aug 2, 2023