
groupshared memory compute shader debugging #1468

Closed

Conversation

bredbored
Contributor

Simulates all threads in a thread group so that changes to groupshared memory from other threads are available in the thread of interest.

Description

This is the implementation of simulating all the threads in a thread group across all available CPU cores that I mentioned in response to issue #1054 "Compute shader results from other threads". I made it about a year ago when I needed to debug a compute shader with groupshared memory. It can take a while with a large group size, but it's time well spent if you're trying to understand what your compute shader is doing!

If it detects group sync instructions, it will simulate all the threads in the thread group of the target thread until the point at which they no longer affect the target's results. If threads in another group can write to a location that the simulated group reads, that won't be reflected. Ideally it would detect cases that won't be simulated correctly, and either warn you or simulate everything it needs to, but it doesn't at the moment.
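For reference, the detection itself amounts to scanning the decoded instruction stream for barrier opcodes. Here is a minimal standalone sketch of that check; the types below are illustrative stand-ins, not the actual DXBC structures touched by the patch:

```cpp
#include <vector>

// Illustrative stand-ins for a decoded DXBC instruction; the real debugger's types differ.
enum class Opcode { Mov, Add, Sync, Other };
struct Operation { Opcode op; };

// Returns true if the shader contains a thread-group sync (barrier) instruction, which is
// the trigger for simulating the whole group rather than just the thread of interest.
bool ContainsGroupSync(const std::vector<Operation> &instructions)
{
  for(const Operation &inst : instructions)
    if(inst.op == Opcode::Sync)
      return true;
  return false;
}
```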

It works by blocking at each thread sync instruction until all threads have arrived. Atomic instructions are inside critical sections. It uses deferred contexts, which turns out to be a bit of a nuisance because I have to turn off the single-threaded device bit, but also nice, because I don't have to save and restore state on the immediate context. I assume it's going to conflict with pull request #1457 "Move D3D11 API usage out of dxbc_debug.cpp".
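To illustrate the blocking scheme (this is a standalone sketch, not code from the patch): one OS thread per simulated GPU thread, with a C++20 std::barrier at the sync point. Simulated atomic instructions would be serialised with a mutex in the same way.

```cpp
#include <barrier>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Toy "shader": each simulated thread writes its slot of groupshared memory, hits the
// group barrier, then reads a slot written by a different thread. Without the barrier
// the neighbour read could observe a stale value, which is exactly the situation this
// change is meant to handle in the debugger.
int main()
{
  const int groupSize = 8;
  std::vector<int> groupshared(groupSize, 0);  // stand-in for groupshared memory
  std::barrier syncPoint(groupSize);           // GroupMemoryBarrierWithGroupSync equivalent
  std::mutex printLock;                        // also how simulated atomics would be serialised

  auto worker = [&](int tid)
  {
    groupshared[tid] = tid * 10;    // pre-barrier writes
    syncPoint.arrive_and_wait();    // block until every thread in the group arrives
    int neighbour = groupshared[(tid + 1) % groupSize];

    std::lock_guard<std::mutex> lock(printLock);
    std::printf("thread %d sees neighbour value %d\n", tid, neighbour);
  };

  std::vector<std::thread> threads;
  for(int i = 0; i < groupSize; i++)
    threads.emplace_back(worker, i);
  for(std::thread &t : threads)
    t.join();
}
```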

It puts up the cancel dialog if it runs for a certain number of instructions. Ideally I think it would have a cancel button next to the "thinking" indicator that you could press at any time, but I haven't done that. It would also be nice if it indicated how many threads it's simulating, so you can get an idea of what you're in for. I suppose it could also give you the option to override simulating everything it thinks you need.

@skarolewics
Contributor

Just a few high level comments:

  • It feels like the detection of sync instructions would be better located in DXBC and then retrieved via dxbc->HasAnySyncInstructions() or something. That code could then easily be reused for D3D12 debugging.
  • Clamping the wave count to std::thread::hardware_concurrency means that very large waves (e.g. 1024 threads) may simulate incorrectly. A single-threaded lockstep, or a pool of waves that worker threads pull from to get them all to sync, might be a better option (a rough sketch of the latter follows this list).
  • I think having the thread function pulled out to be separate instead of being in a lambda would benefit readability.
  • Given the potential for long runtimes, having this as an option to enable before beginning a CS debug would help reduce waiting time when the user doesn't need it.
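To make the pool-of-waves idea a bit more concrete, here is a rough standalone sketch of what I have in mind (not tied to the actual debugger code): the group is split into waves, a fixed pool of worker threads pulls waves from a shared counter, and the group-wide barrier falls out of joining the workers between phases, so group sizes well above the core count still sync correctly.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Sketch only: a 1024-thread group split into 32-lane waves, executed in two phases
// (before and after the barrier) by a small pool of worker threads.
int main()
{
  const int groupSize = 1024;
  const int waveSize = 32;
  const int waveCount = groupSize / waveSize;
  const unsigned workerCount = std::max(1u, std::thread::hardware_concurrency());

  std::vector<int> groupshared(groupSize, 0);

  // Runs one phase of the "shader" across all waves using the worker pool.
  auto runPhase = [&](auto phaseFn)
  {
    std::atomic<int> nextWave{0};
    std::vector<std::thread> workers;
    for(unsigned w = 0; w < workerCount; w++)
    {
      workers.emplace_back([&]
      {
        for(int wave = nextWave++; wave < waveCount; wave = nextWave++)
          for(int lane = 0; lane < waveSize; lane++)
            phaseFn(wave * waveSize + lane);
      });
    }
    for(std::thread &t : workers)
      t.join();  // joining all workers acts as the group-wide barrier
  };

  // Phase before the barrier: each simulated thread writes its slot of groupshared memory.
  runPhase([&](int tid) { groupshared[tid] = tid * 10; });

  // Phase after the barrier: each simulated thread reads a slot written by another thread.
  std::atomic<long long> sum{0};
  runPhase([&](int tid) { sum += groupshared[(tid + 1) % groupSize]; });

  std::printf("sum of neighbour reads: %lld\n", sum.load());
}
```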

@baldurk
Owner

baldurk commented Jul 23, 2019

Thanks for this PR, I appreciate you putting it together. At the moment though, while this is something that is fine to live on your own fork if it suits your needs, it's not something I want to merge upstream.

As you've found, doing a brute-force simulation of all threads is not too large a change; if that were all that was required I would have implemented it before now. As I mentioned in #1054 though, I have some concerns with exposing this ability to shoot yourself in the foot to users, especially as you've done here where it isn't an opt-in option. It needs more careful investigation to get a robust solution.

baldurk closed this Jul 23, 2019
@bredbored
Contributor Author

bredbored commented Jul 23, 2019 via email

@bredbored
Contributor Author

bredbored commented Jul 24, 2019 via email

@baldurk
Owner

baldurk commented Jul 24, 2019

No, that still wouldn't be something I'd want to merge as-is. Even in that case it's still making the debug operation orders of magnitude slower, which might not be directly useful if the user is able to debug what they're interested in without accurate cross-thread communication.

@bredbored
Contributor Author

bredbored commented Jul 24, 2019 via email

@baldurk
Owner

baldurk commented Jul 24, 2019

If the direction was OK but it needed UI changes or other refactors I would ask for those, as I do with other PRs. In this case the broad concept is not something I want to merge currently. One benefit of open source is that not everything has to be merged upstream immediately to be useful, you can keep local changes and customisations while still keeping up to date as needed.

@bredbored
Contributor Author

bredbored commented Jul 24, 2019 via email

@baldurk
Owner

baldurk commented Jul 24, 2019

It's fine! I appreciate you prototyping this and I'm glad it worked out for you, I just have to be careful with things I merge upstream. Features are forever and there's a commitment that they will be maintained indefinitely. I don't want to rush into an implementation until I know I can back that up.

@WoodyPWX

I have 8x8 pixel tiles for light culling and groupshared memory for min/max depth for each tile. An option allowing me to debug this properly would be very useful.

@APassbyDreg

Is there any update on this feature? It would be helpful to have this when debugging compute shaders with groupshared variables. Currently, at v1.27, groupshared values are not accessible, which leads to incorrect control flow.

@baldurk
Owner

baldurk commented Aug 2, 2023

Please do not comment on old issues or pull requests that have been closed.

If you are encountering a bug or are requesting a feature please open a new issue. You can reference this issue if you wish, but opening a new issue prevents any confusion of accidentally linking two unrelated issues and means that each issue can be handled in a clean process.

Repository owner locked as resolved and limited conversation to collaborators Aug 2, 2023