This repository has been archived by the owner on Nov 30, 2020. It is now read-only.

V2 - Fix threadgroup size warning on MacOS Metal #591

Open. Wants to merge 1 commit into base: v2.


MayaViolet

Fix for #580

Summary:

  • Add a runtime utilities property to query whether Metal on MacOS is in use
  • Reduce the lerp and LUT threadgroup sizes on MacOS Metal to stay under the threadgroup size limit
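A minimal sketch of what "staying under the limit" involves, written in Python purely for illustration (the PR's actual shader changes are not shown in this thread). The starting sizes, 8x8x8 for a LUT kernel and 32x32 for a lerp kernel, are hypothetical; 256 is the limit the Metal warning reports later in this thread. Shrinking the threadgroup means more groups must be dispatched, so the group count is recomputed with ceiling division.

```python
import math


def fit_threadgroup(dims, max_threads):
    """Shrink a threadgroup until its total thread count is <= max_threads.

    Repeatedly halves the largest dimension, which keeps the shape roughly
    balanced. Raises ValueError if max_threads < 1.
    """
    dims = list(dims)
    while math.prod(dims) > max_threads:
        i = dims.index(max(dims))
        if dims[i] == 1:
            raise ValueError("max_threads must be at least 1")
        dims[i] //= 2
    return tuple(dims)


def group_counts(grid, threadgroup):
    """Threadgroups to dispatch per axis (ceiling division) so the whole grid
    is covered; the kernel itself must skip threads outside the grid."""
    return tuple(-(-g // t) for g, t in zip(grid, threadgroup))


# Hypothetical kernels: 8x8x8 = 512 threads and 32x32 = 1024 threads,
# both over the 256-thread limit quoted in the warning.
lut_tg = fit_threadgroup((8, 8, 8), 256)
lerp_tg = fit_threadgroup((32, 32), 256)
print(lut_tg, group_counts((32, 32, 32), lut_tg))     # (4, 8, 8) (8, 4, 4)
print(lerp_tg, group_counts((1920, 1080), lerp_tg))   # (16, 16) (120, 68)
```

Halving (rather than picking an arbitrary smaller size) keeps every dimension a power of two when it started as one, which is the usual convention for compute threadgroup sizes.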


bitinn commented Jun 24, 2018

Not saying I understand the ins and outs of the Metal API, but this PR looks like a step in the right direction.

I would love to see this merged or worked on at some point (assuming someone on the Post Processing team owns a MacBook).

My hope:

  • Test whether the existing threadgroup warning causes post-processing effects to fail.
  • Provide some helpers to make future Metal support easier. For example, on top of SHADER_API_METAL, could we get a METAL_MAX_THREAD_IN_GROUP define that reflects the limit of the current device?

MayaViolet (author) commented Jun 25, 2018

Yes. The real solution is some way to query the threadgroup size limit.

Unfortunately, things are a little complicated: the maximum threadgroup size varies by both device and kernel, so it has to be determined per kernel at runtime. That may not be possible with a simple define such as METAL_MAX_THREAD_IN_GROUP (see the maxTotalThreadsPerThreadgroup reference).
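To make the per-kernel point concrete, here is a small Python sketch. In Metal, each compiled compute pipeline (MTLComputePipelineState) reports its own threadExecutionWidth and maxTotalThreadsPerThreadgroup; the PipelineLimits class below is a hypothetical stand-in for those two values, and the numbers are invented for illustration. The sizing rule (one SIMD-width row wide, as many rows as the limit allows) follows Apple's general guidance for 2D threadgroups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineLimits:
    # Hypothetical stand-ins for MTLComputePipelineState.threadExecutionWidth
    # and .maxTotalThreadsPerThreadgroup. Metal reports these per pipeline,
    # i.e. per kernel on a given device, not once per device.
    thread_execution_width: int
    max_total_threads_per_threadgroup: int


def threadgroup_2d(limits):
    """Pick a 2D threadgroup: one SIMD-width row wide, as many rows as fit.

    Assumes both limits are positive. The result never exceeds the limit.
    """
    max_total = limits.max_total_threads_per_threadgroup
    w = min(limits.thread_execution_width, max_total)
    h = max_total // w
    return (w, h)


# Invented numbers: two kernels on the same device report different limits,
# so no single device-wide define could describe both.
simple_kernel = PipelineLimits(thread_execution_width=32, max_total_threads_per_threadgroup=1024)
heavy_kernel = PipelineLimits(thread_execution_width=32, max_total_threads_per_threadgroup=256)
print(threadgroup_2d(simple_kernel))  # (32, 32)
print(threadgroup_2d(heavy_kernel))   # (32, 8)
```

A compile-time constant sized for the simple kernel (1024) would overshoot the heavy kernel's limit by 4x, which is the problem described above.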

Regarding whether the warning leads to effects failing: in my limited testing (MacBook Pro 13" 2017 and iPad Air 2), the maximum threadgroup size quoted in the warning appears to differ from the number of threads per threadgroup the device can actually run a given kernel with.

For example, I wrote a trivial test kernel that uses 1024 threads per threadgroup, so it always triggers the warning. On both the MacBook and the iPad, the warning reported the same maximum threadgroup size of 256. However, the MacBook still happily dispatched the kernel and executed all 1024 threads, whereas the same kernel failed to execute on the iPad.

I'm working on a test project for this and looking into whether this discrepancy has already been reported. If it holds up, kernel development on Metal is very much working in the dark: there is no reliable way to determine the actual threadgroup size limits other than trial and error for every kernel on every target device, which is untenable.
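One way a test project could tabulate these observations is sketched below in Python. The record fields and labels are hypothetical; the two rows only restate the MacBook and iPad results described in the comment above. What it makes explicit is that the same reported limit (256) corresponds to different real behavior on the two devices, so the warning's number alone cannot predict whether a dispatch will succeed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DispatchResult:
    # Hypothetical record for one test dispatch.
    device: str
    reported_limit: int      # max threadgroup size quoted in the warning
    requested_threads: int   # threads per threadgroup the test kernel uses
    ran_all_threads: bool    # did every requested thread actually execute?


def classify(r):
    """Compare what the warning claims with what actually happened."""
    if r.requested_threads <= r.reported_limit:
        return "within reported limit"
    if r.ran_all_threads:
        return "over reported limit, still ran"
    return "over reported limit, failed"


# The two observations from the comment: same warning, different outcomes.
results = [
    DispatchResult('MacBook Pro 13"', 256, 1024, True),
    DispatchResult("iPad Air 2", 256, 1024, False),
]
for r in results:
    print(f"{r.device}: {classify(r)}")
```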

2 participants