Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Support UMA archs (e.g. iOS) #238

Open
darksylinc opened this issue Nov 2, 2021 · 0 comments
Open

Feature Request: Support UMA archs (e.g. iOS) #238

darksylinc opened this issue Nov 2, 2021 · 0 comments
Labels
enhancement New feature or request Metal Issue affects Metal specifically. Don't use this tag if all RenderSystems are affected Vulkan

Comments

@darksylinc
Copy link
Member

darksylinc commented Nov 2, 2021

Note: This is brainstorming

We currently have:

  • BT_IMMUTABLE must be uploaded on creation
  • BT_DEFAULT must use upload() / staging buffers
  • BT_DYNAMIC_* must use map()

iOS shared works like default in terms of speed, but wants access using map.

Intended use case

  • Immediate read access to vertex data (and similar) that is immutable or nearly immutable (GPU side) to perform ray cast computations and other physics stuff
  • Avoid relying on a shadow copy which provides the intended speed benefit because device is memory constraint

Possible solutions:

  • Offer a new BT_TYPE
  • Add a boolean flag to indicate BT_DEFAULT can be mapped
  • Add a function that returns a ptr to the raw memory, e.g. BufferPacked::getRawGpuPtr()

Issue: Synchronization

All writes from CPU are race conditions if the GPU is using it. That's why BT_DYNAMIC 3x the memory and why BT_DEFAULT uses StagingBuffers to upload. Easiest solution is to disallow writes from CPU i.e. getRawGpuPtr returns a const pointer. If the user const casts the pointer and writes to it, it's his responsability.

All writes from GPU are race conditions if the CPU wants to read that data. Writes from GPU:

  • UAV / Compute: suggestion, disallow this feature on UAV buffers
  • BufferPacked::copyTo
  • Uploads via StagingBuffer

Possible solution:

  1. Add BufferPacked::getRawGpuPtr(); returns nullptr if it's not supported or there is a data hazard
  2. Add BufferPacked::isRawGpuAvailable() (name pending). If it returns true; it is safe to call getRawGpuPtr and it will return a valid ptr.
  3. Add BufferPacked::rawGpuBufferAvailableFrame() (name pending)
  4. Add a new BT_TYPE which uses a derived class e.g. MetalSharedBufferInterface. When a GPU write operation is called (e.g. user calls BufferPacked::copyTo); MetalSharedBufferInterface has an extra uint32 to store the current frame i.e. it stores VaoManager::getFrameCount.
    • The reason for a new BT_TYPE and MetalSharedBufferInterface is to avoid the extra uint32 on buffers that don't need it. But since it's only 4 bytes, I'm not feeling too strong about this
  5. The implementation would look like this:
bool BufferPacked::isRawGpuAvailable() const
{
   return vaoManager->isFrameFinished( this->rawGpuBufferAvailableFrame() ) ;
}

VaoManager::waitForSpecificFrameToFinish can be used to wait for the buffer.

  • If the frame counter overflows, a longer-than-usual wait will happen (should we increase the counter to 64 bits?).
  1. The feature could be supported on BT_IMMUTABLE: Immutable buffers could add a boolean instead of a frame idx. Once the first upload has finished GPU side; it is forever accessible and rawGpuBufferAvailableFrame could always return VaoManager::getFrameCount - VaoManager::getDynamicBufferMultiplier.
    • Alternatively instead of a boolean; the buffer could be placed in a list. Once the buffer is out of that list, it means the upload operation is done
    • Or alternatively, it behaves exactly the same as described above (i.e. keep the frame count it was uploaded. If getFrameCount overflows, a small hitch will happen)

Issue: Validation layer

Expect Metal to complain a bit that certain operations can or cannot be done when the buffer type is MTLStorageModeShared, these are often minor fixes

What if user holds on to the returned pointer getRawGpuPtr?

User should call getRawGpuPtr to prevent data races. Otherwise all bets are off. It's not our responsibility anymore.

If the user calls VaoManager::cleanupEmptyPools, the pointer may no longer be valid.

Textures

Textures can't be accessed directly. They don't have linear tiling access. Supporting linear tiling is a mess and we recommend to use TexBufferPacked instead

Metal & Vulkan

Metal & Vulkan can benefit from the same features. As long as the GPU exposes a HOST_VISIBLE_BIT|HOST_COHERENT_BIT|DEVICE_LOCAL_BIT (HOST_CACHED_BIT too?) then the feature can be implemented.

It can benefit Desktop GPUs too. With PCIE resizable bar / SAM, GPU drivers are exposing HOST_VISIBLE_BIT|HOST_COHERENT_BIT|DEVICE_LOCAL_BIT (not cached) memory which lives on the GPU and can be read from CPU (it gets pulled from the PCIE bus).

Though given that it is not cached, users should read from each byte only once (or use memcpy to make a local copy). It should be treated the same way as write combined memory.

When Resizable Bar is not available we should not use that memory because there's only 256 MB, and the driver needs some of that too; and it's too precious.

@darksylinc darksylinc added enhancement New feature or request Metal Issue affects Metal specifically. Don't use this tag if all RenderSystems are affected Vulkan labels Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Metal Issue affects Metal specifically. Don't use this tag if all RenderSystems are affected Vulkan
Projects
None yet
Development

No branches or pull requests

1 participant