user managed staging memory buffer in transferDataFromCPU() and transferDataToCPU() #129

anarkiwi · 2023-08-22T02:54:28Z

Thanks so much for VkFFT!

I observe an overhead in these two functions: each call allocates staging memory with allocateBuffer() and then frees and destroys it again.

Would it be possible to have a variation where the caller supplies their own staging buffer (that is then remapped as necessary within transferData())? I have experimented with this on Pi4 specifically, and allocating the staging memory buffer once on application startup makes a significant difference.
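To illustrate the request, here is a minimal sketch in plain C that compares the two patterns. It is not VkFFT or Vulkan code: `StagingBuffer`, `staging_create`, `transfer_per_call`, `transfer_with_staging` and `demo_alloc_counts` are hypothetical names, and `malloc`/`memcpy` stand in for the Vulkan buffer allocation, mapping and copy steps. It only counts how often the staging allocation happens in each pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a staging buffer (a VkBuffer plus its
   VkDeviceMemory in real Vulkan code). Here it is just heap memory. */
typedef struct {
    void  *mem;
    size_t size;
} StagingBuffer;

static int g_alloc_count = 0; /* number of staging allocations performed */

static int staging_create(StagingBuffer *s, size_t size) {
    s->mem = malloc(size); /* assumption: models the costly allocation */
    if (!s->mem) return -1;
    s->size = size;
    g_alloc_count++;
    return 0;
}

static void staging_destroy(StagingBuffer *s) {
    free(s->mem);
    s->mem = NULL;
    s->size = 0;
}

/* Pattern described in the issue: allocate and free staging on every call. */
static int transfer_per_call(void *device_dst, const void *host_src, size_t size) {
    StagingBuffer tmp;
    if (staging_create(&tmp, size) != 0) return -1;
    memcpy(tmp.mem, host_src, size);   /* host -> staging */
    memcpy(device_dst, tmp.mem, size); /* staging -> device (a GPU copy in real code) */
    staging_destroy(&tmp);
    return 0;
}

/* Requested variation: the caller owns the staging buffer and reuses it. */
static int transfer_with_staging(StagingBuffer *s, void *device_dst,
                                 const void *host_src, size_t size) {
    if (size > s->size) return -1;     /* caller-supplied buffer is too small */
    memcpy(s->mem, host_src, size);
    memcpy(device_dst, s->mem, size);
    return 0;
}

/* Runs n_transfers with each pattern and reports the allocation counts. */
int demo_alloc_counts(int n_transfers, int *per_call, int *reused) {
    enum { N = 1024 };
    float host[N], device[N];
    StagingBuffer s;
    int i;

    for (i = 0; i < N; i++) host[i] = (float)i;
    memset(device, 0, sizeof device);

    g_alloc_count = 0;
    for (i = 0; i < n_transfers; i++)
        if (transfer_per_call(device, host, sizeof host) != 0) return -1;
    *per_call = g_alloc_count;

    g_alloc_count = 0;
    if (staging_create(&s, sizeof host) != 0) return -1; /* once, at startup */
    for (i = 0; i < n_transfers; i++) {
        if (transfer_with_staging(&s, device, host, sizeof host) != 0) {
            staging_destroy(&s);
            return -1;
        }
    }
    *reused = g_alloc_count;
    staging_destroy(&s);

    if (n_transfers > 0) assert(memcmp(host, device, sizeof host) == 0);
    return 0;
}
```

In this model the per-call pattern performs one staging allocation per transfer, while the caller-owned buffer is allocated a single time regardless of how many transfers follow. How much that saves on real hardware depends on the driver's allocation cost.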

Thanks,

DTolm · 2023-08-22T07:36:52Z

Hello, thanks for using VkFFT!

Sure, I will add an option to provide a user-defined staging buffer in VkGPU and in VkFFTApplication. However, transferDataFromCPU and transferDataToCPU are not part of VkFFT itself. They are simple tools in the benchmark suite that were not meant to be used in production code, just a reference for how to transfer data in a simple way.

Best regards,
Dmitrii

-Added double-double support in VkFFT. Requires cpu initialization in full quad precision, so only supports gcc for now. Potentially possible to add full FP128 support or some other FP128 library (like mpir) in the future.
-Data has to be stored in double-double before VkFFT kernel calls (no fp128<->double-double conversion on the GPU yet).
-Full 1e-32 precision, but same range as FP64. See Library for Double-Double and Quad-Double Arithmetic by Y Hida for more information on double-double.
-Requires FMA contraction to be disabled (due to ab-cd contraction rounding mismatch). Doesn't work on Vulkan as I haven't found how to do that yet.
-Fixed warnings (#138)
-Added proper check for app to be zero before initializeVkFFT call and zeroing on deletion (#134)
-Added an option to provide staging buffer in application and VkGPU handle (#129)
-Added guards for build type (#128)
-Fixed missing deallocation calls for the inverse Bluestein axes. Fixed the buffer layout size in Vulkan in some cases.
-Refactored the code generator and container struct layout for better handling complex numbers (-5k loc).
-Added more precision tests and benchmarks.
-Will be merged in the main branch after more testing and update to the documentation.

DTolm · 2023-09-25T21:26:54Z

Hello,

I have added an option to provide the staging buffer. I haven't tested it thoroughly yet and haven't updated documentation about it, but it should be pretty straightforward (just provide the staging buffer and memory pointers to the configuration).

Best regards,
Dmitrii

anarkiwi · 2023-09-28T01:02:00Z

That's really nice - thanks!

-Added double-double support in VkFFT. Requires cpu initialization in full quad precision, so only supports gcc with quadmath dependency for now. Potentially possible to add full FP128 support or some other FP128 library (like mpir) in the future.
-Data has to be stored in double-double before VkFFT kernel calls (no fp128<->double-double conversion on the GPU yet).
-Full 1e-32 precision, but same range as FP64. See Library for Double-Double and Quad-Double Arithmetic by Y Hida for more information on double-double.
-Double-double requires FMA contraction to be disabled (due to ab-cd contraction rounding mismatch). Doesn't work on Vulkan as I haven't found how to do that yet.
-Added DST I-IV support.
-Fixed warnings (#138)
-Added proper check for app to be zero before initializeVkFFT call and zeroing on deletion (#134)
-Added an option to provide a staging buffer in the application and VkGPU handle (#129)
-Added guards for build type (#128)
-Changed default innermost stride for real buffers in out-of-place R2C from size[0]+2 to size[0] (#139)
-Allow specifying glslang version (#135)
-Improved instruction count and accuracy for radix-7.
-Fixed missing deallocation calls for the inverse Bluestein axes. Fixed the buffer layout size in Vulkan in some cases.
-Refactored the code generator and container struct layout for better handling complex numbers (-5k loc).
-Added more precision tests and benchmarks.

DTolm · 2023-10-23T11:34:21Z

It should be working now. If you have any other improvement ideas about the staging buffer, feel free to reopen the issue!

DTolm mentioned this issue Oct 23, 2023

VkFFT v1.3.2 #141

Merged

DTolm closed this as completed Oct 23, 2023

DTolm mentioned this issue Nov 7, 2023

VkFFT v1.3.2 release #143

Merged
