Hi, I am very interested in your OSDI '21 work. However, I noticed that you use `__fmaf_rn` in your repo. According to the documentation (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#intrinsic-functions), this is a fast-math intrinsic, yet I have observed that nvcc usually emits multiply-add instructions aggressively even when expressions are written in naive form without such intrinsics. I am not sure whether using this intrinsic is fair to the baseline, or how it helps you achieve your goal. Could you explain? Thank you.
Thanks for your interest in our project. Here are my current findings on your questions.
One of the major reasons for not using fast-math intrinsics is their reduced precision. However, we validated our kernel against both the same kernels without those intrinsics and the standard GraphConv kernel from DGL, and observed only very minor (less than 10^-5) to no (i.e., exactly matching) output differences, depending on the input graph.