limits file not found #2049
@Epliz Thanks for reporting. It seems like you do not have the binary kernel cache package installed. This is a HIPRTC-specific issue. naive_conv.cpp needs to be fixed like this: More details can be found at #627. [Attribution] @junliume @johnny-keker https://github.com/ROCmSoftwarePlatform/MIOpen/labels/bug https://github.com/ROCmSoftwarePlatform/MIOpen/labels/urgency_normal. Proposed assignee: @carlushuang
Thank you for looking into this.
I have
@Epliz Do you use an MI100 with 120 compute units?
Yes, as shown by the rocminfo output:
I typically disable the RX 6700XT that I use for display output whenever I use TensorFlow, by setting
This is good, because MIOpen currently requires all GPUs to be identical.
No, you seem to have done everything right. But the library should read the naive kernel from the precompiled binary cache, because you have it installed. I am wondering why it even tries to build it. Can you attach a text file with a log captured with
I can't reproduce it anymore, even after installing the libstdc++12 package, so I think it was most likely a mistake on my side. I probably had not yet installed the kernels properly. In case you are interested in the logs, I am still attaching them, but they seem to show that the kernel database was found properly.
@Epliz Please remove the user kernel cache and try again. The cache consists of .ukdb files somewhere in
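The cache-clearing step above can be sketched as a small script. This is a minimal sketch, not an MIOpen tool: the thread does not say where the user kernel cache lives, so the cache root is a caller-supplied assumption, and the helper names `find_user_kernel_cache` / `remove_user_kernel_cache` are hypothetical. It only matches files ending in `.ukdb` and defaults to a dry run so you can review the list before deleting anything.

```python
from __future__ import annotations

from pathlib import Path


def find_user_kernel_cache(root: Path) -> list[Path]:
    """Return every *.ukdb file below `root`, searched recursively.

    `root` is an assumption supplied by the caller; the thread does not
    state the actual cache location.
    """
    return sorted(p for p in root.rglob("*.ukdb") if p.is_file())


def remove_user_kernel_cache(root: Path, dry_run: bool = True) -> list[Path]:
    """Delete the *.ukdb files found under `root`.

    With dry_run=True (the default) nothing is deleted; the files that
    would be removed are only returned so the list can be checked first.
    """
    files = find_user_kernel_cache(root)
    if not dry_run:
        for f in files:
            f.unlink()
    return files
```

Typical use is to call `remove_user_kernel_cache(Path("<your cache dir>"))` first, inspect the returned paths, and then repeat with `dry_run=False`. Other file types (for example an installed binary kernel database) are left untouched.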
@Epliz This is the user kernel cache (the one I asked you to remove):
Here are the logs, with the cache deleted beforehand.
Yes, if anything, it was most likely due to not having the kernel package installed. For that matter, I think users would appreciate it if the
@Epliz Thanks, I think we already have this feature in our backlog, so a separate ticket is not necessary.
The PR for the fix has been merged, so I am closing the ticket. Thank you for your quick action, and your help was much appreciated!
The latest nightly of torch on Fedora 38 still seems to be affected.
shows
on use.
@Mershl Could you provide a more detailed log, along with system info (which GPU)? Could you also check whether the suggestions above help, i.e., installing the KDB and/or installing libstdc++-12-dev (on Ubuntu 22.04.2 LTS)? CC: @jeffdaily, since this is an issue with the PyTorch nightly wheel. PR #2050 was not included in ROCm 5.5, and maybe it should be.
Providing the
Interesting, though, that I can now remove
EDIT: It fails again when using previously unused features of
Built kernels are cached at
Hi,
When running the "AI Benchmark" from https://ai-benchmark.com/ranking_deeplearning.html for the first time, I got a crash with the following info:
It seems like I resolved it by installing the libstdc++-12-dev package on Ubuntu 22.04.2 LTS. I have ROCm 5.4.3 installed through packages.
I guess one of your packages should declare that package (or whichever one is the right one) as a dependency.
Best regards,
Epliz
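Since installing libstdc++-12-dev is what made the `limits` header error go away, a quick check of which libstdc++ header directories actually contain `limits` can help when reporting this. This is a hedged sketch assuming the usual Ubuntu layout, where libstdc++-N-dev places its headers under `/usr/include/c++/N`. The helper name `find_libstdcxx_limits` is hypothetical, and the script only reports which versions have the header. It does not determine which version HIPRTC actually picks up.

```python
from __future__ import annotations

from pathlib import Path


def find_libstdcxx_limits(
    include_root: Path = Path("/usr/include/c++"),
) -> dict[str, bool]:
    """Map each version directory under `include_root` to whether it has a `limits` header.

    Assumes the Ubuntu layout /usr/include/c++/<version>/; returns an empty
    dict if `include_root` does not exist.
    """
    if not include_root.is_dir():
        return {}
    return {
        d.name: (d / "limits").is_file()
        for d in sorted(include_root.iterdir())
        if d.is_dir()
    }
```

For example, `[v for v, ok in find_libstdcxx_limits().items() if not ok]` lists version directories that exist but lack the header, which would be consistent with the symptom in this issue.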