nep executable error - "no kernel image is available for execution on the device" #576
Comments
You can try to change |
Thank you for your answer. Unfortunately, the error persists. Below I am sending the makefile that was used during the compilation.
… as well as compilation with and without PLUMED and NetCDF, giving the same result. Moreover, the error message remains the same when using the input files (nep.in, train.xyz, and test.xyz) from the repository (GPUMD/examples/11_NEP_potential_PbTe/).
Then I guess CUDA code does not work on your platform at all. You can try to compile and run the following simplest CUDA code:

#include <stdio.h>

// Kernel: a single GPU thread prints a greeting.
__global__ void hello_from_gpu()
{
    printf("Hello World from the GPU!\n");
}

int main(void)
{
    hello_from_gpu<<<1, 1>>>(); // launch 1 block of 1 thread
    cudaDeviceSynchronize();    // wait for the kernel so its output is flushed
    return 0;
}

Save the above code into file |
I am sending the output from the commands after creating the hello.cu file:
It seems that the cudaDeviceSynchronize() call works correctly outside GPUMD on the cluster I use. Unfortunately, I do not know why the call works outside GPUMD but fails within it. Do you have an idea?
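One caveat about the hello.cu test: it ignores the return values of the kernel launch and of cudaDeviceSynchronize(), so a failed launch would show up only as missing output, not as an error message. A minimal sketch of a checked variant follows. The CHECK macro is a hypothetical helper written for this illustration; it is not GPUMD's CUDA_CHECK_KERNEL.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper (not GPUMD's macro): print the CUDA error and abort.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,     \
                    __LINE__, cudaGetErrorString(err_));               \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

__global__ void hello_from_gpu()
{
    printf("Hello World from the GPU!\n");
}

int main(void)
{
    hello_from_gpu<<<1, 1>>>();
    CHECK(cudaGetLastError());      // surfaces launch failures
    CHECK(cudaDeviceSynchronize()); // surfaces errors raised while the kernel ran
    return 0;
}
```

If the greeting is printed and the program exits with status 0, the kernel really ran on the device. Note that cudaDeviceSynchronize() also returns errors from earlier asynchronous work, so an error reported at that call does not necessarily originate in the synchronization itself.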
Being able to compile and run the simplest CUDA code means you have a working CUDA platform. Then did you run
means that your executable was not compiled to target your GPU architecture. However, you showed that you have used |
The error log I reported at the beginning of this issue was shown after running the "nep" command directly from the command line. This was done in the directory with the input files (nep.in, train.xyz, and test.xyz). I have not used the "gpumd" command yet.
If possible, could you test on a different platform?
I encountered a similar problem before. Just changed -arch=sm_XX to a smaller number and the problem was solved. |
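To choose the number in -arch=sm_XX without guessing, the device can be asked for its compute capability. The sketch below uses the CUDA runtime calls cudaGetDeviceCount and cudaGetDeviceProperties; the output format is only an illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        // major/minor form the compute capability, e.g. 3 and 7 give sm_37
        printf("device %d: %s, compute capability %d.%d, candidate flag -arch=sm_%d%d\n",
               i, prop.name, prop.major, prop.minor, prop.major, prop.minor);
    }
    return 0;
}
```

NVIDIA lists the Tesla K80 as compute capability 3.7, so on such a card this would be expected to suggest -arch=sm_37, provided the installed toolkit still supports that architecture.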
Thanks for the tip. Unfortunately, in my case, the compilation with a lower number in -arch=sm_XX resulted in the same effect. The tested options were: |
I would like to close this if there is no more discussion. I believe this is a problem related to the CUDA environment instead of GPUMD. |
Hello,
I would like to report an issue I found using GPUMD version 3.9.1.
I was trying to create the first test neuroevolution potential using the “nep” executable on the cluster I use. After preparing the input files (nep.in, test.xyz, and train.xyz) and running the "nep" command, GPUMD gives the information:
Then the nep.in file is read successfully. Later:
With the help of the cluster admins, we checked that the error is raised by the macro “CUDA_CHECK_KERNEL”, defined in utilities/error.cuh as:
The function we think is causing the error is cudaDeviceSynchronize(). However, this function seems to work when we run it outside GPUMD.
Configuration of the cluster: driver version: 470.129.06, CUDA version: 11.4, GPU card: Tesla K80. Compiling with nvcc from NVHPC 23.3 and CUDA 11.8 gave the same result.
I do not know how to solve this issue. I would be very grateful for your help!
Kind regards,
Antoni