C++ app in spack environment on Google cloud HPC with slurm -> illegal instruction. How to correctly build for the target VM. #4687
Hello, I hope this is the right place to ask. I'm trying to deploy an X-ray simulation on a Google Cloud HPC cluster with Slurm, and I got the error "2989 illegal instruction (core dumped)". I used a slightly modified version of the example in the computing cluster repos, which sets up a login node and a controller node plus various compute nodes and a debug node. Here is the blueprint: https://github.com/michele-colle/CBCTSim/blob/main/HPCScripts/hpc-slurm.yaml

Then, on the login node, I installed the Spack environment (https://github.com/michele-colle/CBCTSim/blob/main/HPC_env_settings/spack.yaml) and built the app with CMake and the appropriate, already present compiler. After some trial and error I was able to run a test successfully on the debug node (https://github.com/michele-colle/CBCTSim/blob/main/HPCScripts/test_debug.slurm). Then I tried a more intensive operation (around 10 minutes of work) on a compute node (https://github.com/michele-colle/CBCTSim/blob/main/HPCScripts/job_C2D.slurm), but I got the above error.

I am completely new to HPC and I struggle to find resources on C++ applications. I suspect it has something to do with how the app is built, but I am basically lost. Any help is appreciated, thanks for reading :)
Replies: 2 comments 5 replies
I suspect the problem is that the software is compiled on one type of CPU but executed on another. Typically, a compiler will auto-detect the available instruction set and (potentially) make full use of it. If the resulting binary is then executed on a different (typically older) CPU, it won't work. You can try the following:
In fact, if you want to be absolutely sure, stick to the same VM type for the login and compute nodes.
Protip: you can actually omit the login node and just use the controller instead. Both are only needed in very large deployments.
Hi, closing the ticket for now as it has been pending for a long time. Feel free to re-open it if there are any issues.
The easiest way to test whether the problem lies in the different architectures is to use the same VM type for the login and compute nodes. It doesn't have to be a fancy, modern VM type; C2 is fine for experimentation.
But to answer your question, lscpu and /proc/cpuinfo are your friends.
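As a rough sketch of how that comparison could be done, the Python snippet below parses the "flags" line of /proc/cpuinfo-style text and lists the CPU features present on the build node but absent on the node that runs the job. The function names and the idea of saving each node's /proc/cpuinfo to a file are assumptions made for illustration, not part of the cluster toolkit or Spack.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text.

    x86 kernels label the line "flags"; aarch64 kernels label it "Features".
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def missing_on_target(build_cpuinfo, target_cpuinfo):
    """List flags the build node has but the target (compute) node lacks."""
    build_flags = parse_cpu_flags(build_cpuinfo)
    target_flags = parse_cpu_flags(target_cpuinfo)
    return sorted(build_flags - target_flags)


# Hypothetical usage: copy /proc/cpuinfo from each node into a file first,
# e.g. login_cpuinfo.txt and compute_cpuinfo.txt, then compare them:
#
#   with open("login_cpuinfo.txt") as f_build, open("compute_cpuinfo.txt") as f_run:
#       print(missing_on_target(f_build.read(), f_run.read()))
```

A non-empty result (for example an AVX-512 flag that only the login node reports) does not prove that the binary uses those instructions, but it makes a build/run CPU mismatch a plausible cause of the illegal instruction error.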