Running SPECFEM2D with Slurm #924
Hi,
This is very likely unrelated to SPECFEM (we use Slurm here on several
machines, without noticing any problem). You should probably contact
your system administrator to ask him/her to check the installation of
SLURM (or your submission scripts).
Best regards,
Dimitri.
…On 04/23/2018 07:10 AM, tianzeliu wrote:
I was trying to run SPECFEM2D on a computer cluster that uses Slurm as the
workload manager. In this case I need to submit jobs as opposed to running
the code directly. Although I could get SPECFEM2D running, it never
generated any output. Instead it would keep running until the time
allocated by Slurm was up and then crash (the time should be enough for
the simulation to finish, as I have tested it on my own machine). Do you
have any idea why this happened? Thanks a lot.
--
Dimitri Komatitsch, CNRS Research Director (DR CNRS)
Laboratory of Mechanics and Acoustics, Marseille, France
http://komatitsch.free.fr
Hi Tianze,
Here is one:
#!/bin/bash
#SBATCH -J job_name
#SBATCH --nodes=2
#SBATCH --ntasks=48
#SBATCH --ntasks-per-node=24
#SBATCH --threads-per-core=1
#SBATCH --mem=1GB
#SBATCH --time=00:30:00
#SBATCH --output job_name.output
module purge
module load intel
module load openmpi
srun --mpi=pmi2 -K1 --resv-ports -n $SLURM_NTASKS ./my_executable param1 param2 ...
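Assuming the script above is saved under some file name (job_script.slurm is used below purely for illustration), it would typically be submitted with sbatch and monitored with squeue. One quick sanity check before submitting is that --ntasks equals --nodes times --ntasks-per-node, as it does in the sample script (2 × 24 = 48):

```shell
# Values copied from the sample script above.
nodes=2
ntasks_per_node=24
ntasks=48

# --ntasks should equal --nodes * --ntasks-per-node for this layout;
# a mismatch is a common cause of jobs hanging or being rejected.
if [ $((nodes * ntasks_per_node)) -eq "$ntasks" ]; then
  echo "task layout consistent"
fi

# Typical workflow on the cluster (requires a working Slurm installation):
#   sbatch job_script.slurm     # submit the job
#   squeue -u $USER             # check its state in the queue
```

The sbatch/squeue commands are the standard Slurm workflow; the exact module names (intel, openmpi) and the --mpi=pmi2 flag depend on how the cluster is configured.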
Best regards,
Dimitri.
…On 04/23/2018 08:18 PM, tianzeliu wrote:
Hi Dimitri,
Thank you for the quick response. Could you provide a sample submission
script that I could modify? Thanks a lot.
Best,
Tianze
Hi Tianze,
Did you manage to successfully run some of the examples that are in the
EXAMPLES directory? If not (i.e. if they also fail), then the problem is
very likely specific to the cluster you use, i.e. that cluster has an
installation problem. If you managed to run several of the examples
successfully then please let me know and we will investigate the
particular case that crashes.
Thank you,
Best regards,
Dimitri.
…On 04/27/2018 01:59 AM, tianzeliu wrote:
Hi Dimitri,
The submission script you provided seemed to work. However, the code now
breaks at a fixed point of the simulation, which only happens on the
cluster not on my local machine. The only difference between the cluster
and the local machine is that the version on the cluster may be newer (I
installed it just now). Here is the error message it gives:
Backtrace for this error:
#0 0x7F3BEF41E6F7
#1 0x7F3BEF41ED3E
#2 0x7F3BEE70726F
#3 0x45078D in compute_forces_viscoelastic_ at compute_forces_viscoelastic.F90:465
#4 0x45572E in compute_forces_viscoelastic_main_ at compute_forces_viscoelastic_calling_routine.F90:67
#5 0x49A0D6 in iterate_time_ at iterate_time.F90:165
It does not seem to be a problem with an unstable time scheme, because
the CFL number and the suggested minimum time step both look OK. Thanks
a lot!
Tianze
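As a side note on the stability check mentioned above: the CFL condition that SPECFEM2D's suggested minimum time step reflects can be sketched independently. The function below is an illustrative version of the standard CFL bound, not SPECFEM's internal routine, and all names and numbers in it are hypothetical: the time step must not exceed a Courant constant times the smallest grid spacing divided by the highest wave speed in the model.

```python
def max_stable_dt(h_min, v_max, courant=0.5):
    """Illustrative CFL bound: dt_max = C * h_min / v_max.

    h_min   -- smallest grid spacing in the mesh (m)
    v_max   -- highest wave speed in the model (m/s)
    courant -- Courant constant; the safe value depends on the scheme
    """
    return courant * h_min / v_max

# Hypothetical example values: 100 m spacing, 5000 m/s P-wave speed.
dt_max = max_stable_dt(100.0, 5000.0)
print(dt_max)  # any dt at or below this value satisfies the bound
```

If this check passes, as the poster reports, an unstable time scheme is indeed an unlikely explanation for the crash.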