{2023.06}[2023a] Modflow 6.4.4 and deps #522

Merged: 6 commits merged on Apr 5, 2024

casparvl
Collaborator

@casparvl casparvl commented Mar 28, 2024

Depends on:

5 out of 72 required modules missing:

* ParMETIS/4.0.3-gompi-2023a (ParMETIS-4.0.3-gompi-2023a.eb)
* Hypre/2.29.0-foss-2023a (Hypre-2.29.0-foss-2023a.eb)
* SuperLU_DIST/8.1.2-foss-2023a (SuperLU_DIST-8.1.2-foss-2023a.eb)
* SuiteSparse/7.1.0-foss-2023a (SuiteSparse-7.1.0-foss-2023a.eb)
* PETSc/3.20.3-foss-2023a (PETSc-3.20.3-foss-2023a.eb)

eessi-bot-aws bot commented Mar 28, 2024

Instance eessi-bot-mc-aws is configured to build:

  • arch x86_64/generic for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/generic for repo eessi-hpc.org-2023.06-software
  • arch x86_64/generic for repo eessi.io-2023.06-compat
  • arch x86_64/generic for repo eessi.io-2023.06-software
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-software
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-software
  • arch aarch64/generic for repo eessi.io-2023.06-compat
  • arch aarch64/generic for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-software

@laraPPr
Collaborator

laraPPr commented Apr 2, 2024

@casparvl #521 is merged; can we then start building this one?

@casparvl
Collaborator Author

casparvl commented Apr 2, 2024

Yes, once we can use the bot again... :)

@casparvl
Collaborator Author

casparvl commented Apr 2, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3
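For readers unfamiliar with the command syntax: the bot's reply reports an "expanded format" in which `repo` becomes `repository` and `arch` becomes `architecture`. The sketch below mimics that expansion. It is only an illustration and not the bot's actual parser; the function name `expand_bot_command` and the abbreviation table are assumptions made for this example.

```python
# Illustration only: NOT the EESSI bot's real implementation.
# The abbreviation table is inferred from the bot replies in this thread.
ABBREVIATIONS = {"repo": "repository", "arch": "architecture"}


def expand_bot_command(line: str) -> str:
    """Turn 'bot: build repo:X arch:Y' into 'build repository:X architecture:Y'."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError(f"not a bot command: {line!r}")
    # First word after the prefix is the action, the rest are key:value filters.
    action, *args = line[len(prefix):].split()
    expanded = []
    for arg in args:
        key, sep, value = arg.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {arg!r}")
        # Unknown keys are passed through unchanged.
        expanded.append(f"{ABBREVIATIONS.get(key, key)}:{value}")
    return " ".join([action, *expanded])


print(expand_bot_command("bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3"))
```

For the command above this prints `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3`, which matches the expanded format the bot reports.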

eessi-bot-aws bot commented Apr 2, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

eessi-bot-aws bot commented Apr 2, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/8917

date job status comment
Apr 02 14:29:52 UTC 2024 submitted job id 8917 awaits release by job manager
Apr 02 14:30:29 UTC 2024 released job awaits launch by Slurm scheduler
Apr 02 14:31:31 UTC 2024 running job 8917 is running
Apr 02 17:57:07 UTC 2024 finished
🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job8917.result does not exist in job directory or reading it failed.
  • No artefacts were found/reported.
Apr 02 17:57:07 UTC 2024 test result
🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job8917.test does not exist in job directory or reading it failed.

edit (by @boegel): I cancelled job 8917, tests were hanging

@boegel boegel added the 2023.06-software.eessi.io 2023.06 version of software.eessi.io label Apr 2, 2024
@casparvl
Collaborator Author

casparvl commented Apr 2, 2024

bot: build repo:eessi.io-2023.06-software arch:aarch64/neoverse_v1

eessi-bot-aws bot commented Apr 2, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/neoverse_v1 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_v1
  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_v1 resulted in:

eessi-bot-aws bot commented Apr 2, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_v1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/8926

date job status comment
Apr 02 16:39:17 UTC 2024 submitted job id 8926 awaits release by job manager
Apr 02 16:39:29 UTC 2024 released job awaits launch by Slurm scheduler
Apr 02 16:40:31 UTC 2024 running job 8926 is running
Apr 02 17:41:15 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-8926.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-1712079242.tar.gz
size: 184 MiB (193937066 bytes)
entries: 8295
modules under 2023.06/software/linux/aarch64/neoverse_v1/modules/all
Hypre/2.29.0-foss-2023a.lua
MODFLOW/6.4.4-foss-2023a.lua
netCDF-Fortran/4.6.1-gompi-2023a.lua
ParMETIS/4.0.3-gompi-2023a.lua
PETSc/3.20.3-foss-2023a.lua
SuiteSparse/7.1.0-foss-2023a.lua
SuperLU_DIST/8.1.2-foss-2023a.lua
software under 2023.06/software/linux/aarch64/neoverse_v1/software
Hypre/2.29.0-foss-2023a
MODFLOW/6.4.4-foss-2023a
netCDF-Fortran/4.6.1-gompi-2023a
ParMETIS/4.0.3-gompi-2023a
PETSc/3.20.3-foss-2023a
SuiteSparse/7.1.0-foss-2023a
SuperLU_DIST/8.1.2-foss-2023a
other under 2023.06/software/linux/aarch64/neoverse_v1
no other files in tarball
Apr 02 17:41:15 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-8926.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 05 07:03:53 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-aarch64-neoverse_v1-1712079242.tar.gz to S3 bucket succeeded
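The artefact names in these bot reports appear to follow a fixed pattern: EESSI version, OS, CPU target, and a numeric suffix. The sketch below splits a name into those parts. It assumes (this is not stated anywhere in the thread) that the suffix is a Unix timestamp in seconds; the regular expression and the helper name `parse_tarball_name` are likewise made up for this example.

```python
import re
from datetime import datetime, timezone

# Assumed naming pattern, inferred from the tarball names in this thread.
TARBALL_RE = re.compile(
    r"^eessi-(?P<version>\d{4}\.\d{2})-software-(?P<os>[a-z]+)-"
    r"(?P<arch>.+)-(?P<stamp>\d+)\.tar\.gz$"
)


def parse_tarball_name(name: str) -> dict:
    """Split an EESSI software tarball name into its components."""
    match = TARBALL_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected tarball name: {name!r}")
    return {
        "version": match.group("version"),
        "os": match.group("os"),
        "arch": match.group("arch"),
        # Assumption: the numeric suffix is seconds since the Unix epoch.
        "created": datetime.fromtimestamp(int(match.group("stamp")), tz=timezone.utc),
    }


info = parse_tarball_name(
    "eessi-2023.06-software-linux-aarch64-neoverse_v1-1712079242.tar.gz"
)
print(info["arch"], info["created"].isoformat())
```

Under that assumption the suffix decodes to 2024-04-02 17:34:02 UTC, a few minutes before the job was reported as finished, which is at least consistent with the timeline above.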

@casparvl
Collaborator Author

casparvl commented Apr 2, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/generic
bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell
bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/skylake_avx512
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3
bot: build repo:eessi.io-2023.06-software arch:aarch64/generic
bot: build repo:eessi.io-2023.06-software arch:aarch64/neoverse_n1

eessi-bot-aws bot commented Apr 2, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/haswell
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/skylake_avx512 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/skylake_avx512
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/generic from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/generic
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/neoverse_n1 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_n1
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/haswell resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/skylake_avx512 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/generic resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_n1 resulted in:

eessi-bot-aws bot commented Apr 2, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/8927

date job status comment
Apr 02 17:49:02 UTC 2024 submitted job id 8927 awaits release by job manager
Apr 02 17:49:29 UTC 2024 released job awaits launch by Slurm scheduler
Apr 02 17:54:49 UTC 2024 running job 8927 is running
Apr 02 19:37:52 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-8927.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-1712085663.tar.gz
size: 198 MiB (207959115 bytes)
entries: 8295
modules under 2023.06/software/linux/x86_64/generic/modules/all
Hypre/2.29.0-foss-2023a.lua
MODFLOW/6.4.4-foss-2023a.lua
netCDF-Fortran/4.6.1-gompi-2023a.lua
ParMETIS/4.0.3-gompi-2023a.lua
PETSc/3.20.3-foss-2023a.lua
SuiteSparse/7.1.0-foss-2023a.lua
SuperLU_DIST/8.1.2-foss-2023a.lua
software under 2023.06/software/linux/x86_64/generic/software
Hypre/2.29.0-foss-2023a
MODFLOW/6.4.4-foss-2023a
netCDF-Fortran/4.6.1-gompi-2023a
ParMETIS/4.0.3-gompi-2023a
PETSc/3.20.3-foss-2023a
SuiteSparse/7.1.0-foss-2023a
SuperLU_DIST/8.1.2-foss-2023a
other under 2023.06/software/linux/x86_64/generic
no other files in tarball
Apr 02 19:37:52 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-8927.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 05 07:04:13 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-generic-1712085663.tar.gz to S3 bucket succeeded

eessi-bot-aws bot commented Apr 2, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-intel-haswell for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/8928

date job status comment
Apr 02 17:49:05 UTC 2024 submitted job id 8928 awaits release by job manager
Apr 02 17:49:31 UTC 2024 released job awaits launch by Slurm scheduler
Apr 02 17:54:51 UTC 2024 running job 8928 is running
Apr 03 00:43:45 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-8928.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-haswell-1712104047.tar.gz
size: 211 MiB (221633809 bytes)
entries: 8295
modules under 2023.06/software/linux/x86_64/intel/haswell/modules/all
Hypre/2.29.0-foss-2023a.lua
MODFLOW/6.4.4-foss-2023a.lua
netCDF-Fortran/4.6.1-gompi-2023a.lua
ParMETIS/4.0.3-gompi-2023a.lua
PETSc/3.20.3-foss-2023a.lua
SuiteSparse/7.1.0-foss-2023a.lua
SuperLU_DIST/8.1.2-foss-2023a.lua
software under 2023.06/software/linux/x86_64/intel/haswell/software
Hypre/2.29.0-foss-2023a
MODFLOW/6.4.4-foss-2023a
netCDF-Fortran/4.6.1-gompi-2023a
ParMETIS/4.0.3-gompi-2023a
PETSc/3.20.3-foss-2023a
SuiteSparse/7.1.0-foss-2023a
SuperLU_DIST/8.1.2-foss-2023a
other under 2023.06/software/linux/x86_64/intel/haswell
no other files in tarball
Apr 03 00:43:45 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-8928.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
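To put numbers on how long each job actually ran, the "running" and "finished" timestamps from the bot's status tables can be subtracted. A minimal sketch, assuming the timestamp layout shown in those tables and English month abbreviations (the `%b` directive is locale-dependent); the helper name `job_runtime` is made up for this example.

```python
from datetime import datetime, timedelta

# Layout of the timestamps in the bot's status tables,
# e.g. "Apr 02 17:54:51 UTC 2024". All values are UTC, so naive datetimes suffice.
BOT_TIME_FORMAT = "%b %d %H:%M:%S UTC %Y"


def job_runtime(running: str, finished: str) -> timedelta:
    """Wall-clock time between the 'running' and 'finished' status lines."""
    start = datetime.strptime(running, BOT_TIME_FORMAT)
    end = datetime.strptime(finished, BOT_TIME_FORMAT)
    return end - start


# Timestamps copied from the status tables of jobs 8927 and 8928.
runtimes = {
    "x86_64/generic (job 8927)": job_runtime(
        "Apr 02 17:54:49 UTC 2024", "Apr 02 19:37:52 UTC 2024"
    ),
    "x86_64/intel/haswell (job 8928)": job_runtime(
        "Apr 02 17:54:51 UTC 2024", "Apr 03 00:43:45 UTC 2024"
    ),
}
for label, runtime in runtimes.items():
    print(f"{label}: {runtime}")
```

This gives 1:43:03 for the generic build and 6:48:54 for haswell, so the haswell job ran roughly four times as long.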

eessi-bot-aws bot commented Apr 2, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-intel-skylake_avx512 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/8929

date job status comment
Apr 02 17:49:09 UTC 2024 submitted job id 8929 awaits release by job manager
Apr 02 17:49:33 UTC 2024 released job awaits launch by Slurm scheduler
Apr 02 17:55:59 UTC 2024 running job 8929 is running
Apr 02 19:31:34 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-8929.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-1712085324.tar.gz
size: 214 MiB (225269992 bytes)
entries: 8295
modules under 2023.06/software/linux/x86_64/intel/skylake_avx512/modules/all
Hypre/2.29.0-foss-2023a.lua
MODFLOW/6.4.4-foss-2023a.lua
netCDF-Fortran/4.6.1-gompi-2023a.lua
ParMETIS/4.0.3-gompi-2023a.lua
PETSc/3.20.3-foss-2023a.lua
SuiteSparse/7.1.0-foss-2023a.lua
SuperLU_DIST/8.1.2-foss-2023a.lua
software under 2023.06/software/linux/x86_64/intel/skylake_avx512/software
Hypre/2.29.0-foss-2023a
MODFLOW/6.4.4-foss-2023a
netCDF-Fortran/4.6.1-gompi-2023a
ParMETIS/4.0.3-gompi-2023a
PETSc/3.20.3-foss-2023a
SuiteSparse/7.1.0-foss-2023a
SuperLU_DIST/8.1.2-foss-2023a
other under 2023.06/software/linux/x86_64/intel/skylake_avx512
no other files in tarball
Apr 02 19:31:34 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-8929.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 05 07:04:58 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-1712085324.tar.gz to S3 bucket succeeded

eessi-bot-aws bot commented Apr 2, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/8930

date job status comment
Apr 02 17:49:12 UTC 2024 submitted job id 8930 awaits release by job manager
Apr 02 17:49:25 UTC 2024 released job awaits launch by Slurm scheduler
Apr 02 17:54:44 UTC 2024 running job 8930 is running
Apr 02 19:50:15 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-8930.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1712086252.tar.gz
size: 210 MiB (221223211 bytes)
entries: 8295
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
Hypre/2.29.0-foss-2023a.lua
MODFLOW/6.4.4-foss-2023a.lua
netCDF-Fortran/4.6.1-gompi-2023a.lua
ParMETIS/4.0.3-gompi-2023a.lua
PETSc/3.20.3-foss-2023a.lua
SuiteSparse/7.1.0-foss-2023a.lua
SuperLU_DIST/8.1.2-foss-2023a.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
Hypre/2.29.0-foss-2023a
MODFLOW/6.4.4-foss-2023a
netCDF-Fortran/4.6.1-gompi-2023a
ParMETIS/4.0.3-gompi-2023a
PETSc/3.20.3-foss-2023a
SuiteSparse/7.1.0-foss-2023a
SuperLU_DIST/8.1.2-foss-2023a
other under 2023.06/software/linux/x86_64/amd/zen2
no other files in tarball
Apr 02 19:50:15 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-8930.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 05 07:05:20 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1712086252.tar.gz to S3 bucket succeeded

eessi-bot-aws bot commented Apr 2, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/8931

date job status comment
Apr 02 17:49:16 UTC 2024 submitted job id 8931 awaits release by job manager
Apr 02 17:49:27 UTC 2024 released job awaits launch by Slurm scheduler
Apr 02 17:54:47 UTC 2024 running job 8931 is running
Apr 02 19:18:42 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-8931.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1712084795.tar.gz
size: 211 MiB (221721045 bytes)
entries: 8295
modules under 2023.06/software/linux/x86_64/amd/zen3/modules/all
Hypre/2.29.0-foss-2023a.lua
MODFLOW/6.4.4-foss-2023a.lua
netCDF-Fortran/4.6.1-gompi-2023a.lua
ParMETIS/4.0.3-gompi-2023a.lua
PETSc/3.20.3-foss-2023a.lua
SuiteSparse/7.1.0-foss-2023a.lua
SuperLU_DIST/8.1.2-foss-2023a.lua
software under 2023.06/software/linux/x86_64/amd/zen3/software
Hypre/2.29.0-foss-2023a
MODFLOW/6.4.4-foss-2023a
netCDF-Fortran/4.6.1-gompi-2023a
ParMETIS/4.0.3-gompi-2023a
PETSc/3.20.3-foss-2023a
SuiteSparse/7.1.0-foss-2023a
SuperLU_DIST/8.1.2-foss-2023a
other under 2023.06/software/linux/x86_64/amd/zen3
no other files in tarball
Apr 02 19:18:42 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-8931.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 05 07:05:42 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen3-1712084795.tar.gz to S3 bucket succeeded

eessi-bot-aws bot commented Apr 2, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/8932

date job status comment
Apr 02 17:49:20 UTC 2024 submitted job id 8932 awaits release by job manager
Apr 02 17:49:24 UTC 2024 released job awaits launch by Slurm scheduler
Apr 02 17:53:40 UTC 2024 running job 8932 is running
Apr 02 19:21:58 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-8932.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1712085064.tar.gz
size: 189 MiB (198676837 bytes)
entries: 8295
modules under 2023.06/software/linux/aarch64/generic/modules/all
Hypre/2.29.0-foss-2023a.lua
MODFLOW/6.4.4-foss-2023a.lua
netCDF-Fortran/4.6.1-gompi-2023a.lua
ParMETIS/4.0.3-gompi-2023a.lua
PETSc/3.20.3-foss-2023a.lua
SuiteSparse/7.1.0-foss-2023a.lua
SuperLU_DIST/8.1.2-foss-2023a.lua
software under 2023.06/software/linux/aarch64/generic/software
Hypre/2.29.0-foss-2023a
MODFLOW/6.4.4-foss-2023a
netCDF-Fortran/4.6.1-gompi-2023a
ParMETIS/4.0.3-gompi-2023a
PETSc/3.20.3-foss-2023a
SuiteSparse/7.1.0-foss-2023a
SuperLU_DIST/8.1.2-foss-2023a
other under 2023.06/software/linux/aarch64/generic
no other files in tarball
Apr 02 19:21:58 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-8932.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 05 07:06:03 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-aarch64-generic-1712085064.tar.gz to S3 bucket succeeded

eessi-bot-aws bot commented Apr 2, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_n1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/8933

date job status comment
Apr 02 17:49:23 UTC 2024 submitted job id 8933 awaits release by job manager
Apr 02 17:50:35 UTC 2024 released job awaits launch by Slurm scheduler
Apr 02 17:57:02 UTC 2024 running job 8933 is running
Apr 02 19:23:03 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-8933.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_n1-1712085202.tar.gz
size: 190 MiB (199331750 bytes)
entries: 8295
modules under 2023.06/software/linux/aarch64/neoverse_n1/modules/all
Hypre/2.29.0-foss-2023a.lua
MODFLOW/6.4.4-foss-2023a.lua
netCDF-Fortran/4.6.1-gompi-2023a.lua
ParMETIS/4.0.3-gompi-2023a.lua
PETSc/3.20.3-foss-2023a.lua
SuiteSparse/7.1.0-foss-2023a.lua
SuperLU_DIST/8.1.2-foss-2023a.lua
software under 2023.06/software/linux/aarch64/neoverse_n1/software
Hypre/2.29.0-foss-2023a
MODFLOW/6.4.4-foss-2023a
netCDF-Fortran/4.6.1-gompi-2023a
ParMETIS/4.0.3-gompi-2023a
PETSc/3.20.3-foss-2023a
SuiteSparse/7.1.0-foss-2023a
SuperLU_DIST/8.1.2-foss-2023a
other under 2023.06/software/linux/aarch64/neoverse_n1
no other files in tarball
Apr 02 19:23:03 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-8933.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 05 07:06:26 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-aarch64-neoverse_n1-1712085202.tar.gz to S3 bucket succeeded

@casparvl
Collaborator Author

casparvl commented Apr 2, 2024

The first zen3 build (job 8917, #522 (comment)) was hanging in one of the PETSc tests:

bot       217607  0.1  0.1  66592 62696 ?        S    15:04   0:13          |       |   |                       \_ make -j 16 test
bot       477241  0.0  0.0   7996  4300 ?        S    15:29   0:00          |       |   |                           \_ bash arch-linux-c-opt/tests/ksp/ksp/tutorials/runex71_bddc_elast.sh
bot       477306  0.0  0.0 158588 18972 ?        Sl   15:29   0:00          |       |   |                               \_ /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/OpenMPI/4.1.5-GCC-12.3.0/bin/mpiexec --oversubscribe -n 8 ../ex71 -petsc_ci -pde_type Elasticity -cells 7,9,8 -dim 3 -ksp_view -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -pc_bddc_monolithic
bot       477344  0.0  0.0      0     0 ?        Z    15:29   0:00          |       |   |                                   \_ [ex71] <defunct>
bot       477345  0.0  0.0      0     0 ?        Z    15:29   0:00          |       |   |                                   \_ [ex71] <defunct>
bot       477347  0.0  0.0      0     0 ?        Z    15:29   0:00          |       |   |                                   \_ [ex71] <defunct>
bot       477349  0.0  0.0      0     0 ?        Z    15:29   0:00          |       |   |                                   \_ [ex71] <defunct>
bot       477350  0.0  0.0      0     0 ?        Z    15:29   0:00          |       |   |                                   \_ [ex71] <defunct>
bot       477353  0.0  0.0      0     0 ?        Z    15:29   0:00          |       |   |                                   \_ [ex71] <defunct>
bot       477354  0.0  0.0      0     0 ?        Z    15:29   0:00          |       |   |                                   \_ [ex71] <defunct>
bot       477355  0.0  0.0      0     0 ?        Z    15:29   0:00          |       |   |                                   \_ [ex71] <defunct>

However, a build on neoverse_v1 completed just fine. @boegel killed the hanging zen3 job, since it was going nowhere.

I've triggered builds for all of the other architectures, including a rebuild on zen3 to see if this was a one-off incident.

@bedroge bedroge changed the title Added Modflow 6.4.4 and deps {2023.06}[2023a] Modflow 6.4.4 and deps Apr 3, 2024
@bedroge
Collaborator

bedroge commented Apr 4, 2024

I had a quick look at the build logs, as I was curious why the haswell build took so much longer than the others. It turns out that both the build and test steps of PETSc run a lot of tests, and though quite a lot fail for all targets, the situation on haswell is much worse due to timeouts.

# x86_64 generic

$ bzgrep "not ok" x86_64/generic/software/PETSc/3.20.3-foss-2023a/easybuild/easybuild-PETSc-3.20.3-20240402.191644.log.bz2  | wc -l
564

$ bzgrep "Exceeded timeout limit" x86_64/generic/software/PETSc/3.20.3-foss-2023a/easybuild/easybuild-PETSc-3.20.3-20240402.191644.log.bz2  | wc -l
40

# haswell

$ bzgrep "not ok" x86_64/intel/haswell/software/PETSc/3.20.3-foss-2023a/easybuild/easybuild-PETSc-3.20.3-20240403.002309.log.bz2  | wc -l
4836
$ bzgrep "Exceeded timeout limit" x86_64/intel/haswell/software/PETSc/3.20.3-foss-2023a/easybuild/easybuild-PETSc-3.20.3-20240403.002309.log.bz2 | wc -l
4824

So something doesn't seem right here. I suspect that it may be related to the MPI issues on haswell that we've been observing for other PRs/builds as well.
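For anyone who wants to repeat this check across several logs without chaining `bzgrep` and `wc`, here is a minimal Python sketch that counts matching lines in a bzip2-compressed EasyBuild log. The function name `count_matching_lines` is made up for this example; the two search strings are the ones used in the commands above.

```python
import bz2

# The two markers searched for with bzgrep above.
PATTERNS = ("not ok", "Exceeded timeout limit")


def count_matching_lines(log_path, patterns=PATTERNS):
    """Count, per pattern, the lines of a .bz2 log that contain it.

    Like `bzgrep PATTERN file | wc -l`, a line is counted once per pattern,
    however often the pattern occurs in it.
    """
    counts = dict.fromkeys(patterns, 0)
    # errors="replace" avoids aborting on stray non-UTF-8 bytes in build logs.
    with bz2.open(log_path, "rt", errors="replace") as log:
        for line in log:
            for pattern in patterns:
                if pattern in line:
                    counts[pattern] += 1
    return counts
```

Calling `count_matching_lines(path)` on one of the log paths above should return a dict with the same two counts as the corresponding `bzgrep ... | wc -l` pair, since both approaches do plain substring matching per line.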

@casparvl
Collaborator Author

casparvl commented Apr 4, 2024

Just to log this: as discussed on Slack, Bob will manually put an Lmod hook in place for the bot in its host_injections.

local hook = require("Hook")

local function eessi_bot_libfabric_set_psm3_devices_hook(t)
    local simpleName = string.match(t.modFullName, "(.-)/")
    -- we may want to be more specific in the future, and only do this for specific versions of libfabric
    if simpleName == 'libfabric' then
        -- set environment variable PSM3_DEVICES as a workaround for MPI applications hanging in libfabric's PSM3 provider
        -- cf. https://github.com/easybuilders/easybuild-easyconfigs/issues/18925
        setenv('PSM3_DEVICES', 'self,shm')
    end
end

-- combine all load hook functions into a single one
function site_specific_load_hook(t)
    eessi_bot_libfabric_set_psm3_devices_hook(t)
end

local function combined_load_hook(t)
    -- Assuming this was called from EESSI's SitePackage.lua, this should be defined and thus run
    if eessi_load_hook ~= nil then
        eessi_load_hook(t)
    end
    site_specific_load_hook(t)
end

hook.register("load", combined_load_hook)

Essentially, this mimics what was done on generoso, the EasyBuild test infrastructure, according to easybuilders/easybuild-easyconfigs#18925. Hopefully this solves the hang.
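To make the matching rule in the hook easier to reason about, here is the same decision logic re-expressed in Python. This is purely an illustration of which module names trigger the workaround; it is not something Lmod runs. The function names and the libfabric version string in the example are hypothetical.

```python
import re


def simple_name(mod_full_name):
    """Counterpart of Lua's string.match(t.modFullName, "(.-)/").

    Returns everything before the first '/', or None if there is no '/'.
    """
    match = re.match(r"(.*?)/", mod_full_name)
    return match.group(1) if match else None


def env_for_module(mod_full_name):
    """Environment variables the hook would set when this module is loaded."""
    if simple_name(mod_full_name) == "libfabric":
        # Same value as in the Lua hook: restrict PSM3 to self and shared memory.
        return {"PSM3_DEVICES": "self,shm"}
    return {}


# The libfabric version below is a hypothetical example name.
for name in ("libfabric/1.18.0-GCCcore-12.3.0", "PETSc/3.20.3-foss-2023a"):
    print(name, env_for_module(name))
```

One consequence of the pattern: a module name without a version part (no `/`) yields no match, so the comparison with `'libfabric'` fails and nothing is set.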

edit (@boegel): see also #531

@bedroge
Collaborator

bedroge commented Apr 4, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell

eessi-bot-aws bot commented Apr 4, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell from bedroge
    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/haswell

eessi-bot-aws bot commented Apr 4, 2024

error: patch failed: easystacks/software.eessi.io/2023.06/eessi-2023.06-eb-4.9.0-2023a.yml:54
error: easystacks/software.eessi.io/2023.06/eessi-2023.06-eb-4.9.0-2023a.yml: patch does not apply
Unable to download or merge changes between the source branch and the destination branch.
Tip: This can usually be resolved by syncing your branch and resolving any merge conflicts.

@bedroge
Collaborator

bedroge commented Apr 4, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell

eessi-bot-aws bot commented Apr 4, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell from bedroge

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/haswell
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/haswell resulted in:

eessi-bot-aws bot commented Apr 4, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-intel-haswell for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.04/pr_522/9051

date job status comment
Apr 04 21:08:37 UTC 2024 submitted job id 9051 awaits release by job manager
Apr 04 21:08:50 UTC 2024 released job awaits launch by Slurm scheduler
Apr 04 21:13:25 UTC 2024 running job 9051 is running
Apr 04 23:00:01 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-9051.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-haswell-1712270588.tar.gz
size: 211 MiB (221628103 bytes)
entries: 8295
modules under 2023.06/software/linux/x86_64/intel/haswell/modules/all
Hypre/2.29.0-foss-2023a.lua
MODFLOW/6.4.4-foss-2023a.lua
netCDF-Fortran/4.6.1-gompi-2023a.lua
ParMETIS/4.0.3-gompi-2023a.lua
PETSc/3.20.3-foss-2023a.lua
SuiteSparse/7.1.0-foss-2023a.lua
SuperLU_DIST/8.1.2-foss-2023a.lua
software under 2023.06/software/linux/x86_64/intel/haswell/software
Hypre/2.29.0-foss-2023a
MODFLOW/6.4.4-foss-2023a
netCDF-Fortran/4.6.1-gompi-2023a
ParMETIS/4.0.3-gompi-2023a
PETSc/3.20.3-foss-2023a
SuiteSparse/7.1.0-foss-2023a
SuperLU_DIST/8.1.2-foss-2023a
other under 2023.06/software/linux/x86_64/intel/haswell
no other files in tarball
Apr 04 23:00:01 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-9051.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 05 07:04:36 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-intel-haswell-1712270588.tar.gz to S3 bucket succeeded

@bedroge
Collaborator

bedroge commented Apr 5, 2024

With the Lmod hook in place, the haswell build took about as long as for the other CPU targets. The number of failing / timed out tests is also very similar to the other ones:

$ bzgrep "not ok" x86_64/intel/haswell/software/PETSc/3.20.3-foss-2023a/easybuild/easybuild-PETSc-3.20.3-20240404.223827.log.bz2  | wc -l
547
$ bzgrep "Exceeded timeout limit" x86_64/intel/haswell/software/PETSc/3.20.3-foss-2023a/easybuild/easybuild-PETSc-3.20.3-20240404.223827.log.bz2  | wc -l
51
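A quick way to compare the runs is the share of timeouts among the failing tests. The sketch below uses only the `bzgrep` counts reported in this thread. It assumes that every "Exceeded timeout limit" line belongs to a test that is also reported as "not ok", which the logs were not checked for here; the helper name `timeout_share` is made up.

```python
# (lines matching "not ok", lines matching "Exceeded timeout limit"),
# taken from the bzgrep counts reported in this thread.
counts = {
    "x86_64/generic": (564, 40),
    "haswell, before the Lmod hook": (4836, 4824),
    "haswell, with the Lmod hook": (547, 51),
}


def timeout_share(not_ok, timed_out):
    """Fraction of 'not ok' lines that coincide with a timeout (assumed subset)."""
    return timed_out / not_ok


for label, (not_ok, timed_out) in counts.items():
    print(f"{label}: {timeout_share(not_ok, timed_out):.1%} of {not_ok} failures are timeouts")
```

This prints 7.1% for generic, 99.8% for haswell before the hook, and 9.3% for haswell with the hook.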

Collaborator

@bedroge bedroge left a comment

I'm seeing similar test results for the PETSc installations on our local cluster and on EB's generoso cluster, so I'm assuming this is "normal".

@bedroge bedroge added the bot:deploy Ask bot to deploy missing software installations to EESSI label Apr 5, 2024
@bedroge bedroge merged commit 3371bc2 into EESSI:2023.06-software.eessi.io Apr 5, 2024
33 checks passed
Labels
2023.06-software.eessi.io 2023.06 version of software.eessi.io bot:deploy Ask bot to deploy missing software installations to EESSI
4 participants