A couple of Global Workflow tasks are failing on S4 due to over-threading by OpenMP #2211

Closed
souopgui opened this issue Jan 9, 2024 · 0 comments · Fixed by #2212
Labels
bug Something isn't working triage Issues that are triage

Comments

@souopgui
Contributor

souopgui commented Jan 9, 2024

What is wrong?

A couple of Global Workflow tasks are failing on S4 due to lack of system resources for OpenMP
Affected tasks:

  • gfsatmanlprod
  • gdasatmanlprod
  • gfsatmprod_fxxx-fxxx
  • gdasatmprod_fxxx-fxxx

--typical error message--

OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint Try decreasing the value of OMP_NUM_THREADS.

What should have happened?

Tasks should complete successfully

What machines are impacted?

All or N/A

Steps to reproduce

Run any of the following tasks at resolution 192/96 on S4 and check the task output for OpenMP errors.

  • gfsatmanlprod
  • gdasatmanlprod
  • gfsatmprod_fxxx-fxxx
  • gdasatmprod_fxxx-fxxx

Additional information

This issue is similar to issue #2206; however, the two issues affect different sets of tasks.

Do you have a proposed solution?

In /scripts/exglobal_atmos_products.sh,
prefix the call to "${HOMEgfs}/ush/run_mpmd.sh" with OMP_NUM_THREADS=1, as shown below.

This sets OMP_NUM_THREADS to 1 only when running under MPMD, where multiple instances are launched simultaneously. Because the assignment prefixes only this one call, it applies to that command's environment alone, and no other run is affected.

/scripts/exglobal_atmos_products.sh

  # Run with MPMD or serial
  if [[ "${USE_CFP:-}" = "YES" ]]; then
    OMP_NUM_THREADS=1 "${HOMEgfs}/ush/run_mpmd.sh" "${DATA}/poescript"
    export err=$?
  else
    chmod 755 "${DATA}/poescript"
    bash +x "${DATA}/poescript" > mpmd.out 2>&1
    export err=$?
  fi
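
The scoping behavior this fix relies on can be checked in isolation. The sketch below is not part of the workflow: it assumes an inherited value of 8 for OMP_NUM_THREADS (a made-up number) and uses a plain `bash -c` child process as a stand-in for run_mpmd.sh. It only illustrates that a `VAR=value command` prefix changes the variable for that one command and leaves the calling shell's value untouched.

```shell
#!/usr/bin/env bash
# Assumed value inherited from the job environment (hypothetical).
export OMP_NUM_THREADS=8

# Stand-in for the prefixed run_mpmd.sh call: the child process sees 1.
inside=$(OMP_NUM_THREADS=1 bash -c 'echo "${OMP_NUM_THREADS}"')

# A later, unprefixed child process still sees the inherited value.
outside=$(bash -c 'echo "${OMP_NUM_THREADS}"')

echo "prefixed call sees:   ${inside}"
echo "later commands see:   ${outside}"
```

With the assumed starting value, the prefixed child reports 1 and the later child reports 8, which is why the serial branch and any downstream steps keep their original thread setting.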