Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cesm link step in builds with high pcols values #22

Merged

Conversation

gdicker1
Copy link

In GPU runs of CAM, it is desirable to increase the number of physics columns each MPI rank is responsible for by increasing the pcols variable. When pcols grows somewhere beyond 2048, builds fail in the link step.

Adding -mcmodel=medium to the FFLAGS and LDFLAGS allow builds to proceed. Since this can have performance implications, this is only being applied to the NVHPC compilers (the main compilers for GPU applications right now).

@gdicker1 gdicker1 added the bug Something isn't working label Jul 25, 2024
@gdicker1 gdicker1 self-assigned this Jul 25, 2024
@gdicker1
Copy link
Author

@sjsprecious, apologies if you aren't the right person to direct questions or this PR to.

I added you as a reviewer for your opinion on this change. Should this be made even more GPU specific? Or, should I apply this to intel and gnu compilers too? Any other thoughts?

@gdicker1
Copy link
Author

Once incorporated, this will address EarthWorksOrg/EarthWorks Issue #56

@sjsprecious
Copy link

Hi @gdicker1 , my understanding is that the -mcmodel=medium flag is only needed when you want to set a large PCOLS value. According to ChatGPT:

Using the -mcmodel=medium flag can hurt performance compared to the default -mcmodel=small because it requires the compiler to generate larger and potentially less efficient instructions to access global and static variables. This can lead to increased instruction cache pressure and slower memory accesses.

Therefore, I prefer not to set this flag by default and let a user add it manually when needed. Even for a GPU run, we may not have a large PCOLS value depending on the compset, resolution, number of CPU cores / GPUs per node, etc.

On the other hand, using a large PCOLS value will unfortunately hurt the CPU performance. Thus using the default PCOLS value makes sense for any CPU simulation and we should not need to add this flag for GNU/Intel compiler.

This is just my two cents.

@gdicker1
Copy link
Author

Thanks for the perspective @sjsprecious! I'll consider this a bit more...

You're correct that we only need to consider adding this flag for higher pcols values. Right now -mcmodel=medium might be something we want for GPU runs, based on Supreeth's work with 1 MPI rank per GPU (if I remember correctly). It seems like a cumbersome step to expect users to add it themselves (but GPU runs already have a few "special" steps).

@gdicker1
Copy link
Author

@supreethms1809 could you offer some thoughts or review?

I just updated this so -mcmodel=medium is only used for OpenACC offloads and the changes are at the bottom of the file below a "EarthWorks specifc: ..." comment.

@supreethms1809
Copy link

From my limited tests on CPUs, I haven't seen the flag hurting the performance. But it is needed for GPU builds.

@areanddee Do we expect to run with bigger PCOLs on CPUs as well?

@areanddee
Copy link

Hi @gdicker1 , my understanding is that the -mcmodel=medium flag is only needed when you want to set a large PCOLS value. According to ChatGPT:

Using the -mcmodel=medium flag can hurt performance compared to the default -mcmodel=small because it requires the compiler to generate larger and potentially less efficient instructions to access global and static variables. This can lead to increased instruction cache pressure and slower memory accesses.

Therefore, I prefer not to set this flag by default and let a user add it manually when needed. Even for a GPU run, we may not have a large PCOLS value depending on the compset, resolution, number of CPU cores / GPUs per node, etc.

On the other hand, using a large PCOLS value will unfortunately hurt the CPU performance. Thus using the default PCOLS value makes sense for any CPU simulation and we should not need to add this flag for GNU/Intel compiler.

This is just my two cents.

My 2 cents:
I would recommend we set this by default ON for GPUs and OFF for CPUs. We already have flags that are specifically for GPUs so I don't think this sets any sort of bad precedent.

gdicker1 added 2 commits July 30, 2024 18:36
When CAM is compiled with pcols set higher than 2048, builds will fail
during the link step. Adding `-mcmodel=medium` allows builds to succeed.
Also move this to the bottom of the file and add a comment that
indicates this change is only for EarthWorks.
@gdicker1 gdicker1 force-pushed the fix/mcmodel_highpcols_linkstep branch from c3bc0bd to b8a71e2 Compare July 31, 2024 00:37
@gdicker1
Copy link
Author

Thanks for your thoughts @areanddee. Since this will only activate for GPU builds (and more specifically OpenACC builds), I think this is good to move forward.

@gdicker1
Copy link
Author

Force push from c3bc0bd to b8a71e2 was rebasing this branch from being based on ESMCI/ccs_config_cesm tag ccs_config_cesm0.0.99 to being based on ew-develop (currently ccs_config-ew2.2.000).

@gdicker1 gdicker1 merged commit cb6c09c into EarthWorksOrg:ew-develop Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants