Update ngpus_per_node for a SCREAM/E3SM GPU run on Derecho #4687

Closed. Wants to merge 1 commit.

@sjsprecious (Collaborator)

To build and run E3SM/SCREAM on Derecho's A100 GPUs, ngpus_per_node must be set correctly for E3SM/SCREAM.

E3SM/SCREAM does not use the GPU_TYPE and GPU_OFFLOAD options.

@jedwards4b (Contributor) left a comment

I assume that you have tested that this does what you want for both CESM and E3SM on Derecho; what about other systems? Should you test there as well? Is there any way you can think of to move this to XML instead of doing it in Python?

@rljacob (Member) commented Sep 27, 2024

I thought GPU_TYPE and GPU_OFFLOAD were part of the machine config, so if they are defined for Derecho, they will be used. We really want fewer of these model-specific hooks in the Python.

@sjsprecious (Collaborator, Author)

Thanks @jedwards4b and @rljacob for your comments. I see your concerns, and I agree that these model- or machine-specific options are not appropriate for the Python workflow.

Hmm, if we remove NGPUS_PER_NODE, GPU_OFFLOAD and GPU_TYPE from the Python arguments and move them to an XML file, maybe we can ask a user to do something like:

1. create a new case
2. go to the case directory
3. ./xmlchange NGPUS_PER_NODE=xxx
4. ./xmlchange GPU_OFFLOAD=xxx
5. ./xmlchange GPU_TYPE=xxx
6. ./case.setup
7. ./case.build

If this approach is feasible, I just need to make sure that the values set in the XML files can later be used to set the GPU flags, GPU node type, etc., accordingly in the CMake file. Is that possible?
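As a minimal sketch of the idea in the question above, the snippet below reads GPU-related case variables from an XML file and turns them into CMake -D arguments. It assumes CIME-style `<entry id="..." value="..."/>` elements. The file and group ids, the example values, and the `HYPOTHETICAL_*` CMake variable names are all invented for illustration. They are not real Derecho settings or SCREAM CMake options.

```python
import xml.etree.ElementTree as ET

# Hypothetical case XML excerpt using CIME-style <entry id=... value=.../>
# elements. File/group ids and values are placeholders, not Derecho settings.
SAMPLE_CASE_XML = """
<file id="hypothetical_env.xml" version="2.0">
  <group id="gpu_settings">
    <entry id="NGPUS_PER_NODE" value="2"/>
    <entry id="GPU_TYPE" value="example_gpu"/>
    <entry id="GPU_OFFLOAD" value="example_offload"/>
  </group>
</file>
"""

# Case variables discussed in this thread
GPU_VARS = ("NGPUS_PER_NODE", "GPU_TYPE", "GPU_OFFLOAD")


def read_case_vars(xml_text, names):
    """Return {id: value} for every <entry> whose id is in `names`."""
    root = ET.fromstring(xml_text.strip())
    return {
        entry.get("id"): entry.get("value")
        for entry in root.iter("entry")
        if entry.get("id") in names
    }


def cmake_gpu_args(case_vars):
    """Build CMake -D arguments from the case variables.

    The HYPOTHETICAL_* cache variable names are invented for this sketch.
    GPU type/offload are only passed when at least one GPU per node is requested.
    """
    ngpus = int(case_vars.get("NGPUS_PER_NODE") or 0)
    args = [f"-DHYPOTHETICAL_NGPUS_PER_NODE={ngpus}"]
    if ngpus > 0:
        for key in ("GPU_TYPE", "GPU_OFFLOAD"):
            value = case_vars.get(key)
            if value:  # skip unset or empty values
                args.append(f"-DHYPOTHETICAL_{key}={value}")
    return args


if __name__ == "__main__":
    case_vars = read_case_vars(SAMPLE_CASE_XML, GPU_VARS)
    print(cmake_gpu_args(case_vars))
```

Reading the values from the case XML, rather than taking them as Python arguments, keeps the machine- and model-specific choices out of the Python workflow, which is the direction suggested earlier in the thread.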

@jedwards4b (Contributor)

Yes, I think that this is a good approach.

@sjsprecious (Collaborator, Author)

Thanks, @jedwards4b.

@rljacob what do you think about this approach? I want to make sure that you are also comfortable with the new method before I make the changes. Thanks.

@rljacob (Member) commented Sep 30, 2024

Yes, this is fine. But will you still need to make modifications to the SCREAM CMake files?

@sjsprecious (Collaborator, Author)

That is correct. I need to add the CMake file for the Derecho machine in SCREAM anyway.

@sjsprecious (Collaborator, Author)

Will be addressed by a different approach in a separate PR.
