Add automatic CUDA support for ELPA #2673

Open · wants to merge 5 commits into base: develop

Conversation

damianam (Member) commented Feb 3, 2022

No description provided.

@@ -52,6 +53,7 @@ def extra_options():
"""Custom easyconfig parameters for ELPA."""
extra_vars = {
'auto_detect_cpu_features': [True, "Auto-detect available CPU features, and configure accordingly", CUSTOM],
'cuda': [None, "Enable CUDA build if CUDA is among the dependencies", CUSTOM],
Contributor:

why not set this to True by default? That would simplify the logic below a bit to:
if cuda_is_dep and self.cfg['cuda']:
    ...
elif not self.cfg['cuda']:
    ...
else:
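
For illustration, a minimal sketch of how that True-by-default variant could look when filled in; the branch bodies and log messages are assumptions for the example, and only the --enable-nvidia-gpu option is taken from this PR:

if cuda_is_dep and self.cfg['cuda']:
    # CUDA is a dependency and 'cuda' was not explicitly set to False
    self.cfg.update('configopts', '--enable-nvidia-gpu')
elif not self.cfg['cuda']:
    # GPU support was explicitly disabled via cuda = False
    self.log.info("CUDA support explicitly disabled")
else:
    # 'cuda' is still at its (proposed) default of True, but CUDA is not a dependency
    self.log.info("CUDA not listed as a dependency, not enabling GPU support")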

Member:

This actually makes sense: don't make a decision by default, let it depend on whether CUDA is included as a dependency or not, but allow explicitly opting in (True) or out (False).

@damianam is just using the same approach as in #2666 here...
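
As a usage illustration of that tri-state behaviour, a hypothetical easyconfig snippet (the CUDA version shown is made up):

# listing CUDA as a dependency is enough to get a GPU-enabled ELPA build
dependencies = [('CUDA', '11.5.0')]

# with 'cuda' left unset (None), the easyblock decides based on the dependencies;
# setting it explicitly overrides that in either direction:
# cuda = False   # opt out even though CUDA is a dependency
# cuda = True    # opt in explicitly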

@@ -159,6 +161,24 @@ def run_all_steps(self, *args, **kwargs):
self.cfg.update('configopts', '--with-mpi=no')
self.cfg.update('configopts', 'LIBS="$LIBLAPACK"')

# Add CUDA features
cuda_is_dep = 'CUDA' in [i['name'] for i in self.cfg.dependencies()]
Contributor:

All this, including the "cuda" extra_var, can be simplified to
if get_software_root('CUDA'):
    ...

That way you can drop the elif statements and the cuda extra_var.
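
A minimal sketch of that simplification, assuming get_software_root is imported from easybuild.tools.modules as usual in easyblocks:

from easybuild.tools.modules import get_software_root

# later, when composing the configure options:
if get_software_root('CUDA'):
    # a CUDA installation is available in the build environment
    self.cfg.update('configopts', '--enable-nvidia-gpu')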

Member:

That's actually not fully true...

If the CUDA module was loaded manually, then $EBROOTCUDA is also defined (and that's all that get_software_root does: it grabs the corresponding $EBROOT* environment variable).

So this is actually a better approach...

We should have proper support for this in framework though, so you can just do self.cfg.is_dep('CUDA')
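
Roughly, the difference between the two checks; the is_dep() helper at the end is the suggested future framework addition and does not exist yet:

from easybuild.tools.modules import get_software_root

# get_software_root('CUDA') effectively just reads $EBROOTCUDA, so it is also
# true when a CUDA module happens to be loaded by hand:
cuda_loaded = bool(get_software_root('CUDA'))

# the dependency check only considers what the easyconfig itself declares:
cuda_is_dep = 'CUDA' in [dep['name'] for dep in self.cfg.dependencies()]

# hypothetical future spelling, once framework supports it:
# cuda_is_dep = self.cfg.is_dep('CUDA')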

Contributor:

I'd say that if you do that (load CUDA manually before building), then you're doing it wrong and almost all our easyconfigs are at risk.
Using just if get_software_root is what we do everywhere else, so why not here too?

Member:

For JSC, CUDA is a dep of MPI by default (and that MPI works regardless of whether or not there actually is a GPU through the use of UCX variables). So get_software_root('CUDA') is not enough, you need to know if it is actually a dep to decide whether to trigger a GPU build.

Contributor:

@ocaisa if CUDA is a dep of MPI regardless of whether or not there actually is a GPU, don’t you want to trigger a GPU build also regardless of whether or not there actually is a GPU?

Member:

@smoors No, because then that GPU-enabled build will (typically) not work on a system without a GPU (whereas OpenMPI will work just fine on that system as long as you set the appropriate btl). So at JSC, the GPU-supported build is usually explicit.

if cuda_is_dep and (self.cfg['cuda'] is None or self.cfg['cuda']):
    self.cfg.update('configopts', '--enable-nvidia-gpu')
    cuda_cc_space_sep = self.cfg.template_values['cuda_cc_space_sep'].replace('.', '').split()
    # Just one is supported, so pick the highest one (but prioritize sm_80)
Contributor:

Picking the highest one is likely a bad idea, since then the code won't run on lower-end GPUs in case a site's default settings list all their models.
In that case it's best to raise an error if there is more than one.
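
A sketch of that stricter variant, assuming EasyBuildError from easybuild.tools.build_log and the cuda_cc_space_sep template value used in the hunk above; the error message wording is made up:

from easybuild.tools.build_log import EasyBuildError

cuda_cc = self.cfg.template_values['cuda_cc_space_sep'].replace('.', '').split()
if len(cuda_cc) != 1:
    # ELPA can only be built for a single compute capability, so refuse to guess
    raise EasyBuildError("ELPA supports exactly one CUDA compute capability, got: %s", ', '.join(cuda_cc))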

Member:

Or pick first listed (or minimum rather than max) and print a clear warning?

Member:

The LAMMPS easyblock selects the largest: https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/l/lammps.py#L307

Having a method, with a clear rationale for what is picked, would be good for cases where the software supports building for only one, and not all, of the CUDA compute capabilities.
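
A sketch of what such a helper might look like; the selection policy (prefer sm_80 when listed, otherwise the highest, with a warning) follows the comment in the hunk above but is only an illustration, and print_warning is assumed to come from easybuild.tools.build_log:

from easybuild.tools.build_log import print_warning

def pick_single_cuda_cc(self):
    """
    Pick a single CUDA compute capability for ELPA.

    Rationale: ELPA can only be built for one compute capability, so prefer 8.0
    when it is listed and otherwise fall back to the highest one, warning that
    lower-end GPUs will not be covered.
    """
    ccs = self.cfg.template_values['cuda_cc_space_sep'].replace('.', '').split()
    picked = '80' if '80' in ccs else max(ccs, key=int)
    if len(ccs) > 1:
        print_warning("Multiple CUDA compute capabilities specified (%s), building ELPA only for sm_%s"
                      % (', '.join(ccs), picked))
    return picked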
