implement MPI variants #90

Merged
@ocefpaf merged 27 commits from the mpi branch into conda-forge:master on Nov 20, 2018
Conversation

@minrk (Member) commented Jun 29, 2018

Based on this comment by @mcg1969, hdf5 has 3 variants:

  • unqualified, serial builds. Should be unchanged.
  • track_features hdf5_mpich: built with --enable-parallel and mpich
  • track_features hdf5_openmpi: built with --enable-parallel and openmpi

The use of track_features on the mpi variants means that conda install hdf5 prefers the non-mpi variant unless an mpi build is explicitly requested. Downstream packages can depend directly on the feature-carrying variants.
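As a rough sketch (not the exact recipe in this PR), the track_features part of meta.yaml could look something like this, assuming an mpi variable of nompi / mpich / openmpi supplied by conda_build_config.yaml:

    build:
      number: 0
      # only the mpi builds carry a track_feature, so a plain
      # `conda install hdf5` prefers the serial (nompi) build
      track_features:
        - hdf5_{{ mpi }}   # [mpi != 'nompi']

    requirements:
      host:
        - {{ mpi }}        # [mpi != 'nompi']
      run:
        - {{ mpi }}        # [mpi != 'nompi']

Since no real package ever provides the hdf5_mpich or hdf5_openmpi features, the solver de-prioritizes the mpi builds unless a spec asks for them explicitly.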

Alternatives include:

  • building a separate hdf5-parallel package

See #51 for more details on the pros and cons of the separate-package option.

closes #51

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jakirkham (Member)

Thanks for doing this @minrk. Starting with hdf5 seems reasonable since there is no precedent here.

So this may be premature, as we are still figuring this out, but it would be good to have something in the conda-forge docs about this. In the interim, maybe it could live in Dropbox Paper, HackMD, or somewhere else while we figure out what works and what doesn't. Just somewhere we can refine our understanding and use to formulate the docs on how to use this strategy. Thoughts? Preferences?

@jakirkham (Member)

cc @dougalsutherland

workaround Internal Error: get_unit(): Bad internal unit KIND
@minrk (Member, Author) commented Jun 29, 2018

@jakirkham sounds good. I'll start a doc sketching out what we know so far.

FWIW, mpi-requiring packages (mumps-mpi, scalapack, petsc, etc.) are already building with mpi variants and it's working nicely. This is a slight variation because it's the first package that has a 'no mpi' variant to prefer, which is the reason for the track_features trick.
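For reference, the mpi variant matrix for a recipe like this is typically declared in conda_build_config.yaml along these lines (an illustrative sketch, not necessarily this recipe's exact file):

    mpi:
      - nompi
      - mpich      # [not win]
      - openmpi    # [not win]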

Review thread on recipe/meta.yaml (outdated, resolved)
for the same reason some serial tests are skipped
minrk added 5 commits July 2, 2018 21:09
use it in `make check RUNPARALLEL=…`

sets environment variables, parameters for mpich/openmpi

from petsc, other recipes
test is meant to crash (that’s what it tests)
but openmpi sets an exit code when this happens that the test doesn’t deal with

see http://hdf-forum.184993.n3.nabble.com/HDF5-1-8-14-15-16-with-OpenMPI-1-10-1-and-Intel-16-1-td4028533.html
@minrk (Member, Author) commented Jul 2, 2018

Linux builds succeed. I think mac will as well.

The only caveat I've hit is that the mpi fortran builds seem to fail with:

Internal Error: get_unit(): Bad internal unit KIND

mpi builds currently have fortran support disabled because of this.

This same error is showing up in the conda-build 3 PRs here, so I suspect it's the same issue, possibly using the same wrong fortran compiler? I'm not sure. Googling suggests that installing the gcc package (even on Linux) would fix this, but I haven't tried.

@minrk (Member, Author) commented Jul 4, 2018

Having tracked down the Bad internal unit KIND error to conda-forge/toolchain-feedstock#39, I assume I should be able to get fortran builds with MPI working once the MPI providers are no longer linking libgfortran-ng.

so that packages can depend on hdf5 with mpi or not.

testing h5py built with serial hdf5 shows it works with parallel hdf5,
but not the other way around,
so mpi builds have run_exports for the right build string,
but serial builds do not have run_exports
breaks fortran compiler detection
shared-memory seems to have issues in ompi, at least on mac
@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.

Here's what I've got...

For recipe:

  • Failed to even lint the recipe, probably because of a conda-smithy bug 😢. This likely indicates a problem in your meta.yaml, though. To get a traceback to help figure out what's going on, install conda-smithy and run conda smithy recipe-lint . from the recipe directory.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

recipe-lint fails if mpi is undefined
and apparently runs without defining mpi
(this means recipe-lint ignores conda_build_config)
seems to hang. Not sure why
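One common way to keep recipe-lint happy when mpi is not injected from conda_build_config (a sketch of the general pattern, not necessarily the exact workaround used here) is a jinja2 default near the top of meta.yaml:

    {# fall back to the serial variant when the recipe is rendered without a
       variant config, e.g. by the linter #}
    {% set mpi = mpi or 'nompi' %}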
@minrk (Member, Author) commented Nov 19, 2018

I've now got all combinations of {gcc, toolchain, clang} × {mpich, openmpi, nompi} building here.

Writeup of mpi variants here.

This recipe:

  • builds nompi, mpich, openmpi variants (only nompi on windows)
  • uses track_features: hdf5_{mpich|openmpi} on the mpi builds so that the nompi variant is preferred (no package should ever have these features)
  • build strings allow packages to pick which variant to build against, as loosely or precisely as needed
  • run_exports ensures that packages built against an mpi variant get a runtime dependency on that mpi variant (no such pinning for non-mpi builds)

I chose that strategy for run_exports specifically because I tested h5py built against serial hdf5 running against parallel hdf5, and it worked. h5py built against parallel hdf5 did not run against serial hdf5 or against the other mpi.
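Putting the build-string and run_exports pieces together, a hedged sketch of what this could look like in meta.yaml (the string format and pins are illustrative, not the exact recipe):

    build:
      # encode the variant in the build string so downstream recipes can
      # select it as loosely or as precisely as they need
      string: "{{ mpi }}_h{{ PKG_HASH }}_{{ PKG_BUILDNUM }}"
      # mpi builds export a pin that keeps downstream packages on the same
      # mpi variant at runtime; serial builds export nothing, because h5py
      # built against serial hdf5 also ran fine against parallel hdf5
      run_exports:
        - hdf5 * {{ mpi }}_*   # [mpi != 'nompi']

A downstream recipe can then pick a variant through the build string, e.g. `hdf5 * mpich_*` in its host requirements, or just `hdf5` to take whichever variant the solver prefers (the nompi build, thanks to track_features).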

@minrk changed the title from "[WIP] implement MPI variants" to "implement MPI variants" on Nov 20, 2018
@minrk mentioned this pull request on Nov 20, 2018
@ocefpaf merged commit 50f91df into conda-forge:master on Nov 20, 2018
@minrk deleted the mpi branch on Nov 21, 2018
@minrk mentioned this pull request on Dec 7, 2018