install nbodykit on NERSC Perlmutter #675
Comments
One difficulty I see is that nbodykit has to be built with the PrgEnv on
NERSC, so the bccp conda channel version of nbodykit likely won't work
with the PrgEnv version of mpi4py anyway.
You will likely need to rebuild all of the nbodykit dependency packages
with PrgEnv-gnu. It is possible that running pip install after loading
PrgEnv-gnu will get you quite far. Did you try that?
The scripts in the m3035 project do roughly that, but via conda-build.
The scripts there also create a conda channel with these PrgEnv-built
packages for cori (in the m3035 project folder).
The bcast-bccp-3.8 environment was using that channel. The more 'proper'
way of fixing this might be upgrading those channel-building scripts for
Perlmutter, especially if you plan to run things at scale with the 'bcast'
style environments.
- Yu
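Before rebuilding anything with pip, it may help to confirm that the GNU programming environment is the one actually loaded. Below is a minimal sketch in Python, assuming the Cray module system exports the active environment name in the `PE_ENV` variable (the `module swap` command quoted in this thread relies on the same variable). The helper name `prgenv_is_gnu` is made up for illustration and is not part of nbodykit or any NERSC tooling.

```python
import os


def prgenv_is_gnu(env=None):
    """Return True if PE_ENV names the GNU programming environment.

    Assumption: PE_ENV is set to something like "GNU" once PrgEnv-gnu is
    loaded. This is a convenience check, not an official NERSC API.
    """
    env = os.environ if env is None else env
    return env.get("PE_ENV", "").strip().lower() == "gnu"


if __name__ == "__main__":
    if prgenv_is_gnu():
        print("PrgEnv-gnu appears to be loaded; source builds should pick it up.")
    else:
        print("PE_ENV is not 'GNU'; consider swapping to PrgEnv-gnu before building.")
```

Running this in the same shell that will run `pip install` gives a quick sanity check that the compiler wrappers belong to the intended environment.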
On Sat, Feb 4, 2023 at 10:19 PM Biwei Dai <***@***.***> wrote:
Hi,
The nbodykit built for NERSC cori
(https://nbodykit.readthedocs.io/en/latest/getting-started/install.html#nbodykit-on-nersc)
does not seem to work on Perlmutter. When trying to load nbodykit with
source /global/common/software/m3035/conda-activate.sh 3.8
I got the following error:
/global/common/software/m3035/conda/envs/bcast-bccp-3.8/etc/conda/activate.d/activate-nersc-prgenv-gnu.sh: line 6: /etc/profile.d/nerschost.sh: No such file or directory
/global/common/software/m3035/conda/envs/bcast-bccp-3.8/etc/conda/activate.d/activate-nersc-prgenv-gnu.sh: line 7: /etc/profile.d/modules.sh: No such file or directory
/global/common/software/m3035/conda/envs/bcast-bccp-3.8/etc/conda/activate.d/activate-nersc-prgenv-gnu.sh: line 8: /etc/profile.d/mpi-selector.sh: No such file or directory
/global/common/software/m3035/conda/envs/bcast-bccp-3.8/etc/conda/activate.d/activate-nersc-prgenv-gnu.sh: line 9: /etc/bash.bashrc.local: No such file or directory
So I tried to create my own conda environment with nbodykit. I installed
mpi4py with
module swap PrgEnv-${PE_ENV,,} PrgEnv-gnu
MPICC="cc -shared" pip install --force-reinstall --no-cache-dir --no-binary=mpi4py mpi4py
following
https://docs.nersc.gov/development/languages/python/parallel-python/#mpi4py-in-your-custom-conda-environment
I tested it on a Perlmutter compute node and it works fine.
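For anyone wanting to repeat that kind of test, a minimal smoke test might look like the sketch below. It only uses `MPI.COMM_WORLD` and `MPI.Get_library_version()` from mpi4py. The function name `mpi_smoke_test` is hypothetical, and the script returns `None` rather than failing when mpi4py cannot be imported.

```python
def mpi_smoke_test():
    """Return (rank, size, library_version), or None if mpi4py is not importable."""
    try:
        # Importing mpi4py.MPI initializes MPI by default.
        from mpi4py import MPI
    except ImportError:
        return None
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size(), MPI.Get_library_version()


if __name__ == "__main__":
    result = mpi_smoke_test()
    if result is None:
        print("mpi4py is not importable in this environment")
    else:
        rank, size, version = result
        # The version string shows which MPI library mpi4py was linked against.
        print(f"rank {rank} of {size}: {version.strip()}")
```

Launched with more than one task on a compute node (for example through the batch system's launcher), each rank should report a distinct rank number and the same size. The library version string is a quick way to see whether the build picked up the system MPI or a different one.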
But when I try to install nbodykit with
conda install -c bccp nbodykit
it doesn't use the mpi4py I built and instead reinstalls mpi4py with conda,
and that copy no longer works on Perlmutter compute nodes. Can I force it to
use the mpi4py I built?
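This does not answer whether conda can be forced to keep the existing build, but it can help to check which copy of mpi4py the environment currently sees. Below is a small sketch using the standard library's `importlib.metadata`. The helper name `installer_of` is made up here. The `INSTALLER` metadata file usually contains `pip` for pip installs; conda-built packages often record `conda`, though that should be treated as an assumption rather than a guarantee.

```python
from importlib import metadata


def installer_of(package):
    """Return the tool recorded as having installed `package`, or None.

    Reads the INSTALLER file from the installed distribution's metadata.
    Returns None if the package is not installed or the file is absent.
    """
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


if __name__ == "__main__":
    # If this prints something other than "pip" after a conda install,
    # the source-built copy has probably been replaced.
    print("mpi4py installed by:", installer_of("mpi4py"))
```

Running it before and after `conda install -c bccp nbodykit` should show whether the pip-built mpi4py was replaced.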
I also tried reinstalling mpi4py to overwrite the copy that conda
installed, and I got the following error when running the code:
Attempting to use an MPI routine before initializing MPICH
Lagging behind Biwei: I am only now running into this issue, now that cori is gone for good, and have not yet found a way to make this work.
Yu's suggestion works for me! I manually installed the nbodykit dependencies and nbodykit itself with PrgEnv-gnu and pip install. Have you tried this?
For posterity, here is what I did:
which seems to work on a Perlmutter compute node.
@rainwoodman As you alluded to toward the top of this issue, it would be nice to recreate the bcast-pip scripts. I am running some jobs on fewer than 4 nodes and am seeing 5-10 minutes of startup time. IIRC bcast-pip improves this.