
Installing WarpX on University of Michigan's Great Lakes cluster #4385

Closed
archermarx opened this issue Oct 19, 2023 · 2 comments

archermarx commented Oct 19, 2023

Hi all,

I would like some help installing WarpX on the Great Lakes supercomputing cluster at the University of Michigan. I'm looking to do GPU computing so I can't just use the Conda build. Here's what I have tried so far:

1) Python

I created and ran the following SLURM script (I've removed the specific account):

#!/bin/bash
#SBATCH --job-name=warpx-install
#SBATCH --account=*****
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=12:00:00
#SBATCH --output=output.log
#SBATCH --error=error.log
#SBATCH --mail-type=END,FAIL

# Load required modules
module load gcc openmpi cmake git cuda python3.10-anaconda

# Install required packages to user environment
pip install --user mpi4py periodictable numpy lasy picmistandard

# Set environment variables
WARPX_COMPUTE=CUDA
WARPX_DIMS="1;2;3;RZ"
WARPX_MPI=ON
WARPX_BUILD_PARALLEL=4

# Install warpx with verbose output
pip wheel -v git+https://github.com/ECP-WarpX/WarpX.git
pip install --user *.whl --force-reinstall                              

Running this seems to install WarpX correctly. However, when I try to run the 1D capacitive discharge case (https://warpx.readthedocs.io/en/latest/usage/examples.html#capacitive-discharge), it says that the 1D version has not been compiled and that AMReX is not installed. I think it may be trying to install AMReX to the /usr/ directory. Is there an environment variable I can set to make it install to my user partition? Also, am I setting the number of dimensions correctly?

Here's the entire error:

Traceback (most recent call last):
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 88, in load_library
    import amrex.space1d as amr
ModuleNotFoundError: No module named 'amrex'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/gpfs/accounts/goroda_root/goroda0/marksta/warpx/benchmark-1d/1d.py", line 399, in <module>
    run.run_sim()
  File "/gpfs/accounts/goroda_root/goroda0/marksta/warpx/benchmark-1d/1d.py", line 355, in run_sim
    self.sim.step(self.max_steps - self.diag_steps)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/picmi.py", line 1924, in step
    self.initialize_warpx(mpi_comm)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/picmi.py", line 1916, in initialize_warpx
    pywarpx.warpx.init(mpi_comm, max_step=self.max_steps, stop_time=self.max_time)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/WarpX.py", line 98, in init
    libwarpx.initialize(argv, mpi_comm=mpi_comm)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 128, in initialize
    self.amrex_init(argv, mpi_comm)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 118, in amrex_init
    self.libwarpx_so.amrex_init(argv)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 42, in __getattr__
    self.load_library()
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 112, in load_library
    raise Exception(f"Dimensionality '{self.geometry_dim}' was not compiled in this Python install. Please recompile with -DWarpX_DIMS={_dims}")
Exception: Dimensionality '1d' was not compiled in this Python install. Please recompile with -DWarpX_DIMS=1
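
For what it's worth, a quick way to check which geometry modules actually ended up in the install is to list the compiled extensions inside the pywarpx package (the filename pattern is an assumption based on the import in the traceback below):

# List the compiled geometry modules (warpx_pybind_1d, _2d, _3d, _rz, ...) bundled into the installed pywarpx package
python3 -c "import pywarpx, os, glob; d = os.path.dirname(pywarpx.__file__); print(glob.glob(os.path.join(d, 'warpx_pybind_*')))"

If the 1D module is missing from that list, that would line up with the dimensionality exception above.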

2) CMake

After cloning the repo, I run the following script in the WarpX root directory

#!/bin/bash
#SBATCH --job-name=warpx-install
#SBATCH --account=***
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=16000m
#SBATCH --time=12:00:00
#SBATCH --output=output.log
#SBATCH --error=error.log
#SBATCH --mail-type=END,FAIL

# Load required modules
module load gcc openmpi cmake git cuda python3.10-anaconda

# find dependencies & configure for all WarpX dimensionalities
cmake -S . -B build --DWarpX_DIMS="1;2;RZ;3" -DWarpX_PYTHON=ON -DPYINSTALLOPTIONS="--user"

# build and then call "python3 -m pip install ..."
cmake --build build --target pip_install -j 4

After quite some time, this fails with the following error message:

Usage:
  /usr/bin/python3.6 -m pip wheel [options] <requirement specifier> ...
  /usr/bin/python3.6 -m pip wheel [options] -r <requirements file> ...
  /usr/bin/python3.6 -m pip wheel [options] [-e] <vcs project url> ...
  /usr/bin/python3.6 -m pip wheel [options] [-e] <local project path> ...
  /usr/bin/python3.6 -m pip wheel [options] <archive url/path> ...

no such option: --no-build-isolation
gmake[3]: *** [_deps/fetchedpyamrex-build/CMakeFiles/pyamrex_pip_wheel.dir/build.make:73: _deps/fetchedpyamrex-build/CMakeFiles/pyamrex_pip_wheel] Error 2
gmake[2]: *** [CMakeFiles/Makefile2:2911: _deps/fetchedpyamrex-build/CMakeFiles/pyamrex_pip_wheel.dir/all] Error 2
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:1873: CMakeFiles/pip_install.dir/rule] Error 2
gmake: *** [Makefile:455: pip_install] Error 2

This occurs during the following stage of installation:

[ 85%] Linking CXX static library lib/libwarpx.rz.MPI.CUDA.DP.PDP.OPMD.QED.a
[ 85%] Built target lib_rz
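
Two things I noticed while staring at this: the configure line above has an extra dash in --DWarpX_DIMS (CMake cache entries take a single -D), and the pip usage message shows the pyAMReX wheel step being run with /usr/bin/python3.6's pip, i.e. the system Python rather than the loaded anaconda module; that old pip predates --no-build-isolation. Here is the configure step I'd try next; whether pinning Python_EXECUTABLE like this is enough to make the pip steps pick up the right interpreter is an assumption on my part:

# Load required modules
module load gcc openmpi cmake git cuda python3.10-anaconda

# Single -D for cache entries; pin the Python from the anaconda module so the
# wheel/pip steps don't fall back to /usr/bin/python3.6
cmake -S . -B build \
  -DWarpX_COMPUTE=CUDA \
  -DWarpX_DIMS="1;2;RZ;3" \
  -DWarpX_PYTHON=ON \
  -DPython_EXECUTABLE=$(which python3) \
  -DPYINSTALLOPTIONS="--user"

cmake --build build --target pip_install -j 4

I also added WarpX_COMPUTE=CUDA here, since the goal is a GPU build.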

Once I figure out how this install works, I can create a PR for the install process for this cluster, along with sample SLURM scripts. Thanks in advance!

EDIT: I also get a note that numba is incompatible with the installed numpy (the numpy version is too new). Not sure how critical this is.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.26.1 which is incompatible.

I manually changed the numpy version required to numpy<1.24 in the downloaded wheel and reinstalled pywarpx, but that didn't fix any issues.
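
For the record, the other direction (downgrading numpy itself into the range numba 0.56 accepts, rather than loosening the pin inside the wheel) would just be:

# numba 0.56.4 wants numpy<1.24, so install a numpy inside that range
pip install --user "numpy>=1.18,<1.24"

I haven't tried that yet, and I don't know whether numba even matters for this workflow.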

archermarx commented

Update: I installed AMReX and pyAMReX separately, so Python can find those now, but I still get the following error upon trying to run the 1D benchmark:

Traceback (most recent call last):
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 90, in load_library
    from . import warpx_pybind_1d as cxx_1d
ImportError: cannot import name 'warpx_pybind_1d' from 'pywarpx' (/home/marksta/.local/lib/python3.10/site-packages/pywarpx/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/gpfs/accounts/goroda_root/goroda0/marksta/warpx/benchmark-1d/1d.py", line 399, in <module>
    run.run_sim()
  File "/gpfs/accounts/goroda_root/goroda0/marksta/warpx/benchmark-1d/1d.py", line 355, in run_sim
    self.sim.step(self.max_steps - self.diag_steps)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/picmi.py", line 1924, in step
    self.initialize_warpx(mpi_comm)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/picmi.py", line 1916, in initialize_warpx
    pywarpx.warpx.init(mpi_comm, max_step=self.max_steps, stop_time=self.max_time)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/WarpX.py", line 98, in init
    libwarpx.initialize(argv, mpi_comm=mpi_comm)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 128, in initialize
    self.amrex_init(argv, mpi_comm)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 118, in amrex_init
    self.libwarpx_so.amrex_init(argv)
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 42, in __getattr__
    self.load_library()
  File "/home/marksta/.local/lib/python3.10/site-packages/pywarpx/_libwarpx.py", line 112, in load_library
    raise Exception(f"Dimensionality '{self.geometry_dim}' was not compiled in this Python install. Please recompile with -DWarpX_DIMS={_dims}")
Exception: Dimensionality '1d' was not compiled in this Python install. Please recompile with -DWarpX_DIMS=1

This is pyWarpX installed using the SLURM script in the first post. It seems the environment variables aren't working correctly?
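
One thing that might explain it: the WARPX_* lines in that SLURM script are plain shell assignments, not exported, so a child process like pip never sees them. Assuming the wheel build really does read these variable names, I'd expect the block to need to look like this:

# Export so that pip (a child process) inherits these build settings;
# un-exported shell variables are not visible to child processes
export WARPX_COMPUTE=CUDA
export WARPX_DIMS="1;2;3;RZ"
export WARPX_MPI=ON
export WARPX_BUILD_PARALLEL=4

pip wheel -v git+https://github.com/ECP-WarpX/WarpX.git
pip install --user *.whl --force-reinstall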

archermarx commented Oct 30, 2023

Ok, I have gotten it installed and running, but I am encountering issues running with OpenMP and MPI.

First, when running with one MPI rank and only one grid, everything works fine. However, when using two or more grids, the code will run fine for a while and then begin to hang or take a second or more per iteration. This makes it impossible to complete even quite small simulations.

Second, if I run with mpirun -n 2, the code crashes while writing the first openpmd file with the following output:

MPI initialized with 2 MPI processes
MPI initialized with thread support level 3
OMP initialized with 1 OMP threads
AMReX (23.10-14-g7ee29121ed70) initialized
PICSAR (23.09)
WarpX (23.10-19-g8459109a725b)

    __        __             __  __
    \ \      / /_ _ _ __ _ __\ \/ /
     \ \ /\ / / _` | '__| '_ \\  /
      \ V  V / (_| | |  | |_) /  \
       \_/\_/ \__,_|_|  | .__/_/\_\
                        |_|

Level 0: dt = 1.485333333e-11 ; dz = 7.426666667e-05

Grids Summary:
  Level 0   15 grids  600 cells  100 % of domain
            smallest grid: 40  biggest grid: 40

-------------------------------------------------------------------------------
--------------------------- MAIN EM PIC PARAMETERS ----------------------------
-------------------------------------------------------------------------------
Precision:            | DOUBLE
Particle precision:   | DOUBLE
Geometry:             | 1D (Z)
Operation mode:       | Electrostatic
                      | - laboratory frame
                      | - vacuum
-------------------------------------------------------------------------------
Current Deposition:   | direct
Particle Pusher:      | Boris
Charge Deposition:    | standard
Field Gathering:      | energy-conserving
Particle Shape Factor:| 3
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
For full input parameters, see the file: warpx_used_inputs

--- INFO    : Writing openPMD file diags/openpmd000000
[marksta@gl-login3 ecdi-1d-unbounded]$ cat error.log
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unknown file format! Did you specify a file ending?
SIGABRT
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unknown file format! Did you specify a file ending?
SIGABRT
See Backtrace.1.0 file for details
See Backtrace.0.0 file for details

Note that I have installed with openPMD support. I've attached the input file I used, as well as the backtrace files and SLURM script.
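
In case it's useful, the only knob I know of for that particular error message is forcing an explicit openPMD backend / file ending on the diagnostic instead of relying on the default, e.g. in the inputs file (the diagnostic name below is a placeholder for whatever is in my inputs file, and h5 assumes the openPMD library was built with HDF5 support):

# force the openPMD series onto the HDF5 backend (.h5 file ending)
diag1.format = openpmd
diag1.openpmd_backend = h5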

As an example of what I mean, here's the per-step timing output from when it's running well (here with 16 grids):

STEP 11477 starts ...
STEP 11477 ends. TIME = 2.992538086e-08 DT = 2.607421875e-12
Evolve time = 90.40318536 s; This step = 0.004930898 s; Avg. per step = 0.007876900354 s

And here's the same output from later in the simulation, when it slows to a crawl:

STEP 22003 starts ...
STEP 22003 ends. TIME = 5.737110352e-08 DT = 2.607421875e-12
Evolve time = 244.2563833 s; This step = 1.05330376 s; Avg. per step = 0.0111010491 s

Backtrace.0.0.txt
Backtrace.1.0.txt
srun.sh.txt
inputs_1d.txt
