
mpi4py issue with large k-eigenvalue simulations in Lassen #101

Open
northroj opened this issue May 9, 2023 · 4 comments
@northroj
Contributor

northroj commented May 9, 2023

On Lassen, MCDC breaks after the first eigenvalue cycle if the number of histories per cycle is larger than 1e5 and Numba is enabled.

 #     k        k (avg)
 ==== ======= ===================
 1     1.43481
(traceback below was emitted by each failing MPI rank; duplicates collapsed)
Traceback (most recent call last):
  File "/usr/WS1/northroj/SMR/mcdc/c5g7td/inffuel/input.py", line 78, in <module>
    mcdc.run()
  File "/usr/WS1/northroj/miniconda3/MCDC/mcdc/main.py", line 41, in run
    loop_main(mcdc)
  File "mpi4py/MPI/Comm.pyx", line 1438, in mpi4py.MPI.Comm.recv
  File "mpi4py/MPI/msgpickle.pxi", line 341, in mpi4py.MPI.PyMPI_recv
  File "mpi4py/MPI/msgpickle.pxi", line 306, in mpi4py.MPI.PyMPI_recv_match
  File "mpi4py/MPI/msgpickle.pxi", line 152, in mpi4py.MPI.pickle_load
  File "mpi4py/MPI/msgpickle.pxi", line 141, in mpi4py.MPI.cloads
(the final exception varied by rank, e.g.:)
_pickle.UnpicklingError: invalid load key, '_'.
_pickle.UnpicklingError: invalid load key, '\xba'.
_pickle.UnpicklingError: invalid load key, '\x00'.
_pickle.UnpicklingError: invalid load key, '\x0f'.
_pickle.UnpicklingError: invalid load key, '\xfe'.
_pickle.UnpicklingError: invalid load key, '\x9f'.
_pickle.UnpicklingError: unexpected MARK found
OverflowError: BINBYTES exceeds system's maximum size of 9223372036854775807 bytes
@jpmorgan98
Collaborator

@spasmann: was this like any of the issues you had seen?

@spasmann
Collaborator

spasmann commented May 19, 2023

I've also encountered this issue, typically when running on the ND cluster. I just ran the Takeda-1 problem with N=2e5 and 2 MPI processes and seem to hit the same issue, although the output is somewhat different. This could be a problem with my MPI installation; I will try again on the ND cluster when it is back online.

 #     k        k (avg)
 ==== ======= ===================
 1     0.41197
(traceback below was emitted by both MPI ranks; duplicates collapsed)
Traceback (most recent call last):
  File "C:\Users\Sam\Documents\Github\MCDC\examples\eigenvalue\takeda_1\input.py", line 192, in <module>
    mcdc.run()
  File "c:\users\sam\documents\github\mcdc\mcdc\main.py", line 45, in run
    loop_main(mcdc)
  File "mpi4py/MPI/Comm.pyx", line 1839, in mpi4py.MPI.Intracomm.Exscan
  File "mpi4py/MPI/msgbuffer.pxi", line 874, in mpi4py.MPI._p_msg_cco.for_exscan
ValueError: mismatch in send and receive MPI datatypes

@ilhamv
Collaborator

ilhamv commented May 19, 2023

I think these issues happen because of the use of the lowercase, rather than the uppercase, mpi4py functions, e.g. send instead of Send. The easier-to-use lowercase versions may be less portable than the uppercase ones, which map more directly onto the actual MPI functions.

@northroj
Contributor Author

Tested the other MPI modules on Lassen: spectrum-mpi/2019.06.24 did not run at all, and spectrum-mpi/2020.08.19 and spectrum-mpi/test-rolling-release produced errors similar to the rolling release.
