dlio_benchmark_test.py segfaults on second benchmark test #179

Open
krehm opened this issue Apr 3, 2024 · 4 comments


krehm commented Apr 3, 2024

Commit fb762c2 added MPI init and finalize calls to each test in dlio_benchmark_test.py. On my test setup, this causes dlio_benchmark_test.py to segfault as it starts the second benchmark test. It appears that MPI finalize somehow unmaps memory that the subsequent MPI calls need.

OS is Rocky 8.9, openmpi is 4.1.7a1 release 1.2310055 installed as part of MOFED 23.10-OFED.23.10.0.5.5.1


krehm commented Apr 3, 2024

I should mention that I ran the test simply as "pytest ./dlio_benchmark_test.py"; I did not run it via MPI. I then tried "mpirun -np 1 pytest dlio_benchmark_test.py", but that segfaults the same way.

An unrelated question: why is there unittest code in the file? The structure of the file doesn't seem compatible with unittest.


krehm commented Apr 3, 2024

I changed the "DLIOMPI.get_instance().finalize()" call in finalize() to "pass" and the benchmark now runs. Hmm, but with lots of failures: "Exception: method DLIOMPI.initialize() called in a child process"


krehm commented Apr 3, 2024

dlio_benchmark_test.py runs as a single program from start to finish, but init() and finalize() calls have been added around every test. finalize() calls DLIOMPI.get_instance().finalize(), which calls MPI.Finalize(). Once that routine has been called, the program can no longer call MPI routines without segfaulting. After this new finalize() call has executed, both MPI.Is_initialized() and MPI.Is_finalized() return True.

I can modify the DLIOMPI.finalize() routine to set self.mpi_state to MPIState.UNINITIALIZED so that the subsequent init() call invokes DLIOMPI.initialize() and rebuilds the DLIOMPI state, but that doesn't help: MPI functions still can't be called anymore. I can add an assert in DLIOMPI.initialize() that checks whether MPI.Is_finalized() is True, to catch and report cases like this. But the init() and finalize() calls in dlio_benchmark_test.py still have to be removed: init() should be called only once at process startup and finalize() only once at program end. I'd be interested in hearing the history behind why the init() and finalize() routines were added around each test. None of the MPI parameters like size or rank are going to change between tests, so I don't understand why init() and finalize() are per-test.
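
To make the two ideas above concrete (a guard that refuses to initialize after MPI has been finalized, and a single init/finalize pair around the whole run), here is a minimal sketch. It is not the DLIOMPI code: MPIGuard, FakeMPI and run_suite are hypothetical names made up for illustration, and FakeMPI is only a stand-in so the sketch runs without an MPI install. With mpi4py one would presumably pass the mpi4py.MPI module instead, which provides Init(), Finalize(), Is_initialized() and Is_finalized().

```python
class FakeMPI:
    """Hypothetical stand-in for mpi4py.MPI, mimicking only the four calls used below."""

    def __init__(self):
        self._initialized = False
        self._finalized = False

    def Is_initialized(self):
        # Stays True after Finalize(), matching what was observed in the thread
        return self._initialized

    def Is_finalized(self):
        return self._finalized

    def Init(self):
        self._initialized = True

    def Finalize(self):
        self._finalized = True


class MPIGuard:
    """Hypothetical wrapper (an assumption, not the actual DLIOMPI class)."""

    def __init__(self, mpi):
        self.mpi = mpi

    def initialize(self):
        # MPI cannot be restarted once finalized, so report it instead of crashing later
        if self.mpi.Is_finalized():
            raise RuntimeError(
                "MPI was already finalized in this process and cannot be initialized again"
            )
        if not self.mpi.Is_initialized():
            self.mpi.Init()

    def finalize(self):
        # Finalize at most once, and only if MPI was actually started
        if self.mpi.Is_initialized() and not self.mpi.Is_finalized():
            self.mpi.Finalize()


def run_suite(guard, tests):
    """Run every test between a single initialize() and a single finalize()."""
    guard.initialize()
    try:
        return [test() for test in tests]
    finally:
        guard.finalize()


mpi = FakeMPI()
guard = MPIGuard(mpi)
results = run_suite(guard, [lambda: "test1 ok", lambda: "test2 ok"])
state_after = (mpi.Is_initialized(), mpi.Is_finalized())

# A second initialize() after the suite is rejected with a clear error
try:
    guard.initialize()
    reinit_error = None
except RuntimeError as exc:
    reinit_error = str(exc)
```

Under pytest, the same once-per-run shape is usually expressed with a session-scoped autouse fixture rather than a wrapper function like run_suite.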


krehm commented Apr 3, 2024

I can see that things are more complicated than the above. I will keep digging, and provide a PR.
