dlio_benchmark_test.py segfaults on second benchmark test #179

krehm · 2024-04-03T11:08:05Z

Commit fb762c2 added MPI init and finalize calls in each test in dlio_benchmark_test.py. On my test setup this causes dlio_benchmark_test.py to segfault as it starts the second benchmark test. It appears that the MPI finalize must somehow unmap some memory that is needed for the subsequent MPI calls.

OS is Rocky 8.9, openmpi is 4.1.7a1 release 1.2310055 installed as part of MOFED 23.10-OFED.23.10.0.5.5.1


krehm · 2024-04-03T16:48:15Z

I should mention that I ran the test simply as "pytest ./dlio_benchmark_test.py"; I did not run it via MPI. I then tried "mpirun -np 1 pytest dlio_benchmark_test.py", but that segfaults the same way.

An unrelated question, why is there unittest code in the file? The structure of the file doesn't seem compatible with unittest.

krehm · 2024-04-03T19:55:37Z

I changed the "DLIOMPI.get_instance().finalize()" call in finalize() to "pass" and the benchmark now runs, but with lots of failures: "Exception: method DLIOMPI.initialize() called in a child process"

krehm · 2024-04-03T22:24:16Z

dlio_benchmark_test.py runs as a single program from start to finish, but init() and finalize() calls have been added to it and are called around every test. finalize() calls DLIOMPI.get_instance().finalize(), which calls MPI.Finalize(). Once that routine has been called, the program cannot call MPI routines anymore without segfaulting. After this new finalize() call has been executed, both MPI.Is_initialized() and MPI.Is_finalized() return True.
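To make the point about the two flags concrete, here is a small sketch of how they could be mapped to a lifecycle state. The names ObservedMPIState and classify_mpi_state are made up for illustration and are not part of DLIO or mpi4py; the function takes plain booleans so it runs without MPI installed.

```python
from enum import Enum


class ObservedMPIState(Enum):
    """Hypothetical labels for what the two MPI flags imply together."""
    NOT_STARTED = "not started"
    ACTIVE = "active"
    FINALIZED = "finalized"


def classify_mpi_state(initialized: bool, finalized: bool) -> ObservedMPIState:
    """Map the (initialized, finalized) flag pair to a lifecycle state.

    The initialized flag stays True after finalization, so it cannot be
    used on its own to decide whether MPI calls are still safe.
    """
    if finalized:
        if not initialized:
            # Finalization can only follow initialization.
            raise ValueError("finalized without initialized is not a valid state")
        return ObservedMPIState.FINALIZED
    return ObservedMPIState.ACTIVE if initialized else ObservedMPIState.NOT_STARTED
```

With mpi4py one would presumably call it as classify_mpi_state(MPI.Is_initialized(), MPI.Is_finalized()); the (True, True) case is the one the second test runs into.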

I can modify the DLIOMPI.finalize() routine to set self.mpi_state to MPIState.UNINITIALIZED so that the subsequent init() call invokes DLIOMPI.initialize() to reinitialize the DLIOMPI state, but that doesn't help: MPI functions can't be called anymore. I can add an assert in DLIOMPI.initialize() that checks whether MPI.Is_finalized() is True, to catch and report cases like this. But the init() and finalize() calls in dlio_benchmark_test.py still have to be removed; init() should be called only once at process startup and finalize() should be called only once at program end. I'd be interested in hearing the history behind why the init() and finalize() routines were added around each test. None of the MPI parameters like size or rank are going to change between tests, so I don't understand why init() and finalize() are per-test.
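A rough sketch of the kind of guard described above, not the actual DLIOMPI code. OnceOnlyLifecycle and MPILifecycleError are hypothetical names, and the init/finalize/is-finalized hooks are passed in as callables so the example runs without MPI.

```python
from typing import Callable


class MPILifecycleError(RuntimeError):
    """Hypothetical error raised on an attempt to re-initialize after finalize."""


class OnceOnlyLifecycle:
    """Hypothetical wrapper: initialize at most once, never after finalize."""

    def __init__(
        self,
        init_fn: Callable[[], None],
        finalize_fn: Callable[[], None],
        is_finalized_fn: Callable[[], bool],
    ) -> None:
        self._init_fn = init_fn
        self._finalize_fn = finalize_fn
        self._is_finalized_fn = is_finalized_fn
        self._initialized = False

    def initialize(self) -> None:
        # Report the problem clearly instead of letting a later call segfault.
        if self._is_finalized_fn():
            raise MPILifecycleError(
                "MPI was already finalized in this process and cannot be re-initialized"
            )
        # Repeated initialize() calls are harmless no-ops.
        if not self._initialized:
            self._init_fn()
            self._initialized = True

    def finalize(self) -> None:
        # Only finalize if we initialized and nobody has finalized yet.
        if self._initialized and not self._is_finalized_fn():
            self._finalize_fn()
```

With mpi4py the hooks would presumably be MPI.Init, MPI.Finalize and MPI.Is_finalized. In the test file, initialize() would run once at process startup and finalize() once at program end, rather than around each test.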

krehm · 2024-04-03T23:23:25Z

I can see that things are more complicated than the above. I will keep digging, and provide a PR.
