Memory allocation/deallocation is not safe with MVAPICH #1703
After investigating a segmentation fault on Cooley at ALCF, I found that the Mallocator was triggering the problem, but I think the fault lies with MVAPICH rather than with our code.
Mallocator uses aligned_alloc whenever available. If aligned_alloc is not available, posix_memalign is used. On Cooley, aligned_alloc is available from the OS.
I think the fault is MVAPICH's: it overrides the system (libc/glibc) malloc/posix_memalign/free, but aligned_alloc was missed.
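To make the allocation path concrete, here is a minimal sketch of the pattern described above: use aligned_alloc when it is available, otherwise fall back to posix_memalign. This is not the actual Mallocator code; the function names and the `SKETCH_USE_POSIX_MEMALIGN` macro are hypothetical and exist only for illustration.

```cpp
#include <stdlib.h>   // aligned_alloc, posix_memalign, free
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical switch: a real build system would set this from a
// configure-time check for aligned_alloc.
#ifndef SKETCH_USE_POSIX_MEMALIGN
#define SKETCH_USE_POSIX_MEMALIGN 0
#endif

// Allocate `size` bytes aligned to `alignment` (a power of two that is
// also a multiple of sizeof(void*), as posix_memalign requires).
inline void* sketch_aligned_alloc(std::size_t alignment, std::size_t size)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // Round the size up to a multiple of the alignment; C11 originally
  // required this for aligned_alloc, so it is the portable choice.
  const std::size_t padded = ((size + alignment - 1) / alignment) * alignment;
#if SKETCH_USE_POSIX_MEMALIGN
  void* p = nullptr;
  if (posix_memalign(&p, alignment, padded) != 0)
    return nullptr;
  return p;
#else
  return aligned_alloc(alignment, padded);
#endif
}

// Helper used only to check the result in this sketch.
inline bool sketch_is_aligned(const void* p, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Either way, the memory is eventually released with free(). If a library replaces free() but leaves aligned_alloc pointing at glibc, the two ends of the allocation no longer belong to the same allocator, which is the mismatch suspected here.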
I tend to use solution 2 and push MVAPICH to fix the problem. Any thoughts?
Thank you for bringing this to our attention. We appreciate it. So far, we have not received any issues about users wanting to use “aligned_alloc”. We will see how to handle it in our code and get back to you.
Some history about this feature is given below.
As you may know, (barring a few exceptions) any buffer that an InfiniBand HCA can act upon must be registered with it ahead of time. Since registration for InfiniBand is very expensive, we attempt to cache these registrations so that if the same buffer is reused for communication, it will already be registered (speeding up the application). MVAPICH2 (and several other MPI libraries like OpenMPI; please refer to https://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned and https://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols/euro-pvmmpi-2006-hpc-protocols.pdf) intercepts the malloc and free routines to preserve correctness while caching these InfiniBand memory registrations, since the MPI library needs to know when memory is being freed.
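The idea can be illustrated with a toy cache. This is only a sketch under simplifying assumptions, not MVAPICH2 code: the type and member names are hypothetical, a counter stands in for the expensive HCA registration, and buffers are keyed by base address only.

```cpp
#include <cstddef>
#include <map>

// Toy registration cache: base address -> registered length.
struct SketchRegistrationCache
{
  std::map<const void*, std::size_t> entries;
  int registrations = 0; // counts simulated (expensive) registrations

  // Returns true on a cache hit, false if the buffer had to be registered.
  bool acquire(const void* buf, std::size_t len)
  {
    auto it = entries.find(buf);
    if (it != entries.end() && it->second >= len)
      return true;
    ++registrations; // stands in for registering the buffer with the HCA
    entries[buf] = len;
    return false;
  }

  // Must be called when the application frees the buffer; otherwise a later
  // allocation at the same address would match a stale registration.
  void on_free(const void* buf) { entries.erase(buf); }
};
```

The `on_free` hook is the reason free has to be intercepted: without it the cache cannot tell that an address has been released and possibly handed out again.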
Whether disabling the registration cache will have a negative effect on application performance depends entirely on the communication pattern of the application. If the application uses mostly small to medium sized messages (approximately less than 16 KB), then disabling the registration cache should have little to no impact on its performance.
The following section of the userguide has more information about the impact of disabling memory registration cache on application performance.
This feature can be disabled at runtime by setting “MV2_USE_LAZY_MEM_UNREGISTER=0”. The following section of the userguide has more information about this parameter.
This feature can also be disabled at configuration time by using the “--disable-registration-cache” parameter. The following section of the userguide has more information about this parameter.
@harisubramoni I can confirm that after building MVAPICH with