
fix out of host memory during HDF5 dumping #2690

Conversation

@psychocoderHPC (Member) commented Aug 16, 2018

For HDF5 checkpoints and normal dumps we use CUDA mapped memory to transfer the particles to host memory. Because we go through cupla, and mapped memory is not supported by cupla and alpaka, we fall back to native CUDA functions. To free mapped memory, the function cudaFreeHost must be called. This function is also used to free normal host memory allocated with cudaMallocHost, which is supported by cupla.
The cupla-internal macros rename cudaFreeHost to cuplaFreeHost and fall back to alpaka functionality to free the memory. In the case where we allocated the memory without cupla's knowledge, cupla throws an error and cudaFreeHost is never called.
The result in PIConGPU is that with each particle dump the host memory footprint grows until we run out of memory.

  • fix broken memory freeing
  • MappedBufferIntern
    • fix the use of the wrong function to allocate mapped memory
    • add workaround to free mapped memory
    • add fallback if a CPU accelerator is used

Note: MappedBufferIntern has not been used for a long time, therefore the wrong allocation call was never noticed by any user of PMacc.

This bugfix should solve: #2504
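
For context, a minimal sketch of the mapped-memory allocation/free pairing described above, in plain CUDA without cupla's macro renaming. It is illustrative only, not PMacc's actual code, and it assumes the "wrong function" in the bullet list refers to the cudaMallocHost vs. cudaHostAlloc distinction:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // On older devices mapped memory must be enabled before the first
    // CUDA call; harmless on platforms with unified addressing.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    float* hostPtr = nullptr;

    // Mapped (zero-copy) host memory must be requested explicitly via
    // cudaHostAlloc; cudaMallocHost only returns page-locked, unmapped memory.
    cudaError_t rc = cudaHostAlloc(
        (void**)&hostPtr, 1024 * sizeof(float), cudaHostAllocMapped);
    if(rc != cudaSuccess)
    {
        std::printf("allocation failed: %s\n", cudaGetErrorString(rc));
        return 1;
    }

    // Device-side view of the same physical allocation.
    float* devPtr = nullptr;
    cudaHostGetDevicePointer((void**)&devPtr, hostPtr, 0);

    // Memory from both cudaMallocHost and cudaHostAlloc is released with
    // cudaFreeHost. The leak described above happens when this call is
    // macro-renamed to cuplaFreeHost, which rejects memory it did not
    // allocate itself, so the native free never runs.
    cudaFreeHost(hostPtr);
    return 0;
}
```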

Tests

  • KHI with 192x512x64 cells on 1 GPU

CC-ing: @PrometheusPi @steindev @HighIander @BeyondEspresso

@psychocoderHPC added the bug (a bug in the project's code) and backend: serial (Serial CPU Backend) labels on Aug 16, 2018
@psychocoderHPC added this to todo in 0.4.0 Finalize via automation on Aug 16, 2018
@psychocoderHPC added the component: core (in PIConGPU, core application) and component: PMacc (in PMacc) labels on Aug 16, 2018
@ax3l added the component: plugin (in PIConGPU plugin) label on Aug 16, 2018
```cpp
 */
if(rc != cuplaErrorMemoryAllocation)
    CUDA_CHECK(rc)
/* cupla is not supporting the function cudaHostAlloc to create mapped memory.
```
Member:

can you please open a cupla & alpaka issue and link it here in the comment?

Even if it might currently be a wontfix it needs to be doc-ed as an issue.

Member Author:

I can open a feature request in alpaka.
This bug is not important for cupla because I, as the user, ignored the error cupla reported.

Member:

thx, please link it in both comments in-code :)


Member:

typo: cupla 0.1.0 does not support ...

```cpp
@@ -69,7 +73,21 @@ class MappedBufferIntern : public DeviceBuffer<TYPE, DIM>

if (pointer && ownPointer)
{
    CUDA_CHECK(cudaFreeHost(pointer));
#if( PMACC_CUDA_ENABLED == 1 )
    /* cupla is not supporting the function cudaHostAlloc to create mapped memory.
```
Member:

link same issues here, please

@ax3l mentioned this pull request Aug 16, 2018
@PrometheusPi (Member):

I tested your bug fix on the two simulations mentioned above. They no longer crash. Great work. 👍

@ax3l added the backend: omp2b (OpenMP2 backend) and backend: tbb (TBB CPU backend) labels on Aug 17, 2018
@ax3l (Member) commented Aug 17, 2018

@psychocoderHPC if this is also fixing #2504, then this also affects the CUDA backend. Can you confirm?

@psychocoderHPC (Member Author) commented Aug 17, 2018 via email

#2504 is solved by this PR.

@ax3l (Member) commented Aug 20, 2018

Ok, so I tag this as a CUDA bug now as well.

@ax3l (Member) commented Aug 20, 2018

I'll merge this now, but I think the alpaka issue should have been linked as annotated. If you'd like to add this, pls do in a follow-up PR.

@ax3l (Member) left a review:

Can't merge now.
Please add:

  • link to alpaka issue on mapped mem (2x)
  • #ifdef before #undef guard

Please amend the changes, so we have one commit in the end.

```cpp
 * @todo this is a workaround plese fix me. We need to investigate if
 * it is possible to have mapped/unified memory in alpaka.
 */
#   undef cudaFreeHost
```
Member:

I would guard this with an #ifdef

Member Author:

I will skip this because it is already guarded by PMACC_CUDA_ENABLED. cupla always defines cudaFreeHost; if not, something is broken anyway. Also, if I guard this code part I would also have to guard the re-introduction of `#define cudaFreeHost(...) cuplaFreeHost(__VA_ARGS__)` for the case where it was not defined before.
All this would only become unreadable without increasing the code quality.

Member:

The reason why one would guard a def or undef is to make it more flexible for other hacks/workarounds that might (un)set it.

You can keep it as is under the mentioned assumption, but guarding would make it more flexible.
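
To make the trade-off in this thread concrete, here is a sketch of the guarded variant under discussion. cuplaFreeHost, CUDA_CHECK, and the native free are stubbed so the example compiles standalone, and PMACC_RESTORE_CUDA_FREE_HOST is a hypothetical marker name, not taken from the PR:

```cpp
#include <cstdlib>

// Stand-ins for the real cupla/CUDA definitions; illustration only.
static int cuplaFreeHost(void* p) { std::free(p); return 0; }
static int nativeCudaFreeHost(void* p) { std::free(p); return 0; }
#define CUDA_CHECK(cmd) (void)(cmd)

// cupla's renaming of the CUDA runtime function.
#define cudaFreeHost(...) cuplaFreeHost(__VA_ARGS__)

static void freeMappedMemory(void* pointer)
{
    // Remove the renaming only if it is actually in place, and remember
    // that we did, so the re-definition below can be guarded as well.
#ifdef cudaFreeHost
#   undef cudaFreeHost
#   define PMACC_RESTORE_CUDA_FREE_HOST 1
#endif
    // The real code calls the native CUDA cudaFreeHost here; a stub
    // stands in for it so this sketch is self-contained.
    CUDA_CHECK(nativeCudaFreeHost(pointer));
#ifdef PMACC_RESTORE_CUDA_FREE_HOST
#   define cudaFreeHost(...) cuplaFreeHost(__VA_ARGS__)
#   undef PMACC_RESTORE_CUDA_FREE_HOST
#endif
}

int main()
{
    freeMappedMemory(std::malloc(16));
    cudaFreeHost(std::malloc(16)); // the renaming is back in effect here
    return 0;
}
```

Most host compilers (GCC, Clang, MSVC) also offer `#pragma push_macro("cudaFreeHost")` / `#pragma pop_macro("cudaFreeHost")`, which would avoid the hand-rolled marker entirely, assuming nvcc's preprocessor handles it.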

```cpp
 * @todo this is a workaround plese fix me. We need to investigate if
 * it is possible to have mapped/unified memory in alpaka.
 */
#   undef cudaFreeHost
```
Member:

I would guard this with an #ifdef cudaFreeHost

```cpp
if(rc != cuplaErrorMemoryAllocation)
    CUDA_CHECK(rc)
/* cupla is not supporting the function cudaHostAlloc to create mapped memory.
 * Therefore we need to call the native CUDA function cudaFreeHost to free the memory.
```
Member:

need -> have
free the memory

```cpp
 * Therefore we need to call the native CUDA function cudaFreeHost to free the memory.
 * Due to the renaming of cuda functions with cupla via macros we need to remove
 * the renaming to get access to the native cuda function.
 * @todo this is a workaround plese fix me. We need to investigate if
```
Member:

please

```cpp
 * Therefore we need to call the native CUDA function cudaFreeHost to free the memory.
 * Due to the renaming of cuda functions with cupla via macros we need to remove
 * the renaming to get access to the native cuda function.
 * @todo this is a workaround plese fix me. We need to investigate if
```
Member:

same typos as above :)

@psychocoderHPC (Member Author) commented:
I linked the alpaka issues and fixed the typos/spelling.

The guards make no sense, see #2690 (comment).

I will keep this PR as it is, including the undef of the cupla defines. I will have a look into the alpaka method to create unified/mapped memory and, if possible, create a follow-up PR.

Note: this PR is well tested and I would like to avoid breaking it again with untested alpaka features.

@ax3l merged commit c5b714c into ComputationalRadiationPhysics:dev Aug 21, 2018
0.4.0 Finalize automation moved this from todo to done Aug 21, 2018