
fix out of host memory during HDF5 dumping #2690

Conversation

@psychocoderHPC (Member) commented Aug 16, 2018

For HDF5 checkpoints and normal dumps we use CUDA mapped memory to transfer the particles to host memory. Because we go through cupla, and mapped memory is not supported by cupla and alpaka, we fall back to native CUDA functions. To free mapped memory, the function cudaFreeHost must be called. This function is also used to free normal host memory allocated with cudaMallocHost, which is supported by cupla.
The cupla-internal macros rename cudaFreeHost to cuplaFreeHost and fall back to alpaka functionality to free the memory. In the case where we allocated the memory without cupla's knowledge, cupla throws an error and cudaFreeHost is never called.
The result in PIConGPU is that with each particle dump the host memory footprint grows until we run out of memory.

  • fix broken memory freeing
  • MappedBufferIntern
    • fix the use of the wrong function to allocate mapped memory
    • add workaround to free mapped memory
    • add fallback if a CPU accelerator is used

Note: MappedBufferIntern has not been used for a long time, therefore the wrong allocation call was never noticed by any user of PMacc.

This bugfix should solve: #2504
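
For context, a minimal sketch of the mapped-memory allocation/free pairing described above, in plain CUDA without cupla's macro renaming. It is illustrative only, not PMacc's actual code, and it assumes the "wrong function" in the bullet list refers to the cudaMallocHost vs. cudaHostAlloc distinction:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // On older devices mapped memory must be enabled before the first
    // CUDA call; harmless on platforms with unified addressing.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    float* hostPtr = nullptr;

    // Mapped (zero-copy) host memory must be requested explicitly via
    // cudaHostAlloc; cudaMallocHost only returns page-locked, unmapped memory.
    cudaError_t rc = cudaHostAlloc(
        (void**)&hostPtr, 1024 * sizeof(float), cudaHostAllocMapped);
    if(rc != cudaSuccess)
    {
        std::printf("allocation failed: %s\n", cudaGetErrorString(rc));
        return 1;
    }

    // Device-side view of the same physical allocation.
    float* devPtr = nullptr;
    cudaHostGetDevicePointer((void**)&devPtr, hostPtr, 0);

    // Memory from both cudaMallocHost and cudaHostAlloc is released with
    // cudaFreeHost. The leak described above happens when this call is
    // macro-renamed to cuplaFreeHost, which rejects memory it did not
    // allocate itself, so the native free never runs.
    cudaFreeHost(hostPtr);
    return 0;
}
```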

Tests

  • KHI with 192x512x64 cells on 1 GPU

CC-ing: @PrometheusPi @steindev @HighIander @BeyondEspresso

@psychocoderHPC added the bug (a bug in the project's code) and backend: serial (Serial CPU Backend) labels on Aug 16, 2018
@psychocoderHPC added this to todo in 0.4.0 Finalize via automation on Aug 16, 2018
@psychocoderHPC added the component: core (in PIConGPU, core application) and component: PMacc (in PMacc) labels on Aug 16, 2018
@ax3l added the component: plugin (in PIConGPU plugin) label on Aug 16, 2018
```cpp
 */
if(rc != cuplaErrorMemoryAllocation)
    CUDA_CHECK(rc)
/* cupla is not supporting the function cudaHostAlloc to create mapped memory.
```
Member:

can you please open a cupla & alpaka issue and link it here in the comment?

Even if it might currently be a wontfix it needs to be doc-ed as an issue.

Member Author:

I can open a feature request in alpaka.
This bug is not important for cupla because I, as the user, ignored the error cupla reported.

Member:

thx, please link it in both comments in-code :)


Member:

typo: cupla 0.1.0 does not support ...

```cpp
@@ -69,7 +73,21 @@ class MappedBufferIntern : public DeviceBuffer<TYPE, DIM>

if (pointer && ownPointer)
{
    CUDA_CHECK(cudaFreeHost(pointer));
#if( PMACC_CUDA_ENABLED == 1 )
    /* cupla is not supporting the function cudaHostAlloc to create mapped memory.
```
Member:

link same issues here, please

@ax3l mentioned this pull request Aug 16, 2018
@PrometheusPi (Member):

I tested your bug fix on the two simulations mentioned above. They no longer crash. Great work. 👍

@ax3l added the backend: omp2b (OpenMP2 backend) and backend: tbb (TBB CPU backend) labels on Aug 17, 2018
@ax3l (Member) commented Aug 17, 2018

@psychocoderHPC if this is also fixing #2504, then this also affects the CUDA backend. Can you confirm?

@psychocoderHPC (Member Author) commented Aug 17, 2018 via email

#2504 is solved by this PR.

@ax3l (Member) commented Aug 20, 2018

Ok, so I tag this as a CUDA bug now as well.

@ax3l (Member) commented Aug 20, 2018

I'll merge this now, but I think the alpaka issue should have been linked as annotated. If you'd like to add this, pls do in a follow-up PR.

@ax3l (Member) left a review:

Can't merge now.
Please add:

  • link to alpaka issue on mapped mem (2x)
  • #ifdef before #undef guard

Please amend the changes, so we have one commit in the end.

```cpp
 * @todo this is a workaround plese fix me. We need to investigate if
 * it is possible to have mapped/unified memory in alpaka.
 */
#   undef cudaFreeHost
```
Member:

I would guard this with an #ifdef

Member Author:

I will skip this because it is already guarded by PMACC_CUDA_ENABLED. cupla always defines cudaFreeHost; if not, something is broken anyway. Also, if I guard this code part I would also have to guard the re-introduction of `#define cudaFreeHost(...) cuplaFreeHost(__VA_ARGS__)` for the case where it was not defined before.
All this would only become unreadable without increasing the code quality.

Member:

The reason why one would guard a def or undef is to make it more flexible for other hacks/workarounds that might (un)set it.

You can keep it as is under the mentioned assumption, but guarding would make it more flexible.
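
To make the trade-off in this thread concrete, here is a sketch of the guarded variant under discussion. cuplaFreeHost, CUDA_CHECK, and the native free are stubbed so the example compiles standalone, and PMACC_RESTORE_CUDA_FREE_HOST is a hypothetical marker name, not taken from the PR:

```cpp
#include <cstdlib>

// Stand-ins for the real cupla/CUDA definitions; illustration only.
static int cuplaFreeHost(void* p) { std::free(p); return 0; }
static int nativeCudaFreeHost(void* p) { std::free(p); return 0; }
#define CUDA_CHECK(cmd) (void)(cmd)

// cupla's renaming of the CUDA runtime function.
#define cudaFreeHost(...) cuplaFreeHost(__VA_ARGS__)

static void freeMappedMemory(void* pointer)
{
    // Remove the renaming only if it is actually in place, and remember
    // that we did, so the re-definition below can be guarded as well.
#ifdef cudaFreeHost
#   undef cudaFreeHost
#   define PMACC_RESTORE_CUDA_FREE_HOST 1
#endif
    // The real code calls the native CUDA cudaFreeHost here; a stub
    // stands in for it so this sketch is self-contained.
    CUDA_CHECK(nativeCudaFreeHost(pointer));
#ifdef PMACC_RESTORE_CUDA_FREE_HOST
#   define cudaFreeHost(...) cuplaFreeHost(__VA_ARGS__)
#   undef PMACC_RESTORE_CUDA_FREE_HOST
#endif
}

int main()
{
    freeMappedMemory(std::malloc(16));
    cudaFreeHost(std::malloc(16)); // the renaming is back in effect here
    return 0;
}
```

Most host compilers (GCC, Clang, MSVC) also offer `#pragma push_macro("cudaFreeHost")` / `#pragma pop_macro("cudaFreeHost")`, which would avoid the hand-rolled marker entirely, assuming nvcc's preprocessor handles it.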

```cpp
 * @todo this is a workaround plese fix me. We need to investigate if
 * it is possible to have mapped/unified memory in alpaka.
 */
#   undef cudaFreeHost
```
Member:

I would guard this with an #ifdef cudaFreeHost

```cpp
if(rc != cuplaErrorMemoryAllocation)
    CUDA_CHECK(rc)
/* cupla is not supporting the function cudaHostAlloc to create mapped memory.
 * Therefore we need to call the native CUDA function cudaFreeHost to free the memory.
```
Member:

need -> have
free the memory

```cpp
 * Therefore we need to call the native CUDA function cudaFreeHost to free the memory.
 * Due to the renaming of cuda functions with cupla via macros we need to remove
 * the renaming to get access to the native cuda function.
 * @todo this is a workaround plese fix me. We need to investigate if
```
Member:

please

```cpp
 * Therefore we need to call the native CUDA function cudaFreeHost to free the memory.
 * Due to the renaming of cuda functions with cupla via macros we need to remove
 * the renaming to get access to the native cuda function.
 * @todo this is a workaround plese fix me. We need to investigate if
```
Member:

same typos as above :)

@psychocoderHPC (Member Author) commented:
I linked the alpaka issues and fixed the typos/spelling.

The guards make no sense, see #2690 (comment).

I will keep this PR as it is, including the undef of the cupla defines. I will have a look into the alpaka method to create unified/mapped memory and, if possible, create a follow-up PR.

Note: this PR is well tested and I would like to avoid breaking it again with untested alpaka features.

@ax3l merged commit c5b714c into ComputationalRadiationPhysics:dev Aug 21, 2018
0.4.0 Finalize automation moved this from todo to done Aug 21, 2018