Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdpar] Implement {m,aligned_}alloc and free #1114

Merged
merged 9 commits into from
Sep 12, 2023

Conversation

nilsfriess
Copy link
Collaborator

No description provided.

@illuhad
Copy link
Collaborator

illuhad commented Aug 23, 2023

Great to see! Can you also update doc/stdpar.md? It says there in the "memory model" section that only new/delete are hijacked and malloc is not.

We should also look at malloc's other C-style friends like realloc and calloc, but that can be a separate PR :)

@nilsfriess
Copy link
Collaborator Author

Great to see! Can you also update doc/stdpar.md? It says there in the "memory model" section that only new/delete are hijacked and malloc is not.

Done :)

We should also look at malloc's other C-style friends like realloc and calloc, but that can be a separate PR :)

Yes definitely. I think calloc and aligned_alloc should be relatively easy to add, and I think the C standard also allows realloc to just free the old memory and allocate new memory instead of reusing the already allocated memory (if possible), so we could just do that.

There is no `__libc_aligned_alloc` so we cannot do the same thing as
with `malloc` and `free` to avoid recursive calls to our own
implementation. However, there is an equivalent function
`posix_memalign` that we can use (at least on Linux).
@nilsfriess
Copy link
Collaborator Author

I now also added calloc and aligned_alloc. I think realloc is not that easy to implement because the standard says

The reallocation is done by [...] allocating a new memory block of size new_size bytes, copying memory area with size equal the lesser of the new and the old sizes, and freeing the old block.

That means it needs to know the size of the old allocation and I don't really know where to get that from

@nilsfriess nilsfriess changed the title [pstl] Implement malloc and free [pstl] Implement {m,c,aligned_}alloc and free Aug 24, 2023
@illuhad
Copy link
Collaborator

illuhad commented Aug 24, 2023

One solution could be to allocate more memory than needed and letting the actual allocation follow a header that contains the size. When we need to realloc, we could then access the header by subtracting sizeof(header) from the user pointer. This seems to be how roc-stdpar does it:

https://github.com/ROCmSoftwarePlatform/roc-stdpar/blob/25e68a71e9f6d96478d440af9ea858540e448a3f/include/hipstdpar_lib.hpp#L167

However, I don't see code there to actually copy the data which might be an oversight.
I'm not sure if this is the best right solution, and whether the consequences are worth it (e.g. then you'd no longer have interoperability between sycl::malloc_shared/free and our hijacked memory functions, as you can no longer use a pointer with both).

I'd be fine leaving realloc unimplemented for now if you prefer.

@nilsfriess
Copy link
Collaborator Author

I think for now we can leave it unimplemented, I think it’s too much effort to implement it “just in case”.

@illuhad illuhad changed the title [pstl] Implement {m,c,aligned_}alloc and free [stdpar] Implement {m,c,aligned_}alloc and free Aug 24, 2023

static void memset(void* ptr, int value, std::size_t num_bytes) {
if(thread_local_storage::get().disabled_stack == 0) {
detail::single_device_dispatch::get_queue().memset(ptr, value, num_bytes);
Copy link
Collaborator

@illuhad illuhad Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three remarks here:

  • I think here we definitely need a wait() call. Otherwise there's no guarantee the memset is complete by the time calloc returns the pointer.
  • Also, it might be worth a look into whether hipsycl::algorithms::fill() from algorithms/algorithm.hpp might be a better fit here. It's not always best for performance to use queue::memset(), sometimes a queue::parallel_for() with a manual fill kernel is better. For example, on NUMA systems on CPU, parallel_for might better take locality into account. The fill() implementation has special code paths for these cases.
  • Can you double check whether the disabled_stack condition is sufficient here? malloc has more complex conditions (disable_stack==0, usm_context::is_alive() (this prevents USM operations in the shutdown phase of the application), and a check on the pointer type (I assume this we can ignore because memset should only be called by calloc in the USM path, when we know that it is a USM pointer)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we definitely need a wait() call. Otherwise there's no guarantee the memset is complete by the time calloc returns the pointer.

Yup, that makes sense.

Also, it might be worth a look into whether hipsycl::algorithms::fill() from algorithms/algorithm.hpp might be a better fit here.

Yes that sounds like a better approach.

Can you double check whether the disabled_stack condition is sufficient here? malloc has more complex conditions (disable_stack==0, usm_context::is_alive())

malloc currently does not check whether usm_context::is_alive() (or am I missing something?), only free does, that's why I didn't add a check for that here. But I think I might redo this whole part a bit differently anyway and use malloc to allocate memory and use either hipsycl::algorithms::fill_n or libc's memset if malloc gave us a pointer with hipsycl::sycl::get_pointer_type == unknown to zero the memory region. So all allocations are then done using malloc and we don't need to reinvent the wheel here.

@nilsfriess nilsfriess changed the title [stdpar] Implement {m,c,aligned_}alloc and free [stdpar] Implement {m,aligned_}alloc and free Sep 12, 2023
@nilsfriess
Copy link
Collaborator Author

I removed the calloc parts from this PR again as we've discussed.

Copy link
Collaborator

@illuhad illuhad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@illuhad illuhad merged commit d75e4fe into AdaptiveCpp:develop Sep 12, 2023
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants