[stdpar] Implement {m,aligned_}alloc and free #1114
Great to see! Can you also update We should also look at malloc's other C-style friends like |
Done :)
Yes, definitely. I think `calloc` and `aligned_alloc` should be relatively easy to add. As far as I know, the C standard also allows `realloc` to allocate new memory, copy the contents over, and free the old block instead of resizing the existing allocation in place, so we could just do that.
There is no `__libc_aligned_alloc` so we cannot do the same thing as with `malloc` and `free` to avoid recursive calls to our own implementation. However, there is an equivalent function `posix_memalign` that we can use (at least on Linux).
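To illustrate the commit message above, here is a minimal sketch of an `aligned_alloc`-style function built on `posix_memalign`. The helper name `aligned_alloc_via_posix_memalign` is hypothetical and this is not the code from the PR; it only assumes a POSIX system where `posix_memalign` is declared in `<stdlib.h>`.

```cpp
#include <cstddef>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper for illustration only, not the PR's implementation.
// Returns a pointer aligned to `alignment`, or nullptr on failure.
inline void* aligned_alloc_via_posix_memalign(std::size_t alignment,
                                              std::size_t size) {
  // posix_memalign requires the alignment to be a power of two and a
  // multiple of sizeof(void*). Small power-of-two alignments are rounded up;
  // anything that is not a power of two is rejected by posix_memalign itself.
  if (alignment < sizeof(void*))
    alignment = sizeof(void*);

  void* ptr = nullptr;
  // posix_memalign reports errors via its return value, not errno.
  if (posix_memalign(&ptr, alignment, size) != 0)
    return nullptr;
  return ptr;  // release with free()
}
```

Because `posix_memalign` is a different symbol from `aligned_alloc`, calling it from a replacement `aligned_alloc` does not recurse into the replacement, which is the property the commit message relies on.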
92f98e5 to fe76d1d
I now also added
That means it needs to know the size of the old allocation and I don't really know where to get that from |
malloc
and free
{m,c,aligned_}alloc
and free
One solution could be to allocate more memory than needed and let the actual allocation follow a header that contains the size. When we need to realloc, we could then access the header by subtracting However, I don't see code there to actually copy the data, which might be an oversight. I'd be fine leaving `realloc` unimplemented for now if you prefer.
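A minimal sketch of the size-header idea, assuming plain `std::malloc` as the underlying allocator. All names (`alloc_header`, `sized_malloc`, `sized_realloc`, `sized_free`) are hypothetical and none of this is taken from the PR; `realloc` is expressed as allocate, copy, free.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <limits>
#include <new>

namespace sketch {

// Header stored directly in front of the user-visible block. Aligning it to
// max_align_t keeps the user pointer aligned like a regular malloc result.
struct alignas(alignof(std::max_align_t)) alloc_header {
  std::size_t size;  // size of the user-visible block in bytes
};

inline alloc_header* header_of(void* user_ptr) {
  // Step back by exactly one header from the user pointer.
  return static_cast<alloc_header*>(user_ptr) - 1;
}

inline void* sized_malloc(std::size_t size) {
  // Guard against overflow when adding the header size.
  if (size > std::numeric_limits<std::size_t>::max() - sizeof(alloc_header))
    return nullptr;
  void* raw = std::malloc(sizeof(alloc_header) + size);
  if (!raw)
    return nullptr;
  alloc_header* header = new (raw) alloc_header{size};
  return header + 1;  // user memory starts right after the header
}

inline void sized_free(void* user_ptr) {
  if (user_ptr)
    std::free(header_of(user_ptr));
}

// realloc as "allocate new, copy, free old"; never resizes in place.
inline void* sized_realloc(void* user_ptr, std::size_t new_size) {
  if (!user_ptr)
    return sized_malloc(new_size);
  void* fresh = sized_malloc(new_size);
  if (!fresh)
    return nullptr;  // the old block stays valid on failure
  const std::size_t old_size = header_of(user_ptr)->size;
  std::memcpy(fresh, user_ptr, old_size < new_size ? old_size : new_size);
  sized_free(user_ptr);
  return fresh;
}

}  // namespace sketch
```

The cost is one extra header per allocation, and every pointer passed to `sized_realloc` or `sized_free` must come from `sized_malloc`. The sketch does not treat `new_size == 0` specially.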
For now we can leave it unimplemented; I think it’s too much effort to implement it “just in case”.
static void memset(void* ptr, int value, std::size_t num_bytes) {
  if(thread_local_storage::get().disabled_stack == 0) {
    detail::single_device_dispatch::get_queue().memset(ptr, value, num_bytes);
Three remarks here:
- I think we definitely need a `wait()` call here. Otherwise there's no guarantee that the memset is complete by the time `calloc` returns the pointer.
- Also, it might be worth looking into whether `hipsycl::algorithms::fill()` from `algorithms/algorithm.hpp` might be a better fit here. `queue::memset()` is not always best for performance; sometimes a `queue::parallel_for()` with a manual fill kernel is better. For example, on NUMA systems on CPU, `parallel_for` might take locality into account better. The `fill()` implementation has special code paths for these cases.
- Can you double-check whether the `disabled_stack` condition is sufficient here? `malloc` has more complex conditions: `disable_stack==0`, `usm_context::is_alive()` (this prevents USM operations in the shutdown phase of the application), and a check on the pointer type (I assume we can ignore this one because `memset` should only be called by `calloc` in the USM path, when we know that it is a USM pointer).
> I think here we definitely need a `wait()` call. Otherwise there's no guarantee the memset is complete by the time `calloc` returns the pointer.

Yup, that makes sense.

> Also, it might be worth a look into whether `hipsycl::algorithms::fill()` from `algorithms/algorithm.hpp` might be a better fit here.

Yes, that sounds like a better approach.

> Can you double check whether the `disabled_stack` condition is sufficient here? `malloc` has more complex conditions (`disable_stack==0`, `usm_context::is_alive()`)

`malloc` currently does not check whether `usm_context::is_alive()` (or am I missing something?); only `free` does, which is why I didn't add a check for that here. But I think I might redo this whole part a bit differently anyway: use `malloc` to allocate the memory, and then zero the region with either `hipsycl::algorithms::fill_n` or, if `malloc` gave us a pointer with `hipsycl::sycl::get_pointer_type == unknown`, libc's `memset`. All allocations are then done using `malloc` and we don't need to reinvent the wheel here.
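The plan described above could look roughly like the following sketch. It is plain C++ with no SYCL dependency. `pointer_kind`, `classify`, and `zero_usm_and_wait` are hypothetical stand-ins for the pointer-type query and for a device-side fill followed by a wait, and the overflow check is an addition that the thread does not discuss.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <limits>

namespace sketch {

// Hypothetical stand-in for the result of a USM pointer-type query.
enum class pointer_kind { unknown, usm };

// Hypothetical classifier. In this sketch every pointer is ordinary host
// memory, so the result is always `unknown`.
inline pointer_kind classify(const void*) { return pointer_kind::unknown; }

// Hypothetical placeholder for "fill on the device, then wait for completion".
// The sketch has no device, so it falls back to a host memset.
inline void zero_usm_and_wait(void* ptr, std::size_t bytes) {
  std::memset(ptr, 0, bytes);
}

// calloc expressed in terms of malloc plus an explicit zeroing step.
inline void* calloc_via_malloc(std::size_t count, std::size_t elem_size) {
  // Reject requests whose total size would overflow std::size_t.
  if (elem_size != 0 &&
      count > std::numeric_limits<std::size_t>::max() / elem_size)
    return nullptr;

  const std::size_t bytes = count * elem_size;
  void* ptr = std::malloc(bytes);  // the single allocation path
  if (!ptr)
    return nullptr;

  if (classify(ptr) == pointer_kind::unknown)
    std::memset(ptr, 0, bytes);     // ordinary host pointer
  else
    zero_usm_and_wait(ptr, bytes);  // must finish before the pointer escapes
  return ptr;
}

}  // namespace sketch
```

Whichever branch runs, the zeroing has to be complete before the function returns, which is the concern behind the `wait()` remark in the review.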
{m,c,aligned_}alloc
and free
{m,aligned_}alloc
and free
I removed the |
Thanks!
No description provided.