Warning comment on numa_allocator is not very clear #1895

Closed
thedrow opened this issue Dec 6, 2015 · 3 comments


thedrow commented Dec 6, 2015

https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/util/numa_allocator.hpp#L24
Can anyone elaborate on this?
Ideally, the explanation would be added as a comment to the code.


hkaiser commented Dec 6, 2015

[12:31] the_drow: hi guys, can someone explain the warning comment on the numa_allocator?
[12:32] zao: https://github.com/STEllAR-GROUP/hpx/blob/2926ad6811040fee3f74d903419270f73e3ddc12/hpx/parallel/util/numa_allocator.hpp#L24
[12:33] zao: This one?
[12:35] the_drow: zao: yeh
[12:40] the_drow: zao: Any idea?
[12:43] zao: I don't recall what it was all about when new.
[12:43] zao: Don't have my IRC logs around anymore, but I'd reckon you could poke the people responsible for it.
[12:43] the_drow: I opened an issue about it
[12:44] the_drow: I'm interested in learning more about this stuff
[12:44] the_drow: NUMA, SIMD and other parallelization techniques
[12:57] the_drow: Can anyone explain hpx vs. OpenMP for non-distributed computing?
[12:59] zao: OpenMP directives tend to be mostly limited to fan-out/fan-in parallelization of previously serial code.
[13:00] the_drow: and hpx?
[13:00] zao: "take this loop and split it up over a bunch of cores and pray it works well".
[13:01] zao: HPX has parallel implementations of the standard algorithms, built upon a layer of composable futures.
[13:01] zao: So you glue together a bunch of "when this completes, do <X>" and let the runtime schedule them to run on real threads, on the current or other machines.
[13:02] the_drow: So it's based on coroutines for I/O?
[13:02] the_drow: What if I'm 100% CPU bound?
[13:03] zao: Depends on what you burn your CPU for.
[13:03] K-ballo: why "for I/O"?
[13:03] zao: When you use the HPX primitives for mutexes, barriers, waiting for futures, etc, the runtime will preempt the green threads of computation and run something more relevant.
[13:03] the_drow: coroutines are usually used to avoid blocking on I/O
[13:03] zao: I don't think HPX does much at all around I/O.
[13:04] zao: My impression is that the point is to have a lot of small bits of work in flight and avoid bubbles.
[13:04] zao: Your "100% CPU" may be quite wasted in blocking on thread-level synchronization.
[13:08] the_drow: Ok, I see where I got it wrong. I'm coming from languages with a GIL, so only one thread can execute code at a time. In C++ multiple threads can execute code on multiple cores, so we can perform CPU-intensive tasks while waiting for other tasks to complete, whether I/O bound or CPU bound
[13:08] the_drow: Ok cool
[13:11] the_drow: So if I take Postgres for example and use hpx, I could parallelize tasks very easily, right? Why aren't there any databases written this way?
[13:17] the_drow: Is there a relation to SIMD or CUDA accelerated applications?
[13:21] zao: Some try to glue HPX together with CUDA and OpenCL, to weave compute operations into HPX actions.
[13:21] zao: SIMD is rather orthogonal, that's more about what you do in your actions, not how they're scheduled or anything.
[13:22] hkaiser: the_drow: heh
[13:22] hkaiser: the comment states the truth :-P
[13:22] the_drow: hkaiser: yeh but what truth?
[13:23] hkaiser: the_drow: so the question is what would you expect it to do?
[13:23] the_drow: Something along the lines of memkind for STL containers
[13:23] the_drow: https://github.com/memkind/memkind
[13:24] hkaiser: the_drow: I don't know anything about this, need to look closer
[13:24] the_drow: Or if we're talking buzzwords, NUMA aware STL containers
[13:24] hkaiser: but what is it you are looking for - more concretely?
[13:24] the_drow: RethinkDB has tons of STL and STL like usage which could use a better allocator
[13:25] hkaiser: well, the allocator will make sure the allocated memory is placed close to the threads encapsulated by the given executor, so that later access to that memory through the same executor is guaranteed to be optimally placed
[13:25] the_drow: I am trying to find a NUMA aware allocator for STL containers that will help me benchmark performance differences with NUMA aware containers and without them
[13:25] the_drow: hkaiser: They have their own scheduler
[13:25] the_drow: Replacing it with hpx won't be feasible
[13:26] hkaiser: well, you'll need to do something similar for their scheduler, then
[13:26] the_drow: Or at least will take a lot of work
[13:28] the_drow: hkaiser: So the allocator is related to https://en.wikipedia.org/wiki/Non-uniform_memory_access
[13:29] the_drow: hkaiser: but I don't see any libnuma usage
[13:29] the_drow: Did you guys write this yourselves?
[13:29] hkaiser: the allocator uses the first touch policy to make sure the OS places the allocated memory close to the thread/core which touches it first
[13:30] hkaiser: the_drow: here https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/util/numa_allocator.hpp#L96
[13:31] the_drow: So this is a custom NUMA use case?
[13:31] hkaiser: what do you mean by 'custom'?
[13:31] the_drow: I mean that libnuma exposes many many APIs to control allocation
[13:32] the_drow: memkind uses them to provide a customized jemalloc that works with it
[13:32] the_drow: hkaiser: what am I missing here?
[13:32] hkaiser: ok, we have not put a lot of effort into this, we just used the 'first touch policy' (google it)
[13:33] hkaiser: libnuma is very costly, we tried to avoid it
[13:33] the_drow: Oh, that answers my question
[13:33] the_drow: why btw?
[13:35] the_drow: hkaiser: Is it badly written?
[13:56] the_drow: hkaiser: Thanks for the info
[14:02] hkaiser: the_drow: it has to invoke the kernel, which adds overhead
[14:03] the_drow: hmm
[14:03] the_drow: but for big allocations it's negligible right?
[14:05] the_drow: I can see now why you wouldn't need it to implement a first touch policy. But what about more complicated things, like what memkind does? (NUMA-aware arena allocators, for example)
[14:05] hkaiser: the_drow: could be, we have not done any measurements, so this is pure conjecture
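
To make a few points from this log concrete, some hedged sketches follow. First, the contrast zao draws at 12:59-13:01 between OpenMP fan-out/fan-in and HPX parallel algorithms. Header and namespace layout below follow the 2015-era HPX releases and may differ in later versions.

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/include/parallel_for_each.hpp>

#include <cmath>
#include <cstddef>
#include <vector>

// OpenMP style: annotate a previously serial loop and fan it out
// over cores (compile with -fopenmp).
void scale_openmp(std::vector<double>& v)
{
#pragma omp parallel for
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = std::sqrt(v[i]);
}

// HPX style: the same loop as a parallel standard algorithm,
// scheduled by the runtime onto lightweight (green) threads.
void scale_hpx(std::vector<double>& v)
{
    hpx::parallel::for_each(hpx::parallel::par,
        v.begin(), v.end(), [](double& x) { x = std::sqrt(x); });
}

int main()
{
    std::vector<double> v(1000000, 2.0);
    scale_openmp(v);
    scale_hpx(v);
    return 0;
}
```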
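Next, a minimal sketch of the composable-futures model zao describes at 13:01-13:02: glue together "when this completes, do X" continuations and let the runtime map them onto worker threads. Again, exact header paths vary across HPX versions.

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/include/lcos.hpp>

#include <iostream>

int square(int x) { return x * x; }

int main()
{
    // Launch work on a lightweight HPX thread.
    hpx::future<int> f = hpx::async(square, 21);

    // Continuation: runs once f becomes ready, without blocking an
    // OS thread in the meantime.
    hpx::future<int> g = f.then(
        [](hpx::future<int> r) { return r.get() + 1; });

    std::cout << g.get() << '\n';    // prints 442
    return 0;
}
```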
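The "first touch" placement hkaiser describes at 13:29 needs no NUMA library at all: the kernel backs each page on the NUMA node of the core that writes it first. The sketch below illustrates the mechanism with plain std::thread standing in for HPX executors; real code would additionally pin each thread to a core.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

int main()
{
    std::size_t const n = std::size_t(1) << 24;
    double* data = new double[n];   // pages not physically backed yet

    unsigned const nthreads =
        std::max(1u, std::thread::hardware_concurrency());
    std::size_t const chunk = (n + nthreads - 1) / nthreads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t != nthreads; ++t)
        workers.emplace_back([=] {
            std::size_t const begin = t * chunk;
            std::size_t const end = std::min(begin + chunk, n);
            // First write: the OS places these pages near this thread.
            for (std::size_t i = begin; i < end; ++i)
                data[i] = 0.0;
        });
    for (auto& w : workers)
        w.join();

    // Later phases should reuse the same thread-to-chunk mapping so
    // each thread keeps working on its locally placed pages.
    delete[] data;
    return 0;
}
```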
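Wrapping that idea in the allocator shape the_drow asks about at 13:25 could look roughly like this: a drop-in STL allocator that first-touches its pages from worker threads inside allocate(). This mirrors the idea behind hpx::parallel::util::numa_allocator, which does the touching through its executors; the std::thread loop and 4 KiB page assumption here are stand-ins, not the HPX implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <thread>
#include <vector>

template <typename T>
struct first_touch_allocator
{
    using value_type = T;

    first_touch_allocator() = default;
    template <typename U>
    first_touch_allocator(first_touch_allocator<U> const&) {}

    T* allocate(std::size_t n)
    {
        std::size_t const bytes = n * sizeof(T);
        char* raw = static_cast<char*>(::operator new(bytes));
        std::size_t const page = 4096;   // assumed page size

        unsigned const nt =
            std::max(1u, std::thread::hardware_concurrency());
        std::size_t const chunk = (bytes + nt - 1) / nt;

        std::vector<std::thread> ts;
        for (unsigned t = 0; t != nt; ++t)
            ts.emplace_back([=] {
                std::size_t const b = t * chunk;
                std::size_t const e = std::min(b + chunk, bytes);
                // Touch one byte per page; the first write decides
                // which NUMA node backs the page.
                for (std::size_t i = b; i < e; i += page)
                    raw[i] = 0;
            });
        for (auto& th : ts)
            th.join();
        return reinterpret_cast<T*>(raw);
    }

    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(first_touch_allocator<T> const&,
    first_touch_allocator<U> const&) { return true; }
template <typename T, typename U>
bool operator!=(first_touch_allocator<T> const&,
    first_touch_allocator<U> const&) { return false; }

// Usage: std::vector<double, first_touch_allocator<double>> v(1 << 24);
```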
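For contrast, a sketch of the explicit-placement route through libnuma that hkaiser says HPX avoided (13:33) because every call crosses into the kernel: numa_alloc_onnode() is an mmap plus an mbind() system call. Link with -lnuma on Linux.

```cpp
#include <cstddef>
#include <cstdio>
#include <numa.h>

int main()
{
    if (numa_available() < 0)
    {
        std::puts("no NUMA support on this system");
        return 1;
    }

    std::size_t const bytes = std::size_t(1) << 20;

    // Explicitly ask the kernel for memory backed on NUMA node 0.
    void* p = numa_alloc_onnode(bytes, 0);
    if (!p)
        return 1;

    std::printf("allocated %zu bytes on node 0 (nodes: %d)\n",
        bytes, numa_max_node() + 1);

    numa_free(p, bytes);
    return 0;
}
```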


hkaiser commented Dec 29, 2015

@thedrow how would you like this comment to be changed to be more useful for you?


hkaiser commented Apr 2, 2016

I'll close this because of lack of input. Feel free to reopen.

hkaiser closed this as completed Apr 2, 2016