Warning comment on numa_allocator is not very clear #1895

Closed
thedrow opened this issue Dec 6, 2015 · 3 comments


thedrow commented Dec 6, 2015

https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/util/numa_allocator.hpp#L24
Can anyone elaborate on this?
Ideally, the explanation would be added as a comment to the code.


hkaiser commented Dec 6, 2015

[12:31] the_drow: hi guys, can someone explain the warning comment on the numa_allocator?
[12:32] zao: https://github.com/STEllAR-GROUP/hpx/blob/2926ad6811040fee3f74d903419270f73e3ddc12/hpx/parallel/util/numa_allocator.hpp#L24
[12:33] zao: This one?
[12:35] the_drow: zao: yeh
[12:40] the_drow: zao: Any idea?
[12:43] zao: I don't recall what it was all about when new.
[12:43] zao: Don't have my IRC logs around anymore, but I'd reckon you could poke the people responsible for it.
[12:43] the_drow: I opened an issue about it
[12:44] the_drow: I'm interested in learning more about this stuff
[12:44] the_drow: NUMA, SIMD and other parallelization techniques
[12:57] the_drow: Can anyone explain hpx vs. OpenMP for non-distributed computing?
[12:59] zao: OpenMP directives tend to be mostly limited to fan-out/fan-in parallelization of previously serial code.
[13:00] the_drow: and hpx?
[13:00] zao: "take this loop and split it up over a bunch of cores and pray it works well".
[13:01] zao: HPX has parallel implementations of the standard algorithms, built upon a layer of composable futures.
[13:01] zao: So you glue together a bunch of "when this completes, do <X>" and let the runtime schedule them to run on real threads, on the current or other machines.
[13:02] the_drow: So it's based on coroutines for I/O?
[13:02] the_drow: What if I'm 100% CPU bound?
[13:03] zao: Depends on what you burn your CPU for.
[13:03] K-ballo: why "for I/O"?
[13:03] zao: When you use the HPX primitives for mutexes, barriers, waiting for futures, etc, the runtime will preempt the green threads of computation and run something more relevant.
[13:03] the_drow: coroutines are usually used to avoid blocking on I/O
[13:03] zao: I don't think HPX does much at all around I/O.
[13:04] zao: My impression is that the point is to have a lot of small bits of work in flight and avoid bubbles.
[13:04] zao: Your "100% CPU" may be quite wasted in blocking on thread-level synchronization.
[13:08] the_drow: Ok, I see where I got it wrong. I'm coming from languages with a GIL, so only one thread can execute code at a time. In C++ multiple threads can execute code on multiple cores, so we can perform CPU-intensive tasks while waiting for other tasks to complete, whether I/O bound or CPU bound
[13:08] the_drow: Ok cool
[13:11] the_drow: So if I take Postgres for example and use hpx, I could parallelize tasks very easily, right? Why aren't there any databases written this way?
[13:17] the_drow: Is there a relation to SIMD or CUDA accelerated applications?
[13:21] zao: Some try to glue HPX together with CUDA and OpenCL, to weave compute operations into HPX actions.
[13:21] zao: SIMD is rather orthogonal, that's more about what you do in your actions, not how they're scheduled or anything.
[13:22] hkaiser: the_drow: heh
[13:22] hkaiser: the comment states the truth :-P
[13:22] the_drow: hkaiser: yeh but what truth?
[13:23] hkaiser: the_drow: so the question is what would you expect it to do?
[13:23] the_drow: Something along the lines of memkind for STL containers
[13:23] the_drow: https://github.com/memkind/memkind
[13:24] hkaiser: the_drow: I don't know anything about this, need to look closer
[13:24] the_drow: Or if we're talking buzzwords, NUMA aware STL containers
[13:24] hkaiser: but what is it you are looking for - more concretely?
[13:24] the_drow: RethinkDB has tons of STL and STL like usage which could use a better allocator
[13:25] hkaiser: well, the allocator will make sure the allocated memory is placed close to the threads encapsulated by the given executor, so that later access to that memory through the same executor is guaranteed to be optimally placed
[13:25] the_drow: I am trying to find a NUMA aware allocator for STL containers that will help me benchmark performance differences with NUMA aware containers and without them
[13:25] the_drow: hkaiser: They have their own scheduler
[13:25] the_drow: Replacing it with hpx won't be feasible
[13:26] hkaiser: well, you'll need to do something similar for their scheduler, then
[13:26] the_drow: Or at least will take a lot of work
[13:28] the_drow: hkaiser: So the allocator is related to https://en.wikipedia.org/wiki/Non-uniform_memory_access
[13:29] the_drow: hkaiser: but I don't see any libnuma usage
[13:29] the_drow: Did you guys write this yourselves?
[13:29] hkaiser: the allocator uses the first touch policy to make sure the OS places the allocated memory close to the thread/core which touches it first
[13:30] hkaiser: the_drow: here https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/util/numa_allocator.hpp#L96
[13:31] the_drow: So this is a custom NUMA use case?
[13:31] hkaiser: what do you mean by 'custom'?
[13:31] the_drow: I mean that libnuma exposes many many APIs to control allocation
[13:32] the_drow: memkind uses them to provide a customized jemalloc that works with it
[13:32] the_drow: hkaiser: what am I missing here?
[13:32] hkaiser: ok, we have not put a lot of effort into this, we just used the 'first touch policy' (google it)
[13:33] hkaiser: libnuma is very costly, we tried to avoid it
[13:33] the_drow: Oh, that answers my question
[13:33] the_drow: why btw?
[13:35] the_drow: hkaiser: Is it badly written?
[13:56] the_drow: hkaiser: Thanks for the info
[14:02] hkaiser: the_drow: it has to invoke the kernel, which adds overhead
[14:03] the_drow: hmm
[14:03] the_drow: but for big allocations it's negligible right?
[14:05] the_drow: I can see now why you wouldn't need it to implement a first touch policy. But what about more complicated things, like what memkind does? (NUMA-aware arena allocators, for example)
[14:05] hkaiser: the_drow: could be, we have not done any measurements, so this is pure conjecture
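
To make a few points from this log concrete, some hedged sketches follow. First, the contrast zao draws at 12:59-13:01 between OpenMP fan-out/fan-in and HPX parallel algorithms. Header and namespace layout below follow the 2015-era HPX releases and may differ in later versions.

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/include/parallel_for_each.hpp>

#include <cmath>
#include <cstddef>
#include <vector>

// OpenMP style: annotate a previously serial loop and fan it out
// over cores (compile with -fopenmp).
void scale_openmp(std::vector<double>& v)
{
#pragma omp parallel for
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = std::sqrt(v[i]);
}

// HPX style: the same loop as a parallel standard algorithm,
// scheduled by the runtime onto lightweight (green) threads.
void scale_hpx(std::vector<double>& v)
{
    hpx::parallel::for_each(hpx::parallel::par,
        v.begin(), v.end(), [](double& x) { x = std::sqrt(x); });
}

int main()
{
    std::vector<double> v(1000000, 2.0);
    scale_openmp(v);
    scale_hpx(v);
    return 0;
}
```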
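Next, a minimal sketch of the composable-futures model zao describes at 13:01-13:02: glue together "when this completes, do X" continuations and let the runtime map them onto worker threads. Again, exact header paths vary across HPX versions.

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/include/lcos.hpp>

#include <iostream>

int square(int x) { return x * x; }

int main()
{
    // Launch work on a lightweight HPX thread.
    hpx::future<int> f = hpx::async(square, 21);

    // Continuation: runs once f becomes ready, without blocking an
    // OS thread in the meantime.
    hpx::future<int> g = f.then(
        [](hpx::future<int> r) { return r.get() + 1; });

    std::cout << g.get() << '\n';    // prints 442
    return 0;
}
```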
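The "first touch" placement hkaiser describes at 13:29 needs no NUMA library at all: the kernel backs each page on the NUMA node of the core that writes it first. The sketch below illustrates the mechanism with plain std::thread standing in for HPX executors; real code would additionally pin each thread to a core.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

int main()
{
    std::size_t const n = std::size_t(1) << 24;
    double* data = new double[n];   // pages not physically backed yet

    unsigned const nthreads =
        std::max(1u, std::thread::hardware_concurrency());
    std::size_t const chunk = (n + nthreads - 1) / nthreads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t != nthreads; ++t)
        workers.emplace_back([=] {
            std::size_t const begin = t * chunk;
            std::size_t const end = std::min(begin + chunk, n);
            // First write: the OS places these pages near this thread.
            for (std::size_t i = begin; i < end; ++i)
                data[i] = 0.0;
        });
    for (auto& w : workers)
        w.join();

    // Later phases should reuse the same thread-to-chunk mapping so
    // each thread keeps working on its locally placed pages.
    delete[] data;
    return 0;
}
```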
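Wrapping that idea in the allocator shape the_drow asks about at 13:25 could look roughly like this: a drop-in STL allocator that first-touches its pages from worker threads inside allocate(). This mirrors the idea behind hpx::parallel::util::numa_allocator, which does the touching through its executors; the std::thread loop and 4 KiB page assumption here are stand-ins, not the HPX implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <thread>
#include <vector>

template <typename T>
struct first_touch_allocator
{
    using value_type = T;

    first_touch_allocator() = default;
    template <typename U>
    first_touch_allocator(first_touch_allocator<U> const&) {}

    T* allocate(std::size_t n)
    {
        std::size_t const bytes = n * sizeof(T);
        char* raw = static_cast<char*>(::operator new(bytes));
        std::size_t const page = 4096;   // assumed page size

        unsigned const nt =
            std::max(1u, std::thread::hardware_concurrency());
        std::size_t const chunk = (bytes + nt - 1) / nt;

        std::vector<std::thread> ts;
        for (unsigned t = 0; t != nt; ++t)
            ts.emplace_back([=] {
                std::size_t const b = t * chunk;
                std::size_t const e = std::min(b + chunk, bytes);
                // Touch one byte per page; the first write decides
                // which NUMA node backs the page.
                for (std::size_t i = b; i < e; i += page)
                    raw[i] = 0;
            });
        for (auto& th : ts)
            th.join();
        return reinterpret_cast<T*>(raw);
    }

    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(first_touch_allocator<T> const&,
    first_touch_allocator<U> const&) { return true; }
template <typename T, typename U>
bool operator!=(first_touch_allocator<T> const&,
    first_touch_allocator<U> const&) { return false; }

// Usage: std::vector<double, first_touch_allocator<double>> v(1 << 24);
```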
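For contrast, a sketch of the explicit-placement route through libnuma that hkaiser says HPX avoided (13:33) because every call crosses into the kernel: numa_alloc_onnode() is an mmap plus an mbind() system call. Link with -lnuma on Linux.

```cpp
#include <cstddef>
#include <cstdio>
#include <numa.h>

int main()
{
    if (numa_available() < 0)
    {
        std::puts("no NUMA support on this system");
        return 1;
    }

    std::size_t const bytes = std::size_t(1) << 20;

    // Explicitly ask the kernel for memory backed on NUMA node 0.
    void* p = numa_alloc_onnode(bytes, 0);
    if (!p)
        return 1;

    std::printf("allocated %zu bytes on node 0 (nodes: %d)\n",
        bytes, numa_max_node() + 1);

    numa_free(p, bytes);
    return 0;
}
```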


hkaiser commented Dec 29, 2015

@thedrow how would you like this comment to be changed to be more useful for you?


hkaiser commented Apr 2, 2016

I'll close this because of lack of input. Feel free to reopen.

hkaiser closed this as completed Apr 2, 2016