RFC: multithreading? #1002

Closed · wants to merge 3 commits

Conversation

@timholy (Member) commented Jul 2, 2012

I decided to test whether it would be possible to use pthreads (actually, libuv threads) with Julia. I was motivated to do this to avoid the overhead of creating DArrays. For example, if one wanted to create a parallel version of the `sum(v::Vector)` function, in most cases it might not be worth it to break the vector into a DArray, do a parallel sum, and then recombine. I'm particularly motivated by the needs of the image library.
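To make the use case concrete, here is a hypothetical sketch (not the gist code) of the kind of allocation-free kernel each worker thread would be handed: it sums a disjoint index range of `v` into a pre-allocated output slot, so nothing inside the body should touch the GC.

```julia
# Hypothetical kernel (illustrative only, not from the gist): sum v[i1:i2] into
# a pre-allocated slot of `out`. Each thread gets its own disjoint range and its
# own slot, so the body performs no allocation and no shared writes.
function chunk_sum!(out::Vector{Float64}, v::Vector{Float64}, i1::Int, i2::Int, slot::Int)
    s = 0.0
    for i = i1:i2
        s += v[i]
    end
    out[slot] = s
    return nothing
end
```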

The result of my experiment is a segfault. Is this a "deep" problem, or am I doing something wrong? Or are there doable changes that could be made in Julia, e.g., suspending garbage collection?

A gist with the Julia test function is here: https://gist.github.com/3033117

To get the segfault :-) you have to re-enable the calls to uv_thread_create and uv_thread_join (and it's best to disable the call to jl_apply).
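For readers who don't open the gist, the calls in question look roughly like the following sketch. This is a hypothetical reconstruction in current `ccall`/`@cfunction` syntax (the 2012 gist predates `@cfunction`), and it assumes libuv's `uv_thread_create`/`uv_thread_join` symbols are resolvable from the running Julia process and that `uv_thread_t` fits in a pointer-sized slot on the platform at hand.

```julia
# Hypothetical reconstruction of the re-enabled calls, not the gist itself.
function thread_entry(arg::Ptr{Cvoid})
    # The worker body would go here. In the 2012 experiment, any Julia work on
    # this foreign thread (allocation, codegen, a jl_apply call) is what crashed.
    return nothing
end

function run_one_uv_thread()
    cb  = @cfunction(thread_entry, Cvoid, (Ptr{Cvoid},))
    tid = Ref{Ptr{Cvoid}}(C_NULL)   # assumes uv_thread_t is pointer-sized here
    ccall(:uv_thread_create, Cint, (Ptr{Ptr{Cvoid}}, Ptr{Cvoid}, Ptr{Cvoid}), tid, cb, C_NULL)
    ccall(:uv_thread_join, Cint, (Ptr{Ptr{Cvoid}},), tid)
end
```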

Commit message: "Actual threading operations must be disabled, or a segfault occurs. This currently fakes it by running the requested function immediately."
@StefanKarpinski (Member) commented:

I knew this would come up at some point. @JeffBezanson will have to address this more fully, but yes, in short there are all sorts of deep issues with making Julia multithreaded (object allocation, garbage collection, codegen, etc.). I've contemplated it many times, but I'm not sure it's actually a good idea. I rather favor the idea of certain kernels (like OpenBLAS operations and maybe parallel reduce operations) exhibiting limited thread-parallelism, without exposing full thread-based concurrency to the user.

@timholy (Member, Author) commented Jul 2, 2012

I did a little more experimenting. It occurred to me that one thing that might help is to execute the function once beforehand, so that it is already compiled.

I updated the gist with a version that, when you call `runthreads(v, index1, index2)`, "only" gives a `MemoryError()`. Since those are returned by the gc and nowhere else (I think), something is still trying to allocate memory despite my best attempts to stomp out any kind of memory allocation. I'm guessing it's that `return nothing` statement?
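A minimal sketch of that warm-up idea, reusing the hypothetical `chunk_sum!` kernel from the first comment: one serial call compiles the kernel for those argument types, so no codegen should be needed once threads start.

```julia
# Hypothetical warm-up (assumes the chunk_sum! sketch above has been defined):
v   = rand(1_000_000)
out = zeros(2)
chunk_sum!(out, v, 1, 1, 1)   # one serial call forces compilation before any threading
```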

Interestingly, I've also noticed that if you run the commands inside `runthreads` manually on the command line, everything works just fine! I'm guessing there's some kind of collision (race condition) in the gc that produces the `MemoryError()`?

I appreciate that there may be very difficult issues here; I'm just trying to understand whether they are solvable, even if only in a limited way (e.g., "submit your kernel to the analyzer function, and it's very picky, but if it passes then there are certain operations you can do with threads").

@JeffBezanson (Member) commented:

Cool that you tried this, but it simply can't even remotely work without some deep work in the runtime system. We have all sorts of global state like GC data structures and method caches that get updated behind your back, and they would need to be thread-safe.

@timholy (Member, Author) commented Jul 2, 2012

Presuming you're trying to get to 1.0, certainly now is not the time for deep surgery.

Would this be an easy (or easier) problem if one could disable memory allocation and code generation inside any sub-thread? I.e., all code has to be pre-compiled (e.g., by running a small-scale test problem before launching the threads), and all memory has to be pre-allocated (pass input, output, and temporary arrays/structures in from the outside). This is restrictive, I know, although I usually do precisely that when I write multithreaded C code. For example, for Matlab I have the mexFunction wrapper allocate memory for all the threads, assign each its own little sub-problem, and then clean up after all the threads re-join. It would be lovely to be able to write the worker-thread functions in Julia!
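As an illustration of that mexFunction-style pattern, here is a hypothetical driver, again reusing the `chunk_sum!` sketch: the caller pre-allocates every buffer and partitions the work into disjoint ranges; each loop iteration stands in for what one worker thread would do, and nothing is allocated once the workers start.

```julia
# Hypothetical driver, illustrative only: pre-allocate, partition, run the
# per-worker kernel (serially here), then combine the per-worker results.
function driver_sum(v::Vector{Float64}, nworkers::Int)
    out   = zeros(nworkers)          # one result slot per worker, allocated up front
    chunk = cld(length(v), nworkers)
    for t = 1:nworkers               # each iteration stands in for one thread
        i1 = (t - 1) * chunk + 1
        i2 = min(t * chunk, length(v))
        chunk_sum!(out, v, i1, i2, t)
    end
    return sum(out)
end
```

With real threads, each loop iteration would instead be launched on its own thread and joined before the final `sum(out)`.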

But writing allocation-free code in Julia could be trickier than it sounds.

@JeffBezanson (Member) commented:

Yes, it is much trickier than it sounds. Julia freely uses memory allocation for all sorts of things and there is no realistic way to avoid it by following certain coding rules.

@timholy (Member, Author) commented Jul 2, 2012

Gotcha. OK, let's think about this again for 2.0!
