Thread blocked indefinitely #48
When using the CUDA backend, but not the interpreter, it is easy to get a "thread blocked indefinitely on MVar" exception by having one GPU computation depend on another. I presume this is due to the use of `withMVar` in `run`, so I worked around it with some `seq`s. Is that right?

This seems like a bug, since if something works with the interpreter we would expect it to work with the CUDA backend. I can imagine it would be difficult to fix, though. Could it be documented somewhere?

Comments
Actually, I have a related question on this same code that I might as well ask here. The thread that puts the MVar after GPU results are available is forked with forkIO here. But doesn't that need to be a forkOS, so that the GHC capability isn't stalled by the blocking CUDA call, interfering with other IO threads?

I think Accelerate is not passing cuCtxCreate any flags right now, correct? That means we get the default behaviour, CU_CTX_SCHED_AUTO, which in practice means spinning in most cases. However, I think that spinning in a foreign function is just as bad as blocking w.r.t. the GHC RTS, right?

Personally, I was hoping we could make CU_CTX_SCHED_BLOCKING_SYNC the Accelerate default, so as to be gentle on CPU resources. It's a bit solipsistic of NVIDIA to make spinning the default --- waste power and screw up whatever the CPU is trying to do because, obviously, the GPU computation is the only thing you should care about ;-).
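For reference, a minimal sketch of what passing that flag looks like at the driver-API level, using hand-rolled FFI bindings rather than Accelerate's actual binding code (the type synonyms and the 0x04 flag value are taken from cuda.h):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign
import Foreign.C.Types

type CUresult  = CInt      -- driver API status code
type CUdevice  = CInt      -- device handle
type CUcontext = Ptr ()    -- opaque context handle

-- CU_CTX_SCHED_BLOCKING_SYNC from cuda.h: block the calling OS thread
-- instead of spinning while waiting for the GPU.
cuCtxSchedBlockingSync :: CUInt
cuCtxSchedBlockingSync = 0x04

-- CUresult cuCtxCreate(CUcontext *pctx, unsigned int flags, CUdevice dev)
foreign import ccall safe "cuCtxCreate"
  cuCtxCreate :: Ptr CUcontext -> CUInt -> CUdevice -> IO CUresult

createBlockingContext :: CUdevice -> IO CUcontext
createBlockingContext dev =
  alloca $ \pctx -> do
    _status <- cuCtxCreate pctx cuCtxSchedBlockingSync dev  -- check in real code
    peek pctx
```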
forkIO vs. forkOS makes no difference to blocking; it only affects which OS thread the foreign call is made in. As long as the foreign call is marked "safe", it won't block other Haskell threads (provided you use -threaded).
Ok, just to make sure I'm following: if we compile with -threaded, run with +RTS -N1, and then forkIO ten threads, one of which makes a blocking CUDA call that blocks the hosting pthread for, say, a week, the other nine IO threads will have a chance to run in the intervening week, right?
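A small self-contained experiment (not from the thread) that plays out exactly this scenario, with libc's sleep standing in for the blocking CUDA call; compile with ghc -threaded and run with +RTS -N1:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent
import Control.Monad
import Foreign.C.Types

-- A "safe" call releases the capability while it blocks.
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  -- The ticker keeps printing while the main thread sits in the
  -- blocking foreign call, even on a single capability (+RTS -N1).
  _ <- forkIO $ forM_ [1 .. 5 :: Int] $ \i ->
         print i >> threadDelay 500000
  _ <- c_sleep 3        -- stand-in for the week-long CUDA call
  putStrLn "foreign call returned"
```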
Ah, so extra OS threads are forked on demand! Very nice. But from that wiki I don't yet understand when they are forked. It would seem every "safe" foreign call from an IO thread can result in an OS thread being created? For example, if we forkIO 10K threads and do 10K blocking foreign calls, we can end up with 10K OS threads, irrespective of +RTS -N, right?

Is it fair to say that blocking foreign calls should be marked as "safe"? The wiki makes it sound as if safe/unsafe is just a question of whether the foreign function calls back into Haskell. But if, again, a foreign call blocks for a week on CUDA, even if it doesn't call back into Haskell, we want it on its own OS thread... It looks like "waitForReturnCapability" is how returning safe FFI calls get back in.
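The 10K-thread scenario is easy to test directly (again a sketch, not from the thread, scaled down to 1000 threads to stay friendly to OS limits). While the safe calls are blocked, the process holds roughly one OS thread per call, regardless of +RTS -N:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent
import Control.Monad
import Foreign.C.Types

-- sleep blocks the OS thread hosting the call for the given seconds.
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  done <- newEmptyMVar
  replicateM_ 1000 $ forkIO $ do
    _ <- c_sleep 10            -- block "in C" for 10 seconds
    putMVar done ()
  -- While those calls are blocked, the RTS has spawned ~1000 extra OS
  -- threads (visible with ps -L or top -H), no matter what -N was.
  replicateM_ 1000 (takeMVar done)
```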
P.S. It looks like 100% of the foreign decls in the CUDA bindings are marked unsafe. This would need to be changed in a couple of places to get the behaviour Simon describes, right?
@simonmar okay, thanks! I have added some notes to the documentation as to why this happens. 'seq' is definitely one way to avoid it.
@rrnewton the default context does not pass any context creation flags, so yes, it would just pick up the CU_CTX_SCHED_AUTO spinning behaviour. Changing to CU_CTX_SCHED_BLOCKING_SYNC would be good, although I think some explicit synchronisation points will need to be added in the execute phase --- I'm pretty sure I know how to do that now, however.

From memory, 100% of the foreign calls in the CUDA bindings are unsafe. I've no objection to changing the foreign calls to "safe".
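What an explicit synchronisation point might look like at the driver level, again as a hand-rolled sketch rather than Accelerate's actual execute code: cuCtxSynchronize waits for all pending work in the context, and with CU_CTX_SCHED_BLOCKING_SYNC it blocks rather than spins.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types

type CUresult = CInt

-- CUresult cuCtxSynchronize(void): waits for the context's pending GPU
-- work to finish. Marked safe so the blocked OS thread doesn't hold up
-- the rest of the Haskell program.
foreign import ccall safe "cuCtxSynchronize"
  cuCtxSynchronize :: IO CUresult

-- A hypothetical synchronisation point at the end of an execute phase.
syncPoint :: IO ()
syncPoint = do
  status <- cuCtxSynchronize
  if status == 0   -- CUDA_SUCCESS
     then return ()
     else ioError (userError ("cuCtxSynchronize: " ++ show status))
```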
On 16/05/2012 03:28, Ryan Newton wrote:
Correct. The best docs for this are in the GHC user's guide: http://www.haskell.org/ghc/docs/latest/html/users_guide/ffi-ghc.html#ffi-threads
Also correct. Every blocked FFI call needs a separate OS thread, and we create these on demand. This is why we have the IO manager: it handles the majority of blocking I/O without needing an OS thread per operation.
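A small illustration of the IO-manager point (a sketch, not from the thread): waiting on a file descriptor with threadWaitRead parks only the lightweight Haskell thread in the IO manager's epoll/kqueue loop, instead of occupying an OS thread the way a blocked safe foreign call does.

```haskell
import Control.Concurrent (threadWaitRead)
import System.Posix.Types (Fd(..))

-- Blocks the Haskell thread, not an OS thread, until fd is readable.
waitThenReact :: Fd -> IO ()
waitThenReact fd = do
  threadWaitRead fd
  putStrLn "fd is readable"
```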
Absolutely.
rts/Schedule.c:suspendThread() is called by the Haskell code right before making a safe foreign call, to release the capability.

Cheers,
On 16/05/2012 04:23, Trevor L. McDonell wrote:
Best would be to just mark the long-running ones as safe, and leave the rest unsafe.

Cheers,
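What that split could look like, sketched with hand-rolled driver-API bindings (not the package's actual declarations):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types
import Foreign.Ptr

type CUresult    = CInt
type CUdeviceptr = CULLong   -- 64-bit driver API device pointer

-- A device-to-host copy can block until the GPU catches up: mark safe.
foreign import ccall safe "cuMemcpyDtoH"
  cuMemcpyDtoH :: Ptr () -> CUdeviceptr -> CSize -> IO CUresult

-- A quick in-driver query: unsafe keeps its call overhead minimal.
foreign import ccall unsafe "cuCtxGetDevice"
  cuCtxGetDevice :: Ptr CInt -> IO CUresult
```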
I'm finding it difficult to dig up information on exactly how the CUDA driver interacts with the OS (e.g. how does it block/wake pthreads?).
@rrnewton notes:

If we make the right things strict in the library, the user shouldn't have to deal with adding `seq`s themselves.
This was an issue that cropped up in spite of single-threaded use of the API. What are the issues with multi-threaded use of the API? Won't two threads calling "run" both grab the default context (the same context), leading to the same issue? Or is there something I'm forgetting?
If two threads call `run`, they will both pick up the same default context, and the MVar serialises their access to it. So if you have a multi-threaded application, you probably want to use the functions that take an explicit context instead.

The issue here was that the computations were dependent, and the second had already taken the context before realising the first was still to be evaluated --- deadlock. It might be enough to make the right things strict.

Aside: it always calls …
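A sketch of the dependent-computation deadlock and the `seq` workaround from the original report (step1/step2 are made-up examples; the module names are the standard accelerate / accelerate-cuda ones):

```haskell
import Data.Array.Accelerate      as A
import Data.Array.Accelerate.CUDA as CUDA

step1 :: Acc (Vector Float)
step1 = A.fill (A.constant (Z :. (10 :: Int))) 1

step2 :: Acc (Vector Float) -> Acc (Vector Float)
step2 = A.map (* 2)

main :: IO ()
main = do
  let as = CUDA.run step1                  -- lazy: nothing runs yet
      bs = CUDA.run (step2 (A.use as))     -- depends on 'as'
  -- Demanding 'bs' first takes the context MVar; executing it then
  -- forces 'as', whose own 'run' blocks on the same MVar: deadlock.
  -- The workaround: force the first result before demanding the second.
  as `seq` print bs
```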
@tmcdonell Yes, why have an MVar around the context? Without the MVar, all this wouldn't be an issue, or would it?
I mangled the commit tag, but here it is: AccelerateHS/accelerate-cuda@9f019ce

Thinking about why I did this, it occurs to me that what I really wanted was exclusive access to the device, to avoid context switching (I recall trying this once and it was much slower than executing the two programs sequentially) --- but actually, using a single context from different points is fine.
Ok, great, but can't we close the issue in this case? Or is there an outstanding problem?
The 'thread blocked' issue is resolved. I've created a new issue w.r.t. what Ryan mentions above regarding context synchronisation behaviour.
Thanks. |