LLVM does not support the __thread attribute for thread local storage and may generate incorrect code for global register variables. We want to allow building the runtime with LLVM-based compilers such as llvm-gcc and clang, particularly for MacOS. This patch changes the gct variable used by the garbage collector to use pthread_getspecific() for thread local storage when an llvm based compiler is used to build the runtime.
Based on a patch from David Terei. Some parts are a little ugly (e.g. defining things that only ASSERTs use only when DEBUG is defined), so we might want to tweak things a little. I've also turned off -Werror for didn't-inline warnings, as we now get a few such warnings.
This is a port of some of the changes from my private local-GC branch (which is still in darcs, I haven't converted it to git yet). There are a couple of small functional differences in the GC stats: first, per-thread GC timings should now be more accurate, and secondly we now report average and maximum pause times. e.g. from minimax +RTS -N8 -s: Tot time (elapsed) Avg pause Max pause Gen 0 2755 colls, 2754 par 13.16s 0.93s 0.0003s 0.0150s Gen 1 769 colls, 769 par 3.71s 0.26s 0.0003s 0.0059s
…l (#4850) The bug in this case was that we had a worker thread making a foreign call which invoked a callback (in this case it was performGC, I think). When the callback ended, boundTaskExiting() was setting task->stopped, but the Task is now per-OS-thread, so it is shared by the worker that made the original foreign call. When the foreign call returned, because task->stopped was set, the worker was not placed on the queue of spare workers. Somehow the worker woke up again, and found the spare_workers queue empty, which lead to a crash. Two bugs here: task->stopped should not have been set by boundTaskExiting (this broke when I split the Task and InCall structs, in 6.12.2), and releaseCapabilityAndQueueWorker() should not be testing task->stopped anyway, because it should only ever be called when task->stopped is false (this is now an assertion).
This is patch that adds support for interruptible FFI calls in the form of a new foreign import keyword 'interruptible', which can be used instead of 'safe' or 'unsafe'. Interruptible FFI calls act like safe FFI calls, except that the worker thread they run on may be interrupted. Internally, it replaces BlockedOnCCall_NoUnblockEx with BlockedOnCCall_Interruptible, and changes the behavior of the RTS to not modify the TSO_ flags on the event of an FFI call from a thread that was interruptible. It also modifies the bytecode format for foreign call, adding an extra Word16 to indicate interruptibility. The semantics of interruption vary from platform to platform, but the intent is that any blocking system calls are aborted with an error code. This is most useful for making function calls to system library functions that support interrupting. There is no support for pre-Vista Windows. There is a partner testsuite patch which adds several tests for this functionality.
…rsal In GHC 6.12.x I found a rare deadlock caused by this lock-order-reversal: AQ cap->lock startWorkerTask newTask AQ sched_mutex scheduleCheckBlackHoles AQ sched_mutex unblockOne_ wakeupThreadOnCapabilty AQ cap->lock so sched_mutex and cap->lock are taken in a different order in two places. This doesn't happen in the HEAD because we don't have scheduleCheckBlackHoles, but I thought it would be prudent to make this less likely to happen in the future by using a different mutex in newTask. We can clearly see that the all_tasks mutex cannot be involved in a deadlock, becasue we never call anything else while holding it.
The idea is that this leaves Tasks and OSThread in one-to-one correspondence. The part of a Task that represents a call into Haskell from C is split into a separate struct InCall, pointed to by the Task and the TSO bound to it. A given OSThread/Task thus always uses the same mutex and condition variable, rather than getting a new one for each callback. Conceptually it is simpler, although there are more types and indirections in a few places now. This improves callback performance by removing some of the locks that we had to take when making in-calls. Now we also keep the current Task in a thread-local variable if supported by the OS and gcc (currently only Linux).
The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposinng publicly exactly what we need to, and no more. - Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it. - Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h. - All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory. - I left MachDeps.h where it is, because it is so widely used in Haskell code. - I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h. - Various internal APIs are no longer exposed by public header files. - Various bits of dead code and declarations have been removed - More gcc warnings are turned on, and the RTS code is more warning-clean. - More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces. There is a lot more tidying up still to do, this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.
Also move closeMutex() etc. into freeTaskManager, this is a free-ish thing
We were freeing the tasks in exitScheduler (stopTaskManager) before exitStorage (stat_exit), but the latter needs to walk down the list printing stats. Resulted in segfaults with commands like ghc -v0 -e main q.hs -H32m -H32m +RTS -Sstderr (where q.hs is trivial), but very sensitive to exact commandline and libc version or something.
In preparation for parallel GC, split up the monolithic GC.c file into smaller parts. Also in this patch (and difficult to separate, unfortunatley): - Don't include Stable.h in Rts.h, instead just include it where necessary. - consistently use STATIC_INLINE in source files, and INLINE_HEADER in header files. STATIC_INLINE is now turned off when DEBUG is on, to make debugging easier. - The GC no longer takes the get_roots function as an argument. We weren't making use of this generalisation.
See #805. This was here to catch bugs that resulted in an infinite number of worker threads being created. However, we can't put a reasonable bound on the number of worker threads, because legitimate programs may need to create large numbers of (probably blocked) worker threads. Furthermore, the OS probably has a bound on the number of threads that a process can create in any case.