Description
The Intel TBB library provided by RcppParallel has been used in RStan for a while now. In benchmarks we found that, specifically on macOS, using the tbbmalloc_proxy library speeds up Stan programs by ~20%. Loading the tbbmalloc_proxy library replaces all calls to the system malloc with the TBB replacement. The upside is that no source code needs to change in order to benefit from the TBB-provided malloc, which is designed to work well with threaded programs (see here for details).
In those benchmarks with Stan programs, the ~20% speedup showed up clearly on macOS, while other platforms did not really gain in speed. This is why the TBB malloc is enabled for Stan programs only on macOS, where the gain is substantial.
So I wonder whether there would be interest in having RcppParallel load tbbmalloc_proxy, perhaps as an optional feature.
If that is an option, I am happy to provide a PR for such an optional feature. If there is anything to consider for such a feature, please let me know.
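To make the idea of an optional, platform-dependent load concrete, here is a minimal language-neutral sketch in Python of how the decision could work. It is not RcppParallel or RStan code. The environment variable name `TBB_MALLOC_PROXY`, the function names, and the macOS-only default are assumptions for illustration; the library directory depends on the installation and has to be passed in by the caller.

```python
import ctypes
import os
import sys

# Hypothetical opt-in/opt-out switch; not an existing RcppParallel or TBB setting.
OPT_IN_ENV_VAR = "TBB_MALLOC_PROXY"

# File names commonly used by TBB builds for the proxy library (assumed here).
PROXY_LIBRARY_NAMES = {
    "darwin": "libtbbmalloc_proxy.dylib",
    "linux": "libtbbmalloc_proxy.so.2",
    "win32": "tbbmalloc_proxy.dll",
}


def should_load_proxy(platform, environ):
    """Decide whether to load the proxy: an explicit setting wins,
    otherwise default to macOS only (the platform where gains were seen)."""
    setting = environ.get(OPT_IN_ENV_VAR, "").strip().lower()
    if setting in ("1", "true", "yes"):
        return True
    if setting in ("0", "false", "no"):
        return False
    return platform == "darwin"


def load_proxy(lib_dir, platform=sys.platform, environ=os.environ):
    """Load the proxy library from lib_dir if requested and present.
    Returns the library handle, or None when nothing was loaded."""
    if not should_load_proxy(platform, environ):
        return None
    name = PROXY_LIBRARY_NAMES.get(platform)
    if name is None:
        return None
    path = os.path.join(lib_dir, name)
    if not os.path.exists(path):
        return None
    # RTLD_GLOBAL makes the library's symbols visible process-wide.
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
```

With this shape, users on macOS would get the proxy by default, users elsewhere could opt in, and anyone could switch it off, all without touching the source of the programs that call malloc.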