Question about multithreaded tables within multithreaded code #2031
Comments
See the discussion in #1660; I'm not sure if it covers everything you have in mind.
Hi, thanks! Yes, that seems to be what I was referring to; searching didn't turn up that thread. It is unclear from the thread what the conclusion was with regard to default behaviour. I will have to read up on the newer documentation of data.table and OpenMP. If the default is anything other than 1, my own package is going to have to jump through a lot of hoops to continue to work safely and to maintain interoperability with older versions of data.table.
I've been catching up on the documentation and news, and I found out there is a setDTthreads() function, so that makes my life a lot easier than I anticipated. Thank you!
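For reference, a minimal sketch of how the thread count can be inspected and pinned (setDTthreads() and getDTthreads() appeared around data.table v1.9.8; this is illustrative, not code from the thread):

```r
# Minimal sketch: inspect and limit data.table's OpenMP thread usage.
# Assumes data.table >= 1.9.8, where setDTthreads()/getDTthreads() appeared.
library(data.table)

getDTthreads()   # number of threads data.table will currently use
setDTthreads(1)  # restrict data.table to a single thread for this session
getDTthreads()   # now returns 1
```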
Can we get an update on this? For mlr and caret, I would worry that they use explicit parallelism together with data.table in some operations, so if data.table does not respect explicit parallelism, this could create a reasonably sized mess. Furthermore, I can see huge problems with setting the default number of threads to be greater than one: if users are unaware of this, you could be taking up huge amounts of computing resources without their explicit permission or knowledge.
@fruce-ki Correct - you shouldn't need to change your code, if I've understood correctly. As you saw in the news item, data.table automatically switches down to single-threaded when used inside explicit parallelism. That was hard to detect, but I persevered; in fact, the crash issue with fork and OpenMP was described as having no known solution in the articles I saw. The relevant NEWS item is from v1.9.8 (Nov 2016).
@Stevo15025 Yes, the default is to use all cores. I think you're being a little dramatic in your choice of words: "reasonably sized mess" and "huge problems". The vastly most common case is users on a sole laptop or desktop, or who have a server to themselves, and they just want to benefit from parallelism. If anyone needs to limit the resources, they can do so very easily using setDTthreads().

Put it another way: if the default were 1 core, we'd be deluged with users saying "hey, it's supposed to be parallel but it isn't", needing the reply "you have to setDTthreads(n)", then the reply "ok, what do I set n to?", and needing the reply "well, it depends". Etc etc. This way we just leave it to OpenMP by default, and users don't need to know about one more thing: setDTthreads(). Take, for example, the 340 CRAN and Bioconductor packages using data.table: any of them using explicit parallelism should be covered by the automatic switch to single-threaded described above.

On multi-user servers, you can limit cpu resources using OS commands; e.g. cpulimit at system level. Surely that is a better way to manage/limit shared resources than forcing all users to manage it themselves somehow at the R level, which is unnecessarily inconvenient for desktop, laptop and sole-use servers. The goal is to 'just work' for the vast majority of users; if usage must be limited, setDTthreads() is there for that.

Over time, I hope that explicit parallelism calling data.table will no longer be needed. The best place to do the parallelism is inside data.table automatically, not outside manually. If I've misunderstood or missed something, please reopen. If the predicted problems occur, please let me know and I'll definitely think again.
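A sketch of the behaviour this reply describes, assuming fork-based parallelism on a Unix-like system; the caller does no thread management at all, because data.table is said to drop to one OpenMP thread when it detects it is running inside a forked child:

```r
# Sketch (illustrative, not from the thread): rely on the automatic
# switch-down described above. Per the reply, data.table detects it is
# inside a fork (e.g. under parallel::mclapply) and uses one thread.
library(data.table)
library(parallel)

dt <- data.table(x = rnorm(1e6), g = sample(1e3, 1e6, TRUE))

res <- mclapply(1:4, function(i) {
  dt[, .(s = sum(x)), by = g]   # claimed to run single-threaded in the child
}, mc.cores = 4)
```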
I've noticed since my last R update that loading data.table now pops up a message about multithreading and OpenMP. So I'm curious how the parallelised data.table behaves within code that explicitly forks.
Specifically, I want to avoid excessive forking. I use mclapply to parallelise certain calculations on my data.tables. I want to make sure no additional threads are spawned from within the child processes, as that would mess up resource management on our computing cluster.
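One way to get that guarantee, sketched under the assumption of fork-based parallelism on a Unix-like system, is to pin data.table to a single thread inside each worker; since thread settings are per-process, a call made after the fork affects only that child:

```r
# Sketch (illustrative, not from the thread): defensively pin data.table to
# one thread inside each forked child, so workers cannot spawn additional
# OpenMP threads on a shared cluster node.
library(data.table)
library(parallel)

chunks <- split(mtcars, mtcars$cyl)   # stand-in for real per-worker data

results <- mclapply(chunks, function(chunk) {
  setDTthreads(1)                     # per-process setting: only this child
  dt <- as.data.table(chunk)
  dt[, .(mean_mpg = mean(mpg), n = .N)]
}, mc.cores = 2)
```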