[Request] allow tmpDir to be supplied as argument: fread can run out of tmpfs space on unix during preprocessing #1139
Comments
|
I've also run into this issue. What is particularly frustrating is that if /dev/shm exists, then the value of TMPDIR is ignored and /dev/shm is used. It would be great if TMPDIR were respected. |
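For illustration, here is a minimal sketch of the lookup order this comment asks for: an explicit directory first, then TMPDIR, and only then /dev/shm. `choose_tmp_dir` and its arguments are hypothetical names used for this example. It is not how data.table selects its temporary location.

```r
# Hypothetical helper, not part of data.table: pick the first usable
# directory, giving an explicit argument and TMPDIR priority over /dev/shm.
choose_tmp_dir <- function(tmp_dir = NULL,
                           env = Sys.getenv("TMPDIR", unset = NA)) {
  candidates <- c(
    tmp_dir,                                  # 1. explicit user choice
    if (!is.na(env) && nzchar(env)) env,      # 2. TMPDIR, when set
    "/dev/shm",                               # 3. tmpfs, when present
    tempdir()                                 # 4. R's session temp directory
  )
  for (d in candidates) {
    # file.access() returns 0 when the requested access (2 = write) is allowed
    if (dir.exists(d) && file.access(d, mode = 2) == 0) return(d)
  }
  stop("no writable temporary directory found")
}
```

With this ordering, /dev/shm is only a fallback, so a small tmpfs no longer overrides what the user configured.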
|
Agreed. This would really help my workflow. |
|
+1 It would save my life! |
|
It's not good that the |
|
Something like
|
Hi,
Recently I've encountered an issue with large compressed files that can stop `fread` from working because tmpfs runs out of space. Since `fread` (currently, in the master branch) uses tmpfs (/dev/shm) on unix systems whenever it exists, the size of tmpfs limits the ability of `fread` to read potentially large files before any preprocessing can be done. This is more severe when multiple processes are used to load several files simultaneously for a speed gain, say `mclapply(input_list, fread, mc.cores=4)`, where `input_list` refers to several gz files. Each gz file could be several GB uncompressed. I don't need all of that data in my analysis, and preprocessing could significantly reduce the size of each file. However, the preprocessing requires each file to be uncompressed to disk in the first place, occupying all the space available in tmpfs. (There are, of course, several workarounds for this kind of situation, but it would be great to address it directly in one R function call, which here is `fread`.)

It would hence be nice if a user-supplied argument could force the `tempfile` location to be somewhere other than tmpfs on unix systems. For example, `dat <- fread("zcat file.gz", tmpDir="/data")`. The performance may be a bit worse due to disk I/O, but the raw data will no longer be limited by the size of tmpfs, which is usually far smaller than any disk device at hand. (On my machine I have 8 GB in tmpfs and that's it.)

A possible minor change to fix this issue on unix is to rewrite `fread.R` as in everdark@4aaa745.

I have only tested it on my local machine, where it works fine. There could be ramifications that I haven't taken into account in this simple modification, so I created this request to open the discussion. :) Has anybody else encountered this tmpfs out-of-space issue?
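
One of the workarounds mentioned above can be sketched in base R: decompress the gz file into a directory of your own choosing, then hand the resulting path to `fread`. `decompress_to`, `tmp_dir` and `chunk_bytes` are hypothetical names for this example and are not part of data.table. The sketch assumes the target directory already exists and has enough free space.

```r
# Hypothetical workaround, not part of data.table: decompress a .gz file
# into a caller-chosen directory instead of tmpfs, and return the new path.
decompress_to <- function(gz_path, tmp_dir = tempdir(), chunk_bytes = 1e6) {
  stopifnot(file.exists(gz_path), dir.exists(tmp_dir))
  out_path <- tempfile(tmpdir = tmp_dir, fileext = ".txt")

  src <- gzfile(gz_path, open = "rb")   # reading yields decompressed bytes
  on.exit(close(src), add = TRUE)
  dst <- file(out_path, open = "wb")
  on.exit(close(dst), add = TRUE)

  # Copy in fixed-size chunks so the whole file is never held in memory.
  repeat {
    chunk <- readBin(src, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break
    writeBin(chunk, dst)
  }
  out_path
}
```

Under these assumptions, something like `dat <- data.table::fread(decompress_to("file.gz", tmp_dir = "/data"))` would keep the uncompressed copy off /dev/shm. The caller is responsible for removing the decompressed file afterwards, for example with `unlink()`.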