
[Request] With fread, do not throw 32 bit large file support error if nrows is specified #949

Closed
xiaodaigh opened this issue Nov 13, 2014 · 4 comments

Comments

@xiaodaigh

For the fread function, I wonder if it's possible to not show the 32-bit large file support error when nrows is actually specified? Say I am only interested in the first 5000 rows; then I don't need to read in the whole file on a 32-bit Windows machine.

The error message:

#  "Error in data.table::fread("c:/testing/data.csv",  : 
#   Opened file ok, obtained its size on disk (1631.0MB), but couldn't memory map it. This is a 32bit 
# machine. You don't need more RAM per se but this fread function is tuned for 64bit addressability,
# at the expense of large file support on 32bit machines. You probably need more RAM to store the 
# resulting data.table, anyway. And most speed benefits of data.table are on 64bit with large RAM, 
# too. Please either upgrade to 64bit (e.g. a 64bit netbook with 4GB RAM can cost just £300), or 
# make a case for 32bit large file support to datatable-help."
@xiaodaigh xiaodaigh changed the title Do not though 32 bit large file support error if nrows is specified [Request] With fread, do not throw 32 bit large file support error if nrows is specified Nov 13, 2014
@AtroXWorf

I'd also like this to be implemented. It would be handy for 'nrows=0' situations, or when one is only interested in reading small portions of the file via nrows.

@mattdowle
Member

Please retry with latest dev 1.10.5. From NEWS:

Memory maps lazily; e.g. reading just the first 10 rows with nrow=10 is 12s down to 0.01s from cold for the 9GB file. Large files close to your RAM limit may work more reliably too. The progress meter will commence sooner and more consistently.

There's a chance it will work now. If not, please file a new issue.

For Windows, you can download a binary .zip for dev 1.10.5 by following the instructions here.

@st-pasha
Contributor

st-pasha commented Mar 4, 2018

On a 32-bit system it's not possible to memory-map a file bigger than 4GB in its entirety; it is possible to memory-map only part of the file, though. Trying to map the whole file will raise an error:

EOVERFLOW
On 32-bit architecture together with the large file extension
(i.e., using 64-bit off_t): the number of pages used for
length plus number of pages used for offset would overflow
unsigned long (32 bits).

Of course we could map only the first 4GB of the file and hope that this is sufficient to read the number of rows requested by the user. But what if it is not? It is possible to work around this by carefully managing which window of the file is currently mapped, as sketched below; however, that would increase the code complexity significantly. In addition, it is unclear how to test this functionality, because finding a 32-bit machine is challenging nowadays.
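To make the windowed-mapping idea concrete, here is a minimal sketch assuming POSIX mmap(2). map_window(), WINDOW_SIZE, and the 64MB window size are hypothetical names and choices for illustration, not taken from data.table's actual fread source; a real implementation would also need a Windows path (CreateFileMapping/MapViewOfFile) and careful handling of rows that straddle a window boundary.

/* Sketch: never map more than a small, fixed-size window, so the mapping
 * always fits in a 32-bit address space; remap at a higher offset as the
 * parser advances through the file. Illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define WINDOW_SIZE ((size_t)64 * 1024 * 1024)  /* 64MB: fits easily in 32-bit address space */

/* Map the window of the file that contains byte `offset`. mmap requires a
 * page-aligned offset, so round down; the caller reads starting at
 * base + (offset - *aligned_off). Returns NULL on failure. */
static void *map_window(int fd, off_t file_size, off_t offset,
                        size_t *map_len, off_t *aligned_off)
{
    long page = sysconf(_SC_PAGESIZE);
    *aligned_off = offset - (offset % page);
    size_t len = WINDOW_SIZE;
    if (*aligned_off + (off_t)len > file_size)
        len = (size_t)(file_size - *aligned_off);  /* clamp the window at EOF */
    void *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, *aligned_off);
    if (base == MAP_FAILED) return NULL;
    *map_len = len;
    return base;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    size_t map_len; off_t aligned_off;
    void *base = map_window(fd, st.st_size, 0, &map_len, &aligned_off);
    if (!base) { perror("mmap"); return 1; }

    /* A parser would consume rows from [base, base + map_len) and, on
     * nearing the end of the window, munmap() and remap at a higher
     * offset -- taking care with rows that cross a window boundary. */
    printf("mapped %zu bytes at offset %lld\n", map_len, (long long)aligned_off);

    munmap(base, map_len);
    close(fd);
    return 0;
}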

@AtroXWorf

I no longer have access to a 32-bit machine, so I cannot test this anymore.
