
[Request] With fread, do not throw 32 bit large file support error if nrows is specified #949

Closed
xiaodaigh opened this issue Nov 13, 2014 · 4 comments

Comments

@xiaodaigh

For the fread function, I wonder if it's possible to not show the 32-bit large file support error when nrows is actually specified? Say I am only interested in the first 5000 rows; then I don't need to read in the whole file on a 32-bit Windows machine.

The error message:

#  "Error in data.table::fread("c:/testing/data.csv",  : 
#   Opened file ok, obtained its size on disk (1631.0MB), but couldn't memory map it. This is a 32bit 
# machine. You don't need more RAM per se but this fread function is tuned for 64bit addressability,
# at the expense of large file support on 32bit machines. You probably need more RAM to store the 
# resulting data.table, anyway. And most speed benefits of data.table are on 64bit with large RAM, 
# too. Please either upgrade to 64bit (e.g. a 64bit netbook with 4GB RAM can cost just £300), or 
# make a case for 32bit large file support to datatable-help."
@xiaodaigh xiaodaigh changed the title Do not though 32 bit large file support error if nrows is specified [Request] With fread, do not throw 32 bit large file support error if nrows is specified Nov 13, 2014
@AtroXWorf

I'd also like this to be implemented. It would be handy for 'nrows=0' situations, or when one is only interested in reading small portions of the file via nrows.

@mattdowle
Member

Please retry with latest dev 1.10.5. From NEWS:

Memory maps lazily; e.g. reading just the first 10 rows with nrow=10 is 12s down to 0.01s from cold for the 9GB file. Large files close to your RAM limit may work more reliably too. The progress meter will commence sooner and more consistently.

There's a chance it will work now. If not, please file a new issue.

For Windows, you can download a binary .zip for dev 1.10.5 by following the instructions here.

@st-pasha
Contributor

st-pasha commented Mar 4, 2018

On a 32-bit system it's not possible to memory-map a file bigger than 4GB in its entirety; it is possible to memory-map only part of the file, though. Trying to map the whole file will raise an error:

EOVERFLOW
On 32-bit architecture together with the large file extension
(i.e., using 64-bit off_t): the number of pages used for
length plus number of pages used for offset would overflow
unsigned long (32 bits).

Of course we could map only the first 4GB of the file and hope that this is sufficient to read the number of rows requested by the user. But what if it is not? It is possible to work around this by carefully managing which window of the file is currently mapped, as sketched below; however, that would increase the code complexity significantly. In addition, it is unclear how to test this functionality, because finding a 32-bit machine is challenging nowadays.
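To make the windowed-mapping idea concrete, here is a minimal sketch assuming POSIX mmap(2). map_window(), WINDOW_SIZE, and the 64MB window size are hypothetical names and choices for illustration, not taken from data.table's actual fread source; a real implementation would also need a Windows path (CreateFileMapping/MapViewOfFile) and careful handling of rows that straddle a window boundary.

/* Sketch: never map more than a small, fixed-size window, so the mapping
 * always fits in a 32-bit address space; remap at a higher offset as the
 * parser advances through the file. Illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define WINDOW_SIZE ((size_t)64 * 1024 * 1024)  /* 64MB: fits easily in 32-bit address space */

/* Map the window of the file that contains byte `offset`. mmap requires a
 * page-aligned offset, so round down; the caller reads starting at
 * base + (offset - *aligned_off). Returns NULL on failure. */
static void *map_window(int fd, off_t file_size, off_t offset,
                        size_t *map_len, off_t *aligned_off)
{
    long page = sysconf(_SC_PAGESIZE);
    *aligned_off = offset - (offset % page);
    size_t len = WINDOW_SIZE;
    if (*aligned_off + (off_t)len > file_size)
        len = (size_t)(file_size - *aligned_off);  /* clamp the window at EOF */
    void *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, *aligned_off);
    if (base == MAP_FAILED) return NULL;
    *map_len = len;
    return base;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    size_t map_len; off_t aligned_off;
    void *base = map_window(fd, st.st_size, 0, &map_len, &aligned_off);
    if (!base) { perror("mmap"); return 1; }

    /* A parser would consume rows from [base, base + map_len) and, on
     * nearing the end of the window, munmap() and remap at a higher
     * offset -- taking care with rows that cross a window boundary. */
    printf("mapped %zu bytes at offset %lld\n", map_len, (long long)aligned_off);

    munmap(base, map_len);
    close(fd);
    return 0;
}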

@AtroXWorf

I no longer have access to a 32-bit machine, so I cannot test this anymore.
