fread's nrows=0 errors if input is an empty table #2512

Closed
franknarf1 opened this issue Dec 8, 2017 · 1 comment

@franknarf1 (Contributor) commented Dec 8, 2017

I have several tables on disk. For each one, I want to get the column names and check them against the expected names, using names(fread(fn, nrows=0)) as suggested in the ?fread documentation for nrows=. However, for any empty (header-only) table, this gives an error:

# note that this example will write to your current directory
library(data.table)
DT0 = data.table(a = numeric(), b = numeric())
fn0 = "test0.csv"

fwrite(DT0, fn0)

fread(fn0) 
# works fine
fread(fn0, nrows=0)
# Error in fread(fn0, nrows = 0) : 
#   Internal error in line 1848 of fread.c, please report on data.table GitHub:  allocnrow(1) < nrowLimit(0)

It works fine if the table is nonempty, though:

# note that this example will write to your current directory
library(data.table)
DT = data.table(a = numeric(1), b = numeric(1))
fn = "test.csv"

fwrite(DT, fn)

fread(fn, nrows=0) 
# works fine
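Until this is fixed, one possible stopgap is to try the nrows=0 idiom and, if it errors, fall back to a plain fread(), which the report shows works on the header-only file. This is a minimal sketch, not part of data.table: get_col_names is a hypothetical helper name, the fallback catches any error (not just this internal one), and it reads the whole file, which is cheap for the header-only case it targets.

```r
library(data.table)

# Hypothetical helper: return the column names of a CSV file.
# First tries the documented nrows=0 idiom; if that call errors
# (as it does for header-only files in this report), it falls back
# to reading the whole file, which succeeds for such files.
get_col_names = function(fn) {
  tryCatch(
    names(fread(fn, nrows = 0)),
    error = function(e) names(fread(fn))
  )
}
```

For the files written above, get_col_names(fn0) and get_col_names(fn) should both return c("a", "b").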

Tested on...

data.table 1.10.5 IN DEVELOPMENT built 2017-12-08 20:14:33 UTC; travis

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.5

Verbose output ...

> fread(fn0, nrows=0, verbose=TRUE)
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 8 threads (omp_get_max_threads()=8, nth=8)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  show progress = 1
  0/1 column will be read as boolean
[02] Opening the file
  Opening file test0.csv
  File opened, size = 5 bytes.
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  No \n has been found in the data (the entire input was scanned) so \r-only line endings are allowed. This is unusual.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<a,b>>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep ...
  sep=','  with 1 lines of 2 fields using quote rule 0
  Detected 2 columns on line 1. This line is either column names or first data row. Line starts as: <<a,b>>
  Quote rule picked = 0
  fill=false and the most number of columns found is 2
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (3 bytes from row 1 to eof) / (2 * 3 jump0size) == 0
  Type codes (jump 000)    : AA  Quote rule 0
  'header' determined to be true because there are no number fields in the first and only row
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : 11
[10] Allocate memory for the datatable
  Allocating 2 column slots (2 - 0 dropped) with 1 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=0
Error in fread(fn0, nrows = 0, verbose = TRUE) : 
  Internal error in line 1848 of fread.c, please report on data.table GitHub:  allocnrow(1) < nrowLimit(0)

(For now, I'll work around this with readLines and strsplit, since I expect to face only CSVs without quoted fields. Still, fread is much better at delimiting columns than anything I could hack together: for x="'a,b', AB\n1,2", names(fread(x, quote="\'", nrows=0)) handles the quoted 'a,b' header correctly, whereas strsplit(readLines(textConnection(x), n=1), ", *")[[1]] splits that quoted name at its inner comma.)
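The quoting problem with the strsplit workaround can also be avoided in base R by letting scan() parse only the header line. This is a sketch under stated assumptions: header_names is a hypothetical helper, it assumes the header fits on one line (no embedded newlines inside quoted names), and unlike fread it does no separator detection, so sep must be given explicitly.

```r
# Hypothetical helper: read only the first line of a file and split it
# on `sep`, honouring the given quote characters (base R only).
header_names = function(path, sep = ",", quote = "\"'") {
  first = readLines(path, n = 1L, warn = FALSE)
  # strip.white removes the space before AB in the example header
  scan(text = first, what = "", sep = sep, quote = quote,
       strip.white = TRUE, quiet = TRUE)
}

# The example input from above, written to a temporary file
path = tempfile(fileext = ".csv")
writeLines(c("'a,b', AB", "1,2"), path)
header_names(path)
```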

@st-pasha added this to the v1.10.6 milestone Dec 14, 2017

@st-pasha (Contributor) commented Dec 14, 2017

Minimal reproducible example:

> fread("a,b\n", nrows=0)
Error in fread("a,b\n", nrows = 0) : 
  Internal error in line 1848 of fread.c, please report on data.table GitHub:  allocnrow(1) < nrowLimit(0)
