Segfault with fread #2299

etienne-s opened this issue Aug 14, 2017 · 1 comment


etienne-s commented Aug 14, 2017

I get a segfault when using fread on this file: test.txt

I am using the latest dev version; the problem does not occur with the CRAN version.

The file can be fixed by adding a separator at the end of line 116.

Not sure whether it is related to an existing issue or not.
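
For anyone who wants to locate such a short row before calling fread, here is a minimal sketch in base R. It assumes a naive count (separators per line plus one) that ignores quoting, so a '|' inside a quoted field would be over-counted. The helper name `count_fields_naive` and the sample lines are made up for illustration; they are not the contents of test.txt.

```r
# Naive field count: number of separators on each line, plus one.
# Assumption: quoting is ignored, so this is only a rough diagnostic.
count_fields_naive <- function(lines, sep = "|") {
  lengths(regmatches(lines, gregexpr(sep, lines, fixed = TRUE))) + 1L
}

# Hypothetical sample data standing in for the real file.
lines <- c("a|b|c|d|e|f|g",  # header with 7 fields
           "1|2|3|4|5|6|7",  # 7 fields
           "1|2|3|4|5|",     # 6 fields: the final separator is missing
           "1|2|3|4|5||")    # 7 fields once the separator is added

n <- count_fields_naive(lines)

# Line numbers that have fewer fields than the header line.
short_rows <- which(n < n[1])
print(short_rows)
```

On a real file, the same function could be applied to the result of `readLines()`.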

st-pasha added the fread label Aug 14, 2017

st-pasha commented Aug 14, 2017

On my MacOS machine, I get the following:

> fread("~/Downloads/test.txt")
Error in fread("~/Downloads/test.txt") : 
  Expecting 7 cols but row 0 contains only 6 cols (sep='|'). Consider fill=true. <<"aa aa aa aa aa aa aa aa !"|1|aa.aa.aa|"aa aa aa aa aa aa aa aa ! 1 aaûaa 1 aa aa1 aa : aa aa aa'aa aa aa aaé aa aa aa aa ! aa aa aa aa !"|aa aa aa aa aa aa aa aa ! ,|>>

This is a valid error message (except that the row number is incorrect): the line shown does have 6 fields instead of the expected 7, and adding the option fill=TRUE reads the file correctly. There is no segfault.

On a Windows machine, however, I can confirm that running fread("test.txt") produces a segfault, whereas fread("test.txt", fill=T) reads the data correctly. Here's the log leading up to the segfault:

> fread("Downloads/test.txt", verbose=T)
Input contains no \n. Taking this to be a filename to open
[1] Check arguments
  Using 8 threads (omp_get_max_threads()=8, nth=8)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  show progress = 1
[2] Opening the file
  Opening file Downloads/test.txt
  File opened, size = 95.7KB (zd bytes).
  Memory mapping ... ok
[3] Detect and skip BOM
[4] Detect end-of-line character(s)
  Detected eol as \r\n (CRLF) in that order, the Windows standard.
[6] Skipping initial rows if needed
  Positioned on line 1 starting: <<aa|aa aa|aa|aa|aa|aa aa|aa aa>>
[7] Detect separator, quoting rule, and ncolumns
  Detecting sep ...
  sep=','  with 1 lines of 3 fields using quote rule 2
  sep='|'  with 100 lines of 7 fields using quote rule 0
  Detected 7 columns on line 1. This line is either column names or first data row. Line starts as: <<aa|aa aa|aa|aa|aa|aa aa|aa aa>>
  Quote rule picked = 0
[8] Determine column names
  All the fields on line 1 are character fields. Treating as the column names.
[9] Detect column types
  Number of sampling jump points = 11 because (97943 bytes from row 1 to eof) / (2 * 4859 jump0size) == 10
  Type codes (jump 000)    : 6266622  Quote rule 0
Bumping quote rule from 0 to 1 due to field 1 on line 7 of sampling jump 1 starting <<"aa : aaAcaa aa aa aa aa aa "aa aa aa" ?"|1|aa-aa.aa|"-"|||>>
Bumping quote rule from 1 to 2 due to field 1 on line 7 of sampling jump 1 starting <<"aa : aaAcaa aa aa aa aa aa "aa aa aa" ?"|1|aa-aa.aa|"-"|||>>
  Type codes (jump 001)    : 6266622  Quote rule 2
  Type codes (jump 010)    : 6266622  Quote rule 2
  Sampled 1039 rows (handled \n inside quoted fields) at 11 jump points
  Bytes from first data row on line 2 to the end of last row: 97943
  Line length: mean=61.00 sd=19.55 min=22 max=146
  Estimated number of rows: 97943 / 61.00 = 1606
  Initial alloc = 3212 rows (1606 + 100%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
[10] Apply user overrides on column types
After 0 type and 0 drop user overrides : 6266622
[11] Allocate memory for the datatable
  Allocating 7 column slots (7 - 0 dropped) with 3212 rows
[12] Read the data
