fread skip in v1.11.0+ returns error when it used to work in v1.10.x #3006

Closed
brattono opened this issue Aug 21, 2018 · 6 comments
@brattono

I have output from a vehicle monitoring system that is exported in .csv format. It has a set of data headers (2 columns for 25 rows), a blank row, followed by time/speed/distance data in 6 columns.

When using data.table 1.10 the following command worked:
fread("otmr1a.csv", skip=26L)

In version 1.11.4 the same command gives:

fread("otmr1a.csv", skip = 26)
Error in fread("otmr1a.csv", skip = 26) : 
  skip=26 but the input only has 1 line

If the skip is removed, fread works perfectly:

fread("otmr1a.csv")
                  V1               V2       V3      V4 V5   V6
    1:          Name            Value                         
    2: Configuration        345                         
    3:      Creation 17/04/2018 08:02                         
    4:   Customer Id     MTR                         
    5: Distance Unit  Centimeter (cm)                         
   ---                                                        
86113:       4982808       26/02/2018 10:50:03 3945.04  0 -100
86114:       4982809       26/02/2018 10:50:03 3945.04  0 -100
86115:       4982810       26/02/2018 10:50:03 3945.04  0 -100
86116:       4982811       26/02/2018 10:50:04 3945.04  0 -100
86117:       4982812       26/02/2018 10:50:04 3945.04  0 -100

Because fread reads the file without skip, the easy workaround is to select rows after import:

fread("otmr1a.csv")[26:.N]
              V1         V2       V3            V4               V5                   V6
    1: Record Id       Date     Time Distance (mi) SYS_SPEED (km/h) CIU_TraBrkEffRef (%)
    2:   4896722 26/02/2018 05:50:08       3862.23                0                    0
    3:   4896723 26/02/2018 05:50:09       3862.23                0                    0
    4:   4896724 26/02/2018 05:50:09       3862.23                0                    0
    5:   4896725 26/02/2018 05:50:09       3862.23                0                    0
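A sliced result like this still carries the real header as its first row, so every column is character. A minimal base-R sketch of promoting that row to column names and re-inferring the types follows. `promote_header` and the small `sliced` stand-in are made up for illustration, and a plain data.frame is used here rather than a data.table:

```r
# Stand-in for the sliced result: all columns are character and row 1 holds the real header.
sliced <- data.frame(
  V1 = c("Record Id", "4896722", "4896723"),
  V2 = c("Date", "26/02/2018", "26/02/2018"),
  V4 = c("Distance (mi)", "3862.23", "3862.23"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: use the first row as column names and re-infer column types.
promote_header <- function(df) {
  names(df) <- unlist(df[1, ], use.names = FALSE)
  df <- df[-1, , drop = FALSE]
  rownames(df) <- NULL
  # type.convert() re-guesses each column's type from its character values.
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

dat <- promote_header(sliced)
str(dat)
```

With a data.table the renaming step would more naturally use `setnames`, but the idea is the same.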

Each line ends with "^M" (a carriage return); the file comes from a Windows package:
Name Value ^MConfiguration 345 ^MCreation 17/04/2018 08:02 ^MCustomer Id MTR ^MDistance Unit Centimeter (cm) ^MDistance Unit (Display) Mile (mi) ^MDistance Unit (User) Mile (mi) ^MEnd Distance "4,743.8927 mi" ^MEnd Time 03:12.0 ^MFile Z:\Documents\OTMR\_2018_03_14-17_03_00003.tel\INT_TDATA ^MIssue Number I19 ^MMemory Type INT_TDATA ^MName TELOC Dataset ^MRecords "6,512,724" ^MSerial Number 17028710 ^MSoftware Version 2402.04.24.01 ^MStart Distance 61.5482 mi ^MStart Time 57:55.2 ^MTime Zone Coordinated Universal Time (UTC) ^MTime Zone (User) Coordinated Universal Time (UTC) ^MTotal distance counter "4,743.8927 mi" ^MVehicle Id Unknown ^MVehicle Type Class345 ^MWheel Diameter 31.496 in ^M ^MRecord Id Date Time Distance (mi) SYS_SPEED (km/hCIU_TraBrkEffRef (%)^M4896722 26/02/2018 05:50:08 3862.23 0 0^M4896723 26/02/2018 05:50:09 3862.230^M4896724 26/02/2018 05:50:09 3862.23 0 0^M4896725 26/02/2018 05:50:09 3862.23 0 0^M4896726 26/02/2018 05:50:10 3862.23 0 0^M4896727 26/02/2018 05:50:11 3862.23 0 0^M
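To check which terminator a file actually uses, one option is to count the CR and LF bytes directly. This is a minimal base-R sketch; the helper name `count_line_endings` is made up for illustration and is not part of data.table, and the demo file is a tiny invented stand-in, not the real export:

```r
# Hypothetical helper: count line terminators in a file by inspecting its raw bytes.
count_line_endings <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  is_cr <- bytes == as.raw(0x0d)
  is_lf <- bytes == as.raw(0x0a)
  # A CRLF pair is a CR immediately followed by an LF.
  crlf <- if (n > 1) sum(is_cr[-n] & is_lf[-1]) else 0L
  c(cr_only = sum(is_cr) - crlf,
    lf_only = sum(is_lf) - crlf,
    crlf    = crlf)
}

# Demonstration on a small temporary file that uses bare CR terminators.
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw("Name,Value\rConfiguration,345\r\rRecord Id,Date\r1,26/02/2018\r"), tmp)
counts <- count_line_endings(tmp)
print(counts)
```

A file where `cr_only` is large and the other two counts are zero uses bare carriage returns, which would be consistent with a reader that only splits on LF seeing a single line.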

If I load the file with fread and then save it with fwrite, the problem disappears on reloading. This means that if I use R to create a small version of the dataset as a reproducible example, the problem disappears.
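Since rewriting the file makes the problem go away, a possible workaround is to normalise the line endings to LF in a temporary copy before reading. This is only a sketch, assuming the bare CR terminators are what trips up skip, which nobody in this thread has confirmed. `normalise_eol` is a hypothetical helper, and the demo file is an invented stand-in:

```r
# Hypothetical helper: write a copy of `path` with CRLF and bare CR replaced by LF.
normalise_eol <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  txt <- rawToChar(bytes)
  # Replace CRLF first so that each pair becomes a single LF, then any remaining bare CR.
  txt <- gsub("\r\n", "\n", txt, fixed = TRUE, useBytes = TRUE)
  txt <- gsub("\r", "\n", txt, fixed = TRUE, useBytes = TRUE)
  out <- tempfile(fileext = ".csv")
  writeBin(charToRaw(txt), out)
  out
}

# Demonstration on a small temporary file that uses bare CR terminators.
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw("Name,Value\rConfiguration,345\r\rRecord Id,Date\r1,26/02/2018\r"), tmp)
fixed <- normalise_eol(tmp)
fixed_lines <- readLines(fixed)
print(length(fixed_lines))
```

The normalised copy could then be passed to fread with the original skip value, for example `fread(normalise_eol("otmr1a.csv"), skip = 26)`. Whether that avoids the error on the real file is untested here.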

I wasn't sure if it was linked in any way to:
#2857
#2943

Happy to provide original data sets if that helps but I can't put it on public sites beyond the sample provided above.

# Output of sessionInfo()

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] fasttime_1.0-2    ggplot2_3.0.0     data.table_1.11.4
@tbrycekelly

tbrycekelly commented Sep 11, 2018

I am having the same issue as you described. My file is formatted by scientific equipment and fread without skip will only give me a portion of the data I actually want, so even the workaround doesn't work.

I want to skip the first 29 lines with "fread('Raw Data/FRRF/20170601-212720.csv', sep = ',', skip = 29, nrows = 21)":

Error in fread("Raw Data/FRRF/20170601-212720.csv", skip = 10, sep = ","): skip=10 but the input only has 1 line

File path:,
File note:,
File date:,02/06/17
File time:,12:34:19

FRRf3 Ka:,11800
[Chl] multiplier:,6.0
QR threshold:,6.0
C derivation:,qP
TauPQ:,250
TauSQ:,500

Act2 SN:,15-0003-002
Act2 LED:,White

FRRf3 SN:,14-9727-007
450 nm:,1.00
530 nm:,0.50
624 nm:,0.80

,A,B,C,D
Alpha:,0.371,0.382,0.377,0.395
Ek:,328.5,314.4,273.2,271.8
Pm:,187.7,75.97,65.80,165.2
Em:,,,367.5,
AlphaW:,2.000,2.000,2.000,2.000
SErP:,0.125,0.207,0.425,0.144

,LED combination A (450 nm alone),,,,,,Measured,Fit,,,Fo,Fm,Fv/Fm
,Saq,E,Start,s,[Chl],ADC,rP,rP,JPII,JVPII,F',Fm',Fq'/Fm',C,p,RSigma,Sigma,CSQ,TauES,NPQ,NSV,,QR,Qo,Qm,QoSE,QmSE,QSE,QSE ratio,,Qo points,Qo slope,Qo intercept,Qm points,Qm slope,Qm intercept
,1,0,00:35,35,10.79,40,0.000000,0.000000,0.000000,0.000000,1.799,2.866,0.372,,0.316,0.0512,5.116,0.563,2717,0.093,1.794,,82.13,1.804,2.817,0.005158,0.0112,0.0123,0.460,,12,0.0334,1.804,36,0.000765,2.790
,2,22,06:37,397,11.59,41,8.046,7.899,59.08,0.0396,1.932,3.046,0.366,0.017,0.144,0.0446,4.459,0.563,3229,0.029,1.689,,27.43,1.942,2.972,0.0256,0.0275,0.0376,0.932,,13,0.0339,1.942,36,0.000204,2.965
,3,49,07:40,460,12.36,43,16.73,16.90,143.3,0.0824,2.060,3.129,0.341,0.097,0.290,0.0486,4.857,0.466,3459,0.002,1.644,,23.59,2.068,3.085,0.0319,0.0290,0.0431,1.102,,17,0.0318,2.068,36,0.001049,3.047
,4,81,08:43,523,12.58,44,26.78,26.65,240.9,0.132,2.097,3.133,0.331,0.127,0.289,0.0494,4.939,0.470,3280,0.000,1.641,,25.39,2.078,3.106,0.0316,0.0254,0.0405,1.246,,15,0.0351,2.078,36,0.001085,3.067
,5,118,09:46,586,12.52,44,37.75,36.80,355.6,0.186,2.087,3.068,0.320,0.144,0.098,0.0500,5.005,0.526,2768,0.021,1.676,,24.35,2.129,2.998,0.0183,0.0306,0.0357,0.597,,13,0.0303,2.129,36,-0.000850,3.028
,6,163,10:49,649,12.92,43,44.56,47.70,462.0,0.219,2.154,2.965,0.273,0.252,0.328,0.0471,4.707,0.418,2909,0.057,1.735,,22.37,2.127,2.944,0.0200,0.0306,0.0366,0.654,,12,0.0282,2.127,36,0.001041,2.907
,7,216,11:52,712,12.74,42,57.60,58.76,667.3,0.284,2.124,2.896,0.267,0.260,0.248,0.0513,5.130,0.420,2589,0.082,1.776,,19.74,2.128,2.864,0.0255,0.0272,0.0373,0.937,,12,0.0263,2.128,36,0.000737,2.837
,8,279,12:55,775,12.60,43,66.26,69.78,849.7,0.326,2.099,2.753,0.238,0.319,0.229,0.0506,5.057,0.402,2832,0.138,1.868,,15.45,2.081,2.725,0.0324,0.0262,0.0417,1.236,,12,0.0266,2.081,36,0.001086,2.686
,9,353,13:59,839,12.09,43,85.45,81.09,1001,0.421,2.014,2.658,0.242,0.289,0.071,0.0471,4.710,0.560,2512,0.179,1.935,,13.36,2.053,2.628,0.0323,0.0284,0.0431,1.137,,12,0.0190,2.053,36,0.000594,2.607
,10,442,15:02,902,12.09,42,89.16,93.77,1281,0.439,2.015,2.524,0.202,0.387,0.032,0.0481,4.813,0.502,2192,0.242,2.038,,13.93,2.014,2.503,0.0201,0.0288,0.0351,0.697,,12,0.0200,2.014,36,0.000815,2.473
,11,547,16:05,965,11.91,42,86.99,105.6,1378,0.428,1.984,2.359,0.159,0.494,0.164,0.0418,4.184,0.451,2141,0.328,2.180,,8.933,2.003,2.335,0.0277,0.0248,0.0371,1.117,,17,0.009353,2.003,36,-0.000086,2.338
,12,671,17:08,1028,11.44,44,99.05,116.4,1831,0.488,1.906,2.236,0.148,0.513,0.220,0.0453,4.530,0.325,1923,0.401,2.300,,7.643,1.900,2.237,0.0348,0.0271,0.0441,1.282,,17,0.0109,1.900,36,0.000630,2.214
,13,819,18:11,1091,10.93,43,115.9,126.0,2516,0.571,1.821,2.122,0.142,0.515,0.169,0.0510,5.101,0.502,1808,0.477,2.424,,6.612,1.864,2.130,0.0253,0.0313,0.0402,0.808,,35,0.006111,1.864,36,0.001375,2.080
,14,995,19:14,1154,10.44,44,120.5,134.5,,0.593,1.740,1.980,0.121,0.564,,,,,,0.582,2.597,,4.880,1.740,1.980,0.0351,0.0345,0.0492,1.017,,12,0.0110,1.740,36,0.000220,1.972
,15,1204,20:17,1217,10.26,47,143.3,141.9,,0.706,1.711,1.942,0.119,0.566,,,,,,0.614,2.648,,4.463,1.711,1.942,0.0258,0.0449,0.0518,0.574,,12,0.006058,1.711,36,0.001182,1.899
,16,1451,21:20,1280,9.649,50,180.0,148.5,,0.886,1.608,1.836,0.124,0.528,,,,,,0.707,2.801,,4.196,1.608,1.836,0.0383,0.0384,0.0543,0.998,,12,0.0126,1.608,36,-0.000459,1.853
,17,1745,22:23,1343,9.490,54,175.5,154.5,,0.864,1.582,1.759,0.101,0.605,,,,,,0.782,2.924,,3.050,1.582,1.759,0.0390,0.0429,0.0580,0.907,,12,0.0132,1.582,36,-0.001115,1.799
,18,2094,23:26,1406,9.429,55,146.4,160.0,,0.721,1.571,1.690,0.070,0.717,,,,,,0.855,3.044,,1.720,1.571,1.690,0.0420,0.0543,0.0687,0.774,,13,0.008529,1.571,36,-0.001363,1.739
,19,2508,24:29,1469,9.456,62,259.4,165.3,,1.277,1.576,1.758,0.103,0.594,,,,,,0.783,2.926,,2.982,1.576,1.758,0.0458,0.0402,0.0610,1.140,,29,0.003529,1.576,36,0.000880,1.726
,20,3000,25:32,1532,9.412,70,177.7,170.2,,0.875,1.569,1.667,0.059,0.758,,,,,,0.879,3.084,,1.707,1.569,1.667,0.0418,0.0400,0.0579,1.044,,13,0.003480,1.569,36,0.002068,1.593
,21,0,26:35,1595,9.755,28,0.000000,0.000000,0.000000,0.000000,1.626,2.413,0.326,,0.277,0.0496,4.960,0.566,2077,0.298,2.131,,44.19,1.628,2.372,0.0129,0.0108,0.0168,1.189,,12,0.0255,1.628,36,0.000490,2.354

@MichaelChirico
Member

@tbrycekelly can you run length(readLines("Raw Data/FRRF/20170601-212720.csv")) and report the result

@tbrycekelly

Sure, @MichaelChirico.
length(readLines('Raw Data/FRRF/20170601-212720.csv'))
returns 107

@tbrycekelly

tbrycekelly commented Sep 12, 2018

And I should include that without the skip parameter this is what fread finds:

V1 V2
File path:
File note:
File date: 01/06/17
File time: 21:27:20

@solmonta

solmonta commented Nov 9, 2018

I have exactly the same problem: a colleague exported a .csv from Excel on Windows, and when I tried to read it in with fread using the skip option I got:

Error in fread(file = "data.csv", blank.lines.skip = T, :
skip=10 but the input only has 1 line

Without skip there is no problem.

I am using version 1.11.4.
I never got this message with older versions of data.table on the same data set.

Is this maybe a bug?

@tbrycekelly

tbrycekelly commented Nov 14, 2018 via email

@mattdowle mattdowle added this to the 1.12.0 milestone Nov 14, 2018