Hi Arun,
We met at the SatRday in Budapest. Here is the problem I was telling you about.
I attached an anonymized sample of the data that includes the row containing the error. All characters in the data are replaced with "a"s and "1"s, and the dates are replaced with a random date, but the underlying file structure is the same: it is a tab-separated file with no quotation marks delimiting the data in the columns, and one row is broken so that its data is split across two separate rows. There are 11 rows and 51 columns in the sample file, and the error is in the 6th and 7th rows.
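One way to confirm which lines are broken, independently of `fread`, is to count the tab characters on each line and flag the lines whose field count differs from the most common one. The following is only a minimal sketch in base R; `find_broken_rows` is a hypothetical helper name used for illustration and is not part of data.table, and it assumes that no field contains a literal tab.

```r
# Hypothetical helper (not part of data.table): list the lines whose number of
# separator-delimited fields differs from the most common field count.
find_broken_rows <- function(path, sep = "\t") {
  lines <- readLines(path, warn = FALSE)
  # Count separators per line; a line with k separators has k + 1 fields.
  n_sep <- lengths(regmatches(lines, gregexpr(sep, lines, fixed = TRUE)))
  n_fields <- n_sep + 1L
  # Treat the most frequent field count as the expected number of columns.
  expected <- as.integer(names(which.max(table(n_fields))))
  bad <- n_fields != expected
  data.frame(line = which(bad), fields = n_fields[bad])
}
```

Calling `find_broken_rows("data/dt_anonymized_test.txt")` on a file like the one described should list the two physical lines that together make up the split row, since each of them has fewer fields than the rest.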
I think you were right that the problem is likely with the end-of-line character. The error message reads:
Error in fread("data/dt_anonymized_test.txt") :
Expected sep (' ') but new line or EOF ends field 39 on line 6 when reading data: 1970.03.24 1111111111111 aaaaaaaaaaa aaaaaaaaaa aaa aaaa aaaa. aaaa11 1970.03.24 aaaaaa aaaaaa aa 1970.03.24 1970.03.24 1111-1111111 aaa aaaa 1111111.11 111111.11 111111.11 1111111 111111 111111 1.11 1 1 1 1111111 111111 111111 1111111 111111 111111 1111111 1111111 1111111 111111 111 1111111111111111 1111111111111.11.11
In addition, the sample file gives the following warning:
In addition: Warning message:
In fread("data/dt_anonymized_test.txt") :
Bumped column 39 to type character on data row 6, field contains '1111111111111.11.11'. Coercing previously read values in this column from logical, integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE.
I included the latter because the real data also gives this kind of warning, but I think that part is less important for me.
I use data.table version 1.9.6 from CRAN.
Thanks in advance for your help!
dt_anonymized_test.txt