[R-Forge #5358] fread quoted strings not always handled properly #489
Submitted by: James Sams; Assigned to: Nobody; R-Forge link
I have a file with three fields: two
# "Expected sep (',') but '"' ends field 2 on line 828 when reading data:".
(Actual data not used due to confidentiality concerns.)
IME, there are two ways that CSV-type files handle embedded quotes: with a backslash escape (\") and by doubling them up (""), as is done here. Well, at least two unambiguous ways. Note that it isn't uncommon to see such a field without the outer quotes. As I understand it, this is because some programs include the outer quotes only when the field contains the designated field separator. Otherwise, these programs rely on the escaping mechanism (either backslash or doubling) to handle single or double quotes, etc. Of course, CSV files aren't standardized in practice, so there may be other cases. Hopefully this information is helpful, though.
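To make the two conventions concrete, here is a small sketch in Python rather than R. It uses Python's standard `csv` module as a stand-in parser, not `fread`. The sample rows and the helper names `parse_doubled` / `parse_backslash` are made up for illustration. The same logical field is encoded with doubled quotes, with backslash escapes, and with backslash escapes but no outer quotes. With the matching settings, all three parse to the same value.

```python
import csv
import io

# Hypothetical sample data: the same record written with each quoting convention.
doubled = 'id,name,note\n1,"Bob","He said ""hi"""\n'      # "" inside a quoted field
backslash = 'id,name,note\n1,"Bob","He said \\"hi\\""\n'  # \" inside a quoted field
unquoted = 'id,name,note\n1,Bob,He said \\"hi\\"\n'       # \" with no outer quotes


def parse_doubled(text):
    # RFC 4180 style: a literal quote inside a quoted field is written as "".
    return list(csv.reader(io.StringIO(text), doublequote=True))


def parse_backslash(text):
    # Backslash style: a literal quote is written as \" and "" is not special.
    return list(csv.reader(io.StringIO(text), doublequote=False, escapechar="\\"))


if __name__ == "__main__":
    print(parse_doubled(doubled))
    print(parse_backslash(backslash))
    print(parse_backslash(unquoted))
```

A parser has to know which convention is in use, or detect it reliably. The `""` form is only unambiguous inside a quoted field, whereas the backslash form also works in unquoted fields.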
I see several other bug reports about
Embedded quotes and doubled-up quotes should now be handled in v1.9.4, whether or not they occur inside a quoted field. The report seems to be from much earlier this year. There's still a problem if an embedded newline occurs after a doubled-up quote. To do: check this case and add more tests for it, document it, add it to the README, and close.
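The remaining failure case can be pinned down as a concrete input and expected result. The sketch below uses Python's `csv` module as a reference parser to show what a correct parse should yield. It is not an `fread` test; the actual regression test would be written in R. The names `CASE`, `EXPECTED` and `reference_parse` are hypothetical. The input has a doubled-up quote (`""`) immediately followed by an embedded newline inside a quoted field.

```python
import csv
import io

# Hypothetical regression input: a doubled-up quote ("") directly followed by
# an embedded newline, all inside one quoted field.
CASE = 'id,note\n1,"ends with ""quote""\nand continues"\n2,plain\n'

# Expected result: "" collapses to one quote and the newline stays in the field,
# so the file has three records, not four.
EXPECTED = [
    ["id", "note"],
    ["1", 'ends with "quote"\nand continues'],
    ["2", "plain"],
]


def reference_parse(text):
    # Python's csv module (default doublequote=True) used as a reference parser.
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    rows = reference_parse(CASE)
    print(rows)
    assert rows == EXPECTED
```

An equivalent `fread` test would read the same text and check that row 1's `note` column contains both the collapsed quotes and the newline.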