fread() does not always detect separator correctly when commas in text fields? #923

Closed
raymondben opened this issue Oct 29, 2014 · 2 comments

@raymondben

Should detect sep as "\t":

fread(sprintf("\"a\"\t\"b\"\n\"this,that\"\t2\n"),verbose=TRUE)
# Input contains a \n (or is ""). Taking this to be text input (not a filename)
# Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
# Using line 2 to detect sep (the last non blank line in the first 'autostart') ... sep=','
# [truncated]

Removing the embedded comma detects sep correctly:

fread(sprintf("\"a\"\t\"b\"\n\"this_that\"\t2\n"),verbose=TRUE)
# Input contains a \n (or is ""). Taking this to be text input (not a filename)
# Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
# Using line 2 to detect sep (the last non blank line in the first 'autostart') ... sep='\t'

sessionInfo()
# R version 3.1.1 (2014-07-10)
# Platform: x86_64-pc-linux-gnu (64-bit)

# locale:
#  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8       
#  [4] LC_COLLATE=en_AU.UTF-8     LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
#  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
# [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] data.table_1.9.4 Defaults_1.1-1  

# loaded via a namespace (and not attached):
# [1] chron_2.3-45  plyr_1.8.1    Rcpp_0.11.2   reshape2_1.4  stringr_0.6.2 tools_3.1.1  

@arunsrinivasan
Member

Seems to have been fixed in 1.9.5. Please test and write back if there are still issues.

@zlskidmore

I am seeing a similar issue: reading in a file with sep="auto" produces output like this:

test <- fread("~/Desktop/tmp1.txt")
head(test)
                                                        V1 V2       V3    V4  V5
1: 1\t50000\t10673\tJUNC00000004\t12\t?\t50000\t10673\t255  0 0\t2\t83 55\t0 178
2: 1\t40000\t10533\tJUNC00000003\t42\t?\t40000\t10533\t255  0 0\t2\t89 55\t0 178
3: 1\t20000\t10343\tJUNC00000001\t57\t?\t20000\t10343\t255  0 0\t2\t95 55\t0 178

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.4

loaded via a namespace (and not attached):
[1] tools_3.3.0

I've attached a test file to reproduce this.
tmp1.txt

Setting the separator explicitly fixes the problem, so I'm not sure whether this is a bug, but maybe a warning should be produced?
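
For anyone landing here, a minimal sketch of the workaround mentioned above, using the input from the original report. Passing `sep = "\t"` skips auto-detection altogether. The helper `has_embedded_tabs()` is hypothetical (it is not part of data.table); it is just one way a caller might flag a suspicious result, assuming that literal tabs left inside character columns mean the separator was misdetected.

```r
library(data.table)

# Input from the original report: tab-separated, with a comma inside a quoted field.
input <- "\"a\"\t\"b\"\n\"this,that\"\t2\n"

# Passing sep explicitly bypasses separator auto-detection.
dt <- fread(input, sep = "\t")

# Hypothetical helper (not part of data.table): returns TRUE if any character
# column still contains a literal tab, which hints that sep was misdetected.
has_embedded_tabs <- function(x) {
  chr_cols <- names(x)[vapply(x, is.character, logical(1))]
  any(vapply(
    chr_cols,
    function(col) any(grepl("\t", x[[col]], fixed = TRUE)),
    logical(1)
  ))
}

# Caller-side warning, since fread itself does not emit one in the case above.
if (has_embedded_tabs(dt)) {
  warning("Character columns contain tabs; consider passing sep = \"\\t\" to fread().")
}
```

The same check could be run on the result of a plain `fread("~/Desktop/tmp1.txt")` call before deciding whether to re-read the file with an explicit separator.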
