FR for fread: if select is used, colClasses need only correspond to the columns in select #1426

Closed
MichaelChirico opened this issue Nov 10, 2015 · 4 comments · Fixed by #3547

MichaelChirico (Member) commented Nov 10, 2015

I never filed a FR for a question I raised a year ago on SO.

The current canon for using select and colClasses simultaneously is (IMO) unwieldy.

Consider:

#file to read
ffile <- paste0(paste(paste0("V", 1:20), collapse = ","),
              "\na,b,c,d,e,1,2,3,4,5,1.1,1.2,1.3,1.4,1.5,",
              "TRUE,FALSE,TRUE,FALSE,TRUE")

#columns to take
sel <- c(2, 10, 13:15, 20)
#types of all columns
tps <- rep(c("character", "integer",
           "numeric", "logical"),
         rep(5, 4))

Here's the best I could come up with as a programmatic way to use fread:

DT <- fread(ffile, select = paste0("V", sel),
            colClasses =
              sapply(unique(tps[sel]),
                     function(x) paste0("V", sel[which(tps[sel] == x)])))

(Gross. It could instead be spelled out explicitly as follows, but that is generally unsatisfying:)

DT <- fread(ffile, select = paste0("V", sel),
            colClasses =
              list(character = "V2", integer = "V10",
                   numeric = c("V13","V14","V15"),
                   logical = c("V20")))
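
As a rough sketch of a shorter workaround, the same named list can be built with base R's `split()`, which groups the selected column names by their type. The helper name `select_classes` and its `prefix` argument are made up for illustration; they are not part of data.table.

```r
# Inputs from the example above
sel <- c(2, 10, 13:15, 20)
tps <- rep(c("character", "integer", "numeric", "logical"), rep(5, 4))

# Hypothetical helper (not a data.table function): group the selected
# column names by type, giving a named list of the form
# list(<type> = <column names>) as used for colClasses above.
select_classes <- function(types, sel, prefix = "V") {
  split(paste0(prefix, sel), types[sel])
}

cc <- select_classes(tps, sel)
str(cc)
```

The result could then be passed as `colClasses = cc` alongside `select = paste0("V", sel)`. Unlike the `sapply()` version, `split()` always returns a list, even when every type happens to cover the same number of columns.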

To me it would make much more sense to be able to simply write:

DT <- fread(ffile, select = paste0("V", sel), colClasses = tps[sel])

But this currently produces the error:

Error in fread(ffile, select = paste0("V", sel), colClasses = tps[sel]) :
colClasses is unnamed and length 6 but there are 20 columns. See ?data.table for colClasses usage.

The proposed form is parsimonious and, as far as I can tell, unambiguous. Any reason why this wouldn't work?

MichaelChirico (Member, Author) commented Nov 10, 2015

What about just reversing the order of dealing with select vs. colClasses -- switch Lines #1141 - 1157 with Lines #1059 - 1112, and reset ncol to length(select) within the select condition?
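
To make the intended behaviour concrete, here is a minimal sketch of the length check being proposed, written as a standalone base R function. It is not fread's internal code: `resolve_classes` and its arguments are hypothetical, and it only covers an unnamed `colClasses` with a numeric `select`.

```r
# Hypothetical helper, not data.table's implementation: decide which class
# applies to each selected column when colClasses is an unnamed vector.
resolve_classes <- function(col_classes, select, ncol) {
  if (length(col_classes) == ncol) {
    # One class per column in the file: keep those of the selected columns
    col_classes[select]
  } else if (length(col_classes) == length(select)) {
    # Proposed behaviour: one class per selected column, in select order
    col_classes
  } else {
    stop("colClasses is unnamed and length ", length(col_classes),
         " but there are ", ncol, " columns (", length(select), " selected)")
  }
}

sel <- c(2, 10, 13:15, 20)
tps <- rep(c("character", "integer", "numeric", "logical"), rep(5, 4))

full  <- resolve_classes(tps, sel, ncol = 20)       # full-length input
short <- resolve_classes(tps[sel], sel, ncol = 20)  # select-length input
identical(full, short)
```

Both call styles resolve to the same six classes, which is the sense in which the shorter form looks unambiguous: any other length still raises an error.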

dselivanov commented Nov 20, 2015

+1 for this, also annoying me. Will try to check.

arunsrinivasan (Member) commented Nov 20, 2015

Quite a few issues marked for v1.9.8 already. Can't take a look anytime soon. Glad if you could look into this. Thanks.

arunsrinivasan added this to the v2.0.0 milestone Nov 26, 2015
st-pasha added enhancement and removed bug labels Jul 6, 2017

renkun-ken (Member) commented Nov 13, 2017

Really need this.

mattdowle removed this from the Candidate milestone May 10, 2018
mattdowle added this to the 1.12.4 milestone May 1, 2019