Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colClasses correspond to select #3547

Merged
merged 14 commits into from
May 10, 2019
Merged

colClasses correspond to select #3547

merged 14 commits into from
May 10, 2019

Conversation

mattdowle
Copy link
Member

@mattdowle mattdowle commented May 7, 2019

Closes #1426
Follow up to #2545

  • avoid needing new option (datatable.colClassesSelectOrder was attempted in this PR)
  • retain backwards compatibility with no need to migrate
  • avoid any ambiguities in syntax / intention
  • allow colClasses to be specified inside select directly in one place

@mattdowle mattdowle added this to the 1.12.4 milestone May 7, 2019
@codecov
Copy link

codecov bot commented May 7, 2019

Codecov Report

Merging #3547 into master will increase coverage by 0.1%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #3547     +/-   ##
=========================================
+ Coverage   97.32%   97.42%   +0.1%     
=========================================
  Files          66       66             
  Lines       12629    12676     +47     
=========================================
+ Hits        12291    12350     +59     
+ Misses        338      326     -12
Impacted Files Coverage Δ
src/freadR.c 100% <100%> (+3.45%) ⬆️
R/fread.R 100% <100%> (+0.39%) ⬆️
src/init.c 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6cfa03f...4d3b7f1. Read the comment docs.

@MichaelChirico
Copy link
Member

MichaelChirico commented May 8, 2019

neat idea!

@MichaelChirico
Copy link
Member

on closer read I'm a bit confused. can't we just allow select to be used exactly as colClasses?

just pass colClasses to select instead of colClasses and it will also subset to only the mentioned columns?

@mattdowle
Copy link
Member Author

mattdowle commented May 8, 2019

That is what it does. It's exactly as colClasses when colClasses is a vector. When select is used like this, colClasses can't be used at the same time. Makes sense?
When colClasses is list form, it's to set the type of columns in sets: it's not really a natural ordering of column selection; e.g. if select=colClasses=list(numeric=c(1,4,6), character=2:3) and the returned columns are the ones specified (in appearance order in the file, presumably) then how would a different order of those selected columns be specified? That can be achieved as the PR stands now using fread(..., select=c(1,6,2,3,4), colClasses=list(numeric=c(1,4,6), character=2:3).

@MichaelChirico
Copy link
Member

Am I misreading this error?

STOP("When select= is a list it must be two items: select=list(<which columns>, <their types>). But it has %d items.", LENGTH(selectArg));

@MichaelChirico
Copy link
Member

As colClasses, I'd write this:

fread(file, select=list(c("ID","userClass"), c("character","userClass")))

as

fread(file, select=list(character = 'ID', userClass = 'userClass'))

@mattdowle
Copy link
Member Author

mattdowle commented May 8, 2019

Why not :

fread(file, select=c(ID="character", someCol='userClass'))

I expanded my comment above a bit regarding the selection order.
It's "a named vector (like colClasses)" not "like colClasses". Better wording needed then?
Otherwise how to get the classes into select if we don't use list? If we use list then there's an ambiguity as to whether it's like the list form of colClasses, or whether it's the new list of two vectors (and ensuing edge-case-1 ambiguities). If we drop the new two-item list method, then how to achieve passing a vector of select and a corresponding vector of colClasses? Tried that but the ambiguity came when length(select)==length(colClasses)==ncol.

We can take our time to consider this one. I won't merge yet.

@MichaelChirico
Copy link
Member

Why not

for the same reason as colClasses, which accepts both c(column = 'type') and list(type = 'column') format

@mattdowle
Copy link
Member Author

mattdowle commented May 8, 2019

Or, instead of :

fread(file, select=list(c("ID","someCol"), c("character","integer64")))

how about :

fread(file, select=c("ID","someCol"), selectClasses=c("character","integer64"))

When selectClasses is used, select must be provided and colClasses should not be provided.
This would free up select=list(...) to work like the list form of colClasses and the order of columns returned would be the appearance order in the file, but select= could be used as well as colClasses to change the returned order (i.e. fread(..., select=c(1,6,2,3,4), colClasses=list(numeric=c(1,4,6), character=2:3)) would still work). In addition to select named vector still working without colClasses or selectClasses.

@mattdowle
Copy link
Member Author

@MichaelChirico Better?
(The uniqlist coverage seems spurious -- I'll look at that while you consider.)

@MichaelChirico
Copy link
Member

I think what I was missing is the part where select is also used to specify column order, which creates ambiguity in the use-as-colClasses case, is that right?

But if order matters, you're just consigned to using select-as-vector now?

(haven't gotten a chance to read carefully)

man/fread.Rd Outdated Show resolved Hide resolved
R/fread.R Outdated Show resolved Hide resolved
src/freadR.c Outdated Show resolved Hide resolved
@MichaelChirico
Copy link
Member

Added comments; front-end API looks good 👍

@mattdowle mattdowle merged commit fe2bfe7 into master May 10, 2019
@mattdowle mattdowle deleted the select_colClasses branch May 10, 2019 02:50
Copy link
Member

@MichaelChirico MichaelChirico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FR for fread: if select is used, colClasses need only correspond to the columns in select
2 participants