.SDcols should be able to understand colA:colB #748

Closed
arunsrinivasan opened this issue Jul 31, 2014 · 3 comments

@arunsrinivasan
Member

```r
require(data.table)
DT <- data.table(a=1,b=2,c=3,d=4,e=5)
```

So far, .SDcols can do:

```r
DT[, .SD, .SDcols=2:4]
# or
DT[, .SD, .SDcols=c("b", "c", "d")]
# or
DT[, .SD, .SDcols=-c("a", "e")]
# or
DT[, .SD, .SDcols=-c(1,5)]
```

It would also be useful to be able to do:

```r
DT[, .SD, .SDcols = b:d]
```

that is, to select a run of adjacent columns by name using the `:` operator.
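A minimal sketch of how such a range could be resolved, assuming the argument is captured unevaluated with `substitute()` and each endpoint is matched against the column names. The helper `sdcols_range` is hypothetical and not part of data.table; it accepts bare names, strings, or literal column numbers as endpoints:

```r
# Hypothetical helper (not data.table's API): capture `from:to` unevaluated
# and resolve it to a character vector of column names from `nms`.
sdcols_range <- function(range, nms) {
  expr <- substitute(range)  # e.g. quote(b:d); the range itself is never evaluated
  if (!is.call(expr) || !identical(expr[[1L]], as.name(":")) || length(expr) != 3L)
    stop("expected an expression of the form from:to")
  to_pos <- function(e) {
    if (is.numeric(e)) {
      # a literal column number, as in 2:d
      if (e < 1 || e > length(nms)) stop("column number ", e, " is out of range")
      return(as.integer(e))
    }
    # a bare name (b) or a string ("b")
    m <- match(as.character(e), nms)
    if (is.na(m)) stop("'", as.character(e), "' is not a column name")
    m
  }
  nms[to_pos(expr[[2L]]):to_pos(expr[[3L]])]
}

nms <- c("a", "b", "c", "d", "e")  # stands in for names(DT) in the example above
sdcols_range(b:d, nms)
# [1] "b" "c" "d"
```

The resulting character vector is a form `.SDcols` already understands, so a real implementation would only need to add the translation step.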

@arunsrinivasan
Member Author

Now that I think of it, `by` could do the same:

```r
DT[, .N, by=b:d]
```

which would be convenient when the `by` columns are adjacent.
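Until `by` understands ranges, the same selection can be built by hand, since `by` already accepts a character vector of column names. The sketch below is only a base-R stand-in so that it runs without data.table: the data frame `df` is made-up example data, and `aggregate()` plays the role of `.N` grouped by the resolved columns.

```r
# Made-up example data; columns b..d are adjacent.
df <- data.frame(
  a = 1:6,
  b = c(1, 1, 2, 2, 2, 3),
  c = "x",
  d = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE),
  e = 0
)

# Resolve the adjacent range b..d to a character vector of column names
nms <- names(df)
by_cols <- nms[match("b", nms):match("d", nms)]
by_cols
# [1] "b" "c" "d"

# Count rows per (b, c, d) group; with data.table the analogous call
# would be DT[, .N, by = by_cols]
counts <- aggregate(list(N = rep(1L, nrow(df))), by = df[by_cols], FUN = length)
counts
```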

@rsaporta
Contributor

rsaporta commented Aug 2, 2014

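Maybe something like this (I'm not sure where these functions are best placed: as standalone functions, or inside `[.data.table`):

- `check_for_seq` checks whether the input has the form `from:to`; if it does, it interprets it, otherwise it passes the input through unchanged.
- `convert_to_col_number_and_check_valid` does what its name implies: it converts a column reference to a column number and checks that it is valid.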

```r
check_for_seq <- function(input, x) {
  ## If `input` is of the form
  ##      from:to
  ## this function interprets `from` and `to` as references to columns of `x`.
  ## `from` and `to` can be column numbers or column names,
  ## and they do NOT need to be of the same type.
  ##
  ## If `input` is NOT of the above form, it is returned unchanged.

  arg <- substitute(input)

  ## if input is not of the expected form, return it unchanged
  if (!(is.call(arg) && identical(arg[[1L]], as.name(":")) && length(arg) == 3L))
    return(input)

  from <- convert_to_col_number_and_check_valid(arg[[2L]], x, showWarnings = TRUE)
  to   <- convert_to_col_number_and_check_valid(arg[[3L]], x, showWarnings = TRUE)

  return(names(x)[from:to])
}


convert_to_col_number_and_check_valid <- function(cn, x, showWarnings = TRUE) {
  ## Returns the column number of `x` corresponding to `cn`,
  ## performing several checks to ensure validity.
  ##
  ## cn : a column name or column number
  ##      if it is numeric, we simply check that it is within seq_along(x)
  ##      if it is a factor, it is coerced to character with an optional warning
  ##      otherwise, it is presumed to be character and any coercion is done by `==`
  ##                 in the line  which(cn == names(x))

  ## x must have valid names
  if (is.null(names(x)))
    stop("x must have valid names, but names(x) is NULL")

  ## a bare column name arrives as a symbol; convert it to a string
  if (is.name(cn))
    cn <- as.character(cn)

  ## factors are ambiguous (level codes vs. labels), so use their character value
  if (is.factor(cn)) {
    if (isTRUE(showWarnings))
      warning("input is a factor and will be coerced to character")
    cn <- as.character(cn)
  }

  ## match to column index number
  if (is.numeric(cn))
    col <- cn
  else
    col <- which(cn == names(x))

  ## ERROR CHECK: confirm there is one and exactly one match
  ## --------------- ##
  extra_error_msg <- "\n\nNOTE: using 'b':'d' is a new feature.\nIf you feel this error should not have occurred, please report it."

  if (!length(col))              ## no match
    stop("'", cn, "' is not a name of a column of the data.table.", extra_error_msg)
  if (length(col) > 1L)          ## more than one match
    stop("'", cn, "' matches more than one (", length(col), ") columns of the data.table.\nConsider using make.names()", extra_error_msg)
  if (!(col %in% seq_along(x)))  ## outside the range of x
    stop("'", cn, "' is beyond the range of seq_along(x), which is 1:", length(x), ".", extra_error_msg)
  ## --------------- ##

  return(col)
}
```

@arunsrinivasan
Member Author

Nice that you've taken this up; I just wrote it down so that I wouldn't forget. 733cef8 isn't necessary IMHO. There are already several cases we test for `.SDcols` in `[.data.table`; it just needs one more case for `A:B`.
I don't think it's necessary to test other cases here, such as `1:B`, `A:5` or `'A':'B'`. Just `A:B`, and maybe `-(A:B)`.
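For the `-(A:B)` case, the resolved range would presumably just be dropped from the full set of column names. A minimal base-R sketch, assuming the two endpoints are already available as strings; `exclude_col_range` is a hypothetical name, not an existing data.table function:

```r
# Hypothetical helper: the columns selected by -(from:to), i.e. every column
# except the adjacent run from `from` to `to` (inclusive).
exclude_col_range <- function(nms, from, to) {
  i <- match(from, nms)
  j <- match(to, nms)
  if (anyNA(c(i, j))) stop("both endpoints must be column names")
  nms[-(i:j)]
}

exclude_col_range(c("a", "b", "c", "d", "e"), "b", "d")
# [1] "a" "e"
```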

arunsrinivasan self-assigned this on Mar 5, 2015
arunsrinivasan added this to the v1.9.6 milestone on Mar 5, 2015
arunsrinivasan added a commit that referenced this issue Jul 11, 2015
@rick, feel free to include/roll back in case you have other use cases I'm not aware of.