Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add setdroplevels() to replace droplevels(in.place=TRUE) #6014

Closed
MichaelChirico opened this issue Mar 19, 2024 · 13 comments · Fixed by #6316
Closed

Add setdroplevels() to replace droplevels(in.place=TRUE) #6014

MichaelChirico opened this issue Mar 19, 2024 · 13 comments · Fixed by #6316
Labels
breaking-change issues whose solution would require breaking existing behavior consistency feature request
Milestone

Comments

@MichaelChirico
Copy link
Member

          Let's add setdroplevels then. We could have dot prefixed non exported function that had copy art and then exported ones that goes via this .droplevels

Originally posted by @jangorecki in #5907 (comment)

We'll need to go through a deprecation cycle for droplevels(in.place=TRUE), but we should return to the principle that only set* functions and := can do in-place modification.

Luckily, I don't see any usage of this on GitHub, so we don't need to worry too much about breaking downstreams:

https://github.com/search?q=lang%3AR+%2Fdroplevels%5B%28%5D%2F+%2Fin%5B.%5Dplace%5Cs*%3D%2F+-path%3Afdroplevels.R&type=code

Therefore I recommend we start the deprecation cycle in 1.16.0.

@MichaelChirico MichaelChirico added this to the 1.16.0 milestone Mar 19, 2024
@Nj221102
Copy link
Contributor

Nj221102 commented Apr 9, 2024

Hi i was going through this issue and want to work on it, and i noticed that we only have to replace droplevels(in.place=TRUE) instead of whole droplevels(), so i think we will the retain the current droplevels as a way to do droplevels(in.place= FALSE), i m thinking of implementing something like this :-

  1. Implement logic in current droplevels to inform people to use setdroplevels for inplace modification
droplevels.data.table = function(x, except = NULL, exclude, in.place = FALSE, ...){
   stopifnot(is.logical(in.place))
   if(in.place) error ( please use setdroplevels for droplevel(in.place= TRUE), data.table only allows inplace modification usin g the set functions. )

and setup the setdroplevels for only doing the inplace modifications.

alternatively we can also try

  1. just changing the name of droplevels to setdroplevels and update the documentation accordingly , as no one seems to be using it right now.

@jangorecki
Copy link
Member

Not error but warning, at most

@Nj221102
Copy link
Contributor

Nj221102 commented Apr 9, 2024

Not error but warning, at most

so droplevels(in.place = TRUE) should still work but throw a warning suggesting to use the set function
am i getting this right?

@jangorecki
Copy link
Member

Or, first a message, then in next release cycle a warning.

@sluga
Copy link
Contributor

sluga commented Apr 9, 2024

I like the principle (limiting in-place operations to := and set (well, and let)), but setdroplevels sounds very off to me with its two verbs, set & drop.

Having the function be called setlevels and then having an argument keep='used-only' or something like that sounds better to me. Admittedly it's a bit weird if that's the only functionality offered, but I do think there's potential for the function to expand in the future, i.e. do other operations targeting levels. And in that case the API is going to be much nicer and consistent with other set* functions (e.g. setkey handles both adding & removing a key, vs. having a separate setdropkey function).

@Nj221102
Copy link
Contributor

Nj221102 commented Apr 9, 2024

@jangorecki @MichaelChirico Hi, i was implementing the setdroplevels() similar to how droplevels() works just without the in.place arguement and the logic around it .


droplevels.data.table = function(x, except = NULL, exclude, in.place = FALSE, ...){
    stopifnot(is.logical(in.place))
    if(in.place) messagef("Please note: The function droplevels() with 'in.place = TRUE' is being deprecated. Instead, consider using setdroplevels() for in-place modification. Data.table only allows in-place modification using the set functions.")
    if (nrow(x)==0L) return(x)
    ix = vapply(x, is.factor, NA)
    if(!is.null(except)){
        stopifnot(is.numeric(except), except <= length(x))
        ix[except] = FALSE
    }
    if(!sum(ix)) return(x)
    if(!in.place) x = copy(x)
    for(nx in names(ix)[ix==TRUE]){
        if (missing(exclude)) set(x, i = NULL, j = nx, value = fdroplevels(x[[nx]]))
        else set(x, i = NULL, j = nx, value = fdroplevels(x[[nx]], exclude = exclude))
    }
    return(x)
}

setdroplevels = function(x, except = NULL, exclude, ...){
    if (nrow(x)==0L) return(x)
    ix = vapply(x, is.factor, NA)
    if(!is.null(except)){
        stopifnot(is.numeric(except), except <= length(x))
        ix[except] = FALSE
    }
    if(!sum(ix)) return(x)
    for(nx in names(ix)[ix==TRUE]){
        if (missing(exclude)) set(x, i = NULL, j = nx, value = fdroplevels(x[[nx]]))
        else set(x, i = NULL, j = nx, value = fdroplevels(x[[nx]], exclude = exclude))
    }
    return(x)
}

but despite being identical to each other, while doing some operation like

setdroplevels(factor())
droplevels(factor())

setdroplevels gives the error : argument is of length zero and droplevels don't

> setdroplevels(factor())
Error in if (nrow(x) == 0L) return(x) : argument is of length zero
> droplevels(factor())
factor()
Levels: 

I am confused about , why is this happening?, and tried searching online but wasn't successful in finding anything useful there either.

@sluga
Copy link
Contributor

sluga commented Apr 9, 2024

@Nj221102 The droplevels call isn't using droplevels.data.table().

@MichaelChirico
Copy link
Member Author

MichaelChirico commented Apr 9, 2024

I like the principle (limiting in-place operations to := and set (well, and let)), but setdroplevels sounds very off to me with its two verbs, set & drop.

Having the function be called setlevels and then having an argument keep='used-only' or something like that sounds better to me. Admittedly it's a bit weird if that's the only functionality offered, but I do think there's potential for the function to expand in the future, i.e. do other operations targeting levels. And in that case the API is going to be much nicer and consistent with other set* functions (e.g. setkey handles both adding & removing a key, vs. having a separate setdropkey function).

See also #2219. I would be a fan of setlevels(x, levels, drop), that way we can do:

setlevels(x, levels)       # provide a vector of levels
setlevels(x, drop=TRUE)    # equivalent to droplevels(x, in.place=TRUE)
setlevels(x, levels, drop) # error, I think?

Actually, to get closer to droplevels.data.table() / setnafill(), I guess we'd want setlevels(x, levels, drop, cols=seq_along(x)).

@ben-schwen
Copy link
Member

ben-schwen commented Apr 10, 2024

Wouldn't it be cleaner/more equivalent to common R setting to just use NULL as argument value for dropping?

setlevels(x, levels)
setlevels(x, levels=NULL) # to droplevels

@Nj221102
Copy link
Contributor

@Nj221102 The droplevels call isn't using droplevels.data.table().

I m bit confused, like if droplevels isn't using droplevels.data.table(), then what is it using ?, fdroplevels?

@ben-schwen
Copy link
Member

ben-schwen commented Apr 10, 2024

@Nj221102 The droplevels call isn't using droplevels.data.table().

I m bit confused, like if droplevels isn't using droplevels.data.table(), then what is it using ?, fdroplevels?

You should check out how method dispatch is working in R especially S3 classes. You are calling droplevels on a vector and hence it is dispatched to the default method

@MichaelChirico
Copy link
Member Author

Wouldn't it be cleaner/more equivalent to common R setting to just use NULL as argument value for dropping?

setlevels(x, levels)
setlevels(x, levels=NULL) # to droplevels

I don't like this, I think it's trying to be too cute & overload the signature. Adding a new argument will make for more readable code. setlevels(x, levels=NULL) does not have a clear interpretation (strip all levels? convert to character?).

Actually, to get closer to droplevels.data.table() / setnafill(), I guess we'd want setlevels(x, levels, drop, cols=seq_along(x)).

Re-read ?droplevels and noticed droplevels.data.frame has exclude= instead of cols=. I'm not sure which we should include... cols= matches our other functions better. The argument for exclude= is consistency with base data.frame methods. I guess the question is what the typical use case is for keeping levels of some-but-not-all factor columns? There are only ~14 usages of droplevels(exclude=) on CRAN (besides those supplying exclude=NA or exclude=NULL), so it's not like this is a particularly frequent request anyway.

@MichaelChirico MichaelChirico added feature request consistency breaking-change issues whose solution would require breaking existing behavior labels Jul 15, 2024
@MichaelChirico
Copy link
Member Author

setdroplevels sounds very off to me with its two verbs, set & drop.

Honestly I'm not that worried about this. We already have setalloccol(). It's two parts set (by-reference) droplevel (known function from base). I think it should be easy enough to parse for data.table users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change issues whose solution would require breaking existing behavior consistency feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants