Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implicit type promotion in fcoalesce and fifelse #4101

Open
BenoitLondon opened this issue Dec 11, 2019 · 17 comments
Open

implicit type promotion in fcoalesce and fifelse #4101

BenoitLondon opened this issue Dec 11, 2019 · 17 comments

Comments

@BenoitLondon
Copy link

BenoitLondon commented Dec 11, 2019

Not sure I understand how the fcoalesce function is meant to be used but I found it not very user-friendly.

e.g I have a data.table coming from some computation or database and if one column is full of NA it will be of type Logical and then cant be used directly inside fcoalesce.

> fcoalesce(NA, 1L, 1.0)
Error in fcoalesce(NA, 1L, 1) : 
  Item 2 is type integer but the first item is type logical. Please coerce before coalescing.
> fifelse(TRUE, NA, 1.0)
Error in fifelse(TRUE, NA, 1) : 
  'yes' is of type logical but 'no' is of type double. Please make sure that both arguments have the same type.
@BenoitLondon BenoitLondon changed the title [FEATURE REQUEST] implicit type promotion in fcoalesce [FEATURE REQUEST] implicit type promotion in fcoalesce and fifelse Dec 12, 2019
@jangorecki jangorecki changed the title [FEATURE REQUEST] implicit type promotion in fcoalesce and fifelse implicit type promotion in fcoalesce and fifelse Dec 13, 2019
@2005m
Copy link
Contributor

2005m commented Dec 13, 2019

I think it is easy to do for fifelse. The question is do we want to do it?

@jaapwalhout
Copy link

jaapwalhout commented Dec 14, 2019

@BenoitLondon I think these error messages could be expected as NA is by default of class logical in R (see also ?NA):

> class(NA)
[1] "logical"

Luckily you can force missing value to be of a certain type, by using NA_integer_, NA_real_, NA_complex_ or NA_character_. Applying this knowledge to your fifelse-example, the following works:

> fifelse(TRUE, NA_real_, 1.0)
[1] NA
> fifelse(TRUE, NA_integer_, 1.0)
[1] NA

@jangorecki
Copy link
Member

jangorecki commented Dec 15, 2019

I think that fixing data from the upstream provider is the most appropriate step, it doesn't have to be anything sophisticated.
Adding extra line to ensure class is expected one

dt = as.data.table(dbGetQuery("select * from users"))
dt[, `:=`(id=as.integer(id), name=as.character(name), rate=as.numeric(rate))]
myifelse(dt, ...)

Or a sketch of handling that more generally by a wrapper function having colClasses.

dbGetQuery2 = function(..., colClasses=NULL) {
  ans = as.data.table(dbGetQuery(...))
  if (is.null(colClasses)) return(ans)
  for (col in names(colClasses)) set(ans, i=NULL, j=col, value=do.call(paste0("as.", colClasses[[col]]), list(ans[[col]])))
  ans
}
dt = dbGetQuery2("select * from users", colClasses=c(id="integer", name="character", rate="numeric"))
myifelse(dt, ...)

This way any logical NA will go through as.[class] function.


This is really a bug in data delivery layer. The fact that it does not preserve schema information, but build schema based on data it reads. It is fine for a csv or other schema-less format, but then there is csvy which meant to address that gap by carrying schema in csv header. I would close it for now, we can still observe it will be coming back often or not.

@BenoitLondon
Copy link
Author

Ok I may have badly expressed myself... I m aware of these solutions, I m just saying they are not user friendly. Dplyr is implementing exactly this and base ifelse always behaved this way. I don't understand why you go the other way... If really you want to keep exact type checking maybe we could have type promoting versions ? tfcoalesce and tfifelse ?

@jaapwalhout
Copy link

@BenoitLondon I see that ifelse gives indeed another result:

> ifelse(TRUE, NA, 1.0)
[1] NA

It might therefore be worth considering to keep consistency with base R's ifelse (imho).

@2005m
Copy link
Contributor

2005m commented Dec 15, 2019

ifelse is slow plus it does not preserve the attributes. I am not really in favour of duplicating code and creating a new function however I can try to please everybody here (not sure it is a good thing though). I can suggest that we add an additional argument to fifelse to allow type promotion and in that case it will behave a bit more like base ifelse. The default flag will still be not to allow type promotion. What do you think?

@BenoitLondon
Copy link
Author

Was thinking about that, but again base ifelse behaves differently and dplyr is implementing type promotion by default AFAIK.
The user friendly way is to say "just replace your old ifelse by fifelse" you ll see no difference apart a massive speedup... This is not possible without type.promotion and an argument is quite cumbersome, always need to remember it and if you forget your code breaks...
I think another set of function with default type.promotion is the best solution, especially same will happen probably for case_when

@jangorecki jangorecki reopened this Dec 15, 2019
@philippechataignon
Copy link
Contributor

fifelse is equivalent to dplyr::if_else : both doesn't do type promotion and that's why they are predictable and fast.

> dplyr::if_else(T, NA, 1.0)
Erreur : `false` must be a logical vector, not a double vector
Run `rlang::last_error()` to see where the error occurred.

@BenoitLondon
Copy link
Author

Yeah for dplyr I was thinking of coalesce
tidyverse/dplyr#2254

@MichaelChirico
Copy link
Member

OTOH we allow DT[ , integer_col := 1]. In a similar vein I think it's natural to type bump fifelse(FALSEx10, integerx10, 1). Would have to double check the set/:= code (recent NEWS item) to see what the extent of that flexibility is (perhaps just limited to numeric-literal -> integer-literal)

@jangorecki
Copy link
Member

jangorecki commented Dec 16, 2019

It make sense to have a dedicated function that can be re-used in many places for such purpose. As suggested in #3913, so I would say that we should not try to implement the request separately, but first finish ##3913 and then re-use it here.

@MichaelChirico
Copy link
Member

agreed, especially since that will force consistency on how this happens

@HughParsonage
Copy link
Member

I think we should resist this feature request.

The user friendly way is to say "just replace your old ifelse by fifelse" you ll see no difference apart a massive speedup... This is not possible without type.promotion and an argument is quite cumbersome, always need to remember it and if you forget your code breaks...

Well it was never advertised as such -- indeed one of its primary features is that it doesn't do type promotion. From NEWS

New function fifelse(test, yes, no, na) has been implemented in C by Morgan Jacob ... unlike base::ifelse the output type is consistent with those of yes and no

A similar rationale was presented when hadley introduced dplyr::if_else.


You offer some cases where type promotion seems a sensible choice, but I don't why data.table should change, potentially at the cost of consistency, when it is simple to create your own function, tailored to your expectations. Type promotion is always idiosyncratic.

@BenoitLondon
Copy link
Author

Well I really love data.table and wish I could use only this package for data manipulation but it was missing three functions for me coalesce ifelse and case_when which are very handy but the handiness is lost if they behave differently from base/dplyr alternatives. As you say, I might be able to implement my versions but it s preferable to be in data.table. it is also a good case for data.table to be more accessible to new users and get more attention.
Another set of functions with type promotion enabled would please everybody?

@2005m
Copy link
Contributor

2005m commented May 9, 2020

@BenoitLondon , you might want to check kit::iif, my orginal implementation of fifelse but it also allows type promotion.

jangorecki added a commit that referenced this issue May 24, 2020
@jangorecki
Copy link
Member

Change is pretty straightforward utilizing memrecycle C function.

str(fifelse(TRUE, NA, 1.0))
# num NA
str(fifelse(FALSE, NA, 1.0))
# num 1

Docs mentioned by @HughParsonage still holds

unlike base::ifelse the output type is consistent with those of yes and no

The question is should we also introduce class promotion? for example promotion to i64 works fine, unless one of the objects is REAL, because then type match but class don't.

i64 = bit64::as.integer64
str(fifelse(TRUE, NA, i64(1)))
#integer64 NA 
str(fifelse(FALSE, NA, i64(1)))
#integer64 1 

str(fifelse(TRUE, 1.5, i64(1)))
#Error in fifelse(TRUE, 1.5, i64(1)) : 
#  'yes' has different class than 'no'. Please make sure that both arguments have the same class.
str(fifelse(FALSE, 1.5, i64(1)))
#Error in fifelse(FALSE, 1.5, i64(1)) : 
#  'yes' has different class than 'no'. Please make sure that both arguments have the same class.

@jangorecki
Copy link
Member

jangorecki commented May 26, 2020

The tricky thing about class promotion, as opposed to type promotion is to handle unrelated classes, like how to promote IDate and factor.
For type promotion we have an integer assigned to a type, and the higher value is the one that we need to align to. For class we clearly cannot do that, and I think we have to provide each supported class pair coercion explicitly in a dictionary, which won't be very robust.
But do we want to promote classes in the first place?

jangorecki added a commit that referenced this issue May 30, 2020
jangorecki added a commit that referenced this issue May 30, 2020
This reverts commit cc2cc0e.
@jangorecki jangorecki removed their assignment May 31, 2020
MichaelChirico added a commit that referenced this issue Mar 4, 2024
* nafill rework for more robust coerce

* initial change for #4101

* nafill simple fill coerce

* nafill const works for locf and nocb as well, closes #3594

* fix tests for #4101

* coverage

* placeholder for nafill #3992

* use coerceAs in froll

* enable disabled test

* nafill retain names, tests for fill list

* coerceAs gets copyArg so now can return its input

* better control of verbose, and better find class for coerceAs

* proper verbose to int

* verbose changes coverage

* rm unused anymore

* more precise verbose arg check

* memrecycle escape warnings and simple verbose for numcol==0

* coerceAs does not emit extra warning anymore

* coerceAs verbose, more tests

* use older nanotime api

* update error msg

* coverage

* codecov of coerceAs

* catch all verbose

* Revert "initial change for #4101"

This reverts commit 1fa2069.

* Revert "fix tests for #4101"

This reverts commit cc2cc0e.

* use coerceAs in fcast.c

* restore actual fix

* ws

* incomplete merge

* vestigial bad merge

* minimize diff

* coerceAs throws warning for string->double, and errors on list

* comment

* example without warning

* bad NEWS merge

* warning is still needed

---------

Co-authored-by: jangorecki <j.gorecki@wit.edu.pl>
Co-authored-by: Michael Chirico <michael.chirico@grabtaxi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants