Calling log() on a column that contains 0 gives different results in dplyr and in Acero, e.g.
library(arrow)
library(dplyr)
df <- tibble(x = 0:10)
# In dplyr/base R
df %>%
  mutate(y = log(x)) %>%
  collect()
#> # A tibble: 11 × 2
#>        x        y
#>    <int>    <dbl>
#>  1     0 -Inf
#>  2     1    0
#>  3     2    0.693
#>  4     3    1.10
#>  5     4    1.39
#>  6     5    1.61
#>  7     6    1.79
#>  8     7    1.95
#>  9     8    2.08
#> 10     9    2.20
#> 11    10    2.30
# In Acero
df %>%
  arrow_table() %>%
  mutate(y = log(x)) %>%
  collect()
#> Error in `collect()`:
#> ! Invalid: logarithm of zero
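This is because R defines log(0) as -Inf, whereas the Acero kernel that log() is currently translated to, ln_checked, raises an error on zero input. Should base::log() map to Acero's unchecked ln kernel instead of ln_checked, so that the result matches base R?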
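The difference between the two kernels can be checked directly, without going through the dplyr translation. The following is a minimal sketch, assuming the arrow R package is installed and that `call_function()` exposes both the `ln` and `ln_checked` compute functions; the variable names `x`, `unchecked`, and `checked` are illustrative only.

```r
library(arrow)

# A small Arrow array that includes zero
x <- Array$create(c(0, 1, 10))

# Unchecked kernel: expected to return -Inf for log(0), like base R
unchecked <- as.vector(call_function("ln", x))
unchecked

# Checked kernel: expected to raise an error on zero input;
# capture the error message instead of stopping
checked <- tryCatch(
  as.vector(call_function("ln_checked", x)),
  error = function(e) conditionMessage(e)
)
checked
```

If `ln` behaves as described, mapping log() to it would make the Acero result agree with base R for zero input, at the cost of no longer surfacing an error for non-positive values.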
Reporter: Nicola Crane / @thisisnic
Note: This issue was originally created as ARROW-17490. Please see the migration documentation for further details.