[R] Differing results in log bindings #32751

@asfimport

We get different results from dplyr and Acero when we call log() on a column that contains 0, e.g.:

library(arrow)
library(dplyr)

df <- tibble(x = 0:10)

# In dplyr/base R
df %>%
  mutate(y = log(x)) %>%
  collect()
#> # A tibble: 11 × 2
#>        x        y
#>    <int>    <dbl>
#>  1     0 -Inf    
#>  2     1    0    
#>  3     2    0.693
#>  4     3    1.10 
#>  5     4    1.39 
#>  6     5    1.61 
#>  7     6    1.79 
#>  8     7    1.95 
#>  9     8    2.08 
#> 10     9    2.20 
#> 11    10    2.30

# In Acero
df %>%
  arrow_table() %>%
  mutate(y = log(x)) %>%
  collect()
#> Error in `collect()`:
#> ! Invalid: logarithm of zero

This is because R defines log(0) as -Inf, whereas Acero's ln_checked kernel, which the log() binding appears to be mapped to, raises an error on zero. Should we map base::log() to Acero's ln instead of ln_checked?
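
To make the difference between the two kernels concrete, here is a minimal sketch that calls them directly with arrow's call_function(), bypassing the dplyr bindings. It assumes the compute functions are registered under the names "ln" and "ln_checked", as the question above implies, and it uses a double array so that no implicit integer cast is involved. This only illustrates the kernel behaviour; it is not the fix itself.

```r
library(arrow)

# Doubles are used so that no implicit integer-to-float cast is involved.
x <- Array$create(c(0, 1, 10))

# Unchecked kernel: expected to return -Inf for the zero input, like base R.
unchecked <- as.vector(call_function("ln", x))

# Checked kernel: expected to raise an error on the zero input.
# The error message is captured as a string instead of aborting the script.
checked <- tryCatch(
  as.vector(call_function("ln_checked", x)),
  error = function(e) conditionMessage(e)
)

print(unchecked)
print(checked)
```

If the unchecked kernel behaves as assumed, its first element matches base R's log(0), which would support mapping base::log() to ln.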

Reporter: Nicola Crane / @thisisnic

Note: This issue was originally created as ARROW-17490. Please see the migration documentation for further details.
