Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-15701 [R] month() should allow integer inputs #12482

Closed
wants to merge 17 commits into from

Conversation

dragosmg
Copy link
Contributor

@dragosmg dragosmg commented Feb 21, 2022

Once this PR is merge the {arrow} binding for month() will be able to accept integer inputs. This also impact the semester() which now will be able to extract semesters from integer inputs.

library(lubridate)
library(dplyr)
library(arrow)

df <- tibble(
  month_as_int = 1:12
)

df %>% 
  mutate(m = month(month_as_int),
         s = semester(month_as_int))
#> # A tibble: 12 × 3
#>    month_as_int     m     s
#>           <int> <int> <int>
#>  1            1     1     1
#>  2            2     2     1
#>  3            3     3     1
#>  4            4     4     1
#>  5            5     5     1
#>  6            6     6     1
#>  7            7     7     2
#>  8            8     8     2
#>  9            9     9     2
#> 10           10    10     2
#> 11           11    11     2
#> 12           12    12     2

df %>% 
  arrow_table() %>% 
  mutate(m = month(month_as_int),
         s = semester(month_as_int)) %>% 
  collect()
#> # A tibble: 12 × 3
#>    month_as_int     m     s
#>           <int> <int> <int>
#>  1            1     1     1
#>  2            2     2     1
#>  3            3     3     1
#>  4            4     4     1
#>  5            5     5     1
#>  6            6     6     1
#>  7            7     7     2
#>  8            8     8     2
#>  9            9     9     2
#> 10           10    10     2
#> 11           11    11     2
#> 12           12    12     2

Created on 2022-03-02 by the reprex package (v2.0.1)

@github-actions
Copy link

@dragosmg dragosmg marked this pull request as ready for review March 2, 2022 09:23
call_binding("between", x, 1, 12),
x,
NA_integer_)
x <- build_expr("cast", x * 28L, options = cast_options(to_type = date32()))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cheeky implementation, but it works. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could actually short-circuit here and return the expression with simply if_else when label = FALSE, since that will return an integer itself, yeah?

I agree that this is a little bit unorthodox, but I think it works. An alternative would be to put this value into a date string with a call to the paste binding and then doing strptime on that. Both aren't super ideal — but I suspect that this integer multiplication is going to be more performant (though if we are basing a decision on that, we probably should measure it!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done any benchmarking yet.

@dragosmg dragosmg force-pushed the lubridate_month_int_input branch 4 times, most recently from 3273d94 to ce79b61 Compare March 4, 2022 13:06
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good — a few comments about the structure of the function body.

Also, there's at least one call to Expression$create() that probably could be build_expr(), what do you think?

call_binding("between", x, 1, 12),
x,
NA_integer_)
x <- build_expr("cast", x * 28L, options = cast_options(to_type = date32()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could actually short-circuit here and return the expression with simply if_else when label = FALSE, since that will return an integer itself, yeah?

I agree that this is a little bit unorthodox, but I think it works. An alternative would be to put this value into a date string with a call to the paste binding and then doing strptime on that. Both aren't super ideal — but I suspect that this integer multiplication is going to be more performant (though if we are basing a decision on that, we probably should measure it!).

r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
@dragosmg dragosmg requested a review from jonkeane March 7, 2022 14:23
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two very minor, more stylistic nit-picky comments that you can take or leave

r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
@jonkeane jonkeane closed this in 1b77e6d Mar 9, 2022
@ursabot
Copy link

ursabot commented Mar 9, 2022

Benchmark runs are scheduled for baseline = a76794c and contender = 1b77e6d. 1b77e6d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.25% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.81% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@dragosmg dragosmg deleted the lubridate_month_int_input branch April 26, 2022 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants