Merge pull request #67 from TidierOrg/summarize-autovec
Modify `@summarize()` to enable auto-vectorization.
Karandeep Singh committed Dec 5, 2023
2 parents adb12bc + 2d78972 commit 6a550d5
Showing 4 changed files with 23 additions and 6 deletions.
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,5 +1,9 @@
# TidierData.jl updates

## v0.13.5 - 2023-12-05
- `@summarize()` and `@summarise()` now perform auto-vectorization in the same way as `@mutate()`, meaning that the top-level macros are now all consistent in their treatment of auto-vectorization.
- Update documentation to describe new auto-vectorization behavior and give an example of how to modify the `TidierData.not_vectorized[]` array.

## v0.13.4 - 2023-11-28
- Macros used inside of verbs like `@mutate()` are now escaped, making it possible to work with Unitful units (e.g. `u"psi"`)

2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "TidierData"
uuid = "fe2206b3-d496-4ee9-a338-6a095c4ece80"
authors = ["Karandeep Singh"]
version = "0.13.4"
version = "0.13.5"

[deps]
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
21 changes: 17 additions & 4 deletions docs/examples/UserGuide/autovec.jl
@@ -1,9 +1,12 @@
# In general, TidierData.jl uses a lookup table to decide which functions *not* to vectorize. For example, `mean()` is listed as a function that should never be vectorized. Also, any function used inside of `@summarize()` is also never automatically vectorized. Any function that is not included in this list *and* is used in a context other than `@summarize()` is automatically vectorized.
# TidierData.jl uses a lookup table to decide which functions *not* to vectorize. For example, `mean()` is listed as a function that should never be vectorized. Also, any function used inside of `across()` is also not automatically vectorized. Any function that is not included in this list *and* is used in a context other than `across()` is automatically vectorized.

# This "auto-vectorization" makes working with TidierData.jl more R-like and convenient. However, if you ever define your own function and try to use it, TidierData.jl may unintentionally vectorize it for you. To prevent auto-vectorization, you can prefix your function with a `~`.
# Which functions are not vectorized? The set of non-vectorized functions is contained in the array `TidierData.not_vectorized[]`. Let's take a look at this array. We will wrap it in a `string()` to make the output easier to read.

using TidierData
using RDatasets

string(TidierData.not_vectorized[])

# This "auto-vectorization" makes working with TidierData.jl more R-like and convenient. However, if you ever define your own function and try to use it, TidierData.jl may unintentionally vectorize it for you. To prevent auto-vectorization, you can prefix your function with a `~`.

df = DataFrame(a = repeat('a':'e', inner = 2), b = [1,1,1,2,2,2,3,3,3,4], c = 11:20)

@@ -23,6 +26,16 @@ end
    @mutate(d = c - ~new_mean(c))
end

# Or you can modify the do-not-vectorize list like this:

push!(TidierData.not_vectorized[], :new_mean)

# Now `new_mean()` should behave just like `mean()` in that it is treated as non-vectorized.

@chain df begin
    @mutate(d = c - new_mean(c))
end

# This gives us the correct answer. Notice that adding a `~` is not needed with `mean()` because `mean()` is already included on our look-up table of functions not requiring vectorization.

@chain df begin
@@ -41,4 +54,4 @@ end
    @mutate(d = c - mean.(c))
end

# Note: `~` also works with operators, so if you want to *not* vectorize an operator, you can prefix it with `~`, for example, `a ~* b` will perform a matrix multiplication rather than element-wise multiplication. Remember that this is only needed outside of `@summarize()` because `@summarize()` never performs auto-vectorization.
# Note: `~` also works with operators, so if you want to *not* vectorize an operator, you can prefix it with `~`, for example, `a ~* b` will perform a matrix multiplication rather than element-wise multiplication.
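
The difference between the non-vectorized and vectorized forms discussed in this guide can be reproduced in plain Julia, without TidierData.jl. The sketch below assumes only the `Statistics` standard library; the vector `c` mirrors the `c = 11:20` column of the example data frame, and the names `centered` and `zeroed` are illustrative.

```julia
using Statistics

c = collect(11:20)

# Non-vectorized call: `mean(c)` reduces the whole vector to a single number,
# which is then broadcast across the subtraction.
centered = c .- mean(c)

# Vectorized call: `mean.(c)` takes the mean of each element on its own,
# so every element ends up subtracted from itself.
zeroed = c .- mean.(c)
```

This is why a reducing function such as `mean()` needs to sit on the do-not-vectorize list: the vectorized form still runs without error, but it yields a column of zeros instead of centered values.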
2 changes: 1 addition & 1 deletion src/TidierData.jl
@@ -303,7 +303,7 @@ macro summarize(df, exprs...)
any_found_n = any([i[2] for i in interpolated_exprs])
any_found_row_number = any([i[3] for i in interpolated_exprs])

tidy_exprs = parse_tidy.(tidy_exprs; autovec=false)
tidy_exprs = parse_tidy.(tidy_exprs; autovec=true) # use auto-vectorization inside `@summarize()`
df_expr = quote
if $any_found_n || $any_found_row_number
if $(esc(df)) isa GroupedDataFrame
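
To make the idea of a lookup-driven rewrite concrete, here is a toy sketch of what "vectorize every call unless the function is on a do-not-vectorize list" could look like. It is not TidierData.jl's `parse_tidy` implementation; `NOT_VECTORIZED` and `toy_autovec` are hypothetical names invented for this illustration, and the sketch ignores details such as `~` prefixes and interpolation.

```julia
# Hypothetical stand-in for a do-not-vectorize list (not TidierData's internals).
const NOT_VECTORIZED = Set([:mean, :sum])

# Recursively rewrite `f(x)` into `f.(x)` unless `f` is in NOT_VECTORIZED.
function toy_autovec(ex)
    ex isa Expr || return ex                 # symbols and literals pass through
    args = map(toy_autovec, ex.args)         # rewrite nested calls first
    if ex.head == :call && args[1] isa Symbol && !(args[1] in NOT_VECTORIZED)
        # `f.(a, b)` is represented as Expr(:., :f, Expr(:tuple, :a, :b))
        return Expr(:., args[1], Expr(:tuple, args[2:end]...))
    end
    return Expr(ex.head, args...)
end

# `sqrt` gets a broadcasting dot, while `mean` is left as an ordinary call.
rewritten = toy_autovec(:(sqrt(mean(c))))

# Adding a user-defined function to the list exempts it from rewriting,
# analogous to `push!(TidierData.not_vectorized[], :new_mean)`.
push!(NOT_VECTORIZED, :new_mean)
untouched = toy_autovec(:(new_mean(c)))
```

Under this model, switching a macro from `autovec=false` to `autovec=true` amounts to running such a rewrite on its expressions, with reducers like `mean()` still protected by the list.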

2 comments on commit 6a550d5

@kdpsingh

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/96494

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.13.5 -m "<description of version>" 6a550d5163cb36d7f6f92adfad6a6b96e39cbf9e
git push origin v0.13.5
