Commit

Merge branch 'main' of https://github.com/TidierOrg/TidierData.jl into pr/cnrrobertson/104
kdpsingh committed Jun 8, 2024
2 parents 8b1c05e + c9bc480 commit 0b346e5
Showing 7 changed files with 29 additions and 12 deletions.
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,11 @@
# TidierData.jl updates

+## v0.16.0 - 2024-06-07
+- `unique()`, `mad()`, and `iqr()` are no longer auto-vectorized
+- Bugfix: `@ungroup()` now preserves row-ordering (and is faster)
+- Bugfix: `slice_sample()` now throws an error if no `n` or `prop` keyword argument is provided
+- Bump minimum Julia version to 1.9
+
## v0.15.2 - 2024-04-19
- Update Chain.jl dependency version

4 changes: 2 additions & 2 deletions Project.toml
@@ -1,7 +1,7 @@
name = "TidierData"
uuid = "fe2206b3-d496-4ee9-a338-6a095c4ece80"
authors = ["Karandeep Singh"]
-version = "0.15.2"
+version = "0.16.0"

[deps]
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
@@ -22,7 +22,7 @@ Reexport = "0.2, 1"
ShiftedArrays = "2"
Statistics = "1.6"
StatsBase = "0.34, 1"
-julia = "1.6"
+julia = "1.9"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
4 changes: 2 additions & 2 deletions docs/examples/UserGuide/conditionals.jl
@@ -34,7 +34,7 @@ end

# Although `if_else()` is convenient when evaluating a single condition, it can be cumbersome when evaluating multiple conditions because subsequent conditions need to be nested within the `no` condition for the preceding argument. For situations where multiple conditions need to be evaluated, `case_when()` is more convenient.

-# Let's first consider a similar example from above and recreate it using `case_when()`. The following code creates a column `b` that assigns a value if 3 if `a >= 3` and otherwise leaves the value unchanged.
+# Let's first consider a similar example from above and recreate it using `case_when()`. The following code creates a column `b` that assigns a value of 3 if `a >= 3` and otherwise leaves the value unchanged.

@chain df begin
@mutate(b = case_when(a >= 3 => 3,
@@ -72,4 +72,4 @@ end

# ## Do these functions work outside of TidierData.jl?

-# Yes, both `if_else()` and `case_when()` work outside of TidierData.jl. However, you'll need to remember that if working with vectors, both the functions and conditions will need to be vectorized, and in the case of `case_when()`, the `=>` will need to be written as `.=>`. The reason this is not needed when using these functions inside of TidierData.jl is because they are auto-vectorized.
+# Yes, both `if_else()` and `case_when()` work outside of TidierData.jl. However, you'll need to remember that if working with vectors, both the functions and conditions will need to be vectorized, and in the case of `case_when()`, the `=>` will need to be written as `.=>`. The reason this is not needed when using these functions inside of TidierData.jl is because they are auto-vectorized.
4 changes: 2 additions & 2 deletions docs/examples/UserGuide/slice.jl
@@ -64,7 +64,7 @@ end
# ## Sample 5 random rows in the data frame

@chain df begin
-@slice_sample(5)
+@slice_sample(n = 5)
end

# ## Slice the min
@@ -99,4 +99,4 @@ end

@chain df begin
@slice_head(n = 3)
-end
+end
16 changes: 13 additions & 3 deletions src/TidierData.jl
@@ -28,7 +28,7 @@ const code = Ref{Bool}(false) # output DataFrames.jl code?
const log = Ref{Bool}(false) # output tidylog output? (not yet implemented)

# The global do-not-vectorize "list"
-const not_vectorized = Ref{Vector{Symbol}}([:getindex, :rand, :esc, :Ref, :Set, :Cols, :collect, :(:), :, :lag, :lead, :ntile, :repeat, :across, :desc, :mean, :std, :var, :median, :first, :last, :minimum, :maximum, :sum, :length, :skipmissing, :quantile, :passmissing, :cumsum, :cumprod, :accumulate, :is_float, :is_integer, :is_string, :cat_rev, :cat_relevel, :cat_infreq, :cat_lump, :cat_reorder, :cat_collapse, :cat_lump_min, :cat_lump_prop, :categorical, :as_categorical, :is_categorical])
+const not_vectorized = Ref{Vector{Symbol}}([:getindex, :rand, :esc, :Ref, :Set, :Cols, :collect, :(:), :, :lag, :lead, :ntile, :repeat, :across, :desc, :mean, :std, :var, :median, :mad, :first, :last, :minimum, :maximum, :sum, :length, :skipmissing, :quantile, :passmissing, :cumsum, :cumprod, :accumulate, :is_float, :is_integer, :is_string, :cat_rev, :cat_relevel, :cat_infreq, :cat_lump, :cat_reorder, :cat_collapse, :cat_lump_min, :cat_lump_prop, :categorical, :as_categorical, :is_categorical, :unique, :iqr])

# The global do-not-escape "list"
# `in`, `∈`, and `∉` should be vectorized in auto-vec but not escaped
@@ -494,7 +494,17 @@ end
$docstring_ungroup
"""
macro ungroup(df)
-    :(DataFrame($(esc(df))))
+    df_expr = quote
+        if $(esc(df)) isa GroupedDataFrame
+            transform($(esc(df)); ungroup = true)
+        else
+            copy($(esc(df)))
+        end
+    end
+    if code[]
+        @info MacroTools.prettify(df_expr)
+    end
+    return df_expr
end

"""
@@ -542,7 +552,7 @@ macro distinct(df, exprs...)
# because if the original DataFrame is grouped, it must be ungrouped
# and then regrouped, so there's no need to make a copy up front.
# This is because `unique()` does not work on GroupDataFrames.
-local df_copy = DataFrame($(esc(df)))
+local df_copy = transform($(esc(df)); ungroup = true)
if $any_found_n
transform!(df_copy, nrow => :TidierData_n)
end
5 changes: 3 additions & 2 deletions src/docstrings.jl
@@ -1320,14 +1320,15 @@ julia> @semi_join(df1, df2, "a" = "a")

const docstring_pivot_wider =
"""
-@pivot_wider(df, names_from, values_from)
+@pivot_wider(df, names_from, values_from[, values_fill])
Reshapes the DataFrame to make it wider, increasing the number of columns and reducing the number of rows.
# Arguments
- `df`: A DataFrame.
- `names_from`: The name of the column to get the name of the output columns from.
- `values_from`: The name of the column to get the cell values from.
+- `values_fill`: The value to replace a missing name/value combination (default is `missing`)
# Examples
```jldoctest
@@ -3409,4 +3410,4 @@ julia> @relocate(df, B:C) # bring columns to the front
4 │ 9 D 4 B 4 D
5 │ 10 E 5 C 5 E
```
-"""
+"""
2 changes: 1 addition & 1 deletion src/slice.jl
@@ -64,7 +64,7 @@ macro slice_sample(df, exprs...)
as_integer(floor(n() * $expr_dict[:prop]));
replace=$replace))
else
-@slice($(esc(df)), sample(1:n(), 1; replace=$replace))
+throw("Please provide either an `n` or a `prop` value as a keyword argument.")
end
end
