-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
require AbstractVector from subset selectors #2744
Changes from 3 commits
15445aa
dfcc796
94e38e9
3592e45
392b8c1
5509d34
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,6 +31,21 @@ function _and_missing(x::Any...) | |
"but only true, false, or missing are allowed")) | ||
end | ||
|
||
# we are guaranteed that ByRow returns a vector | ||
# this workaround is needed for 0-argument ByRow | ||
assert_bool_vec(fun::ByRow) = fun | ||
|
||
function assert_bool_vec(@nospecialize(fun)) | ||
return function(x...) | ||
val = fun(x...) | ||
if !(val isa AbstractVector) | ||
throw(ArgumentError("functions passed to `subset` must return " * | ||
"an AbstractVector.")) | ||
bkamins marked this conversation as resolved.
Show resolved
Hide resolved
|
||
end | ||
return val | ||
end | ||
end | ||
|
||
# Note that _get_subset_conditions will have a large compilation time | ||
# if more than 32 conditions are passed as `args`. | ||
function _get_subset_conditions(df::Union{AbstractDataFrame, GroupedDataFrame}, | ||
|
@@ -39,7 +54,7 @@ function _get_subset_conditions(df::Union{AbstractDataFrame, GroupedDataFrame}, | |
conditions = Any[if a isa ColumnIndex | ||
a => Symbol(:x, i) | ||
elseif a isa Pair{<:Any, <:Base.Callable} | ||
first(a) => last(a) => Symbol(:x, i) | ||
first(a) => assert_bool_vec(last(a)) => Symbol(:x, i) | ||
else | ||
throw(ArgumentError("condition specifier $a is not supported by `subset`")) | ||
end for (i, a) in enumerate(args)] | ||
|
@@ -77,7 +92,8 @@ Each argument passed in `args` can be either a single column selector or a | |
described for [`select`](@ref). | ||
|
||
Note that as opposed to [`filter`](@ref) the `subset` function works on whole | ||
columns (or all rows in groups for `GroupedDataFrame`). | ||
columns and always makes a subset of rows (or all rows in groups for | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does "makes a subset of rows" mean? Also it's not mentioned for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it is intended to mean that selection happens row-wise always (not whole groups) - this was the original reason why this PR is made. In particular this is a crucial difference between There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. Then maybe this?
And I'd mention that the result should be a vector of Boolean in the first paragraph as that's really an essential information. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK - added |
||
`GroupedDataFrame`) and must return a vector. | ||
|
||
If `skipmissing=false` (the default) `args` are required to produce vectors | ||
containing only `Bool` values. If `skipmissing=true`, additionally `missing` is | ||
|
@@ -180,7 +196,11 @@ Each argument passed in `args` can be either a single column selector or a | |
described for [`select`](@ref). | ||
|
||
Note that as opposed to [`filter!`](@ref) the `subset!` function works on whole | ||
columns (or all rows in groups for `GroupedDataFrame`). | ||
columns (or all rows in groups for `GroupedDataFrame`) and must return a vector. | ||
If you want to filter rows of an `AbstractDataFrame` or groups of | ||
`GroupedDataFrame` by a function that returns `Bool` value use [`filter!`](@ref) | ||
or [`filter`](@ref) (note that [`filter!`](@ref) is not supported for | ||
`GroupedDataFrame`). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sync this with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will sync once we finalize There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
|
||
If `skipmissing=false` (the default) `args` are required to produce vectors | ||
containing only `Bool` values. If `skipmissing=true`, additionally `missing` is | ||
|
@@ -191,7 +211,10 @@ If `ungroup=false` the resulting data frame is re-grouped based on the same | |
grouping columns as `gdf` and a `GroupedDataFrame` is returned. | ||
|
||
If `GroupedDataFrame` is subsetted then it must include all groups present in the | ||
`parent` data frame, like in [`select!`](@ref). | ||
`parent` data frame, like in [`select!`](@ref). Also note that in general | ||
the parent of the passed `GroupedDataFrame` is mutated in place, which means | ||
that the `GroupedDataFrame` is not safe to use as most likely it will | ||
get corrupted due to row removal in the parent. | ||
|
||
See also: [`subset`](@ref), [`filter!`](@ref), [`select!`](@ref) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this if bumping to 1.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated and added more explanations. Fortunately the change we make will now throw an error so it will be easy to catch.