allow scalars in subset and subset! as conditions #3032
I have thought about this PR. Rationale:
What do you think?
After having slept on the issue, I came to the conclusion that the behavior "disallow vector for
bump (from my perspective this PR is finished 😄). I have now rebased it and added some small cleanups in the documentation.
    # we special case 0-length cond, as in this case broadcasting does not
    # guarantee setting a proper eltype for the result
    if isempty(cond)
        if eltype(cond) !== Bool
            throw(ArgumentError("passed conditions produce $(eltype(cond)) " *
                                "as element type of the result while only " *
                                "Bool is allowed."))
        end
    else
        @assert eltype(cond) === Bool
    end
I don't understand this. Contrary to what the comment says, the check is dependent on the eltype even for empty vectors.
When cond is non-empty, the broadcasting mechanism guarantees that eltype(cond) === Bool in line 148 must hold. That is why we have @assert there.
Conversely, if cond is empty, broadcasting lets an incorrect element type slip through:
julia> subset(DataFrame(), [] => ByRow(() -> "aaa"))
ERROR: ArgumentError: passed conditions produce Union{} as element type of the result while only Bool is allowed.
As you can see here, although the condition returns a String, the _and.(cols...) call does not flag it as incorrect, because the output vector is empty.
Is it clear now?
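The check can also be tried in isolation. The sketch below copies its logic into a standalone function. The name `check_cond_eltype` is hypothetical and is not part of DataFrames.jl, and the empty vectors are built by hand here rather than produced by broadcasting.

```julia
# Hypothetical standalone copy of the check from the diff; not DataFrames.jl API.
function check_cond_eltype(cond::AbstractVector)
    if isempty(cond)
        # No element was ever computed, so the eltype is only what inference chose.
        if eltype(cond) !== Bool
            throw(ArgumentError("passed conditions produce $(eltype(cond)) " *
                                "as element type of the result while only " *
                                "Bool is allowed."))
        end
    else
        # Per the thread, a non-empty broadcast result is expected to have eltype Bool.
        @assert eltype(cond) === Bool
    end
    return cond
end

check_cond_eltype(falses(0))        # empty, eltype Bool: accepted
check_cond_eltype([true, false])    # non-empty Bool vector: accepted

try
    # empty, eltype Union{}: rejected by the explicit eltype test
    check_cond_eltype(Vector{Union{}}(undef, 0))
catch err
    @show err isa ArgumentError
end
```

The empty branch is the only place where the element type has to be inspected explicitly, because there are no values that could have triggered an error earlier.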
But won't that error also be thrown when cond is empty and broadcast uses eltype Any because inference fails?
I do not think it is possible. Can you think of an example? Here are things that I have checked:
julia> DataFrames._and.([]) # supertype
0-element BitVector
julia> DataFrames._and.(Number[]) # supertype
0-element BitVector
julia> DataFrames._and.(Union{Bool, Missing}[]) # supertype
0-element BitVector
julia> DataFrames._and.(Int[]) # empty intersection
Union{}[]
julia> DataFrames._and.(Char[]) # empty intersection
Union{}[]
julia> DataFrames._and.(Union{Int, Missing}[]) # empty intersection
Union{}[]
and my rule is the best we can count on AFAICT.
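The same experiment can be repeated without DataFrames.jl. This is a minimal sketch that assumes a stand-in function, `strict_and` (a hypothetical name), which returns its argument for `Bool` input and throws otherwise. What element type an empty broadcast ends up with depends on type inference and may differ between Julia versions, so the loop only prints the types it observes.

```julia
# Hypothetical stand-in for DataFrames._and: Bool in, Bool out; anything else throws.
strict_and(x::Bool) = x
strict_and(x) = throw(ArgumentError("value $x is not Bool"))

# Empty inputs: no element is computed, so the result eltype comes from inference alone.
for input in (Any[], Number[], Int[], Char[])
    result = strict_and.(input)
    println(eltype(input), " => ", typeof(result))
end

# Non-empty input: every element is actually evaluated, so a non-Bool value would throw.
nonempty = strict_and.(Any[true, false])
println(typeof(nonempty))
```

If the printed type for an input such as `Int[]` shows a non-Bool element type, that is the situation the explicit `eltype(cond) !== Bool` test is meant to reject.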
I don't know how it's possible that inference guesses that the result is Bool in such difficult cases. In theory there's no guarantee that the return type is inferred, but since I can't come up with an example...
The empty container is not likely in practice, so I prefer to be on the safe side.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
I think it’s the best solution too. Thanks!
@nalimilan - could you please have a look at this PR?
Thank you! |
Fixes #2740