better NaN handling with ecdf #537
@@ -9,6 +8,7 @@ struct ECDF{T <: AbstractVector{<:Real}, W <: AbstractWeights{<:Real}}
end

function (ecdf::ECDF)(x::Real)
    isnan(x) && return NaN
This should use the element type of ECDF to ensure type stability for e.g. Float32s and Dual, so probably something like T(NaN) for ECDF{T}.
Hmm, there are already other type instabilities present:
weightsum = evenweights ? length(ecdf.sorted_values) : sum(ecdf.weights)
I'm fine fixing these here or in a different PR
Never mind, fixes were easy. I'll push in a sec
Can you also add tests, including @inferred?
Never mind again, way harder to handle NaN in a type stable way than I thought! I'm gonna punt on this and ask this to be merged as-is.
The weightsum type instability is not visible outside of the function for Float32, since partialsum/weightsum will return a Float64. So NaN is more correct than oftype(T, NaN). Not sure about Dual, would need to try that.
But please add new tests.
Codecov Report
@@             Coverage Diff              @@
##           master     #537      +/-   ##
==========================================
+ Coverage    90.43%   90.44%   +<.01%
==========================================
  Files           21       21
  Lines         2101     2103       +2
==========================================
+ Hits          1900     1902       +2
  Misses         201      201
Sorry for saying this only now, but it would probably be better to only print a deprecation for now instead of throwing an error. That will possibly leave some time for users to adapt, and it won't require us to tag a new breaking release just for this. Breaking releases are painful as they require all dependent packages to update their upper bound, or users will keep using old StatsBase versions.
Sure, every bug fix is technically breaking, but that's being awfully strict? This is simply fixing incorrect results:
julia> ecdf([1,NaN])(1)
0.5
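For context, a minimal sketch of how a result like this can arise. `naive_ecdf` is a hypothetical helper written for this example, not the StatsBase code; it assumes the ECDF is computed by sorting the data and counting values less than or equal to x.

```julia
# Hypothetical naive ECDF: fraction of data values <= x. Not StatsBase's code.
naive_ecdf(data, x) = searchsortedlast(sort(data), x) / length(data)

data = [1, NaN]        # promoted to Vector{Float64}
sort(data)             # NaN sorts last under isless: [1.0, NaN]
naive_ecdf(data, 1)    # 0.5, because the NaN still counts in the denominator
any(isnan, data)       # true; one way to detect such input up front
```

In this sketch the NaN is never counted as less than or equal to x, but it still inflates the length, which is where the 0.5 comes from.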
Ah, right, I thought the current behavior just skipped
The test failure on 32 bit seems to be a timeout because of deprecations
Ref: #413
New behaviors:
Side note: I didn't actually touch the Project.toml file...did Pkg make that change for me?