-
Notifications
You must be signed in to change notification settings - Fork 115
Adding geometric distribution for GLM.jl #480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…t, etc. The probability that a batsman can make a successful hit on the ball before he gets out by the bowler, is calculated efficiently using the geometric probability distribution function. Now each player will have their own distribution depending on their batting ability. So, the probability of successfully playing a ball can be modelled as a function of several covariates of the player. Note that, Geometric distribution is a special case Negative Binomial distribution when the number of failures is 1.
…ace `approx` function by `≈` in test cases.
nalimilan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Have you checked the results against another implementation?
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Codecov Report
@@ Coverage Diff @@
## master #480 +/- ##
==========================================
+ Coverage 85.12% 85.28% +0.16%
==========================================
Files 7 7
Lines 827 836 +9
==========================================
+ Hits 704 713 +9
Misses 123 123
Continue to review full report at Codecov.
|
src/glmtools.jl
Outdated
| devresid(::Gamma, y, μ::Real) = -2 * (log(y / μ) - (y - μ) / μ) | ||
| function devresid(::Geometric, y, μ::Real) | ||
| v = 2 * (xlogy(y, y / μ) + xlogy(y + 1, (μ + 1) / (y + 1))) | ||
| return μ == 0 ? oftype(v, NaN) : v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably move this check above to shortcircuit faster when mu==0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - I will update the code similar to the following:
μ == 0 ? NaN : 2 * (xlogy(y, y / μ) + xlogy(y + 1, (μ + 1) / (y + 1)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
μ == 0 && return oftype(μ, NaN)
return 2 * (xlogy(y, y / μ) + xlogy(y + 1, (μ + 1) / (y + 1)))is probably how I would do it, but it's really a matter of personal style and what @nalimilan thinks 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it sounds better to adapt the NaN type to that of the inputs. But I'm not sure whether it should be the type of μ, that of y, or promote_type(typeof(μ), typeof(y))...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
promote_type(typeof(μ), typeof(y)) seems the way to go. Given that we have an expression with y/mu and xlogy never returns a narrower type than the inputs, we can expect the result to be at least as wide as y/mu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually promote_type(typeof(μ), typeof(y)) will give a failure if both types are Integer as conversion of NaN won't work. So we need to do float(promote_type(typeof(μ), typeof(y))).
palday
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to document where the numbers in the "Geometric LogLink" test set come from, but otherwise looks good to me.
I don't think we note this in the code usually, I think the implicit assumption is that values have been checked against R, either directly or indirectly (here, by checking against |
|
Bumping this - what remains to merge? |
nalimilan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, we kind of forgot about this. We just need to fix the discussion about the devresid return type.
| return 2 * (xlogy(y, y / μ) - xlogy(y + 1, (y + 1) / (μ + 1))) | ||
| end | ||
| devresid(::InverseGaussian, y, μ::Real) = abs2(y - μ) / (y * abs2(μ)) | ||
| function devresid(d::NegativeBinomial, y, μ::Real) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you adapt this method to use the same pattern for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed oftype(η, NaN) to convert(float(typeof(η)), NaN) wherever possible in inverselink functions since GLM.inverselink(GLM.IdentityLink(), 0) is throwing an error.
Also, the devresid function for Negative Binomial is defined as how it is defined for Geometric.
Thanks for your suggestion. The suggestion has been tested and committed. Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
… possible since `GLM.inverselink(GLM.IdentityLink(), 0)` is throwing an error.
| function inverselink(::InverseLink, η::Real) | ||
| μ = inv(η) | ||
| return μ, -abs2(μ), oftype(μ, NaN) | ||
| return μ, -abs2(μ), convert(float(typeof(μ)), NaN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are likely not needed as μ = inv(η) will (almost?) always be a float. Though I guess that doesn't hurt much.
|
Thanks! |
The following is the summary of the changes in the following source files:
LogLinkas canonical link function forGeometricdistributionglmvarfunction forGeometricdistributionmustartfunction forGeometricdistributiondevresidfunction forGeometricdistributionloglik_obsfunction forGeometricdistributionglmvarin docstringmustartin docstringGeometricdistribution withLogLinkGeometricdistribution withInverseLink