Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is sklearn_fork handling tweedie correctly? #67

Closed
ElizabethSantorellaQC opened this issue Apr 22, 2020 · 7 comments
Closed

Is sklearn_fork handling tweedie correctly? #67

ElizabethSantorellaQC opened this issue Apr 22, 2020 · 7 comments
Labels
bug Something isn't working question Further information is requested

Comments

@ElizabethSantorellaQC
Copy link
Contributor

if isinstance(self._family_instance, TweedieDistribution):
    if self._family_instance.power <= 0:
        self._link_instance = IdentityLink()
    if self._family_instance.power >= 1:
        self._link_instance = LogLink()

This looks wrong to me, based on this: https://stats.stackexchange.com/questions/137227/what-is-the-canonical-link-function-for-a-tweedie-glm
power == 1 should be the special case of the log link, and otherwise there should be something like a TweedieLink equal to mu ** (1 - p) / (1 - p), which doesn't currently exist.

@MarcAntoineSchmidtQC : Could you add a test for whether we're getting the right answer in the tweedie-1.5 benchmark? (Regarding Issue #43 )

@ElizabethSantorellaQC ElizabethSantorellaQC added bug Something isn't working question Further information is requested labels Apr 23, 2020
@lbittarello
Copy link
Member

The fork follows actuarial practice:

  • Identity for the normal.
  • Logit for the binomial.
  • Log for everything else.

The documentation mentions the use of the log for the Gamma and the Inverse Gaussian, but it left the Tweedie out, which we should presumably fix. Actuaries never use the canonical link function for the Tweedie or the Gamma, since they aren't terribly interpretable. I am fine with keeping current behavior.

@ElizabethSantorellaQC
Copy link
Contributor Author

@lbittarello Thanks for clarifying. The sklearn-fork logic still doesn't seem to follow the actuarial convention. Is this correct?

if power == 0:
    link = IdentityLink()
elif 0 < power < 1:
    error, no distribution exists (this currently happens downstream)
else:
    link = LogLink()

@Quantco Quantco deleted a comment from ElizabethSantorellaQC Apr 23, 2020
@lbittarello
Copy link
Member

Are you referring to the error if power is between zero and one? That is correct: the Tweedie family is not defined in that interval (i.e. no distribution exists).

@lbittarello
Copy link
Member

And power == 0 is just the Normal, as you probably know.

@ElizabethSantorellaQC
Copy link
Contributor Author

What I find odd about the sklearn-fork code is that the link is identity if power < 0; based on your comment it sounds like it should be log

@lbittarello
Copy link
Member

To be honest, there is no actuarial practice for power < 0 because actuaries never use it. 😂 Sorry, I may have been misleading there.

@ElizabethSantorellaQC
Copy link
Contributor Author

Conclusion: It's fine. This will be further clarified in the code by #68

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants