Is sklearn_fork handling tweedie correctly? #67

ElizabethSantorellaQC · 2020-04-22T23:53:53Z

if isinstance(self._family_instance, TweedieDistribution):
    if self._family_instance.power <= 0:
        self._link_instance = IdentityLink()
    if self._family_instance.power >= 1:
        self._link_instance = LogLink()

This looks wrong to me, based on this: https://stats.stackexchange.com/questions/137227/what-is-the-canonical-link-function-for-a-tweedie-glm
power == 1 should be the special case of the log link, and otherwise there should be something like a TweedieLink equal to mu ** (1 - p) / (1 - p), which doesn't currently exist.
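For reference, a minimal sketch of the canonical link described above. The function name `tweedie_canonical_link` is hypothetical and is not part of sklearn_fork; it only illustrates the formula mu ** (1 - p) / (1 - p), with the log link as the special case at power == 1.

```python
import math


def tweedie_canonical_link(mu: float, power: float) -> float:
    """Hypothetical helper: canonical Tweedie link evaluated at mu.

    power == 0 gives the identity, power == 1 gives log(mu), and any other
    power gives mu ** (1 - power) / (1 - power).
    """
    if power == 0:
        # Normal case: the formula reduces to mu itself, and mu may be negative.
        return mu
    if mu <= 0:
        raise ValueError("mu must be positive when power != 0")
    if power == 1:
        # Poisson case: the log link replaces the formula, which is undefined here.
        return math.log(mu)
    return mu ** (1 - power) / (1 - power)
```

With power == 1.5 and mu == 4.0, this evaluates 4.0 ** -0.5 / -0.5, which shows how the canonical link departs from log(mu) for powers other than 1.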

@MarcAntoineSchmidtQC: Could you add a test that checks whether we get the right answer in the tweedie-1.5 benchmark? (Regarding issue #43.)

lbittarello · 2020-04-23T07:28:00Z

The fork follows actuarial practice:

Identity for the normal.
Logit for the binomial.
Log for everything else.

The documentation mentions the use of the log link for the Gamma and the Inverse Gaussian, but it leaves the Tweedie out, which we should presumably fix. Actuaries never use the canonical link function for the Tweedie or the Gamma, since the canonical links aren't terribly interpretable. I am fine with keeping the current behavior.
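As a rough illustration of the convention listed above, the default link could be looked up as follows. This is only a sketch: `default_link_name` and the lowercase family strings are assumptions made for the example, not names from the fork.

```python
def default_link_name(family: str) -> str:
    """Hypothetical lookup of the default link under the convention above."""
    # Identity for the normal, logit for the binomial.
    conventions = {"normal": "identity", "binomial": "logit"}
    # Log for everything else (Poisson, Gamma, Inverse Gaussian, Tweedie, ...).
    return conventions.get(family.lower(), "log")
```

Under this sketch, "gamma" and "tweedie" both map to "log", even though neither has the log as its canonical link.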

ElizabethSantorellaQC · 2020-04-23T13:52:06Z

@lbittarello Thanks for clarifying. The sklearn-fork logic still doesn't seem to follow the actuarial convention. Is the following the intended behavior?

if power == 0:
    link = IdentityLink()
elif 0 < power < 1:
    raise ValueError("no Tweedie distribution exists for 0 < power < 1")  # currently raised downstream
else:
    link = LogLink()

lbittarello · 2020-04-23T14:17:09Z

Are you referring to the error if power is between zero and one? That is correct: the Tweedie family is not defined in that interval (i.e. no distribution exists).

lbittarello · 2020-04-23T14:18:41Z

And power == 0 is just the Normal, as you probably know.
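The two points above (no distribution for 0 < power < 1, and power == 0 being the Normal) can be sketched as a small check. The helper name `tweedie_special_case` is hypothetical, and the Poisson, Gamma, and Inverse Gaussian entries are the standard named members of the family rather than something stated in this thread.

```python
def tweedie_special_case(power: float) -> str:
    """Hypothetical helper: name the Tweedie member for a given power."""
    if 0 < power < 1:
        # The Tweedie family is not defined on this open interval.
        raise ValueError(f"no Tweedie distribution exists for power={power}")
    # Well-known named members; other valid powers have no special name.
    names = {0: "Normal", 1: "Poisson", 2: "Gamma", 3: "Inverse Gaussian"}
    return names.get(power, "Tweedie (no special name)")
```

A power of 1.5, as in the benchmark mentioned earlier, falls through to the generic case, while 0.5 raises an error.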

ElizabethSantorellaQC · 2020-04-23T14:19:00Z

What I find odd about the sklearn-fork code is that the link is the identity if power < 0; based on your comment, it sounds like it should be the log.

lbittarello · 2020-04-23T14:19:54Z

To be honest, there is no actuarial practice for power < 0 because actuaries never use it. 😂 Sorry, I may have been misleading there.

ElizabethSantorellaQC · 2020-04-23T19:08:28Z

Conclusion: It's fine. This will be further clarified in the code by #68.

ElizabethSantorellaQC added the bug and question labels Apr 23, 2020

Quantco deleted a comment from ElizabethSantorellaQC Apr 23, 2020

ElizabethSantorellaQC closed this as completed Apr 23, 2020
