
[Minor] In the objective function, drop terms that are not dependent on the parameters. #169

Closed
tbenthompson opened this issue Jun 4, 2020 · 9 comments


@tbenthompson (Collaborator) commented Jun 4, 2020

[EDIT] See the conversation below.

Currently, in _eta_mu_deviance, we compute the deviance and then later multiply by 0.5 and add the L1 and L2 penalty terms to compute an objective function value. Strictly speaking, this isn't the objective function value, but it should differ from it only by a constant that depends on y. For most distribution/link function pairs, computing the deviance is more complicated than computing the log-likelihood. For example, for Poisson, the LL is:

y[i] * eta[i] - mu[i]

whereas the deviance as currently implemented is:

        if y[i] == 0:
            unit_deviance = 2 * (-y[i] + mu_out[i])
        else:
            unit_deviance = 2 * ((y[i] * (log(y[i]) - eta_out[i] - 1)) + mu_out[i])

Since we don't actually need a deviance, we should compute the log-likelihood.
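
To illustrate with a quick sketch (toy inputs, not our code): the two quantities differ only by a term that depends on y alone, not on the parameters.

import numpy as np

def poisson_ll_kernel(y, eta, mu):
    # y * eta - mu; the -log(y!) normalization is dropped since it is constant in mu.
    return y * eta - mu

def poisson_unit_deviance(y, eta, mu):
    # 2 * (y * log(y / mu) - y + mu), using the convention 0 * log(0) = 0 and eta = log(mu).
    y_log_y = np.where(y == 0, 0.0, y * np.log(np.where(y == 0, 1.0, y)))
    return 2 * (y_log_y - y * eta - y + mu)

y = np.array([0.0, 1.0, 3.0])
eta = np.array([-0.5, 0.2, 1.0])
mu = np.exp(eta)
# deviance / (-2) - ll_kernel depends only on y, not on eta or mu:
print(poisson_unit_deviance(y, eta, mu) / -2 - poisson_ll_kernel(y, eta, mu))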

@tbenthompson tbenthompson changed the title Use log-likelihood instead of deviance in the line search. [Minor] Use log-likelihood instead of deviance in the line search. Jun 4, 2020
@lbittarello (Member)

On the other hand, the Tweedie LL does not have a closed-form expression (Dunn and Smyth, 2005), and approximating the series is rather expensive.

@tbenthompson (Collaborator, Author)

Oh interesting. So, the deviance has a closed form, but the LL does not? Currently the deviance is implemented as:

    def unit_deviance(self, y, mu):
        p = self.power
........
            # return 2 * (np.maximum(y,0)**(2-p)/((1-p)*(2-p))
            #    - y*mu**(1-p)/(1-p) + mu**(2-p)/(2-p))
            return 2 * (
                np.power(np.maximum(y, 0), 2 - p) / ((1 - p) * (2 - p))
                - y * np.power(mu, 1 - p) / (1 - p)
                + np.power(mu, 2 - p) / (2 - p)
            )

@lbittarello (Member)

Exactly, the normalization term in the Tweedie LL does not have a closed form. The deviance gets rid of it.
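
Roughly (writing the decomposition from memory, so treat the exact form as an approximation of Dunn and Smyth, 2005): for 1 < p < 2 the Tweedie log-density splits into a closed-form part and a normalizing series,

log f(y; mu, phi) = (1 / phi) * (y * mu**(1 - p) / (1 - p) - mu**(2 - p) / (2 - p)) + log a(y, phi, p)

where a(y, phi, p) is an infinite series with no closed form. The deviance only involves the first bracket (plus terms in y alone), which is why the series drops out.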

@tbenthompson (Collaborator, Author)

The only place we actually use the deviance/LL is in the line search, where we determine whether the step size is safe via a backtracking line search (https://en.wikipedia.org/wiki/Backtracking_line_search). In that setting, we're always subtracting one objective value from another.

In that sense, any constant offset in the objective function is fine. So the "ugly" but performant solution here might be to use the deviance when that's convenient and the log-likelihood when that's convenient, with comments to make it clear what the heck is going on.

@lbittarello what do you think of that?
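
To make that concrete, here's a rough sketch of the backtracking logic (illustrative names and Armijo constants, not our actual implementation): only differences of objective values enter the sufficient-decrease test, so any additive constant cancels.

import numpy as np

def backtracking_step(obj_up_to_constant, x, direction, grad,
                      step=1.0, shrink=0.5, c=1e-4, max_iter=20):
    # Armijo backtracking: accept the first step t such that
    # f(x + t * d) - f(x) <= c * t * grad.dot(d). Adding a constant to f
    # shifts both f values equally, so the test is unaffected.
    f0 = obj_up_to_constant(x)
    slope = grad @ direction  # negative for a descent direction
    for _ in range(max_iter):
        if obj_up_to_constant(x + step * direction) - f0 <= c * step * slope:
            break
        step *= shrink
    return step

# e.g., for f(x) = 0.5 * ||x||^2 (+ any constant), starting at x0 with d = -grad:
x0 = np.array([3.0, -4.0])
t = backtracking_step(lambda x: 0.5 * x @ x + 7.0, x0, -x0, x0)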

@lbittarello (Member)

Fine by me. Just wanted to warn you of the monsters ahead. :)

By the way, I'm also not sure whether the Gamma LL is cheaper to compute than the deviance. We currently have:

def _gamma_unit_deviance(power, dispersion, y, mu):
    return 2 * (np.log(mu) - np.log(y) + y / mu - 1)


def _gamma_unit_loglikelihood(power, dispersion, y, mu):
    log_y = np.log(y)
    normalization = (
        (log_y - np.log(dispersion)) / dispersion - log_y - loggamma(1 / dispersion)
    )
    return normalization - y / (dispersion * mu) - np.log(mu) / dispersion

LightGBM uses something else, which we call raw log loss for lack of a better name:

def _gamma_unit_raw_logloss(power, dispersion, y, mu):
    return y / mu + np.log(mu)
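
A quick numerical check (arbitrary y, dispersion, and mu values) that, as functions of mu, the three expressions differ only by a y-dependent offset and a scale factor:

import numpy as np
from scipy.special import loggamma

y, dispersion = 3.0, 0.7
mus = np.array([0.5, 1.0, 2.0, 4.0])

deviance = 2 * (np.log(mus) - np.log(y) + y / mus - 1)
raw_logloss = y / mus + np.log(mus)
loglik = (
    (np.log(y) - np.log(dispersion)) / dispersion
    - np.log(y)
    - loggamma(1 / dispersion)
    - y / (dispersion * mus)
    - np.log(mus) / dispersion
)

# Both of these are constant across mus:
print(deviance / 2 - raw_logloss)
print(-dispersion * loglik - raw_logloss)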

@lbittarello (Member)

> what the heck

what the heck hack

@tbenthompson (Collaborator, Author)

Your point about Gamma is a great one. Beyond just deviance vs. log-likelihood, we can drop anything that doesn't depend on the parameters, like LightGBM is doing. That's even better. And we can just call it a "raw_logloss". Thanks!
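
For instance, the Poisson version could just be the negated LL kernel from above (a hypothetical sketch to illustrate the idea, not necessarily the final signature or what the PR ends up doing):

def _poisson_unit_raw_logloss(y, eta, mu):
    # mu - y * eta: the -log(y!) term is dropped because it does not depend
    # on the parameters.
    return mu - y * eta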

@tbenthompson tbenthompson changed the title [Minor] Use log-likelihood instead of deviance in the line search. [Minor] In the objective function, drop terms that are not dependent on the parameters. Jun 4, 2020
@tbenthompson (Collaborator, Author)

> what the heck
>
> what the heck hack

Luca, haxor extraordinaire.

@tbenthompson (Collaborator, Author)

Did this for Poisson in #170

I'm going to close this issue and fold it into #151 since the work is very similar.
