
Fix huber loss #178

Merged

guolinke merged 6 commits into lightgbm-org:master from henry0312:fix_huber_loss on Jan 9, 2017

Conversation

@henry0312 (Contributor) commented Jan 9, 2017

TODO:

@msftclas commented Jan 9, 2017

Hi @henry0312, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!

The agreement was validated by Microsoft and real humans are currently evaluating your PR.

TTYL, MSBOT;

// deprecated
return new RegressionL2loss(config);
} else if (type == std::string("regression_l2") || type == std::string("mean_squared_error") || type == std::string("mse")) {
return new RegressionL2loss(config);
Collaborator

merge these two if?

Contributor Author

I'm sorry, I don't understand what you mean 😢
This PR will be merged after #175

Collaborator

change

if () 
...
else if()
...

to

if()
...

They are all regression objectives, right?

Contributor Author

yes, they are.

you mean,

  if (type == std::string("regression") || type == std::string("regression_l2") || type == std::string("mean_squared_error") || type == std::string("mse")) {
    return new RegressionL2loss(config);
  }

right?

Collaborator

Yes. And you can break the if () across two lines:

  if (type == std::string("regression") || type == std::string("regression_l2") 
      || type == std::string("mean_squared_error") || type == std::string("mse")) {
    return new RegressionL2loss(config);
  }

Comment thread src/objective/regression_objective.hpp Outdated
const double a = 2.0 * weights_[i]; // difference of two first derivatives, (zero to inf) and (zero to -inf).
const double b = 0.0;
const double c = (std::fabs(score[i]) + std::fabs(label_[i])) / 1.0e3;
hessians[i] = std::exp(-(x - b) * (x - b) / 2.0 * c * c) / a;
Collaborator

I think you can use an inline function to calculate the hessian and avoid this repeated code.

Comment thread src/objective/regression_objective.hpp Outdated
const double a = 2.0; // difference of two first derivatives, (zero to inf) and (zero to -inf)
const double b = 0.0;
const double c = (std::fabs(score[i]) + std::fabs(label_[i])) / 1.0e3;
hessians[i] = std::exp(-(x - b) * (x - b) / 2.0 * c * c) / a;
Collaborator

Same as above, and you can put this function into utils/common.h.

Contributor Author

ok!

@henry0312 (Contributor Author)

Rebased onto master.
However, there may be something wrong with ApproximateHessianWithGaussian.
I'll check.

@Laurae2 (Contributor) commented Jan 9, 2017

@henry0312 This topic has a bunch of gradient/hessian functions that help convergence for an L1 loss function when a hessian is mandatory for convergence.

They were all tested in xgboost and are all converging fast. Without a hessian, the convergence is slow (which seems to be the case in LightGBM also as @guolinke mentioned in the previous PR for L1 loss). It would be good to have an example to check the convergence speed of the L1 loss.

The short "explanation" is that when a hessian is mandatory for good convergence and we want to use an L1 loss function, we should avoid a constant hessian at any cost. To fix this, we smooth the linear function with a curve so the hessian is never zeroed out (better and faster convergence). For instance, the Fair objective (from Microsoft Research) is used to approximate such an L1 function when a hessian is required.

The R versions with their gradient/hessian formulas are below:

ln_cosh_obj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  grad <- tanh(preds-labels)
  hess <- 1-grad*grad
  return(list(grad = grad, hess = hess))
}

ln_expexp_obj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  x <- preds-labels
  grad <-  (exp(2*x)-1) / (exp(2*x)+1)
  hess <-  (4*exp(2*x)) / (exp(2*x) + 1)^2 
  return(list(grad = grad, hess = hess))
}

cauchyobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  c <- 2  #the lower the "slower/smoother" the loss is
  x <-  preds-labels
  grad <- x / (x^2/c^2+1)
  hess <- -c^2*(x^2-c^2)/(x^2+c^2)^2
  return(list(grad = grad, hess = hess))
}

fairobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  c <- 2  #the lower the "slower/smoother" the loss is
  x <-  preds-labels
  grad <- c*x / (abs(x)+c)
  hess <- c^2 / (abs(x)+c)^2
  return(list(grad = grad, hess = hess))
}

@henry0312 (Contributor Author)

@Laurae2 the hessian of MAE is a delta function; therefore, if we can approximate it with something appropriate (e.g. a Gaussian function), convergence and performance should be good, I guess.
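Spelled out as a sketch (with s the score, y the label, and c a small hand-chosen width): the MAE gradient is a sign function, its derivative is a Dirac delta scaled by the jump of 2, and the delta is replaced by a narrow Gaussian:

```latex
\frac{\partial}{\partial s}\,|s-y| = \operatorname{sign}(s-y), \qquad
\frac{\partial^2}{\partial s^2}\,|s-y| = 2\,\delta(s-y)
\;\approx\; \frac{2}{c\sqrt{2\pi}}\exp\!\left(-\frac{(s-y)^2}{2c^2}\right)
```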

@henry0312 henry0312 changed the title Fix huber loss [WIP] Fix huber loss Jan 9, 2017
Comment thread include/LightGBM/utils/common.h Outdated
inline static double ApproximateHessianWithGaussian(double y, double t, double w=1.0f) {
inline static double ApproximateHessianWithGaussian(const double y, const double t, const double g, const double w=1.0f) {
const double diff = y - t;
const double pi = M_PI;
Collaborator

Where is M_PI defined? I cannot compile this on Windows.

Contributor Author

Could you add #include <cmath>?

@guolinke (Collaborator) Jan 9, 2017

Contributor Author

probably fixed with 05e97f1

@henry0312 (Contributor Author)

I finished fixing ApproximateHessianWithGaussian (also Huber loss).
The L1 loss is approximated with a Gaussian function, as in http://www.wolframalpha.com/input/?i=Integrate%5BErf%5Bx%5D%5D,+Integrate%5BErf%5Bx%2F10%5D%5D.
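A minimal sketch of that idea, hand-rolled for illustration with an explicit bandwidth c (names and signatures are not the merged LightGBM code): the smoothed L1 gradient is an erf curve that tends to sign(y - t) as c shrinks, and the hessian is its derivative, a Gaussian bump centered at y == t:

```cpp
#include <cmath>

// erf-smoothed L1 loss (illustrative sketch, not the merged code):
// the gradient approaches sign(y - t) as the bandwidth c -> 0.
inline double SmoothedL1Gradient(double y, double t, double c) {
  return std::erf((y - t) / (c * std::sqrt(2.0)));
}

// Derivative of the gradient above: a Gaussian centered at y == t,
// which is never exactly zero, unlike the true L1 hessian.
inline double SmoothedL1Hessian(double y, double t, double c) {
  const double pi = 3.14159265358979323846;
  const double diff = y - t;
  return std::sqrt(2.0 / pi) / c * std::exp(-diff * diff / (2.0 * c * c));
}
```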

@henry0312 henry0312 changed the title [WIP] Fix huber loss Fix huber loss Jan 9, 2017
@henry0312 (Contributor Author) commented Jan 9, 2017

Next, I will add the Fair loss in another PR, as @Laurae2 suggested.

@henry0312 henry0312 mentioned this pull request Jan 9, 2017
@guolinke guolinke merged commit 27d3eb3 into lightgbm-org:master Jan 9, 2017
@henry0312 henry0312 deleted the fix_huber_loss branch January 9, 2017 16:15
@Laurae2 Laurae2 mentioned this pull request Apr 6, 2017
@lock lock Bot locked as resolved and limited conversation to collaborators Mar 17, 2020