
Quantile Regression #719

Closed
david-waterworth opened this issue Mar 7, 2019 · 9 comments

@david-waterworth
Contributor

david-waterworth commented Mar 7, 2019

I'd like to investigate quantile regression, i.e. estimating Q_alpha[y|X] instead of E[y|X].

sklearn's GradientBoostingRegressor has quantile as a loss function. Each loss function is implemented as a class providing __call__(), negative_gradient() and update_terminal_regions(), which implement the loss function, its first derivative, and the associated leaf predictor respectively (i.e. L2 loss uses mean() for the predictor, L1 uses median(), and quantile uses percentile(alpha) across the leaf targets).

I've looked at the custom loss examples in catboost. From what I can see they require first and second derivative functions, but I don't see a way to replace the leaf prediction. Is this possible (ideally initially in Python, then if the result is promising I'll attempt a C++ implementation)?

Also, in general, I don't see anything in the documentation indicating which leaf prediction function is used for each loss function. Are you using mean for RMSE and median for MAE?
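For what it's worth, here's a small numpy sketch (my own illustration, not CatBoost code) of why the leaf predictor matters: the constant that minimises the pinball (quantile) loss over a leaf is an empirical alpha-quantile of its targets, just as the mean minimises L2 and the median minimises L1.

```python
import numpy as np

def pinball_loss(y, pred, alpha):
    """Mean quantile (pinball) loss of a constant prediction."""
    e = y - pred
    return np.mean(np.where(e >= 0, alpha * e, (alpha - 1) * e))

y = np.arange(1, 11, dtype=float)   # a toy leaf with targets 1..10
alpha = 0.75

# Scan constant predictions: the minimiser of the pinball loss sits at
# an empirical alpha-quantile of the leaf targets (8.0 for this sample).
candidates = np.linspace(0.0, 11.0, 1101)
losses = [pinball_loss(y, c, alpha) for c in candidates]
best = float(candidates[int(np.argmin(losses))])
print(best)   # approximately 8.0
```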

Also, I just found #37, which implies you've implemented quantile loss even if it's not exposed via the API? Is that the case? Is there an easy way to change alpha from 0.5 to, say, 0.95?

@ian-contiamo

I can only answer your last question. You can specify the quantile loss as follows:

model = catboost.CatBoostRegressor(loss_function='Quantile:alpha=0.95', ...)

@david-waterworth
Contributor Author

david-waterworth commented Mar 8, 2019

Thanks @ian-contiamo. When I fitted GradientBoostingRegressor(loss='quantile', alpha=0.95, ...) to the residuals of my base CatBoost model, I got a prediction that looks quite reasonable as an estimate of the 95th percentile. When I used CatBoostRegressor(loss_function='Quantile:alpha=0.95', ...), the prediction appears to be the mean (it was close to zero), so it doesn't appear to be adjusting the predictor function to be consistent with the loss, which is odd. Hopefully @annaveronika can shed some light?

@david-waterworth david-waterworth changed the title Custom loss and prediction function Quantile Regression Mar 12, 2019
@david-waterworth
Contributor Author

david-waterworth commented Mar 12, 2019

OK, I think I've got to the bottom of this: quantile regression does work, but it converges very slowly, if at all.

It's likely related to microsoft/LightGBM#1199, there's a good description here.

I'm not 100% sure, but if the leaf values are approximated by L'(X,y) / L''(X,y), then it's no surprise that it doesn't work so well for the quantile loss with high or low alpha, due to the zero second derivative.
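To make the zero-second-derivative point concrete, here is a minimal sketch of the pinball loss derivatives (my own illustration, not CatBoost's implementation):

```python
import numpy as np

def pinball_grad_hess(y, pred, alpha):
    """First and second derivatives of the pinball loss w.r.t. pred.
    The gradient is piecewise constant in pred, so the second
    derivative is zero almost everywhere, which is why a Newton-style
    leaf value of the form L' / L'' degenerates for quantile loss."""
    e = y - pred
    grad = np.where(e >= 0, -alpha, 1.0 - alpha)  # dL/dpred
    hess = np.zeros_like(grad)                    # d2L/dpred2 = 0 a.e.
    return grad, hess

g, h = pinball_grad_hess(np.array([1.0, 2.0]), 0.0, 0.75)
print(g, h)   # [-0.75 -0.75] [0. 0.]
```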

Specifically I fitted the model

import numpy as np

X = np.array([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2]).reshape(-1, 1)
y = np.array([1,2,3,4,5,6,7,8,9,10,21,22,23,24,25,26,27,28,29,30])

i.e. there are two clusters, labeled 1 and 2, with target means 5.5 and 25.5. I attempted to fit using Quantile:alpha=0.95, set learning_rate and max_depth to 1, and experimented with iterations.

Both lightgbm and sklearn's GradientBoostingRegressor converge in a single iteration to the solution y=8|X=1, y=28|X=2 (GradientBoostingRegressor) or y=8.5|X=1, y=28.5|X=2 (lightgbm).

catboost takes around 100 iterations to converge. The table below shows that after the first iteration all the leaf estimates are the same and there's a large error, so it appears that there's no split. I think it has actually split at the correct place, though (I tried to verify using the standalone_evaluator, but I couldn't easily debug the STL containers using vscode). At each step the estimates improve very slowly. I tried to set --leaf-estimation-method Newton, but this isn't supported. Are there other parameters I can try?

y_pred[n], e[n], L[n] represent the predictions, error and loss function at iteration n, for Quantile:alpha=0.75:

| X | y | y_pred[1] | e[1] | L[1] | y_pred[10] | e[10] | L[10] | y_pred[100] | e[100] | L[100] |
|---|---|-----------|------|------|------------|-------|-------|-------------|--------|--------|
| 1 | 1 | 0.58 | 0.42 | 0.32 | 4.31 | -3.31 | 0.83 | 8.00 | -7.00 | 1.75 |
| 1 | 2 | 0.58 | 1.42 | 1.07 | 4.31 | -2.31 | 0.58 | 8.00 | -6.00 | 1.50 |
| 1 | 3 | 0.58 | 2.42 | 1.82 | 4.31 | -1.31 | 0.33 | 8.00 | -5.00 | 1.25 |
| 1 | 4 | 0.58 | 3.42 | 2.57 | 4.31 | -0.31 | 0.08 | 8.00 | -4.00 | 1.00 |
| 1 | 5 | 0.58 | 4.42 | 3.32 | 4.31 | 0.69 | 0.52 | 8.00 | -3.00 | 0.75 |
| 1 | 6 | 0.58 | 5.42 | 4.07 | 4.31 | 1.69 | 1.27 | 8.00 | -2.00 | 0.50 |
| 1 | 7 | 0.58 | 6.42 | 4.82 | 4.31 | 2.69 | 2.02 | 8.00 | -1.00 | 0.25 |
| 1 | 8 | 0.58 | 7.42 | 5.57 | 4.31 | 3.69 | 2.77 | 8.00 | 0.00 | 0.00 |
| 1 | 9 | 0.58 | 8.42 | 6.32 | 4.31 | 4.69 | 3.52 | 8.00 | 1.00 | 0.75 |
| 1 | 10 | 0.58 | 9.42 | 7.07 | 4.31 | 5.69 | 4.27 | 8.00 | 2.00 | 1.50 |
| 2 | 21 | 0.58 | 20.42 | 15.32 | 5.77 | 15.23 | 11.42 | 28.00 | -7.00 | 1.75 |
| 2 | 22 | 0.58 | 21.42 | 16.07 | 5.77 | 16.23 | 12.17 | 28.00 | -6.00 | 1.50 |
| 2 | 23 | 0.58 | 22.42 | 16.82 | 5.77 | 17.23 | 12.92 | 28.00 | -5.00 | 1.25 |
| 2 | 24 | 0.58 | 23.42 | 17.57 | 5.77 | 18.23 | 13.67 | 28.00 | -4.00 | 1.00 |
| 2 | 25 | 0.58 | 24.42 | 18.32 | 5.77 | 19.23 | 14.42 | 28.00 | -3.00 | 0.75 |
| 2 | 26 | 0.58 | 25.42 | 19.07 | 5.77 | 20.23 | 15.17 | 28.00 | -2.00 | 0.50 |
| 2 | 27 | 0.58 | 26.42 | 19.82 | 5.77 | 21.23 | 15.92 | 28.00 | -1.00 | 0.25 |
| 2 | 28 | 0.58 | 27.42 | 20.57 | 5.77 | 22.23 | 16.67 | 28.00 | 0.00 | 0.00 |
| 2 | 29 | 0.58 | 28.42 | 21.32 | 5.77 | 23.23 | 17.42 | 28.00 | 1.00 | 0.75 |
| 2 | 30 | 0.58 | 29.42 | 22.07 | 5.77 | 24.23 | 18.17 | 28.00 | 2.00 | 1.50 |
| sum | | | | 223.85 | | | 164.15 | | | 18.50 |
| mean | | | | 11.19 | | | 8.21 | | | 0.93 |
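As a cross-check, here is a sketch reproducing the sklearn side of this comparison (exact leaf values may vary slightly between sklearn versions, but one depth-1 iteration already separates the two clusters because update_terminal_regions() recomputes each leaf value as the empirical alpha-quantile of the residuals in that leaf):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy data from above: two clusters of ten points each.
X = np.array([1]*10 + [2]*10).reshape(-1, 1)
y = np.array(list(range(1, 11)) + list(range(21, 31)), dtype=float)

# A single depth-1 tree with unit learning rate; the leaf values are
# set to the 95th percentile of the residuals routed to each leaf.
model = GradientBoostingRegressor(
    loss="quantile", alpha=0.95, n_estimators=1,
    learning_rate=1.0, max_depth=1,
)
model.fit(X, y)
pred = model.predict(X)
print(pred[0], pred[-1])   # roughly the 95th percentile of each cluster
```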

The following is quoted from http://jmarkhou.com/lgbqr/. It would be nice if catboost could implement the bolded bullet, as an option such as --compute-true-loss. I suspect this is what update_terminal_regions() from sklearn does (it's a method of the loss function class).

  • during the tree-growing process we’re using a second-order approximate loss function instead of the true one
  • this approximate loss can be seen as a noised-up version of the true loss
    • so, the best split under this approximate loss might not correspond to the best split under the true loss
    • however, the splitting procedure is greedy, so even with the true loss it’s not going to be globally optimal anyways
  • furthermore, we generally force the tree to select non-greedily-optimal splits by randomly restricting the set of observations or dimensions to split on, as a form of regularization
    • thus, using the approximate loss is effectively just another form of regularization during the tree-growing process
  • **we only optimize the true loss at the leaves, since leaves are the only places where the actual predicted values matter**
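The idea in the last bullet can be sketched in a few lines of numpy (a hypothetical helper of my own, mirroring what sklearn's update_terminal_regions() does for loss='quantile', not CatBoost code):

```python
import numpy as np

def exact_quantile_leaf_update(leaf_index, residuals, alpha):
    """After growing a tree with the approximate loss, replace each
    leaf's value with the constant that minimises the true pinball
    loss over that leaf: the empirical alpha-quantile of the
    residuals routed to it."""
    return {
        int(leaf): float(np.percentile(residuals[leaf_index == leaf], 100 * alpha))
        for leaf in np.unique(leaf_index)
    }

# Toy data from above: a single split separates the two clusters.
residuals = np.array(list(range(1, 11)) + list(range(21, 31)), dtype=float)
leaf_index = np.array([0] * 10 + [1] * 10)
leaf_values = exact_quantile_leaf_update(leaf_index, residuals, 0.75)
print(leaf_values)   # {0: 7.75, 1: 27.75}
```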

@david-waterworth
Contributor Author

david-waterworth commented Mar 14, 2019

I'm (slowly) starting to understand what's happening here. In this case, since the loss is quantile, Newton's method isn't used to update the leaves, so the comment above doesn't really apply; the gradient method is used.

What's actually happening is that the initial approximation is zero. This means the derivatives for every document at iteration 1 are equal to alpha (0.75), due to the symmetry of the quantile loss function. The delta approximation is then (10 * alpha) / (10 + 3), where 10 is the number of values per leaf (since it split down the middle) and 3 is the L2 regularisation. So after a single iteration both leaves have a value of 0.58.
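That first-iteration value can be checked directly (assuming CatBoost's default l2_leaf_reg of 3, which is an assumption on my part):

```python
alpha = 0.75     # quantile level
n_leaf = 10      # documents in each leaf after the split
l2_reg = 3.0     # assumed CatBoost default l2_leaf_reg

# With a zero initial approximation, every residual y - 0 is positive,
# so the negative gradient of the pinball loss is alpha for every
# document. A single L2-regularised gradient step for the leaf value:
delta = n_leaf * alpha / (n_leaf + l2_reg)
print(round(delta, 2))   # 0.58
```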

After a few iterations we start to get negative errors for some of the leaves, and then the derivatives start to diverge.

The issue seems to be the poor initial approximation. I might experiment with starting from the mean to see if that provides more direction.

I also experimented with --leaf-estimation-iterations, but values other than 1 aren't allowed for regression. I think implementing this would help as well.

And finally I still think having the option to optimise the true loss for the leaves may be the best option?

@annaveronika
Contributor

We have implemented the Exact method for quantile regression, and also starting from the best constant approximation for this mode.
The Exact method is available on CPU only for now, and is set as the default method.
Starting from the best approximation is off by default for now, but will be on in one of the next releases. For now, please use the boost_from_average=True flag for it (this name is used for compatibility with other GBDT libraries).
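Putting the flags together, usage would look something like this (a sketch assuming a CatBoost build where these options are available for Quantile loss; the fit call is commented out since it needs catboost installed and real data):

```python
# Hypothetical parameter set combining the options discussed above.
params = {
    "loss_function": "Quantile:alpha=0.95",
    "leaf_estimation_method": "Exact",   # CPU only at the time of writing
    "boost_from_average": True,          # start from the best constant
}
# import catboost
# model = catboost.CatBoostRegressor(**params)
# model.fit(X, y)
print(params["loss_function"])
```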

And thank you very much for your input, we really appreciate it!

@annaveronika
Contributor

annaveronika commented Oct 11, 2019

Closing this issue; stay tuned, we will announce it in the release notes when boost_from_average is True by default.

@david-waterworth
Contributor Author

Thanks @annaveronika. Note that if you set leaf_estimation_method=Exact and task_type=GPU there's no warning; it (I guess) quietly falls back to Newton.

@annaveronika
Contributor

> Thanks @annaveronika Note if you set leaf_estimation_method=Exact and task_type=GPU there's no warning, it (I guess) quietly falls back to Newton.

Thanks for the report! We'll add an exception for this case.

@Sandy4321

Any updates? Is catboost better now for quantile regression than lightgbm?
microsoft/LightGBM#1199

microsoft/LightGBM#1182
> I think now with the new updates that fix this issue, LightGBM is the fastest, quantiles-supporting boosted decision tree implementation available. Pretty exciting! I'm getting about 20x speedup with similar performance over sklearn in quantile workloads! Great work.
