
Quantile Regression #719

Closed
david-waterworth opened this issue Mar 7, 2019 · 9 comments

@david-waterworth
Contributor

david-waterworth commented Mar 7, 2019

I'd like to investigate quantile regression, i.e. estimating Q_alpha[y|X] instead of E[y|X].

sklearn's GradientBoostingRegressor has quantile as a loss function. Each loss function is implemented as a class providing __call__(), negative_gradient() and update_terminal_regions(), which implement the loss function, its first derivative, and the associated leaf predictor respectively (i.e. L2 loss uses mean() for the predictor, L1 uses median(), and quantile uses percentile(alpha) across the leaf targets).

I've looked at the custom loss examples in catboost. From what I can see they require first and second derivative functions, but I don't see a way to replace the leaf prediction. Is this possible (ideally initially in Python, then if the result is promising I'll attempt a C++ implementation)?

Also, in general, I don't see anything in the documentation indicating which leaf prediction function is used for each loss function. Are you using mean for RMSE and median for MAE?
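For what it's worth, here's a small numpy sketch (my own illustration, not CatBoost code) of why the leaf predictor matters: the constant that minimises the pinball (quantile) loss over a leaf is an empirical alpha-quantile of its targets, just as the mean minimises L2 and the median minimises L1.

```python
import numpy as np

def pinball_loss(y, pred, alpha):
    """Mean quantile (pinball) loss of a constant prediction."""
    e = y - pred
    return np.mean(np.where(e >= 0, alpha * e, (alpha - 1) * e))

y = np.arange(1, 11, dtype=float)   # a toy leaf with targets 1..10
alpha = 0.75

# Scan constant predictions: the minimiser of the pinball loss sits at
# an empirical alpha-quantile of the leaf targets (8.0 for this sample).
candidates = np.linspace(0.0, 11.0, 1101)
losses = [pinball_loss(y, c, alpha) for c in candidates]
best = float(candidates[int(np.argmin(losses))])
print(best)   # approximately 8.0
```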

Also, I just found #37, which implies you've implemented quantile loss even if it's not exposed via the API? Is that the case? Is there an easy way to change alpha from 0.5 to, say, 0.95?

@ian-contiamo

I can only answer your last question. You can specify the quantile loss as follows:

model = catboost.CatBoostRegressor(loss_function='Quantile:alpha=0.95', ...)

@david-waterworth
Contributor Author

david-waterworth commented Mar 8, 2019

Thanks @ian-contiamo. When I fitted GradientBoostingRegressor(loss='quantile', alpha=0.95, ...) to the residuals of my base CatBoost model, I got a prediction that looks quite reasonable as an estimate of the 95th percentile. When I used CatBoostRegressor(loss_function='Quantile:alpha=0.95', ...), the prediction appears to be the mean (it was close to zero), so it doesn't appear to be adjusting the predictor function to be consistent with the loss, which is odd. Hopefully @annaveronika can shed some light?

@david-waterworth david-waterworth changed the title Custom loss and prediction function Quantile Regression Mar 12, 2019
@david-waterworth
Contributor Author

david-waterworth commented Mar 12, 2019

OK, I think I've got to the bottom of this: quantile regression does work, but it converges very slowly, if at all.

It's likely related to microsoft/LightGBM#1199, there's a good description here.

I'm not 100% sure, but if the leaf values are approximated by L'(X,y) / L''(X,y), then it's no surprise that it doesn't work so well for the quantile loss with high or low alpha, due to the zero second derivative.
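To make the zero-second-derivative point concrete, here is a minimal sketch of the pinball loss derivatives (my own illustration, not CatBoost's implementation):

```python
import numpy as np

def pinball_grad_hess(y, pred, alpha):
    """First and second derivatives of the pinball loss w.r.t. pred.
    The gradient is piecewise constant in pred, so the second
    derivative is zero almost everywhere, which is why a Newton-style
    leaf value of the form L' / L'' degenerates for quantile loss."""
    e = y - pred
    grad = np.where(e >= 0, -alpha, 1.0 - alpha)  # dL/dpred
    hess = np.zeros_like(grad)                    # d2L/dpred2 = 0 a.e.
    return grad, hess

g, h = pinball_grad_hess(np.array([1.0, 2.0]), 0.0, 0.75)
print(g, h)   # [-0.75 -0.75] [0. 0.]
```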

Specifically I fitted the model

import numpy as np

X = np.array([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2]).reshape(-1, 1)
y = np.array([1,2,3,4,5,6,7,8,9,10,21,22,23,24,25,26,27,28,29,30])

i.e. there are two clusters, labeled 1 and 2, with target means 5.5 and 25.5. I attempted to fit using Quantile:alpha=0.95, set learning_rate and max_depth to 1, and experimented with iterations.

Both lightgbm and sklearn's GradientBoostingRegressor converge in a single iteration to the solution y=8|X=1, y=28|X=2 (GradientBoostingRegressor) or y=8.5|X=1, y=28.5|X=2 (lightgbm).

catboost takes around 100 iterations to converge. The table below shows that after the first iteration all the leaf estimates are the same and there's a large error, so it appears that there's no split. I think it has actually split at the correct place, though (I tried to verify using the standalone_evaluator, but I couldn't easily debug the STL containers using vscode). At each step the estimates improve very slowly. I tried to set --leaf-estimation-method Newton, but this isn't supported. Are there other parameters I can try?

y_pred[n], e[n], L[n] represent the predictions, error and loss function at iteration n, for Quantile:alpha=0.75:

| X | y | y_pred[1] | e[1] | L[1] | y_pred[10] | e[10] | L[10] | y_pred[100] | e[100] | L[100] |
|---|---|-----------|------|------|------------|-------|-------|-------------|--------|--------|
| 1 | 1 | 0.58 | 0.42 | 0.32 | 4.31 | -3.31 | 0.83 | 8.00 | -7.00 | 1.75 |
| 1 | 2 | 0.58 | 1.42 | 1.07 | 4.31 | -2.31 | 0.58 | 8.00 | -6.00 | 1.50 |
| 1 | 3 | 0.58 | 2.42 | 1.82 | 4.31 | -1.31 | 0.33 | 8.00 | -5.00 | 1.25 |
| 1 | 4 | 0.58 | 3.42 | 2.57 | 4.31 | -0.31 | 0.08 | 8.00 | -4.00 | 1.00 |
| 1 | 5 | 0.58 | 4.42 | 3.32 | 4.31 | 0.69 | 0.52 | 8.00 | -3.00 | 0.75 |
| 1 | 6 | 0.58 | 5.42 | 4.07 | 4.31 | 1.69 | 1.27 | 8.00 | -2.00 | 0.50 |
| 1 | 7 | 0.58 | 6.42 | 4.82 | 4.31 | 2.69 | 2.02 | 8.00 | -1.00 | 0.25 |
| 1 | 8 | 0.58 | 7.42 | 5.57 | 4.31 | 3.69 | 2.77 | 8.00 | 0.00 | 0.00 |
| 1 | 9 | 0.58 | 8.42 | 6.32 | 4.31 | 4.69 | 3.52 | 8.00 | 1.00 | 0.75 |
| 1 | 10 | 0.58 | 9.42 | 7.07 | 4.31 | 5.69 | 4.27 | 8.00 | 2.00 | 1.50 |
| 2 | 21 | 0.58 | 20.42 | 15.32 | 5.77 | 15.23 | 11.42 | 28.00 | -7.00 | 1.75 |
| 2 | 22 | 0.58 | 21.42 | 16.07 | 5.77 | 16.23 | 12.17 | 28.00 | -6.00 | 1.50 |
| 2 | 23 | 0.58 | 22.42 | 16.82 | 5.77 | 17.23 | 12.92 | 28.00 | -5.00 | 1.25 |
| 2 | 24 | 0.58 | 23.42 | 17.57 | 5.77 | 18.23 | 13.67 | 28.00 | -4.00 | 1.00 |
| 2 | 25 | 0.58 | 24.42 | 18.32 | 5.77 | 19.23 | 14.42 | 28.00 | -3.00 | 0.75 |
| 2 | 26 | 0.58 | 25.42 | 19.07 | 5.77 | 20.23 | 15.17 | 28.00 | -2.00 | 0.50 |
| 2 | 27 | 0.58 | 26.42 | 19.82 | 5.77 | 21.23 | 15.92 | 28.00 | -1.00 | 0.25 |
| 2 | 28 | 0.58 | 27.42 | 20.57 | 5.77 | 22.23 | 16.67 | 28.00 | 0.00 | 0.00 |
| 2 | 29 | 0.58 | 28.42 | 21.32 | 5.77 | 23.23 | 17.42 | 28.00 | 1.00 | 0.75 |
| 2 | 30 | 0.58 | 29.42 | 22.07 | 5.77 | 24.23 | 18.17 | 28.00 | 2.00 | 1.50 |
| sum | | | | 223.85 | | | 164.15 | | | 18.50 |
| mean | | | | 11.19 | | | 8.21 | | | 0.93 |
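As a cross-check, here is a sketch reproducing the sklearn side of this comparison (exact leaf values may vary slightly between sklearn versions, but one depth-1 iteration already separates the two clusters because update_terminal_regions() recomputes each leaf value as the empirical alpha-quantile of the residuals in that leaf):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy data from above: two clusters of ten points each.
X = np.array([1]*10 + [2]*10).reshape(-1, 1)
y = np.array(list(range(1, 11)) + list(range(21, 31)), dtype=float)

# A single depth-1 tree with unit learning rate; the leaf values are
# set to the 95th percentile of the residuals routed to each leaf.
model = GradientBoostingRegressor(
    loss="quantile", alpha=0.95, n_estimators=1,
    learning_rate=1.0, max_depth=1,
)
model.fit(X, y)
pred = model.predict(X)
print(pred[0], pred[-1])   # roughly the 95th percentile of each cluster
```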

The following is quoted from http://jmarkhou.com/lgbqr/. It would be nice if catboost could implement the bolded bullet, as an option such as --compute-true-loss. I suspect this is what update_terminal_regions() from sklearn does (it's a method of the loss function class).

  • during the tree-growing process we’re using a second-order approximate loss function instead of the true one
  • this approximate loss can be seen as a noised-up version of the true loss
    • so, the best split under this approximate loss might not correspond to the best split under the true loss
    • however, the splitting procedure is greedy, so even with the true loss it’s not going to be globally optimal anyways
  • furthermore, we generally force the tree to select non-greedily-optimal splits by randomly restricting the set of observations or dimensions to split on, as a form of regularization
    • thus, using the approximate loss is effectively just another form of regularization during the tree-growing process
  • **we only optimize the true loss at the leaves, since leaves are the only places where the actual predicted values matter**
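The idea in the last bullet can be sketched in a few lines of numpy (a hypothetical helper of my own, mirroring what sklearn's update_terminal_regions() does for loss='quantile', not CatBoost code):

```python
import numpy as np

def exact_quantile_leaf_update(leaf_index, residuals, alpha):
    """After growing a tree with the approximate loss, replace each
    leaf's value with the constant that minimises the true pinball
    loss over that leaf: the empirical alpha-quantile of the
    residuals routed to it."""
    return {
        int(leaf): float(np.percentile(residuals[leaf_index == leaf], 100 * alpha))
        for leaf in np.unique(leaf_index)
    }

# Toy data from above: a single split separates the two clusters.
residuals = np.array(list(range(1, 11)) + list(range(21, 31)), dtype=float)
leaf_index = np.array([0] * 10 + [1] * 10)
leaf_values = exact_quantile_leaf_update(leaf_index, residuals, 0.75)
print(leaf_values)   # {0: 7.75, 1: 27.75}
```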

@david-waterworth
Contributor Author

david-waterworth commented Mar 14, 2019

I'm (slowly) starting to understand what's happening here. In this case, since the loss is quantile, Newton's method isn't used to update the leaves, so the comment above doesn't really apply; the gradient method is used.

What's actually happening is that the initial approximation is zero. This means the derivatives for every document at iteration 1 are equal to alpha (0.75), due to the symmetry of the quantile loss function. The delta approximation is then (10 * alpha) / (10 + 3), where 10 is the number of values per leaf (since it split down the middle) and 3 is the L2 regularisation. So after a single iteration both leaves have a value of 0.58.
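That first-iteration value can be checked directly (assuming CatBoost's default l2_leaf_reg of 3, which is an assumption on my part):

```python
alpha = 0.75     # quantile level
n_leaf = 10      # documents in each leaf after the split
l2_reg = 3.0     # assumed CatBoost default l2_leaf_reg

# With a zero initial approximation, every residual y - 0 is positive,
# so the negative gradient of the pinball loss is alpha for every
# document. A single L2-regularised gradient step for the leaf value:
delta = n_leaf * alpha / (n_leaf + l2_reg)
print(round(delta, 2))   # 0.58
```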

After a few iterations we start to get negative errors for some of the leaves, and then the derivatives start to diverge.

The issue seems to be the poor initial approximation. I might experiment with starting from the mean to see if that provides more direction.

I also experimented with --leaf-estimation-iterations, but values other than 1 aren't allowed for regression. I think implementing this would help as well.

And finally I still think having the option to optimise the true loss for the leaves may be the best option?

@annaveronika
Contributor

We have implemented the Exact method for quantile regression, and also starting from the best constant approximation for this mode.
The Exact method is available on CPU only for now, and is set as the default method.
Starting from the best approximation is off by default for now, but will be on in one of the next releases. For now, please use the boost_from_average=True flag for it (this name is used for compatibility with other GBDT libraries).
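Putting the flags together, usage would look something like this (a sketch assuming a CatBoost build where these options are available for Quantile loss; the fit call is commented out since it needs catboost installed and real data):

```python
# Hypothetical parameter set combining the options discussed above.
params = {
    "loss_function": "Quantile:alpha=0.95",
    "leaf_estimation_method": "Exact",   # CPU only at the time of writing
    "boost_from_average": True,          # start from the best constant
}
# import catboost
# model = catboost.CatBoostRegressor(**params)
# model.fit(X, y)
print(params["loss_function"])
```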

And thank you very much for your input, we really appreciate it!

@annaveronika
Contributor

annaveronika commented Oct 11, 2019

Closing this issue; stay tuned, we will announce it in the release notes when boost_from_average is True by default.

@david-waterworth
Contributor Author

Thanks @annaveronika. Note that if you set leaf_estimation_method=Exact and task_type=GPU there's no warning; it (I guess) quietly falls back to Newton.

@annaveronika
Contributor

> Thanks @annaveronika Note if you set leaf_estimation_method=Exact and task_type=GPU there's no warning, it (I guess) quietly falls back to Newton.

Thanks for the report! We'll add an exception for this case.

@Sandy4321

Any updates? Is catboost better now for quantile regression than lightgbm?
microsoft/LightGBM#1199

microsoft/LightGBM#1182
> I think now with the new updates that fix this issue, LightGBM is the fastest, quantiles-supporting boosted decision tree implementation available. Pretty exciting! I'm getting about 20x speedup with similar performance over sklearn in quantile workloads! Great work.
