
Question: is it possible to use early stopping of LightGBM, CatBoost? #251

Open
benitocm opened this issue Oct 10, 2022 · 5 comments
Labels
question Further information is requested

Comments

@benitocm

Hi,

I am using GBT algorithms as the base regressor for the forecaster. I am interested in using the early stopping feature of these kinds of algorithms. Is it possible?

In the case of HistGradientBoostingRegressor, I think it is easier because the early stopping is configured differently.

Thank you in advance

@JoaquinAmatRodrigo
Owner

Hi @benitocm,
Early stopping should not be activated when using a GBT (or any other regressor) inside a Forecaster unless it uses a validation strategy that keeps the time order of the samples. In the case of HistGradientBoostingRegressor and LightGBM, the default validation set seems to be a random sample, so it is not valid for time series.
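
For example (a minimal sketch, not part of the Forecaster API; the data, split size and parameters are only illustrative), a validation strategy that keeps the time order is to take the most recent part of the training data as the validation set and pass it explicitly to LightGBM:

import lightgbm as lgb
import numpy as np
import pandas as pd

# illustrative data: X and y are assumed to be ordered in time (oldest first)
X = pd.DataFrame({'lag_1': np.random.rand(500), 'lag_2': np.random.rand(500)})
y = pd.Series(np.random.rand(500))

# chronological split: the last 20% of the samples is the validation set
n_val = int(len(X) * 0.2)
X_train, X_val = X.iloc[:-n_val], X.iloc[-n_val:]
y_train, y_val = y.iloc[:-n_val], y.iloc[-n_val:]

model = lgb.LGBMRegressor(n_estimators=1000, learning_rate=0.02)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_val, y_val)],
    eval_metric='rmse',
    callbacks=[lgb.early_stopping(stopping_rounds=50)]
)
print(model.best_iteration_)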

@benitocm
Author

benitocm commented Oct 10, 2022

Hi,

When I have done it from scratch, I have used TimeSeriesSplit from scikit-learn to enforce the time constraints. In the case of HistGradientBoostingRegressor, I use cross_val_score with a TimeSeriesSplit CV, expecting that the time constraints hold.

Do you think those approaches are not correct? If so, I would appreciate your input.

In the case of the darts library, a validation series can be provided to the fit method. Would something like this be the only way to use early stopping?

Thanks very much for your time

Something related to this unit8co/darts#1154

@JoaquinAmatRodrigo
Owner

If the time constraints hold, I think there is no problem using it. Could you add a small example so we can double-check whether the approach is correct?

@benitocm
Author

benitocm commented Oct 19, 2022

Hi,

In the case of HistGradientBoostingRegressor, I am using cross_val_score and I am assuming that scikit-learn is not doing any shuffling:

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# TimeSeriesSplit keeps the samples in time order; if test_size is set,
# it must be an integer number of samples (a fraction such as .2 is not accepted)
my_kfold = TimeSeriesSplit(n_splits=5)

hgb_params = {
    'loss': 'squared_error',
    'scoring': 'loss',
    'learning_rate': 0.02,
    'verbose': 0,
    'random_state': 42,
    'max_iter': 1000,
    'early_stopping': True,
    'validation_fraction': .1,
    'n_iter_no_change': 10
}

hgb_model = HistGradientBoostingRegressor(**hgb_params)
neg_x_val_score = cross_val_score(hgb_model, x_base_df, y_base_df,
                                  scoring='neg_mean_squared_error',
                                  cv=my_kfold, n_jobs=5)
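
As a quick check (illustrative) that the outer folds keep the time order, the index ranges of each split can be printed; every training block ends right before its validation block starts:

# TimeSeriesSplit never shuffles: train indices always precede validation indices
for fold, (train_idx, valid_idx) in enumerate(my_kfold.split(x_base_df)):
    print(fold, train_idx.min(), train_idx.max(), valid_idx.min(), valid_idx.max())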

In the case of CatBoost, I am using the has_time parameter and doing the cross-validation loop myself:

from collections import defaultdict
from time import perf_counter
from catboost import CatBoostRegressor

# cb_params (including has_time=True), early_stopping_rounds, opt_callbacks,
# scoring and calc_scores are defined elsewhere in my code
cv_results = defaultdict(list)
for fold, (train_index, valid_index) in enumerate(my_kfold.split(x_df)):
    x_train_df, x_valid_df = x_df.iloc[train_index, :], x_df.iloc[valid_index, :]
    y_train_df, y_valid_df = y_df.iloc[train_index, :], y_df.iloc[valid_index, :]
    cb_model = CatBoostRegressor(**cb_params)
    t1 = perf_counter()
    if early_stopping_rounds != 0:
        # the validation fold (the most recent samples of the split) is used for early stopping
        _ = cb_model.fit(
                x_train_df,
                y_train_df,
                eval_set=(x_valid_df, y_valid_df),
                use_best_model=True,
                early_stopping_rounds=early_stopping_rounds,
                callbacks=opt_callbacks
        )
    else:
        _ = cb_model.fit(x_train_df, y_train_df, callbacks=opt_callbacks)

    scores = calc_scores(cb_model, scoring, x_valid_df, y_valid_df)
    for k, v in scores.items():
        cv_results[k].append(v)

    cv_results['lr'].append(cb_model.learning_rate_)
    cv_results['n_trees'].append(cb_model.tree_count_)

@JavierEscobarOrtiz added the question label Oct 20, 2022
@benitocm
Author

Hi again,

Maybe I did not make myself clear. My goal in using early stopping is to take advantage of it to estimate a reasonable number of trees. In the case of CatBoost (according to this), when you set the number of trees, the algorithm itself selects a learning rate that is very close to the optimal one. That makes the tuning easier (which is not the case with the other GBT algorithms).
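
For illustration, a minimal sketch of that idea, assuming the cv_results dict filled in the loop above (the refit parameters are just placeholders):

import numpy as np
from catboost import CatBoostRegressor

# assumption: cv_results['n_trees'] holds the tree counts found by early stopping in each fold
n_trees = int(np.mean(cv_results['n_trees']))

# refit on the full training set with a fixed number of trees and no early stopping;
# when learning_rate is not set, CatBoost picks one close to the optimal for that iteration count
final_model = CatBoostRegressor(iterations=n_trees, has_time=True, random_state=42, verbose=0)
final_model.fit(x_df, y_df)
print(final_model.learning_rate_, final_model.tree_count_)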

Thanks very much
