Example how to use catboost with the time series data #53

alexzaporozhets · 2017-07-23T07:03:16Z

Hi,

In the introduction/promo video (https://www.youtube.com/watch?v=00BMdlwKKXI) you mentioned that CatBoost can analyse historical time series data for weather forecasts.

But I was not able to find anything like this in tutorials: https://github.com/catboost/catboost/tree/master/catboost/tutorials

annaveronika · 2017-07-23T10:04:23Z

We don't have any specific support for time series in catboost, so you need to find your own way to prepare data to use it in gradient boosting.
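
One common way to do that preparation is to turn the series into a table of lagged values, so that each row holds the last few observations and the label is the next one. The sketch below is only an illustration of that idea: the helper name `make_lagged_rows`, the window size, and the sample temperatures are all made up for the example and are not part of CatBoost.

```python
def make_lagged_rows(series, n_lags):
    """Build (features, target) pairs from a 1-D series.

    Each feature row holds the n_lags values that precede the target,
    so the model only ever sees the past when predicting a point.
    """
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(series[t - n_lags:t])  # values at t-n_lags .. t-1
        targets.append(series[t])          # value to predict at time t
    return rows, targets


# Hypothetical daily temperatures, oldest first.
temps = [10.0, 11.0, 12.0, 11.5, 13.0, 12.5]
X, y = make_lagged_rows(temps, n_lags=3)

for features, target in zip(X, y):
    print(features, "->", target)
```

The resulting `X` and `y` can then be passed to any gradient boosting regressor; how many lags to use, and which extra features to add, depends on the task.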

polya20 · 2017-11-30T14:15:07Z

Shouldn't the video be changed, as it's misleading?

annaveronika · 2017-11-30T14:23:39Z

No, it's a very common thing to use gradient boosting for time series. The way you are using it is up to your task.

JustM57 · 2017-12-12T17:33:55Z

The whole idea is that you must prepare the data for boosting. Tree-based methods work by making cuts (splits) on feature values, chosen to best separate the target (for example, by reducing entropy). Boosting doesn't fit any linear equations to the data. So if a training feature takes values in [9, 11], boosting may make some cuts inside that range. But as soon as you check it on a validation set where this feature ranges over [11, 13], the previous cuts don't work at all and we might get 0.5 prediction accuracy. Okay, so you need to normalize the data, but in my opinion the ordinary sklearn MinMax or Standard scalers just rescale the data, so some kind of shifting transformation may help instead. That gives us deltas, which are much better for classification problems... Moreover, we can try normalizing these deltas by dividing them by the value of the original feature.
My transformation looks like: (df.feature - df.feature.shift(-1))/df.feature
@annaveronika am I right?
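
A small sketch of the point about cuts and deltas, under stated assumptions: the threshold `10.0` stands in for a split a tree might learn, the two value lists are invented, and `relative_deltas` is a hypothetical helper written in plain Python. Note that in pandas `shift(-1)` pairs each row with the next row's value, so the formula quoted above looks one step ahead; the sketch uses the previous value instead (the pandas analogue would be `(s - s.shift(1)) / s.shift(1)`), which is usually what you want when forecasting.

```python
def relative_deltas(values):
    """Relative change from the previous observation: (x[t] - x[t-1]) / x[t-1].

    Assumes no value is zero; a real pipeline would need to guard against that.
    """
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


train = [9.0, 9.5, 10.0, 10.5, 11.0]     # feature range seen in training
valid = [11.0, 11.5, 12.0, 12.5, 13.0]   # same feature, drifted upwards

threshold = 10.0  # hypothetical cut learned from the training range

# On the raw feature the cut separates the training values...
print([v > threshold for v in train])
# ...but every validation value falls on the same side, so the cut carries no information.
print([v > threshold for v in valid])

# The relative deltas of both sets live on a comparable scale,
# so cuts learned on one remain meaningful on the other.
print(relative_deltas(train))
print(relative_deltas(valid))
```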

rpcoelho17 · 2019-04-19T13:07:19Z

I'm wondering how to use "has_time=True": how do I specify to CatBoost which column is my date column? I saw that there is a way to build a "Data format description" file, but how do you pass this to the algorithm, and does CatBoost use it to improve the results? Could you give an example of how to implement this when one of your columns is a pandas date type and has_time=True? (I also already have columns that explode the dates into their components.)
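
A minimal sketch of one possible approach, not an official answer from the maintainers. As far as the CatBoost documentation describes it, `has_time=True` makes training use the order of objects in the input data instead of random permutations, so the simplest option is to sort the rows chronologically before fitting and pass the date components as ordinary numeric features. The row layout, the field names (`date`, `temp`, `target`), and the helper functions are assumptions made up for this example. The column description file mentioned above is a separate mechanism for file-based datasets (passed through the `column_description` argument of `Pool`); check the current docs for the column types it supports.

```python
from datetime import date


def sort_chronologically(rows, date_key="date"):
    """Return rows ordered oldest first; has_time=True relies on this order."""
    return sorted(rows, key=lambda row: row[date_key])


def to_features(row):
    """Numeric features, including date components (a hypothetical choice)."""
    d = row["date"]
    return [row["temp"], d.month, d.weekday()]


def fit_in_time_order(rows):
    """Fit a regressor on chronologically ordered rows."""
    # Imported lazily so the preparation steps run without CatBoost installed.
    from catboost import CatBoostRegressor

    ordered = sort_chronologically(rows)
    X = [to_features(row) for row in ordered]
    y = [row["target"] for row in ordered]
    # has_time=True: keep the given row order rather than shuffling objects.
    model = CatBoostRegressor(iterations=50, has_time=True, verbose=False)
    model.fit(X, y)
    return model


# Hypothetical records, deliberately out of order.
rows = [
    {"date": date(2019, 4, 3), "temp": 12.0, "target": 13.0},
    {"date": date(2019, 4, 1), "temp": 10.0, "target": 11.0},
    {"date": date(2019, 4, 2), "temp": 11.0, "target": 12.0},
]

ordered = sort_chronologically(rows)
print([row["date"].isoformat() for row in ordered])
```

With a pandas DataFrame the equivalent preparation would be a `sort_values` on the date column before building the feature matrix; calling `fit_in_time_order(rows)` on a realistically sized dataset then trains the model in that order.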

alexzaporozhets changed the title ~~Example how to use catboost the time series data~~ Example how to use catboost with the time series data Jul 23, 2017

annaveronika closed this as completed Aug 1, 2017
