Example how to use catboost with the time series data #53

Closed
alexzaporozhets opened this issue Jul 23, 2017 · 5 comments

Comments

@alexzaporozhets

Hi,

In the introduction/promo video (https://www.youtube.com/watch?v=00BMdlwKKXI) you mention that CatBoost can analyse historical time series data for weather forecasts.

But I was not able to find anything like this in the tutorials: https://github.com/catboost/catboost/tree/master/catboost/tutorials

@alexzaporozhets alexzaporozhets changed the title Example how to use catboost the time series data Example how to use catboost with the time series data Jul 23, 2017
@annaveronika
Contributor

We don't have any specific support for time series in CatBoost, so you need to find your own way to prepare the data for use in gradient boosting.

@polya20

polya20 commented Nov 30, 2017

Shouldn't the video be changed, as it's misleading?

@annaveronika
Contributor

No, it's very common to use gradient boosting for time series. How you use it depends on your task.
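
One common way to prepare a series for gradient boosting is to turn it into a table of lagged values and to split it by time rather than at random. The following is a minimal sketch with pandas, not an official CatBoost recipe; the helper `make_lag_frame`, the column names, and the temperature values are all hypothetical.

```python
import pandas as pd


def make_lag_frame(series: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build a supervised-learning table from a univariate series.

    Each row holds the previous `n_lags` values as features and the
    current value as the target. Rows without a full history are dropped.
    """
    frame = pd.DataFrame({"target": series})
    for lag in range(1, n_lags + 1):
        # shift(lag) moves values down, so row t sees the value from t - lag
        frame[f"lag_{lag}"] = series.shift(lag)
    return frame.dropna()


# Toy daily temperatures (made-up numbers, for illustration only)
temps = pd.Series(
    [10.0, 11.0, 12.0, 11.5, 13.0, 14.0, 13.5, 15.0],
    index=pd.date_range("2017-07-01", periods=8, freq="D"),
)
table = make_lag_frame(temps, n_lags=2)

# Chronological split: train on the past, validate on the most recent rows
split = int(len(table) * 0.75)
train, valid = table.iloc[:split], table.iloc[split:]
```

The `train` and `valid` frames can then be passed to any gradient boosting regressor, CatBoost included. Keeping the validation rows strictly later than the training rows avoids evaluating the model on data from its own past.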

@JustM57

JustM57 commented Dec 12, 2017

The whole idea is that you must prepare the data for boosting. Tree-based methods work by making cuts (splits) on feature values, chosen to give the largest reduction in impurity (for example, entropy). Boosting doesn't fit any linear equation to the data, so it can't extrapolate. If a training feature takes values in [9, 11], boosting will place its cuts inside that range. But as soon as you check the model on a validation set where this feature ranges over [11, 13], the previous cuts don't work at all, and we might get a prediction accuracy of 0.5, i.e. no better than chance for a binary target. Okay, so you need to normalize the data, but in my opinion the ordinary sklearn MinMax or Standard scalers only rescale the data, so some kind of shifting transformation may help instead. That gives us deltas, which are much better for classification problems. Moreover, we can try to normalize these deltas by dividing them by the value of the original feature.
My transformation looks like: (df.feature - df.feature.shift(-1))/df.feature
@annaveronika am I right?
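
To illustrate the point about shifted ranges, here is a small sketch with made-up numbers. One caveat: `shift(-1)` compares each row with the *next* row, so if the frame is sorted oldest-first the delta would use a future value. The sketch below assumes oldest-first order and uses `shift(1)` instead; the function name `relative_delta` is hypothetical.

```python
import pandas as pd


def relative_delta(feature: pd.Series) -> pd.Series:
    """Change from the previous row, divided by the current value.

    Assumes rows are sorted oldest-first; the first row has no
    predecessor and comes out as NaN.
    """
    return (feature - feature.shift(1)) / feature


# Made-up series: training values sit in [9, 11], validation in [11, 13]
train = pd.Series([9.0, 10.0, 11.0, 10.0, 9.0])
valid = pd.Series([11.0, 12.0, 13.0, 12.0, 11.0])

# Raw ranges barely overlap, so splits learned on `train` say little about `valid`
print(train.min(), train.max(), valid.min(), valid.max())

# After the transformation both sets live on a similar scale
train_delta = relative_delta(train).dropna()
valid_delta = relative_delta(valid).dropna()
print(train_delta.round(3).tolist())
print(valid_delta.round(3).tolist())
```

In this toy case every validation delta falls inside the range of the training deltas, so cuts learned on the training set remain meaningful. Whether that holds on real data depends on how the series actually behaves.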

@rpcoelho17

rpcoelho17 commented Apr 19, 2019

I'm wondering how to use "has_time=True". How do I tell CatBoost which column is my Date column? I saw that there is a way to build a "Data format description" file, but how do you pass this to the algorithm, and does it use it to improve the results? Could you give an example of how to implement this when one of your columns is a pandas Date type and has_time=True? (I also already have columns that explode the dates into their components.)
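
As far as the CatBoost documentation describes it, `has_time=True` tells the library to keep the rows in their input order instead of randomly permuting them, so one way to use it is to sort the frame by the date column yourself before training. Below is a minimal sketch under that assumption. The column names `date` and `target` and both helper names are hypothetical, and the date column is dropped from the features because its components are assumed to be separate columns already.

```python
import pandas as pd


def sort_for_has_time(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Return a copy of `df` ordered oldest-first by `date_col`."""
    return df.sort_values(date_col, kind="stable").reset_index(drop=True)


def fit_in_time_order(df: pd.DataFrame, date_col: str = "date", target_col: str = "target"):
    """Fit a CatBoost regressor on rows kept in chronological order."""
    # Imported here so the sorting helper works without catboost installed
    from catboost import CatBoostRegressor

    ordered = sort_for_has_time(df, date_col)
    features = ordered.drop(columns=[date_col, target_col])
    # has_time=True: use the row order as given, no random permutations
    model = CatBoostRegressor(has_time=True, iterations=50, verbose=False)
    model.fit(features, ordered[target_col])
    return model


# Made-up data, deliberately out of order
raw = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-04-03", "2019-04-01", "2019-04-02"]),
        "month": [4, 4, 4],
        "day": [3, 1, 2],
        "target": [3.0, 1.0, 2.0],
    }
)
ordered = sort_for_has_time(raw)
```

For data loaded from a file, the documentation also lists a `Timestamp` column type for the column description file, which is used to determine the order of rows. Whether that applies to your setup is worth checking against the docs for your CatBoost version.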
