Suggestion for Catboost Tutorial #1117

Open
dennislamcv1 opened this issue Dec 20, 2019 · 8 comments

@dennislamcv1

Problem: Request for a CatBoost tutorial for regression problems
catboost version: Any version
Operating System: Windows
CPU: i7

GPU: None

Hi Yandex, I am currently learning how to use CatBoost for ML projects. I would love to have a tutorial on regression problems that uses a real dataset with a mixture of categorical and numerical features.

Please do not use generic datasets such as Boston Housing. You could pick a suitable dataset from Kaggle or a similar source as an example.

The tutorial should be geared toward first-time users, so comments and guidance in the Jupyter Notebook would be helpful.

Thanks much.

@annaveronika
Contributor

This is a great idea for contributions!

@dvddn

dvddn commented Feb 19, 2020

Is this still relevant?
I'd love to contribute.

@annaveronika
Contributor

Yes, tutorials are always welcome! We have tutorials for classification problems, but I don't think we have one for regression. So please, contribute!

@dennislamcv1
Author

How do I upload Jupyter notebooks in this thread?

@annaveronika
Contributor

You could make a tutorial with Boston out of this battle for regression.
I looked briefly at the tutorial; it needs some fixes:

'metric_period':200,
'od_type':"Iter",
'od_wait':20,

This is a weird combination. It means the metrics are calculated only on every 200th iteration; you probably wanted to print them every 200 iterations instead. To do that, use verbose=200.

Actually, you can see that this is not something you should use, because there is a warning:
"Warning: Overfitting detector is active, thus evaluation metric is calculated
on every iteration. 'metric_period' is ignored for evaluation metric."
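
A minimal sketch of that change, assuming the settings are kept in a plain dict and passed to `CatBoostRegressor`. The names `posted_params`, `suggested_params`, and `build_regressor` are made up for illustration and are not from the original notebook:

```python
# Settings as quoted above: metric_period combined with the overfitting detector.
posted_params = {
    "metric_period": 200,
    "od_type": "Iter",
    "od_wait": 20,
}

# Keep the overfitting detector, drop metric_period, and use verbose to
# control only how often progress is printed.
suggested_params = {k: v for k, v in posted_params.items() if k != "metric_period"}
suggested_params["verbose"] = 200


def build_regressor(params):
    """Create a CatBoostRegressor from a params dict (hypothetical helper)."""
    # Imported here so the dict handling above runs even without catboost installed.
    from catboost import CatBoostRegressor

    return CatBoostRegressor(**params)
```

With this, `build_regressor(suggested_params)` should still stop early after 20 iterations without improvement, while the log is printed only every 200 iterations.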

'loss_function':'RMSE',
'eval_metric':'RMSE',

If loss_function is RMSE, eval_metric is RMSE by default, so you don't need to set it.

'learning_rate':0.001,
'depth':3

I'm not sure why this combination is used. I would suggest first training with default parameters, then checking whether there is overfitting or underfitting, and adjusting learning_rate accordingly.
After that you can try changing other parameters and see if it improves the quality.
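
One rough way to follow this workflow is sketched below. It assumes a held-out validation set is available, and the helper names (`fit_baseline`, `learning_rate_hint`) and the 10% / 90% thresholds are arbitrary choices for illustration, not CatBoost rules:

```python
def fit_baseline(X_train, y_train, X_val, y_val, cat_features=None, iterations=1000):
    """Train with (otherwise) default parameters and return the best iteration.

    1000 is CatBoost's default number of iterations; it is passed explicitly
    only so the caller knows the total when interpreting the result.
    """
    from catboost import CatBoostRegressor  # requires catboost to be installed

    model = CatBoostRegressor(iterations=iterations, verbose=200)
    model.fit(X_train, y_train, eval_set=(X_val, y_val), cat_features=cat_features)
    return model.get_best_iteration()


def learning_rate_hint(best_iteration, iterations, early_fraction=0.1, late_fraction=0.9):
    """Heuristic reading of the best iteration (thresholds are assumptions)."""
    if best_iteration >= late_fraction * iterations:
        # Validation error was still improving near the end: likely underfitting.
        return "increase"
    if best_iteration <= early_fraction * iterations:
        # Validation error bottomed out very early: likely overfitting.
        return "decrease"
    return "keep"
```

For example, `learning_rate_hint(fit_baseline(X_train, y_train, X_val, y_val), 1000)` gives a first hint about which direction to move `learning_rate` before touching other parameters such as `depth`.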

And for regression it might be useful to experiment with CTR settings, for example set TargetBorderCount to 2 or 3 instead of 1.
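
As a sketch of what such an experiment could look like: the snippet below builds `simple_ctr` descriptions with different `TargetBorderCount` values. It assumes the `Borders` CTR type and the `CtrType:TargetBorderCount=N` string form of the `simple_ctr` parameter; the helper name is hypothetical, so please check the CTR settings section of the docs for your version:

```python
def simple_ctr_with_target_borders(target_border_count, ctr_type="Borders"):
    """Build a simple_ctr value with the given TargetBorderCount (hypothetical helper)."""
    if target_border_count < 1:
        raise ValueError("target_border_count must be at least 1")
    return [f"{ctr_type}:TargetBorderCount={target_border_count}"]


# Candidate settings to compare on a validation set: 1, 2, and 3 target borders.
ctr_candidates = {n: simple_ctr_with_target_borders(n) for n in (1, 2, 3)}
```

Each candidate could then be tried with something like `CatBoostRegressor(simple_ctr=ctr_candidates[2], verbose=200)` and compared on the same validation data.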

@dennislamcv1
Author

OK, noted with thanks. This is the file the instructor passed to me.
