
Benchmarks for model builders from XGBoost and LightGBM models #36


Merged
18 commits merged into IntelPython:master on Oct 9, 2020

Conversation

RukhovichIV

No description provided.

RukhovichIV (Author) commented Oct 2, 2020

@PetrovKP, @Alexsandruss, @ShvetsKS,
Added LightGBM and XGBoost benchmarks as a new "lib".
I tried to use common functions from the other libs (e.g. from ./xgboost/bench.py), but they aren't available from the running scope, so I made a new file (./modelbuilders/bench.py) consisting of copied, slightly shortened versions of the minimum required functions from ./xgboost/bench.py. I only added my own get_accuracy() function, because it's much shorter than what I found there.
I also pushed an MR with new configs to our repo; everything works great together.
I also added the single-precision-histogram and enable-experimental-json-serialization parameters to the XGBoost benchmarks.
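
For context, here is a rough sketch of what a short get_accuracy() helper like the one mentioned above could look like. The task-based metric switch is my assumption for illustration, not necessarily the exact logic added in this PR:

```python
import numpy as np


def get_accuracy(y_true, y_pred, task='classification'):
    # Hypothetical helper: percentage of exact matches for
    # classification, RMSE for regression. The function actually
    # added in the PR may differ.
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    if task == 'classification':
        return 100.0 * float(np.mean(y_true == y_pred))
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```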

.gitignore Outdated
@@ -11,3 +11,4 @@ __work*
# Datasets
dataset
*.csv
*.npy

Contributor

Add a newline at EOF.

Author

done

@@ -0,0 +1,509 @@
import argparse

Contributor

Copyright header?

Author

added

'lgbm_predict', 'lgbm_to_daal', 'daal_compute'],
times=[t_creat_train, t_creat_test, t_train, t_lgbm_pred, t_trans, t_daal_pred],
accuracy_type=metric_name, accuracies=[0, 0, train_metric, test_metric_xgb, 0, test_metric_daal],
data=[X_train, X_test, X_train, X_test, X_train, X_test])

Contributor

Add a newline at EOF.

Author

added too
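
For readers following along, a minimal sketch of what a print_output() helper with the call signature shown above could look like; the JSON record layout here is an assumption, and the real helper lives in the benchmark's bench.py:

```python
import json


def print_output(library, algorithm, stages, times, accuracy_type,
                 accuracies, data):
    # Sketch: emit one JSON record per benchmark stage, pairing each
    # stage name with its measured time and accuracy. The helper in
    # the PR may report additional fields.
    records = []
    for stage, t, acc, arr in zip(stages, times, accuracies, data):
        records.append({
            'library': library,
            'algorithm': algorithm,
            'stage': stage,
            'time[s]': t,
            accuracy_type: acc,
            'input_shape': list(getattr(arr, 'shape', ())),
        })
    print(json.dumps(records, indent=4))
```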

import json


def columnwise_score(y, yp, score_func):

Contributor

The bench.py file should be similar in all folders (sklearn, daal4py, etc.).

Author

Thank you. Added the same bench.py as in the other folders.
I also added a utils.py file with the functions that I use in both of my benchmarks.
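
As a reference for what a shared bench.py provides, here is a minimal sketch of a columnwise_score() like the one in the snippet above, assuming it averages a metric over output columns (the real implementation may differ):

```python
import numpy as np


def columnwise_score(y, yp, score_func):
    # Sketch: score each output column separately and average the
    # results; 1-D targets are scored directly.
    y = np.asarray(y)
    yp = np.asarray(yp)
    if y.ndim == 1:
        return score_func(y, yp)
    scores = [score_func(y[:, i], yp[:, i]) for i in range(y.shape[1])]
    return float(np.mean(scores))
```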

'lgbm_predict', 'lgbm_to_daal', 'daal_compute'],
times=[t_creat_train, t_creat_test, t_train, t_lgbm_pred, t_trans, t_daal_pred],
accuracy_type=metric_name, accuracies=[0, 0, train_metric, test_metric_xgb, 0, test_metric_daal],
data=[X_train, X_test, X_train, X_test, X_train, X_test])

Contributor

Add newline

Author

done


print_output(library='modelbuilders', algorithm=f'xgboost_{task}_and_modelbuilder',
stages=['xgb_train_dmatrix_create', 'xgb_test_dmatrix_create', 'xgb_training', 'xgb_prediction',
'xgb_to_daal_conv', 'daal_prediction'],

Contributor

Use flake8 to correct formatting:
pip/conda install flake8
flake8 <folder or file names>
You can ignore the 'line too long' warning if the line is complicated.

Author

Done for every file that I added or changed.

Author

I also used autopep8 formatting, so every line is at most 100 characters now.

help='Count DMatrix creation in time measurements')
parser.add_argument('--single-precision-histogram', default=False, action='store_true',
help='Build histograms instead of double precision')
parser.add_argument('--enable-experimental-json-serialization', default=True,

Contributor

default=False is better if this feature affects performance.

Author

It's true by default in XGBoost, so, as we discussed, I decided to leave it as is.
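
Since the flag keeps XGBoost's default of True, it cannot be a plain store_true switch; the value has to be parsed from the command line. A minimal sketch of one way to wire both flags (the str2bool helper and help strings are my assumptions, not the PR's exact code):

```python
import argparse


def str2bool(value):
    # Hypothetical parser for an explicit boolean argument.
    return str(value).lower() in ('true', '1', 'yes')


parser = argparse.ArgumentParser()
parser.add_argument('--single-precision-histogram', default=False,
                    action='store_true',
                    help='Use single precision to build histograms')
parser.add_argument('--enable-experimental-json-serialization',
                    default=True, type=str2bool, metavar='BOOL',
                    help='Use experimental JSON serialization '
                         '(True by default, matching XGBoost)')

# Example: override only the histogram precision flag.
args = parser.parse_args(['--single-precision-histogram'])

# Forward both values into the XGBoost parameter dict.
xgb_params = {
    'single_precision_histogram': args.single_precision_histogram,
    'enable_experimental_json_serialization':
        args.enable_experimental_json_serialization,
}
print(xgb_params)
```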

Alexsandruss merged commit 296a991 into IntelPython:master on Oct 9, 2020.