Opinion on LightGBM scalability #1225

Closed
Laurae2 opened this issue Jan 30, 2018 · 6 comments

Laurae2 (Contributor) commented Jan 30, 2018

@guolinke What's your opinion on LightGBM's current scalability? LightGBM compiled with Visual Studio behaves strangely without CPU affinity: it runs faster than when CPU affinity is set up.

https://medium.com/@Laurae2/getting-the-most-of-xgboost-and-lightgbm-speed-compiler-cpu-pinning-374c38d82b86

I ran some tests using R, and Visual Studio is still the best compiler for LightGBM, as we saw before.

[image: benchmark results]

Note that when the max depth/number of leaves is increased to a large value, training takes forever with MinGW.

Does Visual Studio have optimizations on Windows that make it perform better without CPU pinning/affinity? It seems strange that threads roaming across cores are faster than pinned ones.

[image: benchmark results]
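
For anyone who wants to poke at this, a minimal sketch of the kind of timing run I mean (not my actual benchmark harness; it assumes the LightGBM Python package and synthetic data, and uses the standard OpenMP variables `OMP_PROC_BIND`/`OMP_PLACES`, which govern pinning of LightGBM's OpenMP threads):

```python
# Pinned-vs-roaming timing sketch (hypothetical harness, not the code behind
# the charts above). The OpenMP variables must be set before the OpenMP
# runtime loads, i.e. before importing lightgbm.
import os

os.environ["OMP_PROC_BIND"] = "true"   # "false" for the roaming (unpinned) run
os.environ["OMP_PLACES"] = "cores"

import time
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(42)
X = rng.standard_normal((200_000, 100))                        # dense toy data
y = (X[:, 0] + 0.5 * rng.standard_normal(200_000) > 0).astype(int)

train_set = lgb.Dataset(X, label=y)
params = {"objective": "binary", "num_leaves": 255, "verbosity": -1}

start = time.perf_counter()
lgb.train(params, train_set, num_boost_round=50)
print(f"50 rounds in {time.perf_counter() - start:.1f}s")
```

Running it twice, once per `OMP_PROC_BIND` value, gives the two timings to compare.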

guolinke (Collaborator) commented:

@Laurae2 It seems you are using a very sparse dataset.
Do you have results on a dense dataset?

Laurae2 (Contributor, Author) commented Jan 30, 2018

@guolinke Do you have any large dense dataset with enough features (something like 1M rows × 1K features, or smaller)?

Higgs has too few features.

guolinke (Collaborator) commented Jan 30, 2018

@Laurae2 I think you can generate some new features in the Higgs dataset, e.g. the sum/difference/product/quotient of pairs of features.
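
Something along these lines (an illustrative sketch only; `pairwise_products` is a made-up helper, and the real Higgs dataset has 28 features and about 11M rows, so the expanded matrix becomes very wide):

```python
# Illustrative sketch: widen a dense matrix with the pairwise products of
# its columns (the "mul" case; sum/sub/div are analogous). With d original
# features this adds d*(d-1)/2 new columns.
from itertools import combinations

import numpy as np

def pairwise_products(X: np.ndarray) -> np.ndarray:
    """Hypothetical helper, not part of LightGBM."""
    products = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + products)

X = np.random.rand(1_000, 28)   # stand-in for the 28 Higgs features
X_wide = pairwise_products(X)
print(X_wide.shape)             # (1000, 406): 28 original + 378 products
```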

Laurae2 (Contributor, Author) commented Feb 3, 2018

I started a run today using the products of all feature pairs, a 33.3 GB matrix. I will come back later with results.

Laurae2 (Contributor, Author) commented Feb 3, 2018

@guolinke The difference is huge with CPU pinning on the 33.3 GB Higgs matrix.

[image: benchmark results]

guolinke (Collaborator) commented Feb 4, 2018

@Laurae2 So for a dense dataset, the pinned solution is better.
I guess the reason is memory bandwidth (and latency): on dense data, LightGBM needs high memory bandwidth, so pinning threads helps, while on sparse data memory bandwidth is not the bottleneck.
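
As a rough back-of-envelope supporting this (assumed numbers, not measurements): LightGBM bins features into at most 255 bins by default, so the binned dense matrix is about one byte per value, and histogram construction has to stream most of it at every tree level.

```python
# Back-of-envelope for the bandwidth argument (an assumed cost model, not a
# measurement): dense histogram construction reads roughly one binned byte
# per (sample, feature) at each level of the tree.
n_samples = 11_000_000   # HIGGS rows
n_features = 406         # after the pairwise-product expansion above
bytes_per_value = 1      # default max_bin=255 fits in a uint8 bin index
levels = 8               # e.g. trees grown to depth 8

gb_per_tree = n_samples * n_features * bytes_per_value * levels / 1e9
print(f"~{gb_per_tree:.0f} GB streamed per tree")   # ~36 GB
```

At tens of GB streamed per tree, sustained memory bandwidth dominates and pinning keeps each thread's data local, whereas sparse data touches far fewer bytes, so bandwidth stops being the bottleneck.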

Laurae2 closed this as completed Feb 6, 2018
lock bot locked as resolved and limited conversation to collaborators Mar 12, 2020