some questions #1

Closed
guolinke opened this issue May 14, 2017 · 42 comments

Comments

@guolinke

  1. I see the result of xgboost is 400 iterations, but LightGBM is 500? Is this expected?
  2. Is the accuracy result taken from the last iteration, or the best over all iterations?

I find the accuracy on Bosch has quite a big gap. Do you know why? The two-round finding for NAs?

BTW, when I run it in the CLI version, the gap does not seem so big.

@Laurae2
Owner

Laurae2 commented May 14, 2017

> I see the result of xgboost is 400 iterations, but LightGBM is 500? Is this expected?

The number of iterations differs depending on the hyperparameters used. Results must be taken with caution.

> Is the accuracy result taken from the last iteration, or the best over all iterations?

The best iteration is used.

> I find the accuracy on Bosch has quite a big gap. Do you know why? The two-round finding for NAs?

I think NAs are the reason for such a difference. I have some preprocessing that sets them to 0 for both xgboost and LightGBM: all values are pushed to be positive except NAs, and then everything is converted to sparse matrices.
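A minimal R sketch of that kind of preprocessing (an illustration only, not the exact script used for these benchmarks; the shift-by-one and the toy data are assumptions):

```r
library(Matrix)

# Toy data: two numeric features containing missing values.
df <- data.frame(f1 = c(-2, 0, 3, NA), f2 = c(NA, 1, -1, 5))

# Shift each feature so every observed value is strictly positive,
# then set NAs to 0 so they fall into the implicit sparse "zero" slot.
for (j in seq_along(df)) {
  shift <- min(df[[j]], na.rm = TRUE) - 1
  df[[j]] <- df[[j]] - shift        # observed values are now >= 1
  df[[j]][is.na(df[[j]])] <- 0      # NAs become 0
}

# Convert to a sparse matrix usable by both xgboost and LightGBM.
train_sparse <- Matrix(as.matrix(df), sparse = TRUE)
```

After this step, neither library can distinguish "missing" from the sparse zero value, which is why library-side NA handling changes the metric.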

As there are many NAs/0s in both Bosch and Higgs in my case, not handling NAs per feature on both sides hurts the metric.

When I presented my results, I expected LightGBM would do better on Higgs (because it is a synthetic dataset), but my preprocessing hurt its performance, while xgboost is able to use the preprocessing to get an edge. On Higgs, LightGBM converged too fast and could not use the extra NA information xgboost can use.

> BTW, when I run it in the CLI version, the gap does not seem so big.

I have some preprocessing for NAs which hurts the performance.

@Laurae2
Owner

Laurae2 commented May 14, 2017

As the trees grow deeper, making use of NAs becomes essential.

You can see the issue in the convergence table below:

[image]

Full pic: [image]

N.B.: the parameters reported are the indexes you can find under "Hyperparameters used" here: https://sites.google.com/view/lauraepp/benchmarks

@guolinke
Author

@Laurae2
OK, I see. NA support is important.
Our goal is to make a fast and highly accurate GBDT tool. Thanks for your benchmark 👍

@Laurae2
Owner

Laurae2 commented May 14, 2017

@guolinke I will re-run benchmarks later with:

  • Exact xgboost
  • New fast histogram xgboost (75% faster than previous version)
  • LightGBM (I'm still thinking about which version to use, or if I wait for NA implementation in LightGBM)

Setup I will use:

  • i7-7700K overclocked to 5 GHz, for 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 threads on Bosch
  • Dual Xeon, 10 cores each (20 cores total), for 1 + 2 + 3 + 4 + 5 + 10 + 20 + 40 threads on Bosch

I will give up on Higgs because you told me in microsoft/LightGBM#512 that LightGBM parallelizes over columns. I got a similar issue in xgboost.

For all of the runs I will use -O3 -mtune=native for maximum performance. I will do xgboost first, then LightGBM.

@guolinke
Author

@Laurae2 can you update the accuracy results, since LightGBM is capable of missing value handling now?

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke Do you know which exact commit you want me to use for the benchmarks? My current xgboost benchmarks will end next week, and then I will be able to do full runs on LightGBM (ETA: 1 week for full runs).

Some results below:

Bosch with simple test:

| Algorithm | Time (s) | Perf (AUC) | Best Iteration |
|---|---|---|---|
| xgboost depthwise | 662.934 | 0.7194800 | 603 |
| xgboost lossguide | 656.224 | 0.7194800 | 603 |
| LightGBM master | 720.430 | 0.7176366 | 427 |
| LightGBM v2 | 673.870 | 0.7147095 | 763 |
| LightGBM v1 | 642.500 | 0.7167827 | 716 |

Setup:

  • CPU: i7-4600U
  • RAM: 16GB 1600MHz (2x 8GB)
  • OS: Windows 8.1 Pro
  • R: compiled with MinGW/gcc 7.1
  • Optimization: -O2 -mtune=native (Haswell)
  • No callbacks used (could be a source of timing issues; fixing this and coming back later with more results...)

Parameters:

  • Depth = 6
  • Leaves = 63
  • Bins = 255
  • Gamma = 1
  • Hessian = 1

LightGBM run:

# SET YOUR WORKING DIRECTORY
library(R.utils)
setwd("D:/Data Science/Bosch_mini")

library(lightgbm)
train <- lgb.Dataset("bosch_train_lgb.data")
test <- lgb.Dataset("bosch_test_lgb.data")

gc(verbose = FALSE)
set.seed(11111)
Laurae::timer_func_print({temp_model <- lgb.train(params = list(num_threads = 4,
                                                                learning_rate = 0.02,
                                                                max_depth = 6,
                                                                num_leaves = 63,
                                                                max_bin = 255,
                                                                min_gain_to_split = 1,
                                                                min_sum_hessian_in_leaf = 1,
                                                                min_data_in_leaf = 1,
                                                                bin_construct_sample_cnt = 1000000L),
                                                  data = train,
                                                  nrounds = 800,
                                                  valids = list(test = test),
                                                  objective = "binary",
                                                  metric = "auc",
                                                  verbose = 2)})
library(data.table)
perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
max(perf)
which.max(perf)

xgboost run:

# SET YOUR WORKING DIRECTORY
library(R.utils)
setwd("D:/Data Science/Bosch_mini")

library(xgboost)
train <- xgb.DMatrix("bosch_train_xgb.data")
test <- xgb.DMatrix("bosch_test_xgb.data")

gc(verbose = FALSE)
set.seed(11111)
Laurae::timer_func_print({temp_model <- xgb.train(params = list(nthread = 4,
                                                                eta = 0.02,
                                                                max_depth = 6,
                                                                max_leaves = 63,
                                                                max_bin = 255,
                                                                gamma = 1,
                                                                min_child_weight = 1,
                                                                objective = "binary:logistic",
                                                                booster = "gbtree",
                                                                tree_method = "hist",
                                                                grow_policy = "lossguide"),
                                                  data = train,
                                                  watchlist = list(test = test),
                                                  eval_metric = "auc",
                                                  nrounds = 800,
                                                  verbose = 2)})
max(temp_model$evaluation_log$test_auc)
which.max(temp_model$evaluation_log$test_auc)

Raw log:

xgboost fast histogram depthwise:

[14:35:40] amalgamation/../src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 0 pruned nodes, max_depth=6
[800]	test-auc:0.719032 
The function ran in 662934.079 milliseconds.
[1] 662934.1
> max(temp_model$evaluation_log$test_auc)
[1] 0.71948
> which.max(temp_model$evaluation_log$test_auc)
[1] 603

xgboost fast histogram lossguide:

[14:49:44] amalgamation/../src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 0 pruned nodes, max_depth=6
[800]	test-auc:0.719032 
The function ran in 656224.842 milliseconds.
[1] 656224.8
> max(temp_model$evaluation_log$test_auc)
[1] 0.71948
> which.max(temp_model$evaluation_log$test_auc)
[1] 603

LightGBM master:

devtools::install_github("Microsoft/LightGBM/R-package")
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=19 and max_depth=6
[800]:	test's auc:0.716801 
The function ran in 720430.291 milliseconds.
[1] 720430.3
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7176366
> which.max(perf)
[1] 427

LightGBM v2.0:

devtools::install_github("Microsoft/LightGBM/R-package@v2.0")
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=18 and max_depth=6
[800]:	test's auc:0.714139 
The function ran in 673870.612 milliseconds.
[1] 673870.6
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7147095
> which.max(perf)
[1] 763

LightGBM v1.0:

devtools::install_github("Microsoft/LightGBM/R-package@v1")
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 24
[800]:	test's auc:0.716018 
The function ran in 642500.495 milliseconds.
[1] 642500.5
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7167827
> which.max(perf)
[1] 716

@guolinke
Author

@Laurae2 strange, I would expect master to be much faster than v2.0 and v1.
Can you try constructing the dataset first, via train$construct()?

@guolinke
Author

@Laurae2
XGBoost constructs the dataset when the DMatrix is created. But LightGBM uses lazy initialization: the dataset is created when lgb.train is called.
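A sketch of forcing eager construction so a timer around lgb.train measures training only (toy data and variable names are assumptions; construct() is the R6 method on lgb.Dataset):

```r
library(lightgbm)
library(Matrix)

set.seed(1)
X <- Matrix(matrix(rnorm(1000), nrow = 100), sparse = TRUE)
y <- rbinom(100, 1, 0.5)

# lgb.Dataset is lazy: binning normally happens inside lgb.train.
train <- lgb.Dataset(data = X, label = y, free_raw_data = FALSE)

# Force the binned dataset to be built now, mirroring what
# xgb.DMatrix does at creation time for xgboost.
train$construct()
```

With this, a wall-clock timer around lgb.train no longer includes the binning pass.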

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke For LightGBM I created the binned dataset before calling lgb.train (I use a dummy training so the next one starts immediately).

For xgboost, I can't control when the binned dataset is created (for fast histogram, it is created every time xgb.train runs). xgb.DMatrix is slow because the generated dataset is 7x larger than LightGBM's...

[image]

I will re-run with your train$construct() and use a custom callback for AUC. The results are weird.

@guolinke
Author

@Laurae2 another issue is eval_metric = "auc". The AUC metric is slow because it needs to be evaluated in a single thread. XGBoost uses std::parallel_sort to speed this up. (I will also add this soon.)
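For intuition on why AUC is costly: it reduces to a rank statistic, so the dominant work is sorting/ranking all predictions, which is exactly what a parallel sort accelerates. A minimal single-threaded R version (an illustration, not either library's implementation):

```r
# AUC via the rank-sum (Mann-Whitney) formulation:
# the expensive step is rank(), i.e. a sort over all predictions.
auc <- function(label, pred) {
  r <- rank(pred)                    # O(n log n), single-threaded here
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))  # 0.75
```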

@guolinke
Author

guolinke commented May 26, 2017

@Laurae2 Did you re-create the lgb.Dataset with the master branch? Its Dataset structure is very different from v1 and v2.0.
You cannot use dataset binaries generated by v1 or v2.0 to benchmark the master branch now...

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke I recreate new datasets every time I change the xgboost/LightGBM version. LightGBM v2 datasets cause crashes with LightGBM v1.

I will update soon with:

  • xgboost 2017-05-02: https://github.com/Laurae2/ez_xgb (depthwise + lossguide; strangely, they lead to identical performance when the depth is too large, and lossguide is better with a maximum depth of 5)
  • LightGBM v1 (Microsoft/LightGBM@v1)
  • LightGBM v2.0 (Microsoft/LightGBM@v2.0)
  • LightGBM current master branch (microsoft/LightGBM@97ca38d)

xgboost depthwise vs lossguide AUC strangeness below: https://public.tableau.com/views/gbt_benchmarks/AUC-Data?:showVizHome=no

[image]

@guolinke
Author

@Laurae2
In my benchmarks, master is about 50% faster than v2.0 on the Bosch dataset...

BTW, can you also try without setting bin_construct_sample_cnt? In my experiments, setting it to a large value causes a loss of accuracy and is also much slower...

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke Using $construct() seems to lead to entirely different datasets, interesting (the number of bins changes, and performance after one iteration using MSE also changes significantly).

I will follow your recommendations when I test LightGBM on the larger tests.

So far, for the short test (should be done within the hour), I will do this for LightGBM:

  • Constructed dataset before training starts
  • Use bin_construct_sample_cnt = 100000

By the way, do you know how to override LightGBM's feature selection (bypassing feature removal during training)? I would like to keep the number of features constant when the data is sparse, for a new large test I would like to add (about 10,000 selected features). Or should I let LightGBM choose the features?

It seems I can reproduce the number of bins / features selected, so I think this is not too much of an issue.

@guolinke
Author

@Laurae2 I think the construct may have some bugs...

It is hard to override LightGBM's feature selection; there is a lot of related code.
But I don't think it is an issue.

@guolinke
Author

@Laurae2
can you try using lgb.Dataset.create.valid to create the validation data?
Also, can you use data/higgs_sparse.rds to re-generate the lgb.Dataset here?
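A sketch of the lgb.Dataset.create.valid usage being suggested (toy data and names are assumptions); it binds the validation data to the training set's bin mapping:

```r
library(lightgbm)
library(Matrix)

set.seed(1)
X <- Matrix(matrix(rnorm(2000), nrow = 200), sparse = TRUE)
y <- rbinom(200, 1, 0.5)

train <- lgb.Dataset(data = X[1:150, ], label = y[1:150])

# Validation data must reuse the training Dataset's bin boundaries;
# lgb.Dataset.create.valid ties the two together.
test <- lgb.Dataset.create.valid(train, data = X[151:200, ], label = y[151:200])
```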

@guolinke
Author

guolinke commented May 26, 2017

@Laurae2 I am also worried about the CPU usage of the MinGW version.
I think we can run the small benchmark first, and run the large one when these issues are fixed.

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke -O3 optimization makes the LightGBM master dataset load crash. I'll create an issue on Microsoft/LightGBM about it when I find a way to reproduce it with a smaller dataset. I'll keep using -O2 for this small test.

With MinGW + LightGBM I'm at 90% CPU; with MinGW + xgboost, CPU usage is slightly lower, at around 80%.

@guolinke
Author

@Laurae2 OK. I will paste some of my test results tomorrow (about 12 hours from now).

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke is free_raw_data = FALSE OK to use? I can't train a model when using construct without free_raw_data = FALSE.

@guolinke
Author

@Laurae2 it is ok to use.

@Laurae2
Owner

Laurae2 commented May 26, 2017

New results, using -O2 because LightGBM master dataset load crashes with -O3:

NEW (custom callback):

| Algorithm | Time (s) | Perf (AUC) | Best Iteration |
|---|---|---|---|
| xgboost depthwise | 694.043 | 0.7194797 | 603 |
| xgboost lossguide | 696.006 | 0.7194797 | 603 |
| LightGBM master | 655.582 | 0.7180479 | 796 |
| LightGBM v2 | 675.591 | 0.7146346 | 636 |
| LightGBM v1 | 670.116 | 0.7167827 | 716 |

OLD (no custom callback):

| Algorithm | Time (s) | Perf (AUC) | Best Iteration |
|---|---|---|---|
| xgboost depthwise | 662.934 | 0.7194800 | 603 |
| xgboost lossguide | 656.224 | 0.7194800 | 603 |
| LightGBM master | 720.430 | 0.7176366 | 427 |
| LightGBM v2 | 673.870 | 0.7147095 | 763 |
| LightGBM v1 | 642.500 | 0.7167827 | 716 |

xgboost depthwise:

[18:30:30] amalgamation/../src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 0 pruned nodes, max_depth=6
[800]	test-auc:0.719032 
The function ran in 694043.139 milliseconds.
[1] 694043.1
> max(temp_model$evaluation_log$test_auc)
[1] 0.7194797
> which.max(temp_model$evaluation_log$test_auc)
[1] 603

xgboost lossguide:

[18:17:27] amalgamation/../src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 0 pruned nodes, max_depth=6
[800]	test-auc:0.719032 
The function ran in 696006.274 milliseconds.
[1] 696006.3
> max(temp_model$evaluation_log$test_auc)
[1] 0.7194797
> which.max(temp_model$evaluation_log$test_auc)
[1] 603

LightGBM v1:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 24
[800]:	test's auc:0.716018 
The function ran in 670116.550 milliseconds.
[1] 670116.5
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7167827
> which.max(perf)
[1] 716

LightGBM v2:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 19
[800]:	test's auc:0.71364 
The function ran in 675590.982 milliseconds.
[1] 675591
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7146346
> which.max(perf)
[1] 636

LightGBM master:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=23 and max_depth=6
[800]:	test's auc:0.717726 
The function ran in 655582.692 milliseconds.
[1] 655582.7
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7180479
> which.max(perf)
[1] 796

@guolinke
Author

@Laurae2 why does the best iteration change so much in LightGBM?

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke I think it is because the dataset becomes quite different when using bin_construct_sample_cnt = 100000 instead of bin_construct_sample_cnt = 1000000.
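Why the sample count matters: bin boundaries are estimated from a sample of each feature's values, so different bin_construct_sample_cnt settings yield slightly different bin edges, hence different split candidates and a different convergence path. A plain quantile-binning illustration (not LightGBM's exact binning algorithm):

```r
set.seed(42)
x <- rexp(1e6)  # a skewed feature

# Estimate 15 bin edges from a 100k sample vs. the full feature.
edges_sampled <- quantile(sample(x, 1e5), probs = seq(0, 1, length.out = 16))
edges_full    <- quantile(x, probs = seq(0, 1, length.out = 16))

# The edges disagree slightly, so some values land in different bins
# depending on how the boundaries were estimated.
max(abs(edges_sampled - edges_full))
```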

@guolinke
Author

OK.
I will try using the CLI version to compare v2.0 and master first.
Parameters: data=bosch.train num_threads=16 learning_rate=0.02 max_depth=6 num_leaves=63 min_gain_to_split=1 min_sum_hessian_in_leaf=1 min_data_in_leaf=1 num_trees=800 metric=binary_logloss test=bosch.test app=binary

v2.0 result:

[LightGBM] [Info] No further splits with positive gain, best gain: -1.#INF00
[LightGBM] [Info] Trained a tree with leaves=14 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0307989
[LightGBM] [Info] 406.397653 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master result:

[LightGBM] [Info] No further splits with positive gain, best gain: -1.#INF00
[LightGBM] [Info] Trained a tree with leaves=24 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0303121
[LightGBM] [Info] 164.233118 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master is far faster than v2.0.

@guolinke
Author

R version:

Script:

library(data.table)
library(Matrix)
library(lightgbm)
library(R.utils)

# Do xgboost / LightGBM
train_sparse <- readRDS(file = "../data/bosch_sparse.rds")
label <- readRDS(file = "../data/bosch_label.rds")

# Split
train_1 <- train_sparse[1:1000000, ]
train_2 <- label[1:1000000]
test_1 <- train_sparse[1000001:1183747, ]
test_2 <- label[1000001:1183747]

#library(sparsity)
#write.svmlight(train_1, train_2, "bosch.train")
#write.svmlight(test_1, test_2, "bosch.test")

# For LightGBM
train  <- lgb.Dataset(data = train_1, label = train_2)
test <- lgb.Dataset(data = test_1, label = test_2, reference=train)
train$construct()
test$construct()

Laurae::timer_func_print({temp_model <- lgb.train(params = list(num_threads = 16,
                                                                learning_rate = 0.02,
                                                                max_depth = 6,
                                                                num_leaves = 63,
                                                                max_bin = 255,
                                                                min_gain_to_split = 1,
                                                                min_sum_hessian_in_leaf = 1,
                                                                min_data_in_leaf = 1),
                                                  data = train,
                                                  nrounds = 800,
                                                  valids = list(test = test),
                                                  objective = "binary",
                                                  metric = "binary_logloss",
                                                  verbose = 2)})

library(data.table)
perf <- as.numeric(rbindlist(temp_model$record_evals$test$binary_logloss))
min(perf)
which.min(perf)

v2.0 result:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[800]:  test's binary_logloss:0.0308336
The function ran in 416997.480 milliseconds.
[1] 416997.5
> library(data.table)
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$binary_logloss))
> min(perf)
[1] 0.03083347
> which.min(perf)
[1] 797
>

master result:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=30 and max_depth=6
[800]:  test's binary_logloss:0.0303229
The function ran in 190090.959 milliseconds.
[1] 190091
>
> library(data.table)
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$binary_logloss))
> min(perf)
[1] 0.03031625
> which.min(perf)
[1] 730

@Laurae2
Owner

Laurae2 commented May 28, 2017

@guolinke When I get my bigger server with the 20 cores available, I will re-test.

@guolinke
Author

@Laurae2
You can also use log_loss as the metric, since the time cost of AUC is much larger.

@guolinke
Author

guolinke commented May 29, 2017

@Laurae2
Even when I run it on an i7-6700 machine, the master branch is still much faster than the v2.0 branch.
My version is based on MinGW as well... Not sure why your result is slow...
I guess maybe the dataset you created is still the v2.0 version.

@Laurae2
Owner

Laurae2 commented May 29, 2017

@guolinke weird, because I wipe the binary datasets and recreate them manually every time.

Note that I'm using a custom R installation with gcc 7.1, unlike the default R with gcc 4.9. I wonder if there are differences for LightGBM between gcc 7.1 and gcc 4.9.

@guolinke
Author

@Laurae2 Is your result on the master branch much faster than the v2.0 branch now?

@Laurae2
Owner

Laurae2 commented May 31, 2017

@guolinke I'll have access to my laptop in about 11h. At that time I will be able to check the speed of v2.0 and master. (I doubt my 20-core server will be available.)

@Laurae2
Owner

Laurae2 commented May 31, 2017

@guolinke I think MinGW is better for a low number of cores, while VS might be better (untested) with more cores. Using 4 threads on i7-4600U, no CPU throttling. I will test with more cores when my 20-core server is free.

| Algorithm | Time (s) |
|---|---|
| v2.0 CLI (O3) | 645.962 |
| v2.0 R (O2) | 673.568 |
| master CLI (O3) | 588.445 |
| master R (O2) | 607.209 |
| Visual Studio | 615.863 |

Configuration:

  • MinGW 7.1 (gcc 7.1)
  • Visual Studio 2017

v2.0 CLI (O3)

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=27 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.0305089
[LightGBM] [Info] 645.962530 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=36 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0305101
[LightGBM] [Info] 646.770656 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

v2.0 R (O2)

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[799]:	test's binary_logloss:0.0305097 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=20 and max_depth=6
[800]:	test's binary_logloss:0.0305099 
The function ran in 673568.532 milliseconds.
[1] 673568.5

master CLI

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.0304543
[LightGBM] [Info] 587.834080 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304542
[LightGBM] [Info] 588.445944 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master R

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=21 and max_depth=6
[799]:	test's binary_logloss:0.0304331 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=23 and max_depth=6
[800]:	test's binary_logloss:0.0304333 
The function ran in 607209.395 milliseconds.
[1] 607209.4

master CLI Visual Studio 2017

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.030431
[LightGBM] [Info] 615.327239 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=11 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304305
[LightGBM] [Info] 615.863466 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

@Laurae2
Copy link
Owner

Laurae2 commented Jun 1, 2017

2x 10 core CPU (Dual Xeon Ivy Bridge, 3.3/2.7 GHz), 40 threads:

master = microsoft/LightGBM@1d5867b

| Algorithm | Time (s) | Threads |
|---|---|---|
| v2.0 R (O2) | 244.542 | 40 |
| master R (O2) | 174.157 | 40 |
| master CLI (O3) | 164.336 | 40 |
| master CLI (O3) | 225.045 | 20 |
| Visual Studio | 139.214 | 40 |
| Visual Studio | 162.792 | 20 |

When using many cores, Visual Studio is significantly faster, but MinGW is not good.

So far, the master branch is really fast (though I don't see the wide gap you have). Core scaling is much better with the master branch.

v2.0 R (O2), CPU usage approx 25%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[799]:	test's binary_logloss:0.0305097 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=20 and max_depth=6
[800]:	test's binary_logloss:0.0305099 
The function ran in 244542.445 milliseconds.
[1] 244542.4

master R (O2): CPU usage approx 55%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=14 and max_depth=6
[799]:	test's binary_logloss:0.0304216 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=24 and max_depth=6
[800]:	test's binary_logloss:0.0304225 
The function ran in 174157.502 milliseconds.
[1] 174157.5

master CLI (O3) with 40 threads: CPU usage approx 45%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.0304543
[LightGBM] [Info] 164.173974 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304542
[LightGBM] [Info] 164.336424 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master CLI Visual Studio 2017 with 40 threads: CPU usage approx 100%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.030431
[LightGBM] [Info] 139.108287 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=11 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304305
[LightGBM] [Info] 139.214094 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master CLI (O3) with 20 threads: CPU usage approx 33%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.0304543
[LightGBM] [Info] 224.811707 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304542
[LightGBM] [Info] 225.045553 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master CLI Visual Studio 2017 with 20 threads: CPU usage approx 50%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.030431
[LightGBM] [Info] 162.647329 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=11 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304305
[LightGBM] [Info] 162.792439 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

@Laurae2
Owner

Laurae2 commented Jun 10, 2017

@guolinke Can I run the long tests on microsoft/LightGBM@3089f0b with Visual Studio 2017, or do you have a specific commit you would like me to benchmark? My full xgboost runs (including the exact method, which takes forever) are ending this week.

@guolinke
Author

@Laurae2
I think the latest, or microsoft/LightGBM@582ded5, is better.

@Laurae2
Owner

Laurae2 commented Jun 10, 2017

@guolinke I'll use microsoft/LightGBM@a8673bd (latest master branch) then.

@Laurae2
Owner

Laurae2 commented Jun 15, 2017

@guolinke One of my servers has finished running my benchmarks. I'll repost here when I get time to create a dashboard you will be able to explore (probably tomorrow).

@guolinke
Author

@Laurae2
Okay, thanks very much.

BTW, I find that the speed of LightGBM in a VM (Azure) is about 2x-3x slower than on a "real" machine when multi-threading, even with the same CPU.

@Laurae2
Owner

Laurae2 commented Jun 18, 2017

@guolinke Here for the new benchmarks, tested on i7-7700K and 20 core Xeon: https://sites.google.com/view/lauraepp/new-benchmarks

On VMs, I noticed that if the host machine is not rebooted frequently, CPU performance sinks. I remember a server I did not reboot for 1 year whose performance was 50% of the original; a simple reboot got it back to 100% (Cinebench R15 score of 350 instead of 700).

@guolinke
Author

@Laurae2
Thanks. The accuracy of LightGBM now seems much better.

@Laurae2
Owner

Laurae2 commented Jun 18, 2017

@guolinke I improved the chart layout if you want to check in more detail; I had put too many charts on single pages, so I have now separated them.

[image]
