some questions #1

Closed
guolinke opened this issue May 14, 2017 · 42 comments

Comments

@guolinke

  1. I see the result of xgboost is 400 iterations, but LightGBM is 500? Is this expected?
  2. Is the accuracy result taken from the last iteration, or the best over all iterations?

I find the accuracy on Bosch has quite a big gap. Do you know why? The two-round finding for NAs?

BTW, when I run it in the CLI version, the gap does not seem so big.

@Laurae2
Owner

Laurae2 commented May 14, 2017

> I see the result of xgboost is 400 iterations, but LightGBM is 500? Is this expected?

The number of iterations differs depending on the hyperparameters used. Results must be taken with caution.

> Is the accuracy result taken from the last iteration, or the best over all iterations?

The best iteration is used.

> I find the accuracy on Bosch has quite a big gap. Do you know why? The two-round finding for NAs?

I think NAs are the reason for such a difference. I have some preprocessing that sets them to 0 for both xgboost and LightGBM: all values are pushed to be positive except NAs, and then everything is converted to sparse matrices.
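A minimal R sketch of that kind of preprocessing (an illustration only, not the exact script used for these benchmarks; the shift-by-one and the toy data are assumptions):

```r
library(Matrix)

# Toy data: two numeric features containing missing values.
df <- data.frame(f1 = c(-2, 0, 3, NA), f2 = c(NA, 1, -1, 5))

# Shift each feature so every observed value is strictly positive,
# then set NAs to 0 so they fall into the implicit sparse "zero" slot.
for (j in seq_along(df)) {
  shift <- min(df[[j]], na.rm = TRUE) - 1
  df[[j]] <- df[[j]] - shift        # observed values are now >= 1
  df[[j]][is.na(df[[j]])] <- 0      # NAs become 0
}

# Convert to a sparse matrix usable by both xgboost and LightGBM.
train_sparse <- Matrix(as.matrix(df), sparse = TRUE)
```

After this step, neither library can distinguish "missing" from the sparse zero value, which is why library-side NA handling changes the metric.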

As there are many NAs/0s in both Bosch and Higgs in my case, not handling NAs per feature on both sides hurts the metric.

When I presented my results, I expected LightGBM would do better on Higgs (because it is a synthetic dataset), but my preprocessing hurt its performance, while xgboost is able to use the preprocessing to get an edge. On Higgs, LightGBM converged too fast and could not use the extra NA information xgboost can use.

> BTW, when I run it in the CLI version, the gap does not seem so big.

I have some preprocessing for NAs which hurts the performance.

@Laurae2
Owner

Laurae2 commented May 14, 2017

As the trees grow deeper, making use of NAs becomes essential.

You can see the issue in the convergence table below:

[image]

Full pic: [image]

N.B.: the parameters reported are the indexes you can find under "Hyperparameters used" here: https://sites.google.com/view/lauraepp/benchmarks

@guolinke
Author

@Laurae2
OK, I see. NA support is important.
Our goal is to make a fast and highly accurate GBDT tool. Thanks for your benchmark 👍

@Laurae2
Owner

Laurae2 commented May 14, 2017

@guolinke I will re-run benchmarks later with:

  • Exact xgboost
  • New fast histogram xgboost (75% faster than previous version)
  • LightGBM (I'm still thinking about which version to use, or if I wait for NA implementation in LightGBM)

Setup I will use:

  • i7-7700K overclocked to 5 GHz, for 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 threads on Bosch
  • Dual Xeon, 10 cores each (20 cores total), for 1 + 2 + 3 + 4 + 5 + 10 + 20 + 40 threads on Bosch

I will give up on Higgs because you told me in microsoft/LightGBM#512 that LightGBM parallelizes over columns. I got a similar issue in xgboost.

For all of the runs I will use -O3 -mtune=native for maximum performance. I will do xgboost first, then LightGBM.

@guolinke
Author

@Laurae2 can you update the accuracy results, since LightGBM is capable of missing value handling now?

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke Do you know which exact commit you want me to use for the benchmarks? My current xgboost benchmarks will end next week, and then I will be able to do full runs on LightGBM (ETA: 1 week for full runs).

Some results below:

Bosch with simple test:

| Algorithm | Time (s) | Perf (AUC) | Best Iteration |
|---|---|---|---|
| xgboost depthwise | 662.934 | 0.7194800 | 603 |
| xgboost lossguide | 656.224 | 0.7194800 | 603 |
| LightGBM master | 720.430 | 0.7176366 | 427 |
| LightGBM v2 | 673.870 | 0.7147095 | 763 |
| LightGBM v1 | 642.500 | 0.7167827 | 716 |

Setup:

  • CPU: i7-4600U
  • RAM: 16GB 1600MHz (2x 8GB)
  • OS: Windows 8.1 Pro
  • R: compiled with MinGW/gcc 7.1
  • Optimization: -O2 -mtune=native (Haswell)
  • No callbacks used (could be a source of timing issues; fixing this and coming back later with more results...)

Parameters:

  • Depth = 6
  • Leaves = 63
  • Bins = 255
  • Gamma = 1
  • Hessian = 1

LightGBM run:

# SET YOUR WORKING DIRECTORY
library(R.utils)
setwd("D:/Data Science/Bosch_mini")

library(lightgbm)
train <- lgb.Dataset("bosch_train_lgb.data")
test <- lgb.Dataset("bosch_test_lgb.data")

gc(verbose = FALSE)
set.seed(11111)
Laurae::timer_func_print({temp_model <- lgb.train(params = list(num_threads = 4,
                                                                learning_rate = 0.02,
                                                                max_depth = 6,
                                                                num_leaves = 63,
                                                                max_bin = 255,
                                                                min_gain_to_split = 1,
                                                                min_sum_hessian_in_leaf = 1,
                                                                min_data_in_leaf = 1,
                                                                bin_construct_sample_cnt = 1000000L),
                                                  data = train,
                                                  nrounds = 800,
                                                  valids = list(test = test),
                                                  objective = "binary",
                                                  metric = "auc",
                                                  verbose = 2)})
library(data.table)
perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
max(perf)
which.max(perf)

xgboost run:

# SET YOUR WORKING DIRECTORY
library(R.utils)
setwd("D:/Data Science/Bosch_mini")

library(xgboost)
train <- xgb.DMatrix("bosch_train_xgb.data")
test <- xgb.DMatrix("bosch_test_xgb.data")

gc(verbose = FALSE)
set.seed(11111)
Laurae::timer_func_print({temp_model <- xgb.train(params = list(nthread = 4,
                                                                eta = 0.02,
                                                                max_depth = 6,
                                                                max_leaves = 63,
                                                                max_bin = 255,
                                                                gamma = 1,
                                                                min_child_weight = 1,
                                                                objective = "binary:logistic",
                                                                booster = "gbtree",
                                                                tree_method = "hist",
                                                                grow_policy = "lossguide"),
                                                  data = train,
                                                  watchlist = list(test = test),
                                                  eval_metric = "auc",
                                                  nrounds = 800,
                                                  verbose = 2)})
max(temp_model$evaluation_log$test_auc)
which.max(temp_model$evaluation_log$test_auc)

Raw log:

xgboost fast histogram depthwise:

[14:35:40] amalgamation/../src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 0 pruned nodes, max_depth=6
[800]	test-auc:0.719032 
The function ran in 662934.079 milliseconds.
[1] 662934.1
> max(temp_model$evaluation_log$test_auc)
[1] 0.71948
> which.max(temp_model$evaluation_log$test_auc)
[1] 603

xgboost fast histogram lossguide:

[14:49:44] amalgamation/../src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 0 pruned nodes, max_depth=6
[800]	test-auc:0.719032 
The function ran in 656224.842 milliseconds.
[1] 656224.8
> max(temp_model$evaluation_log$test_auc)
[1] 0.71948
> which.max(temp_model$evaluation_log$test_auc)
[1] 603

LightGBM master:

devtools::install_github("Microsoft/LightGBM/R-package")
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=19 and max_depth=6
[800]:	test's auc:0.716801 
The function ran in 720430.291 milliseconds.
[1] 720430.3
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7176366
> which.max(perf)
[1] 427

LightGBM v2.0:

devtools::install_github("Microsoft/LightGBM/R-package@v2.0")
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=18 and max_depth=6
[800]:	test's auc:0.714139 
The function ran in 673870.612 milliseconds.
[1] 673870.6
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7147095
> which.max(perf)
[1] 763

LightGBM v1.0:

devtools::install_github("Microsoft/LightGBM/R-package@v1")
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 24
[800]:	test's auc:0.716018 
The function ran in 642500.495 milliseconds.
[1] 642500.5
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7167827
> which.max(perf)
[1] 716

@guolinke
Author

@Laurae2 strange, I would expect master to be much faster than v2.0 and v1.
Can you try constructing the dataset first, via train$construct()?

@guolinke
Author

@Laurae2
XGBoost constructs the dataset when the DMatrix is created. But LightGBM uses lazy initialization: the dataset is created when lgb.train is called.
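A sketch of forcing eager construction so a timer around lgb.train measures training only (toy data and variable names are assumptions; construct() is the R6 method on lgb.Dataset):

```r
library(lightgbm)
library(Matrix)

set.seed(1)
X <- Matrix(matrix(rnorm(1000), nrow = 100), sparse = TRUE)
y <- rbinom(100, 1, 0.5)

# lgb.Dataset is lazy: binning normally happens inside lgb.train.
train <- lgb.Dataset(data = X, label = y, free_raw_data = FALSE)

# Force the binned dataset to be built now, mirroring what
# xgb.DMatrix does at creation time for xgboost.
train$construct()
```

With this, a wall-clock timer around lgb.train no longer includes the binning pass.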

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke For LightGBM I created the binned dataset before calling lgb.train (I use a dummy training so the next one starts immediately).

For xgboost, I can't control when the binned dataset is created (for fast histogram, it is created every time xgb.train runs). xgb.DMatrix is slow because the generated dataset is 7x larger than LightGBM's...

[image]

I will re-run with your train$construct() and use a custom callback for AUC. The results are weird.

@guolinke
Author

@Laurae2 another issue is eval_metric = "auc". The AUC metric is slow because it needs to be evaluated in a single thread. XGBoost uses std::parallel_sort to speed this up. (I will also add this soon.)
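For intuition on why AUC is costly: it reduces to a rank statistic, so the dominant work is sorting/ranking all predictions, which is exactly what a parallel sort accelerates. A minimal single-threaded R version (an illustration, not either library's implementation):

```r
# AUC via the rank-sum (Mann-Whitney) formulation:
# the expensive step is rank(), i.e. a sort over all predictions.
auc <- function(label, pred) {
  r <- rank(pred)                    # O(n log n), single-threaded here
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))  # 0.75
```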

@guolinke
Author

guolinke commented May 26, 2017

@Laurae2 Did you re-create the lgb.Dataset with the master branch? Its Dataset structure is very different from v1 and v2.0.
You cannot use dataset binaries generated by v1 or v2.0 to benchmark the master branch now...

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke I recreate new datasets every time I change the xgboost/LightGBM version. LightGBM v2 datasets cause crashes with LightGBM v1.

I will update soon with:

  • xgboost 2017-05-02: https://github.com/Laurae2/ez_xgb (depthwise + lossguide; strangely, they lead to identical performance when the depth is too large, and lossguide is better with a maximum depth of 5)
  • LightGBM v1 (Microsoft/LightGBM@v1)
  • LightGBM v2.0 (Microsoft/LightGBM@v2.0)
  • LightGBM current master branch (microsoft/LightGBM@97ca38d)

xgboost depthwise vs lossguide AUC strangeness below: https://public.tableau.com/views/gbt_benchmarks/AUC-Data?:showVizHome=no

[image]

@guolinke
Author

@Laurae2
In my benchmarks, master is about 50% faster than v2.0 on the Bosch dataset...

BTW, can you also try without setting bin_construct_sample_cnt? In my experiments, setting it to a large value causes a loss of accuracy and is also much slower...

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke Using $construct() seems to lead to entirely different datasets, interesting (the number of bins changes, and performance after one iteration using MSE also changes significantly).

I will follow your recommendations when I test LightGBM on the larger tests.

So far, for the short test (should be done within the hour), I will do this for LightGBM:

  • Constructed dataset before training starts
  • Use bin_construct_sample_cnt = 100000

By the way, do you know how to override LightGBM's feature selection (bypassing feature removal during training)? I would like to keep the number of features constant when the data is sparse, for a new large test I would like to add (about 10,000 selected features). Or should I let LightGBM choose the features?

It seems I can reproduce the number of bins / features selected, so I think this is not too much of an issue.

@guolinke
Author

@Laurae2 I think the construct may have some bugs...

It is hard to override LightGBM's feature selection; there is a lot of related code.
But I don't think it is an issue.

@guolinke
Author

@Laurae2
can you try using lgb.Dataset.create.valid to create the validation data?
Also, can you use data/higgs_sparse.rds to re-generate the lgb.Dataset here?
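A sketch of the lgb.Dataset.create.valid usage being suggested (toy data and names are assumptions); it binds the validation data to the training set's bin mapping:

```r
library(lightgbm)
library(Matrix)

set.seed(1)
X <- Matrix(matrix(rnorm(2000), nrow = 200), sparse = TRUE)
y <- rbinom(200, 1, 0.5)

train <- lgb.Dataset(data = X[1:150, ], label = y[1:150])

# Validation data must reuse the training Dataset's bin boundaries;
# lgb.Dataset.create.valid ties the two together.
test <- lgb.Dataset.create.valid(train, data = X[151:200, ], label = y[151:200])
```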

@guolinke
Author

guolinke commented May 26, 2017

@Laurae2 I am also worried about the CPU usage of the MinGW version.
I think we can run the small benchmark first, and run the large one when these issues are fixed.

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke -O3 optimization makes the LightGBM master dataset load crash. I'll create an issue on Microsoft/LightGBM about it when I find a way to reproduce it with a smaller dataset. I'll keep using -O2 for this small test.

With MinGW + LightGBM I'm at 90% CPU; with MinGW + xgboost, CPU usage is slightly lower, at around 80%.

@guolinke
Author

@Laurae2 OK. I will paste some of my test results tomorrow (about 12 hours from now).

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke is free_raw_data = FALSE OK to use? I can't train a model when using construct without free_raw_data = FALSE.

@guolinke
Author

@Laurae2 it is ok to use.

@Laurae2
Owner

Laurae2 commented May 26, 2017

New results, using -O2 because LightGBM master dataset load crashes with -O3:

NEW (custom callback):

| Algorithm | Time (s) | Perf (AUC) | Best Iteration |
|---|---|---|---|
| xgboost depthwise | 694.043 | 0.7194797 | 603 |
| xgboost lossguide | 696.006 | 0.7194797 | 603 |
| LightGBM master | 655.582 | 0.7180479 | 796 |
| LightGBM v2 | 675.591 | 0.7146346 | 636 |
| LightGBM v1 | 670.116 | 0.7167827 | 716 |

OLD (no custom callback):

| Algorithm | Time (s) | Perf (AUC) | Best Iteration |
|---|---|---|---|
| xgboost depthwise | 662.934 | 0.7194800 | 603 |
| xgboost lossguide | 656.224 | 0.7194800 | 603 |
| LightGBM master | 720.430 | 0.7176366 | 427 |
| LightGBM v2 | 673.870 | 0.7147095 | 763 |
| LightGBM v1 | 642.500 | 0.7167827 | 716 |

xgboost depthwise:

[18:30:30] amalgamation/../src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 0 pruned nodes, max_depth=6
[800]	test-auc:0.719032 
The function ran in 694043.139 milliseconds.
[1] 694043.1
> max(temp_model$evaluation_log$test_auc)
[1] 0.7194797
> which.max(temp_model$evaluation_log$test_auc)
[1] 603

xgboost lossguide:

[18:17:27] amalgamation/../src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 0 pruned nodes, max_depth=6
[800]	test-auc:0.719032 
The function ran in 696006.274 milliseconds.
[1] 696006.3
> max(temp_model$evaluation_log$test_auc)
[1] 0.7194797
> which.max(temp_model$evaluation_log$test_auc)
[1] 603

LightGBM v1:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 24
[800]:	test's auc:0.716018 
The function ran in 670116.550 milliseconds.
[1] 670116.5
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7167827
> which.max(perf)
[1] 716

LightGBM v2:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 19
[800]:	test's auc:0.71364 
The function ran in 675590.982 milliseconds.
[1] 675591
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7146346
> which.max(perf)
[1] 636

LightGBM master:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=23 and max_depth=6
[800]:	test's auc:0.717726 
The function ran in 655582.692 milliseconds.
[1] 655582.7
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$auc))
> max(perf)
[1] 0.7180479
> which.max(perf)
[1] 796

@guolinke
Author

@Laurae2 why does the best iteration change so much in LightGBM?

@Laurae2
Owner

Laurae2 commented May 26, 2017

@guolinke I think it is because the dataset becomes quite different when using bin_construct_sample_cnt = 100000 instead of bin_construct_sample_cnt = 1000000.
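Why the sample count matters: bin boundaries are estimated from a sample of each feature's values, so different bin_construct_sample_cnt settings yield slightly different bin edges, hence different split candidates and a different convergence path. A plain quantile-binning illustration (not LightGBM's exact binning algorithm):

```r
set.seed(42)
x <- rexp(1e6)  # a skewed feature

# Estimate 15 bin edges from a 100k sample vs. the full feature.
edges_sampled <- quantile(sample(x, 1e5), probs = seq(0, 1, length.out = 16))
edges_full    <- quantile(x, probs = seq(0, 1, length.out = 16))

# The edges disagree slightly, so some values land in different bins
# depending on how the boundaries were estimated.
max(abs(edges_sampled - edges_full))
```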

@guolinke
Author

OK.
I will try using the CLI version to compare v2.0 and master first.
Parameters: data=bosch.train num_threads=16 learning_rate=0.02 max_depth=6 num_leaves=63 min_gain_to_split=1 min_sum_hessian_in_leaf=1 min_data_in_leaf=1 num_trees=800 metric=binary_logloss test=bosch.test app=binary

v2.0 result:

[LightGBM] [Info] No further splits with positive gain, best gain: -1.#INF00
[LightGBM] [Info] Trained a tree with leaves=14 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0307989
[LightGBM] [Info] 406.397653 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master result:

[LightGBM] [Info] No further splits with positive gain, best gain: -1.#INF00
[LightGBM] [Info] Trained a tree with leaves=24 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0303121
[LightGBM] [Info] 164.233118 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master is far faster than v2.0.

@guolinke
Author

R version:

Script:

library(data.table)
library(Matrix)
library(lightgbm)
library(R.utils)

# Do xgboost / LightGBM
train_sparse <- readRDS(file = "../data/bosch_sparse.rds")
label <- readRDS(file = "../data/bosch_label.rds")

# Split
train_1 <- train_sparse[1:1000000, ]
train_2 <- label[1:1000000]
test_1 <- train_sparse[1000001:1183747, ]
test_2 <- label[1000001:1183747]

#library(sparsity)
#write.svmlight(train_1, train_2, "bosch.train")
#write.svmlight(test_1, test_2, "bosch.test")

# For LightGBM
train  <- lgb.Dataset(data = train_1, label = train_2)
test <- lgb.Dataset(data = test_1, label = test_2, reference=train)
train$construct()
test$construct()

Laurae::timer_func_print({temp_model <- lgb.train(params = list(num_threads = 16,
                                                                learning_rate = 0.02,
                                                                max_depth = 6,
                                                                num_leaves = 63,
                                                                max_bin = 255,
                                                                min_gain_to_split = 1,
                                                                min_sum_hessian_in_leaf = 1,
                                                                min_data_in_leaf = 1),
                                                  data = train,
                                                  nrounds = 800,
                                                  valids = list(test = test),
                                                  objective = "binary",
                                                  metric = "binary_logloss",
                                                  verbose = 2)})

library(data.table)
perf <- as.numeric(rbindlist(temp_model$record_evals$test$binary_logloss))
min(perf)
which.min(perf)

v2.0 result:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[800]:  test's binary_logloss:0.0308336
The function ran in 416997.480 milliseconds.
[1] 416997.5
> library(data.table)
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$binary_logloss))
> min(perf)
[1] 0.03083347
> which.min(perf)
[1] 797
>

master result:

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=30 and max_depth=6
[800]:  test's binary_logloss:0.0303229
The function ran in 190090.959 milliseconds.
[1] 190091
>
> library(data.table)
> perf <- as.numeric(rbindlist(temp_model$record_evals$test$binary_logloss))
> min(perf)
[1] 0.03031625
> which.min(perf)
[1] 730

@Laurae2
Owner

Laurae2 commented May 28, 2017

@guolinke When I get my bigger server with the 20 cores available, I will re-test.

@guolinke
Author

@Laurae2
You can also use log_loss as the metric, since the time cost of AUC is much larger.

@guolinke
Author

guolinke commented May 29, 2017

@Laurae2
Even when I run it on an i7-6700 machine, the master branch is still much faster than the v2.0 branch.
My version is based on MinGW as well... Not sure why your result is slow...
I guess maybe the dataset you created is still the v2.0 version.

@Laurae2
Owner

Laurae2 commented May 29, 2017

@guolinke weird, because I wipe the binary datasets and recreate them manually every time.

Note that I'm using a custom R installation with gcc 7.1, unlike the default R with gcc 4.9. I wonder if there are differences for LightGBM between gcc 7.1 and gcc 4.9.

@guolinke
Author

@Laurae2 Is your result on the master branch much faster than the v2.0 branch now?

@Laurae2
Owner

Laurae2 commented May 31, 2017

@guolinke I'll have access to my laptop in about 11h. At that time I will be able to check the speed of v2.0 and master. (I doubt my 20-core server will be available.)

@Laurae2
Owner

Laurae2 commented May 31, 2017

@guolinke I think MinGW is better for a low number of cores, while VS might be better (untested) with more cores. Using 4 threads on i7-4600U, no CPU throttling. I will test with more cores when my 20-core server is free.

| Algorithm | Time (s) |
|---|---|
| v2.0 CLI (O3) | 645.962 |
| v2.0 R (O2) | 673.568 |
| master CLI (O3) | 588.445 |
| master R (O2) | 607.209 |
| Visual Studio | 615.863 |

Configuration:

  • MinGW 7.1 (gcc 7.1)
  • Visual Studio 2017

v2.0 CLI (O3)

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=27 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.0305089
[LightGBM] [Info] 645.962530 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=36 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0305101
[LightGBM] [Info] 646.770656 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

v2.0 R (O2)

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[799]:	test's binary_logloss:0.0305097 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=20 and max_depth=6
[800]:	test's binary_logloss:0.0305099 
The function ran in 673568.532 milliseconds.
[1] 673568.5

master CLI

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.0304543
[LightGBM] [Info] 587.834080 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304542
[LightGBM] [Info] 588.445944 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master R

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=21 and max_depth=6
[799]:	test's binary_logloss:0.0304331 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=23 and max_depth=6
[800]:	test's binary_logloss:0.0304333 
The function ran in 607209.395 milliseconds.
[1] 607209.4

master CLI Visual Studio 2017

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.030431
[LightGBM] [Info] 615.327239 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=11 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304305
[LightGBM] [Info] 615.863466 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

@Laurae2
Copy link
Owner

Laurae2 commented Jun 1, 2017

2x 10 core CPU (Dual Xeon Ivy Bridge, 3.3/2.7 GHz), 40 threads:

master = microsoft/LightGBM@1d5867b

| Algorithm | Time (s) | Threads |
|---|---|---|
| v2.0 R (O2) | 244.542 | 40 |
| master R (O2) | 174.157 | 40 |
| master CLI (O3) | 164.336 | 40 |
| master CLI (O3) | 225.045 | 20 |
| Visual Studio | 139.214 | 40 |
| Visual Studio | 162.792 | 20 |

When using many cores, Visual Studio is significantly faster, but MinGW is not good.

So far, the master branch is really fast (though I don't see the wide gap you have). Core scaling is much better with the master branch.

v2.0 R (O2), CPU usage approx 25%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[799]:	test's binary_logloss:0.0305097 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=20 and max_depth=6
[800]:	test's binary_logloss:0.0305099 
The function ran in 244542.445 milliseconds.
[1] 244542.4

master R (O2): CPU usage approx 55%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=14 and max_depth=6
[799]:	test's binary_logloss:0.0304216 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=24 and max_depth=6
[800]:	test's binary_logloss:0.0304225 
The function ran in 174157.502 milliseconds.
[1] 174157.5

master CLI (O3) with 40 threads: CPU usage approx 45%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.0304543
[LightGBM] [Info] 164.173974 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304542
[LightGBM] [Info] 164.336424 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master CLI Visual Studio 2017 with 40 threads: CPU usage approx 100%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.030431
[LightGBM] [Info] 139.108287 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=11 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304305
[LightGBM] [Info] 139.214094 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master CLI (O3) with 20 threads: CPU usage approx 33%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=13 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.0304543
[LightGBM] [Info] 224.811707 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304542
[LightGBM] [Info] 225.045553 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

master CLI Visual Studio 2017 with 20 threads: CPU usage approx 50%

[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=6
[LightGBM] [Info] Iteration:799, valid_1 binary_logloss : 0.030431
[LightGBM] [Info] 162.647329 seconds elapsed, finished iteration 799
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=11 and max_depth=6
[LightGBM] [Info] Iteration:800, valid_1 binary_logloss : 0.0304305
[LightGBM] [Info] 162.792439 seconds elapsed, finished iteration 800
[LightGBM] [Info] Finished training

@Laurae2
Owner

Laurae2 commented Jun 10, 2017

@guolinke Can I run the long tests on microsoft/LightGBM@3089f0b with Visual Studio 2017, or do you have a specific commit you would like me to benchmark? My full xgboost runs (including the exact method, which takes forever) are ending this week.

@guolinke
Author

@Laurae2
I think the latest, or microsoft/LightGBM@582ded5, is better.

@Laurae2
Owner

Laurae2 commented Jun 10, 2017

@guolinke I'll use microsoft/LightGBM@a8673bd (latest master branch) then.

@Laurae2
Owner

Laurae2 commented Jun 15, 2017

@guolinke One of my servers has finished running my benchmarks. I'll repost here when I get time to create a dashboard you will be able to explore (probably tomorrow).

@guolinke
Author

@Laurae2
Okay, thanks very much.

BTW, I find that the speed of LightGBM in a VM (Azure) is about 2x-3x slower than on a "real" machine when multi-threading, even with the same CPU.

@Laurae2
Owner

Laurae2 commented Jun 18, 2017

@guolinke Here for the new benchmarks, tested on i7-7700K and 20 core Xeon: https://sites.google.com/view/lauraepp/new-benchmarks

On VMs, I noticed that if the host machine is not rebooted frequently, CPU performance sinks. I remember a server I did not reboot for 1 year whose performance was 50% of the original; a simple reboot got it back to 100% (Cinebench R15 score of 350 instead of 700).

@guolinke
Author

@Laurae2
Thanks. The accuracy of LightGBM now seems much better.

@Laurae2
Owner

Laurae2 commented Jun 18, 2017

@guolinke I improved the chart layout if you want to check in more detail; I had put too many charts on single pages, so I have now separated them.

[image]
