Threading and bagging: not reproducible #632

Closed
Laurae2 opened this Issue Jun 17, 2017 · 4 comments

@Laurae2
Collaborator

Laurae2 commented Jun 17, 2017

Environment info

Operating System: Windows 8.1 Pro
CPU: i7-4600U
R version: R 3.4 with VS compilation

Bagging leads to different results when different numbers of threads are used. However, when the number of threads is fixed, the results are reproducible (identical across runs).

This causes major issues when trying to reproduce the results of a script that does not specify the number of threads.

Code for reproducible result:

for (i in 1:4) {      # number of threads
  for (j in 1:2) {    # repeated run, to check reproducibility per thread count
    lgb.unloader(wipe = TRUE)  # reset LightGBM state between runs
    library(lightgbm)
    data(agaricus.train, package = "lightgbm")
    train <- agaricus.train
    dtrain <- lgb.Dataset(train$data, label = train$label)
    data(agaricus.test, package = "lightgbm")
    test <- agaricus.test
    dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
    params <- list(objective = "regression", metric = "l2")
    valids <- list(test = dtest)
    cat("\n\n\n--------------------- THREADS: ", i, " - RUN: ", j, " ---------------------\n\n", sep = "")
    model <- lgb.train(c(params, num_threads = i, bagging_seed = 1, bagging_freq = 1, bagging_fraction = 0.1),
                       dtrain,
                       5,  # nrounds
                       valids,
                       min_data = 1,
                       learning_rate = 1)
  }
}

Log:

--------------------- THREADS: 1 - RUN: 1 ---------------------

[LightGBM] [Info] Total Bins 137
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=10 and max_depth=5
[1]:	test's l2:0.000620732 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=4 and max_depth=3
[2]:	test's l2:0.00248293 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=5 and max_depth=3
[3]:	test's l2:0.0105525 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=30 and max_depth=12
[4]:	test's l2:0.0105525 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=11 and max_depth=6
[5]:	test's l2:0.0105525 



--------------------- THREADS: 1 - RUN: 2 ---------------------

[LightGBM] [Info] Total Bins 137
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=10 and max_depth=5
[1]:	test's l2:0.000620732 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=4 and max_depth=3
[2]:	test's l2:0.00248293 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=5 and max_depth=3
[3]:	test's l2:0.0105525 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=30 and max_depth=12
[4]:	test's l2:0.0105525 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=11 and max_depth=6
[5]:	test's l2:0.0105525 



--------------------- THREADS: 2 - RUN: 1 ---------------------

[LightGBM] [Info] Total Bins 137
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=17 and max_depth=7
[1]:	test's l2:0.00682806 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=8 and max_depth=4
[2]:	test's l2:0.00558659 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=10
[3]:	test's l2:0.00931099 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=12
[4]:	test's l2:0.0130354 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=12
[5]:	test's l2:0.00993172 



--------------------- THREADS: 2 - RUN: 2 ---------------------

[LightGBM] [Info] Total Bins 137
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=17 and max_depth=7
[1]:	test's l2:0.00682806 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=8 and max_depth=4
[2]:	test's l2:0.00558659 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=10
[3]:	test's l2:0.00931099 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=12
[4]:	test's l2:0.0130354 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=12
[5]:	test's l2:0.00993172 



--------------------- THREADS: 3 - RUN: 1 ---------------------

[LightGBM] [Info] Total Bins 137
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=14 and max_depth=6
[1]:	test's l2:0.00248293 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and max_depth=5
[2]:	test's l2:0.00434513 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=4 and max_depth=2
[3]:	test's l2:0.00248293 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and max_depth=4
[4]:	test's l2:0.00806952 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=8 and max_depth=5
[5]:	test's l2:0.00372439 



--------------------- THREADS: 3 - RUN: 2 ---------------------

[LightGBM] [Info] Total Bins 137
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=14 and max_depth=6
[1]:	test's l2:0.00248293 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and max_depth=5
[2]:	test's l2:0.00434513 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=4 and max_depth=2
[3]:	test's l2:0.00248293 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and max_depth=4
[4]:	test's l2:0.00806952 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=8 and max_depth=5
[5]:	test's l2:0.00372439 



--------------------- THREADS: 4 - RUN: 1 ---------------------

[LightGBM] [Info] Total Bins 137
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=15 and max_depth=6
[1]:	test's l2:6.44165e-17 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=5 and max_depth=3
[2]:	test's l2:1.93798e-31 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=15
[3]:	test's l2:0.0018622 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=11
[4]:	test's l2:4.80048e-34 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=12
[5]:	test's l2:1.18812e-33 



--------------------- THREADS: 4 - RUN: 2 ---------------------

[LightGBM] [Info] Total Bins 137
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=15 and max_depth=6
[1]:	test's l2:6.44165e-17 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=5 and max_depth=3
[2]:	test's l2:1.93798e-31 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=15
[3]:	test's l2:0.0018622 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=11
[4]:	test's l2:4.80048e-34 
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=12
[5]:	test's l2:1.18812e-33
@guolinke

Member

guolinke commented Jun 17, 2017

This is expected. We use multi-threading to speed up the bagging, and each thread bags its own data.
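To make the explanation above concrete, here is an illustrative sketch (plain Python, not LightGBM's actual C++ internals; the function name `bagged_indices` and the partitioning/seeding scheme are hypothetical) of why "each thread bags its own data" ties the subsample to the thread count: the rows are partitioned among threads, each thread draws from its own seeded RNG stream, so the same `bagging_seed` selects different rows when the partition changes.

```python
import random

def bagged_indices(n_rows, fraction, seed, num_threads):
    # Hypothetical scheme: split rows into one contiguous block per thread,
    # then let each thread subsample its own block with a per-thread RNG.
    chunk = n_rows // num_threads
    selected = []
    for t in range(num_threads):
        start = t * chunk
        end = n_rows if t == num_threads - 1 else start + chunk
        rng = random.Random(seed + t)   # per-thread RNG stream
        block = list(range(start, end))
        k = int(len(block) * fraction)  # e.g. fraction=0.1 -> 10% of the block
        selected.extend(sorted(rng.sample(block, k)))
    return selected

# Same seed, same data, different thread counts -> different subsamples,
# hence different trees. Same seed and same thread count -> identical runs.
a = bagged_indices(100, 0.1, seed=1, num_threads=1)
b = bagged_indices(100, 0.1, seed=1, num_threads=2)
print(a)
print(b)
```

This mirrors the behavior in the logs: each (threads, run) pair is deterministic, but changing `num_threads` changes which rows each tree sees.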

@guolinke

Member

guolinke commented Jun 17, 2017

But some of the test accuracies in your logs look very strange.

@Laurae2

Collaborator

Laurae2 commented Jun 17, 2017

@guolinke This is 10% subsampling on the agaricus dataset. I expect bad results without many rounds; here I limited it to 5 rounds to keep the log readable.

@guolinke

Member

guolinke commented Jun 17, 2017

@Laurae2 I think we can add a warning about reproducibility.
