Follow-up on closed issue #113

Resolution: Thank you both. We double-checked our parameter settings and realized we had left the subsampling parameters (subsample and colsample_bytree) at their defaults of 1.0, so there was nothing stochastic for the seed to affect. No surprise we got deterministic behavior (whoops).
A colleague of mine and I are trying to independently verify each other's prediction results using the same model parameters, training data and test data. He is using xgboost in R and I'm using xgboost in Python (though I have also tried the R version of the package on my machine for the purposes of this test).
What we've found is that we get the same predicted probabilities from the model on the test data using completely different seeds.
In R, we set the seed either by passing the 'seed' parameter to xgb.train or by calling R's set.seed() function.
We verified independently (by inspecting the value of .Random.seed after calling set.seed()) that the seed was indeed being changed (it was).
nthread was set to 1 for both training/test runs, with the exact same model parameters. The predicted probabilities below are identical for two different seed values.
Predicted probabilities (seed = 1):
0.4745588005
0.9879690409
0.5989014506
0.9906733632
0.5989014506
0.9928959012
0.1146880165
0.9928619266
0.9917168021
0.9958292842
Predicted probabilities (seed = 2):
0.4745588005
0.9879690409
0.5989014506
0.9906733632
0.5989014506
0.9928959012
0.1146880165
0.9928619266
0.9917168021
0.9958292842
In Python, we can confirm the same effect, though with different resulting predicted probabilities (a separate discrepancy, which we suspect is also related to the fact that we can't seem to control the random seed):
Again, we tried either passing the seed as a parameter to xgb.train or setting the random seed via numpy.random.seed() or the Python stdlib's random.seed(). And again, nthread was set to 1.
Predicted probabilities (seed = 1):
0.141121
0.98446
0.805141
0.949947
0.805141
0.979856
0.511622
0.990588
0.985136
0.987054
Predicted probabilities (seed = 2):
0.141121
0.98446
0.805141
0.949947
0.805141
0.979856
0.511622
0.990588
0.985136
0.987054
I've read previous posts related to this (including #113, referenced above) which say this is a multithreading issue. However, since we see exactly the same results with nthread set to 1, we don't suspect threading is the cause.
We'd appreciate any help we could get with this issue. Thanks a lot.
Pertinent system/package information can be found below.
Operating System: Ubuntu 16.04 LTS
Compiler: GNU gcc/g++ 5.4
Package used (python/R/jvm/C++): python and R
xgboost version used: 0.6a2

For Python:
version: 2.7.12
installation command: pip install xgboost

For R:
1. sessionInfo(): R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xgboost_0.6-4
loaded via a namespace (and not attached):
[1] compiler_3.4.1 magrittr_1.5 Matrix_1.2-10 tools_3.4.1 stringi_1.1.5
[6] grid_3.4.1 data.table_1.10.4 lattice_0.20-35
2. installation command within R session: install.packages("xgboost",dependencies=TRUE)
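For anyone landing here later, the resolution above boils down to this: xgboost's seed only enters through stochastic steps such as row/column subsampling, and the defaults disable those. A minimal pure-Python sketch of the idea (illustrative only, not xgboost's actual sampler):

```python
import random

def subsample_rows(n_rows, subsample, seed):
    # Mimics seed-dependent row subsampling: when subsample >= 1.0, every
    # row is kept and the RNG is never consulted, so the seed cannot matter.
    if subsample >= 1.0:
        return list(range(n_rows))
    rng = random.Random(seed)
    return [i for i in range(n_rows) if rng.random() < subsample]

# Default subsample = 1.0: identical "training sets" for any seed.
assert subsample_rows(10, 1.0, seed=1) == subsample_rows(10, 1.0, seed=2)

# subsample < 1.0: the seed now changes which rows are drawn.
sample_a = subsample_rows(1000, 0.5, seed=1)
sample_b = subsample_rows(1000, 0.5, seed=2)
```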