
Effect of random seed ignored (not a multithreading issue)... #2636

Closed
dataforager opened this issue Aug 24, 2017 · 3 comments
@dataforager
Follow-up on closed issue #113

A colleague of mine and I are trying to independently verify each other's prediction results using the same model parameters, training data and test data. He is using xgboost in R and I'm using xgboost in Python (though I have also tried the R version of the package on my machine for the purposes of this test).

What we've found is that we get the same predicted probabilities from the model on the test data using completely different seeds.

In R, we set the seed either by passing the 'seed' param to xgb.train or by calling R's set.seed() function.

We independently verified (by inspecting the value of .Random.seed after calling set.seed()) that the seed was indeed being changed (it was).

nthread was set to 1 for both training/test runs with the exact same model parameters. Predicted probabilities appear below and are identical for two different seed values.

Predicted probabilities (seed = 1):
0.4745588005
0.9879690409
0.5989014506
0.9906733632
0.5989014506
0.9928959012
0.1146880165
0.9928619266
0.9917168021
0.9958292842

Predicted probabilities (seed = 2):
0.4745588005
0.9879690409
0.5989014506
0.9906733632
0.5989014506
0.9928959012
0.1146880165
0.9928619266
0.9917168021
0.9958292842

In Python, we can confirm the same effect, though the resulting predicted probabilities differ from R's (another issue we suspect is related to our inability to control the random seed):

Again, we tried to either pass the seed as a parameter to xgb.train or set the random seed via numpy.random.seed() or python stdlib's random.seed(). And again, nthread was set to 1.
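(Editor's note: the Python-side analogue of the .Random.seed check described above for R can be sketched as below. This is a minimal stdlib-only illustration, independent of xgboost; it only confirms that the seeding call itself genuinely changes the interpreter's RNG state.)

```python
import random

def rng_state_after_seed(seed):
    """Seed Python's global RNG and return the resulting state,
    so the states produced by two different seeds can be compared."""
    random.seed(seed)
    return random.getstate()

# Different seeds really do produce different RNG states...
state_1 = rng_state_after_seed(1)
state_2 = rng_state_after_seed(2)
assert state_1 != state_2

# ...so identical predictions across seeds cannot be blamed
# on the seeding call failing to take effect.
```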

Predicted probabilities (seed = 1):
0.141121
0.98446
0.805141
0.949947
0.805141
0.979856
0.511622
0.990588
0.985136
0.987054

Predicted probabilities (seed = 2):
0.141121
0.98446
0.805141
0.949947
0.805141
0.979856
0.511622
0.990588
0.985136
0.987054

I've read previous posts related to this (including #113, mentioned above) that attribute it to multithreading. However, since we see the exact same results with nthread set to 1, we don't suspect that is the issue.

We'd appreciate any help we could get with this issue. Thanks a lot.

Pertinent system/package information can be found below.

Operating System: Ubuntu 16.04 LTS
Compiler: GNU gcc/g++ 5.4
Package used (python/R/jvm/C++): python and R
xgboost version used: 0.6a2

For Python:

  1. version: 2.7.12
  2. installation command: pip install xgboost

For R:

  1. sessionInfo(): R version 3.4.1 (2017-06-30)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] xgboost_0.6-4

loaded via a namespace (and not attached):
[1] compiler_3.4.1 magrittr_1.5 Matrix_1.2-10 tools_3.4.1 stringi_1.1.5
[6] grid_3.4.1 data.table_1.10.4 lattice_0.20-35
2. installation command within R session: install.packages("xgboost",dependencies=TRUE)

@khotilov
Member

Some parameter configurations are deterministic and some are random. You provided no details on what configuration you were running.

@Laurae2
Contributor

Laurae2 commented Aug 25, 2017

@dataforager Did you use colsample_bytree, colsample_bylevel, or subsample? If not, you have the expected behavior.
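(Editor's note: a toy sketch of the point above, not xgboost's actual implementation. Exact greedy split finding scans every candidate threshold and never consults the RNG, so it is seed-invariant; row/column subsampling, which subsample and the colsample_* parameters enable, is where the seed actually bites.)

```python
import random

def best_split(xs, ys):
    """Exact greedy split: try every threshold, keep the one
    minimising squared error. No randomness is consulted."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t)
    return best[1]

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

# Deterministic: the chosen split is identical under any seed.
random.seed(1); split_a = best_split(xs, ys)
random.seed(2); split_b = best_split(xs, ys)
assert split_a == split_b

# Subsampling is where the seed matters: different seeds
# draw different row subsets.
sample_1 = random.Random(1).sample(range(100), 50)
sample_2 = random.Random(2).sample(range(100), 50)
assert sample_1 != sample_2
```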

@dataforager
Author

Thank you both. We double-checked our parameter settings and realized we had left these at their defaults (which are 1.0). So, no surprise we got deterministic behavior (whoops).

Appreciate the quick responses. Will close this.
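(Editor's note: for anyone landing here later, a sketch of the kind of parameter set under which the seed does affect results. The parameter names are standard xgboost parameters; the specific values here are illustrative only.)

```python
# With any of the sampling ratios below 1.0, xgboost's per-tree
# row/column sampling consults the RNG, so different seeds should
# now yield different models and predictions.
params = {
    "objective": "binary:logistic",
    "nthread": 1,
    "subsample": 0.8,         # row subsampling per tree
    "colsample_bytree": 0.8,  # column subsampling per tree
    "seed": 1,                # now actually matters
}
```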

@lock lock bot locked as resolved and limited conversation to collaborators Oct 25, 2018