Random Forest predict() does not produce reproducible results. random_state=42 #28920

aedavids · 2024-04-30T19:15:42Z

Describe the bug

If I load my pretrained model and a set of samples and call predict() multiple times, I get different predicted classes. Here are some sample results. I am using a Jupyter notebook. I have tried restarting the kernel multiple times and also just re-running the cell multiple times.

auc: {0: 0.476, 1: 0.524} pred: [0 0 0 1 1 1 1 1 1 1 0 0 1 0 0 0 0 0 1]
auc: {0: 0.613, 1: 0.387} pred: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1]
auc: {0: 0.762, 1: 0.238} pred: [1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0]
auc: {0: 0.589, 1: 0.411} pred: [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

I have a random forest that I trained with the following parameters:

RandomForestClassifier(max_depth=7, max_features=1, max_samples=0.9,
                       n_estimators=50, random_state=42)

The model was saved using joblib. I load the model as follows:

model = joblib.load(modelPath)

I make predictions as follows:

predictions = model.predict(XNP)

yProbability = model.predict_proba(XNP)

yNP:
[0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 1]

XNP = np.array([[ 16,   9,   0,   0,   5,   0, 104,   1,   1,   1],
           [ 19,   4,   0,   0,   4,   0,  96,   0,   2,   0],
           [ 14,   7,   0,   0,   5,   0,  72,   0,   2,   0],
           [ 29,   5,   0,   0,  11,   0, 108,   0,   1,   0],
           [ 16,   9,   0,   0,   6,   0,  80,   0,   1,   1],
           [ 49,  13,   0,   0,  20,   0, 198,   0,   5,   2],
           [ 45,   7,   0,   0,   7,   0, 163,   0,   1,   1],
           [ 47,  13,   0,   1,  10,   0, 229,   0,   4,   1],
           [ 17,  21,   0,   0,   2,   0,  61,   0,   5,   0],
           [ 56,  15,   0,   0,  12,   0, 362,   0,   4,   1],
           [ 14,   7,   0,   0,   8,   0, 113,   0,   1,   0],
           [  5,   3,   0,   0,   1,   0,  49,   0,   0,   0],
           [ 23,   7,   0,   0,   8,   0,  92,   0,   2,   0],
           [ 15,  12,   0,   0,   3,   0, 119,   0,   0,   1],
           [ 18,   4,   0,   0,   1,   0, 133,   0,   0,   0],
           [ 13,   3,   0,   0,   4,   0, 126,   0,   0,   0],
           [ 20,   3,   0,   0,   5,   0, 161,   0,   0,   0],
           [ 15,   6,   0,   0,   4,   0, 163,   0,   0,   0],
           [ 23,   4,   0,   0,   8,   0, 127,   0,   0,   2]])

I have tried calling random.seed().

Any suggestions would be greatly appreciated.

P.S.
When I trained, I saved the label encoder, and I load it as follows. (This was to ensure the class numbers match the class names.)

import numpy as np
from sklearn.preprocessing import LabelEncoder

def encoder2Dict(encoder: LabelEncoder) -> dict:
    '''
    key is class
    value is int
    '''
    values = encoder.transform(encoder.classes_)
    retDict = dict(zip(encoder.classes_, values))
    return retDict

def loadEncoder(path: str) -> LabelEncoder:
    '''
    arguments:
        path: file containing labelEncoder values saved as a dictionary
    '''
    encoder = LabelEncoder()
    # loadDictionary() is a helper that is not shown in this report
    encoderDict = loadDictionary(path)

    # Manually assign the sorted list of class labels to the classes_ attribute
    # The keys of the dictionary are sorted according to their corresponding values
    # dictionary.get(key) returns the value for that key
    encoder.classes_ = np.array(sorted(encoderDict, key=encoderDict.get))

    return encoder
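
One way to rule out the encoder as a source of mismatched labels is to check that the dictionary round trip preserves the class-to-integer mapping. The following is only a sketch under assumptions: the labels `control` and `disease` are hypothetical, and the dictionary is kept in memory rather than read through `loadDictionary`, which is not shown in this report.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical class labels; the real ones are not given in the report.
original = LabelEncoder().fit(["disease", "control", "disease", "control"])

# Same mapping as encoder2Dict(): class name -> integer code.
codes = original.transform(original.classes_)
mapping = {str(c): int(v) for c, v in zip(original.classes_, codes)}

# Same reconstruction as loadEncoder(): order class names by their integer code.
restored = LabelEncoder()
restored.classes_ = np.array(sorted(mapping, key=mapping.get))

# Both encoders should assign identical integer codes to the same labels.
labels = ["control", "disease", "disease"]
same_codes = np.array_equal(original.transform(labels), restored.transform(labels))
print(mapping, same_codes)
```

If `same_codes` is true for the real labels as well, the encoder is unlikely to explain predictions that change between runs.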

I can make my trained model available.

Steps/Code to Reproduce

predictions = model.predict(XNP)

yProbability = model.predict_proba(XNP)

Expected Results

predict(X) == predict(X)

Actual Results

auc: {0: 0.476, 1: 0.524} pred: [0 0 0 1 1 1 1 1 1 1 0 0 1 0 0 0 0 0 1]
auc: {0: 0.613, 1: 0.387} pred: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1]
auc: {0: 0.762, 1: 0.238} pred: [1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0]
auc: {0: 0.589, 1: 0.411} pred: [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

Versions

System:
    python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
executable: /private/home/aedavids/miniconda3/envs/POC/bin/python
   machine: Linux-5.15.0-89-generic-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.4.0
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.26.3
        scipy: 1.11.4
       Cython: None
       pandas: 2.2.0
   matplotlib: 3.7.1
       joblib: 1.4.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /private/home/aedavids/miniconda3/envs/extraCellularRNA/lib/libopenblasp-r0.3.27.so
        version: 0.3.27
threading_layer: pthreads
   architecture: Haswell
    num_threads: 128

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /private/home/aedavids/miniconda3/envs/extraCellularRNA/lib/libgomp.so.1.0.0
        version: None
    num_threads: 160

$ conda list scikit-learn
# packages in environment at /private/home/aedavids/miniconda3/envs/extraCellularRNA:
#
# Name                    Version                   Build  Channel
scikit-learn              1.4.0           py311hc009520_0    conda-forge


$ python --version
Python 3.11.4


glemaitre · 2024-05-01T06:23:41Z

We will need the data to understand the cause, but I suspect that the issue is linked to random tie breaking.
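
One way to probe the tie hypothesis is to look at the averaged class probabilities: if a sample sits at or very near 0.5 for both classes, a tiny numerical difference could flip its predicted label. The sketch below is only an illustration on synthetic data. The `make_classification` dataset and the `1e-9` tolerance are assumptions and not part of the report; with the real model, the loaded `model` and `XNP` would take the place of the stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the reporter's training data is not available.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Estimator parameters copied from the report.
model = RandomForestClassifier(max_depth=7, max_features=1, max_samples=0.9,
                               n_estimators=50, random_state=42).fit(X, y)

proba = model.predict_proba(X[:19])

# Gap between the two largest class probabilities for each sample.
top_two = np.sort(proba, axis=1)[:, -2:]
margin = top_two[:, 1] - top_two[:, 0]

# Samples whose top two classes are (almost) tied; the tolerance is an assumption.
tied = np.flatnonzero(margin < 1e-9)
print("near-tied samples:", tied)
print("smallest margin:", margin.min())
```

A handful of near-tied samples could explain a few flipped labels, but probably not the large differences between the prediction vectors shown in the report.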

betatim · 2024-05-02T12:08:19Z

Please also provide a short code snippet that we can copy&paste to reproduce the problem. From reading your original comment it sounds like you are using more than just a RandomForestClassifier. Having a full snippet from start to finish makes sure we are all debugging the same thing.
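
A self-contained snippet of the kind requested here might look like the following. It is a sketch under stated assumptions: the training data is synthetic (`make_classification`), the file name `rf.joblib` and the temporary directory are hypothetical, and only the estimator parameters are taken from the report. The real model file and `XNP` can be substituted for the stand-ins.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the reporter's data (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Estimator parameters copied from the report.
clf = RandomForestClassifier(max_depth=7, max_features=1, max_samples=0.9,
                             n_estimators=50, random_state=42).fit(X, y)

# Round-trip the fitted model through joblib, as described in the report.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "rf.joblib")  # hypothetical file name
    joblib.dump(clf, model_path)
    model = joblib.load(model_path)

# Predict twice on the same 19 samples and compare the outputs.
X_test = X[:19]
first = model.predict(X_test)
second = model.predict(X_test)
proba_first = model.predict_proba(X_test)
proba_second = model.predict_proba(X_test)

print("predict identical:", np.array_equal(first, second))
print("predict_proba identical:", np.allclose(proba_first, proba_second))
```

Running the same script from a terminal and from a freshly restarted notebook kernel would show whether the difference comes from the environment rather than from the estimator.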

aedavids · 2024-05-06T17:53:59Z

Hi All

I am in the process of creating test code I can post. I have narrowed it down a bit. The problem happens in my Jupyter notebook. If I run the predict cell multiple times, I get the same results. If I restart the notebook, I get different results from the first run.

I wrote a small Python script. I cannot reproduce the error when I run it from the terminal.

I am going to try to figure out how I can isolate the problem in my notebook. I will post the test notebook.

Hopefully I can upload a zip file with the test code and my trained model.

Kind regards

Andy
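
Since the behaviour differs between a fresh kernel and a terminal script, it may be worth checking that both actually see the same inputs and libraries. The sketch below prints a fingerprint of the input array along with the interpreter path and library versions, so the values can be compared across kernel restarts and between the notebook and the script. The helper name `fingerprint` is hypothetical, and the small array is a stand-in for `XNP`.

```python
import hashlib
import sys

import numpy as np
import sklearn


def fingerprint(arr: np.ndarray) -> str:
    """Return a SHA-256 digest covering the dtype, shape, and raw bytes of an array."""
    arr = np.ascontiguousarray(arr)
    h = hashlib.sha256()
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(arr.tobytes())
    return h.hexdigest()


# Stand-in for XNP: the first two rows from the report.
X = np.array([[16, 9, 0, 0, 5, 0, 104, 1, 1, 1],
              [19, 4, 0, 0, 4, 0, 96, 0, 2, 0]])

print("python executable:", sys.executable)
print("sklearn:", sklearn.__version__, "numpy:", np.__version__)
print("X fingerprint:", fingerprint(X))
```

Hashing the bytes of the saved model file in the same way would show whether both environments load the same model. The version report above lists the Python executable under the `POC` conda environment and the OpenBLAS and OpenMP libraries under `extraCellularRNA`, which may be worth checking too.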

aedavids added the Bug and Needs Triage labels on Apr 30, 2024

glemaitre added the Needs Reproducible Code and Needs Investigation labels and removed the Bug and Needs Triage labels on May 1, 2024
