
prediction different in v1.2.1 and master branch #6350

Closed
7starsea opened this issue Nov 6, 2020 · 35 comments

7starsea commented Nov 6, 2020

I am using the released version 1.2.1 for development because it is easy to install in Python, and I also wrote a simple C program that runs prediction through the C API. When linked against libxgboost.so from v1.2.1, the difference between the Python and C predictions is exactly zero. However, when linked against libxgboost.so built from the master branch (commit f3a4253, Nov. 5, 2020), there is a difference.

I would like to deploy the C program in a real system using the master branch, since it lets me build a static library, and the prediction difference between v1.2.1 and the master branch is currently blocking me.

Thanks.

hcho3 (Collaborator) commented Nov 6, 2020

Can you post an example program so that we can reproduce the bug?

hcho3 (Collaborator) commented Nov 6, 2020

Also, do you see different predictions from the master version of Python package and the C API?

trivialfis (Member) commented:

Some optimizations were made to the CPU predictor, which might generate slightly different results due to different floating-point rounding. But yeah, do you have a reproducible example?

7starsea (Author) commented Nov 6, 2020

> Also, do you see different predictions from the master version of the Python package and the C API?

I am comparing v1.2.1 of the Python package against the C API from the master branch.

hcho3 (Collaborator) commented Nov 6, 2020

@7starsea Can you also compare the outputs from the Python and C API, both from the master branch? The issue could be the way C API functions are used in your application.

7starsea (Author) commented Nov 6, 2020

> @7starsea Can you also compare the outputs from the Python and C API, both from the master branch? The issue could be the way C API functions are used in your application.

I compared the Python and C API, both from v1.2.1, and the predictions are exactly the same.

hcho3 (Collaborator) commented Nov 6, 2020

@7starsea Got it. If you post both Python and C programs that predict from the same model, we'd be able to troubleshoot the issue further.

7starsea (Author) commented Nov 6, 2020

@hcho3 Here is the testing code:

https://github.com/7starsea/xgboost-testing

hcho3 self-assigned this Nov 12, 2020
hcho3 (Collaborator) commented Nov 12, 2020

@7starsea I just tried your example and got the following output:

difference: [0. 0. 0. 0.] 0.0 0.0

I used the latest commit of XGBoost (debeae2).

7starsea (Author) commented:

> @7starsea I just tried your example and got the following output:
>
> difference: [0. 0. 0. 0.] 0.0 0.0
>
> I used the latest commit of XGBoost (debeae2).

Interesting. Is your Python version v1.2.1?

I still see a difference between the Python package v1.2.1 and the C API linked against XGBoost (debeae2).

hcho3 (Collaborator) commented Nov 12, 2020

@7starsea No, I compiled XGBoost from the latest source (commit debeae2), so it's more recent than v1.2.1. My XGBoost Python package reports 1.3.0-SNAPSHOT as the xgboost.__version__ field.

7starsea (Author) commented:

@hcho3 I am wondering: should XGBoost keep predictions consistent between different versions (at least consecutive ones)?
I am also looking forward to the release of v1.3.0.
(It seems I need to train the model using the master branch for now.)
Thanks for your time.

hcho3 (Collaborator) commented Nov 12, 2020

@7starsea If you load a saved model from a previous version, you should be able to obtain consistent predictions.

I wasn't able to reproduce the issue using your script. Can you try building a Docker image or a VM image and share it with me?

hcho3 (Collaborator) commented Nov 12, 2020

@7starsea FYI, I also tried building XGBoost 1.2.1 from the source, as follows:

```
git clone --recursive https://github.com/dmlc/xgboost -b release_1.2.0 xgb_source
cd xgb_source
mkdir build
cd build
cmake ..
make
cd ../python-package
python setup.py install
```

The results again show difference: [0. 0. 0. 0.] 0.0 0.0

7starsea (Author) commented Nov 12, 2020

To see the difference, you need two versions of XGBoost: v1.2.1 for Python,

```python
dtest = xgb.DMatrix(rx, missing=0.0)
y1 = m2.predict(dtest)  # internally uses libxgboost.so from v1.2.1
```

and one for the C++ side:

```python
m1 = XgbShannonPredictor(fname)
y2 = m1.predict(rx2)  # internally uses libxgboost.so built from the master branch (debeae2)
```

I will try to build a docker image (which is new to me).

trivialfis (Member) commented:

Let me take a look.

trivialfis (Member) commented:

> I will try to build a docker image (which is new to me).

Not necessary.

hcho3 (Collaborator) commented Nov 12, 2020

Actually, I managed to reproduce the problem. It turns out that the dev version of XGBoost produces a different prediction than XGBoost 1.2.0. And the problem is simple to reproduce; no need to use the C API.

Reproducible example (EDIT: setting the random seed):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(seed=2020)
rx = rng.standard_normal(size=(100, 127 + 7 + 1))
rx = rx.astype(np.float32, order='C')

m2 = xgb.Booster({'nthread': '4'})  # init model
m2.load_model('xgb.model.bin')  # load the saved model

dtest = xgb.DMatrix(rx, missing=0.0)
y1 = m2.predict(dtest)
print(xgb.__version__)
print(y1)
```

Output from 1.2.0:

1.2.0
[ 0.00698659 -0.00211251  0.00180039 -0.00016004  0.00526169  0.00801963
  0.00016755  0.00226218  0.00276762  0.00408182  0.00303206  0.00291929
  0.01101092  0.0068329   0.00145864  0.00326979  0.00572816  0.01019934
  0.00074345  0.00784767  0.00173795 -0.00219297  0.0060181   0.00606489
  0.00447372  0.00103396  0.00932363  0.00230178  0.00389203  0.00151157
  0.0034163   0.00821933  0.006686    0.00630778  0.00331488  0.00775066
  0.00443819  0.01030204  0.00924486  0.00645933  0.00777653  0.00231206
  0.00457835  0.00390425  0.00947028  0.00410065  0.00220913  0.00292507
  0.00637993  0.00796807  0.00140873  0.00887537  0.00496858  0.01049942
  0.00908098  0.00332722  0.00799242  0.00228494  0.00463879  0.00213429
  0.00729388  0.01049232  0.00790522  0.01269361 -0.00425893  0.00256333
  0.00859573  0.00472835  0.00077197  0.00191873  0.01546788  0.0014475
  0.00888193  0.00648022  0.00115797  0.00351191  0.00580138  0.00614035
  0.00632426  0.00408354  0.00346044 -0.00034332  0.00599384  0.00302595
  0.00657633  0.01086903  0.00625807  0.00096565  0.00061804  0.00038511
  0.00523874  0.00633043  0.00379965  0.00302553 -0.00123322  0.00153473
  0.00725579  0.00836438  0.01295918  0.00737873]

Note: running the script with XGBoost 1.0.0 and 1.1.0 produces output identical to 1.2.0.

Output from the dev version (c564518):

1.3.0-SNAPSHOT
[ 0.00698666 -0.00211278  0.00180034 -0.00016027  0.00526194  0.00801962
  0.00016758  0.00226211  0.00276773  0.00408198  0.00303223  0.00291933
  0.01101091  0.00683288  0.00145871  0.00326988  0.00572827  0.01019943
  0.00074329  0.00784767  0.00173803 -0.00219286  0.00601804  0.00606472
  0.00447388  0.00103391  0.00932358  0.00230171  0.003892    0.00151177
  0.00341637  0.00821943  0.00668607  0.00630774  0.00331502  0.00775074
  0.0044381   0.01030211  0.00924495  0.00645958  0.00777672  0.00231205
  0.00457842  0.00390424  0.00947046  0.00410091  0.0022092   0.00292498
  0.00638005  0.00796804  0.00140869  0.00887531  0.00496863  0.01049942
  0.00908096  0.00332738  0.00799218  0.00228496  0.004639    0.00213413
  0.00729368  0.01049243  0.00790528  0.01269368 -0.00425872  0.00256319
  0.00859569  0.00472848  0.0007721   0.00191874  0.01546813  0.00144742
  0.00888212  0.00648021  0.00115819  0.00351191  0.00580168  0.00614044
  0.00632418  0.0040833   0.00346038 -0.00034315  0.00599405  0.00302578
  0.0065765   0.01086897  0.00625799  0.00096572  0.00061766  0.00038494
  0.00523901  0.00633054  0.00379964  0.00302567 -0.00123339  0.00153471
  0.00725584  0.00836433  0.01295913  0.00737863]

trivialfis (Member) commented:

@hcho3 Do you want to look into it? I can help bisect if needed.

hcho3 (Collaborator) commented Nov 12, 2020

Hold on a sec, I forgot to set the random seed in my repro. Silly me.

hcho3 (Collaborator) commented Nov 12, 2020

I updated my repro with the fixed random seed. The bug still persists. I tried running the updated repro with XGBoost 1.0.0 and 1.1.0, and the predictions agree with the prediction from XGBoost 1.2.0.

In short:

```
Prediction from 1.0.0
==  Prediction from 1.1.0
==  Prediction from 1.2.0
!=  Prediction from latest master
```

@trivialfis Yes, your help would be appreciated.

hcho3 (Collaborator) commented Nov 12, 2020

Marking this as blocking.

trivialfis (Member) commented:

Got it.

trivialfis (Member) commented:

Traced to a4ce0ea. @ShvetsKS Would you like to take a look?

ShvetsKS (Contributor) commented:

> Traced to a4ce0ea. @ShvetsKS Would you like to take a look?

Sure. Could you help with training the model from the Python reproducer:

m2.load_model('xgb.model.bin')  # load the saved model

Which XGBoost version was used for training, and which parameters exactly?

hcho3 (Collaborator) commented Nov 12, 2020

@ShvetsKS You can obtain the model file xgb.model.bin from https://github.com/7starsea/xgboost-testing. The model was trained with 1.0.0.

7starsea (Author) commented:

> @ShvetsKS You can obtain the model file xgb.model.bin from https://github.com/7starsea/xgboost-testing. The model was trained with 1.0.0.

The model was actually trained with 1.2.1, with parameters:

```python
param = {'max_depth': 8, 'eta': 0.1, 'min_child_weight': 2, 'gamma': 1e-8, 'subsample': 0.6, 'nthread': 4}
```

Thanks.

ShvetsKS (Contributor) commented Nov 12, 2020

It seems the small difference is due to a changed sequence of floating-point operations.
Exact reason:
Before a4ce0ea we accumulated all tree responses into a local variable psum (initially equal to zero) and then added it to the appropriate value of out_preds. In a4ce0ea we increment the out_preds values directly by each tree response.
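The two accumulation orders can be sketched as follows (a minimal illustration with made-up float32 tree responses, not values from the actual model):

```python
import numpy as np

# Hypothetical float32 responses of five trees for one row of input
# (made-up values; not from the model in this issue).
tree_out = np.float32([0.0071, -0.0021, 0.0018, -0.0002, 0.0053])
base = np.float32(0.5)  # out_preds starts at the base score

# Before a4ce0ea: accumulate tree responses into a local psum,
# then add psum into out_preds once.
psum = np.float32(0.0)
for t in tree_out:
    psum += t
old_pred = base + psum

# After a4ce0ea: increment out_preds directly by each tree response.
new_pred = base
for t in tree_out:
    new_pred += t

# Mathematically equal, but float32 rounding happens at different
# points, so the two results can differ in the last bits.
print(old_pred, new_pred, old_pred - new_pred)
```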

A fix is prepared: #6384

@7starsea Thanks for finding the difference. Could you check the fix above?

@hcho3, @trivialfis Do we consider such differences critical in the future? It seems a significant restriction to not allow changing the sequence of floating-point operations for inference. As I remember, there is no such requirement for the training stage.

trivialfis (Member) commented:

> Do we consider such differences critical in the future?

Usually not. Let me take a look at your changes. ;-)

7starsea (Author) commented:

@ShvetsKS I just checked and the difference is exactly zero now. Thanks for fixing the prediction difference.

hcho3 (Collaborator) commented Nov 13, 2020

@ShvetsKS

> Do we consider such differences critical in the future? It seems a significant restriction to not allow changing the sequence of floating-point operations for inference.

Indeed, we (@RAMitchell, @trivialfis, and I) agree with you here. Mandating exact reproducibility of predictions would severely hamper our ability to make changes. Floating-point arithmetic is famously non-associative, so the sum of a list of numbers will be slightly different depending on the order of addition.
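A one-line illustration of this non-associativity in plain Python:

```python
# Floating-point addition is not associative: the grouping changes
# the low-order bits of the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left == right)  # False
```

The same effect applies to float32 tree responses, just with larger relative error.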

I have run an experiment to quantify how much the prediction changes between XGBoost 1.2.0 and the latest master branch:
[Figure: distribution of change in prediction]

I generated data with 1000 different random seeds and then ran prediction on the 1000 matrices, using both version 1.2.0 and master. The prediction difference varies slightly between seeds, but it is never more than 9.2e-7, so the change was most likely caused by floating-point arithmetic and not by a logic error.

Script for experiment

test.py: Generate 1000 matrices with different random seeds and run prediction for them.

```python
import numpy as np
import xgboost as xgb
import argparse

def main(args):
    m2 = xgb.Booster({'nthread': '4'})  # init model
    m2.load_model('xgb.model.bin')  # load the saved model
    out = {}
    for seed in range(1000):
        rng = np.random.default_rng(seed=seed)
        rx = rng.standard_normal(size=(100, 127 + 7 + 1))
        rx = rx.astype(np.float32, order='C')
        dtest = xgb.DMatrix(rx, missing=0.0)
        out[str(seed)] = m2.predict(dtest)
    np.savez(args.out_pred, **out)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--out-pred', type=str, required=True)
    args = parser.parse_args()
    main(args)
```

Command: `python test.py --out-pred [out.npz]`. Make sure that your Python env has the correct version of XGBoost. Let us assume that xgb120.npz stores the result for XGBoost 1.2.0 and xgblatest.npz stores the result for the latest master.

compare.py: Make a histogram of the prediction difference.

```python
import numpy as np
import matplotlib.pyplot as plt

xgb120 = np.load('xgb120.npz')
xgblatest = np.load('xgblatest.npz')

percentile_pts = [50, 90, 99]

colors = ['tab:cyan', 'tab:olive', 'tab:green', 'tab:pink']

percentile = {}
for x in percentile_pts:
    percentile[x] = []
percentile['max'] = []

for seed in range(1000):
    diff = np.abs(xgb120[str(seed)] - xgblatest[str(seed)])
    t = np.percentile(diff, percentile_pts)
    for x, y in zip(percentile_pts, t):
        percentile[x].append(y)
    percentile['max'].append(np.max(diff))

bins = np.linspace(0, np.max(percentile['max']), 100)
idx = 0
for x in percentile_pts:
    plt.hist(percentile[x], label=f'Percentile {x}%', bins=bins, alpha=0.8, color=colors[idx])
    idx += 1
plt.hist(percentile['max'], label='Max', bins=bins, alpha=0.8, color=colors[idx])
plt.legend(loc='best')
plt.title('Distribution in prediction difference between XGBoost 1.2.0\nand master branch, tried over seed=[0..1000]')
plt.xlabel('Absolute difference')
plt.ylabel('Frequency')
plt.savefig('foobar.png', dpi=100)
```

trivialfis (Member) commented:

Since the problem here is that + on floats doesn't form a group (it is not associative), we can test with the sum removed: predict on a single tree. The result should be exactly the same.
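A quick sketch of this idea (illustrative numbers only, no XGBoost involved): summing many float32 "tree outputs" in different orders can change the result, while a single tree's output involves no summation at all:

```python
import numpy as np

# 500 made-up float32 per-tree outputs for one row (illustration only).
rng = np.random.default_rng(0)
leaf = (rng.standard_normal(500) * 0.01).astype(np.float32)

# A full-model prediction sums over all trees, so iteration order matters:
fwd = np.float32(0.0)
for v in leaf:
    fwd += v
bwd = np.float32(0.0)
for v in leaf[::-1]:
    bwd += v
print(fwd, bwd)  # the two sums typically differ in the last bits

# A single tree's output involves no summation, so a predictor rewrite
# that only changes accumulation order must reproduce it exactly.
single = leaf[0]
```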

hcho3 (Collaborator) commented Nov 13, 2020

@trivialfis Indeed, when I added the ntree_limit=1 argument to m2.predict(), the difference vanishes to 0.

trivialfis (Member) commented:

Great! So the next thing is how we document it, or whether we should document it at all.

hcho3 (Collaborator) commented Nov 13, 2020

Let me sleep on it. For now, suffice it to say that this issue is not really a bug.
