
Predictions from XGBClassifier with gblinear booster don't match manual calculations #5634

Open
StrikerRUS opened this issue May 6, 2020 · 3 comments

@StrikerRUS
Contributor

Hi there!

I'm trying to reproduce the prediction results from a simple dumped JSON model, but my calculations don't match the results produced by the estimator. I'd be very grateful if anyone could point me to the problem in my script.

import json

import numpy as np
import xgboost as xgb

from scipy.special import expit
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(n_class=2, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for bst_score in np.arange(0.1, 0.9, 0.1):
    for rnd_state in range(1, 10):
        clf = xgb.XGBClassifier(base_score=bst_score, n_estimators=10,
                                feature_selector="shuffle", booster="gblinear",
                                random_state=rnd_state)
        clf.fit(X_train, y_train)

        # dump the fitted gblinear model as JSON and parse its coefficients
        model_dump = clf.get_booster().get_dump(dump_format="json")

        weights = np.array(json.loads(model_dump[0])["weight"], dtype=np.float32)
        bias = np.array(json.loads(model_dump[0])["bias"][0], dtype=np.float32)
        base_score = np.array(clf.get_params()["base_score"], dtype=np.float32)

        # convert base_score from probability space to a raw margin (logit)
        base_score = -np.log(1.0 / base_score - 1.0, dtype=np.float32)

        for row_idx in range(X_test.shape[0]):
            # manual prediction: sigmoid(weights . x + bias + base_margin)
            y_proba = expit(np.dot(weights, X_test[row_idx, :]) + bias + base_score, dtype=np.float32)
            y_pred_manual = np.array([1 - y_proba, y_proba], dtype=np.float32)

            y_pred = clf.predict_proba(X_test[row_idx, :].reshape(1, -1))

            np.testing.assert_allclose(y_pred[0], y_pred_manual)

If I loosen the tolerance of the check, the snippet runs without errors.

np.testing.assert_allclose(y_pred[0], y_pred_manual, atol=1e-6)
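
For reference, the required atol=1e-6 is right at the resolution of single precision, so the mismatch looks like ordinary float32 rounding (this is my reading, not a confirmed diagnosis):

import numpy as np

# float32 resolves roughly 7 significant decimal digits;
# a relative difference around 1e-6 is only a few ULPs
print(np.finfo(np.float32).eps)  # 1.1920929e-07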

I tried to compile the latest master with changes in gblinear_model.h similar to the ones from #3298 and #3356, but it didn't help. The script still fails with the precision error:

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 1 / 2 (50%)
Max absolute difference: 1.4779289e-12
Max relative difference: 1.0299701e-06
 x: array([9.999986e-01, 1.434925e-06], dtype=float32)
 y: array([9.999986e-01, 1.434924e-06], dtype=float32)
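
Even with a lossless dump, I suspect the remaining mismatch can come from accumulation order: gblinear presumably sums the linear terms in single precision internally, while np.dot may accumulate in a different order or precision. A minimal sketch of the effect, with random data standing in for the actual model:

import numpy as np

rng = np.random.default_rng(42)
w = rng.standard_normal(64).astype(np.float32)
x = rng.uniform(0, 16, 64).astype(np.float32)  # digits pixels lie in [0, 16]

# the same dot product with float32 vs. float64 accumulation
acc32 = np.dot(w, x)
acc64 = np.float32(np.dot(w.astype(np.float64), x.astype(np.float64)))

# the two results typically differ by a few ULPs (~1e-7 relative),
# the same order of magnitude as the assertion failure above
print(acc32, acc64, abs(acc32 - acc64) / abs(acc64))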
@trivialfis
Member

Hi, why is this closed?

@StrikerRUS
Contributor Author

Hi!

Unfortunately, this is no longer relevant for me. I'm not sure whether this problem is still present in the current XGBoost version, and I won't be able to confirm a fix (if any). Do you think this issue should be kept open?

@trivialfis trivialfis reopened this Dec 28, 2020
@trivialfis
Member

Let's be sure
