
Predictions from XGBClassifier with gblinear booster don't match manual calculations #5634

Open
StrikerRUS opened this issue May 6, 2020 · 3 comments

@StrikerRUS
Contributor

Hi there!

I'm trying to reproduce the prediction results from a simple dumped JSON model, but my calculations don't match the results produced by the estimator. I'd be very grateful if anyone could point me to the problem in my script.

import json

import numpy as np
import xgboost as xgb

from scipy.special import expit
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(n_class=2, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for bst_score in np.arange(0.1, 0.9, 0.1):
    for rnd_state in range(1, 10):
        clf = xgb.XGBClassifier(base_score=bst_score, n_estimators=10,
                                feature_selector="shuffle", booster="gblinear",
                                random_state=rnd_state)
        clf.fit(X_train, y_train)

        # dump the fitted gblinear model as JSON and parse its coefficients
        model_dump = clf.get_booster().get_dump(dump_format="json")

        weights = np.array(json.loads(model_dump[0])["weight"], dtype=np.float32)
        bias = np.array(json.loads(model_dump[0])["bias"][0], dtype=np.float32)
        base_score = np.array(clf.get_params()["base_score"], dtype=np.float32)

        # convert base_score from probability space to a raw margin (logit)
        base_score = -np.log(1.0 / base_score - 1.0, dtype=np.float32)

        for row_idx in range(X_test.shape[0]):
            # manual prediction: sigmoid(weights . x + bias + base_margin)
            y_proba = expit(np.dot(weights, X_test[row_idx, :]) + bias + base_score, dtype=np.float32)
            y_pred_manual = np.array([1 - y_proba, y_proba], dtype=np.float32)

            y_pred = clf.predict_proba(X_test[row_idx, :].reshape(1, -1))

            np.testing.assert_allclose(y_pred[0], y_pred_manual)

If I loosen the tolerance of the check, the snippet runs without errors.

np.testing.assert_allclose(y_pred[0], y_pred_manual, atol=1e-6)
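
For reference, the required atol=1e-6 is right at the resolution of single precision, so the mismatch looks like ordinary float32 rounding (this is my reading, not a confirmed diagnosis):

import numpy as np

# float32 resolves roughly 7 significant decimal digits;
# a relative difference around 1e-6 is only a few ULPs
print(np.finfo(np.float32).eps)  # 1.1920929e-07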

I tried to compile the latest master with changes in gblinear_model.h similar to the ones from #3298 and #3356, but it didn't help. The script still fails with the precision error:

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 1 / 2 (50%)
Max absolute difference: 1.4779289e-12
Max relative difference: 1.0299701e-06
 x: array([9.999986e-01, 1.434925e-06], dtype=float32)
 y: array([9.999986e-01, 1.434924e-06], dtype=float32)
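
Even with a lossless dump, I suspect the remaining mismatch can come from accumulation order: gblinear presumably sums the linear terms in single precision internally, while np.dot may accumulate in a different order or precision. A minimal sketch of the effect, with random data standing in for the actual model:

import numpy as np

rng = np.random.default_rng(42)
w = rng.standard_normal(64).astype(np.float32)
x = rng.uniform(0, 16, 64).astype(np.float32)  # digits pixels lie in [0, 16]

# the same dot product with float32 vs. float64 accumulation
acc32 = np.dot(w, x)
acc64 = np.float32(np.dot(w.astype(np.float64), x.astype(np.float64)))

# the two results typically differ by a few ULPs (~1e-7 relative),
# the same order of magnitude as the assertion failure above
print(acc32, acc64, abs(acc32 - acc64) / abs(acc64))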
@trivialfis
Member

Hi, why is this closed?

@StrikerRUS
Contributor Author

Hi!

Unfortunately, this is no longer relevant for me. I'm not sure whether this problem is still present in the current XGBoost version, and I won't be able to confirm a fix (if any). Do you think this issue should be kept open?

@trivialfis trivialfis reopened this Dec 28, 2020
@trivialfis
Member

Let's be sure
