Different behaviour on python-exported model and C++ API predictor #2549

Closed
mkornaukhov03 opened this issue Dec 4, 2023 · 4 comments

@mkornaukhov03

Different behaviour on python-exported model and C++ API predictor

Catboost version: 1.2.2
CPU: Intel Core i5-8550U
OS: Ubuntu [20.04]

I trained a CatBoost model (see multi_class.py in the attachment) and saved it as both
model.cbm and model.py. I also
saved the input and the expected answer for one row in input.1.json.
In C++ I have the following code:

auto wrapper = ModelCalcerWrapper("model.cbm");

// parse input.1.json

for (const InputRow &row : in.input) {
    auto pred = wrapper.CalcMulti(row.vector_float_features, row.vector_categorial_features);
    // check that difference (pred - expected) is small enough
}

and the answer matches.
In the Python-exported file, however, an error is immediately visible:

current_tree_leaf_values_index += (1 << current_tree_depth) * model.dimension

The trailing * model.dimension factor shouldn't be there (as in the cpp-exported version); as it stands, it raises an
index-out-of-bounds exception.
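To make the stride question concrete, here is a minimal sketch of how the per-tree offset into the leaf values advances with and without the dimension factor. Everything in it is illustrative: `tree_offsets`, the two tree depths, and the two storage layouts ("flat" and "nested") are assumptions made for this example, not something taken from the exported model.py, and the thread does not say which layout a given export uses.

```python
def tree_offsets(tree_depths, dimension, scale_by_dimension):
    """Return the start offset of every tree in the leaf-value storage.

    Hypothetical helper: it only mimics the running-offset update
    discussed above, with or without the `* dimension` factor.
    """
    offsets = []
    offset = 0
    for depth in tree_depths:
        offsets.append(offset)
        leaves = 1 << depth  # a symmetric tree of depth d has 2**d leaves
        offset += leaves * dimension if scale_by_dimension else leaves
    return offsets


# Hypothetical model: two trees (depth 2 and depth 1), three classes.
depths = [2, 1]
dimension = 3

# Layout A (assumed): one flat list holding `dimension` floats per leaf.
flat_len = sum((1 << d) * dimension for d in depths)
# Layout B (assumed): one entry per leaf, each entry a list of `dimension` floats.
nested_len = sum(1 << d for d in depths)

with_factor = tree_offsets(depths, dimension, True)
without_factor = tree_offsets(depths, dimension, False)

print("offsets with factor:   ", with_factor, "flat length:  ", flat_len)
print("offsets without factor:", without_factor, "nested length:", nested_len)
```

Under these assumptions, the scaled stride fits the flat layout but runs past the end of the nested one (an index error), while the unscaled stride fits the nested layout but lands on the wrong leaves in the flat one (no exception, just different numbers). Either mismatch would be consistent with the symptoms described here, so it may be worth checking which layout the exported file actually uses before changing the stride.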
I fixed it in model.fixed.py and added some code to run a prediction:

if __name__ == "__main__":
    input_file = "input.1.json"
    data = open(input_file)
    item = json.load(data)[0]
    ans = item['ans']
    float_feats = item['float_features']
    cat_feats = item['cat_features']
    resp = apply_catboost_model_multi(float_feats, cat_feats)
    print("expected = {}".format(ans))
    print("real     = {}".format(resp))

# expected = [-0.4315705250918395, -0.07602514583990287, 0.5075956709317426]
# real     = [-0.012774193043495252, -0.048760865913935775, 0.38847943878388946]

but the answer doesn't match. I believe this is a bug, or am I doing something wrong?
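For reference, the "small enough difference" check can be written down explicitly instead of comparing the printed lists by eye. This is only a sketch: `predictions_match`, `max_abs_diff`, and the tolerance values are hypothetical choices for this example, and the two lists are the numbers printed above.

```python
import math


def max_abs_diff(expected, actual):
    """Largest element-wise absolute difference between two predictions."""
    return max(abs(e - a) for e, a in zip(expected, actual))


def predictions_match(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
    """True if both vectors have the same length and are element-wise close.

    The tolerances are illustrative defaults, not values used by CatBoost.
    """
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )


expected = [-0.4315705250918395, -0.07602514583990287, 0.5075956709317426]
real = [-0.012774193043495252, -0.048760865913935775, 0.38847943878388946]

print("match:", predictions_match(expected, real))
print("max abs diff:", max_abs_diff(expected, real))
```

A difference in the last one or two digits would pass this check, whereas the gap above is far larger than any rounding effect.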

Attachment

https://gist.github.com/mkornaukhov03/5c5d9e394f17141cac4fa63d2b09e026

@mkornaukhov03
Author

This problem is still reproducible with the same snippet. Please reopen the issue.

@andrey-khropov
Member

> This problem is still reproducible with the same snippet. Please reopen the issue.

What is the commit where it can be reproduced?

@mkornaukhov03
Author

>> This problem is still reproducible with the same snippet. Please reopen the issue.

> What is the commit where it can be reproduced?

e10f9da

@andrey-khropov
Member

>>> This problem is still reproducible with the same snippet. Please reopen the issue.

>> What is the commit where it can be reproduced?

> e10f9da

I cannot reproduce it; the example in the description works for me.

It also works with catboost 1.2.3 (multi_class.py is from the attachment, and check_result.py is a copy of the code in the description of this issue):

$ python --version
Python 3.11.0
$ python -m pip install catboost==1.2.3
...
$ python ./multi_class.py 



0:	learn: 0.9417331	total: 49.8ms	remaining: 448ms
1:	learn: 0.8421839	total: 50ms	remaining: 200ms
2:	learn: 0.6597822	total: 50.1ms	remaining: 117ms
3:	learn: 0.6028493	total: 50.2ms	remaining: 75.3ms
4:	learn: 0.4900112	total: 50.4ms	remaining: 50.4ms
5:	learn: 0.4076408	total: 50.5ms	remaining: 33.7ms
6:	learn: 0.3458205	total: 50.6ms	remaining: 21.7ms
7:	learn: 0.2982687	total: 50.8ms	remaining: 12.7ms
8:	learn: 0.2608927	total: 50.9ms	remaining: 5.65ms
9:	learn: 0.2309514	total: 51ms	remaining: 0us
[['USA']
 ['USA']
 ['UK']
 ['USA']]
[[0.20060959 0.2862616  0.51312881]
 [0.07388963 0.06071726 0.86539311]
 [0.27590481 0.46474219 0.259353  ]
 [0.2580995  0.1213261  0.6205744 ]]
[[-0.43157053 -0.07602515  0.50759567]
 [-0.75475564 -0.95110009  1.70585572]
 [-0.15318701  0.36823989 -0.21505288]
 [-0.04081236 -0.7956756   0.83648797]]
Input #1
	Ans = [-0.43157053 -0.07602515  0.50759567]
	Cat   features = ['winter']
	Float features = [1996, 197]
Input #2
	Ans = [-0.75475564 -0.95110009  1.70585572]
	Cat   features = ['winter']
	Float features = [1968, 37]
Input #3
	Ans = [-0.15318701  0.36823989 -0.21505288]
	Cat   features = ['summer']
	Float features = [2002, 77]
Input #4
	Ans = [-0.04081236 -0.7956756   0.83648797]
	Cat   features = ['summer']
	Float features = [1948, 59]
$ cat ./check_result.py 

import json

from model import apply_catboost_model_multi


if __name__ == "__main__":
    input_file = "input.1.json"
    data = open(input_file)
    item = json.load(data)[0]
    ans = item['ans']
    float_feats = item['float_features']
    cat_feats = item['cat_features']
    resp = apply_catboost_model_multi(float_feats, cat_feats)
    print("expected = {}".format(ans))
    print("real     = {}".format(resp))
$ python ./check_result.py 
expected = [-0.4315705250918395, -0.07602514583990287, 0.5075956709317426]
real     = [-0.4315705250918395, -0.07602514583990279, 0.5075956709317426]
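As a side check on the log above: the second printed matrix looks like class probabilities and the third like the raw per-class scores that check_result.py compares against. Assuming the usual softmax link for multiclass models (an assumption here, not something stated in this thread), the first raw row can be turned back into probabilities with a few lines; `softmax` is a local helper written for this sketch.

```python
import math


def softmax(raw):
    """Numerically stable softmax over a list of raw scores."""
    m = max(raw)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(v - m) for v in raw]
    total = sum(exps)
    return [e / total for e in exps]


# Raw scores for Input #1, as printed by check_result.py.
raw = [-0.4315705250918395, -0.07602514583990287, 0.5075956709317426]
probs = softmax(raw)

print([round(p, 6) for p in probs])
```

The result should agree with the first probability row in the log to several decimal places, which suggests the expected values in input.1.json are raw scores rather than probabilities.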
