segfault in shapley in CreateTargetDataProvider #1827

Closed
pseudotensor opened this issue Aug 30, 2021 · 2 comments

pseudotensor commented Aug 30, 2021

Problem:
catboost version: 0.26.1 (also reproduces on 0.25.1)
Operating System: Ubuntu 18 or CentOS 7
CPU: Intel
GPU: n/a (the crash only happens when running on CPU)

backtrace of core file

#0  0x000014a5beee7579 in NCB::CreateTargetDataProvider(NCB::TRawTargetDataProvider const&, TMaybe<TArrayRef<unsigned int const>, NCB::TPolicyUnavailableData>, bool, NCatboostOptions::TLossDescription const*, bool, TMaybe<unsigned int, NMaybe::TPolicyUndefinedExcept>, NCB::TTargetCreationOptions const&, NCB::TInputClassificationInfo const&, NCB::TOutputClassificationInfo*, TRestorableFastRng64*, NPar::ILocalExecutor*, NCB::TOutputPairsInfo*) ()
   from /home/jon/h2oai/tmp/h2oai/contrib/env/lib/python3.8/site-packages/catboost/_catboost.so
#1  0x000014a5beeea291 in NCB::CreateModelCompatibleProcessedDataProvider(NCB::TDataProviderTemplate<NCB::TObjectsDataProvider> const&, TArrayRef<NCatboostOptions::TLossDescription const>, TFullModel const&, unsigned long, TRestorableFastRng64*, NPar::ILocalExecutor*, bool) () from /home/jon/h2oai/tmp/h2oai/contrib/env/lib/python3.8/site-packages/catboost/_catboost.so
#2  0x000014a5bf0f641a in CollectLeavesStatistics(NCB::TDataProviderTemplate<NCB::TObjectsDataProvider> const&, TFullModel const&, NPar::ILocalExecutor*) () from /home/jon/h2oai/tmp/h2oai/contrib/env/lib/python3.8/site-packages/catboost/_catboost.so
#3  0x000014a5bf0e0656 in PrepareTrees(TFullModel const&, NCB::TDataProviderTemplate<NCB::TObjectsDataProvider> const*, TIntrusivePtr<NCB::TDataProviderTemplate<NCB::TObjectsDataProvider>, TDefaultIntrusivePtrOps<NCB::TDataProviderTemplate<NCB::TObjectsDataProvider> > >, EPreCalcShapValues, NPar::ILocalExecutor*, bool, ECalcTypeShapValues, EExplainableModelOutput) () from /home/jon/h2oai/tmp/h2oai/contrib/env/lib/python3.8/site-packages/catboost/_catboost.so
#4  0x000014a5bf0ee572 in CalcShapValuesMulti(TFullModel const&, NCB::TDataProviderTemplate<NCB::TObjectsDataProvider> const&, TIntrusivePtr<NCB::TDataProviderTemplate<NCB::TObjectsDataProvider>, TDefaultIntrusivePtrOps<NCB::TDataProviderTemplate<NCB::TObjectsDataProvider> > >, TMaybe<TFixedFeatureParams, NMaybe::TPolicyUndefinedExcept> const&, int, EPreCalcShapValues, NPar::ILocalExecutor*, ECalcTypeShapValues, EExplainableModelOutput) ()
   from /home/jon/h2oai/tmp/h2oai/contrib/env/lib/python3.8/site-packages/catboost/_catboost.so
#5  0x000014a5bf0c0b3c in GetFeatureImportancesMulti(EFstrType, TFullModel const&, TIntrusivePtr<NCB::TDataProviderTemplate<NCB::TObjectsDataProvider>, TDefaultIntrusivePtrOps<NCB::TDataProviderTemplate<NCB::TObjectsDataProvider> > >, TIntrusivePtr<NCB::TDataProviderTemplate<NCB::TObjectsDataProvider>, TDefaultIntrusivePtrOps<NCB::TDataProviderTemplate<NCB::TObjectsDataProvider> > >, int, EPreCalcShapValues, int, ECalcTypeShapValues, EExplainableModelOutput) ()
   from /home/jon/h2oai/tmp/h2oai/contrib/env/lib/python3.8/site-packages/catboost/_catboost.so
#6  0x000014a5be6898fa in __pyx_f_9_catboost_9_CatBoost__calc_fstr(__pyx_obj_9_catboost__CatBoost*, _object*, __pyx_obj_9_catboost__PoolBase*, __pyx_obj_9_catboost__PoolBase*, int, int, _object*, _object*, _object*, _object*, int) ()
   from /home/jon/h2oai/tmp/h2oai/contrib/env/lib/python3.8/site-packages/catboost/_catboost.so
#7  0x000014a5be707633 in __pyx_pw_9_catboost_9_CatBoost_53_calc_fstr(_object*, _object*, _object*) () from /home/jon/h2oai/tmp/h2oai/contrib/env/lib/python3.8/site-packages/catboost/_catboost.so
#8  0x00005587253bb1c7 in method_vectorcall_VARARGS_KEYWORDS () at /tmp/build/80754af9/python_1599203911753/work/Objects/descrobject.c:332
#9  0x000055872531675e in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x14a63859f530, callable=0x14a6b41d1450) at /tmp/build/80754af9/python_1599203911753/work/Include/cpython/abstract.h:127
#10 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x14a5c8003020) at /tmp/build/80754af9/python_1599203911753/work/Python/ceval.c:4963
#11 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1599203911753/work/Python/ceval.c:3486
#12 0x00005587253a186b in function_code_fastcall (globals=<optimized out>, nargs=10, args=<optimized out>, co=<optimized out>) at /tmp/build/80754af9/python_1599203911753/work/Objects/call.c:283
#13 _PyFunction_Vectorcall.localalias.355 () at /tmp/build/80754af9/python_1599203911753/work/Objects/call.c:410
#14 0x000055872531675e in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x14a638553318, callable=0x14a6b27da550) at /tmp/build/80754af9/python_1599203911753/work/Include/cpython/abstract.h:127
#15 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x14a5c8003020) at /tmp/build/80754af9/python_1599203911753/work/Python/ceval.c:4963
#16 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1599203911753/work/Python/ceval.c:3486
#17 0x00005587253a0a92 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1599203911753/work/Python/ceval.c:4298
#18 0x00005587253a1d20 in _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x14a638577408, func=0x14a5dc028790) at /tmp/build/80754af9/python_1599203911753/work/Objects/call.c:435

repro:

import pickle

from catboost import CatBoostClassifier, CatBoostRegressor, EFstrType, Pool

file2 = "catboost_shappredict3b3da3.tmp.pickle"

# The pickle (attached below as a zip) holds the fitted model and the inputs that trigger the crash.
with open(file2, "rb") as f:
    model, X, y, kwargs, shap_calc_type, cat_features = pickle.load(f)

data = Pool(X, label=y, cat_features=cat_features)
# This call segfaults on CPU.
preds_shap = model.get_feature_importance(
    data=data,
    thread_count=4,
    type=EFstrType.ShapValues,
    shap_calc_type=shap_calc_type,
)

catboost_shappredict3b3da3.tmp.pickle.zip
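
Since the failure is a segfault rather than a Python exception, it kills the whole interpreter and cannot be caught with try/except. A minimal sketch of one way to run a repro like this without losing the calling process is to execute it in a child interpreter and inspect the exit status. The helper name `run_isolated` and the timeout are hypothetical, not part of catboost, and the signal check assumes a POSIX system such as the Ubuntu/CentOS machines mentioned above.

```python
import signal
import subprocess
import sys
from typing import Tuple


def run_isolated(code: str, timeout: float = 600.0) -> Tuple[bool, int]:
    """Run `code` in a fresh interpreter; return (died_with_sigsegv, returncode)."""
    proc = subprocess.run(
        # -X faulthandler makes the child dump its Python traceback to stderr on a fatal signal.
        [sys.executable, "-X", "faulthandler", "-c", code],
        timeout=timeout,
    )
    # On POSIX, a negative return code -N means the child was killed by signal N.
    died_with_sigsegv = proc.returncode == -signal.SIGSEGV
    return died_with_sigsegv, proc.returncode
```

The repro script can be passed in as a string (or read from a file) so that a crash only takes down the child, and the parent can log the result and continue.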

The segfault occurs whether shap_calc_type is Approximate or Regular.

Label encoding the items and then converting them to strings doesn't help either. The problem seems to be simply that all the values are unique, so this is something of a corner case, but one where a segfault should be avoided.
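
If the all-unique hypothesis above is right, a caller on an affected version could screen the data before requesting SHAP values. The following is only a sketch under that assumption: `all_unique_columns` is a hypothetical helper, the data is made up, and it assumes the features are available as a dict of column name to values with cat_features given as names (the pickle in this issue may store them differently).

```python
from typing import Dict, Hashable, List, Sequence


def all_unique_columns(
    columns: Dict[str, Sequence[Hashable]],
    cat_features: Sequence[str],
) -> List[str]:
    """Return the categorical columns in which no value repeats."""
    flagged = []
    for name in cat_features:
        values = columns[name]
        # A column with fewer than two rows says nothing about repetition.
        if len(values) > 1 and len(set(values)) == len(values):
            flagged.append(name)
    return flagged


# Made-up example data: one free-text column and one ordinary categorical column.
example = {
    "tweet": ["good morning", "catboost rocks", "hello world"],  # every value is unique
    "lang": ["en", "en", "de"],                                   # values repeat
}
print(all_unique_columns(example, ["tweet", "lang"]))
```

Columns flagged this way could be dropped or skipped before calling get_feature_importance on a version that predates the fix.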

pseudotensor added a commit to h2oai/driverlessai-recipes that referenced this issue Aug 30, 2021
…coding or characters). Catboost fails on entirely arbitrary tweet like objects, so handle testing more carefully. catboost/catboost#1827
arcadia-devtools pushed a commit that referenced this issue Sep 1, 2021
ref:947a5ff6c476a447ca85ce1b317432b16cede598
@LyzhinIvan
Collaborator

Fixed.

@kizill
Member

kizill commented Oct 1, 2021

Released in 1.0.0

robot-piglet pushed a commit that referenced this issue Jan 16, 2023
ref:947a5ff6c476a447ca85ce1b317432b16cede598