Dalex not working with PyCaret DataTypes_Auto_infer pipeline #329

PhilipNell · 2020-09-15T15:38:16Z

I am trying to explain a model that was trained using PyCaret. The pipeline is a normal sklearn pipeline with a custom DataTypes_Auto_infer step that forces the dataframe columns into specified datatypes.

The pipeline looks something like this:

Pipeline(memory=None,
steps=[('dtypes',
DataTypes_Auto_infer(categorical_features=['X4_number_of_convenience_stores'],
display_types=False,
features_todrop=['No'],
ml_usecase='regression',
numerical_features=['X1_transaction_date',
'X2_house_age',
'X3_distance_to_the_nearest_MRT_station',
'X5_latitude',
'X6_longitude'],
target='Y_house_price_of_unit_area',
tim...
RandomForestRegressor(bootstrap=True, ccp_alpha=0.0,
criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
max_samples=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0,
n_estimators=100, n_jobs=-1,
oob_score=False, random_state=42,
verbose=0, warm_start=False)]],
verbose=False)

In this case, the categorical feature ['X4_number_of_convenience_stores'] is taken as numerical by DALEX but was specified as categorical for predictions. I have attached my Jupyter Notebook and the dataset that is used.
Archive.zip

I greatly appreciate all the work that has gone into making this package.

hbaniecki · 2020-09-19T14:29:30Z

Hi @PhilipNell, thanks for this example.

dalex can only be so intelligent when it comes to recognizing numerical/categorical columns. Currently, the pd.api.types.is_numeric_dtype function is used throughout the package to tell them apart. There is no easy way to extract user-defined categorical_features from a model in a model- (framework-) agnostic fashion for the different explanations.
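A minimal sketch of the dtype check described above, assuming a hypothetical frame that borrows two column names from the issue (the values are made up, not taken from the attached dataset). It only illustrates how `pd.api.types.is_numeric_dtype` classifies an integer-coded column; it is not dalex's internal code.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data: column names come from the issue, values are illustrative.
df = pd.DataFrame({
    "X2_house_age": [32.0, 19.5, 13.3],
    "X4_number_of_convenience_stores": [10, 9, 5],
})

# An integer-coded categorical still has a numeric dtype, so a purely
# dtype-based check cannot know the user meant it as categorical.
is_numeric = {col: bool(is_numeric_dtype(df[col])) for col in df.columns}
print(is_numeric)
```

Both columns come out as numeric here, which matches the behaviour reported in the issue: the categorical declaration lives inside the pipeline step, not in the dataframe's dtypes.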

To address your needs (the second option was indeed missing; I have only now fixed it on the master branch):

1. Calculating the model_profile for a numerical variable at a specific set of points is possible using the variable_splits parameter.

2. Calculating the model_profile for a numerical variable at a specific set of points, treated as categorical, will be possible using the variable_splits and variable_type parameters.

Code: temp.zip

PhilipNell · 2020-09-21T09:54:19Z

Cool, thanks for the reply, Hubert. I managed to solve the issue by typecasting my categorical columns to type str beforehand.

Thank you.

Kind regards,
Philip Nell
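The workaround mentioned here could look roughly like the sketch below. The frame and the `categorical_cols` list are hypothetical stand-ins (the column name comes from the issue, the values are made up); the point is only that a column cast to str no longer passes a numeric dtype check.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data: values are illustrative, not taken from the attached dataset.
df = pd.DataFrame({
    "X2_house_age": [32.0, 19.5, 13.3],
    "X4_number_of_convenience_stores": [10, 9, 5],
})

# Assumed list of columns the user wants treated as categorical.
categorical_cols = ["X4_number_of_convenience_stores"]

# Cast a copy so the original frame is left untouched.
df_cast = df.copy()
df_cast[categorical_cols] = df_cast[categorical_cols].astype(str)

# After the cast, the column is no longer reported as numeric.
still_numeric = bool(is_numeric_dtype(df_cast["X4_number_of_convenience_stores"]))
print(still_numeric)
```

The cast frame would then be the data handed to the explainer. Whether the fitted pipeline accepts string-typed values for these columns depends on its preprocessing, so that part is an assumption to verify against the actual model.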


hbaniecki added the Python 🐍 and invalid ❕ labels Sep 15, 2020

hbaniecki added a commit that referenced this issue Sep 19, 2020

[python] potential fix for #329

3c6f340

hbaniecki closed this as completed Sep 21, 2020
