Dalex not working with PyCaret DataTypes_Auto_infer pipeline #329

Closed
PhilipNell opened this issue Sep 15, 2020 · 2 comments
Labels
invalid ❕ This doesn't seem right, potential bug Python 🐍 Related to Python

Comments

@PhilipNell

I am trying to explain a model that was trained using PyCaret. The pipeline is a normal sklearn pipeline with a custom DataTypes_Auto_infer step that forces the dataframe columns into specified datatypes.

The pipeline looks something like this:

Pipeline(memory=None,
steps=[('dtypes',
DataTypes_Auto_infer(categorical_features=['X4_number_of_convenience_stores'],
display_types=False,
features_todrop=['No'],
ml_usecase='regression',
numerical_features=['X1_transaction_date',
'X2_house_age',
'X3_distance_to_the_nearest_MRT_station',
'X5_latitude',
'X6_longitude'],
target='Y_house_price_of_unit_area',
tim...
RandomForestRegressor(bootstrap=True, ccp_alpha=0.0,
criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
max_samples=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0,
n_estimators=100, n_jobs=-1,
oob_score=False, random_state=42,
verbose=0, warm_start=False)]],
verbose=False)

In this case, DALEX treats the feature ['X4_number_of_convenience_stores'] as numerical, even though it was specified as categorical in the pipeline used for predictions. I have attached my Jupyter Notebook and the dataset that is used.
Archive.zip

I greatly appreciate all the work that has gone into making this package.

@hbaniecki hbaniecki added Python 🐍 Related to Python invalid ❕ This doesn't seem right, potential bug labels Sep 15, 2020
hbaniecki added a commit that referenced this issue Sep 19, 2020
@hbaniecki
Member

Hi @PhilipNell, thanks for this example.

dalex can only be so intelligent when it comes to recognizing numerical/categorical columns. Currently, the pd.api.types.is_numeric_dtype function is used throughout the package to differentiate them. There is no easy way to extract user-defined categorical_features from a model in a (framework-)agnostic fashion for the different explanations.
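To illustrate the check described above, here is a minimal sketch of how pd.api.types.is_numeric_dtype classifies a column by its dtype alone. The small dataframe is made up for illustration (only the column names come from the issue), and casting to the pandas category dtype is shown purely to demonstrate how the check responds; whether the PyCaret pipeline accepts such a column is an assumption not verified here.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy data: values are invented, column names follow the issue.
df = pd.DataFrame({
    "X2_house_age": [32.0, 19.5, 13.3],
    "X4_number_of_convenience_stores": [10, 9, 5],
})
stores = "X4_number_of_convenience_stores"

# An integer column counts as numeric, regardless of how the
# modelling pipeline intends to use it.
numeric_before = is_numeric_dtype(df[stores])

# After an explicit cast to the pandas 'category' dtype,
# the same check no longer reports the column as numeric.
df_cat = df.astype({stores: "category"})
numeric_after = is_numeric_dtype(df_cat[stores])

print(numeric_before, numeric_after)
```

The point is that the decision depends only on the column dtype in the data handed to the explainer, not on the categorical_features argument stored inside the pipeline.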

To address your needs (the second option was indeed missing; I have only just fixed it on the master branch):

  1. calculating the model_profile for a numerical variable in a specific set of points is possible using the variable_splits parameter


  2. calculating the model_profile for a numerical variable in a specific set of points (+ in a categorical fashion) will be possible using the variable_splits and variable_type parameters


Code: temp.zip
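The two options above could be sketched roughly as follows. This is not the code from temp.zip; it is a hedged illustration that assumes `explainer` is a dalex Explainer already built around the pipeline, and the helper name `categorical_profile` is hypothetical. It uses the observed values of the column as the split points and passes them through the variable_splits and variable_type parameters mentioned above.

```python
import pandas as pd


def categorical_profile(explainer, data, column):
    """Request a model profile evaluated only at the observed values
    of `column`, treated in a categorical fashion.

    `explainer` is assumed to expose dalex's `model_profile` method;
    this helper itself is hypothetical and not part of dalex.
    """
    # Use the distinct observed values as the grid instead of a
    # uniform numeric grid.
    splits = {column: sorted(data[column].unique().tolist())}
    return explainer.model_profile(
        variables=[column],
        variable_splits=splits,
        variable_type="categorical",  # drop this line for option 1
        verbose=False,
    )
```

A call such as `categorical_profile(explainer, X, "X4_number_of_convenience_stores")` would then return the profile object, assuming a dalex version that includes the fix mentioned above.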

@PhilipNell
Author

PhilipNell commented Sep 21, 2020 via email
