Dalex not working with PyCaret DataTypes_Auto_infer pipeline #329
Comments
Cool, thanks for the reply Hubert.
I managed to solve the issue by casting my categorical columns to type str beforehand, so that they are no longer treated as numerical.
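A minimal sketch of that workaround, assuming a small made-up frame that reuses two column names from the dataset (the values and the `categorical_features` list are illustrative, not taken from the notebook):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy data; only the column names come from the issue.
df = pd.DataFrame({
    "X2_house_age": [32.0, 19.5, 13.3],
    "X4_number_of_convenience_stores": [10, 9, 5],
})

# Assumed list of columns that should be treated as categorical.
categorical_features = ["X4_number_of_convenience_stores"]

# Cast on a copy so the original frame stays untouched.
df_cast = df.copy()
for col in categorical_features:
    df_cast[col] = df_cast[col].astype(str)

# Integer-coded column is numeric before the cast, non-numeric after it.
numeric_before = is_numeric_dtype(df["X4_number_of_convenience_stores"])
numeric_after = is_numeric_dtype(df_cast["X4_number_of_convenience_stores"])
print(numeric_before, numeric_after)
```

The cast frame would then be the one passed to the explainer, so the dtype check sees a non-numeric column.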
Thank You
Kind Regards
Philip Nell
Technical Lead - Data & Machine Learning Engineer
Cell: +27 72 847 9829
Web: https://www.oqlis.com
On 19 Sep 2020, at 16:29, Hubert Baniecki wrote:
Hi @PhilipNell <https://github.com/PhilipNell>, thanks for this example.
dalex can only be so intelligent when it comes to recognizing numerical/categorical columns. Currently the pd.api.types.is_numeric_dtype function is used throughout the package to differentiate between them. There is no easy way to extract user-defined categorical_features from a model in a model-agnostic (framework-agnostic) fashion for different explanations.
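To illustrate why an integer-coded categorical column ends up on the numerical side, here is a rough sketch of a dtype-based split. The helper `split_by_dtype` and the toy data are hypothetical and are not dalex internals; only `pd.api.types.is_numeric_dtype` is the function named above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy data; column names borrowed from the issue's dataset.
df = pd.DataFrame({
    "X2_house_age": [32.0, 19.5, 13.3],
    "X4_number_of_convenience_stores": [10, 9, 5],
})


def split_by_dtype(frame):
    """Illustrative helper: partition columns using only their dtype."""
    numeric = [c for c in frame.columns if is_numeric_dtype(frame[c])]
    categorical = [c for c in frame.columns if c not in numeric]
    return numeric, categorical


numeric_cols, categorical_cols = split_by_dtype(df)
print(numeric_cols)
print(categorical_cols)
```

Because the check looks only at the dtype, both columns land in the numerical group, regardless of what the modelling pipeline was told about them.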
Answering your needs (the second option was indeed missing; I have only now fixed it on the master version):
1. Calculating the model_profile for a numerical variable at a specific set of points is possible using the variable_splits parameter.
<https://user-images.githubusercontent.com/32574004/93668971-be1c9400-fa90-11ea-8fbc-f6b6332bfb79.png>
2. Calculating the model_profile for a numerical variable at a specific set of points (and in a categorical fashion) will be possible using the variable_splits and variable_type parameters.
<https://user-images.githubusercontent.com/32574004/93669204-b100a480-fa92-11ea-960c-9c4fc1b67804.png>
Code: temp.zip <https://github.com/ModelOriented/DALEX/files/5249996/temp.zip>
I am trying to explain a model that was trained using PyCaret. The pipeline is a normal sklearn pipeline with a custom DataTypes_Auto_infer step that forces the dataframe columns into the specified data types.
The pipeline looks something like this:
Pipeline(memory=None,
         steps=[('dtypes',
                 DataTypes_Auto_infer(categorical_features=['X4_number_of_convenience_stores'],
                                      display_types=False,
                                      features_todrop=['No'],
                                      ml_usecase='regression',
                                      numerical_features=['X1_transaction_date',
                                                          'X2_house_age',
                                                          'X3_distance_to_the_nearest_MRT_station',
                                                          'X5_latitude',
                                                          'X6_longitude'],
                                      target='Y_house_price_of_unit_area',
                                      tim...
                 RandomForestRegressor(bootstrap=True, ccp_alpha=0.0,
                                       criterion='mse', max_depth=None,
                                       max_features='auto', max_leaf_nodes=None,
                                       max_samples=None,
                                       min_impurity_decrease=0.0,
                                       min_impurity_split=None,
                                       min_samples_leaf=1, min_samples_split=2,
                                       min_weight_fraction_leaf=0.0,
                                       n_estimators=100, n_jobs=-1,
                                       oob_score=False, random_state=42,
                                       verbose=0, warm_start=False)]],
         verbose=False)
In this case, the feature X4_number_of_convenience_stores is treated as numerical by DALEX, even though the pipeline specifies it as categorical for predictions. I have attached my Jupyter Notebook and the dataset that is used.
Archive.zip
I greatly appreciate all the work that has gone into making this package.