Compilation of different preprocessing methods #304

Open
NTNguyen13 opened this issue Feb 25, 2022 · 1 comment

Comments

@NTNguyen13

Hi, I've just started trying out Shapash. I've seen this line many times in the documentation:
preprocessing=encoder, # Optional: compile step can use inverse_transform method

However, I'm not sure how to proceed with this. I checked the code here, but I'm not clear about how to pass a dict or list_of_dict to preprocessing.

I have this example. Could you please advise me on how to handle it?

Original df:

   A   B1   B2   C1   C2   E
1  0   B11  B03  C02  C04  1
2  1   B03  B04  C03  C04  1
3  0   B02  B03  C02  C02  1
4  1   B04  B03  C02  C03  0

I want to one-hot encode A and E, and apply a multi-label binarizer to (B1, B2) and (C1, C2). Both encoders are from sklearn.

Target df:

    A0   A1   B02  B03  B04  B11  C02  C03  C04  E0  E1
1   1    0    0    1    0    1    1    0    1    0   1
2   0    1    0    1    1    0    0    1    1    0   1
3   1    0    1    1    0    0    2    0    0    0   1
4   0    1    0    1    1    0    1    1    0    1   0
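
For reference, here is a minimal sketch of how an encoding like the one above could be produced with sklearn's OneHotEncoder and MultiLabelBinarizer. The variable names and the `binarize` helper are illustrative only. Note that MultiLabelBinarizer records presence rather than counts, so row 3 gets C02 = 1 in this sketch, not 2 as in the table.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer, OneHotEncoder

# Original df from the example above.
df = pd.DataFrame(
    {
        "A": [0, 1, 0, 1],
        "B1": ["B11", "B03", "B02", "B04"],
        "B2": ["B03", "B04", "B03", "B03"],
        "C1": ["C02", "C03", "C02", "C02"],
        "C2": ["C04", "C04", "C02", "C03"],
        "E": [1, 1, 1, 0],
    },
    index=[1, 2, 3, 4],
)

# One-hot encode A and E; column names are built as <column><category>.
ohe_cols = ["A", "E"]
ohe = OneHotEncoder()
ohe_values = ohe.fit_transform(df[ohe_cols]).toarray().astype(int)
ohe_names = [
    f"{col}{cat}" for col, cats in zip(ohe_cols, ohe.categories_) for cat in cats
]
ohe_df = pd.DataFrame(ohe_values, columns=ohe_names, index=df.index)


def binarize(frame, cols):
    """Hypothetical helper: treat each row of `cols` as a set of labels."""
    mlb = MultiLabelBinarizer()
    values = mlb.fit_transform(frame[cols].values.tolist())
    return pd.DataFrame(values, columns=mlb.classes_, index=frame.index)


b_df = binarize(df, ["B1", "B2"])
c_df = binarize(df, ["C1", "C2"])

# Assemble the columns in the same order as the target df.
encoded = pd.concat(
    [ohe_df[["A0", "A1"]], b_df, c_df, ohe_df[["E0", "E1"]]], axis=1
)
print(encoded)
```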

Because I have multiple encoders for multiple columns, how should I pass them to preprocessing?

Thank you very much

@SebastienBidault
Collaborator

Hi,

I recommend taking a look at the encoding tutorials for a better understanding.

However, at the moment we don't support the MultiLabelBinarizer from sklearn.

We support:
from sklearn: OneHotEncoder / OrdinalEncoder / StandardScaler / QuantileTransformer / PowerTransformer
from category_encoders: OneHotEncoder / OrdinalEncoder / BaseNEncoder / BinaryEncoder / TargetEncoder
or a dict with the required mapping
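
As a rough illustration of the round trip that the `preprocessing` comment quoted in the question refers to (`inverse_transform`), the sketch below fits sklearn's OneHotEncoder on columns A and E from the question and recovers the original values. It only uses sklearn; how Shapash calls the encoder internally is not shown here, and the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Columns A and E from the example in the question.
X = np.array([[0, 1], [1, 1], [0, 1], [1, 0]])

encoder = OneHotEncoder()
X_encoded = encoder.fit_transform(X).toarray()  # one-hot columns for A and E

# inverse_transform maps the one-hot columns back to the original values.
X_restored = encoder.inverse_transform(X_encoded)
print(X_restored)
```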

I don't know how complex your problem is, but maybe you can use the features_groups of the compile step to get the importance of A, B, C, or E.
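
A minimal sketch of what such a grouping could look like for the encoded columns in the question, assuming features_groups takes a dict that maps a group name to a list of column names. The `group_by_prefix` helper is hypothetical, and the exact way to pass the dict should be checked against the Shapash documentation.

```python
# Encoded column names from the target df in the question.
encoded_columns = [
    "A0", "A1", "B02", "B03", "B04", "B11", "C02", "C03", "C04", "E0", "E1",
]


def group_by_prefix(columns):
    """Hypothetical helper: group columns by their first letter (A, B, C, E)."""
    groups = {}
    for col in columns:
        groups.setdefault(col[0], []).append(col)
    return groups


features_groups = group_by_prefix(encoded_columns)
print(features_groups)

# Assumption: the dict is then handed over as features_groups=features_groups,
# as suggested in the reply; the exact signature is not confirmed here.
```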
