Not working for multi-valued categorical features #2

Closed
raam93 opened this issue Jul 31, 2019 · 3 comments
Labels: bug, enhancement

raam93 commented Jul 31, 2019

Does the current implementation support only binary-valued categorical features?

I ask because I tried the Adult income dataset (https://archive.ics.uci.edu/ml/datasets/adult), which has many multi-valued categorical and continuous features, and got output like this:

"The model predicted '<=50k' instead of '>50k' because 'hours_per_week <= 42.832 and not occupation and age <= 34.95 and not education and hours_per_week <= 57.892'"

Here, education and occupation are not binary features - they have many levels.

MarcelRobeer self-assigned this Jul 31, 2019

MarcelRobeer (Owner) commented

I have added initial support for multi-valued categorical features (mapping to a one-hot encoding and back in DomainMapperTabular). Typically the data is already one-hot encoded because the predictor requires it, so could you please indicate which package you are using to get predictions directly from categorical features?
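As a rough illustration of what "mapping to a one-hot encoding and back" can mean, here is a minimal sketch in plain Python. It is not the code inside DomainMapperTabular; the function names (`build_one_hot_layout`, `encode_row`, `describe_column`) and the three-column toy data are assumptions made up for this example.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# One entry per one-hot column: (original column index, category or None).
Layout = List[Tuple[int, Optional[int]]]


def build_one_hot_layout(rows: Sequence[Sequence[int]],
                         categorical: Set[int]) -> Layout:
    """Decide which one-hot column belongs to which (feature, category)."""
    layout: Layout = []
    for col in range(len(rows[0])):
        if col in categorical:
            # One output column per distinct level of the feature.
            for level in sorted({row[col] for row in rows}):
                layout.append((col, level))
        else:
            # Continuous features keep a single column.
            layout.append((col, None))
    return layout


def encode_row(row: Sequence[int], layout: Layout) -> List[float]:
    """Forward mapping: label-encoded row -> one-hot row."""
    return [
        float(row[col]) if level is None else float(row[col] == level)
        for col, level in layout
    ]


def describe_column(index: int, layout: Layout,
                    feature_names: Sequence[str]) -> str:
    """Backward mapping: one-hot column index -> readable label."""
    col, level = layout[index]
    name = feature_names[col]
    return name if level is None else f"{name} = {level}"


# Toy data (hypothetical): age, workclass, education, label-encoded.
rows = [[39, 6, 9], [50, 5, 9], [38, 3, 11]]
names = ["age", "workclass", "education"]

layout = build_one_hot_layout(rows, categorical={1, 2})
encoded = encode_row(rows[0], layout)
print(encoded)
print(describe_column(3, layout, names))
```

The point of keeping `layout` around is that any rule expressed on a one-hot column index can later be translated back to the original feature name and level.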

raam93 (Author) commented Aug 1, 2019

Thanks for the reply!

I tried your updated library, and now I get output like this:

"The model predicted '>50k' instead of '<=50k' because '34 > 0.954 and 59 <= 4993.447 and 79 <= 0.046'"
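The numbers 34, 59 and 79 are larger than the number of input features, so one possible reading (an assumption, not confirmed in this thread) is that they are column indices in the one-hot encoded space that were not mapped back to feature names. The sketch below shows how such a rule could be rewritten once a column lookup is available. The `columns` lookup, the `level_a` category and the assignment of indices to features are all hypothetical placeholders.

```python
import re
from typing import Dict, Optional, Tuple

# Matches literals such as "34 > 0.954" or "59 <= 4993.447".
LITERAL = re.compile(r"^(\d+)\s*(<=|>)\s*(-?\d+(?:\.\d+)?)$")


def readable_rule(rule: str,
                  columns: Dict[int, Tuple[str, Optional[str]]]) -> str:
    """Replace column indices in a rule with feature names where known."""
    parts = []
    for literal in rule.split(" and "):
        literal = literal.strip()
        match = LITERAL.match(literal)
        if match is None or int(match.group(1)) not in columns:
            parts.append(literal)  # unknown literal: leave untouched
            continue
        feature, category = columns[int(match.group(1))]
        op, threshold = match.group(2), match.group(3)
        if category is None:
            # Continuous feature: keep the threshold as it is.
            parts.append(f"{feature} {op} {threshold}")
        elif 0.0 <= float(threshold) < 1.0:
            # A 0/1 column compared against a threshold in [0, 1):
            # "> t" means the level is set, "<= t" means it is not.
            sign = "=" if op == ">" else "!="
            parts.append(f"{feature} {sign} {category}")
        else:
            parts.append(literal)  # threshold makes no sense for a 0/1 column
    return " and ".join(parts)


# Hypothetical lookup: index -> (feature name, category or None).
columns = {34: ("occupation", "level_a"), 59: ("capital-gain", None)}
text = readable_rule("34 > 0.954 and 59 <= 4993.447 and 79 <= 0.046", columns)
print(text)
```

Index 79 is deliberately left out of the lookup to show that unmapped literals pass through unchanged.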

To get this, I removed the 'fnlwgt' and 'education-num' features from the Adult income data, label-encoded the data, and fed it to your library:

import pandas as pd

df = pd.read_csv('adult_income.csv')
# Drop the two features that are not used
del df['fnlwgt']
del df['education-num']
# label_encode is a helper not shown here; discrete holds the discrete feature names
df_le, label_encoder = label_encode(df, discrete)
# class_name is 'class'
X = df_le.loc[:, df_le.columns != class_name].values
y = df_le[class_name].values
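The `label_encode` helper is not shown in the thread. As a guess at what such a helper does, the sketch below replaces each level of a discrete column with an integer code and keeps the level lists so the codes can be decoded later. It works on a plain dict of column lists rather than a DataFrame to stay dependency-free; the name `label_encode_columns` and the three sample rows are assumptions for illustration only.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Table = Dict[str, List[Hashable]]


def label_encode_columns(
    table: Table, discrete: Sequence[str]
) -> Tuple[Table, Dict[str, List[Hashable]]]:
    """Replace each level of the discrete columns with an integer code."""
    encoded: Table = {name: list(values) for name, values in table.items()}
    levels_by_column: Dict[str, List[Hashable]] = {}
    for name in discrete:
        # Sorted levels give a stable code: position in the list.
        levels = sorted(set(table[name]))
        code_of = {level: code for code, level in enumerate(levels)}
        encoded[name] = [code_of[value] for value in table[name]]
        levels_by_column[name] = levels
    return encoded, levels_by_column


# Illustrative rows only, not read from the actual CSV file.
table: Table = {
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
    "class": ["<=50K", "<=50K", ">50K"],
}

encoded, levels_by_column = label_encode_columns(table, ["workclass", "class"])
print(encoded["workclass"])
# Decoding a code is a list lookup:
print(levels_by_column["workclass"][encoded["workclass"][0]])
```

Note that these integer codes carry no order, which is why a threshold rule on a label-encoded column (or a bare "not occupation") is hard to interpret for features with many levels.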

'X' looks like this:

array([[39,  6,  9, ...,  0, 40, 38],
       [50,  5,  9, ...,  0, 13, 38],
       [38,  3, 11, ...,  0, 40, 38],
       ...,
       [58,  3, 11, ...,  0, 40, 38],
       [22,  3, 11, ...,  0, 20, 38],
       [52,  4, 11, ...,  0, 40, 38]], dtype=int64)

Then, after training the model, I follow your example code:

sample = x_test[17]

# Create a domain mapper (map the explanation to meaningful labels for explanation)
dm = ce.domain_mappers.DomainMapperTabular(x_train,
                                           feature_names=np.array(['age',
                                                                   'workclass',
                                                                   'education',
                                                                   'marital-status',
                                                                   'occupation',
                                                                   'relationship',
                                                                   'race',
                                                                   'sex',
                                                                   'capital-gain',
                                                                   'capital-loss',
                                                                   'hours-per-week',
                                                                   'native-country']),
                                           contrast_names=np.array(['<=50k', '>50k']),
                                           categorical_features=np.array([1,2,3,4,5,6,7,11]))

# Create the contrastive explanation object (default is a Foil Tree explanator)
exp = ce.ContrastiveExplanation(dm)

# Explain the instance (sample) for the given model
exp.explain_instance_domain(model.predict_proba, sample)

Can you try your code on the adult income dataset or any other dataset with multi-valued categorical features? Thanks in advance!

MarcelRobeer (Owner) commented

I added your case as example number 2 to the example notebook.

MarcelRobeer added the bug and enhancement labels Feb 28, 2020