Explore the data with continuous output and category input #540

Open
Vu1992 opened this issue May 9, 2024 · 4 comments

Comments

@Vu1992

Vu1992 commented May 9, 2024

Hi,

Thanks for your great work. I have one question about data exploration. Is it possible to use the following code to explore data with a continuous output and a categorical input?

marginal = Marginal(names).explain_data(X_train, y_train, name='Train Data')
show(marginal)

When I run the above code, it fails with a TypeError: Unable to do the formular for 'str'

@paulbkoch
Collaborator

Hi @Vu1992 -- It should handle continuous output and category input. I don't see that error message in our repo or on the internet. Can you include a stack trace? Also, is the data public?

@Vu1992
Author

Vu1992 commented May 13, 2024

Hi @paulbkoch,

Thanks for your reply. Unfortunately the data is private, but I can show you what I'm trying to do. I have a DataFrame df that holds my data as a table, and I take the following steps:
A=df[['BRANCH']] ; B=df[['Gross_Incurred']]; names=['BRANCH']
So basically A and B have the values shown in the images below:
image
image
Then I use your code for exploring the data:
marginal = Marginal(names).explain_data(A, B, name='Train Data'); show(marginal)
Python then raises TypeError: unsupported operand type(s) for -: 'str' and 'str'

@paulbkoch
Collaborator

I tried to replicate this with the following code:

import numpy as np
import pandas as pd
from interpret.data import Marginal
from interpret import show
names=['BRANCH']
A = pd.DataFrame()
A["BRANCH"] = pd.Series(np.array(['VC', 'VC', 'MS', 'VH'], dtype=np.str_))
B = pd.DataFrame()
B["Gross_Incurred"] = pd.Series(np.array([18000000.0, 36200000000.0, 0.0, -50000000.0], dtype=float))
marginal = Marginal(names).explain_data(A, B, name='Train Data'); show(marginal)

My example works though. Any idea what could be different?
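
One difference worth checking is the dtype of the columns. The message "unsupported operand type(s) for -: 'str' and 'str'" is what Python reports when two strings are subtracted, so it would be consistent with a numeric column (for example Gross_Incurred) being stored as text. The thread does not confirm this; it is only a hypothesis. A minimal sketch of the check, using a made-up DataFrame and a hypothetical helper find_non_numeric in place of the private data:

```python
import pandas as pd


def find_non_numeric(series):
    """Return the entries of `series` that cannot be parsed as numbers.

    Hypothetical helper for illustration; it is not part of interpret.
    """
    parsed = pd.to_numeric(series, errors="coerce")
    # Entries that were present but turned into NaN failed to parse.
    return series[parsed.isna() & series.notna()]


# Made-up stand-in for the private data: the response column holds strings.
df = pd.DataFrame({
    "BRANCH": ["VC", "VC", "MS", "VH"],
    "Gross_Incurred": ["18000000.0", "36200000000.0", "0.0", "n/a"],
})

# A text dtype on Gross_Incurred here would be a warning sign.
print(df.dtypes)

# List the values that are not valid numbers.
bad = find_non_numeric(df["Gross_Incurred"])
print(bad)

# Convert explicitly before passing the response on; unparseable
# values become NaN and may need to be dropped or fixed first.
A = df[["BRANCH"]]
B = pd.to_numeric(df["Gross_Incurred"], errors="coerce").to_frame()
print(B.dtypes)
```

If the real data shows a text dtype for the response column, converting it as above before calling explain_data would be a reasonable first thing to try.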

@Vu1992
Author

Vu1992 commented May 15, 2024

Thanks for your help.
I don't know what went wrong last time, but when I tried again it worked. However, the graph does not change when I switch the type to Categorical, even with your replication code.
When I add a continuous variable, it shows this:
image
But when I want to see the categorical variable, nothing changes:
image
