Support MultiIndex in the Web App #381

Open
davibicudo opened this issue Sep 19, 2022 · 3 comments

@davibicudo

davibicudo commented Sep 19, 2022

First of all, thank you for the great library! Investigating SHAP values with it is very convenient.

Description of Problem:
The Shapash web app does not support DataFrames with a MultiIndex. MultiIndexes are useful for querying tables that come from other structured data formats.
At the xpl.run_app() step, an error occurs at https://github.com/MAIF/shapash/blob/master/shapash/webapp/smart_app.py#L149.

Overview of the Solution:
Use reset_index() instead of assigning the index to a column named index. If need be, store the names of the index columns to distinguish them from the remaining DataFrame columns.
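
A minimal sketch of this idea, using a toy DataFrame rather than shapash internals. The names x_init, dataframe and index_cols are placeholders chosen for illustration, and the sketch assumes every index level has a name:

```python
import pandas as pd

# Toy frame with a two-level MultiIndex (hypothetical data, not from shapash).
index = pd.MultiIndex.from_tuples(
    [("CH", "Zurich"), ("CH", "Bern"), ("FR", "Paris")],
    names=["country", "city"],
)
x_init = pd.DataFrame({"feature": [1.0, 2.0, 3.0]}, index=index)

# Remember which columns come from the index before flattening it.
index_cols = list(x_init.index.names)

# reset_index() turns every index level into an ordinary column.
dataframe = x_init.reset_index()

# The stored names make it possible to rebuild the original MultiIndex.
restored = dataframe.set_index(index_cols)
```

If some levels are unnamed, reset_index() assigns default column names such as level_0, so the stored names would have to be taken from the flattened frame instead.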

Examples:

Blockers:

Definition of Done:

@davibicudo davibicudo changed the title Support MultiIndex Support MultiIndex in the Web App Sep 19, 2022
@ThomasBouche
Collaborator

Hi, Thanks,

Can you give us an example of a MultiIndex that does not work as you would like?
For example, if we take this tutorial (https://github.com/MAIF/shapash/blob/master/tutorial/tutorial01-Shapash-Overview-Launch-WebApp.ipynb) and create a MultiIndex like this (not the most elegant way, but it works):

Xtest['new_col'] = Xtest['BedroomAbvGr']
Xtest.set_index('new_col',append=True, inplace=True)
Xtest = Xtest.reset_index().set_index(['new_col', 'Id']).sort_index()

image

When I run the app, it works and the index is concatenated:
image

What result do you expect?

@davibicudo
Author

Hi

Thanks for the reply.
I'm not able to reproduce your test (same notebook, using master branch). My pandas version is 1.3.5 (compatible with requirements.dev.txt).
I'm pasting the stacktrace below. In any case, even if the example you provided works, it concatenates the MultiIndex levels, so it is no longer possible to filter on them meaningfully. It would be nice to see each MultiIndex level in a separate column, so that filtering on an individual level is still possible (e.g. city, state, country as three levels, encoded as strings or integers).
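
To illustrate the kind of per-level filtering meant here, a small sketch with made-up data. The frame x, the predict column and the level values are hypothetical and only show the behaviour in plain pandas, not in the web app:

```python
import pandas as pd

# Hypothetical three-level index, mirroring the city/state/country example.
index = pd.MultiIndex.from_tuples(
    [
        ("Austin", "Texas", "USA"),
        ("Dallas", "Texas", "USA"),
        ("Zurich", "Zurich", "Switzerland"),
    ],
    names=["city", "state", "country"],
)
x = pd.DataFrame({"predict": [10.0, 20.0, 30.0]}, index=index)

# One column per level: each level can be filtered on its own.
flat = x.reset_index()
texas_rows = flat[flat["state"] == "Texas"]

# The same filter expressed directly on the MultiIndex, for comparison.
texas_rows_mi = x[x.index.get_level_values("state") == "Texas"]
```

With the levels concatenated into a single column, the same filter would need string parsing or tuple unpacking first.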

Stacktrace:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp\4\ipykernel_24872\3799977857.py in <module>
----> 1 app = xpl.run_app(title_story='House Prices')

\filer16l\p-v160l\SIMBA.A11244\90_Persoenlich\u229351\Saisonalisierung_PG\02_Code\shapash-master\shapash\explainer\smart_explainer.py in run_app(self, port, host, title_story, settings)
1017 self.predict()
1018 if hasattr(self, '_case'):
-> 1019 self.smartapp = SmartApp(self, settings)
1020 if host is None:
1021 host = "0.0.0.0"

\filer16l\p-v160l\SIMBA.A11244\90_Persoenlich\u229351\Saisonalisierung_PG\02_Code\shapash-master\shapash\webapp\smart_app.py in __init__(self, explainer, settings)
104 self.dataframe = pd.DataFrame()
105 self.round_dataframe = pd.DataFrame()
--> 106 self.init_data()
107
108 # COMPONENTS

\filer16l\p-v160l\SIMBA.A11244\90_Persoenlich\u229351\Saisonalisierung_PG\02_Code\shapash-master\shapash\webapp\smart_app.py in init_data(self)
147 raise ValueError('y_pred must be set when calling compile function.')
148
--> 149 self.dataframe['index'] = self.explainer.x_init.index
150 self.dataframe.rename(columns={f'{self.predict_col}': 'predict'}, inplace=True)
151 col_order = ['index', 'predict'] + self.dataframe.columns.drop(['index', 'predict']).tolist()

~\dg_env\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
3610 else:
3611 # set column
-> 3612 self._set_item(key, value)
3613
3614 def _setitem_slice(self, key: slice, value):

~\dg_env\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
3782 ensure homogeneity.
3783 """
-> 3784 value = self._sanitize_column(value)
3785
3786 if (

~\dg_env\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, value)
4508 if is_list_like(value):
4509 com.require_length_match(value, self.index)
-> 4510 return sanitize_array(value, self.index, copy=True, allow_2d=True)
4511
4512 @property

~\dg_env\lib\site-packages\pandas\core\construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure, allow_2d)
498
499 # extract ndarray or ExtensionArray, ensure we have no PandasArray
--> 500 data = extract_array(data, extract_numpy=True)
501
502 if isinstance(data, np.ndarray) and data.ndim == 0:

~\dg_env\lib\site-packages\pandas\core\construction.py in extract_array(obj, extract_numpy, extract_range)
421 return obj
422
--> 423 obj = obj.array
424
425 if extract_numpy and isinstance(obj, ABCPandasArray):

~\dg_env\lib\site-packages\pandas\core\indexes\multi.py in array(self)
724 """
725 raise ValueError(
--> 726 "MultiIndex has no single backing array. Use "
727 "'MultiIndex.to_numpy()' to get a NumPy array of tuples."
728 )

ValueError: MultiIndex has no single backing array. Use 'MultiIndex.to_numpy()' to get a NumPy array of tuples.
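
As a rough sketch of the workaround the error message itself suggests, the index can be converted with MultiIndex.to_numpy() before it is assigned to a column. The frames below are stand-ins with invented data, not the actual shapash objects, and this is not necessarily how the library should fix it:

```python
import pandas as pd

# Toy MultiIndex frame standing in for explainer.x_init (hypothetical data).
index = pd.MultiIndex.from_tuples(
    [("a", "x"), ("b", "y")], names=["Id", "new_col"]
)
x_init = pd.DataFrame({"feature": [0.1, 0.2]}, index=index)

# Stand-in for the web app's internal frame, with a default RangeIndex.
dataframe = pd.DataFrame({"predict": [100.0, 200.0]})

# to_numpy() yields a 1-D object array of tuples, one tuple per row,
# which can be stored in a single column.
dataframe["index"] = x_init.index.to_numpy()
```

Each cell of the index column then holds a tuple of all levels, so this avoids the error but still does not give one column per level.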

@ThomasBouche
Collaborator

Hi,

Indeed, there is an error with pandas 1.3.5. It works for me with pandas 1.4.2.

I see what you mean about filtering on a MultiIndex. In the coming weeks we expect a contribution that adds a new interface to the web app with a nicer way to filter columns.

Once that contribution is done, we will try to address your issue.

One of the impacts I see is the management of this "index" box in this screenshot:
image
