Update Tutorial_SHAP.ipynb #14

Merged
merged 1 commit into from
Dec 4, 2023
20 changes: 10 additions & 10 deletions xai-for-tabular-data/Tutorial_SHAP.ipynb
@@ -17,7 +17,7 @@
"source": [
"# Model-Agnostic Interpretation with SHAP\n",
"\n",
"In this Notebook we will demonstrate how to use the SHapley Additive exPlanations (SHAP) method and interpret its results.\n",
"In this Notebook, we will demonstrate how to use the SHapley Additive exPlanations (SHAP) method and interpret its results.\n",
"\n",
"--------"
]
@@ -43,7 +43,7 @@
"id": "1c99c0f2",
"metadata": {},
"source": [
"Now that you opened the notebook in Google Colab follow the next step:\n",
"Now that you opened the notebook in Google Colab, follow the next step:\n",
"\n",
"1. Run this cell to connect your Google Drive to Colab and install packages\n",
"2. Allow this notebook to access your Google Drive files. Click on 'Yes', and select your account.\n",
@@ -148,7 +148,7 @@
"source": [
"## The California Housing Dataset: Data and Model Loading\n",
"\n",
"In this notebook, we will work with the **California Housing dataset**, containing 20,640 median house values for California districts (expressed in $100,000), which are described by eight numeric features. Each row in the dataset represents a block of houses, not a single household. The data pertains to the house prices found in a given California district and some summary statistics about them based on the 1990 census data. Our goal is to **predict price** of house blocks and find the most predictive features.\n",
"In this notebook, we will work with the **California Housing dataset**, containing 20,640 median house values for California districts (expressed in $100,000), described by eight numeric features. Each row in the dataset represents a block of houses, not a single household. The data pertains to the house prices found in a given California district and some summary statistics about them based on the 1990 census data. Our goal is to **predict price** of house blocks and find the most predictive features.\n",
"\n",
"<center><img src=\"https://github.com/HelmholtzAI-Consultants-Munich/XAI-Tutorials/blob/main/docs/source/_figures/dataset_california_housing.jpg?raw=true\" width=\"900\" /></center>\n",
"\n",
@@ -160,7 +160,7 @@
"id": "641755c5",
"metadata": {},
"source": [
"In the notebook [*Dataset-Housing.ipynb*](../data_and_models/Dataset-Housing.ipynb), we explain how to do the exploratory data analysis, preprocess the data and in the notebook [*Model-RandomForest.ipynb*](../data_and_models/Model-RandomForest.ipynb) we train a Random Forest model with the given data. This notebook focuses on the interpretation of the trained model and not on the data pre-processing or model training part. Hence, here we load the data and the model that we saved in the previous notebook."
"In the notebook [*Dataset-Housing.ipynb*](../data_and_models/Dataset-Housing.ipynb), we explain how to do the exploratory data analysis, preprocess the data and in the notebook [*Model-RandomForest.ipynb*](../data_and_models/Model-RandomForest.ipynb) we train a Random Forest model with the given data. This notebook focuses on the interpretation of the trained model and not on the data pre-processing or model training part. Hence, here we load the data and model saved in the previous notebook."
]
},
{
@@ -667,9 +667,9 @@
"cell_marker": "'''"
},
"source": [
"The average prediction for all houses in all the census blocks is labeled as the *base value* here, which is about 2.08. The predicted median house price in this census block is 2.21 and is labeled as the *f(x)*.\n",
"The average prediction for all houses in all the census blocks is labeled as the *base value* here, which is about 2.08. The predicted median house price in this census block is 2.21, labeled as the *f(x)*.\n",
"\n",
"Features that increase the predicted price from the *base value* are colored in red and are distinguished from each other by arrows pointing to the right. Features that decrease the predicted price are colored in blue with left-pointing arrows. Features with larger effects on the prediction, occupy more space in the row of arrows. The two sets of features point to the *output value*. The names of the features and their values are printed below the row of arrows.\n",
"Features that increase the predicted price from the *base value* are colored in red and are distinguished from each other by arrows pointing to the right. Features that decrease the predicted price are colored in blue with left-pointing arrows. Features with larger effects on the prediction, occupy more space in the row of arrows. The two sets of features point to the *output value*. The features' names and values are printed below the row of arrows.\n",
"\n",
"You can find more advanced use cases for decision and force plots [here](https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/decision_plot.html)."
]
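The relationship between the *base value*, the per-feature arrows, and *f(x)* described in this hunk can be illustrated with a minimal sketch. It computes exact Shapley values by brute force for a small hypothetical model and checks that the base value plus the sum of the feature contributions equals the prediction. The `model` function, the `background` rows, and the instance `x` are invented for illustration; this is not the tutorial's Random Forest, and it is not how the `shap` package computes its values internally.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy model with an interaction term (assumption, not the tutorial's model)
    return 2.0 * x[0] + x[1] * x[2]


def value(subset, x, background):
    # Average model output when the features in `subset` are fixed to the
    # values of x and the remaining features come from each background row.
    total = 0.0
    for row in background:
        z = [x[i] if i in subset else row[i] for i in range(len(x))]
        total += model(z)
    return total / len(background)


def shapley_values(x, background):
    # Exact Shapley values: weighted marginal contribution of each feature
    # over all subsets of the other features.
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}, x, background) - value(s, x, background))
    return phi


# Hypothetical background data and instance to explain
background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
x = [3.0, 2.0, 1.0]

base_value = value(set(), x, background)  # average prediction over the background
phi = shapley_values(x, background)       # one contribution per feature
prediction = model(x)                     # the f(x) that the arrows point to

print("base value:", base_value)
print("contributions:", phi)
print("base value + sum of contributions:", base_value + sum(phi))
print("f(x):", prediction)
```

Positive entries in `phi` correspond to the red, right-pointing arrows and negative entries to the blue, left-pointing ones; the magnitude of each entry is what determines how much space a feature occupies in the plot.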
@@ -1413,9 +1413,9 @@
"source": [
"### Global Explanations\n",
"\n",
"For the global explanations we can visualize a combined bar plot that shows the average absolute SHAP values stacked per class.\n",
"For the global explanations, we can visualize a combined bar plot that shows the average absolute SHAP values stacked per class.\n",
"\n",
"*Note: the shap.plots.bar() fucntion of the new package does currently not work for multi-class classiciation problem. Instaed we have to use the old shap.summary_plot() function.*"
"*Note: the shap.plots.bar() function of the new package does currently not work for multi-class classification problems. Instead, we have to use the old shap.summary_plot() function.*"
]
},
{
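The quantity behind the combined bar plot in this hunk, the average absolute SHAP value per feature stacked across classes, can be sketched in plain Python. The nested list `shap_values`, the `feature_names`, and the helper `mean_abs_per_class` are hypothetical and exist only for this illustration; the layout (one matrix of samples by features per class) is an assumption and may differ from what a given `shap` version returns.

```python
# Hypothetical SHAP values: shap_values[c][i][j] is the contribution of
# feature j for sample i with respect to class c (3 classes, 2 samples, 2 features).
shap_values = [
    [[0.2, -0.1], [-0.4, 0.3]],
    [[-0.2, 0.1], [0.4, -0.3]],
    [[0.0, 0.5], [0.0, -0.1]],
]
feature_names = ["feature_a", "feature_b"]  # invented names


def mean_abs_per_class(values):
    # For every class, average the absolute SHAP value of each feature over all samples.
    result = []
    for class_values in values:
        n_samples = len(class_values)
        n_features = len(class_values[0])
        result.append(
            [sum(abs(row[j]) for row in class_values) / n_samples for j in range(n_features)]
        )
    return result


importance = mean_abs_per_class(shap_values)

# Stack the per-class bars: total bar length per feature.
stacked = [sum(per_class[j] for per_class in importance) for j in range(len(feature_names))]

# Order features the way a global importance plot would, largest first.
ranking = sorted(zip(feature_names, stacked), key=lambda pair: pair[1], reverse=True)

for name, total in ranking:
    print(name, round(total, 3))
```

Each entry of `importance` corresponds to one colored segment of a stacked bar, and `stacked` gives the full bar length used to rank the features.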
@@ -1462,7 +1462,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored\n"
"No data for color mapping was provided via 'c'. Parameters 'vmin', 'vmax' will be ignored\n"
]
},
{
@@ -1544,7 +1544,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored\n"
"No data for color mapping was provided via 'c'. Parameters 'vmin', 'vmax' will be ignored\n"
]
},
{