# Arize Tutorial: SHAP Values for Neural Networks

Let's get started on using Arize! ✨

Arize helps you visualize your model performance, understand drift & data quality issues, and share insights learned from your models.

**SHAP (SHapley Additive exPlanations)** is a game theoretic approach to explain the output of any machine learning model.
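To make the game-theoretic idea concrete, here is a minimal, dependency-free sketch that computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over every feature ordering. The names `exact_shapley` and `toy_model` are hypothetical and exist only for this illustration. The SHAP package uses much faster approximations, so treat this as a picture of the definition rather than of the library's implementation.

```python
from itertools import permutations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)  # start with every feature set to its baseline
        prev = predict(current)
        for i in order:
            current[i] = x[i]  # reveal feature i
            new = predict(current)
            phi[i] += new - prev  # marginal contribution of feature i
            prev = new
    return [p / factorial(n) for p in phi]


# Hypothetical toy model with one additive term and one interaction term.
def toy_model(v):
    return 2.0 * v[0] + v[1] * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)

# Additivity: the attributions sum to f(x) - f(baseline).
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The interaction term `v[1] * v[2]` is split evenly between the two features involved, and the attributions add up to the difference between the prediction and the baseline prediction. This additivity is the property the "Additive" in SHAP refers to.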

For neural network models, we can use `GradientExplainer` from the `SHAP` package to generate SHAP values ([API reference](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.GradientExplainer.html)).

This demo consists of three parts.

1.   Train a neural network on tabular data using `tf.keras`
2.   Generate SHAP values using `shap.GradientExplainer`
3.   Log the predictions and SHAP values to the Arize platform


# 1. Download Data and Train Model
For this demo, we use the pulsar classification dataset from UCI ([link](https://archive.ics.uci.edu/ml/datasets/HTRU2)).

In [None]:
import pandas as pd

df = pd.read_csv("https://storage.googleapis.com/arize-assets/fixtures/UCI/HTRU_2.zip")
features = df.columns.drop("class")
df

## 1.1. Split data into train and test sets

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["class"], test_size=1000
)

from sklearn.preprocessing import StandardScaler

# Standardize feature variables.
sc = StandardScaler()
X_train2 = sc.fit_transform(X_train)
X_test2 = sc.transform(X_test)
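As a rough illustration of what the scaling step does, the sketch below standardizes a single hypothetical feature column by hand, using only the standard library. The helper names `fit_standardizer` and `standardize` are made up for this example. It assumes the same convention as `StandardScaler`: the mean and population standard deviation are estimated on the training data only and then reused for the test data.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Return the mean and population standard deviation of a training column."""
    return fmean(column), pstdev(column)


def standardize(column, mean, std):
    """Shift and scale a column with previously fitted statistics."""
    return [(v - mean) / std for v in column]


# Hypothetical values for one feature.
train_col = [2.0, 4.0, 6.0, 8.0]
test_col = [5.0, 10.0]

mean, std = fit_standardizer(train_col)  # statistics come from the training set only
train_scaled = standardize(train_col, mean, std)
test_scaled = standardize(test_col, mean, std)  # test data reuses the training statistics

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics keeps information about the test set out of preprocessing, which is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.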

## 1.2. Train a neural network using `tf.keras`.

In [None]:
import tensorflow as tf

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(len(features),)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)

model.compile(
    optimizer="Adam", loss="binary_crossentropy", metrics=[tf.keras.metrics.AUC()]
)
model.fit(X_train2, y_train)
model.evaluate(X_test2, y_test)

# 2. Generate SHAP Values
Install the SHAP package.

In [None]:
!pip install -q shap
import shap

## 2.1. Use `GradientExplainer` to generate SHAP values from neural network models

Read more in the [SHAP API reference](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.GradientExplainer.html).


In [None]:
e = shap.GradientExplainer(model, X_train2)
shap_values = pd.DataFrame(e.shap_values(X_test2)[0], columns=features)
shap_values
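One common way to summarize per-prediction SHAP values is to rank features by their mean absolute SHAP value. The sketch below does this in plain Python on a small hypothetical table; the function `mean_abs_shap` and the feature names are invented for illustration and are not part of the SHAP or Arize APIs.

```python
def mean_abs_shap(rows, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    totals = [0.0] * len(feature_names)
    for row in rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign is ignored: only magnitude counts
    importance = {name: total / len(rows) for name, total in zip(feature_names, totals)}
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical SHAP values: one row per prediction, one column per feature.
toy_rows = [
    [0.5, -0.25, 0.0],
    [-1.5, 0.25, 1.0],
]
toy_features = ["feat_a", "feat_b", "feat_c"]

ranking = mean_abs_shap(toy_rows, toy_features)
print(ranking)
```

On the `shap_values` DataFrame from this notebook, the equivalent pandas expression would be `shap_values.abs().mean().sort_values(ascending=False)`.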

# 3. Log Predictions to Arize
We'll use the following helper functions to generate prediction IDs and timestamps to simulate a production environment.

In [None]:
import uuid
import numpy as np
from datetime import datetime, timedelta

# Prediction ID is required for logging any dataset
def generate_prediction_ids(X):
    # Generate one unique ID per row of the dataframe that was passed in
    return pd.Series((str(uuid.uuid4()) for _ in range(len(X))))


# OPTIONAL: We can directly specify when inferences were made
def simulate_production_timestamps(X, days=30):
    t = datetime.now()
    current_t, earlier_t = t.timestamp(), (t - timedelta(days=days)).timestamp()
    return pd.Series(np.linspace(earlier_t, current_t, num=len(X)))

## 3.1. Assemble pandas dataframe

In [None]:
y_test_score = model.predict(X_test2).flatten()
prediction_label = list(
    map(lambda x: "pulsar" if x > 0.5 else "non-pulsar", y_test_score)
)
actual_label = list(map(lambda x: "pulsar" if x > 0.5 else "non-pulsar", y_test))

shap_values_column_names_mapping = {f"{feat}": f"{feat}_shap" for feat in features}

production_dataset = pd.concat(
    [
        X_test.reset_index(drop=True),
        pd.DataFrame(
            {
                "prediction_id": generate_prediction_ids(X_test),
                "prediction_ts": simulate_production_timestamps(X_test),
                "prediction_label": prediction_label,
                "prediction_score": y_test_score,
                "actual_label": actual_label,
            }
        ),
        shap_values.rename(columns=shap_values_column_names_mapping),
    ],
    axis=1,
)

## 3.2. Initialize Arize client
You can find your `API_KEY` and `SPACE_KEY` by navigating to the settings page in your workspace as shown below (only space admins can see the keys). 



<img src="https://storage.cloud.google.com/arize-assets/fixtures/copy-keys.png" width="700">

In [None]:
!pip install -q arize
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments

SPACE_KEY = "SPACE_KEY"
API_KEY = "API_KEY"
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

model_id = "pulsar"
model_version = "1.0"
model_type = ModelTypes.SCORE_CATEGORICAL


if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else:
    print("✅ Import and Setup Arize Client Done! Now we can start using Arize!")

## 3.3. Log data to Arize

In [None]:
# Define a Schema() object for Arize to pick up data from the correct columns for logging
production_schema = Schema(
    prediction_id_column_name="prediction_id",  # REQUIRED
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    prediction_score_column_name="prediction_score",
    actual_label_column_name="actual_label",
    feature_column_names=features,
    shap_values_column_names=shap_values_column_names_mapping,
)

# arize_client.log returns a Response object from the `requests` library
response = arize_client.log(
    dataframe=production_dataset,
    schema=production_schema,
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,
    environment=Environments.PRODUCTION,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(
        f"❌ logging failed with response code {response.status_code}, {response.text}"
    )
else:
    print(
        f"✅ You have successfully logged {len(production_dataset)} data points to Arize!"
    )

### Check Data Ingestion Information
You now know how to seamlessly log SHAP values for neural networks onto the Arize platform. Go to [Arize](https://app.arize.com/) in order to analyze and monitor the logged SHAP values.

Data becomes available in the UI about 10 minutes after it is received. If you send data for a new model, the model appears in the Arize platform almost immediately, but its data will not be visible yet. To verify that the data was sent correctly and is being processed, check the Data Ingestion tab.

You will be able to see the predictions, actuals, and feature importances that have been sent in the last week, last day or last 30 minutes.

An example view of the Data Ingestion tab from a model, when data is sent continuously over 30 minutes, is shown in the image below.

<img src="https://storage.cloud.google.com/arize-assets/fixtures/data-ingestion-tab.png" width="700">



### Overview
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit us at: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.