# Arize Tutorial: 1-Hot Encoding Decomposition

Let's get started on using Arize! ✨

Arize helps you visualize your model performance, understand drift & data quality issues, and share insights learned from your models.

This is a simple example of how to decompose 1-hot encoded features and/or SHAP values into their original multi-class state before sending data to Arize.

In this case, we have features, predictions, actuals, and their respective SHAP values all in a single dataframe. If your data is not colocated, you can send each piece (prediction, actual, and SHAP values) separately, as long as the `prediction_id` of each SHAP and/or latent Actual call matches a previously sent Prediction.
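
When the pieces are sent separately, it can help to confirm up front that every latent record has a matching prediction. The following is a minimal sketch in plain Python, not part of the Arize SDK: `unmatched_ids` is a hypothetical helper, and the ID values are made up for illustration.

```python
def unmatched_ids(prediction_ids, latent_ids):
    """Return latent IDs (from SHAP or actual records) with no matching prediction ID.

    Hypothetical helper for illustration; both arguments are iterables of IDs,
    for example the `ids` columns of two separate dataframes.
    """
    known = set(prediction_ids)
    # Preserve the order of first appearance and report each missing ID once
    missing = []
    seen = set()
    for latent_id in latent_ids:
        if latent_id not in known and latent_id not in seen:
            seen.add(latent_id)
            missing.append(latent_id)
    return missing


# Made-up IDs: "d" was never sent as a prediction
sent_predictions = ["a", "b", "c"]
shap_records = ["b", "d", "a", "d"]
orphans = unmatched_ids(sent_predictions, shap_records)
print(orphans)
```

An empty result suggests that every SHAP or actual record can be joined to a prediction that was already sent.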

In [None]:
import pandas as pd

## Sample data set with features, predictions, actuals and shap values
df = pd.read_csv('https://storage.googleapis.com/arize-assets/fixtures/example_shap_data.zip')

In [None]:
## Here is an example of data where some features are 1-hot encoded while others are not
df.head(2)

## Prepare feature names

Since we need the same feature names as the original prediction inputs, we'll need to "un-encode" the 1-hot encoded features. In this case, the `addr_state` and `term` features were 1-hot encoded, so we create a dictionary where the keys are the decomposed SHAP column names and the values are lists of all the corresponding 1-hot encoded SHAP column names.

In [None]:
## This helper function decomposes the 1-hot encoded columns into their original names.
## The SHAP value of each original column is the sum of the SHAP values of its 1-hot columns.
## Reference: https://github.com/slundberg/shap/issues/679#issuecomment-508575567
def map_shap(shap_df, one_h_map):
  shap_df = shap_df.copy()  # work on a copy so the caller's dataframe is not modified
  for key, value in one_h_map.items():
    shap_df[key] = shap_df[value].sum(axis=1)
    shap_df = shap_df.drop(columns=value)
  return shap_df

encoding_map = {
    "term_shap": ['term_one_h_0_shap', 'term_one_h_1_shap', 'term_one_h_2_shap', 'term_one_h_3_shap'],
    "addr_state_shap": ['addr_state_one_h_0_shap', 'addr_state_one_h_1_shap', 'addr_state_one_h_2_shap'],
}

shap_values = map_shap(df, encoding_map)
shap_values.head(2)
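
The hand-written `encoding_map` above could also be derived from the column names. The sketch below assumes that every 1-hot SHAP column follows the `<feature>_one_h_<index>_shap` pattern seen in this tutorial. `build_encoding_map` and the example column list are hypothetical and only illustrate the grouping step.

```python
import re
from collections import defaultdict

# Assumed naming convention: <feature>_one_h_<index>_shap
ONE_HOT_SHAP_PATTERN = re.compile(r"^(?P<feature>.+)_one_h_\d+_shap$")


def build_encoding_map(column_names):
    """Group 1-hot SHAP columns under the SHAP column name of their original feature."""
    encoding_map = defaultdict(list)
    for column in column_names:
        match = ONE_HOT_SHAP_PATTERN.match(column)
        if match:
            encoding_map[f"{match.group('feature')}_shap"].append(column)
    return dict(encoding_map)


example_columns = [
    "installment_shap",
    "term_one_h_0_shap",
    "term_one_h_1_shap",
    "addr_state_one_h_0_shap",
    "term_one_h_0",  # a 1-hot feature column, not a SHAP column, so it is skipped
]
print(build_encoding_map(example_columns))
```

With the tutorial's dataframe, this would be called as `build_encoding_map(df.columns)` before `map_shap`, provided the columns really follow that convention.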

In [None]:
## Feature names for your model
feature_names = ['installment', 'grade', 'home_ownership', 'annual_income',
       'verification_status', 'pymnt_plan', 'purpose', 'inq_last_6mths',
       'mths_since_last_delinq', 'mths_since_last_record', 'open_acc',
       'pub_rec', 'revol_bal', 'revol_util', 'total_acc', 'fico_score',
       'fico_range', 'term', 'addr_state']

In [None]:
## Helper function to get the names of the SHAP columns
def get_shap_column_names(feature_names):
  return [f"{name}_shap" for name in feature_names]

shap_column_names = get_shap_column_names(feature_names)
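
Before logging, it may be worth checking that every feature and its `_shap` counterpart actually exists in the dataframe, since a 1-hot column that was never decomposed would otherwise surface only as a schema error. The following is a small sketch under that assumption. `missing_columns` is a hypothetical helper and the column names are made up.

```python
def missing_columns(available_columns, feature_names):
    """Return expected feature and SHAP columns absent from `available_columns`."""
    available = set(available_columns)
    expected = list(feature_names) + [f"{name}_shap" for name in feature_names]
    return [column for column in expected if column not in available]


# Made-up column list: the SHAP column for `term` was never decomposed
example_columns = ["grade", "grade_shap", "term", "term_one_h_0_shap"]
example_features = ["grade", "term"]

# Each feature is paired with its SHAP column, mirroring dict(zip(...)) in the schema
shap_mapping = dict(zip(example_features, [f"{n}_shap" for n in example_features]))
print(shap_mapping)
print(missing_columns(example_columns, example_features))
```

For the tutorial's data, the equivalent check would be `missing_columns(shap_values.columns, feature_names)`, which should return an empty list.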

## Initialize Arize client
You can find your `API_KEY` and `SPACE_KEY` by navigating to the settings page in your workspace as shown below (only space admins can see the keys). 



<img src="https://storage.cloud.google.com/arize-assets/fixtures/copy-keys.png" width="700">

In [None]:
!pip install -q arize
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments

SPACE_KEY = "SPACE_KEY"
API_KEY = "API_KEY"
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

model_id="Example-SHAP-Decomposition"
model_version="1.0"
model_type=ModelTypes.CATEGORICAL

if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else:
    print("✅ Import and Setup Arize Client Done! Now we can start using Arize!")
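
To avoid pasting keys into the notebook, they could instead be read from environment variables. This is only a sketch: the variable names `ARIZE_SPACE_KEY` and `ARIZE_API_KEY` and the helper `load_arize_keys` are assumptions made for illustration, not names defined by Arize.

```python
import os


def load_arize_keys(env=None):
    """Read the space and API keys from environment variables.

    The variable names ARIZE_SPACE_KEY and ARIZE_API_KEY are assumptions chosen
    for this sketch, not names required by Arize.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("ARIZE_SPACE_KEY", "ARIZE_API_KEY") if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variable(s): {', '.join(missing)}")
    return env["ARIZE_SPACE_KEY"], env["ARIZE_API_KEY"]


# Demonstration with a plain dict standing in for os.environ
space_key, api_key = load_arize_keys(
    {"ARIZE_SPACE_KEY": "example-space", "ARIZE_API_KEY": "example-key"}
)
print(space_key, api_key)
```

The returned values would then be passed to `Client(space_key=..., api_key=...)` in the same way as the hard-coded constants in the cell above.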

In [None]:
response = arize_client.log(
    dataframe=shap_values,
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,
    environment=Environments.PRODUCTION,
    schema = Schema(
        prediction_id_column_name="ids",
        prediction_label_column_name="prediction",
        actual_label_column_name="actual",
        feature_column_names=feature_names,
        shap_values_column_names=dict(zip(feature_names, shap_column_names)),
    )
)

if response.status_code != 200:
    print(f"❌ logging failed with response code {response.status_code}, {response.text}")
else:
    print(f"✅ logging completed with response code {response.status_code}")

### Check Data Ingestion Information
Data will be available in the UI about 10 minutes after it is received. If you send data for a new model, the model appears in the Arize platform almost immediately, but its data will not be visible yet. To verify that data has been sent correctly and is being processed, we recommend checking the Data Ingestion tab.

You will be able to see the predictions, actuals, and feature importances that have been sent in the last week, last day or last 30 minutes.

An example view of the Data Ingestion tab from a model, when data is sent continuously over 30 minutes, is shown in the image below.

<img src="https://storage.cloud.google.com/arize-assets/fixtures/data-ingestion-tab.png" width="700">



### Overview
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.