<img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/>

# Getting Started with the Arize Platform - Click-Through Rate in the Advertisement Industry

**In this walkthrough, we are going to investigate various performance-related aspects of an online advertisement use-case. More specifically, we will be using the Arize platform to monitor Click-through Rate (CTR) performance.**

You manage the machine learning models for an online advertising platform. You have spent a great deal of time collecting online data and training models for best performance. With your models now in production, you have no tools at your disposal to monitor their performance, identify issues, or get insights into how to improve them. In this walkthrough we will look at a few scenarios common to an advertisement use-case, focusing on CTR predictions versus actuals for a given ad or ad group.

You will learn to:

1. Get training, validation, and production data into the Arize platform
2. Set up performance dashboards
3. Set up threshold alerts
4. Understand where the model is underperforming
5. Discover the root cause of issues

The sample data contains 1 month of information in which 3 main characteristics exist. You will work on identifying these characteristics in the course of this exercise. At a glance:

1. A new untrained domain is introduced
2. Bad data is received in one of the features
3. The model is inaccurate during some time period

---

**Assumptions**: A typical CTR can be as low as a few percent or less, meaning that user clicks are very sparse relative to the total number of impressions. In this example we use a relatively high CTR of about 6.5% with a relatively small dataset. By definition, the CTR of an ad or ad group can be expressed as:

$$CTR = \frac{Clicks}{Impressions}$$

A CTR model predicts probabilities of an impression getting a click. The actual score for the impression can be represented as 0 or 1 with 1 meaning the impression received a click. For example:

Predicted Probability (Score)  | Outcome     | Actual Score
-----------------------|---------------------|------------------
0.05                   | User did not click  | 0
0.15                   | User clicked        | 1
0.02                   | User did not click  | 0


As seen in this table, we must be able to compare probabilities (0 to 1) with discrete scores (0 or 1) in order to evaluate the performance of a CTR model. In this exercise we will use log-loss as the classification metric. Log-loss tracks how close the predicted probability is to the corresponding actual score (0 or 1) and penalises the prediction based on how far off it is: the more the predicted probability diverges from the actual score, the higher the log-loss value.
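
To make the metric concrete, here is a minimal sketch that computes the mean binary log-loss for the three impressions in the table above. The helper `log_loss` and the clipping constant `eps` are illustrative choices for this example, not part of the Arize SDK.

```python
import math

# (predicted probability, actual score) pairs taken from the table above
impressions = [(0.05, 0), (0.15, 1), (0.02, 0)]


def log_loss(pairs, eps=1e-15):
    """Mean binary log-loss over (probability, actual) pairs."""
    total = 0.0
    for p, y in pairs:
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(pairs)


loss = log_loss(impressions)
print(f"log-loss: {loss:.4f}")
```

Most of the total comes from the single clicked impression, which the model scored at only 0.15. The two correctly low-scored non-clicks contribute very little.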

---






# Step 0. Setup and Getting the Data
The first step is to load a preexisting dataset that represents a Click-through Rate use-case, including training, validation, and production data. Using a preexisting dataset saves time in this example and also illustrates how simple it is to plug into the Arize platform.
## Install Dependencies and Import Libraries 📚

In [None]:
!pip install arize --upgrade -q


import datetime
import tempfile
import pandas as pd
import requests
from datetime import timedelta
from arize.pandas.logger import Client, Schema
from arize.utils.types import Environments, ModelTypes

## **🌐 Download the Data**

In [None]:
datasets = {}
for environment in ["training", "validation", "production"]:
    datasets[environment] = pd.read_csv(
        f"https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/click_through_rate_categorical_{environment}.csv"
    )

print("✅ Dependencies installed and data successfully downloaded!")

## Inspect the Data
Take a quick look at the dataset. The data represents a model designed and trained to evaluate the probability of a user clicking on a given ad or ad group based on various features such as domain, page category, position, matching keywords, etc. The dataset contains one month of data, and the performance will be evaluated by looking particularly at:

*   **CTR_predicted**: The probability of a click predicted by the model for the ad or ad group (values from 0 to 1, 1 representing 100%)
*   **actuals**: The actual click/noclick value as a string for the impression



In [None]:
datasets["production"][:10].style

In [None]:
# Average of the model's predicted click probabilities, expressed as a percentage
print(
    "The average predicted CTR is %.3f %%"
    % (100 * datasets["production"]["CTR_predicted"].mean())
)
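
The cell above averages the model's predicted probabilities. The observed CTR, by contrast, is clicks divided by impressions, which can be derived from the `actuals` column. The sketch below shows both on a small made-up frame; the four rows are hypothetical and only the column names mirror the dataset.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the dataset description
toy = pd.DataFrame(
    {
        "CTR_predicted": [0.05, 0.15, 0.02, 0.10],
        "actuals": ["noclick", "click", "noclick", "noclick"],
    }
)

# Observed CTR = clicks / impressions
actual_ctr = toy["actuals"].eq("click").mean()
# Average of the predicted click probabilities
predicted_ctr = toy["CTR_predicted"].mean()

print(f"actual CTR: {100 * actual_ctr:.1f} %, predicted CTR: {100 * predicted_ctr:.1f} %")
```

The same two expressions should work on `datasets["production"]`, assuming its `actuals` column holds the strings `click` and `noclick` as described above.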

# Step 1. Sending Data into Arize 💫

Now that we have our data configured, we are ready to integrate into Arize. We do this by logging (sending) important data we want to analyze to the platform. There, the data will be easily visualized and investigated to source our problem.

For our model, we are going to log:

*   feature data
*   predictions
*   actuals

## Import and Setup Arize Client

The first step is to set up our Arize client. After that we will log the data.

First, copy the Arize `API_KEY` and `ORGANIZATION_KEY` from your admin page linked below! Copy those over to the set-up section. We will also be setting up some metadata to use across all logging.

[![Button_Open.png](https://storage.googleapis.com/arize-assets/fixtures/Button_Open.png)](https://app.arize.com/admin)


In [None]:
ORGANIZATION_KEY = "ORGANIZATION_KEY"
API_KEY = "API_KEY"
arize_client = Client(organization_key=ORGANIZATION_KEY, api_key=API_KEY)

model_id = "click_through_rate_categorical"  # This is the model name that will show up in Arize
model_version = "v1.0"  # Version of model - can be any string

if ORGANIZATION_KEY == "ORGANIZATION_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ YOU NEED TO CHANGE ORGANIZATION AND/OR API_KEY")
else:
    print("✅ Arize setup complete!")

## Log Training & Validation Data
Now that our Arize client is set up, let's go ahead and log all of our data to the platform. For more details on how **`arize.pandas.logger`** works, visit our documentation page below.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)

Key parameters:

*   **prediction_label_column_name**: tells Arize which column contains the predictions ("click"/"noclick")
*   **prediction_score_column_name**: tells Arize which column contains the predicted click rate for the impression (probability 0 to 1)
*   **actual_label_column_name**: tells Arize which column contains the actual results from field data ("click"/"noclick")
*   **actual_score_column_name**: tells Arize which column contains the same actual result as the label, expressed as 0 or 1

We will use [ModelTypes.SCORE_CATEGORICAL](https://docs.arize.com/arize/product-guides-1/models/model-types) to perform this analysis.
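
Before logging, it can help to confirm that the label and score columns agree with each other. The following is a minimal sketch on a made-up frame. The column names match the ones used in this walkthrough, but the rows are hypothetical and these checks are a suggestion rather than something the Arize SDK requires.

```python
import pandas as pd

# Hypothetical rows using the column names from this walkthrough
toy = pd.DataFrame(
    {
        "id": ["a1", "a2", "a3"],
        "predictions": ["noclick", "click", "noclick"],
        "CTR_predicted": [0.05, 0.15, 0.02],
        "actuals": ["noclick", "click", "noclick"],
        "CTR": [0, 1, 0],
    }
)

# The actual score should be 1 exactly where the actual label is "click"
label_matches_score = toy["actuals"].eq("click").astype(int).eq(toy["CTR"]).all()
# Predicted scores are probabilities, so they should fall within [0, 1]
scores_in_range = toy["CTR_predicted"].between(0, 1).all()
# Each prediction needs its own ID
ids_unique = toy["id"].is_unique

print(label_matches_score, scores_in_range, ids_unique)
```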






In [None]:
# This is the data which we will be logging
df_train = datasets["training"]
df_valid = datasets["validation"]
df_prod = datasets["production"]

features = [
    "position",
    "domain",
    "category",
    "device",
    "keywords",
]

# Define a Schema() object for Arize to pick up data from the correct columns for logging
schema = Schema(
    prediction_id_column_name="id",
    prediction_label_column_name="predictions",
    prediction_score_column_name="CTR_predicted",
    actual_label_column_name="actuals",
    actual_score_column_name="CTR",
    feature_column_names=features,
)

# Logging Training DataFrame
response = arize_client.log(
    dataframe=df_train,
    model_id=model_id,
    model_version=model_version,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.TRAINING,
    schema=schema,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(f"logging failed with response code {response.status_code}, {response.text}")
else:
    print(f"✅ You have successfully logged training set to Arize")

# Logging Validation DataFrame
response = arize_client.log(
    dataframe=df_valid,
    model_id=model_id,
    model_version=model_version,
    batch_id="validation",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.VALIDATION,
    schema=schema,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(f"logging failed with response code {response.status_code}, {response.text}")
else:
    print(f"✅ You have successfully logged validation set to Arize")

## Log the Production Data
Similarly, we will use the `arize.pandas.logger` to log the production dataset. Here, we will first need to update the timestamps to align with current day and time. This is to ensure that the sample data shows up as recent in Arize.
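
The idea behind the shift is to move every timestamp forward by the same whole number of days, so that the most recent record lands within the last 24 hours while the time of day and the spacing between records are preserved. A minimal sketch on a made-up frame (the dates are hypothetical; only the `model_date` column name comes from the dataset):

```python
import datetime

import pandas as pd

# Hypothetical timestamps standing in for the dataset's model_date column
toy = pd.DataFrame(
    {"model_date": ["2021-06-01 08:00:00", "2021-06-15 12:30:00", "2021-06-30 23:00:00"]}
)
dates = pd.to_datetime(toy["model_date"])

now = datetime.datetime.now()
# Shift by whole days only, so the time of day of each record is unchanged
offset = datetime.timedelta(days=(now - dates.max()).days)
toy["shifted"] = dates + offset

print(toy)
```

Because the offset is truncated to whole days, the latest shifted record is never in the future.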


In [None]:
# Adjusting the time in the dataset to align with current time
time_offset = datetime.datetime.now() - max(
    [datetime.datetime.fromisoformat(str(t)) for t in df_prod["model_date"]]
)
time_offset = datetime.timedelta(days=time_offset.days)
df_prod["prediction_ts"] = df_prod["model_date"].apply(
    lambda t: (datetime.datetime.fromisoformat(str(t)) + time_offset).timestamp()
)


# Define a Schema() object for Arize to pick up data from the correct columns for logging
schema = Schema(
    prediction_id_column_name="id",
    prediction_label_column_name="predictions",
    prediction_score_column_name="CTR_predicted",
    actual_label_column_name="actuals",
    actual_score_column_name="CTR",
    feature_column_names=features,
    timestamp_column_name="prediction_ts",
)

# arize_client.log returns a Response object from Python's requests module
response = arize_client.log(
    dataframe=df_prod,
    model_id=model_id,
    model_version=model_version,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    schema=schema,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(f"logging failed with response code {response.status_code}, {response.text}")
else:
    print(f"✅ You have successfully logged production set to Arize")

# Step 2. Confirm Data in Arize ✅
Note that Arize takes about 10 minutes to index the data. While the model should appear immediately, the data will not show up until the indexing is done. Feel free to head over to the **Data Ingestion** tab for your model to watch Arize work its magic!🔮

**⚠️ DON'T SKIP:**
In order to move on to the next step, make sure your actuals and training/validation sets are loaded into the platform. To check:
1. Navigate to models from the left bar, then locate and click on model **click_through_rate_categorical**
2. Open the **Overview Tab** and make sure you see the actuals as shown below.
3. Actual data will show up under **Model Health**. Once the number changes from **0 Actuals** to **Actuals** (with summary statistics such as cardinality listed in the tables), your production actuals will have been fully recorded on Arize!
4. Verify the list of features under **Model Health**.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/initial_overview.png)

Arize can automatically configure monitors that are best suited to your data. From the banner at the top of the screen, simply click **Set up Model** then select **Training Version v1.0** and click **NEXT**. Select **Log Loss** as the **Default Metric** and **click** as **Positive Class**. Click **NEXT** three more times with the default settings proposed.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/initial_setup_banner.png)

You will now see that the baseline has been set and **Drift**, **Data Quality**, and **Performance** monitors have been created!

<img src="https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/initial_setup_monitors.png" width="300"/>





# Step 3. First glance
Now that everything is set up, take a quick glance at the **Drift** tab. From the top right of the screen, pick **-30 Days** as the date range and **Log Loss** as the metric. You will notice some changes in the log-loss metric starting around day 6. We will investigate this change in later sections.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/first_glance.png)



# Step 4. Analyzing feature drift and impact on performance
During initial model setup, Arize automatically created a set of drift monitors and dashboards for each feature available in the dataset. Drift is represented as the Population Stability Index (PSI) over a given period. We will use these graphs to monitor overall trends. From the model overview page, select the **domain** feature and you should notice the change in domain distribution. Use the PSI graph to select a period of interest. A new domain called **new_site.com** appears that is not part of the training baseline. Notice that the default threshold set by Arize was crossed. Also notice the correlation between the domain drift and the change in model performance in the **Log Loss** graph.
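
For intuition on how a previously unseen category drives PSI up, here is a minimal sketch of the standard PSI formula for a categorical feature, PSI = Σ (p − q) · ln(p / q), where q is the baseline share and p the production share of each category. The domain names other than `new_site.com` are made up, and the `eps` floor for categories missing from one side is an assumption of this sketch; Arize's exact computation may differ.

```python
import math
from collections import Counter


def psi(baseline, production, eps=1e-6):
    """Population Stability Index between two lists of categorical values."""
    base_counts, prod_counts = Counter(baseline), Counter(production)
    total = 0.0
    for category in set(baseline) | set(production):
        # Floor each share at eps so a category missing on one side does not divide by zero
        q = max(base_counts[category] / len(baseline), eps)
        p = max(prod_counts[category] / len(production), eps)
        total += (p - q) * math.log(p / q)
    return total


# Hypothetical domain samples: the baseline has never seen new_site.com
baseline = ["site_a.com"] * 50 + ["site_b.com"] * 50
production = ["site_a.com"] * 40 + ["site_b.com"] * 40 + ["new_site.com"] * 20

print(f"PSI, identical distributions: {psi(baseline, baseline):.3f}")
print(f"PSI, with new domain:         {psi(baseline, production):.3f}")
```

The new category dominates the sum because its baseline share is effectively zero, which is why the appearance of an untrained domain crosses a drift threshold so quickly.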

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/domain_drift.png)




# Step 5. Analyzing and detecting bad data
In the production data, bogus values that were not part of the training data are being recorded against the **device** feature. You might have already noticed some drift on the **device** feature. To investigate further, from the **device** feature page, navigate to the **Data Quality** tab. Make sure to view the last 30 days by selecting the correct range in the top right corner of the screen. Arize keeps track of feature cardinality as well as fields with no data, and can pinpoint the exact time that this issue started.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/bad_data_cardinality.png)

To be notified automatically when bad values are introduced in production, create a custom monitor for the **device** feature. From the model **Monitors** tab click **New Monitor** and choose **Data Quality Monitor**. Enter these parameters:

* Dimension -> Feature -> device
* Aggregation -> Count
* Filters -> *(feature device != [Desktop, Laptop, Phone, Tablet, Unknown])*
* Set a low threshold
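
The monitor's logic can be reproduced locally as a quick check. The sketch below counts rows whose `device` value falls outside the expected list and compares the count with a threshold. The expected values come from the filter above, while the sample rows, the bogus values, and the threshold of 0 are made up for illustration.

```python
import pandas as pd

# Expected values, taken from the monitor filter above
expected_devices = ["Desktop", "Laptop", "Phone", "Tablet", "Unknown"]

# Hypothetical production rows containing two bogus device values
toy = pd.DataFrame({"device": ["Phone", "Desktop", "device_x9", "Tablet", "@@", "Laptop"]})

# Equivalent of the filter: feature device != [expected values]
unexpected = toy[~toy["device"].isin(expected_devices)]
bad_count = len(unexpected)

threshold = 0  # a low threshold: any unexpected value triggers the alert
alert = bad_count > threshold

print(f"{bad_count} unexpected device values, alert={alert}")
```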


![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/bad_data_monitor.png)


**Congratulations!** 🎉   You have created your first monitor that will keep you alerted of any suspicious changes in your production environment.

# Step 6. Analyzing additional model performance
Now that a performance change has been identified, let's see if we can narrow it down to a particular feature. From **Templates** on the left bar, create a new dashboard of type **Scored model** with **Positive Class** set to **click**. Select a few features that might be of interest like **category** and **device**.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/performance_create_scored.png)

From the **Conditional Prediction Score** graph, notice the distribution. In a CTR use-case the actual clicks are sparse, so we expect to see far fewer positives than negatives. The x-axis shows the range of prediction values from the model. The blue and orange stacks show the actual distribution of click (positive) vs noclick (not positive) for a given probability. In an ideal scenario the percentage of positive values (clicks, in blue) in a stack would match the probability that the stack sits on. Therefore, the further to the right a stack sits, the higher its ratio of blue to orange should be. If the model is working properly, there should be few or no positives (blue) near 0% probability on the left.
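
The same comparison can be sketched in pandas by binning the predicted scores and computing the share of actual clicks in each bin. The rows and bin edges below are made up for illustration; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical impressions; column names follow the dataset
toy = pd.DataFrame(
    {
        "CTR_predicted": [0.02, 0.04, 0.05, 0.12, 0.15, 0.18, 0.31, 0.35],
        "actuals": [
            "noclick", "noclick", "noclick", "noclick",
            "click", "noclick", "click", "noclick",
        ],
    }
)

# Group impressions into score ranges (the x-axis of the graph)
toy["score_bin"] = pd.cut(toy["CTR_predicted"], bins=[0.0, 0.1, 0.2, 0.4])

calibration = (
    toy.assign(clicked=toy["actuals"].eq("click"))
    .groupby("score_bin", observed=True)
    .agg(
        mean_predicted=("CTR_predicted", "mean"),
        actual_click_rate=("clicked", "mean"),  # share of "blue" in each stack
        impressions=("clicked", "size"),
    )
)

print(calibration)
```

For a well-calibrated model, `actual_click_rate` should sit close to `mean_predicted` in every bin.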

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/performance_distribution.png)

Now, from the **Prediction Score vs Actual Score by Day** graph, notice the increase in prediction score, which we can attribute to the lack of training for domain **new_site.com**. Also notice an increase in actual score, which needs further investigation.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/performance_pred_vs_actual.png)

Now edit the dashboard and add a widget of type **Timeseries**. Edit the widget and set the **Chart type** to **Data Metrics** then add two plots:

* Average Prediction: Use **Aggregation Function** - **Average** with **Average of** set to **Prediction Score**.
* Actual Click Rate: Use **Aggregation Function** - **Percent (Count/Total)** with **Count where** set *(Actuals = click)*
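
The two plots correspond to two per-day aggregations: the mean of the prediction score, and the share of impressions whose actual label is `click`. Here is a minimal pandas sketch on made-up data. The `day` column and the rows are hypothetical; the other column names follow the dataset.

```python
import pandas as pd

# Hypothetical impressions over two days
toy = pd.DataFrame(
    {
        "day": pd.to_datetime(["2021-06-01"] * 4 + ["2021-06-02"] * 4),
        "CTR_predicted": [0.05, 0.10, 0.05, 0.08, 0.04, 0.06, 0.05, 0.05],
        "actuals": [
            "noclick", "click", "noclick", "noclick",
            "click", "click", "noclick", "noclick",
        ],
    }
)

daily = (
    toy.assign(clicked=toy["actuals"].eq("click"))
    .groupby("day")
    .agg(
        avg_prediction=("CTR_predicted", "mean"),  # Average of Prediction Score
        actual_click_rate=("clicked", "mean"),  # Percent (Count/Total) where Actuals = click
    )
)

print(daily)
```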

Notice the drop in prediction which coincides with the domain drift from Step 4 and which resulted in lower CTR performance. For now, we will focus on the blue line, which shows an increase in actual clicks.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/performance_count_vs_pred.png)

Now refine your graph by adding two more similar plots. This time, add these filters to each of them respectively:

* Pred Shopping: Use **Aggregation Function** - **Average** with **Average of** set to **Prediction Score** with filter *(feature category = [shopping])*. Also add a filter *(feature domain != [new_site.com])*
* Actual Shopping: Use **Aggregation Function** - **Percent (Count/Total)** with **Count where** set *(Actuals = click)* and with filter *(feature category = [shopping])*. Also add a filter *(feature domain != [new_site.com])*

By excluding domain **new_site.com** and focusing on category **shopping** we can conclude that the difference in predicted CTR versus actual clicks in the later part of the graph is due to the model not accounting for higher CTR for the shopping pages.
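
The filter combination can be expressed as a boolean mask: keep the rows where the category is `shopping` and the domain is not `new_site.com`, then compare the average prediction with the actual click rate on that subset. The rows below and the domain names other than `new_site.com` are made up for illustration.

```python
import pandas as pd

# Hypothetical impressions; column names follow the dataset
toy = pd.DataFrame(
    {
        "category": ["shopping", "shopping", "news", "shopping", "shopping"],
        "domain": ["site_a.com", "site_b.com", "site_a.com", "new_site.com", "site_a.com"],
        "CTR_predicted": [0.06, 0.04, 0.05, 0.30, 0.05],
        "actuals": ["click", "noclick", "noclick", "noclick", "click"],
    }
)

# feature category = [shopping] AND feature domain != [new_site.com]
mask = toy["category"].eq("shopping") & toy["domain"].ne("new_site.com")
subset = toy[mask]

pred_shopping = subset["CTR_predicted"].mean()
actual_shopping = subset["actuals"].eq("click").mean()

# A positive gap means the model under-predicts clicks for this cohort
gap = actual_shopping - pred_shopping
print(f"predicted {pred_shopping:.3f}, actual {actual_shopping:.3f}, gap {gap:.3f}")
```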

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/performance_shopping.png)








# 📚 Conclusion

In this walkthrough we've shown how Arize can be used to log prediction data for a model, pinpoint model performance degradation, and set up monitors to catch future issues. We have been able to identify 3 areas of concern:

1. A new domain, on which the model was never trained, appeared in our production system. Re-training to include new_site.com is required.
2. Bogus values are appearing in production. Further investigation is required.
3. The model is pessimistic during parts of the month. The model needs to be re-trained to account for day of the month.

Summary of the analysis and tools:

Concern                   | Detection                       | Root Cause Analysis Tools
--------------------------|---------------------------------|----
Untrained domain or site  | Use a **Drift Monitor**             | - Built-in distribution graphs on each feature<br>- Review distribution at different time intervals
Bad input data            | Use a **Data Quality Monitor**      | - Built-in cardinality graphs<br>- Built-in Empty over Time graph<br>- Create Drift Monitor graph that filters out expected values to focus on bad ones
Inaccuracy                | Use a **Model Performance Monitor** | - Log Loss timeseries widget<br>- Timeseries widget of average prediction score versus Percent (Count/Total) of actual clicks<br>- Filter on features and values to narrow down issues

<br>

Though we covered a lot of ground, this is just scratching the surface of what the Arize platform can do. We urge you to explore more of Arize, either on your own or through one of our many other tutorials.

### About Arize
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.
