<img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/>

# <center>Getting Started with the Arize Platform</center>
## <center>Investigating Embedding Drift in Image Classification</center>

**In this walkthrough, we are going to ingest embedding data and look at embedding drift.** 

In this scenario, you are in charge of maintaining an image classification model. Your model, resnet-50, classifies input images into the 10 predefined categories of the Fashion MNIST dataset (see the [dataset](https://huggingface.co/datasets/arize-ai/fashion_mnist_quality_drift)). However, once the model is released into production, you notice that its performance has degraded over time.


This notebook shows how Arize can automatically surface the cause of this performance degradation and help you troubleshoot it by analyzing the _image vectors_ associated with the input images. With that insight you can take the right action, such as retraining your model or cleaning your data, without spending time wrangling and visualizing the datasets yourself. In this example, the production set contains lower-quality images during a certain period of time.

It is worth noting that, according to our research, inspecting embedding drift can surface problems with your data before they cause performance degradation.

In this tutorial, we will start from scratch. We will:
* Download the data
* Preprocess the data
* Train the model
* Extract image vectors and predictions
* Log the inferences into the Arize Platform

We will be using [🤗 Hugging Face](https://huggingface.co/)'s open source libraries to make this process extremely easy. In particular, we will use:
* [🤗 Datasets](https://huggingface.co/docs/datasets/index): a library for easily accessing and sharing datasets, and evaluation metrics for Computer Vision, Natural Language Processing (NLP), and audio tasks.
* [🤗 Transformers](https://huggingface.co/docs/transformers/index): a library to easily download and use state-of-the-art pre-trained models. Using pre-trained models can lower your compute costs, reduce your carbon footprint, and save you time from training a model from scratch.

Before we start, if this is your first Arize Tutorial, we recommend that you complete [Send Data to Arize in 5 Easy Steps](https://colab.research.google.com/github/Arize-ai/client_python/blob/main/arize/examples/tutorials/Arize_Tutorials/Quick_Start/Send_data_to_Arize_in_5_easy_steps_classification.ipynb) before continuing. If you are familiar with sending data to Arize, it only takes a few more lines to send embedding data. 

Let's get started!

# Step 0. Setup and Getting the Data

We will first install 🤗Hugging Face's `datasets` and `transformers` libraries, mentioned above. In addition, we will import some metrics from `sklearn`. Find out more [here](https://github.com/scikit-learn/scikit-learn).

We'll explain each of the imports below as we use them through this tutorial.


## Install Dependencies and Import Libraries 📚

In [None]:
!pip install -q datasets transformers arize umap-learn pandas==1.3.5 pickle5

import pandas as pd
import numpy as np
import torch
from datasets import load_dataset
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer, AutoFeatureExtractor
from transformers import ConvNextFeatureExtractor, ConvNextModel, ImageFeatureExtractionMixin
from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor

from sklearn.metrics import accuracy_score, f1_score

from matplotlib import pyplot as plt

from datetime import datetime
import uuid
from arize.pandas.logger import Client, Schema
from arize.utils.types import Environments, ModelTypes, EmbeddingColumnNames

## Check if GPU is available
Here we use PyTorch to check whether a GPU is available. When appropriate, we will use PyTorch's `nn.Module.to()` method to ensure that the model runs on the GPU if we have one.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## **🌐 Download the Data**

The easiest way to load a dataset is from the [Hugging Face Hub](https://huggingface.co/datasets). There are already over 6000 datasets on the Hub. The [arize-ai/fashion_mnist_quality_drift](https://huggingface.co/datasets/arize-ai/fashion_mnist_quality_drift) dataset has been crafted by Arize for this example notebook.

Thanks to Hugging Face 🤗 Datasets, we can download the dataset in one line of code. The `Dataset` object comes equipped with methods that make it very easy to inspect, pre-process, and post-process your data.


In [None]:
dataset = load_dataset("arize-ai/fashion_mnist_quality_drift")
dataset

You can select the splits of the dataset as you would in a dictionary.

In [None]:
train_ds, val_ds, prod_ds = dataset['training'], dataset['validation'], dataset['production']

## Inspect the Data

It is often convenient to convert a `Dataset` object to a Pandas `DataFrame` so we can access high-level APIs for data visualization. 🤗 Datasets provides a `set_format()` method that allows us to change the output format of the `Dataset`. This does not change the underlying data format, an Arrow table. When the `DataFrame` format is no longer needed, we can reset the output format using `reset_format()`.

In [None]:
train_ds.set_format("pandas")
display(train_ds[:].head())
train_ds.reset_format()

Let's also take a look at the categories we will be classifying into:

In [None]:
labels = train_ds.features["label"]
labels

# Step 1. Setting up your Image Classification Model

## Pre-processing the data

In order to input our data into our model for fine-tuning, we first need to perform some transformations: *convert to RGB* and *feature extraction*.




### Convert greyscale images to RGB

We define the function `convert_to_rgb()` and we apply it to the entire dataset using the `map()` method which will convert all the images from greyscale to RGB.

In [None]:
def convert_to_rgb(batch):
  return {'image': [image.convert("RGB") for image in batch['image']]}

In [None]:
process_batch_size = 100
train_ds = train_ds.map(convert_to_rgb, batched = True, batch_size = process_batch_size)
val_ds = val_ds.map(convert_to_rgb, batched = True, batch_size = process_batch_size)
prod_ds = prod_ds.map(convert_to_rgb, batched = True, batch_size = process_batch_size)
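
To make the conversion concrete, here is a minimal NumPy sketch of what turning a single-channel greyscale image into RGB amounts to: in effect, the luminance channel is copied into each of the three color channels. The pixel values and the helper name `grey_to_rgb` are illustrative assumptions and are not part of the tutorial's pipeline.

```python
import numpy as np

def grey_to_rgb(grey):
    """Replicate a (H, W) greyscale array into a (H, W, 3) RGB array."""
    return np.stack([grey, grey, grey], axis=-1)

# Hypothetical 2x2 greyscale image with 8-bit pixel values
grey = np.array([[0, 128], [255, 64]], dtype=np.uint8)
rgb = grey_to_rgb(grey)

print(grey.shape, "->", rgb.shape)
```

The image content is unchanged; only the shape changes, so that it matches the 3-channel input that a model pre-trained on color images expects.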

### Feature Extractor

For audio and vision tasks, a feature extractor processes the audio signal or image into the correct input format. 🤗 Transformers provides the `AutoFeatureExtractor` class, which allows us to quickly download the FeatureExtractor required by the pre-trained model of our choosing. In this tutorial, we will use `microsoft/resnet-50`.


In [None]:
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50", do_resize = True)
feature_extractor

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset. Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that can improve the ability of the fit models to generalize what they have learned to new images.

With the feature extractor configuration above, we can now apply some transformations to augment our dataset and improve training results. In this case we chose transformations from the torchvision package: [RandomResizedCrop](https://pytorch.org/vision/main/generated/torchvision.transforms.RandomResizedCrop.html?highlight=randomresizedcrop#torchvision.transforms.RandomResizedCrop) and [Normalize](https://pytorch.org/vision/main/generated/torchvision.transforms.Normalize.html?highlight=normalize#torchvision.transforms.Normalize).

In [None]:
normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
_transforms = Compose([RandomResizedCrop(feature_extractor.size), ToTensor(), normalize])
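
As a rough illustration of what the normalization step does, the sketch below applies `(x - mean) / std` per channel with NumPy. The mean and standard deviation values are the widely used ImageNet statistics and are an assumption here; in the notebook they come from `feature_extractor.image_mean` and `feature_extractor.image_std`. The helper name `normalize_image` is hypothetical.

```python
import numpy as np

# Assumed per-channel statistics (commonly used ImageNet values); the notebook
# reads the real ones from feature_extractor.image_mean / image_std.
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def normalize_image(image_chw):
    """Apply (x - mean) / std per channel to a (C, H, W) array scaled to [0, 1]."""
    return (image_chw - mean[:, None, None]) / std[:, None, None]

# Hypothetical 3x2x2 image where every pixel equals 0.5
image = np.full((3, 2, 2), 0.5)
normalized = normalize_image(image)

print(normalized[:, 0, 0])
```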

### Image augmentation on the entire training set

Here we will use the 🤗 Datasets [`with_transform()`](https://huggingface.co/docs/datasets/package_reference/main_classes.html?#datasets.Dataset.with_transform) method to apply the transforms over the entire dataset in batches. Since the transformations are meant to help training, we only apply them to the training and validation datasets.




In [None]:
def augmentation(dataset):
    dataset["pixel_values"] = [_transforms(img.convert("RGB")) for img in dataset["image"]]
    del dataset["image"]      #deleting dataset["image"] as the model only takes in "pixel_values" as inputs  
    return dataset

In [None]:
train_ds = train_ds.with_transform(augmentation)
val_ds = val_ds.with_transform(augmentation)

## Build the Model

Similar to how we obtained the feature extractor, 🤗 Transformers provides the `AutoModelForImageClassification` class, which allows us to quickly download a pre-trained model with an image classification [task head](https://huggingface.co/course/en/chapter2/2?fw=pt#model-heads-making-sense-out-of-numbers) on top. The pre-trained model to use in this tutorial is [microsoft/resnet-50](https://huggingface.co/microsoft/resnet-50).

It is important to pass `output_hidden_states = True` to be able to compute the embedding vectors associated with the image (explained below).

_NOTE_: You may skip the fine-tuning section if you would like to use a model that Arize has already fine-tuned for you. To skip, set `SKIP_TRAINING = True` and go ahead to [_B) Download the model_](#B\)-Download-the-fine-tuned-model).


In [None]:
model_name = "microsoft/resnet-50"
SKIP_TRAINING = False # Make True if you want to skip training

### A) Fine-tune the model

Before downloading the pre-trained model, we will need to provide the mapping of each label to a label ID (integer) and vice versa to help the model recover the label name from the label ID.

In [None]:
id2label = {idx: label for idx, label in enumerate(labels.names)}
label2id = {label: idx for idx, label in enumerate(labels.names)}
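
For a concrete picture of the two mappings, here is a small sketch with three class names. The names are an assumption for illustration; in the notebook they come from `labels.names`.

```python
# Hypothetical class names; the notebook reads them from labels.names
names = ["T-shirt/top", "Trouser", "Pullover"]

id2label = {idx: label for idx, label in enumerate(names)}
label2id = {label: idx for idx, label in enumerate(names)}

# Translate a list of predicted label IDs back into readable names
predicted_ids = [2, 0, 1]
predicted_names = [id2label[i] for i in predicted_ids]

print(predicted_names)
```

The two dictionaries are inverses of each other, so a label survives a round trip through both.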

Let's download the pre-trained model.

In [None]:
model = AutoModelForImageClassification.from_pretrained(
    model_name,
    num_labels=labels.num_classes,
    id2label=id2label,
    label2id=label2id,
    output_hidden_states=True,
    ignore_mismatched_sizes=True
)


Further, we use the [`TrainingArguments`](https://huggingface.co/docs/transformers/v4.21.1/en/main_classes/trainer#transformers.TrainingArguments) class to define the training parameters. This class stores a lot of information and gives you control over the training and evaluation.

In [None]:
training_batch_size = 8
training_epochs = 3
logging_steps= len(train_ds) // training_batch_size

training_args = TrainingArguments(
    output_dir=model_name,
    per_device_train_batch_size=training_batch_size,
    per_device_eval_batch_size=training_batch_size,
    evaluation_strategy="epoch",
    num_train_epochs=training_epochs,                        
    fp16=True,
    logging_steps=logging_steps,
    log_level="error",
    optim="adamw_torch",
    learning_rate=2e-4,
    remove_unused_columns=False,
)


Now, we will define the evaluation function that calculates the accuracy and f1 score of the model.

In [None]:
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions[0].argmax(-1)
    f1 = f1_score(labels, preds, average="weighted")
    acc = accuracy_score(labels, preds)
    return {"accuracy": acc, "f1": f1}
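
To show what these two metrics measure, here is a dependency-free sketch that computes accuracy and a support-weighted F1 score by hand. The helper names and the toy labels are assumptions for illustration; the notebook itself relies on `sklearn.metrics`.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)

# Hypothetical labels for six images across three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```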

Next, we need a _data collator_ so that we can unpack and stack the batches that are coming in as lists of dicts into batch tensors.

In [None]:
def collate_fn(dataset):
    pixel_values = torch.stack([ds["pixel_values"] for ds in dataset])
    labels = torch.tensor([ds["label"] for ds in dataset])
    return {"pixel_values": pixel_values, "labels": labels}
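
The stacking logic can be tried out without a GPU or PyTorch. The sketch below mirrors it with NumPy arrays standing in for tensors; the sample shapes and the function name `collate` are assumptions for illustration.

```python
import numpy as np

def collate(samples):
    """Stack a list of per-sample dicts into batch arrays."""
    pixel_values = np.stack([s["pixel_values"] for s in samples])
    labels = np.array([s["label"] for s in samples])
    return {"pixel_values": pixel_values, "labels": labels}

# Two hypothetical samples, each shaped like a (3, 224, 224) image
samples = [
    {"pixel_values": np.zeros((3, 224, 224)), "label": 4},
    {"pixel_values": np.ones((3, 224, 224)), "label": 7},
]
batch = collate(samples)

print(batch["pixel_values"].shape, batch["labels"].shape)
```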

Finally, we can fine-tune our model using the `Trainer` class.

In [None]:
if not SKIP_TRAINING:
  trainer = Trainer(
      model=model,
      args=training_args,
      data_collator=collate_fn,
      train_dataset=train_ds,
      eval_dataset=val_ds,
      tokenizer=feature_extractor,
      compute_metrics=compute_metrics,
  )

  print("Evaluation before training")
  eval_metrics = trainer.evaluate(eval_dataset=val_ds)  # avoid shadowing the built-in eval()
  eval_df = pd.DataFrame({'Epoch':0, 'Validation Loss': eval_metrics['eval_loss'], 'Accuracy': eval_metrics['eval_accuracy'], 'F1': eval_metrics['eval_f1']}, index=[0])
  display(eval_df)

  torch.cuda.empty_cache() # Free up some memory

  print("\n\nTraining...")
  trainer.train()

### B) Download the fine-tuned model 

If you decided to skip the fine-tuning in section A, you can download the already fine-tuned model [arize-ai/resnet-50-fashion-mnist-quality-drift](https://huggingface.co/arize-ai/resnet-50-fashion-mnist-quality-drift) from Arize's page in the Hugging Face Hub.


In [None]:
if SKIP_TRAINING: # Make sure you marked SKIP_TRAINING = True if you wanted to skip training
    model_ckpt = "arize-ai/resnet-50-fashion-mnist-quality-drift"

    model = (AutoModelForImageClassification
            .from_pretrained(model_ckpt, 
                             num_labels = labels.num_classes,  # `labels` holds the dataset's label feature
                             output_hidden_states=True
                             )
            .to(device))

# Step 2. Post-Processing your data

## Get model outputs
Now we will extract the prediction labels and the image embedding vectors. The latter are formed from the hidden states of our pre-trained (and then fine-tuned) model. We will choose the last hidden state layer, with a shape of `(batch_size, embedding_size, 7, 7)`*. To obtain the embedding vector, we will average over the last two dimensions.


***NOTE:** The last 2 components of the shape (7, 7) are due to the output size of the last convolutional layer in the resnet-50 architecture. See Table 1 on page 5 in [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) for more information. In the same table, you can also see that the `embedding_size` is 2048.
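
The averaging step can be sketched with NumPy on random data. The batch size of 4 and the random values are assumptions for illustration; only the shapes follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical last hidden state for a batch of 4 images: (batch, 2048, 7, 7)
last_hidden_state = rng.normal(size=(4, 2048, 7, 7))

# Average over the two spatial dimensions to get one vector per image
embeddings = last_hidden_state.mean(axis=(2, 3))

print(embeddings.shape)
```

Each image ends up with a single 2048-dimensional vector, regardless of the spatial size of the feature map.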

In [None]:
def postprocess(batch):
    inputs = feature_extractor([x.convert("RGB") for x in batch["image"]], return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)

    pred_labels = torch.argmax(outputs.logits, dim=1).cpu().numpy()

    last_hidden_states = outputs.hidden_states[-1]
    embeddings= torch.mean(last_hidden_states, (2,3)).cpu().numpy()

    return {'pred_label':pred_labels, 'image_vector': embeddings}

Before applying the post-processing function defined above, we need to call [`reset_format()`](https://huggingface.co/docs/datasets/v2.4.0/en/package_reference/main_classes#datasets.Dataset.reset_format) on the training and validation sets to restore them to their original format, which contains the `"image"` feature.

In [None]:
train_ds.reset_format()
val_ds.reset_format()

Next, we apply the post-processing to all datasets.

In [None]:
train_ds = train_ds.map(postprocess, batched = True, batch_size = process_batch_size)
val_ds = val_ds.map(postprocess, batched = True, batch_size = process_batch_size)
prod_ds = prod_ds.map(postprocess, batched = True, batch_size = process_batch_size)

# Step 3. Prepare your data to be sent to Arize


From this point forward, it is convenient to use Pandas DataFrames. We can do so easily using the `to_pandas()` method that returns a Pandas DataFrame.

In [None]:
train_df = train_ds.to_pandas()
val_df = val_ds.to_pandas()
prod_df = prod_ds.to_pandas()

## Update the timestamps

The data that you are working with was constructed in April of 2022. Hence, we will update the timestamps so they are current at the time that you're sending data to Arize.

In [None]:
last_ts = max(prod_df['prediction_ts'])
now_ts = datetime.timestamp(datetime.now())
delta_ts = now_ts - last_ts    

train_df['prediction_ts'] = (train_df['prediction_ts'] + delta_ts).astype(float)
val_df['prediction_ts'] = (val_df['prediction_ts'] + delta_ts).astype(float)
prod_df['prediction_ts'] = (prod_df['prediction_ts'] + delta_ts).astype(float)
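
The shift can be checked on a handful of plain Unix timestamps. The values below are hypothetical stand-ins for the `prediction_ts` column; the point is that every split moves by the same offset, so the latest production record lands at the current time and the spacing between records is preserved.

```python
from datetime import datetime

# Hypothetical Unix timestamps (seconds) standing in for 'prediction_ts'
train_ts = [1648771200.0, 1648857600.0]
prod_ts = [1649376000.0, 1649980800.0]

now_ts = datetime.timestamp(datetime.now())
delta_ts = now_ts - max(prod_ts)

# Apply the same offset to every split
shifted_train = [ts + delta_ts for ts in train_ts]
shifted_prod = [ts + delta_ts for ts in prod_ts]

print(datetime.fromtimestamp(max(shifted_prod)))
```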

## Add prediction ids

The Arize platform uses prediction IDs to link a prediction to an actual. Visit the [Arize documentation](https://docs.arize.com/arize/data-ingestion/model-schema/5.-prediction-id?q=prediction_id) for more details.

You can generate prediction IDs as follows:

In [None]:
def add_prediction_id(df):
    return [str(uuid.uuid4()) for _ in range(df.shape[0])]

In [None]:
train_df['prediction_id'] = add_prediction_id(train_df)
val_df['prediction_id'] = add_prediction_id(val_df)
prod_df['prediction_id'] = add_prediction_id(prod_df)
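
As a quick sanity check of this approach, the standalone sketch below generates IDs the same way and confirms that they are unique strings. The function name `make_prediction_ids` and the row count are assumptions for illustration.

```python
import uuid

def make_prediction_ids(n_rows):
    """Return one random UUID string per row."""
    return [str(uuid.uuid4()) for _ in range(n_rows)]

ids = make_prediction_ids(1000)

# Every ID should be distinct so predictions and actuals can be matched
print(len(ids), len(set(ids)))
```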

## Convert integer labels to strings



In [None]:
train_df['label'] = train_df['label'].map(lambda label: id2label[label])
train_df['pred_label'] = train_df['pred_label'].map(lambda label: id2label[label])

val_df['label'] = val_df['label'].map(lambda label: id2label[label])
val_df['pred_label'] = val_df['pred_label'].map(lambda label: id2label[label])

prod_df['label'] = prod_df['label'].map(lambda label: id2label[label])
prod_df['pred_label'] = prod_df['pred_label'].map(lambda label: id2label[label])

# Step 4. Sending Data into Arize 💫

## Select the columns we want to send to Arize (optional)

This step is not really necessary, since we will select the columns we want to send to Arize using the `Schema` definition (below). However, for the purpose of visibility, this is our final `DataFrame` with the data that will be sent to Arize.

In [None]:
arize_columns = [
    'prediction_id', 
    'prediction_ts', 
    'label',
    'pred_label',
    'image_vector',
    'url'
    ]

train_df = train_df[arize_columns]
val_df = val_df[arize_columns]
prod_df = prod_df[arize_columns]

train_df.head()

## Import and Setup Arize Client

The first step is to set up the Arize client. After that we will log the data.

Copy the Arize `API_KEY` and `SPACE_KEY` from your admin page (shown below) to the variables in the cell below. We will also be setting up some metadata to use across all logging.

<img src="https://storage.googleapis.com/arize-assets/fixtures/copy-keys.png" width="700">

In [None]:
SPACE_KEY = "SPACE_KEY"
API_KEY = "API_KEY"
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
model_id = "CV-demo-fashion-mnist-quality-drift"
model_version = "1.0"
model_type = ModelTypes.SCORE_CATEGORICAL
if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else:
    print("✅ Import and Setup Arize Client Done! Now we can start using Arize!")


Now that our Arize client is set up, let's go ahead and log all of our data to the platform. For more details on how **`arize.pandas.logger`** works, visit our documentation.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)

## Define the Schema 

A Schema instance specifies the column names for corresponding data in the dataframe. While we could define different Schemas for training and production datasets, the dataframes have the same column names, so the Schema will be the same in this instance.

To ingest non-embedding features, it suffices to provide a list of column names that contain the features in our dataframe. Embedding features, however, are a little bit different.

Arize allows you to ingest not only the embedding vector but the raw data associated with that embedding, or a URL link to that raw data. Therefore, up to 3 columns can be associated with the same _embedding object_*. To be able to do this, Arize's SDK provides the `EmbeddingColumnNames` class, used below.


***NOTE**: This is how we refer to the 3 possible pieces of information that can be sent as embedding objects:
* Embedding `vector` (required)
* Embedding `data` (optional): raw text associated with the embedding vector
* Embedding `link_to_data` (optional): link to the data file (image, audio, ...) associated with the embedding vector



Learn more [here](https://docs.arize.com/arize/data-ingestion/model-schema/7b.-embedding-features).

In [None]:
features = []
arize_columns = [
    'prediction_id', 
    'prediction_ts', 
    'label',
    'pred_label',
    'image_vector',
    'url'
    ]


embedding_features = [
    EmbeddingColumnNames(
        vector_column_name="image_vector",  # Will be name of embedding feature in the app
        link_to_data_column_name="url",
    ),
]

# Define a Schema() object for Arize to pick up data from the correct columns for logging
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="pred_label",
    actual_label_column_name="label",
    feature_column_names=features,
    embedding_feature_column_names=embedding_features
)
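
Before logging, it can be worth confirming that every column named in the schema is present and that the embedding vectors share one length. The helper below is a hypothetical pre-flight check written for this walkthrough and is not part of the Arize SDK; it works on a plain dictionary of columns so that it runs without any dependencies. Treating mixed vector lengths as a problem is an assumption made here.

```python
def check_columns(columns, required, vector_column):
    """Return a list of problems found before logging (empty if none)."""
    problems = [f"missing column: {name}" for name in required if name not in columns]
    lengths = {len(v) for v in columns.get(vector_column, [])}
    if len(lengths) > 1:
        problems.append(
            f"inconsistent vector lengths in {vector_column}: {sorted(lengths)}"
        )
    return problems

# Hypothetical two-row table keyed by column name
table = {
    "prediction_id": ["a", "b"],
    "prediction_ts": [1.0, 2.0],
    "label": ["Bag", "Coat"],
    "pred_label": ["Bag", "Shirt"],
    "image_vector": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "url": ["https://example.com/1.png", "https://example.com/2.png"],
}
required = ["prediction_id", "prediction_ts", "label", "pred_label", "image_vector", "url"]

print(check_columns(table, required, "image_vector"))
```

With a DataFrame, the same check could be run on `df.to_dict(orient="list")`.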



## Log Training Data

In [None]:
# Logging Training DataFrame
response = arize_client.log(
    dataframe=train_df,
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,
    environment=Environments.TRAINING,
    schema=schema,
    sync=True
)


# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(f"❌ logging failed with response code {response.status_code}, {response.text}")
else:
    print(f"✅ You have successfully logged training set to Arize")


## Log Validation Data

In [None]:
# Logging Validation DataFrame
response = arize_client.log(
    dataframe=val_df,
    model_id=model_id,
    model_version=model_version,
    batch_id="validation",
    model_type=model_type,
    environment=Environments.VALIDATION,
    schema=schema,
    sync=True
)


# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(f"❌ logging failed with response code {response.status_code}, {response.text}")
else:
    print(f"✅ You have successfully logged validation set to Arize")


## Log Production Data

In [None]:
# send production data
response = arize_client.log(
    dataframe=prod_df,
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,
    environment=Environments.PRODUCTION,
    schema=schema,
    sync=True
)

if response.status_code != 200:
    print(f"❌ logging failed with response code {response.status_code}, {response.text}")
else:
    print(f"✅ You have successfully logged production set to Arize")

# Step 5. Confirm Data in Arize ✅
Note that the Arize platform takes about 15 minutes to index embedding data. While the model should appear immediately, the data will not show up until the indexing is complete. Feel free to head over to the **Data Ingestion** tab for your model to watch Arize work its magic!🔮

You will be able to see the predictions, actuals, and feature importances that have been sent in the last 30 minutes, last day or last week.

An example view of the Data Ingestion tab from a model, when data is sent continuously over 30 minutes, is shown in the image below.

<img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/fashion_mnist_ingestion.png" width="700">

# Check the Embedding Data in Arize
Now you can see how Arize surfaces the low-quality images before your customers do and helps you troubleshoot the performance degradation, saving you time and effort.

First, set the baseline to the training set that we logged before.

<img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/fashion mnist baseline setup.gif" width="700">


If your model contains embedding data, you will see it in your Model's Overview page. 

<img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/fashion_mnist_embedding.png" width="700">

Click on the Embedding Name or the Euclidean Distance value to see how your embedding data is drifting over time. The picture below shows the global Euclidean distance between your production set (at different points in time) and the baseline (which we set to be our training set). We can see a period of a week during which the distance is suddenly much higher. This tells us that, during that time, the image data sent to our model was different from what it was trained on. This is the period when the quality of some images is worse.
 
<img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/fashion_mnist _drift.png" width="700">
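
To build intuition for a distance-based drift signal, the sketch below measures the Euclidean distance between the mean embedding of a baseline set and that of a production set. It is an illustration on synthetic 8-dimensional vectors and makes no claim about how the platform computes its metric; the function name `centroid_distance` and all data are assumptions.

```python
import numpy as np

def centroid_distance(baseline, production):
    """Euclidean distance between the mean vectors of two embedding sets."""
    return float(np.linalg.norm(production.mean(axis=0) - baseline.mean(axis=0)))

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, size=(500, 8))  # hypothetical training embeddings
similar = rng.normal(loc=0.0, size=(500, 8))   # production data like the baseline
shifted = rng.normal(loc=1.0, size=(500, 8))   # production data that has drifted

print(centroid_distance(baseline, similar))
print(centroid_distance(baseline, shifted))
```

The drifted set yields a much larger distance than the similar one, which is the kind of jump visible in the plot.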

In addition to the drift tracking plot above, below you can find the UMAP visualization of your data for the point in time selected. Notice that, at a point in time with low drift, the production data and our baseline (training) data are superimposed, which indicates that the model is seeing data in production similar to the data it was trained on.

<img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/fashion_mnist_no_drift_umap.png" width="700">

Next, select a point in time when the drift was high and select a UMAP visualization in 2D. We can see that the training and production data are superimposed for the most part, but another cluster of production data has appeared. This indicates that the model is seeing data in production that is qualitatively different from the data it was trained on, which in this case causes performance degradation.

<img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/fashion_mnist_drift_umap.png" width="700">

For further inspection, you may select a 3D UMAP view and click _Explore UMAP_ to expand the view. With this view we can interact with our dataset in 3D. We can zoom, rotate, and drag to see the areas of our dataset that are most interesting to us. We can also use the Lasso feature to select a part of the UMAP plot for further investigation. Check out the workflow below:

<img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/umap 3d fashion mnist drift.gif" width="700">

You can see that the coloring has been made to distinguish production data vs baseline data (training in this example). There are more coloring options to help understand/debug your dataset, including:
* Color by prediction label
* Color by actual label
* Color by accuracy (correct vs incorrect predictions)
* Color by Confusion Matrix

Here is an example of coloring data points by prediction label.


<img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/fashion_mnist_umap_3d_pred_label.gif" width="700">


# Wrap Up 🎁
Congratulations, you've now sent your first machine learning embedding data to the Arize platform!!

Additionally, if you want to remove this example model from your account, just click **Models** -> **CV-demo-fashion-mnist-quality-drift** -> **config** -> **delete**

### Overview
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Monitor Unstructured Data with Arize](https://arize.com/blog/monitor-unstructured-data-with-arize)
- [Getting Started With Embeddings Is Easier Than You Think](https://arize.com/blog/getting-started-with-embeddings-is-easier-than-you-think)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
<!-- - [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/) -->
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
<!-- - [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/) -->

- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.
