# Overview

Observability for all model types (LLM, NLP, CV, Tabular)

Phoenix Inferences lets you observe your model's performance by visualizing all of the model’s inferences in one interactive UMAP view.

This visualization can be used during exploratory data analysis (EDA) to understand model drift, find low-performing clusters, uncover retrieval issues, and export data for retraining or fine-tuning.

# Quickstart

The following Quickstart can be executed in a Jupyter notebook or Google Colab.

We will begin by logging just a training set, then add a production set for comparison.

## Step 1: Install and load dependencies

Use `pip` or `conda` to install `arize-phoenix`.

In [None]:
!pip install arize-phoenix

In [None]:
import phoenix as px

## Step 2: Prepare Model Data

Phoenix visualizes data taken from a pandas dataframe, where each row of the dataframe encompasses all the information about one inference (including feature values, prediction, metadata, etc.).
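
As a rough illustration of that row layout, the sketch below builds a two-row dataframe using the column names that the Schema in Step 3 refers to. The values, including the labels, the `example.com` URLs, and the three-element vectors, are made up for illustration; the real dataset is downloaded further below.

```python
import pandas as pd

# Hypothetical rows: one row per inference, one column per piece of information.
toy_df = pd.DataFrame(
    {
        "prediction_ts": pd.to_datetime(["2023-01-01 10:00", "2023-01-01 10:05"]),
        "predicted_action": ["sleeping", "dancing"],
        "actual_action": ["sleeping", "running"],
        # Real embeddings are much longer; three values keep the example readable.
        "image_vector": [[0.12, 0.48, 0.03], [0.91, 0.07, 0.55]],
        "url": ["https://example.com/img_0.png", "https://example.com/img_1.png"],
    }
)

print(toy_df.shape)
print(list(toy_df.columns))
```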

For this Quickstart, we will show an example of visualizing the inferences from a computer vision model. See example notebooks for all model types [here](https://docs.arize.com/phoenix/notebooks).

Let’s begin by working with the training set for this model.

### Download the dataset and load it into a pandas dataframe

In [None]:
import pandas as pd
train_df = pd.read_parquet("http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_training.parquet")

### Preview the dataframe (optional)

Note that each row contains all the data specific to this CV model for each inference.

In [None]:
train_df.head()

## Step 3: Define a Schema

Before we can log these inferences, we need to define a Schema object to describe them.

The Schema object informs Phoenix of the fields that the columns of the dataframe should map to.

Here we define a Schema to describe our particular CV training set:

In [None]:
train_schema = px.Schema(
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_action",
    actual_label_column_name="actual_action",
    embedding_feature_column_names={
        "image_embedding": px.EmbeddingColumnNames(
            vector_column_name="image_vector",
            link_to_data_column_name="url",
        ),
    },
)
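
Because the Schema refers to dataframe columns by name, a misspelled name is an easy mistake to make. The helper below is a hypothetical convenience, not part of Phoenix: it lists which referenced names are absent from a set of columns. In the notebook you would pass `train_df.columns`; a plain list stands in here so the snippet runs on its own.

```python
def missing_columns(available, referenced):
    """Return the referenced column names that are absent, in their original order."""
    available = set(available)
    return [name for name in referenced if name not in available]


# Column names referenced by the CV schema in this Quickstart.
referenced = ["prediction_ts", "predicted_action", "actual_action", "image_vector", "url"]

# Stand-in for train_df.columns, deliberately missing one column.
example_columns = ["prediction_ts", "predicted_action", "image_vector", "url"]

print(missing_columns(example_columns, referenced))  # ['actual_action']
```

An empty list means every column named in the schema is present.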

***Important:*** The fields used in a Schema will vary depending on the model type that you are working with.

For examples of how a Schema is defined for other model types (NLP, tabular, LLM-based applications), see the example notebooks under [Embedding Analysis](https://docs.arize.com/phoenix/notebooks#embedding-analysis) and [Structured Data Analysis](https://docs.arize.com/phoenix/notebooks#structured-data-analysis).

## Step 4: Wrap into Inferences Object

Wrap your dataframe `train_df` and schema `train_schema` into a Phoenix `Inferences` object:

In [None]:
train_ds = px.Inferences(dataframe=train_df, schema=train_schema, name="training")

## Step 5: Launch Phoenix!

We are now ready to launch Phoenix with our Inferences!

Here, we are passing `train_ds` as the `primary` inferences, as we are only visualizing one inference set (see Step 6 for adding additional inference sets).

In [None]:
session = px.launch_app(primary=train_ds)

Running this will fire up a Phoenix visualization. Follow the instructions in the output to view Phoenix in a browser, or in-line in your notebook.

**You are now ready to observe the training set of your model!**

# Optional actions and activities

## Exercises to familiarize yourself more with Phoenix:

- [ ] Click on `image_embedding` under the Embeddings section to enter the UMAP projector view
- [ ] Select a point where the model accuracy is below 0.78, and watch the embedding visualization below update to include only points from the selected timeframe
- [ ] Select the cluster with the lowest accuracy from the list of automatic clusters generated by Phoenix
  - Note that Phoenix automatically generates clusters on your data using a clustering algorithm called HDBSCAN (more information: [https://docs.arize.com/phoenix/concepts/embeddings-analysis#clusters](https://docs.arize.com/phoenix/concepts/embeddings-analysis#clusters))
- [ ] Change the colorization of your plot - e.g. select Color By ‘correctness’ and ‘dimension’
- [ ] Describe in words an insight you've gathered from this visualization

*Discuss your answers in our [community](https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q)!*
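
The accuracy threshold from the second exercise can also be explored outside the UI. The sketch below is not how Phoenix computes its metrics internally; it assumes accuracy means the fraction of rows where `predicted_action` equals `actual_action`, and that `prediction_ts` holds datetimes (epoch seconds could be converted first with `pd.to_datetime(..., unit="s")`). The rows are made up so the snippet runs on its own.

```python
import pandas as pd

ACCURACY_THRESHOLD = 0.78  # threshold taken from the exercise above

# Hypothetical stand-in for train_df: four inferences on day one, two on day two.
df = pd.DataFrame(
    {
        "prediction_ts": pd.to_datetime(
            [
                "2023-01-01 09:00", "2023-01-01 12:00",
                "2023-01-01 15:00", "2023-01-01 18:00",
                "2023-01-02 09:00", "2023-01-02 12:00",
            ]
        ),
        "predicted_action": ["sleeping", "dancing", "running", "sleeping", "dancing", "running"],
        "actual_action": ["sleeping", "dancing", "running", "dancing", "dancing", "running"],
    }
)

# A prediction counts as correct when the predicted label equals the actual label.
is_correct = df["predicted_action"] == df["actual_action"]

# Averaging correctness per calendar day gives a daily accuracy series.
daily_accuracy = is_correct.groupby(df["prediction_ts"].dt.floor("D")).mean()

# Keep only the days that fall below the threshold.
low_days = daily_accuracy[daily_accuracy < ACCURACY_THRESHOLD]
print(low_days)
```

With these toy rows, only the first day (three correct out of four) falls below the threshold.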

## Export data

In [None]:
# Load the production inferences (not the training file) for comparison.
prod_df = pd.read_parquet("http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_production.parquet")
prod_schema = px.Schema(
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_action",
    embedding_feature_column_names={
        "image_embedding": px.EmbeddingColumnNames(
            vector_column_name="image_vector",
            link_to_data_column_name="url",
        ),
    },
)
prod_ds = px.Inferences(dataframe=prod_df, schema=prod_schema, name="production")
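
Comparing the two Schema definitions side by side shows that the production schema omits `actual_label_column_name`, presumably because ground-truth labels are not available for these inferences. The sketch below mirrors the scalar keyword arguments of both `px.Schema(...)` calls as plain dictionaries (the variable names are hypothetical), so it runs without Phoenix installed, and reports the difference.

```python
# Keyword arguments copied from the two px.Schema(...) calls in this Quickstart.
# The embedding columns are left out because they are identical in both schemas.
train_schema_kwargs = {
    "timestamp_column_name": "prediction_ts",
    "prediction_label_column_name": "predicted_action",
    "actual_label_column_name": "actual_action",
}
prod_schema_kwargs = {
    "timestamp_column_name": "prediction_ts",
    "prediction_label_column_name": "predicted_action",
}

# Fields that the training schema defines but the production schema does not.
only_in_training = sorted(set(train_schema_kwargs) - set(prod_schema_kwargs))
print(only_in_training)  # ['actual_label_column_name']
```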