# <center>Phoenix in Flight</center>
## <center>Investigating Embedding Drift for a Sentiment Classification Model</center>

Imagine you're in charge of maintaining a model that takes as input online reviews of your U.S.-based product and classifies the sentiment of each review as positive, negative, or neutral. Your model initially performs well in production, but its performance gradually degrades over time.

Phoenix helps you surface the reason for this regression by analyzing the *embeddings* representing the text of each review. Your model was trained on English reviews, but as you'll discover, it's encountering Spanish reviews in production that it can't correctly classify.

According to our research, embedding drift often precedes performance degradation, so Phoenix can help you proactively detect and fix this issue before it becomes noticeable to your users.
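
To make "embedding drift" concrete, here is a minimal sketch of one simple way to quantify it: the Euclidean distance between the centroids (mean vectors) of two sets of embeddings. The function name `centroid_drift` and the toy vectors are made up for illustration, and this is not necessarily the exact metric Phoenix reports.

```python
import numpy as np


def centroid_drift(reference_vectors, primary_vectors):
    """Euclidean distance between the mean vectors of two embedding sets."""
    reference_centroid = np.mean(np.stack(reference_vectors), axis=0)
    primary_centroid = np.mean(np.stack(primary_vectors), axis=0)
    return float(np.linalg.norm(primary_centroid - reference_centroid))


# Toy 2-D embeddings, invented for illustration only.
reference = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
shifted = [np.array([4.0, 4.0]), np.array([6.0, 4.0])]

print(centroid_drift(reference, reference))  # identical sets: no drift
print(centroid_drift(reference, shifted))    # shifted set: larger distance
```

A distance near zero suggests the two sets occupy the same region of embedding space; a growing distance over time is a signal worth investigating.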

In this tutorial, you will:
* Download curated datasets of embeddings and predictions
* Visually explore embeddings in Phoenix
* Investigate problematic clusters
* Export data for labeling and re-training

Let's get started!

### 1. Install Dependencies and Import Libraries 📚

In [None]:
%pip install -q arize-phoenix

In [None]:
import pandas as pd
import phoenix as px

### 2. Download the Data 📊

Load your training and production data into two pandas dataframes and inspect a few rows of the training dataframe.

In [None]:
training_dataframe = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/nlp/sentiment-classification-language-drift/sentiment_classification_language_drift_training.parquet",
)
production_dataframe = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/nlp/sentiment-classification-language-drift/sentiment_classification_language_drift_production.parquet",
)
training_dataframe.head()

The columns of the dataframe are:
- **prediction_ts:** the Unix timestamps of your predictions
- **review_age**, **reviewer_gender**, **product_category**, **language:** the features of your model
- **text:** the text of each review
- **text_vector:** the embedding vectors representing each review
- **pred_label:** the label your model predicted
- **label:** the ground-truth label for each review
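
Because the dataframe carries both `pred_label` and `label`, you can sanity-check accuracy per group before opening Phoenix. The sketch below assumes only the column names listed above; the helper `accuracy_by_group` and the toy rows (including the language values) are hypothetical.

```python
import pandas as pd


def accuracy_by_group(dataframe, group_column):
    """Fraction of rows where pred_label equals label, per group."""
    correct = dataframe["pred_label"] == dataframe["label"]
    return correct.groupby(dataframe[group_column]).mean()


# Toy rows, invented for illustration only.
toy = pd.DataFrame(
    {
        "language": ["english", "english", "spanish", "spanish"],
        "pred_label": ["positive", "negative", "neutral", "positive"],
        "label": ["positive", "negative", "positive", "positive"],
    }
)
toy_accuracy = accuracy_by_group(toy, "language")
print(toy_accuracy)
```

With the tutorial data, a call such as `accuracy_by_group(production_dataframe, "language")` would give a per-language breakdown of production accuracy.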

### 3. Launch Phoenix 🔥🐦

#### a) Define Your Schema

To launch Phoenix with your data, you first need to define a schema that tells Phoenix which columns of your dataframes correspond to features, predictions, actuals (i.e., ground truth), embeddings, etc.

The trickiest part is defining embedding features. In this case, the embedding feature has two pieces of information: the embedding vector itself, contained in the `text_vector` column, and the review text, contained in the `text` column.

In [None]:
embedding_features = {
    "text_embedding": px.EmbeddingColumnNames(
        vector_column_name="text_vector", raw_data_column_name="text"
    ),
}
schema = px.Schema(
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="pred_label",
    actual_label_column_name="label",
    embedding_feature_column_names=embedding_features,
)

You'll notice that the schema above doesn't explicitly specify features. That's because feature columns are automatically inferred if you don't pass `feature_column_names` to your `Schema` object.
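
The idea behind this inference can be sketched as follows: any column that the schema has not already assigned a role is treated as a feature. This illustrates the concept only, under that assumption; it is not Phoenix's actual implementation, and `infer_feature_columns` is a hypothetical helper.

```python
def infer_feature_columns(dataframe_columns, schema_columns):
    """Return the columns not already assigned a role, preserving order."""
    assigned = set(schema_columns)
    return [column for column in dataframe_columns if column not in assigned]


# Column names from the tutorial dataframe.
columns = [
    "prediction_ts",
    "review_age",
    "reviewer_gender",
    "product_category",
    "language",
    "text",
    "text_vector",
    "pred_label",
    "label",
]
# Columns that the schema above names explicitly.
assigned_in_schema = ["prediction_ts", "pred_label", "label", "text_vector", "text"]

inferred = infer_feature_columns(columns, assigned_in_schema)
print(inferred)
```

The remaining columns match the model features listed earlier: `review_age`, `reviewer_gender`, `product_category`, and `language`.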

#### b) Define Your Datasets

Next, define your primary and reference datasets. In this case, your reference dataset contains training data and your primary dataset contains production data.

In [None]:
primary_dataset = px.Dataset(dataframe=production_dataframe, schema=schema, name="primary")
reference_dataset = px.Dataset(dataframe=training_dataframe, schema=schema, name="reference")
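
Before launching the app, it can be worth confirming that the embedding vectors in both dataframes share a single dimension, since the two datasets are compared in the same embedding space. The helper below is a hypothetical sketch that assumes each cell of the vector column holds a list or array.

```python
import pandas as pd


def embedding_dimensions(dataframe, vector_column):
    """Return the set of distinct vector lengths found in vector_column."""
    return {len(vector) for vector in dataframe[vector_column]}


# Toy dataframes, invented for illustration only.
toy_reference = pd.DataFrame({"text_vector": [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]})
toy_primary = pd.DataFrame({"text_vector": [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]})

print(embedding_dimensions(toy_reference, "text_vector"))  # one length: consistent
print(embedding_dimensions(toy_primary, "text_vector"))    # two lengths: inconsistent
```

For the tutorial data, you would expect `embedding_dimensions(training_dataframe, "text_vector")` and `embedding_dimensions(production_dataframe, "text_vector")` to return the same single-element set.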

#### c) Create a Phoenix Session

In [None]:
session = px.launch_app(primary=primary_dataset, reference=reference_dataset)

#### d) Launch the Phoenix UI

In [None]:
session.view()

### 4. Explore Your Data 📈

Phoenix is under active development. At the moment, we display your model schema and a few data quality statistics. Check back soon for more updates.

### 5. Close the App 🧹

When you're done, don't forget to close the app.

In [None]:
px.close_app()