# Phoenix

ML observability in your notebook 🔥🐦

In [None]:
import pandas as pd
import phoenix as px
from phoenix.datasets import Dataset, EmbeddingColumnNames, Schema

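First, load your training and production data into two pandas dataframes. Two examples follow; run only the cell for the example you want, since both cells assign the same variable names.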

Run this cell for a sentiment classification example.

In [None]:
# sentiment classification data
training_dataframe = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/nlp/sentiment-classification-language-drift/sentiment_classification_language_drift_training.parquet",
)
production_dataframe = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/nlp/sentiment-classification-language-drift/sentiment_classification_language_drift_production.parquet",
)
training_dataframe.head()

Run this cell for a credit card fraud example.

In [None]:
# credit card fraud data
training_dataframe = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/structured/credit-card-fraud/credit_card_fraud_train.parquet",
)
production_dataframe = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/structured/credit-card-fraud/credit_card_fraud_production.parquet",
)
training_dataframe.head()
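
Because a single schema is applied to both dataframes later on, it can help to confirm that each dataframe contains the columns you plan to reference. The snippet below is a minimal sketch using only pandas; `missing_columns` is a hypothetical helper (not part of Phoenix), and the toy dataframes stand in for the real ones.

```python
from typing import List

import pandas as pd


def missing_columns(dataframe: pd.DataFrame, required: List[str]) -> List[str]:
    """Return the required column names that are absent from the dataframe."""
    return [name for name in required if name not in dataframe.columns]


# Toy stand-ins for training_dataframe and production_dataframe (illustrative only).
toy_training = pd.DataFrame({"pred_label": ["pos"], "label": ["pos"], "text": ["great"]})
toy_production = pd.DataFrame({"pred_label": ["neg"], "text": ["bad"]})

# Column names you intend to reference in the schema.
required = ["pred_label", "label"]

print("training is missing:", missing_columns(toy_training, required))
print("production is missing:", missing_columns(toy_production, required))
```

With the real data, you would pass `training_dataframe` and `production_dataframe` along with the column names used in your schema.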


Next, define your schema. The schema tells Phoenix which columns of your dataframes correspond to features, predictions, actuals (i.e., ground truth), embeddings, etc.

You'll notice that the schemas below don't explicitly specify features. That's because features are automatically discovered if you don't pass `feature_column_names` to your `Schema` object.
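
One way to picture automatic discovery is that any column not already claimed by another schema field is treated as a feature. The sketch below illustrates that idea with plain pandas; it is an assumption about the general behavior, not Phoenix's actual implementation. `infer_feature_columns` is a hypothetical helper, and the `amount` and `merchant_type` columns are invented for the example.

```python
from typing import Iterable, List

import pandas as pd


def infer_feature_columns(dataframe: pd.DataFrame, claimed_columns: Iterable[str]) -> List[str]:
    """Return the columns that no other schema field has claimed, in dataframe order."""
    claimed = set(claimed_columns)
    return [name for name in dataframe.columns if name not in claimed]


# Toy dataframe; "amount" and "merchant_type" are made-up feature columns.
toy = pd.DataFrame(
    {
        "prediction_id": ["a"],
        "predicted_label": ["fraud"],
        "actual_label": ["not_fraud"],
        "amount": [12.5],
        "merchant_type": ["grocery"],
    }
)

# Columns named explicitly by the (hypothetical) schema.
claimed = ["prediction_id", "predicted_label", "actual_label"]

features = infer_feature_columns(toy, claimed)
print(features)
```

If you would rather control the feature list yourself, pass `feature_column_names` to `Schema` explicitly.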

Run this cell for the sentiment classification schema.

In [None]:
# sentiment classification schema
schema = Schema(
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="pred_label",
    actual_label_column_name="label",
    embedding_feature_column_names={
        "text_embedding": EmbeddingColumnNames(
            vector_column_name="text_vector", raw_data_column_name="text"
        ),
    },
)

Run this cell for the credit card fraud schema.

In [None]:
# credit card fraud schema
schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="predicted_label",
    prediction_score_column_name="predicted_score",
    actual_label_column_name="actual_label",
    timestamp_column_name="prediction_timestamp",
    tag_column_names=["age"],
)

Define your primary and reference datasets. In most cases, your reference dataset is the training data for your model and your primary dataset is your production data.

In [None]:
primary_dataset = Dataset(dataframe=production_dataframe, schema=schema, name="primary")
reference_dataset = Dataset(dataframe=training_dataframe, schema=schema, name="reference")
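
To see why a reference dataset is useful as a baseline, consider comparing the share of each predicted label between the two datasets. The following is a rough sketch in plain pandas, separate from anything Phoenix computes; `label_share` is a hypothetical helper and the toy dataframes are invented for illustration.

```python
import pandas as pd


def label_share(dataframe: pd.DataFrame, column: str) -> pd.Series:
    """Return the fraction of rows holding each distinct value in the column."""
    return dataframe[column].value_counts(normalize=True).sort_index()


# Toy stand-ins for the reference (training) and primary (production) data.
toy_reference = pd.DataFrame({"pred_label": ["pos", "pos", "neg", "neg"]})
toy_primary = pd.DataFrame({"pred_label": ["pos", "neg", "neg", "neg"]})

# Positive values mean the label is more common in primary than in reference.
shift = label_share(toy_primary, "pred_label").sub(
    label_share(toy_reference, "pred_label"), fill_value=0.0
)
print(shift)
```

A large shift in either direction suggests production data no longer looks like the training data.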

Launch a `phoenix` session and open the UI.

In [None]:
session = px.launch_app(primary=primary_dataset, reference=reference_dataset)

In [None]:
session.view()

When you're done, close the app.

In [None]:
px.close_app()