# <center>Phoenix in Flight</center>
## <center>Surfacing Feature Drift and Data Quality Issues for a Fraud-Detection Model</center>

Imagine you maintain a fraud-detection service for your e-commerce company. In the past few weeks, there's been an alarming spike in undetected cases of fraudulent credit card transactions. These false negatives are hurting your bottom line, and you've been tasked with solving the issue.

Phoenix provides opinionated workflows to surface feature drift and data quality issues quickly so you can get straight to the root cause of the problem. As you'll see, your fraud-detection service is receiving more and more traffic from an untrustworthy merchant, and a missing feature in your pipeline is causing your model's false negative rate to skyrocket.

In this tutorial, you will:
* Download curated datasets of credit card transaction and fraud-detection data
* Investigate troublesome "slices" of your features to detect drift caused by a fraudulent merchant
* Uncover a data quality issue causing a spike in false negatives
* Generate a report to share these insights with your co-workers and other stakeholders at your company

Let's get started!

### 1. Install Dependencies and Import Libraries 📚

In [None]:
%pip install -q arize-phoenix

In [None]:
import pandas as pd
import phoenix as px

### 2. Download the Data 📊

Load your training and production data into two pandas dataframes and inspect a few rows of the training dataframe.

In [None]:
training_dataframe = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/structured/credit-card-fraud/credit_card_fraud_train.parquet",
)
production_dataframe = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/structured/credit-card-fraud/credit_card_fraud_production.parquet",
)
training_dataframe.head()

The columns of the dataframe are:
- **prediction_id:** the unique ID for each prediction
- **prediction_timestamp:** the timestamp of each prediction
- **predicted_label:** the label your model predicted
- **predicted_score:** the score of each prediction
- **actual_label:** the true, ground-truth label for each prediction (fraud vs. not_fraud)
- **age:** a tag used to filter your data in the Phoenix UI
- the rest of the columns are features
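
Since this investigation centers on false negatives, it can help to see how that rate follows from the `actual_label` and `predicted_label` columns described above. The sketch below is only an illustration: `false_negative_rate` is a hypothetical helper, the toy lists are made up, and it assumes `predicted_label` uses the same `fraud` / `not_fraud` values as `actual_label`.

```python
import math


def false_negative_rate(actual_labels, predicted_labels, positive_label="fraud"):
    """Return the fraction of truly fraudulent rows the model failed to flag.

    Hypothetical helper for illustration; not part of Phoenix.
    """
    positives = 0  # rows whose ground truth is fraud
    missed = 0     # fraud rows the model predicted as something else
    for actual, predicted in zip(actual_labels, predicted_labels):
        if actual == positive_label:
            positives += 1
            if predicted != positive_label:
                missed += 1
    if positives == 0:
        # No fraud in the sample, so the rate is undefined.
        return math.nan
    return missed / positives


# Made-up labels, not taken from the tutorial datasets.
toy_actual = ["fraud", "not_fraud", "fraud", "fraud", "not_fraud"]
toy_predicted = ["fraud", "not_fraud", "not_fraud", "not_fraud", "fraud"]
toy_fnr = false_negative_rate(toy_actual, toy_predicted)
print(f"false negative rate: {toy_fnr:.2f}")
```

The same function should accept dataframe columns directly, for example `false_negative_rate(production_dataframe["actual_label"], production_dataframe["predicted_label"])`, provided the label values match the assumption above.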

### 3. Launch Phoenix 🔥🐦

#### a) Define Your Schema

To launch Phoenix with your data, you first need to define a schema that tells Phoenix which columns of your dataframes correspond to features, predictions, actuals (i.e., ground truth), tags, etc.

In [None]:
schema = px.Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="predicted_label",
    prediction_score_column_name="predicted_score",
    actual_label_column_name="actual_label",
    timestamp_column_name="prediction_timestamp",
    tag_column_names=["age"],
)

You'll notice that the schema above doesn't explicitly specify features. That's because feature columns are automatically inferred if you don't pass `feature_column_names` to your `Schema` object.
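
To make the idea of inference concrete, here is a minimal sketch of treating every column not already named in the schema as a feature. It mirrors the behavior described above and is not Phoenix's actual implementation; `infer_feature_columns` is a hypothetical helper, and `feature_a` and `feature_b` are placeholder names rather than real columns from the dataset.

```python
def infer_feature_columns(all_columns, schema_columns):
    """Return the columns not claimed by the schema, preserving their order.

    Hypothetical helper for illustration; not part of Phoenix.
    """
    claimed = set(schema_columns)
    return [column for column in all_columns if column not in claimed]


# Columns named explicitly in the schema defined in this tutorial.
schema_columns = [
    "prediction_id",
    "predicted_label",
    "predicted_score",
    "actual_label",
    "prediction_timestamp",
    "age",
]

# Placeholder dataframe columns; "feature_a" and "feature_b" are made up.
example_columns = schema_columns + ["feature_a", "feature_b"]

inferred_features = infer_feature_columns(example_columns, schema_columns)
print(inferred_features)
```

With the real data, passing `list(training_dataframe.columns)` as the first argument gives a quick preview of which columns would end up as features under this assumption.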

#### b) Define Your Datasets

Next, define your primary and reference datasets. In this case, your reference dataset contains training data and your primary dataset contains production data.

In [None]:
primary_dataset = px.Dataset(dataframe=production_dataframe, schema=schema, name="primary")
reference_dataset = px.Dataset(dataframe=training_dataframe, schema=schema, name="reference")

#### c) Create a Phoenix Session

In [None]:
session = px.launch_app(primary=primary_dataset, reference=reference_dataset)

#### d) Launch the Phoenix UI

In [None]:
session.view()

### 4. Explore Your Data 📈

Phoenix is under active development. At the moment, we display your model schema and a few data quality statistics. Check back soon for more updates.

### 5. Close the App 🧹

When you're done, don't forget to close the app.

In [None]:
px.close_app()