<center><img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/></center>

# Batch Ingestion for Ranking with Multiple Labels

This tutorial shows how to send ranks, prediction labels, and attributions to Arize in batch with the `pandas` SDK so that Arize can calculate both rank-aware metrics and classification metrics. In this case, each prediction can have more than one attribution. Documentation for this ingestion method can be found <a href='https://docs.arize.com/arize/model-types/ranking#case-3-ranking-with-multiple-labels'>here</a>.

## Install and Import Dependencies

In [None]:
!pip install -q arize
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments, Metrics

import pandas as pd
import datetime
import numpy as np

## Download and Display Data
For this tutorial, we'll use a sample Parquet file from a model that predicts the action a user takes on each recommended hotel in an ordered list.

In [None]:
file_url = "https://storage.googleapis.com/arize-assets/documentation-sample-data/data-ingestion/ranking-assets/ranking-multiple-labels-sample-data.parquet"
df = pd.read_parquet(file_url)
df.head()

## Add Relevance Score Column
We'll add a column for relevance scores to our DataFrame in order to get Average Relevancy Score in the Arize platform. The `relevance_score` column will equal 1 if the action `book` is in `attributions` and 0 otherwise. You can find more information on relevance scores in the Arize documentation <a href="https://docs.arize.com/arize/model-types/ranking">here</a>.

In [None]:
df["relevance_score"] = df.apply(
    lambda x: 1 if "book" in x["attributions"] else 0, axis=1
)
df.head()
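
To make the rule concrete, here is a minimal sketch on a toy DataFrame. It assumes each `attributions` value is a list of action strings; the toy rows are invented for illustration and are not taken from the sample Parquet file. If a row held a plain string instead, `in` would fall back to substring matching (so `"book"` would also match `"booking"`), which is worth checking on your own data.

```python
import pandas as pd

# Toy rows (assumed layout): one list of action strings per prediction
toy = pd.DataFrame(
    {
        "prediction_id": ["a", "b", "c"],
        "attributions": [["click", "book"], ["click"], []],
    }
)

# 1 when "book" is among the row's actions, 0 otherwise
toy["relevance_score"] = toy["attributions"].apply(
    lambda actions: int("book" in actions)
)
print(toy)
```

Applying the function to the single column avoids the row-wise `axis=1` pass, which tends to be slower on large DataFrames.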

## Add Timestamps for Predictions
Generate sample timestamps for each prediction. More information on timestamps in Arize can be found <a href="https://docs.arize.com/arize/sending-data/model-schema-reference#6.-timestamp">here</a>.

In [None]:
# Generate one sample timestamp per search_id, spread evenly over the last 30 days
current_time = datetime.datetime.now().timestamp()

earlier_time = (
    datetime.datetime.now() - datetime.timedelta(days=30)
).timestamp()

optional_prediction_timestamps = np.linspace(
    earlier_time, current_time, num=len(df.groupby(by="search_id").size())
)

df_times = pd.DataFrame(
    data={
        "pred_timestamp": pd.Series(
            optional_prediction_timestamps.astype(int)
        ),
        "search_id": df.groupby(by="search_id").size().index,
    }
)
df_times.head()

In [None]:
# Add new timestamps to our dataframe
df = df.merge(df_times, on="search_id", how="left")
df.head()
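
The same idea can be checked on a toy example. This sketch uses invented `search_id` values and a fixed end date (both assumptions, chosen so the result is reproducible) and confirms that every row in a search group ends up with the same timestamp. It takes the group IDs from `unique()`, which keeps order of appearance, whereas the cell above uses the sorted `groupby` index.

```python
import datetime

import numpy as np
import pandas as pd

# Toy data: three search groups of different sizes (invented IDs)
toy = pd.DataFrame({"search_id": ["s1", "s1", "s2", "s3", "s3", "s3"]})

# Fixed 30-day window so the example is reproducible
end = datetime.datetime(2024, 1, 31)
start = end - datetime.timedelta(days=30)

# One evenly spaced timestamp (in seconds) per distinct search_id
group_ids = toy["search_id"].unique()
stamps = np.linspace(start.timestamp(), end.timestamp(), num=len(group_ids))

toy_times = pd.DataFrame(
    {"search_id": group_ids, "pred_timestamp": stamps.astype(int)}
)
toy = toy.merge(toy_times, on="search_id", how="left")

# Each group should map to exactly one timestamp
per_group = toy.groupby("search_id")["pred_timestamp"].nunique()
print(per_group)
```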

## Create Arize Client
Sign up for or log in to your Arize account <a href="https://app.arize.com/auth/login">here</a>. Find your <a href="https://docs.arize.com/arize/sending-data/sdk-reference/python-sdk/arize.init#retrieving-space-and-api-keys">Space and API keys</a> and paste them into the cell below.

In [None]:
SPACE_KEY = "SPACE_KEY"  # update value here with your Space Key
API_KEY = "API_KEY"  # update value here with your API key

# Fail fast if the placeholder values above were not replaced
if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE_KEY AND/OR API_KEY")

arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
print("✅ Import and Setup Arize Client Done! Now we can start using Arize!")
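
If you would rather not paste keys into the notebook, one option is to read them from environment variables. The sketch below is an illustration only: the helper `load_arize_keys` and the variable names `ARIZE_SPACE_KEY` and `ARIZE_API_KEY` are hypothetical choices for this example, not names defined by the Arize SDK.

```python
import os


def load_arize_keys(env=os.environ):
    """Return (space_key, api_key) from a mapping, raising if either is missing.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    space_key = env.get("ARIZE_SPACE_KEY")
    api_key = env.get("ARIZE_API_KEY")
    if not space_key or not api_key:
        raise ValueError("Set ARIZE_SPACE_KEY and ARIZE_API_KEY before running.")
    return space_key, api_key


# Demonstration with an in-memory mapping instead of the real environment
demo_keys = load_arize_keys(
    {"ARIZE_SPACE_KEY": "space-123", "ARIZE_API_KEY": "api-456"}
)
print(demo_keys)
```

The returned pair can then be passed to `Client(space_key=..., api_key=...)` exactly as in the cell above.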

## Define Schema
Create your <a href="https://docs.arize.com/arize/sending-data-to-arize/model-schema-reference">model schema</a>, which maps the columns of the DataFrame to the fields Arize expects.

In [None]:
feature_column_names = [
    "prop_log_historical_price",
    "price_usd",
    "promotion_flag",
    "search_destination_id",
    "search_length_of_stay",
    "search_booking_window",
    "search_adults_count",
    "search_children_count",
    "search_room_count",
    "search_saturday_night_bool",
    "destination",
]

schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="pred_timestamp",
    prediction_group_id_column_name="search_id",
    prediction_label_column_name="predicted_action",
    rank_column_name="rank",
    relevance_labels_column_name="attributions",
    relevance_score_column_name="relevance_score",
    feature_column_names=feature_column_names
)
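
Because the schema refers to columns by name, a misspelled or missing column only surfaces when the data is validated. A small pre-flight check in plain pandas can catch this earlier. The helper `missing_columns` below is hypothetical (it is not part of the Arize SDK), and the toy DataFrame deliberately omits `relevance_score` to show the check firing.

```python
import pandas as pd


def missing_columns(frame, required):
    """Return the names in `required` that are not columns of `frame`."""
    return [name for name in required if name not in frame.columns]


# Column names referenced by the schema in this tutorial (features omitted for brevity)
required = [
    "prediction_id",
    "pred_timestamp",
    "search_id",
    "predicted_action",
    "rank",
    "attributions",
    "relevance_score",
]

# Toy frame (invented values) that is missing the relevance_score column
toy = pd.DataFrame(
    {
        "prediction_id": ["a"],
        "pred_timestamp": [1700000000],
        "search_id": ["s1"],
        "predicted_action": ["book"],
        "rank": [1],
        "attributions": [["book"]],
    }
)

missing = missing_columns(toy, required)
print(missing)
```

On the tutorial's DataFrame you could run the same check with `required + feature_column_names` before building the schema.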

## Log Data to Arize
Log the DataFrame using the <a href="https://docs.arize.com/arize/sending-data-to-arize/data-ingestion-methods/sdk-reference/python-sdk/arize.pandas">pandas API</a>. 

In [None]:
response = arize_client.log(
    dataframe=df,
    model_id="ranking-multiple-labels-batch-ingestion-tutorial",
    model_version="1.0",
    model_type=ModelTypes.RANKING,
    metrics_validation=[Metrics.RANKING, Metrics.RANKING_LABEL],
    validate=True,
    environment=Environments.PRODUCTION,
    schema=schema
)

if response.status_code == 200:
    print("✅ You have successfully logged the production dataset to Arize")
else:
    print(
        f"Logging failed with response code {response.status_code}, {response.text}"
    )
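
To see why both `rank` and `relevance_score` are sent, it helps to look at how a rank-aware metric combines them. The sketch below computes NDCG@k, a common rank-aware metric, for a single search group. It uses the textbook formula with a log2 position discount and is an illustration only; it is not presented as Arize's implementation, and the example scores are invented.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain; relevances are ordered by rank (1 first)."""
    return sum(
        rel / math.log2(position + 1)
        for position, rel in enumerate(relevances, start=1)
    )


def ndcg_at_k(relevances_by_rank, k):
    """NDCG@k: DCG of the top k divided by the DCG of the ideal ordering."""
    ideal = sorted(relevances_by_rank, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0  # no relevant items in the group
    return dcg(relevances_by_rank[:k]) / ideal_dcg


# Relevance scores for one search group, listed in rank order (invented)
scores = [0, 1, 0, 1]
print(round(ndcg_at_k(scores, k=4), 3))
```

A group whose booked hotels sit at the top of the list scores 1.0; pushing them further down the ranking lowers the score.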