<center><img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/></center>

# Batch Ingestion for Multi-Label Multiclass Classification

In this tutorial, we'll show how to send predictions and actuals from a multi-label multiclass model to Arize in batch. A multiclass classification model is a classification model with more than two classes. For a multiclass model, multi-label means that a single prediction can be assigned to more than one class at the same time. For more information on multiclass ingestion, please see our documentation <a href="https://docs.arize.com/arize/model-types/multiclass-classification">here</a>. For a full list of all model types, please see our documentation <a href="https://docs.arize.com/arize/">here</a>.
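
To make the multi-label idea concrete, here is a minimal sketch in plain Python. It assumes each class has its own score and its own threshold, and that a class is assigned when its score meets or exceeds the threshold. The class names, the values, and the `>=` comparison are illustrative assumptions, not taken from the sample data or from how Arize evaluates thresholds.

```python
def assigned_labels(scores, thresholds):
    """Return the classes whose score meets or exceeds that class's threshold."""
    return [
        name
        for name, score in scores.items()
        if score >= thresholds[name]
    ]


# Hypothetical scores and thresholds for a single prediction.
scores = {"sports": 0.82, "politics": 0.10, "finance": 0.55}
thresholds = {"sports": 0.50, "politics": 0.50, "finance": 0.50}

labels = assigned_labels(scores, thresholds)
print(labels)  # ['sports', 'finance']
```

Two classes clear their thresholds here, so this one prediction carries two labels. A single-label multiclass model would instead pick exactly one class.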

## Install and Import Dependencies

In [None]:
!pip install -q arize
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments

import pandas as pd
import datetime
import numpy as np

## Download and Display Data
For this tutorial, we will use a sample Parquet file containing 100 predictions. 

In [None]:
file_url = "https://storage.googleapis.com/arize-assets/documentation-sample-data/data-ingestion/multiclass-classification-assets/multi-label-multiclass-sample-data.parquet"
df = pd.read_parquet(file_url)
df.head()

## Add Timestamps for Predictions
Generate sample timestamps for each prediction. More information on timestamps in Arize can be found <a href="https://docs.arize.com/arize/sending-data/model-schema-reference#6.-timestamp">here</a>.

In [None]:
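# Capture "now" once so both endpoints derive from the same instant
now = datetime.datetime.now()
current_time = now.timestamp()
earlier_time = (now - datetime.timedelta(days=30)).timestamp()

# Evenly space one timestamp per row across the last 30 days
optional_prediction_timestamps = np.linspace(
    earlier_time, current_time, num=df.shape[0]
)

# Assign the array directly so values are placed by position,
# not aligned on the DataFrame index
df["prediction_ts"] = optional_prediction_timestamps.astype(int)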
df["prediction_ts"].head()
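
For readers who want to see what the even spacing does without NumPy, the following is a rough standard-library sketch of the same idea. The helper name `evenly_spaced_timestamps` is hypothetical and not part of the Arize SDK. It truncates each value to an integer number of seconds, as the cell above does with `astype(int)`.

```python
import datetime


def evenly_spaced_timestamps(start, end, count):
    """Return `count` integer Unix timestamps evenly spaced from start to end, inclusive."""
    if count < 1:
        return []
    if count == 1:
        return [int(start)]
    step = (end - start) / (count - 1)
    return [int(start + i * step) for i in range(count)]


now = datetime.datetime.now()
end = now.timestamp()
start = (now - datetime.timedelta(days=30)).timestamp()

# One timestamp per row; 100 matches the size of the sample file.
timestamps = evenly_spaced_timestamps(start, end, 100)
print(len(timestamps), timestamps[0], timestamps[-1])
```

The first timestamp falls 30 days in the past and the last one is approximately now, so the sample predictions are spread across the whole window rather than stacked on one moment.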

## Create Arize Client
Sign up for or log in to your Arize account <a href="https://app.arize.com/auth/login">here</a>. Find your <a href="https://docs.arize.com/arize/api-reference/arize.pandas/client">Space ID and API key</a>, then copy and paste them into the cell below.

In [None]:
SPACE_ID = "SPACE_ID"  # update value here with your Space ID
API_KEY = "API_KEY"  # update value here with your API key

# Fail fast on placeholder credentials before creating the client
if SPACE_ID == "SPACE_ID" or API_KEY == "API_KEY":
    raise ValueError("❌ CHANGE SPACE_ID AND/OR API_KEY")

arize_client = Client(space_id=SPACE_ID, api_key=API_KEY)
print("✅ Import and Setup Arize Client Done! Now we can start using Arize!")

## Define Schema
Create your <a href="https://docs.arize.com/arize/sending-data-to-arize/model-schema-reference">model schema</a>.

In [None]:
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_score_column_name="prediction_scores",
    multi_class_threshold_scores_column_name="threshold_scores",
    feature_column_names=["feature1", "feature2", "feature3", "feature4"],
    actual_score_column_name="actual_scores",
)
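
The schema above only names columns and does not check that they exist in the DataFrame. As a small optional sanity check before logging, a sketch like the one below compares the column names the schema refers to against the columns actually present. The helper `missing_columns` and the `available` list are hypothetical. In the notebook you would pass `list(df.columns)` instead.

```python
def missing_columns(available, required):
    """Return the required column names absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Column names referenced by the schema in this tutorial.
required = [
    "prediction_id",
    "prediction_ts",
    "prediction_scores",
    "threshold_scores",
    "feature1",
    "feature2",
    "feature3",
    "feature4",
    "actual_scores",
]

# Hypothetical column list that is missing the timestamp column;
# in the notebook, use list(df.columns) here.
available = [name for name in required if name != "prediction_ts"]

print(missing_columns(available, required))  # ['prediction_ts']
```

An empty list means every column the schema references is present. Anything else points to a column to add or a name to correct before sending data.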

## Log Data to Arize
Log the DataFrame using the <a href="https://docs.arize.com/arize/sending-data-to-arize/data-ingestion-methods/sdk-reference/python-sdk/arize.pandas">pandas API</a>. 

In [None]:
response = arize_client.log(
    dataframe=df,
    model_id="multiclass-classification-multi-label-batch-ingestion-tutorial",
    model_version="1.0",
    model_type=ModelTypes.MULTI_CLASS,
    environment=Environments.PRODUCTION,
    schema=schema,
)

if response.status_code == 200:
    print("✅ You have successfully logged the production dataset to Arize")
else:
    print(
        f"Logging failed with response code {response.status_code}, {response.text}"
    )