<img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/>

# <center>Ingesting an Object Detection Model</center>

In this tutorial, we will ingest data from the [Facebook DETR](https://huggingface.co/facebook/detr-resnet-101) object detection model, using input images from the [COCO dataset](https://cocodataset.org/#home).


Guides for other model types are available [here](https://docs.arize.com/arize/sending-data-to-arize/model-types).


# Step 0. Install Dependencies and Import Libraries 📚

In [None]:
!pip install arize
from arize.pandas.logger import Client
from arize.utils.types import (
    Environments,
    ModelTypes,
    EmbeddingColumnNames,
    Schema,
    ObjectDetectionColumnNames,
)
import pandas as pd
import uuid
import requests
import numpy as np
import datetime


# Step 1. Download and Display the Data

We have curated a dataset for you so that you can send it to Arize in this tutorial.

In [None]:
url="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/arize-demo-models-data/CV/Object-Detection/raw_data_with_predictions_and_embeddings_Object_Detection_no_negatives.parquet"
df = pd.read_parquet(url)

In [None]:
df.head()

# Step 2. Prepare your data to be sent to Arize


## Add Timestamps for Predictions

In [None]:
current_time = datetime.datetime.now().timestamp()

earlier_time = (
    datetime.datetime.now() - datetime.timedelta(days=30)
).timestamp()

optional_prediction_timestamps = np.linspace(
    earlier_time, current_time, num=df.shape[0]
)

# Assign the array directly so values are set by position, whatever the DataFrame index is
df["prediction_ts"] = optional_prediction_timestamps.astype(int)
df[["prediction_ts"]].head()
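
Before sending the data, it can help to confirm that the generated timestamps look sensible. The following is a minimal sketch using only the standard library; `check_timestamps` and its `max_age_days` parameter are hypothetical names introduced here for illustration and are not part of the Arize SDK.

```python
import datetime


def check_timestamps(timestamps, now, max_age_days=30):
    """Return True if the epoch-second timestamps are non-decreasing,
    not in the future, and not older than max_age_days."""
    earliest = now - datetime.timedelta(days=max_age_days).total_seconds()
    in_window = all(earliest <= ts <= now for ts in timestamps)
    ordered = all(a <= b for a, b in zip(timestamps, timestamps[1:]))
    return in_window and ordered


now = datetime.datetime.now().timestamp()
# Three sample timestamps: 29 days ago, 15 days ago, and now (truncated to seconds)
sample = [int(now) - 86_400 * days for days in (29, 15, 0)]
print(check_timestamps(sample, now))
```

With the DataFrame from this tutorial, the same check could be run as `check_timestamps(df["prediction_ts"].tolist(), now, max_age_days=31)`. The extra day of slack allows for the integer truncation and for the time that passes between cells.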

## Add Prediction IDs

The Arize platform uses prediction IDs to link a prediction to an actual. Visit the [Arize documentation](https://docs.arize.com/arize/data-ingestion/model-schema/5.-prediction-id?q=prediction_id) for more details.

You can generate prediction IDs as follows:

In [None]:
def add_prediction_id(df):
    return [str(uuid.uuid4()) for _ in range(df.shape[0])]

In [None]:
df["prediction_id"] = add_prediction_id(df)
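
Since the prediction ID is what links a prediction to its actual, it seems worth confirming that no ID is repeated. The sketch below uses only the standard library; `make_prediction_ids` and `duplicated_ids` are hypothetical helpers written for this example, not Arize functions.

```python
import uuid
from collections import Counter


def make_prediction_ids(n):
    """Generate n random UUID4 strings, one per row."""
    return [str(uuid.uuid4()) for _ in range(n)]


def duplicated_ids(ids):
    """Return the IDs that appear more than once (empty list if all are unique)."""
    return [value for value, count in Counter(ids).items() if count > 1]


ids = make_prediction_ids(5)
print(duplicated_ids(ids))              # fresh UUIDs should not repeat
print(duplicated_ids(["a", "b", "a"]))  # "a" is reported as a duplicate
```

For the DataFrame above, `duplicated_ids(df["prediction_id"].tolist())` should return an empty list.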

# Step 3. Sending Data into Arize 💫

## Import and Setup Arize Client

The first step is to set up the Arize client. After that, we will log the data.

Copy the Arize `API_KEY` and `SPACE_KEY` from your Space Settings page (shown below) to the variables in the cell below. We will also be setting up some metadata to use across all logging.

<img src="https://storage.googleapis.com/arize-assets/fixtures/copy-keys.png" width="700">

In [None]:
SPACE_KEY = "SPACE_KEY"  # Change this line.
API_KEY = "API_KEY"  # Change this line.
# Validate the keys before creating the client
if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ CHANGE SPACE AND API KEYS")

arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
print("✅ Arize client setup done! Now you can start using Arize!")
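
If you would rather not paste keys into the notebook, one option is to read them from environment variables. This is a minimal sketch under stated assumptions: the variable names `ARIZE_SPACE_KEY` and `ARIZE_API_KEY` and the helper `load_arize_keys` are chosen here for illustration and are not defined by the Arize SDK.

```python
import os


def load_arize_keys(env=None):
    """Read the space and API keys from a mapping (os.environ by default).

    Raises ValueError if a key is missing or still set to its placeholder.
    """
    env = os.environ if env is None else env
    space_key = env.get("ARIZE_SPACE_KEY", "")  # hypothetical variable name
    api_key = env.get("ARIZE_API_KEY", "")      # hypothetical variable name
    if space_key in ("", "SPACE_KEY") or api_key in ("", "API_KEY"):
        raise ValueError("Set ARIZE_SPACE_KEY and ARIZE_API_KEY before running")
    return space_key, api_key


# Example with an explicit mapping instead of the real environment
space_key, api_key = load_arize_keys(
    {"ARIZE_SPACE_KEY": "my-space", "ARIZE_API_KEY": "my-key"}
)
print(space_key, api_key)
```

The returned values can then be passed to `Client(space_key=..., api_key=...)` exactly as in the cell above.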

## Define Schema

Now that our Arize client is set up, let's go ahead and log all of our data to the platform. For more details on how **`arize.pandas.logger`** works, visit our documentation.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)

We will use the `ObjectDetectionColumnNames` and `EmbeddingColumnNames` classes from Arize's SDK. 

[Here](https://docs.arize.com/arize/sending-data/model-schema-reference#8.-embedding-features-unstructured) is more information about defining embedding features using `EmbeddingColumnNames`.

In [None]:
tags=[
  "drift_type"
]
embedding_feature_column_names={
    "image_embedding": EmbeddingColumnNames(
        vector_column_name="image_vector",
        link_to_data_column_name="url"
    )
}
object_detection_prediction_column_names=ObjectDetectionColumnNames(
    bounding_boxes_coordinates_column_name="prediction_bboxes",
    categories_column_name="prediction_categories",
    scores_column_name="prediction_scores"
)
object_detection_actual_column_names=ObjectDetectionColumnNames(
    bounding_boxes_coordinates_column_name="actual_bboxes",
    categories_column_name="actual_categories",
)

# Define the Schema, including embedding information
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    tag_column_names=tags,
    embedding_feature_column_names=embedding_feature_column_names,
    object_detection_prediction_column_names=object_detection_prediction_column_names,
    object_detection_actual_column_names=object_detection_actual_column_names,
)
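
The schema only names columns, so a typo in a column name would not surface until the data is logged. As a rough sketch, the check below lists any column named in the schema above that is missing from a collection of available column names. `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names for this example; the column names themselves are copied from the schema cell.

```python
# Column names referenced by the schema defined above
REQUIRED_COLUMNS = [
    "prediction_id",
    "prediction_ts",
    "drift_type",
    "image_vector",
    "url",
    "prediction_bboxes",
    "prediction_categories",
    "prediction_scores",
    "actual_bboxes",
    "actual_categories",
]


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `available`."""
    available = set(available)
    return [name for name in required if name not in available]


# Example with a deliberately incomplete set of columns
print(missing_columns(["prediction_id", "prediction_ts", "url"]))
```

In this notebook, `missing_columns(df.columns)` should return an empty list once the timestamp and prediction ID columns have been added.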

## Log Data to Arize

Log the DataFrame using the [pandas API](https://docs.arize.com/arize/sending-data-to-arize/data-ingestion-methods/sdk-reference/python-sdk/arize.pandas).

In [None]:
# Log the dataframe with the schema mapping 
response = arize_client.log(
    model_id= "CV-object-detection",
    model_version= "v1",
    model_type=ModelTypes.OBJECT_DETECTION,
    environment=Environments.PRODUCTION,
    dataframe=df,
    schema=schema,
    sync=True,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(f"❌ logging failed with response code {response.status_code}, {response.text}")
else:
    print("✅ You have successfully logged the production data to Arize")
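
In a script or scheduled job, you may prefer to stop on failure rather than print a message. The sketch below raises an exception when the status code is not 200. `ensure_logged` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that expose only the `status_code` and `text` attributes used in the cell above.

```python
from types import SimpleNamespace


def ensure_logged(response):
    """Raise RuntimeError unless the response reports status code 200."""
    if response.status_code != 200:
        raise RuntimeError(
            f"logging failed with response code {response.status_code}, {response.text}"
        )
    return response


# Stand-in responses for illustration; a real one comes from arize_client.log(...)
ok = SimpleNamespace(status_code=200, text="")
bad = SimpleNamespace(status_code=400, text="bad request")

ensure_logged(ok)
try:
    ensure_logged(bad)
except RuntimeError as err:
    print(err)
```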