<center><img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/></center>

# Batch Ingestion for Binary Classification (Classification, AUC and Log Loss Metrics)

This example walks through the Arize `pandas` batch SDK for [ingesting binary classification data with support for classification, AUC and log loss metrics](https://docs.arize.com/arize/model-types/binary-classification#case-2-supports-classification-and-auc-log-loss-metrics). Guides for other model types are available [here](https://docs.arize.com/arize/sending-data-to-arize/model-types).

## Install and Import Dependencies

In [None]:
!pip install -q arize

import datetime

from arize.pandas.logger import Client
from arize.utils.types import ModelTypes, Environments, Schema, Metrics
import numpy as np
import pandas as pd

## Download and Display Data

The dataset contains predicted labels, predicted scores and actual labels. The `actual_score` column is dropped in the cell below because the schema in this example does not use it.

In [None]:
df = pd.read_csv(
    "https://storage.googleapis.com/arize-assets/documentation-sample-data/data-ingestion/binary-classification-assets/binary_classification_data.csv?raw=true",
    index_col=False,
)
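# Treat every column except the first and the last four as a feature.
# Converting the pandas Index to a plain list keeps the schema easy to inspect.
feature_column_names = list(df.columns[1:-4])
# Drop the actual scores, which this example does not send.
df = df.drop(["actual_score"], axis=1)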
df.head()
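
To make the role of each column concrete, here is a minimal sketch that computes one metric from each family by hand on a toy sample. The data and the helper names (`accuracy`, `log_loss`, `auc`) are made up for illustration and are not part of the Arize SDK; Arize computes its metrics on its own side. The point is only that classification metrics need labels, while AUC and log loss also need the predicted score.

```python
import math


def accuracy(actual_labels, predicted_labels):
    """Fraction of predictions whose label matches the actual label."""
    matches = sum(a == p for a, p in zip(actual_labels, predicted_labels))
    return matches / len(actual_labels)


def log_loss(actual_labels, predicted_scores, eps=1e-15):
    """Mean binary cross-entropy; each score is P(label == 1)."""
    total = 0.0
    for y, p in zip(actual_labels, predicted_scores):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(actual_labels)


def auc(actual_labels, predicted_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(actual_labels, predicted_scores) if y == 1]
    neg = [s for y, s in zip(actual_labels, predicted_scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy sample: 1 is the positive class.
actual = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.4]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(actual, predicted))
print("log loss:", round(log_loss(actual, scores), 4))
print("AUC:", round(auc(actual, scores), 4))
```

The 0.5 threshold is an assumption of this sketch; the sample file already ships its own predicted labels.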

## Add Timestamps for Predictions

Generate sample timestamps for each prediction. More information on timestamps in Arize can be found [here](https://docs.arize.com/arize/sending-data-to-arize/model-schema-reference/timestamp).

In [None]:
# Read the clock once so both ends of the 30-day window share a reference.
now = datetime.datetime.now()
current_time = now.timestamp()
earlier_time = (now - datetime.timedelta(days=30)).timestamp()

optional_prediction_timestamps = np.linspace(
    earlier_time, current_time, num=df.shape[0]
)

df.insert(1, "prediction_ts", optional_prediction_timestamps.astype(int))
df[["prediction_ts"]].head()
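
As a rough illustration of what the cell above produces, the sketch below spreads integer epoch seconds evenly over a 30-day window using only the standard library, then converts them back to readable datetimes as a sanity check. The helper `evenly_spaced_timestamps` is a hypothetical stand-in for the `np.linspace(...).astype(int)` call, not part of any library.

```python
import datetime


def evenly_spaced_timestamps(start, end, count):
    """Return `count` integer epoch seconds spread evenly from start to end."""
    if count == 1:
        return [int(start)]
    step = (end - start) / (count - 1)
    # int() truncates toward zero, like astype(int) on positive values.
    return [int(start + i * step) for i in range(count)]


now = datetime.datetime.now(datetime.timezone.utc)
earlier = now - datetime.timedelta(days=30)

sample = evenly_spaced_timestamps(earlier.timestamp(), now.timestamp(), 5)

# Convert back to datetimes to confirm the values cover the intended window.
for ts in sample:
    print(ts, datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc))
```

Printing a few converted values like this is a quick way to catch unit mix-ups, such as milliseconds passed where seconds were intended.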

## Create Arize Client

Sign up or log in to your Arize account [here](https://app.arize.com/auth/login). Find your [space ID and API key](https://docs.arize.com/arize/api-reference/arize.pandas/client), then paste them into the cell below.

In [None]:
SPACE_ID = "SPACE_ID"  # Change this line.
API_KEY = "API_KEY"  # Change this line.

# Fail fast on placeholder credentials before creating the client.
if SPACE_ID == "SPACE_ID" or API_KEY == "API_KEY":
    raise ValueError("❌ CHANGE SPACE_ID AND/OR API_KEY")

arize_client = Client(space_id=SPACE_ID, api_key=API_KEY)
print("✅ Arize client setup done! Now you can start using Arize!")
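
If you would rather not paste credentials into the notebook, one option is to read them from environment variables. The sketch below is an assumption-laden example: the variable names `ARIZE_SPACE_ID` and `ARIZE_API_KEY` and the helper `load_credentials` are hypothetical choices for this tutorial, not names defined by the Arize SDK.

```python
import os


def load_credentials(env=None):
    """Return (space_id, api_key) from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    # Hypothetical variable names; pick whatever fits your setup.
    names = ("ARIZE_SPACE_ID", "ARIZE_API_KEY")
    values = [env.get(name, "") for name in names]
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return values[0], values[1]


# Demonstrated with a plain dict so the sketch runs without real credentials.
example_env = {"ARIZE_SPACE_ID": "my-space", "ARIZE_API_KEY": "my-key"}
space_id, api_key = load_credentials(example_env)
print("Loaded credentials for space:", space_id)
```

Calling `load_credentials()` with no argument reads the real environment, and the returned pair can then be passed to `Client(space_id=..., api_key=...)` as in the cell above.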

## Define Schema

Create a [model schema](https://docs.arize.com/arize/sending-data-to-arize/model-schema-reference) that maps the DataFrame columns to the fields Arize expects.

In [None]:
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_label",
    prediction_score_column_name="predicted_score",
    actual_label_column_name="actual_label",
    feature_column_names=feature_column_names
)
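
Because the schema refers to columns by name, a typo only surfaces when the data is logged. The sketch below shows one way to check names up front. It uses a plain dict and a list of column names as stand-ins for the `Schema` object and `df.columns`; the helper `missing_schema_columns` and the feature names are hypothetical.

```python
def missing_schema_columns(dataframe_columns, schema_fields):
    """Return schema column names that are absent from the DataFrame columns."""
    available = set(dataframe_columns)
    required = []
    for value in schema_fields.values():
        if isinstance(value, str):
            required.append(value)  # single-column field
        else:
            required.extend(value)  # list-like field such as the features
    return [name for name in required if name not in available]


# Stand-in for the Schema arguments above, with made-up feature names.
schema_fields = {
    "prediction_id_column_name": "prediction_id",
    "timestamp_column_name": "prediction_ts",
    "prediction_label_column_name": "predicted_label",
    "prediction_score_column_name": "predicted_score",
    "actual_label_column_name": "actual_label",
    "feature_column_names": ["feature_a", "feature_b"],
}

# Stand-in for list(df.columns); "actual_label" is left out on purpose.
columns = [
    "prediction_id",
    "prediction_ts",
    "feature_a",
    "feature_b",
    "predicted_label",
    "predicted_score",
]

print(missing_schema_columns(columns, schema_fields))
```

With the real objects, `list(df.columns)` would supply the first argument. The `validate=True` flag in the logging call performs the SDK's own checks, so this sketch is only a lightweight preview.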

## Log Data to Arize

Log the DataFrame using the [pandas API](https://docs.arize.com/arize/sending-data-to-arize/data-ingestion-methods/sdk-reference/python-sdk/arize.pandas).

In [None]:
response = arize_client.log(
    dataframe=df,
    environment=Environments.PRODUCTION,
    model_id="binary-classification-auc-log-loss-batch-ingestion-tutorial",
    model_version="1.0.0",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS],
    validate=True,
    schema=schema
)

if response.status_code == 200:
    print("✅ Successfully logged data to Arize!")
else:
    print(
        f'❌ Logging failed with status code {response.status_code} and message "{response.text}"'
    )