<center><img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/></center>

# Data Upload Quickstart (Python Pandas SDK)

This example walks through using the Arize `pandas` batch SDK to send an example [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer)
dataset to Arize.

Guides for other model types are available [here](https://docs.arize.com/arize/sending-data-to-arize/model-types).

## Install and Import Dependencies

In [None]:
!pip install -q arize

from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments, Metrics
import pandas as pd

## 👇  Download & Display Data

Download the `load_breast_cancer` dataset, assign it to a variable, and preview the data to better understand what we're working with.

In [None]:
from sklearn.datasets import load_breast_cancer

breast_cancer_dataset = load_breast_cancer()
breast_cancer_dataset
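
The raw printout of the dataset object is long. A minimal sketch of a more compact preview follows, assuming only the `data`, `target_names`, and `feature_names` keys that the cells below also use. The variable name `dataset` is chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()

# Shape of the feature matrix: (rows, features)
n_rows, n_features = dataset["data"].shape
print(f"{n_rows} rows, {n_features} features")

# Class names, indexed by the integer values stored in `target`
class_names = [str(name) for name in dataset["target_names"]]
print("classes:", class_names)

# First few feature names
print("first features:", [str(name) for name in dataset["feature_names"][:3]])
```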

## ⚒️ Extract Features, Predictions, and Actuals

The dataset contains everything we need to build a Pandas dataframe. Extract the following:

*   Feature data: The values of each feature
*   Feature names: The corresponding names of each feature
*   Actual data: A numerical representation of ground truth data
*   Actual labels: The corresponding labels associated with actual data

In [None]:
breast_cancer_features = breast_cancer_dataset["data"]  # feature data
breast_cancer_feature_names = breast_cancer_dataset[
    "feature_names"
]  # feature names
breast_cancer_targets = breast_cancer_dataset["target"]  # target data
breast_cancer_target_names = breast_cancer_dataset[
    "target_names"
]  # target names

## 🪢 Assign Actual Labels

Map each value in `breast_cancer_targets` to its corresponding entry in `breast_cancer_target_names` to get a human-readable list of actual labels.

In [None]:
target_name_transcription = []

for i in breast_cancer_targets:
    target_name_transcription.append(breast_cancer_target_names[i])
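
The same mapping can also be done without an explicit loop. The sketch below uses NumPy integer-array indexing on small stand-in arrays; `target_names`, `targets`, and `labels` are illustrative names, not variables from this notebook.

```python
import numpy as np

# Small stand-ins for breast_cancer_target_names and breast_cancer_targets
target_names = np.array(["malignant", "benign"])
targets = np.array([0, 1, 1, 0])

# Indexing the names array with the integer targets maps every row at once
labels = target_names[targets].tolist()
print(labels)
```

Applied to the real arrays, this would be `breast_cancer_target_names[breast_cancer_targets].tolist()`.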

## 🐼 Create A Pandas Dataframe

Create a Pandas dataframe for the Arize Python Pandas logger from the feature values, the feature names, and the list of actual labels (`target_name_transcription`).

**Note**: We've also added a **prediction_label** column that starts as a copy of the actual labels, because data will not populate in the Arize platform without a record of prediction labels. The column is then reversed so that predictions and actuals do not match exactly.

In [None]:
df = pd.DataFrame(breast_cancer_features, columns=breast_cancer_feature_names)

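df["prediction_id"] = df.index.astype(str)  # unique string ID per row, referenced by the schema below
df["actual_label"] = target_name_transcription
df["prediction_label"] = target_name_transcription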

df["prediction_label"] = (
    df["prediction_label"].iloc[::-1].reset_index(drop=True)
)  # this is optional, but makes the data more interesting in the platform
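
To see what the reversal does, here is a small sketch on a hypothetical four-row frame (`demo` is not part of the notebook). It measures how often the reversed predictions agree with the actuals and tabulates each (actual, prediction) pair.

```python
import pandas as pd

# Hypothetical four-row frame standing in for df
demo = pd.DataFrame({"actual_label": ["malignant", "benign", "benign", "benign"]})

# Same reversal as above: the prediction column is the actuals read bottom-up
demo["prediction_label"] = demo["actual_label"].iloc[::-1].reset_index(drop=True)

# Fraction of rows where prediction and actual agree
agreement = (demo["actual_label"] == demo["prediction_label"]).mean()

# Counts of each (actual, prediction) pair
confusion = pd.crosstab(demo["actual_label"], demo["prediction_label"])
print(agreement)
print(confusion)
```

The `reset_index(drop=True)` call matters: Pandas aligns on the index when assigning a Series to a column, so without it the reversed values would be matched back to their original rows and the column would be unchanged.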

## 🪵 Log Data to Arize


1.   Sign up or log in to your Arize account [here](https://app.arize.com/auth/login). Find your [space ID and API key](https://docs.arize.com/arize/api-reference/arize.pandas/client) and paste them into the cell below.
2.   Define the [Schema](https://docs.arize.com/arize/api-reference/arize.pandas/schema) so Arize knows what your columns correspond to.
3.   [Log](https://docs.arize.com/arize/api-reference/arize.pandas/log) the model data to Arize!



In [None]:
API_KEY = "API_KEY"
SPACE_ID = "SPACE_ID"

# Fail fast if the placeholders were not replaced, before creating the client
if SPACE_ID == "SPACE_ID" or API_KEY == "API_KEY":
    raise ValueError("❌ CHANGE SPACE_ID AND/OR API_KEY")

arize_client = Client(space_id=SPACE_ID, api_key=API_KEY)
print("✅ Arize client setup done! Now you can start using Arize!")
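
If you would rather not paste credentials into the notebook, one option is to read them from environment variables. The sketch below is an assumption, not part of the Arize SDK: the variable names `ARIZE_SPACE_ID` and `ARIZE_API_KEY` and the helper `load_credentials` are hypothetical.

```python
import os

# The placeholder strings used in the cell above
PLACEHOLDERS = {"SPACE_ID", "API_KEY"}


def load_credentials(env=os.environ):
    """Return (space_id, api_key) from env, rejecting missing or placeholder values."""
    # ARIZE_SPACE_ID and ARIZE_API_KEY are hypothetical variable names
    space_id = env.get("ARIZE_SPACE_ID", "")
    api_key = env.get("ARIZE_API_KEY", "")
    if not space_id or not api_key or {space_id, api_key} & PLACEHOLDERS:
        raise ValueError("Set ARIZE_SPACE_ID and ARIZE_API_KEY to real values")
    return space_id, api_key


# Demonstration with an explicit mapping instead of the real environment
print(load_credentials({"ARIZE_SPACE_ID": "abc", "ARIZE_API_KEY": "xyz"}))
```

In the notebook, `SPACE_ID, API_KEY = load_credentials()` would then supply the values passed to `Client`.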

In [None]:
schema = Schema(
    prediction_id_column_name="prediction_id",
    actual_label_column_name="actual_label",
    prediction_label_column_name="prediction_label",
    feature_column_names=[
        "mean radius",
        "mean texture",
        "mean perimeter",
        "mean area",
        "mean smoothness",
        "mean compactness",
        "mean concavity",
        "mean concave points",
        "mean symmetry",
        "mean fractal dimension",
        "radius error",
        "texture error",
        "perimeter error",
        "area error",
        "smoothness error",
        "compactness error",
        "concavity error",
        "concave points error",
        "symmetry error",
        "fractal dimension error",
        "worst radius",
        "worst texture",
        "worst perimeter",
        "worst area",
        "worst smoothness",
        "worst compactness",
        "worst concavity",
        "worst concave points",
        "worst symmetry",
        "worst fractal dimension",
    ],
)
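
Every column named in the schema has to exist in the dataframe. A quick pre-flight check can catch a mismatch before logging. The sketch below uses a hypothetical helper, `missing_columns`, and a one-row stand-in frame. It does not call the Arize SDK.

```python
import pandas as pd


def missing_columns(frame, required):
    """Return the required column names that are absent from frame, in order."""
    present = set(frame.columns)
    return [name for name in required if name not in present]


# Hypothetical one-row frame standing in for df
demo = pd.DataFrame(
    {
        "mean radius": [17.99],
        "mean texture": [10.38],
        "actual_label": ["malignant"],
        "prediction_label": ["benign"],
    }
)

# Column names the schema would reference
required = [
    "prediction_id",
    "actual_label",
    "prediction_label",
    "mean radius",
    "mean texture",
]
print(missing_columns(demo, required))
```

Here the check reports `prediction_id`, since the stand-in frame has no such column. An empty list means every referenced column is present.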

In [None]:
response = arize_client.log(
    dataframe=df,
    schema=schema,
    model_id="breast_cancer_dataset",
    model_version="v1",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION],
    environment=Environments.PRODUCTION,
)
if response.status_code == 200:
    print("✅ Successfully logged data to Arize!")
else:
    print(
        f'❌ Logging failed with status code {response.status_code} and message "{response.text}"'
    )

## 🎉 Check Out Arize

Now that you've uploaded some data to Arize, check it out on the platform. Follow our [verify data upload steps](https://docs.arize.com/arize/sending-data-guides/troubleshooting-data-upload#looks-great-verify-your-data) and learn how to quickly configure monitors [here](https://docs.arize.com/arize/monitors/monitors-quickstart). 