<img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/>

# <center>Getting Started with the Arize Platform</center>
## <center>Optimized Prompt Engineering Workflows (LLM Observability)</center>

# Step 0. Install Dependencies and Import Libraries 📚



In [None]:
!pip install -q arize 

from datetime import datetime, timedelta
import uuid
import pandas as pd

from arize.pandas.logger import Client
from arize.utils.types import ModelTypes, Environments, EmbeddingColumnNames, Schema

# Step 1. Download the data


In [None]:
# Download tutorial dataset
df = pd.read_json("https://storage.googleapis.com/arize-assets/fixtures/Embeddings/GENERATIVE/llm_prompt_engineering_demo.json")
df.head()

In [None]:
df['prompt_template'].unique()

# Step 2. Prepare Your Data

## Add prediction IDs

The Arize platform uses prediction IDs to link a prediction to an actual. Visit the [Arize documentation](https://docs.arize.com/arize/data-ingestion/model-schema/5.-prediction-id?q=prediction_id) for more details.

You can generate prediction IDs as follows:

In [None]:
def add_prediction_id(df):
    return [str(uuid.uuid4()) for _ in range(df.shape[0])]
df['prediction_id'] = add_prediction_id(df)
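
Because the prediction ID is what links a prediction to its actual, it can be worth confirming that the IDs are unique before logging. The following is a minimal sketch using only the standard library; `find_duplicate_ids` is a hypothetical helper written for this tutorial, not part of the Arize SDK.

```python
import uuid
from collections import Counter


def find_duplicate_ids(ids):
    """Return the sorted list of IDs that appear more than once."""
    counts = Counter(ids)
    return sorted(id_ for id_, n in counts.items() if n > 1)


# Freshly generated UUID4 strings are expected to be unique.
sample_ids = [str(uuid.uuid4()) for _ in range(1000)]
duplicates = find_duplicate_ids(sample_ids)
print(f"{len(duplicates)} duplicate prediction IDs found")
```

With the tutorial dataframe, the same helper could be called as `find_duplicate_ids(df['prediction_id'])`.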

In [None]:
# Assign evenly spaced timestamps over the 36 days ending yesterday
# (datetime and timedelta were already imported in Step 0)
now_dt = datetime.now() - timedelta(days=1)
start_dt = now_dt - timedelta(days=36)
df["prediction_ts"] = pd.date_range(
    start=start_dt,
    end=now_dt,
    periods=len(df),
)
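
The cell above gives every row a synthetic timestamp, spread evenly between `start_dt` and `now_dt`. As a rough illustration of that even spacing, here is a standard-library sketch; `evenly_spaced` is a hypothetical helper and the fixed dates are illustrative values, not taken from the dataset.

```python
from datetime import datetime, timedelta


def evenly_spaced(start, end, periods):
    """Return `periods` datetimes from start to end inclusive, evenly spaced."""
    if periods < 2:
        raise ValueError("periods must be at least 2")
    step = (end - start) / (periods - 1)
    return [start + i * step for i in range(periods)]


# Illustrative fixed window of 36 days, split into 5 timestamps.
end_dt = datetime(2024, 1, 31)
start_dt = end_dt - timedelta(days=36)
stamps = evenly_spaced(start_dt, end_dt, periods=5)
for ts in stamps:
    print(ts.isoformat())
```

Both endpoints are included, so consecutive timestamps are `(end - start) / (periods - 1)` apart.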

# Step 3. Sending Data into Arize 💫






In [None]:
SPACE_KEY = "YOUR_SPACE_KEY"
API_KEY = "YOUR_API_KEY"

# Fail fast if the placeholder keys were not replaced
if SPACE_KEY == "YOUR_SPACE_KEY" or API_KEY == "YOUR_API_KEY":
    raise ValueError("❌ CHANGE SPACE AND API KEYS")

arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
print("✅ Arize client setup done! Now you can start using Arize!")
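
If you would rather not paste keys into the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the variable names `ARIZE_SPACE_KEY` and `ARIZE_API_KEY` and the helpers `load_key` and `keys_are_set` are assumptions made for this example, not something the tutorial or the SDK prescribes.

```python
import os

# Placeholder values used in this tutorial.
PLACEHOLDERS = {"YOUR_SPACE_KEY", "YOUR_API_KEY"}


def load_key(env_var, default):
    """Read a key from the environment, falling back to the given default."""
    return os.environ.get(env_var, default)


def keys_are_set(space_key, api_key):
    """True when neither key is empty or still a tutorial placeholder."""
    return all(k and k not in PLACEHOLDERS for k in (space_key, api_key))


# Hypothetical environment variable names, chosen for illustration.
space_key = load_key("ARIZE_SPACE_KEY", "YOUR_SPACE_KEY")
api_key = load_key("ARIZE_API_KEY", "YOUR_API_KEY")
print("keys configured:", keys_are_set(space_key, api_key))
```

The resulting values could then be passed to `Client(space_key=..., api_key=...)` exactly as in the cell above.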

## Define the Schema 

A `Schema` instance tells Arize which dataframe column holds each kind of data.

To ingest non-embedding features, it suffices to provide a list of the column names that contain them. Prompt and response pairs are handled differently, because their embedding vectors also need to be logged to the platform.

Arize lets you ingest prompt and response pairs directly by providing `prompt_column_names` and `response_column_names` as fields of the `Schema`. You ingest not only the embedding vector but also the raw data associated with it, so up to 2 columns can be associated with each prompt or response object:
* Embedding `vector` (required)
* Embedding `data` (optional, but recommended): raw text associated with the embedding vector

Learn more [here](https://docs.arize.com/arize/sending-data/model-schema-reference#8.-embedding-features-unstructured).
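
Since the vector is the required part of each pair, a quick sanity check on the vector columns can catch problems before ingestion. The sketch below assumes each vector is a list of numbers and that all vectors in one column should share a single dimension; `check_vectors` is a hypothetical helper, not part of the Arize SDK.

```python
def check_vectors(vectors):
    """Return the shared dimension, or raise ValueError on a bad vector."""
    dims = set()
    for i, vec in enumerate(vectors):
        if vec is None or len(vec) == 0:
            raise ValueError(f"row {i}: missing or empty vector")
        dims.add(len(vec))
    if len(dims) != 1:
        raise ValueError(f"inconsistent dimensions: {sorted(dims)}")
    return dims.pop()


# Toy vectors standing in for a real embedding column.
toy_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
dim = check_vectors(toy_vectors)
print(f"all vectors have dimension {dim}")
```

Assuming the tutorial columns hold lists, the same check could be run as `check_vectors(df["prompt_vector"])` and `check_vectors(df["response_vector"])`.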


In [None]:
# Declare prompt and response columns
prompt_columns = EmbeddingColumnNames(
    vector_column_name="prompt_vector",
    data_column_name="prompt",
)

response_columns = EmbeddingColumnNames(
    vector_column_name="response_vector",
    data_column_name="response",
)

In [None]:
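# Give every row the same constant prediction label
df['prediction'] = '1'
# Cast the feedback and conversation ID columns to strings
df['user_feedback'] = df['user_feedback'].astype(str)
df['conversation_id'] = df['conversation_id'].astype(str)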

In [None]:
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction",
    actual_label_column_name="user_feedback",
    tag_column_names=[
        'step',
        'task_type',
        'conversation_id',
        'api_call_duration',
        'response_len',
        'prompt_len',
        'prompt_template',
    ],
    prompt_column_names=prompt_columns,
    response_column_names=response_columns,
)
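
Every column named in the schema has to exist in the dataframe, so a quick comparison before logging can save a failed upload. This is a minimal sketch; `missing_columns` is a hypothetical helper, and `available` is a toy stand-in for `df.columns`.

```python
def missing_columns(available, required):
    """Return required column names absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Column names referenced by the schema in this tutorial.
required = [
    "prediction_id", "prediction_ts", "prediction", "user_feedback",
    "step", "task_type", "conversation_id", "api_call_duration",
    "response_len", "prompt_len", "prompt_template",
    "prompt_vector", "prompt", "response_vector", "response",
]

# Toy stand-in for `df.columns`; swap in the real dataframe columns.
available = ["prediction_id", "prediction_ts", "prompt", "response"]
print(missing_columns(available, required))
```

An empty list means every column the schema refers to is present.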

## Log LLM Data into Arize

In [None]:
response = arize_client.log(
    dataframe=df,
    schema=schema,
    model_id="llm-prompt-engineering-demo",
    model_version="1.0",
    model_type=ModelTypes.GENERATIVE_LLM,
    environment=Environments.PRODUCTION,
)
if response.status_code == 200:
    print("✅ Successfully logged data to Arize!")
else:
    print(
        f'❌ Logging failed with status code {response.status_code} and message "{response.text}"'
    )