# Export a Cluster and Ask GPT About It!

## Load ChatGPT

This notebook walks through an analysis of a prompt/response dataset collected from GPT-3.5 (ChatGPT). The dataset was collected with the OpenAI Python API, which is also used further down, and can be analyzed in Phoenix. The notebook:

* Imports a dataset of previously generated prompt/response pairs
* Loads the dataset into Phoenix for analysis
* Exports a cluster from Phoenix for further analysis
* Asks GPT about the exported cluster


In [None]:
# The chat cells below call openai.ChatCompletion, which was removed in openai 1.0
!pip install "openai<1"
!pip install ipywidgets

In [None]:
import pandas as pd
import json

In [None]:
conversations_df = pd.read_csv(
    "https://storage.googleapis.com/arize-assets/fixtures/Embeddings/GENERATIVE/dataframe_llm_gpt.csv"
)

In [None]:
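import numpy as np
import re


def string_to_array(s):
    """Parse a stringified vector (e.g. "[0.1 -0.2 3]") into a float array."""
    # Match signed decimals first, then signed integers
    numbers = re.findall(r"[-+]?\d*\.\d+|[-+]?\d+", s)
    return np.array([float(num) for num in numbers])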

In [None]:
conversations_df["prompt_vector"] = conversations_df["prompt_vector"].apply(string_to_array)
conversations_df["response_vector"] = conversations_df["response_vector"].apply(string_to_array)
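
To see what the regular expression in `string_to_array` extracts, the sketch below runs the same pattern on two made-up strings using only the standard library. `WITH_EXPONENT`, the `parse` helper, and both sample strings are hypothetical and not part of the original notebook. Whether the dataset actually contains exponent notation is not something this example establishes.

```python
import re

# Pattern copied from string_to_array: signed decimals first, then signed integers.
BASIC = r"[-+]?\d*\.\d+|[-+]?\d+"
# Hypothetical variant (an assumption, not in the notebook) that also accepts an exponent.
WITH_EXPONENT = r"[-+]?(?:\d*\.\d+|\d+)(?:[eE][-+]?\d+)?"


def parse(pattern, text):
    """Return every number matched by `pattern` in `text` as a float."""
    return [float(tok) for tok in re.findall(pattern, text)]


plain = "[ 0.12 -0.5\n  3.  .25]"  # made-up cell value
tiny = "[1.5e-05 2.0]"  # made-up cell value written with an exponent

print(parse(BASIC, plain))  # [0.12, -0.5, 3.0, 0.25]
print(parse(BASIC, tiny))  # [1.5, -5.0, 2.0]  (the exponent is split off)
print(parse(WITH_EXPONENT, tiny))  # [1.5e-05, 2.0]
```

If a parsed vector comes out longer than expected, a split exponent like the one in the second call is one thing to check for.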

In [None]:
conversations_df

Install Arize to use the embedding generators that ship with its SDK.

In [None]:
!pip install arize

In [None]:
!pip install 'arize[AutoEmbeddings]'

In [None]:
import arize
from arize.pandas.embeddings import EmbeddingGenerator, UseCases

if not all(col in conversations_df.columns for col in ["prompt_vector", "response_vector"]):
    generator = EmbeddingGenerator.from_use_case(
        use_case=UseCases.NLP.SEQUENCE_CLASSIFICATION,
        model_name="distilbert-base-uncased",
        tokenizer_max_length=512,
        batch_size=100,
    )

Generate embeddings for the prompt and response columns. This step is skipped when the dataset already contains both vector columns.

In [None]:
# Very fast on a GPU (seconds) but can take 2-3 minutes on a CPU
conversations_df = conversations_df.reset_index(drop=True)
if not all(col in conversations_df.columns for col in ["prompt_vector", "response_vector"]):
    conversations_df["prompt_vector"] = generator.generate_embeddings(
        text_col=conversations_df["prompt"]
    )
    conversations_df["response_vector"] = generator.generate_embeddings(
        text_col=conversations_df["response"]
    )

**Install Phoenix**

In [None]:
!pip install arize-phoenix

In [None]:
import phoenix as px

# Define a Schema() object for Phoenix to pick up data from the correct columns for logging
schema = px.Schema(
    feature_column_names=[
        "step",
        "conversation_id",
        "api_call_duration",
        "response_len",
        "prompt_len",
    ],
    prompt_column_names=px.EmbeddingColumnNames(
        vector_column_name="prompt_vector", raw_data_column_name="prompt"
    ),
    response_column_names=px.EmbeddingColumnNames(
        vector_column_name="response_vector", raw_data_column_name="response"
    ),
)

In [None]:
# Create the dataset from the conversation dataframe & schema
conv_ds = px.Dataset(conversations_df, schema, "production")

In [None]:
# Click the link below to open a view of the ChatGPT data in Phoenix
px.launch_app(conv_ds)

**Download a Cluster!**

Click the download cluster button in Phoenix to export the selected cluster. The export is sent back to this notebook as a dataframe, which the cells below read from `px.active_session().exports`. Run them after you click the download button.

In [None]:
pre_prompt = "The following is JSON points for a cluster of datapoints. Can you summarize the cluster of data, what do the points have in common?\n"
pre_prompt_baseline = "The following is JSON points for a cluster of datapoints and a baseline sample data of the entire data set. Can you summarize the cluster of data, what do the points have in common and how does it compare to the baseline?\n"

In [None]:
prompt_cluster_json = px.active_session().exports[-1].prompt.to_json()
prompt_baseline_json = conversations_df.sample(n=10).prompt.to_json()
response_cluster_json = px.active_session().exports[-1].response.to_json()
chat_intial_input = pre_prompt + prompt_cluster_json
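
The cell above sends only the cluster's prompts; `pre_prompt_baseline` is defined but never used. As a rough sketch of how a baseline comparison message could be assembled, the example below joins the instruction, a cluster sample, and a baseline sample into one string. The helper names `to_indexed_json` and `build_baseline_prompt`, the `CLUSTER:` and `BASELINE:` labels, and the sample texts are all assumptions made for illustration. The JSON layout only approximates what `Series.to_json()` produces, since the real keys are the dataframe's index labels.

```python
import json

# Instruction text taken from the notebook's pre_prompt_baseline.
pre_prompt_baseline = "The following is JSON points for a cluster of datapoints and a baseline sample data of the entire data set. Can you summarize the cluster of data, what do the points have in common and how does it compare to the baseline?\n"


def to_indexed_json(texts):
    """Serialize texts as {"0": ..., "1": ...}, similar to a pandas Series dump."""
    return json.dumps({str(i): t for i, t in enumerate(texts)})


def build_baseline_prompt(pre_prompt, cluster_texts, baseline_texts):
    """Join the instruction, the cluster JSON, and the baseline JSON into one message."""
    return (
        pre_prompt
        + "CLUSTER:\n" + to_indexed_json(cluster_texts) + "\n"
        + "BASELINE:\n" + to_indexed_json(baseline_texts)
    )


# Made-up stand-ins for an exported cluster and a random baseline sample.
cluster = ["How do I reset my password?", "I forgot my password"]
baseline = ["What is the weather today?", "Translate 'hello' to French"]

print(build_baseline_prompt(pre_prompt_baseline, cluster, baseline))
```

Labeling the two JSON blobs is a design choice that lets the model tell which points belong to the cluster. Any unambiguous separator would serve the same purpose.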

In [None]:
# @title OpenAI Key
import openai

openai.api_key = "YOUR_OPEN_AI_KEY"
messages = []

In [None]:
# @title Chat GPT - Cluster Analysis
import ipywidgets as widgets
from IPython.display import display

# Create the output widget
output = widgets.Output(
    layout={"border": "1px solid black", "width": "100%", "height": "300px", "overflow": "scroll"}
)

# Create the input widget
input_box = widgets.Textarea(
    value="",
    placeholder="Type your message here...",
    description="",
    disabled=False,
    layout=widgets.Layout(width="100%", height="100px"),
)

# Create the submit button
submit_button = widgets.Button(
    description="Send", disabled=False, button_style="success", tooltip="Send your message"
)

# Display the output widget and the input components
display(output)
display(input_box)
display(submit_button)

with output:
    message = chat_intial_input
    if message:
        messages.append(
            {"role": "user", "content": message},
        )
        chat = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
        # Read the reply only when a request was actually sent
        reply = chat.choices[0].message.content
        print(f"ChatGPT RESPONSE: {reply}")
        print("\n")
        print("-- Ask another question related to your data below --")
        messages.append({"role": "assistant", "content": reply})


def process_input(input_text):
    # Simulate a simple chatbot response (you can replace this with your own logic)
    response = f"You said: {input_text}"
    return response


def on_submit_button_click(button):
    with output:
        user_input = input_box.value.strip()
        if user_input:
            messages.append(
                {"role": "user", "content": user_input},
            )
            chat = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
            reply = chat.choices[0].message.content
            print(f"ChatGPT RESPONSE: {reply}")
            messages.append({"role": "assistant", "content": reply})
        input_box.value = ""


# Set the button click event handler
submit_button.on_click(on_submit_button_click)

The example above is for demonstration purposes only; application-specific integrations will look different.
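
The chat cell keeps context by appending every user message and every reply to a shared `messages` list, so each request carries the full conversation. The sketch below isolates that pattern with a stand-in for the model call, so it runs without an API key. `ask` and `fake_complete` are hypothetical names introduced here, and a real integration would pass a function that calls the OpenAI API instead.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def ask(
    messages: List[Message],
    user_text: str,
    complete: Callable[[List[Message]], str],
) -> str:
    """Append the user turn, call the model, then record and return the reply."""
    text = user_text.strip()
    if not text:
        raise ValueError("empty message")
    messages.append({"role": "user", "content": text})
    reply = complete(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages: List[Message]) -> str:
    # Stand-in for the real API call; reports how many turns it was sent.
    return f"(stub) received {len(messages)} message(s)"


history: List[Message] = []
print(ask(history, "Summarize this cluster.", fake_complete))
print(ask(history, "How does it differ from the baseline?", fake_complete))
print([m["role"] for m in history])  # ['user', 'assistant', 'user', 'assistant']
```

Because the history grows with every turn, long sessions send progressively larger requests.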