<center>
    <p style="text-align:center">
        <img alt="arize llama-index logos" src="https://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/llama-index-knowledge-base-tutorial/arize_llamaindex.png" width="400">
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q">Community</a>
    </p>
</center>
<h1 align="center">Evaluating and Improving a LlamaIndex Search and Retrieval Application</h1>

Imagine you're an engineer at Arize AI and you've built and deployed a documentation question-answering service using LlamaIndex. Users send questions about Arize's core product via a chat interface, and your service retrieves documents from your documentation in order to generate a response to the user. As the engineer in charge of evaluating and maintaining this system, you want to evaluate the quality of the responses from your service.

Phoenix helps you:
- identify gaps in your documentation
- detect queries for which the LLM gave bad responses
- detect failures to retrieve relevant context

In this tutorial, you will:

- Download an pre-indexed knowledge base of the Arize documentation and run a LlamaIndex application
- Visualize user queries and knowledge base documents to identify areas of user interest not answered by your documentation
- Find clusters of responses with negative user feedback
- Identify failed retrievals using cosine similarity, Euclidean distance, and LLM-assisted ranking metrics

Parts of this notebook require an [OpenAI API key](https://platform.openai.com/account/api-keys) to run. If you don't have an OpenAI key, you can still run Phoenix by skipping cells preceded by the 💭 emoji.


## Chatbot Architecture

Your chatbot was built using LlamaIndex's low-level API. The architecture of your chatbot is shown below and can be explained in five steps.

![llama-index chatbot architecture](http://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/llama-index-knowledge-base-tutorial/llama_index_chatbot_architecture.png)

1. The user sends a query about Arize to your service.
1. `langchain.embeddings.OpenAIEmbeddings` makes a request to the OpenAI embeddings API to embed the user query using the `text-embedding-ada-002` model.
1. `llama_index.retrievers.RetrieverQueryEngine` does a similarity search against the entries of your index knowledge base for the two most similar pieces of context by cosine similarity.
1. `llama_index.indices.query.ResponseSynthesizer` generates a response by formatting the query and retrieved context into a single prompt and sending a request to OpenAI chat completions API with the `gpt-3.5-turbo`.
2. The response is returned to the user.

Phoenix makes your search and retrieval system *observable* by capturing the inputs and outputs of these steps for analysis, including:

- your query embeddings
- the retrieved context and similarity scores for each query
- the generated response that is return to the user

With that overview in mind, let's dive into the notebook.

## 1. Install Dependencies and Import Libraries

Install Phoenix and LlamaIndex.

In [None]:
!pip install -q arize-phoenix gcsfs llama-index

Import libraries.

In [1]:
from functools import reduce
import hashlib
import json
import logging
import os
import sys
import tempfile
import textwrap
from tqdm import tqdm
from typing import Any, Dict, List, Tuple
import urllib
import zipfile

from gcsfs import GCSFileSystem
from llama_index.graph_stores.simple import SimpleGraphStore
from langchain.chat_models import ChatOpenAI
from llama_index import StorageContext, load_index_from_storage
from llama_index.callbacks import CallbackManager, OpenInferenceCallbackHandler
from llama_index.callbacks.open_inference_callback import as_dataframe
from llama_index.embeddings.base import BaseEmbedding
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.indices.query.schema import QueryBundle
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response.schema import Response
from llama_index import ServiceContext, LLMPredictor
from llama_index import StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.response.schema import Response
import numpy as np
import numpy.typing as npt
import openai
import pandas as pd
import phoenix as px
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)


pd.set_option("display.max_colwidth", None)


## 2. Configure Your OpenAI API Key

💭 Configure your OpenAI API key.

In [None]:
openai_api_key = "copy paste your api key here"
assert openai_api_key != "copy paste your api key here", "❌ Please set your OpenAI API key"
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key


## 3. Download Your Knowledge Base

Download your pre-built index from cloud storage and instantiate your storage context.

In [2]:
file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-assets/phoenix/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
    graph_store=SimpleGraphStore(),  # pass default graph store to prevent unauthorized request to GCS
)


Download and unzip a pre-built knowledge base index consisting of chunks of the Arize documentation.

## 4. Run Your Question-Answering Service

💭 Start a LlamaIndex application from your downloaded index.

In [3]:
callback_handler = OpenInferenceCallbackHandler()
service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)),
    embed_model=OpenAIEmbedding(model="text-embedding-ada-002"),
    callback_manager=CallbackManager(handlers=[callback_handler]),
)
index = load_index_from_storage(
    storage_context,
    service_context=service_context,
)
query_engine = index.as_query_engine()


💭 Ask a few questions of your question-answering service and view the responses.

In [7]:
max_line_length = 80
for query in [
    "What's the difference between primary and baseline datasets?",
    "How do I send in extra metadata with each record?",
    "How does Arize's surrogate explainability model work?",
]:
for query in query_df[":feature.text:prompt"].to_list():
    print("Query")
    print("=====")
    print()
    print(textwrap.fill(query, max_line_length))
    print()
    response = query_engine.query(query)
    print("Response")
    print("========")
    print()
    print(textwrap.fill(str(response), max_line_length))
    print()

Query
=====

How do I use the SDK to upload a ranking model?

Response

To use the SDK to upload a ranking model, you can follow these steps:  1.
Install the SDK: Start by installing the SDK for the programming language you
are using. The SDK provides the necessary tools and libraries to interact with
the Arize platform.  2. Import the SDK: Import the SDK into your project by
adding the appropriate import statements or dependencies.  3. Authenticate: Use
your Arize API key to authenticate your SDK client. This key can be obtained
from your Arize account.  4. Create a model: Use the SDK to create a model
object that represents your ranking model. Provide the necessary information
such as the model name, description, and any other relevant details.  5. Upload
the model: Use the SDK's upload function to upload your ranking model to the
Arize platform. This function typically takes the model object created in the
previous step as a parameter.  6. Verify the upload: After the upload is
comp

## 5. Load Your Data Into Pandas Dataframes

To use Phoenix, you must load your data into pandas dataframes. First, load your knowledge base into a dataframe.

In [11]:
def storage_context_to_dataframe(storage_context: StorageContext) -> pd.DataFrame:
    """Converts the storage context to a pandas dataframe.

    Args:
        storage_context (StorageContext): Storage context containing the index
        data.

    Returns:
        pd.DataFrame: The dataframe containing the index data.
    """
    document_ids = []
    document_texts = []
    document_embeddings = []
    docstore = storage_context.docstore
    vector_store = storage_context.vector_store
    for node_id, node in docstore.docs.items():
        document_ids.append(node.hash)  # use node hash as the document ID
        document_texts.append(node.text)
        document_embeddings.append(np.array(vector_store.get(node_id)))
    return pd.DataFrame(
        {
            "document_id": document_ids,
            "text": document_texts,
            "text_vector": document_embeddings,
        }
    )


database_df = storage_context_to_dataframe(storage_context)
database_df = database_df.drop_duplicates(subset=["text"])
database_df.head()


Unnamed: 0,document_id,text,text_vector
0,86dacab3db386524f430c7dcdb08d55a580bc705630861bb9ce7b255d1df9d59,\nAccess tutorials of what's possible with Arize below:\n\n1. Model Type Examples\n2. Explainability Tutorials\n3. Cloud Storage Examples\n\n,"[-0.000973847636487335, 0.000874826917424798, 0.012771095149219036, -0.0024746579583734274, 0.006788518745452166, 0.011021440848708153, 0.016849027946591377, -0.013322167098522186, -0.01576066017150879, -0.04863210767507553, 0.0020492991898208857, 0.0037197363562881947, -0.023158803582191467, 0.00039328463026322424, -0.003361539449542761, -0.0017617085250094533, 0.003912611398845911, 0.013914569281041622, 0.03207239508628845, -0.02147803269326687, -0.016173964366316795, -0.01239223312586546, -0.02178112231194973, -0.015168257988989353, 0.0011589734349399805, 0.023489445447921753, 0.02442626841366291, -0.020072799175977707, 0.00850717443972826, -0.014768730849027634, 0.02654789574444294, 0.014038560912013054, 0.013659698888659477, -0.042790744453668594, -0.005744926165789366, 0.0018064831383526325, 0.02817355841398239, 0.022318419069051743, 0.003936721011996269, -0.010649467818439007, 0.033257197588682175, -0.016036197543144226, 0.0007155326311476529, 0.005011311732232571, -0.007060610689222813, 0.01162073202431202, -0.004267364274710417, -0.005741482134908438, -0.00934755988419056, -0.028104674071073532, 0.010470368899405003, 0.01386635098606348, -0.025280430912971497, 0.004907985683530569, -0.004325915593653917, 0.01137274969369173, -0.00628911005333066, 0.028765961527824402, -0.005538274068385363, 0.011042105965316296, 0.0034820865839719772, 0.0002385108673479408, -0.025652404874563217, -0.0013062129728496075, -0.014134998433291912, 0.013721694238483906, -0.014713623560965061, 0.01664237678050995, -0.005004423204809427, 0.0038540600799024105, 0.026079485192894936, 0.00795610249042511, -0.0048735435120761395, -0.033202093094587326, 0.010298158973455429, -0.028848621994256973, -0.0051456354558467865, -0.012268241494894028, -0.012337125837802887, 0.011338307522237301, -0.01657349243760109, 0.004267364274710417, -0.014589632861316204, -0.011393414810299873, 0.01478250790387392, 0.01162073202431202, 0.025762619450688362, 0.012523112818598747, -0.004257031716406345, -0.006199560593813658, 0.015526454895734787, -0.0011245313799008727, -0.0023558330722153187, 0.027016308158636093, -0.0014612020459026098, 0.01671125926077366, -0.00034958633477799594, 0.03226526826620102, -0.004629005212336779, -0.01576066017150879, ...]"
1,a909d43e39685381fca64b936735b972fd1433137148bda0a5f7e43f17d51c61,"\n{% hint style=""info"" %}\nYour model type determines which performance metrics are available to you. Learn more about model types here.\n{% endhint %}\n\nModel TypePandas BatchPython Single RecordCSVParquet Binary Classification (Only Classification Metrics)Colab LinkColab LinkDownload File*Open Parquet Reader HereBinary Classification (Classification, AUC/Log Loss Metrics) Colab LinkColab LinkDownload File*Open Parquet Reader HereBinary Classification (Classification, AUC/Log Loss, Regression) Colab LinkColab LinkDownload FileDownload File*Open Parquet Reader HereMulticlass Classification (Only Classification Metrics)Colab LinkColab LinkDownload File*Open Parquet Reader HereMulticlass Classification (Classification, AUC/Log Loss Metrics)Colab LinkColab LinkDownload File*Open Parquet Reader HereRegressionColab LinkColab LinkDownload FileDownload File*Open Parquet Reader HereTimeseries Forecasting Colab LinkColab LinkDownload File*Open Parquet Reader HereRanking with Relevance ScoreColab LinkColab LinkDownload File*Open Parquet Reader HereRanking with Single LabelColab LinkColab LinkDownload File*Open Parquet Reader HereRanking with Multiple LabelsColab LinkColab LinkDownload File*Open Parquet Reader HereNLP Classification Colab LinkNLP Named Entity Recognition (NER)Colab LinkCV Classification Colab LinkTabular Classification w/ Embeddings Colab LinkObject DetectionColab LinkLarge Language Models (LLMs)Colab Link\n\n","[-0.008054825477302074, -0.00042821429087780416, 0.026454031467437744, -0.03649497777223587, 0.001032714149914682, 0.02170940861105919, 0.004248092882335186, -0.012309813871979713, -0.025460971519351006, -0.025585103780031204, 0.006441101897507906, 0.04766691103577614, -0.0319710373878479, -0.010123701766133308, -0.031143484637141228, -0.015682082623243332, 0.018399206921458244, -0.004030860960483551, 0.02910219319164753, -0.0045239427126944065, -0.0022412827238440514, 0.011978793889284134, -0.02209560014307499, -0.015240722335875034, -0.024495495483279228, 0.011433989740908146, 0.03406749665737152, -0.028178095817565918, -0.00022111115686129779, -0.0018671608995646238, 0.00893754605203867, 0.01593034714460373, -0.024261023849248886, -0.03646739199757576, -0.01423386950045824, -0.003029179759323597, 0.016357915475964546, 0.008565148338675499, 0.016026895493268967, -0.0027309167198836803, 0.0069755613803863525, 0.004792897030711174, 0.007585879880934954, -0.004617042373865843, 0.01913020946085453, 0.02475755289196968, 0.0015128657687455416, 0.010668505914509296, 0.01387526374310255, -0.005599758587777615, 0.01877160556614399, 0.006223869509994984, -0.02191629633307457, 0.0026998836547136307, 0.004106719512492418, -0.014468342065811157, 0.01279944833368063, 0.021571483463048935, -0.0018447480397298932, -0.014716606587171555, 0.019833628088235855, -0.011068488471210003, -0.01456488948315382, 0.0051790871657431126, 0.004882548004388809, -0.015792421996593475, -0.012385672889649868, 0.01751648634672165, -0.000838325941003859, 0.018992284312844276, 0.011833972297608852, -0.0022309382911771536, -0.009199603460729122, -0.0043067107908427715, 0.03122624009847641, -0.012620145455002785, -0.036632902920246124, 0.011013318784534931, -0.00901340413838625, 0.004475669004023075, 0.0020188784692436457, -0.0164958406239748, -0.013302874751389027, 0.0010103012900799513, 0.012930477038025856, -0.010351277887821198, 0.01610965095460415, 0.0019826730713248253, -0.0028688418678939342, -0.0005766992690041661, 0.003917072433978319, -0.00048359984066337347, 0.03795698285102844, -0.000891340896487236, 0.012985646724700928, 0.028605664148926735, 0.01140640489757061, 0.021902503445744514, 0.02206801436841488, -0.02068876288831234, ...]"
2,502dd0091ad54150572de676675430d735c3273989275f0018a94d0fa59570e5,\nExamples for logging explainability metrics. Click here for more information on how to log feature importance and use explainability.\n\n| SHAP: Guide to Getting Started | Colab Link |\n| ------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| SHAP: Neural Network on Tabular Data | Colab Link |\n| Surrogate Model Explainability | Colab Link |\n| One Hot Encoding Decomposition | Colab Link |\n\n,"[-0.0014611489605158567, -0.006545685231685638, 0.03373091667890549, -0.037505004554986954, 0.014624593779444695, -0.0005348263657651842, 0.02037745714187622, -0.002810909878462553, -0.023221129551529884, -0.025671666488051414, 0.0267462320625782, 0.023758413270115852, -0.023247338831424713, -0.0021409434266388416, -0.019053906202316284, 0.0176648311316967, 0.018831130117177963, -0.002791253151372075, 0.036246977746486664, -0.00944177433848381, -0.037216708064079285, 0.012993086129426956, -0.01717996597290039, -0.00883241556584835, -0.00485520763322711, 0.019958114251494408, 0.02626136690378189, -0.038684409111738205, -0.01164987776428461, 0.0027863390278071165, 0.01697029545903206, 0.012010251171886921, -0.01792692206799984, -0.014768742956221104, 0.0034726858139038086, 0.012082325294613838, 0.011512281373143196, 0.0005942060379311442, 0.002047574147582054, -0.011033968068659306, -0.0005020651733502746, 0.03021891787648201, 0.002753577660769224, -0.005900289863348007, -0.0019296339014545083, 0.010273908264935017, -0.013825220987200737, -0.0035218275152146816, 0.0018133316189050674, 0.016590265557169914, 0.026864172890782356, 0.010647385381162167, -0.039916228502988815, -0.004085320048034191, 0.02778148651123047, 0.0068274312652647495, 0.008413072675466537, 0.02109820395708084, 0.014323190785944462, -0.014152832329273224, 0.0055137076415121555, -0.012062668800354004, -0.012468907982110977, 0.0016618111403658986, -0.01870008558034897, -0.02088853344321251, -0.024688830599188805, 0.016354383900761604, 0.0030320477671921253, 0.013353459537029266, 0.03899891674518585, 0.029354022815823555, -0.0033580216113477945, -0.013471399433910847, 0.0033973350655287504, -0.016642682254314423, -0.04494834691286087, 0.0021769809536635876, -0.0006802041316404939, 0.005071431864053011, -0.003944446798413992, -0.0073319533839821815, 0.008891385979950428, 0.014925996772944927, 0.007731640245765448, 0.005041946657001972, 0.010306669399142265, 0.004409655928611755, -0.01838557794690132, 0.0027224544901400805, 0.015109458938241005, -0.0054285284131765366, 0.01771724969148636, 0.005202476400882006, 0.008406520821154118, 0.01893596537411213, 0.014834265224635601, 0.026628293097019196, -0.005435080733150244, -0.023234233260154724, ...]"
3,593aa0d5755153a86bb5ad1df76f8eae7eebda58b93a1ff3ef093a16cb6ba985,\n| Google Cloud Services | Link |\n| --------------------- | --------------------------------------------- |\n| Amazon Web Services | Link |\n| Azure File Import | Link |\n\n,"[0.0016193436458706856, -0.016377530992031097, 0.012395649217069149, -0.00898650474846363, -0.014536592178046703, 0.016677534207701683, -0.008952413685619831, -0.02115033194422722, 0.009266055189073086, -0.031091397628188133, 0.02172306925058365, 0.01442749984562397, 0.019582126289606094, 0.0009937656577676535, -0.014018402434885502, 0.006293280981481075, 0.011638819240033627, -0.005904638208448887, -6.956785364309326e-05, -0.022991271689534187, -0.025527674704790115, -0.015300240367650986, -0.0174820926040411, 0.0007521425141021609, -0.0006362316198647022, 0.021409427747130394, 0.03919152542948723, -0.035291463136672974, 0.0174139104783535, -0.005052352324128151, 0.01749572902917862, -0.02953682839870453, -0.015886614099144936, 0.0014744549989700317, -0.002890954492613673, -0.022609446197748184, 0.0010704713640734553, -0.01052062027156353, 0.007397843524813652, -0.007186476606875658, 0.01815028488636017, 0.0031517541501671076, 0.021259425207972527, 0.0011795640457421541, -0.014632048085331917, 0.02226853184401989, 0.007997852750122547, -0.028554994612932205, 0.0011983143631368876, 0.017318453639745712, 0.002129010856151581, 0.03630056977272034, -0.04453706368803978, -0.0006170551641844213, -0.011291086673736572, -0.01198655180633068, -0.03346416354179382, 0.013588850386440754, -0.0011966096935793757, -0.00571713550016284, 0.00022756039106752723, 0.00870013702660799, -0.02475038915872574, -0.007861487567424774, -0.01797300949692726, -0.01235473994165659, -0.014045675285160542, 0.011502454057335854, 0.008488769643008709, -0.006848971359431744, 0.033982351422309875, 0.00868650060147047, 0.00690351752564311, -0.011066082864999771, 0.029700467362999916, -0.03679148852825165, -0.00584668293595314, -0.018804840743541718, -0.004725074395537376, 0.003998926375061274, -0.008113764226436615, 0.00048196781426668167, 0.013057023286819458, -0.0078273955732584, 0.0187230221927166, 0.013302481733262539, -0.004367114044725895, 0.003978471737354994, -0.008372859098017216, -0.00674669723957777, 0.01801392063498497, 0.03188231959939003, 0.008870594203472137, 0.02042759396135807, 0.012470650486648083, 0.02355037070810795, -0.016513895243406296, 0.0468280091881752, 0.007261477876454592, -0.03436417877674103, ...]"
4,a3a5a89d8e0e85004e34009481dd9f524caec7c71b7d8935be9809c8c21dca46,\n| Sending 10 Million Inferences to Arize in 90 Seconds | Colab Link |\n| ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |\n\n,"[-0.008373280055820942, -0.010231654159724712, -0.005843630991876125, 0.0013142870739102364, 0.02358650602400303, 0.01851307600736618, -0.023091882467269897, 0.01690201461315155, -0.02050570398569107, -0.022342879325151443, 0.016364993527531624, 0.027529368177056313, -0.01331245619803667, -0.027557631954550743, -0.0047660572454333305, 0.007334569934755564, 0.011623667553067207, 0.0026020763907581568, 0.01359509862959385, -0.0035930911544710398, -0.01936100237071514, 0.0008969479240477085, -0.007673740386962891, 0.019120756536722183, -0.005052232649177313, 0.00988541729748249, 0.02437790483236313, -0.03154288977384567, -0.002531415782868862, -0.023120146244764328, 0.013333654031157494, 0.005606918130069971, -0.0007569516310468316, -0.013524438254535198, -0.026497723534703255, 0.015474670566618443, 0.0044480846263468266, 0.00541260140016675, 0.018823983147740364, -0.018640264868736267, 0.024010470137000084, -0.018852246925234795, -0.000633295567240566, -0.024561623111367226, -0.002796393120661378, 0.020237194374203682, -0.012238414958119392, -0.025819381698966026, -0.009150546975433826, -0.005924890749156475, 0.027133667841553688, 0.02894257940351963, -0.02861754037439823, -0.0027504635509103537, -0.004942708648741245, -0.0011650165542960167, -0.0034623690880835056, 0.016393257305026054, 0.013566833920776844, 0.014132118783891201, 0.02478773705661297, -0.01649218238890171, -0.036517396569252014, 0.008535799570381641, -0.01408972218632698, -0.003653152845799923, -0.007327503524720669, 0.0252823606133461, 0.006048547104001045, -0.018993567675352097, 0.024971453472971916, 0.018230432644486427, 0.02003934420645237, -0.036432601511478424, 0.024505093693733215, -0.0012595250736922026, 0.003559527453035116, -0.01635086163878441, -0.0116166016086936, 0.0131993992254138, 0.019007699564099312, -0.014669139869511127, -0.025861777365207672, -0.006656228099018335, 0.024745339527726173, -0.011856847442686558, 0.005917824804782867, 0.003981724381446838, -0.004299697000533342, -0.013609230518341064, 0.01933273859322071, 0.015827972441911697, 0.011750857345759869, 0.005822433158755302, -0.0005639598821289837, 0.03524550423026085, -0.020406778901815414, 0.0328713096678257, -0.00520061980932951, -0.012196018360555172, ...]"


The columns of your dataframe are:
- **text:** the chunked text in your knowledge base
- **text_vector:** the embedding vector for the text, computed during the LlamaIndex build using "text-embedding-ada-002" from OpenAI

💭 Your query data is saved in a buffer on the callback handler you defined in step 4.

In [8]:
query_data_buffer = callback_handler.flush_query_data_buffer()
sample_query_df = as_dataframe(query_data_buffer)
sample_query_df

Unnamed: 0,:id.id:,:timestamp.iso_8601:,:feature.text:prompt,:feature.[float].embedding:prompt,:prediction.text:response,:feature.[str].retrieved_document_ids:prompt,:feature.[float].retrieved_document_scores:prompt
0,4b7b62d0-fd55-4c47-83ca-2b35405b0774,2023-07-25T13:10:15.139446,What's the difference between primary and baseline datasets?,"[0.011037665419280529, 0.015439888462424278, 0.006235888227820396, -0.013263748958706856, -0.03145056590437889, 0.017038103193044662, 0.0020958720706403255, 0.0059076836332678795, -0.030480222776532173, -0.03370518982410431, -0.004259525798261166, 0.029937971383333206, -0.01611056737601757, 0.0021636534947901964, -0.005147818475961685, 0.01947823353111744, 0.021418921649456024, -0.013463526032865047, 0.0020387929398566484, -0.01766597293317318, -0.02936718240380287, 0.00017056830984074622, -0.007819832302629948, -0.011765423230826855, 0.004744697827845812, 0.010630976408720016, 9.712917744764127e-06, -0.009047032333910465, 0.007020725402981043, -0.014576567336916924, 0.011130418628454208, 0.00801247451454401, -0.013962967321276665, -0.0013770326040685177, 0.0016231860499829054, 0.000490077305585146, 0.0027219580952078104, 0.0037672186736017466, 0.03844275325536728, 0.0109021021053195, 0.0012691174633800983, 0.03182158246636391, -0.004330874420702457, -0.002663095248863101, 0.021176334470510483, 0.015297191217541695, 0.01262874435633421, -0.0023652140516787767, -0.02498636208474636, 0.015825172886252403, 0.0009676686022430658, 0.010638111270964146, -0.023830512538552284, -0.00605751620605588, -0.0198349766433239, -0.010224288329482079, -0.00853332132101059, 0.007527302484959364, 0.003763651242479682, -0.004577028099447489, -0.0010720163118094206, 0.010531088337302208, -0.01722360961139202, 0.010110130533576012, -0.02655603736639023, -0.019749358296394348, 0.005422511603683233, 0.017694512382149696, 0.002345592947676778, 0.011051935143768787, 0.013513470068573952, 0.01655293069779873, -0.012336214073002338, 0.004473572131246328, 0.024272873997688293, -0.01939261332154274, 0.0009542906773276627, -0.006043246481567621, 0.010017376393079758, 0.012157841585576534, 0.01211503241211176, -0.006482041906565428, -0.025271758437156677, -0.012521721422672272, 0.015982139855623245, 0.027626270428299904, 0.009938892908394337, 0.01758035272359848, 0.009204000234603882, -0.00178818020503968, 0.007719944231212139, 0.009104111231863499, 0.04834597185254097, 0.009674902074038982, -0.01973508857190609, 0.0046055675484240055, 0.012443237006664276, 0.017437655478715897, -0.008961413986980915, -0.018136873841285706, ...]","The primary dataset refers to the current data that you are working with or evaluating. It is the dataset that you want to compare and analyze in terms of model performance or any other metrics.\n\nOn the other hand, a baseline dataset is a reference dataset that is used for comparison with the primary dataset. It can be from previous periods in production, training data, validation data, or any other relevant dataset. The purpose of a baseline dataset is to provide a benchmark or reference point to measure the performance of the primary dataset against. It helps in identifying any changes or improvements in the model's performance and understanding the root causes of any performance degradation.\n\nIn summary, the primary dataset is the current data being analyzed, while the baseline dataset is a reference dataset used for comparison and benchmarking purposes.","[c55dc0b1919e3dc9e51077722110a540e5f201f65105d3c5f51a3aab1cce875e, 2023685f6dc07bf2bc98d7c6aa810ffd7a798147a38ea3ec7e006df60f2050c1]","[0.851456493535011, 0.8494208316976325]"
1,fd6050d4-15aa-4eef-a706-e1473111cc2a,2023-07-25T13:10:19.638752,How do I send in extra metadata with each record?,"[-0.0028620215598493814, -0.006446495186537504, 0.0014466408174484968, -0.02693912573158741, -0.00015553993580397218, 0.02830067090690136, -0.020048044621944427, -0.02635560743510723, -0.01932559348642826, -0.014351787976920605, 0.007266200613230467, 0.02756432443857193, -0.004560478497296572, -0.006696574855595827, 0.005244723986834288, 0.009801729582250118, 0.007842772640287876, 0.01503256056457758, 0.03264927491545677, -0.009850355796515942, -0.005449650343507528, 0.011864885687828064, -0.00842629186809063, 0.01660250499844551, -0.010746474377810955, 0.006463862024247646, 0.023896491155028343, 2.5344434106955305e-05, 0.0026154154911637306, 0.01085762120783329, 0.016032878309488297, 0.01514370646327734, -0.03178789094090462, -0.01638021133840084, -0.027328139171004295, -0.005869923159480095, -0.008579118177294731, 0.007377346977591515, 0.028870295733213425, -0.006495121866464615, -0.0014249325031414628, 0.007189787458628416, -0.00243827560916543, -0.010802047327160835, -0.004039479419589043, 0.018047409132122993, 0.009808676317334175, -0.023785345256328583, -0.0011514079524204135, 0.006575008388608694, 0.0061790491454303265, -0.007064747624099255, -0.043708350509405136, -0.014963094145059586, -0.0007858402677811682, 0.0039248596876859665, 0.00700222747400403, 0.032037969678640366, 0.0013467826647683978, -0.01818634197115898, 0.01799183525145054, -0.0005236041033640504, -0.01142724696546793, -0.006602794863283634, -0.008898664265871048, -0.012663750909268856, -0.010169901885092258, -0.00977394264191389, -0.015199280343949795, -0.005150944460183382, 0.014935307204723358, 0.02421603724360466, -0.01203160546720028, 0.004171465989202261, 0.006324928719550371, -0.024257717654109, -0.012552605010569096, 2.0785691958735697e-05, 0.014601867645978928, 0.00837071891874075, 0.007870559580624104, -0.0065055419690907, -0.010378302074968815, 0.032037969678640366, 0.026438966393470764, 0.011607859283685684, 0.020395377650856972, 0.013497349806129932, 0.01451850775629282, -0.014976986683905125, 0.010683954693377018, 0.03095429204404354, 0.049043379724025726, 0.011670379899442196, 0.0011496712686493993, 0.012927724048495293, 0.012927724048495293, 0.04890444874763489, 0.0035063237883150578, 0.00930851697921753, ...]","To send in extra metadata with each record, you need to add an entry in your container metadata. Follow these steps:\n\n1. In the Arize UI, copy the value of `arize_ingestion_key`.\n2. In the Azure UI, navigate to your Container and go to Settings.\n3. Click on Metadata.\n4. Add a new entry with the key as `arize_ingestion_key` and paste the copied value as the tag value.\n\nBy adding this entry in the container metadata, Arize will be able to associate the extra metadata with each record.\n\nAdditionally, to access the custom metric editor, follow these steps:\n\n1. Go to the top navigation bar of any model in the Arize UI.\n2. Click on 'Custom Metrics' to reveal the editor.\n3. From there, you can define and configure custom metrics for your model.","[9e422f968dafc34e58ed42a7f854acaca783b8a9bdf3f4473f291586da127590, e7d446da74138324a7649f913245190fb8e48772ea8f06a207f48568392b4faf]","[0.7536101922434472, 0.7528422288110171]"
2,8c8de20d-cd63-44a6-b19f-42afc4c88070,2023-07-25T13:10:26.501630,How does Arize's surrogate explainability model work?,"[0.005659774411469698, -0.006679392419755459, -0.013258611783385277, -0.02233857847750187, -0.007001377176493406, 0.00813547894358635, 0.01830303855240345, 0.00978833343833685, -0.046480271965265274, -0.02979431301355362, 0.013215680606663227, 0.015011639334261417, -0.03162604570388794, 0.007126593496650457, -0.018274417147040367, 0.008400221355259418, 0.02647429332137108, 0.009194450452923775, 0.020277876406908035, -0.0064468481577932835, -0.013594906777143478, -0.005956715904176235, -0.025243595242500305, 0.0019193862099200487, 0.010618338361382484, 0.03943954035639763, 0.016113542020320892, -0.013773787766695023, 0.016514234244823456, -0.0004324432520661503, 0.018503384664654732, 0.03328605741262436, -0.020835984498262405, -0.014246031641960144, -0.024442212656140327, 0.0037457547150552273, 0.022109612822532654, -0.0012754169292747974, 0.013945512473583221, -0.006579219363629818, 0.02455669455230236, -0.02777654118835926, 0.00431817164644599, -0.010632648132741451, 0.00420726602897048, 0.011705930344760418, -0.003613383276388049, -0.014131547883152962, -0.014331893995404243, -0.010832994244992733, 0.034259166568517685, 0.007065773941576481, -0.042702317237854004, 0.010775752365589142, -0.01655716635286808, -0.013265767134726048, -0.009194450452923775, 0.019447872415184975, 0.02169460989534855, -0.007269697729498148, 0.04885580390691757, -0.00988850649446249, -0.01177032757550478, 0.023240135982632637, -0.007999529130756855, 0.009709625504910946, 0.006346675101667643, 0.010747131891548634, 0.0074128019623458385, 0.004819036927074194, 0.03780815377831459, 0.02004891075193882, -0.013201369903981686, -0.021923577412962914, 0.024699799716472626, -0.014560860581696033, -0.03088190406560898, -0.02233857847750187, -0.021222366020083427, 0.013780943118035793, 0.0002468549064360559, 0.004446965642273426, -0.014725430868566036, 0.005112400744110346, 0.018374590203166008, -0.005459428299218416, 0.017344240099191666, 0.020478222519159317, -0.012056536041200161, 0.01803114078938961, 0.020921846851706505, 0.011670154519379139, 0.00519468542188406, 0.015483883209526539, -0.01039652619510889, 0.008958328515291214, 0.005967448465526104, 0.021136503666639328, -0.01222826074808836, -0.017215445637702942, ...]","Arize's surrogate explainability model works by training an interpretable surrogate model to mimic the predictions of a black box model. The surrogate model approximates the predictions of the black box model as closely as possible. When the black box model is not available, SHAP values can be generated from the surrogate model. \n\nTo enable surrogate explainability in Arize, the user can pass a flag with their request using the Arize Python SDK. When the flag is enabled, a tree-based surrogate model is trained using the dataset's features and predictions. SHAP values are then generated from the surrogate model. The pandas logger in the Arize Python SDK can compute surrogate models for regression and binary classification models.\n\nThe feature importance values generated from the surrogate model are sent to the Arize platform using the Pandas logger. This allows users to monitor, troubleshoot, and explain the performance of their models. The surrogate explainability approach provides insights into the importance of different features in the model's predictions.","[e2691423028af6ef04b449d3d5404449b6bb931a1282c6e6dc3051611efebf96, 3f24f308b645d33f736d3d012f9e3a99a077a246eee448eae3c3de277277d26f]","[0.8546762236587125, 0.8425211049722136]"
3,b6e0bd1b-4b06-44d3-ba90-711767713897,2023-07-25T13:12:18.099817,How do I use the SDK to upload a ranking model?,"[-0.009536871686577797, 0.008512362837791443, 0.013284236192703247, -0.0024701666552573442, 0.009846286848187447, 0.02992391213774681, -0.019211260601878166, -0.019087495282292366, 0.005610733292996883, -0.03492956608533859, 0.02467072568833828, -0.013057331554591656, -0.016777193173766136, -0.01948629692196846, -0.022594204172492027, -0.00867738388478756, 0.013456134125590324, 0.003432792378589511, 0.008017297834157944, 0.0022381050512194633, -0.016515908762812614, 0.017973599955439568, 0.000521708803717047, -0.014439387246966362, -0.011668399907648563, 0.015250743366777897, 0.021700337529182434, -0.01339425053447485, 0.01630963198840618, 0.002007762435823679, 0.001018492504954338, 0.011455247178673744, -0.021095257252454758, -0.008017297834157944, -0.018331145867705345, 0.012892310507595539, -0.009103690274059772, 0.008443604223430157, 0.01030009612441063, -0.02304801344871521, 0.020943988114595413, 0.018523670732975006, 0.0022191964089870453, -0.022319167852401733, -0.004476210102438927, 0.0006446155020967126, 0.005782630760222673, -0.02347431890666485, -0.011572137475013733, -0.005009092390537262, -0.021219024434685707, 0.01405433751642704, -0.028026163578033447, -0.021507810801267624, -0.0272835660725832, 0.0057860687375068665, 0.003264332888647914, 0.032564256340265274, 0.01922501251101494, -0.009358098730444908, 0.020146382972598076, -0.004098035860806704, -0.01346301008015871, -0.02137029357254505, 0.009543747641146183, -0.006717753130942583, -0.0023498383816331625, -0.008113560266792774, -0.00468248687684536, 0.0010442771017551422, 0.03674480319023132, 0.016515908762812614, -0.014274366199970245, -0.00344310631044209, 0.02953886054456234, 0.014893196523189545, -0.007095927372574806, -0.0130229527130723, -0.0029514795169234276, -0.003675167914479971, 0.00533913541585207, -0.016268376260995865, -0.00961250625550747, 0.0016949090640991926, 0.01498945988714695, 0.003070088801905513, 0.02622467838227749, 0.03297680988907814, -0.00771475862711668, -0.018193628638982773, 0.03839501738548279, -0.0005238575395196676, 0.0015066813211888075, -0.022277912124991417, -0.025812124833464622, -0.011936560273170471, 0.019582560285925865, 0.01680469699203968, 0.014893196523189545, -0.009591879323124886, ...]","To use the SDK to upload a ranking model, you can follow these steps:\n\n1. Install the SDK: Start by installing the SDK for the programming language you are using. The SDK provides the necessary tools and libraries to interact with the Arize platform.\n\n2. Import the SDK: Import the SDK into your project by adding the appropriate import statements or dependencies.\n\n3. Authenticate: Use your Arize API key to authenticate your SDK client. This key can be obtained from your Arize account.\n\n4. Create a model: Use the SDK to create a model object that represents your ranking model. Provide the necessary information such as the model name, description, and any other relevant details.\n\n5. Upload the model: Use the SDK's upload function to upload your ranking model to the Arize platform. This function typically takes the model object created in the previous step as a parameter.\n\n6. Verify the upload: After the upload is complete, you can verify the success of the operation by checking the response from the upload function. It should indicate whether the upload was successful or if any errors occurred.\n\nBy following these steps, you should be able to use the SDK to upload your ranking model to the Arize platform. This will enable you to track and monitor the performance of your model and make improvements as needed.","[096f842ed9e4be0a4d61c7e7b3ac9b54d88fa02ebcf1fbe4cc93cc161c84246f, 98797bdfd3ea9102fe5438559c80b04571a95a5f57f3005b60f0e0a60cabcb8b]","[0.8024018669959406, 0.7964861566463088]"
4,6201d0a5-b389-4392-8c8d-b75794e27bce,2023-07-25T13:12:28.255549,What drift metrics are supported in Arize?,"[-0.009346794337034225, -0.00042854511411860585, -0.0009247552370652556, -0.017379986122250557, -0.0047960965894162655, 0.030313927680253983, -0.016629355028271675, 0.006423665676265955, -0.027657851576805115, -0.017495466396212578, 0.01267410907894373, 0.019270997494459152, -0.01836157962679863, -0.005113671068102121, -0.01797182857990265, 0.005756037775427103, 0.0026596863754093647, 0.003597974544391036, 0.02191263996064663, -0.006073612254112959, -0.015792112797498703, -0.017293374985456467, -0.003929984290152788, 0.008646687492728233, -0.011115106754004955, 0.0181161817163229, 0.01463007926940918, -0.005117279943078756, 0.015128093771636486, -0.02220134437084198, 0.04408511146903038, 0.027499062940478325, -0.00033742288360372186, 0.014211458154022694, -0.034817710518836975, -0.011475986801087856, 0.007636222988367081, 0.020497988909482956, 0.028119778260588646, -0.011779126711189747, 0.0045903949066996574, -0.017235632985830307, 0.0013929972192272544, -0.005752428900450468, 0.00744856521487236, 0.011887390166521072, 0.0019072513096034527, -0.00457235099747777, -0.016066381707787514, 0.0014363027876242995, 0.01836157962679863, 0.011937913484871387, -0.030573762953281403, -0.015359057113528252, 0.012652456760406494, -0.01846262626349926, 0.011591468937695026, 0.023904697969555855, -0.001355104730464518, 0.0010952710872516036, 0.0184770617634058, 0.0051641943864524364, -0.02142184227705002, 0.02220134437084198, -0.01135328784584999, 0.012407057918608189, -0.011685297824442387, 0.024020180106163025, -0.010862491093575954, -0.0012973640114068985, 0.03210389241576195, 0.005229152739048004, -0.003307466162368655, -0.007004682905972004, 0.011649210005998611, -0.03346080332994461, 0.006892810110002756, -0.02184046432375908, 0.015950899571180344, 0.023255113512277603, 0.012883419170975685, 0.004918795544654131, -0.021349666640162468, -1.1397326488804538e-05, 0.005903998389840126, 0.020815564319491386, 0.007686746306717396, 0.011512075550854206, 0.004507392644882202, 0.0072320373728871346, 0.020194850862026215, 0.0012766133295372128, 0.016817012801766396, 0.0055755977518856525, 0.0018801853293552995, 0.0036990209482610226, 0.012399840168654919, -0.0030837205704301596, -0.00024336851492989808, -0.006593279540538788, ...]","Arize supports various drift metrics including Population Stability Index, KL Divergence, and Wasserstein Distance. These metrics are used to measure distributional drift between the model's production values and a baseline dataset. Users can configure the baseline to be a pre-production dataset (such as training, test, or validation) or a fixed or moving time period from production. Baselines can be saved in Arize for comparing different versions or environments across different time windows. For more information on baselines, you can visit the provided link.","[4ba270d946086c744bf09cbb8ceaf5180803458c9f6d5f2531dcdcda0991538b, b40b5c5c61d22f0599e00702092bd4966a01689922311c647f2127fb39720c84]","[0.8768752151560731, 0.873443430009533]"
...,...,...,...,...,...,...,...
151,b4aaa3ef-fb2e-496b-9725-b89d02124217,2023-07-25T13:26:09.107017,Do you support IoU for image segmentation?,"[0.012811388820409775, 0.00710085965692997, 0.026178909465670586, -0.02489030919969082, -0.014635773375630379, 0.02400863729417324, -0.005137441214174032, 0.020590458065271378, -0.011129427701234818, -0.014025384560227394, 0.0034893909469246864, 0.027765920385718346, 0.005076402332633734, 0.0022872635163366795, -0.003808149602264166, -0.0045338342897593975, 0.01896275393664837, 0.023981507867574692, 0.018759291619062424, -0.023493196815252304, -0.009745879098773003, 0.010756411589682102, -0.008023225702345371, -0.017253665253520012, -0.00824703462421894, 0.015910807996988297, 0.0314689464867115, -0.019980069249868393, 0.011726252734661102, -0.005659663118422031, 0.027358995750546455, 0.01923403888940811, -0.0046491301618516445, 0.0042557683773338795, 0.013645586557686329, 0.00177691038697958, 0.015490318648517132, 0.006836357526481152, 0.014391617849469185, -0.005985204130411148, 0.011753381229937077, 0.02082104980945587, 0.00015100771270226687, -0.025432879105210304, -0.026138216257095337, 0.006643067579716444, -0.01007820200175047, -0.004889894742518663, -0.012947030365467072, 0.025405749678611755, 0.012262037955224514, 0.031251922249794006, -0.01740287058055401, -0.0217434149235487, -0.005778349936008453, 0.012485847808420658, 0.004571136087179184, 0.002721318043768406, 0.00360468658618629, -0.04530443251132965, 0.022543704137206078, -0.005446027033030987, -0.024754667654633522, 0.007921493612229824, -0.01539536938071251, -0.008925245143473148, -0.019220473244786263, 0.014961314387619495, 0.024144278839230537, -0.03350357711315155, 0.028973134234547615, 0.03741006925702095, -0.025419313460588455, -0.012187435291707516, 0.04175061360001564, -0.02440199814736843, -0.00908801518380642, -0.023072706535458565, -0.008430151268839836, -0.002585676033049822, 0.018325235694646835, -0.015164777636528015, -0.0045134881511330605, 0.01377444714307785, 0.006510816980153322, 0.02527010813355446, -0.007216155529022217, 0.030221041291952133, -0.0013250529300421476, -0.023018449544906616, 0.0256092119961977, 0.01477141585201025, 0.02840343862771988, -0.0045134881511330605, -0.006175102666020393, 0.020359866321086884, -0.009467813186347485, 0.0073314509354531765, 0.01652119681239128, 0.0001780301536200568, ...]","Based on the given context information, it is not explicitly mentioned whether Arize supports IoU (Intersection over Union) for image segmentation. To get a definitive answer, it is recommended to refer to the provided link or contact Arize directly for more information.","[3324b6c702a3066995f3fc2972550faf67b5447c94a4facce13526834bcd2373, 42393552362b66b5afb31244d564a4d511c460734678121b6e3011d1a3ce6795]","[0.7497239928821453, 0.7449576160096634]"
152,6edcdcc3-017e-4b19-822d-4f55f79e66fb,2023-07-25T13:26:11.688980,This is a test question?,"[0.010555022396147251, -0.010006960481405258, 0.0021326045971363783, -0.02446289174258709, -0.042374834418296814, 0.010206841863691807, -0.0032319517340511084, -0.020568430423736572, -0.02202562987804413, -0.03891882300376892, 0.012553835287690163, 0.015294142998754978, -0.004123357590287924, 0.012302371673285961, 0.001898872316814959, 0.009736153297126293, 0.03285790607333183, 0.0009010778157971799, -7.132860628189519e-05, -0.01584865339100361, -0.010987023822963238, 0.0030175629071891308, 0.006628321949392557, 0.021496910601854324, -0.015384412370622158, 0.0050969733856618404, 0.013514555059373379, -0.03293528035283089, 0.016596594825387, -0.01295359805226326, 0.018079586327075958, -0.004407060332596302, -0.021122939884662628, -0.019020961597561836, -0.0143398717045784, 0.0038041924126446247, 0.0012774672359228134, -0.0250947754830122, 0.020400788635015488, -0.005013152025640011, 0.013772466219961643, 0.024321040138602257, -0.002616188256070018, -0.012798851355910301, 0.00932349544018507, -0.007356921210885048, 0.0046810912899672985, -0.028189711272716522, -0.016093669459223747, 0.01641605794429779, 0.026281166821718216, 0.00311427959240973, -0.034302208572626114, -0.004145924933254719, -0.005412914790213108, -0.004291000310331583, -0.0011396458139643073, 0.020052608102560043, 0.001990753225982189, 0.014739633537828922, -0.022915424779057503, 0.016596594825387, -0.0040621040388941765, 0.019833384081721306, 0.015242560766637325, -0.0039073568768799305, -0.003488251008093357, 0.013398494571447372, -0.00513566005975008, -0.0069378153420984745, 0.0029369655530899763, 0.021226104348897934, 0.0052130334079265594, 0.006866890005767345, 0.008182237856090069, 0.0022567242849618196, -0.006963606458157301, -0.006583187263458967, -0.005922289565205574, 0.002240604953840375, 0.011838131584227085, -0.02220616675913334, -0.020400788635015488, 0.017731405794620514, -0.00849817879498005, -0.0031110558193176985, 0.01326953899115324, 0.03063986822962761, -0.0116446977481246, -0.004690762609243393, -0.0014338259352371097, 0.017615346238017082, 0.020323414355516434, 0.011135322973132133, -0.014791215769946575, 0.024076025933027267, 0.010426066815853119, 0.02096819318830967, -2.0542865968309343e-06, -0.02027183212339878, ...]",,"[456b19d2d5f803943e3daf41d5d9b29e41fe2cbbd27a514ca22c59ae94d7aecc, a062dd4952b935e430d409c4a5e5cd760e8781e3200d5c7e12a232dddba8a528]","[0.8015116877694872, 0.7831292562752485]"
153,7b61f1af-ef0e-4046-bef1-fdfa8e544438,2023-07-25T13:26:12.101673,?,"[-0.004491159692406654, -0.02421266958117485, -0.009229766204953194, -0.026996446773409843, -0.01248987764120102, 0.014166329056024551, -0.010368021205067635, -0.012867233715951443, -0.0052458723075687885, -0.007782201748341322, 0.038750167936086655, 0.00754094123840332, -0.0007206888403743505, -0.00881529226899147, 0.009885500185191631, 0.02004319056868553, 0.022406307980418205, -0.003226087661460042, 0.006025329697877169, -0.027144914492964745, -0.0107206329703331, -0.018632743507623672, 0.0011367087718099356, -0.01365906372666359, -0.011549579910933971, 0.0067924149334430695, 0.018372924998402596, -0.02583344653248787, 0.0166407972574234, -0.012168196961283684, 0.008376074023544788, 0.011339250020682812, -0.009823638014495373, -0.004695303272455931, -0.033083636313676834, -0.01698722317814827, 0.010751564055681229, -0.0140920951962471, 0.005830465350300074, 0.007336797658354044, 0.02017928659915924, 0.0065449681133031845, 0.006483106408268213, -0.02087213844060898, -0.018867818638682365, -0.016046924516558647, -0.01295384019613266, -0.014104467816650867, -0.0307328924536705, 0.020686551928520203, 0.0230867862701416, 0.011085616424679756, 0.004726233892142773, -0.015663383528590202, -0.005626321770250797, -0.006891393568366766, -0.011017569340765476, 0.00452827662229538, -0.005267523694783449, 0.0026600530836731195, 0.016294371336698532, 0.00824616476893425, -0.012366154231131077, 0.029965808615088463, -0.002367756562307477, -0.0061552394181489944, -0.021750574931502342, 0.01706145703792572, 0.008939015679061413, -0.019498808309435844, 0.03385072201490402, 0.021268052980303764, 0.012415643781423569, -0.0030544213950634003, 0.021676339209079742, -0.022134115919470787, -0.005378874950110912, -0.007138840388506651, -0.018855446949601173, 6.21516810497269e-05, 0.029718361794948578, -0.03773563727736473, -0.027021190151572227, 0.020067935809493065, 0.009588563814759254, 0.0034518828615546227, 0.013609574176371098, 0.017543978989124298, 0.0036622127518057823, -0.0013021888444200158, 0.0003858623676933348, 0.017791425809264183, 0.022888829931616783, 0.00793066993355751, -0.008209047839045525, 0.030114276334643364, -0.015663383528590202, 0.02746659517288208, 0.003210622351616621, -0.016331488266587257, ...]",,"[456b19d2d5f803943e3daf41d5d9b29e41fe2cbbd27a514ca22c59ae94d7aecc, a062dd4952b935e430d409c4a5e5cd760e8781e3200d5c7e12a232dddba8a528]","[0.8490410848844393, 0.845363099566625]"
154,a968bd30-f128-4c56-afda-7c09f74a78a8,2023-07-25T13:26:12.537027,This is a question?,"[0.022363832220435143, -0.01064034178853035, 0.005307428073137999, -0.019114386290311813, -0.04113416001200676, 0.01489647850394249, -0.003991721197962761, -0.019012443721294403, -0.0054858289659023285, -0.021318912506103516, 0.012239579111337662, 0.026275910437107086, -0.006791978608816862, 0.014437733218073845, 0.006760121323168278, 0.010570256039500237, 0.028620608150959015, 0.020809195935726166, 0.004753110930323601, -0.028110891580581665, -0.01518956571817398, -0.00443453760817647, 0.0215610284358263, 0.006040146108716726, 0.0005117080872878432, 0.001467029214836657, 0.021331654861569405, -0.036215391010046005, 0.013749615289270878, -0.010927057825028896, 0.0012057991698384285, -0.0004487898841034621, -0.018388040363788605, -0.029640043154358864, -0.014131903648376465, 0.0021121397148817778, -0.01241160836070776, -0.016986317932605743, 0.0035648331977427006, -0.0037496055010706186, 0.01924181543290615, 0.02492515929043293, -0.0022156760096549988, 0.009595422074198723, 0.008034413680434227, -0.016145285218954086, -0.003389617893844843, -0.03524692729115486, -0.0234852097928524, 0.021038567647337914, 0.026913056150078773, 0.015482652932405472, -0.023791039362549782, -0.013341842219233513, -0.006881179288029671, 0.0002162314922316, -0.01559733971953392, 0.017202947288751602, -0.0015809190226718783, 0.012806639075279236, 0.004679839126765728, 0.01964958943426609, -0.017050033435225487, 0.009824794717133045, 0.017177462577819824, -0.00194329593796283, -0.013953502289950848, 0.010563883930444717, -0.007301695644855499, -0.007378153037279844, 0.019088901579380035, 0.03432943671941757, -0.0020086034201085567, 0.002129661152139306, 0.008595102466642857, 0.0011428809957578778, -0.018719356507062912, -0.016540315002202988, -0.0004667096072807908, 4.746241756947711e-05, -0.0031586524564772844, -0.02532019093632698, -0.01601785607635975, 0.014233846217393875, -0.007352667395025492, -0.006336419377475977, -0.01155783236026764, 0.0358840748667717, -0.009620907716453075, -0.0009071369422599673, -0.005992360413074493, 0.014055445790290833, 0.013991731218993664, 0.010506541468203068, -0.005412557162344456, 0.03007330186665058, -0.007269838359206915, 0.0168079175055027, 0.006250404287129641, -0.019025186076760292, ...]",,"[456b19d2d5f803943e3daf41d5d9b29e41fe2cbbd27a514ca22c59ae94d7aecc, a062dd4952b935e430d409c4a5e5cd760e8781e3200d5c7e12a232dddba8a528]","[0.8035149329039903, 0.7885423714018287]"


The columns of the dataframe are:
- **text:** the query text
- **text_vector:** the embedding representation of the query, captured from LlamaIndex at query time
- **response:** the final response from the LlamaIndex application
- **context_text_0:** the first retrieved context from the knowledge base
- **context_similarity_0:** the cosine similarity between the query and the first retrieved context
- **context_text_1:** the second retrieved context from the knowledge base
- **context_similarity_1:** the cosine similarity between the query and the first retrieved context

Download a dataframe containing query data.

In [40]:
query_df = pd.read_parquet(
    # "http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/llm/llama-index/arize-docs/query_data_complete.parquet",
    "query_data_complete.parquet"
)
query_df.head()

Unnamed: 0,index,:id.id:,:timestamp.iso_8601:,:feature.text:prompt,:feature.[float].embedding:prompt,:prediction.text:response,:feature.[str].retrieved_document_ids:prompt,:feature.[float].retrieved_document_scores:prompt,retrieved_document_text_0,openai_relevance_0,retrieved_document_text_1,openai_relevance_1,user_feedback
0,0,4b7b62d0-fd55-4c47-83ca-2b35405b0774,2023-07-25T13:10:15.139446,What's the difference between primary and baseline datasets?,"[0.011037665419280529, 0.015439888462424278, 0.006235888227820396, -0.013263748958706856, -0.03145056590437889, 0.017038103193044662, 0.0020958720706403255, 0.0059076836332678795, -0.030480222776532173, -0.03370518982410431, -0.004259525798261166, 0.029937971383333206, -0.01611056737601757, 0.0021636534947901964, -0.005147818475961685, 0.01947823353111744, 0.021418921649456024, -0.013463526032865047, 0.0020387929398566484, -0.01766597293317318, -0.02936718240380287, 0.00017056830984074622, -0.007819832302629948, -0.011765423230826855, 0.004744697827845812, 0.010630976408720016, 9.712917744764127e-06, -0.009047032333910465, 0.007020725402981043, -0.014576567336916924, 0.011130418628454208, 0.00801247451454401, -0.013962967321276665, -0.0013770326040685177, 0.0016231860499829054, 0.000490077305585146, 0.0027219580952078104, 0.0037672186736017466, 0.03844275325536728, 0.0109021021053195, 0.0012691174633800983, 0.03182158246636391, -0.004330874420702457, -0.002663095248863101, 0.021176334470510483, 0.015297191217541695, 0.01262874435633421, -0.0023652140516787767, -0.02498636208474636, 0.015825172886252403, 0.0009676686022430658, 0.010638111270964146, -0.023830512538552284, -0.00605751620605588, -0.0198349766433239, -0.010224288329482079, -0.00853332132101059, 0.007527302484959364, 0.003763651242479682, -0.004577028099447489, -0.0010720163118094206, 0.010531088337302208, -0.01722360961139202, 0.010110130533576012, -0.02655603736639023, -0.019749358296394348, 0.005422511603683233, 0.017694512382149696, 0.002345592947676778, 0.011051935143768787, 0.013513470068573952, 0.01655293069779873, -0.012336214073002338, 0.004473572131246328, 0.024272873997688293, -0.01939261332154274, 0.0009542906773276627, -0.006043246481567621, 0.010017376393079758, 0.012157841585576534, 0.01211503241211176, -0.006482041906565428, -0.025271758437156677, -0.012521721422672272, 0.015982139855623245, 0.027626270428299904, 0.009938892908394337, 0.01758035272359848, 0.009204000234603882, -0.00178818020503968, 0.007719944231212139, 0.009104111231863499, 0.04834597185254097, 0.009674902074038982, -0.01973508857190609, 0.0046055675484240055, 0.012443237006664276, 0.017437655478715897, -0.008961413986980915, -0.018136873841285706, ...]","The primary dataset refers to the current data that you are working with or evaluating. It is the dataset that you want to compare and analyze in terms of model performance or any other metrics.\n\nOn the other hand, a baseline dataset is a reference dataset that is used for comparison with the primary dataset. It can be from previous periods in production, training data, validation data, or any other relevant dataset. The purpose of a baseline dataset is to provide a benchmark or reference point to measure the performance of the primary dataset against. It helps in identifying any changes or improvements in the model's performance and understanding the root causes of any performance degradation.\n\nIn summary, the primary dataset is the current data being analyzed, while the baseline dataset is a reference dataset used for comparison and benchmarking purposes.","[c55dc0b1919e3dc9e51077722110a540e5f201f65105d3c5f51a3aab1cce875e, 2023685f6dc07bf2bc98d7c6aa810ffd7a798147a38ea3ec7e006df60f2050c1]","[0.851456493535011, 0.8494208316976325]","\nA model baseline refers to a reference dataset used for comparison with your current data. A model baseline can be from training, validation, or previous periods in production. Baselines help compare model changes, enable detailed analysis, and help identify the root causes of performance degradation. **A model baseline is required in Arize**.\n\nBy default, model baselines use a moving time range from your model's **production data**. This data spans a period of two weeks and is delayed by three days. However, you have the flexibility to choose a baseline from either production data or pre-production data, which includes both training and validation sets.\n\nProduction Baseline#production-baselinePre-Production Baseline#pre-production-baselines\n\n",irrelevant,"\nA baseline is the reference data or benchmark used to compare model performance against for monitoring purposes. Baselines can be training data, validation data, prior time periods of production data, a previous model version, among others.\n\n",irrelevant,-1.0
1,1,fd6050d4-15aa-4eef-a706-e1473111cc2a,2023-07-25T13:10:19.638752,How do I send in extra metadata with each record?,"[-0.0028620215598493814, -0.006446495186537504, 0.0014466408174484968, -0.02693912573158741, -0.00015553993580397218, 0.02830067090690136, -0.020048044621944427, -0.02635560743510723, -0.01932559348642826, -0.014351787976920605, 0.007266200613230467, 0.02756432443857193, -0.004560478497296572, -0.006696574855595827, 0.005244723986834288, 0.009801729582250118, 0.007842772640287876, 0.01503256056457758, 0.03264927491545677, -0.009850355796515942, -0.005449650343507528, 0.011864885687828064, -0.00842629186809063, 0.01660250499844551, -0.010746474377810955, 0.006463862024247646, 0.023896491155028343, 2.5344434106955305e-05, 0.0026154154911637306, 0.01085762120783329, 0.016032878309488297, 0.01514370646327734, -0.03178789094090462, -0.01638021133840084, -0.027328139171004295, -0.005869923159480095, -0.008579118177294731, 0.007377346977591515, 0.028870295733213425, -0.006495121866464615, -0.0014249325031414628, 0.007189787458628416, -0.00243827560916543, -0.010802047327160835, -0.004039479419589043, 0.018047409132122993, 0.009808676317334175, -0.023785345256328583, -0.0011514079524204135, 0.006575008388608694, 0.0061790491454303265, -0.007064747624099255, -0.043708350509405136, -0.014963094145059586, -0.0007858402677811682, 0.0039248596876859665, 0.00700222747400403, 0.032037969678640366, 0.0013467826647683978, -0.01818634197115898, 0.01799183525145054, -0.0005236041033640504, -0.01142724696546793, -0.006602794863283634, -0.008898664265871048, -0.012663750909268856, -0.010169901885092258, -0.00977394264191389, -0.015199280343949795, -0.005150944460183382, 0.014935307204723358, 0.02421603724360466, -0.01203160546720028, 0.004171465989202261, 0.006324928719550371, -0.024257717654109, -0.012552605010569096, 2.0785691958735697e-05, 0.014601867645978928, 0.00837071891874075, 0.007870559580624104, -0.0065055419690907, -0.010378302074968815, 0.032037969678640366, 0.026438966393470764, 0.011607859283685684, 0.020395377650856972, 0.013497349806129932, 0.01451850775629282, -0.014976986683905125, 0.010683954693377018, 0.03095429204404354, 0.049043379724025726, 0.011670379899442196, 0.0011496712686493993, 0.012927724048495293, 0.012927724048495293, 0.04890444874763489, 0.0035063237883150578, 0.00930851697921753, ...]","To send in extra metadata with each record, you need to add an entry in your container metadata. Follow these steps:\n\n1. In the Arize UI, copy the value of `arize_ingestion_key`.\n2. In the Azure UI, navigate to your Container and go to Settings.\n3. Click on Metadata.\n4. Add a new entry with the key as `arize_ingestion_key` and paste the copied value as the tag value.\n\nBy adding this entry in the container metadata, Arize will be able to associate the extra metadata with each record.\n\nAdditionally, to access the custom metric editor, follow these steps:\n\n1. Go to the top navigation bar of any model in the Arize UI.\n2. Click on 'Custom Metrics' to reveal the editor.\n3. From there, you can define and configure custom metrics for your model.","[9e422f968dafc34e58ed42a7f854acaca783b8a9bdf3f4473f291586da127590, e7d446da74138324a7649f913245190fb8e48772ea8f06a207f48568392b4faf]","[0.7536101922434472, 0.7528422288110171]","\nIn your container metadata, add an entry with the key as `arize_ingestion_key` and the provided tag value.&#x20;\n\n* **In Arize UI:** Copy the `arize_ingestion_key` value.\n* **In Azure UI:** Navigate to your Container -> Settings -> Metadata.\n\nClick on Metadata and fill out the key value pair defined in the Arize UI\n\n",relevant,\nNavigate to the custom metric editor in the top navigation bar of any model. Click on 'Custom Metrics' to uncover the editor.&#x20;\n\n,irrelevant,
2,2,8c8de20d-cd63-44a6-b19f-42afc4c88070,2023-07-25T13:10:26.501630,How does Arize's surrogate explainability model work?,"[0.005659774411469698, -0.006679392419755459, -0.013258611783385277, -0.02233857847750187, -0.007001377176493406, 0.00813547894358635, 0.01830303855240345, 0.00978833343833685, -0.046480271965265274, -0.02979431301355362, 0.013215680606663227, 0.015011639334261417, -0.03162604570388794, 0.007126593496650457, -0.018274417147040367, 0.008400221355259418, 0.02647429332137108, 0.009194450452923775, 0.020277876406908035, -0.0064468481577932835, -0.013594906777143478, -0.005956715904176235, -0.025243595242500305, 0.0019193862099200487, 0.010618338361382484, 0.03943954035639763, 0.016113542020320892, -0.013773787766695023, 0.016514234244823456, -0.0004324432520661503, 0.018503384664654732, 0.03328605741262436, -0.020835984498262405, -0.014246031641960144, -0.024442212656140327, 0.0037457547150552273, 0.022109612822532654, -0.0012754169292747974, 0.013945512473583221, -0.006579219363629818, 0.02455669455230236, -0.02777654118835926, 0.00431817164644599, -0.010632648132741451, 0.00420726602897048, 0.011705930344760418, -0.003613383276388049, -0.014131547883152962, -0.014331893995404243, -0.010832994244992733, 0.034259166568517685, 0.007065773941576481, -0.042702317237854004, 0.010775752365589142, -0.01655716635286808, -0.013265767134726048, -0.009194450452923775, 0.019447872415184975, 0.02169460989534855, -0.007269697729498148, 0.04885580390691757, -0.00988850649446249, -0.01177032757550478, 0.023240135982632637, -0.007999529130756855, 0.009709625504910946, 0.006346675101667643, 0.010747131891548634, 0.0074128019623458385, 0.004819036927074194, 0.03780815377831459, 0.02004891075193882, -0.013201369903981686, -0.021923577412962914, 0.024699799716472626, -0.014560860581696033, -0.03088190406560898, -0.02233857847750187, -0.021222366020083427, 0.013780943118035793, 0.0002468549064360559, 0.004446965642273426, -0.014725430868566036, 0.005112400744110346, 0.018374590203166008, -0.005459428299218416, 0.017344240099191666, 0.020478222519159317, -0.012056536041200161, 0.01803114078938961, 0.020921846851706505, 0.011670154519379139, 0.00519468542188406, 0.015483883209526539, -0.01039652619510889, 0.008958328515291214, 0.005967448465526104, 0.021136503666639328, -0.01222826074808836, -0.017215445637702942, ...]","Arize's surrogate explainability model works by training an interpretable surrogate model to mimic the predictions of a black box model. The surrogate model approximates the predictions of the black box model as closely as possible. When the black box model is not available, SHAP values can be generated from the surrogate model. \n\nTo enable surrogate explainability in Arize, the user can pass a flag with their request using the Arize Python SDK. When the flag is enabled, a tree-based surrogate model is trained using the dataset's features and predictions. SHAP values are then generated from the surrogate model. The pandas logger in the Arize Python SDK can compute surrogate models for regression and binary classification models.\n\nThe feature importance values generated from the surrogate model are sent to the Arize platform using the Pandas logger. This allows users to monitor, troubleshoot, and explain the performance of their models. The surrogate explainability approach provides insights into the importance of different features in the model's predictions.","[e2691423028af6ef04b449d3d5404449b6bb931a1282c6e6dc3051611efebf96, 3f24f308b645d33f736d3d012f9e3a99a077a246eee448eae3c3de277277d26f]","[0.8546762236587125, 0.8425211049722136]","\nArize is the machine learning observability platform for ML practitioners to monitor, troubleshoot, and explain models. Data Science and ML Engineering teams of all sizes (from individuals to enterprises) use Arize to:\n\n* Monitor real-time model performance, with support for delayed ground truth/feedback\n* Root cause model failures/performance degradation using tracing and explainability\n* Conduct multi-model performance comparisons\n* Surface drift, data quality, and model fairness/bias metrics&#x20;\n\n",irrelevant,"\n{% hint style=""info"" %}\nView this page for more information on surrogate model feature importance values. For an example of logging surrogate model feature importance values values, check out the Arize Surrogate Model Feature Importance tutorial.\n{% endhint %}\n\n! (1).png>)\n\nSurrogate explainability is based on the idea of training a surrogate model to mimic a blackbox model, where a surrogate model is an interpretable model trained to approximate the predictions of a black box model as closely as possible. SHAP values can then be generated from the surrogate model when the blackbox model is not available. The goal is to approximate the predictions of the black box model as closely as possible and generate feature importance values from the interpretable surrogate model.\n\nArize Python SDK (specifically the pandas logger) gives the user the option to pass a flag with their request to send data that would produce SHAP values using the surrogate explainability approach. When the flag is enabled, a tree-based surrogate model is trained using the dataset's features and predictions, and SHAP values are generated from the surrogate model before sending the combined dataset to the Arize platform. The pandas logger can compute surrogate models for regression and binary classification models. For binary classification, the prediction score should have values between 0 and 1.&#x20;\n\nIn this Colab we show how to use the `surrogate_explainability` flag without calculating library to generate feature importance values from a surrogate model using only the prediction outputs from a black box model. Both classification and regression examples are provided and feature importance values are sent to Arize using the Pandas logger. The library used to create a surrogate model is only supported for regression and classification models currently.\n\n",relevant,
3,3,b6e0bd1b-4b06-44d3-ba90-711767713897,2023-07-25T13:12:18.099817,How do I use the SDK to upload a ranking model?,"[-0.009536871686577797, 0.008512362837791443, 0.013284236192703247, -0.0024701666552573442, 0.009846286848187447, 0.02992391213774681, -0.019211260601878166, -0.019087495282292366, 0.005610733292996883, -0.03492956608533859, 0.02467072568833828, -0.013057331554591656, -0.016777193173766136, -0.01948629692196846, -0.022594204172492027, -0.00867738388478756, 0.013456134125590324, 0.003432792378589511, 0.008017297834157944, 0.0022381050512194633, -0.016515908762812614, 0.017973599955439568, 0.000521708803717047, -0.014439387246966362, -0.011668399907648563, 0.015250743366777897, 0.021700337529182434, -0.01339425053447485, 0.01630963198840618, 0.002007762435823679, 0.001018492504954338, 0.011455247178673744, -0.021095257252454758, -0.008017297834157944, -0.018331145867705345, 0.012892310507595539, -0.009103690274059772, 0.008443604223430157, 0.01030009612441063, -0.02304801344871521, 0.020943988114595413, 0.018523670732975006, 0.0022191964089870453, -0.022319167852401733, -0.004476210102438927, 0.0006446155020967126, 0.005782630760222673, -0.02347431890666485, -0.011572137475013733, -0.005009092390537262, -0.021219024434685707, 0.01405433751642704, -0.028026163578033447, -0.021507810801267624, -0.0272835660725832, 0.0057860687375068665, 0.003264332888647914, 0.032564256340265274, 0.01922501251101494, -0.009358098730444908, 0.020146382972598076, -0.004098035860806704, -0.01346301008015871, -0.02137029357254505, 0.009543747641146183, -0.006717753130942583, -0.0023498383816331625, -0.008113560266792774, -0.00468248687684536, 0.0010442771017551422, 0.03674480319023132, 0.016515908762812614, -0.014274366199970245, -0.00344310631044209, 0.02953886054456234, 0.014893196523189545, -0.007095927372574806, -0.0130229527130723, -0.0029514795169234276, -0.003675167914479971, 0.00533913541585207, -0.016268376260995865, -0.00961250625550747, 0.0016949090640991926, 0.01498945988714695, 0.003070088801905513, 0.02622467838227749, 0.03297680988907814, -0.00771475862711668, -0.018193628638982773, 0.03839501738548279, -0.0005238575395196676, 0.0015066813211888075, -0.022277912124991417, -0.025812124833464622, -0.011936560273170471, 0.019582560285925865, 0.01680469699203968, 0.014893196523189545, -0.009591879323124886, ...]","To use the SDK to upload a ranking model, you can follow these steps:\n\n1. Install the SDK: Start by installing the SDK for the programming language you are using. The SDK provides the necessary tools and libraries to interact with the Arize platform.\n\n2. Import the SDK: Import the SDK into your project by adding the appropriate import statements or dependencies.\n\n3. Authenticate: Use your Arize API key to authenticate your SDK client. This key can be obtained from your Arize account.\n\n4. Create a model: Use the SDK to create a model object that represents your ranking model. Provide the necessary information such as the model name, description, and any other relevant details.\n\n5. Upload the model: Use the SDK's upload function to upload your ranking model to the Arize platform. This function typically takes the model object created in the previous step as a parameter.\n\n6. Verify the upload: After the upload is complete, you can verify the success of the operation by checking the response from the upload function. It should indicate whether the upload was successful or if any errors occurred.\n\nBy following these steps, you should be able to use the SDK to upload your ranking model to the Arize platform. This will enable you to track and monitor the performance of your model and make improvements as needed.","[096f842ed9e4be0a4d61c7e7b3ac9b54d88fa02ebcf1fbe4cc93cc161c84246f, 98797bdfd3ea9102fe5438559c80b04571a95a5f57f3005b60f0e0a60cabcb8b]","[0.8024018669959406, 0.7964861566463088]","\nRanking models are used by search engines to display query results ranked in the order of the highest relevance. These predictions seek to maximize user actions that are then used to evaluate model performance.&#x20;\n\nThe complexity within a ranking model makes failures challenging to pinpoint as a model’s dimensions expand per recommendation. Notable challenges within ranking models include upstream data quality issues, poor-performing segments, the cold start problem, and more. &#x20;\n\n\n\n",irrelevant,"\n**Use the 'arize-demo-hotel-ranking' model, available in all free accounts, to follow along.**&#x20;\n\n",irrelevant,
4,4,6201d0a5-b389-4392-8c8d-b75794e27bce,2023-07-25T13:12:28.255549,What drift metrics are supported in Arize?,"[-0.009346794337034225, -0.00042854511411860585, -0.0009247552370652556, -0.017379986122250557, -0.0047960965894162655, 0.030313927680253983, -0.016629355028271675, 0.006423665676265955, -0.027657851576805115, -0.017495466396212578, 0.01267410907894373, 0.019270997494459152, -0.01836157962679863, -0.005113671068102121, -0.01797182857990265, 0.005756037775427103, 0.0026596863754093647, 0.003597974544391036, 0.02191263996064663, -0.006073612254112959, -0.015792112797498703, -0.017293374985456467, -0.003929984290152788, 0.008646687492728233, -0.011115106754004955, 0.0181161817163229, 0.01463007926940918, -0.005117279943078756, 0.015128093771636486, -0.02220134437084198, 0.04408511146903038, 0.027499062940478325, -0.00033742288360372186, 0.014211458154022694, -0.034817710518836975, -0.011475986801087856, 0.007636222988367081, 0.020497988909482956, 0.028119778260588646, -0.011779126711189747, 0.0045903949066996574, -0.017235632985830307, 0.0013929972192272544, -0.005752428900450468, 0.00744856521487236, 0.011887390166521072, 0.0019072513096034527, -0.00457235099747777, -0.016066381707787514, 0.0014363027876242995, 0.01836157962679863, 0.011937913484871387, -0.030573762953281403, -0.015359057113528252, 0.012652456760406494, -0.01846262626349926, 0.011591468937695026, 0.023904697969555855, -0.001355104730464518, 0.0010952710872516036, 0.0184770617634058, 0.0051641943864524364, -0.02142184227705002, 0.02220134437084198, -0.01135328784584999, 0.012407057918608189, -0.011685297824442387, 0.024020180106163025, -0.010862491093575954, -0.0012973640114068985, 0.03210389241576195, 0.005229152739048004, -0.003307466162368655, -0.007004682905972004, 0.011649210005998611, -0.03346080332994461, 0.006892810110002756, -0.02184046432375908, 0.015950899571180344, 0.023255113512277603, 0.012883419170975685, 0.004918795544654131, -0.021349666640162468, -1.1397326488804538e-05, 0.005903998389840126, 0.020815564319491386, 0.007686746306717396, 0.011512075550854206, 0.004507392644882202, 0.0072320373728871346, 0.020194850862026215, 0.0012766133295372128, 0.016817012801766396, 0.0055755977518856525, 0.0018801853293552995, 0.0036990209482610226, 0.012399840168654919, -0.0030837205704301596, -0.00024336851492989808, -0.006593279540538788, ...]","Arize supports various drift metrics including Population Stability Index, KL Divergence, and Wasserstein Distance. These metrics are used to measure distributional drift between the model's production values and a baseline dataset. Users can configure the baseline to be a pre-production dataset (such as training, test, or validation) or a fixed or moving time period from production. Baselines can be saved in Arize for comparing different versions or environments across different time windows. For more information on baselines, you can visit the provided link.","[4ba270d946086c744bf09cbb8ceaf5180803458c9f6d5f2531dcdcda0991538b, b40b5c5c61d22f0599e00702092bd4966a01689922311c647f2127fb39720c84]","[0.8768752151560731, 0.873443430009533]","\nDrift monitors measure distribution drift, which is the difference between two statistical distributions.&#x20;\n\nArize offers various distributional drift metrics to choose from when setting up a monitor. Each metric is tailored to a specific use case; refer to this guide to help choose the appropriate metric for various ML use cases.\n\n",relevant,"\nArize calculates drift metrics such as Population Stability Index, KL Divergence, and Wasserstein Distance. Arize computes drift by measuring distribution changes between the model’s production values and a baseline (reference dataset). Users can configure a baseline to be any time window of a:\n\n1. Pre-production dataset (training, test, validation) or\n2. Fixed or moving time period from production (e.g. last 30 days, last 60 days).&#x20;\n\nBaselines are saved in Arize so that users can compare several versions and/or environments against each other across moving or fixed time windows. For more details on baselines, visit here.\n\n",relevant,


In addition to the columns of the previous dataframe, this data has a few additional fields:

- **user_feedback:** approval or rejection from the user (-1 means thumbs down, +1 means thumbs up)
- **openai_relevance_0:** a binary classification (relevant vs. irrelevant) by GPT-4 predicting whether the first retrieved document is relevant to the query
- **openai_relevance_1:** a binary classification (relevant vs. irrelevant) by GPT-4 predicting whether the second retrieved document is relevant to the query

We'll go over how to compute the relevance classifications in section 6.

The query and database datasets are drawn from different distributions; the queries are short questions while the database entries are several sentences to a paragraph. The embeddings from OpenAI's "text-embedding-ada-002" capture these differences and naturally separate the query and context embeddings into distinct regions of the embedding space. When using Phoenix, you want to "overlay" the query and context embedding distributions so that queries appear close to their retrieved context in the Phoenix point cloud. To achieve this, we compute a centroid for each dataset that represents an average point in the embedding distribution and center the two distributions so they overlap.

In [None]:
database_embedding_column_name = "text_vector"
database_centroid = database_df[database_embedding_column_name].mean()
database_df[database_embedding_column_name] = database_df[database_embedding_column_name].apply(
    lambda x: x - database_centroid
)
query_embedding_column_name = ":feature.[float].embedding:prompt"
query_centroid = query_df[query_embedding_column_name].mean()
query_df[query_embedding_column_name] = query_df[query_embedding_column_name].apply(
    lambda x: x - query_centroid
)

## 6. Run LLM-Assisted Evaluations

Cosine similarity and Euclidean distance are reasonable proxies for retrieval quality, but they don't always work perfectly. A novel idea is to use LLMs to measure retrieval quality by simply asking the LLM whether each retrieved document is relevant to the corresponding query.

💭 Use OpenAI to predict whether each retrieved document is relevant or irrelevant to the query.

⚠️ It's strongly recommended to use GPT-4 for evaluations if you have access.

In [38]:
EVALUATION_SYSTEM_MESSAGE = "You will be given a query and a reference text. You must determine whether the reference text contains an answer to the input query. Your response must be binary (0 or 1) and should not contain any text or characters aside from 0 or 1. 0 means that the reference text does not contain an answer to the query. 1 means the reference text contains an answer to the query."
QUERY_CONTEXT_PROMPT_TEMPLATE = """# Query: {query}

# Reference: {reference}

# Binary: """
num_retrieved_documents = 2


@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def evaluate_query_and_retrieved_context(query: str, context: str, model_name: str) -> str:
    prompt = QUERY_CONTEXT_PROMPT_TEMPLATE.format(
        query=query,
        reference=context,
    )
    response = openai.ChatCompletion.create(
        messages=[
            {"role": "system", "content": EVALUATION_SYSTEM_MESSAGE},
            {"role": "user", "content": prompt},
        ],
        model=model_name,
    )
    return response["choices"][0]["message"]["content"]


def evaluate_retrievals(
    retrievals_data: Dict[str, str],
    model_name: str,
) -> List[str]:
    responses = []
    for query, retrieved_context in tqdm(retrievals_data.items()):
        response = evaluate_query_and_retrieved_context(query, retrieved_context, model_name)
        responses.append(response)
    return responses


def process_binary_responses(
    binary_responses: List[str], binary_to_string_map: Dict[int, str]
) -> List[str]:
    """
    Parse binary responses and convert to the desired format
    converts them to the desired format. The binary_to_string_map parameter
    should be a dictionary mapping binary values (0 or 1) to the desired
    string values (e.g. "irrelevant" or "relevant").
    """
    processed_responses = []
    for binary_response in binary_responses:
        try:
            binary_value = int(binary_response.strip())
            processed_response = binary_to_string_map[binary_value]
        except (ValueError, KeyError):
            processed_response = None
        processed_responses.append(processed_response)
    return processed_responses


# sample_query_df = query_df.head(10).copy()
# evaluation_model_name = "gpt-3.5-turbo"  # change to GPT-4 if you have access
evaluation_model_name = "gpt-4"
document_id_to_text = dict(zip(database_df["document_id"], database_df["text"]))
query_texts = sample_query_df[":feature.text:prompt"].to_list()
for retrieved_document_index in range(num_retrieved_documents):
    retrieved_document_ids = [
        doc_ids[retrieved_document_index]
        for doc_ids in sample_query_df[":feature.[str].retrieved_document_ids:prompt"].to_list()
    ]
    retrieved_document_texts = [document_id_to_text[doc_id] for doc_id in retrieved_document_ids]
    retrievals_data = dict(zip(query_texts, retrieved_document_texts))
    raw_responses = evaluate_retrievals(retrievals_data, evaluation_model_name)
    processed_responses = process_binary_responses(raw_responses, {0: "irrelevant", 1: "relevant"})
    sample_query_df[
        f"retrieved_document_text_{retrieved_document_index}"
    ] = retrieved_document_texts
    sample_query_df[f"openai_relevance_{retrieved_document_index}"] = processed_responses


sample_query_df[
    [
        ":feature.text:prompt",
        "retrieved_document_text_0",
        "retrieved_document_text_1",
        "openai_relevance_0",
        "openai_relevance_1",
    ]
].rename(columns={":feature.text:prompt": "prompt"}).head()


100%|██████████| 155/155 [01:43<00:00,  1.49it/s]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  sample_query_df[
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  sample_query_df[f"openai_relevance_{retrieved_document_index}"] = processed_responses
100%|██████████| 155/155 [01:41<00:00,  1.53it/s]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  sa

Unnamed: 0,prompt,retrieved_document_text_0,retrieved_document_text_1,openai_relevance_0,openai_relevance_1
0,What's the difference between primary and baseline datasets?,"\nA model baseline refers to a reference dataset used for comparison with your current data. A model baseline can be from training, validation, or previous periods in production. Baselines help compare model changes, enable detailed analysis, and help identify the root causes of performance degradation. **A model baseline is required in Arize**.\n\nBy default, model baselines use a moving time range from your model's **production data**. This data spans a period of two weeks and is delayed by three days. However, you have the flexibility to choose a baseline from either production data or pre-production data, which includes both training and validation sets.\n\nProduction Baseline#production-baselinePre-Production Baseline#pre-production-baselines\n\n","\nA baseline is the reference data or benchmark used to compare model performance against for monitoring purposes. Baselines can be training data, validation data, prior time periods of production data, a previous model version, among others.\n\n",irrelevant,irrelevant
1,How do I send in extra metadata with each record?,"\nIn your container metadata, add an entry with the key as `arize_ingestion_key` and the provided tag value.&#x20;\n\n* **In Arize UI:** Copy the `arize_ingestion_key` value.\n* **In Azure UI:** Navigate to your Container -> Settings -> Metadata.\n\nClick on Metadata and fill out the key value pair defined in the Arize UI\n\n",\nNavigate to the custom metric editor in the top navigation bar of any model. Click on 'Custom Metrics' to uncover the editor.&#x20;\n\n,relevant,irrelevant
2,How does Arize's surrogate explainability model work?,"\nArize is the machine learning observability platform for ML practitioners to monitor, troubleshoot, and explain models. Data Science and ML Engineering teams of all sizes (from individuals to enterprises) use Arize to:\n\n* Monitor real-time model performance, with support for delayed ground truth/feedback\n* Root cause model failures/performance degradation using tracing and explainability\n* Conduct multi-model performance comparisons\n* Surface drift, data quality, and model fairness/bias metrics&#x20;\n\n","\n{% hint style=""info"" %}\nView this page for more information on surrogate model feature importance values. For an example of logging surrogate model feature importance values values, check out the Arize Surrogate Model Feature Importance tutorial.\n{% endhint %}\n\n! (1).png>)\n\nSurrogate explainability is based on the idea of training a surrogate model to mimic a blackbox model, where a surrogate model is an interpretable model trained to approximate the predictions of a black box model as closely as possible. SHAP values can then be generated from the surrogate model when the blackbox model is not available. The goal is to approximate the predictions of the black box model as closely as possible and generate feature importance values from the interpretable surrogate model.\n\nArize Python SDK (specifically the pandas logger) gives the user the option to pass a flag with their request to send data that would produce SHAP values using the surrogate explainability approach. When the flag is enabled, a tree-based surrogate model is trained using the dataset's features and predictions, and SHAP values are generated from the surrogate model before sending the combined dataset to the Arize platform. The pandas logger can compute surrogate models for regression and binary classification models. For binary classification, the prediction score should have values between 0 and 1.&#x20;\n\nIn this Colab we show how to use the `surrogate_explainability` flag without calculating library to generate feature importance values from a surrogate model using only the prediction outputs from a black box model. Both classification and regression examples are provided and feature importance values are sent to Arize using the Pandas logger. The library used to create a surrogate model is only supported for regression and classification models currently.\n\n",irrelevant,relevant
3,How do I use the SDK to upload a ranking model?,"\nRanking models are used by search engines to display query results ranked in the order of the highest relevance. These predictions seek to maximize user actions that are then used to evaluate model performance.&#x20;\n\nThe complexity within a ranking model makes failures challenging to pinpoint as a model’s dimensions expand per recommendation. Notable challenges within ranking models include upstream data quality issues, poor-performing segments, the cold start problem, and more. &#x20;\n\n\n\n","\n**Use the 'arize-demo-hotel-ranking' model, available in all free accounts, to follow along.**&#x20;\n\n",irrelevant,irrelevant
4,What drift metrics are supported in Arize?,"\nDrift monitors measure distribution drift, which is the difference between two statistical distributions.&#x20;\n\nArize offers various distributional drift metrics to choose from when setting up a monitor. Each metric is tailored to a specific use case; refer to this guide to help choose the appropriate metric for various ML use cases.\n\n","\nArize calculates drift metrics such as Population Stability Index, KL Divergence, and Wasserstein Distance. Arize computes drift by measuring distribution changes between the model’s production values and a baseline (reference dataset). Users can configure a baseline to be any time window of a:\n\n1. Pre-production dataset (training, test, validation) or\n2. Fixed or moving time period from production (e.g. last 30 days, last 60 days).&#x20;\n\nBaselines are saved in Arize so that users can compare several versions and/or environments against each other across moving or fixed time windows. For more details on baselines, visit here.\n\n",relevant,relevant


## 7. Compute Ranking Metrics

Now that you know whether each piece of retrieved context is relevant or irrelevant to the corresponding query, you can compute precision@k for k = 1, 2 for each query. This metric tells you what percentage of the retrieved context is relevant to the corresponding query.

precision@k = (# of top-k retrieved documents that are relevant) / (k retrieved documents)

If your precision@2 is greater than zero for a particular query, your LlamaIndex application successfully retrieved at least one relevant piece of context with which to answer the query. If the precision@k is zero for a particular query, that means that no relevant piece of context was retrieved.

Compute precision@k for k = 1, 2 and view the results.

In [None]:
num_retrieved_documents = 2
num_relevant_documents_array = np.zeros(len(query_df))
for retrieved_context_index in range(0, num_retrieved_documents):
    current_num_retrieved_documents = retrieved_context_index + 1
    num_relevant_documents_array += (
        query_df[f"openai_relevance_{retrieved_context_index}"]
        .map(lambda x: int(x == "relevant"))
        .to_numpy()
    )
    query_df[f"openai_precision_at_{current_num_retrieved_documents}"] = pd.Series(
        num_relevant_documents_array / current_num_retrieved_documents
    )

query_df[
    [
        "openai_relevance_0",
        "openai_relevance_1",
        "openai_precision_at_1",
        "openai_precision_at_2",
    ]
]


## 8. Launch Phoenix

Define your knowledge base dataset with a schema that specifies the meaning of each column (features, predictions, actuals, tags, embeddings, etc.). See the [docs](https://docs.arize.com/phoenix/) for guides on how to define your own schema and API reference on `phoenix.Schema` and `phoenix.EmbeddingColumnNames`.

In [None]:
database_schema = px.Schema(
    prediction_id_column_name="document_id",
    prompt_column_names=px.EmbeddingColumnNames(
        vector_column_name="text_vector",
        raw_data_column_name="text",
    ),
)
database_ds = px.Dataset(
    dataframe=database_df,
    schema=database_schema,
    name="database",
)

Define your query dataset. Because the query dataframe is in OpenInference format, Phoenix is able to infer the meaning of each column without a user-defined schema by using the `phoenix.Dataset.from_open_inference` class method.

In [None]:
query_schema = px.Schema(
    prediction_id_column_name=":id.id:",
    timestamp_column_name=":timestamp.iso_8601:",
    prompt_column_names=px.RetrievalEmbeddingColumnNames(
        vector_column_name="prompt_embedding",
        raw_data_column_name=":feature.text:prompt",
        context_retrieval_ids_column_name=":feature.[str].retrieved_document_ids:prompt",
        context_retrieval_scores_column_name=":feature.[float].retrieved_document_scores:prompt",
    ),
    response_column_names=":prediction.text:response",
    tag_column_names=[
        "openai_relevance_0",
        "openai_relevance_1",
        "openai_precision_at_1",
        "openai_precision_at_2",
        "user_feedback",
    ],
)
query_ds = px.Dataset(
    query_df.rename(columns={":feature.[float].embedding:prompt": "prompt_embedding"}),
    schema=query_schema,
    name="query",
)

Launch Phoenix. Follow the instructions in the cell output to open the Phoenix UI.

In [None]:
session = px.launch_app(primary=query_ds, corpus=database_ds)


## 9. Surface Problematic Clusters and Data Points

Select `centered_text_vector` in the embeddings section of the Phoenix homepage.

![select center_text_vector in embeddings section](http://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/search-and-retrieval/llama-index/select_centered_text_vector.png)

Phoenix helps you:

- reduces the dimensionality of your embeddings for visualization
- color the resulting point cloud using evaluation metrics and other color strategies
- cluster the points and surface up problematic clusters based on whatever metric you care about

Notice that, by default, the data points are colored based on the dataset they belong to (query vs. database) and the clusters are sorted based on "purity," with clusters containing all or mostly query data appearing near the top. If a cluster contains all query data, that's potentially problematic, because it's more likely that your database does not contain relevant context to answer the query.

![notice default coloring and cluster sorting strategies](http://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/search-and-retrieval/llama-index/notice_default_color_and_cluster_sorting.png)

Select the top cluster and examine the queries and responses from the chatbot. Notice this cluster contains questions around pricing and cost of the Arize platform. It turns out, the Arize documentation does not contain any information on pricing, so this cluster of queries is far from any database entries.

![select and examine top cluster by purity](http://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/search-and-retrieval/llama-index/select_and_examine_top_cluster_by_purity.png)

Now let's investigate clusters of queries/ responses that received low user feedback. Change the coloring of the data by selecting `Color By` > `dimension` > `user_feedback`.

![color by user feedback](http://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/search-and-retrieval/llama-index/color_by_user_feedback.png)

Sort your clusters by average user feedback.

![sort clusters by average user feedback](http://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/search-and-retrieval/llama-index/sort_clusters_by_average_user_feedback.png)

Sort your clusters in ascending order.

![sort clusters in ascending order](http://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/search-and-retrieval/llama-index/sort_clusters_in_ascending_order.png)

Select the top cluster by average user feedback. Notice that the same pricing cluster is being surfaced as problematic, this time, because your users are giving thumbs down to the responses from this cluster.

![select top cluster by average user feedback](http://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/search-and-retrieval/llama-index/select_top_cluster_by_average_user_feedback.png)

Now color your data by "openai_precision@2" and sort your clusters in ascending order by average "openai_precision@2". You should see that each entry of the pricing cluster you investigated before has a precision@2 of 0, meaning that no relevant context was retrieved for these queries. You were able to discover this completely automatically using Phoenix with LLM-assisted ranking metrics.

Congrats! You've identified a problematic cluster of queries. You now have tools at your disposal to investigate clusters of queries where your search and retrieval application is performing poorly based

- query purity
- user feedback
- LLM-assisted ranking metrics

As an actionable next step, you should augment your knowledge base to include information about the pricing and cost of the Arize platform, since your users seem especially interested in this topic.