<img src="https://github.com/comet-ml/opik/blob/main/apps/opik-documentation/documentation/static/img/opik-logo.svg?raw=true" width="200" height="100" alt="Opik Logo">

# Comet Assistant: RAG with Opik and Azure OpenAI

The example below walks through building a simple RAG application with Azure OpenAI and LangChain, and evaluating that application with Opik.

The concepts covered in this tutorial include:

1. Setting up a simple vector store and RAG pipeline with LangChain
2. Defining an assistant application using this RAG pipeline and the Azure OpenAI API
3. Creating a dataset of questions for evaluation in Opik
4. Automating the evaluation of the application on the dataset using Opik metrics
5. Calculating metrics using an Azure OpenAI model

## Creating an account on Comet.com

[Comet](https://www.comet.com/site?from=llm&utm_source=opik&utm_medium=colab&utm_content=langchain&utm_campaign=opik) provides a hosted version of the Opik platform. [Simply create an account](https://www.comet.com/signup?from=llm&utm_source=opik&utm_medium=colab&utm_content=langchain&utm_campaign=opik) and grab your API key.

> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm&utm_source=opik&utm_medium=colab&utm_content=langchain&utm_campaign=opik) for more information.

In [None]:
%pip install --upgrade --quiet opik openai azure-identity langsmith langchain-community langchain chromadb tiktoken langchain_openai

In [None]:
# from opik import Opik, track
import opik
from opik.evaluation import evaluate, models
from opik.evaluation.metrics import AnswerRelevance, LevenshteinRatio

In [None]:
opik.configure(use_local=False)

OPIK: Existing Opik clients will not use updated values for "url", "api_key", "workspace".
OPIK: Opik is already configured. You can check the settings by viewing the config file at /root/.opik.config


# Set up the vector store for RAG

In [None]:
import os
import getpass

os.environ["AZURE_API_BASE"] = "https://comet-test-open-ai.openai.azure.com/"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://comet-test-open-ai.openai.azure.com/"
os.environ["AZURE_API_VERSION"] = "2024-05-01-preview"

if "AZURE_API_KEY" not in os.environ:
    os.environ["AZURE_API_KEY"] = getpass.getpass("Enter your Azure OpenAI API key: ")

os.environ["AZURE_OPENAI_API_KEY"] = os.environ["AZURE_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_API_VERSION"]
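
The cell above sets two families of variable names (`AZURE_*` and `AZURE_OPENAI_*` / `OPENAI_API_VERSION`), presumably because the libraries used later read different ones. A minimal sketch of a sanity check that could be run before continuing is shown below; the helper name `missing_env_vars` is hypothetical and not part of Opik, LangChain, or the OpenAI SDK.

```python
import os
from typing import Iterable, List, Mapping

# Variable names copied from the setup cell above.
REQUIRED_VARS = (
    "AZURE_API_BASE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_API_VERSION",
    "AZURE_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
)


def missing_env_vars(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment:
print(missing_env_vars(["A", "B"], {"A": "set"}))
```

Calling `missing_env_vars(REQUIRED_VARS)` after the setup cell should return an empty list if every variable was set.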

**Set up Vector Store and Retriever.**

The code below sets up a vector store using [Chroma](https://www.trychroma.com/). Here we load the Comet Python SDK reference documentation, split it into chunks, embed the chunks, and expose the store as a retriever.

In [None]:
from bs4 import BeautifulSoup as Soup
from langchain_community.vectorstores import Chroma
from langchain_openai import AzureOpenAIEmbeddings

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

# Load
url = "https://www.comet.com/docs/v2/api-and-sdk/python-sdk/reference/Experiment/"
loader = RecursiveUrlLoader(
    url=url, max_depth=20, extractor=lambda x: Soup(x, "html.parser").text
)
docs = loader.load()

# Split
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(docs)

# Embed
vectorstore = Chroma.from_documents(documents=splits, embedding=AzureOpenAIEmbeddings(model="text-embedding-3-large"))

# Index
retriever = vectorstore.as_retriever()
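
To build intuition for the `chunk_size=1000` and `chunk_overlap=200` settings used above, here is a simplified sliding-window chunker. It is only an illustration: `RecursiveCharacterTextSplitter` splits on separators such as paragraph and line breaks rather than at fixed offsets, and the function name `sliding_window_chunks` is hypothetical.

```python
from typing import List


def sliding_window_chunks(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> List[str]:
    """Cut `text` into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Small example so the overlap is easy to see.
print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.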

# Define RAG Application
The code below defines our LLM application. In this case, we create a Comet bot that 1) retrieves relevant context from our vector store based on the input and 2) passes the retrieved context and the user question to Azure OpenAI to generate a response.

To ensure that the OpenAI API calls are tracked, we use the `track_openai` function from the Opik library. We also use the `track` decorator so that each step of the application is tracked.

In [None]:
from openai import AzureOpenAI
from opik.integrations.openai import track_openai
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Token provider for Entra ID authentication.
# Note: it is not passed to the client below, which authenticates with the
# API key read from the environment.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

PROJECT_NAME = "comet-assistant"

class CometBot:
    def __init__(self, retriever, model: str = "gpt-4o-mini"):
        self._retriever = retriever
        self._client = track_openai(AzureOpenAI(), project_name = PROJECT_NAME)
        self._model = model

    @opik.track(project_name=PROJECT_NAME)
    def retrieve_docs(self, question):
        return self._retriever.invoke(question)

    @opik.track(project_name=PROJECT_NAME)
    def get_answer(self, question: str, system: str):
        docs_retrieved = self.retrieve_docs(question)
        response = self._client.chat.completions.create(
            model=self._model,
            messages=[
                {
                    "role": "system",
                    "content": f"{system}"
                    "\n Use the following docs to produce the answer to the question.\n\n"
                    f"## Docs\n\n{docs_retrieved}",
                },
                {"role": "user", "content": question},
            ],
        )

        return {
            "response": response.choices[0].message.content,
            "context": [str(doc) for doc in docs_retrieved],
        }


rag_bot = CometBot(retriever)
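
The prompt assembly inside `get_answer` can be looked at in isolation. The sketch below rebuilds the message list from plain strings, with no API call. Note one deliberate difference: the class above interpolates the list of retrieved documents directly, whereas this sketch assumes the documents have already been converted to text and joins them with blank lines. The function name `build_messages` is hypothetical.

```python
from typing import Dict, List, Sequence


def build_messages(
    system: str, docs: Sequence[str], question: str
) -> List[Dict[str, str]]:
    """Assemble chat messages: system prompt plus retrieved docs, then the question."""
    docs_block = "\n\n".join(docs)  # assumption: docs are already plain text
    system_content = (
        f"{system}\n"
        "Use the following docs to produce the answer to the question.\n\n"
        f"## Docs\n\n{docs_block}"
    )
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "You are a Comet expert.",
    ["log_metric(name, value) logs a metric."],
    "How do I log a metric?",
)
print(messages[0]["content"])
```

Printing the assembled system message this way is a cheap check that the retrieved context ends up where the model is told to look for it.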

Test the bot and the retriever with a single system prompt:

In [None]:
system_prompt = "You are a Comet expert. You love explaining Comet concepts. Keep the answers short."
response = rag_bot.get_answer("How can I log a system metric in Comet?", system_prompt)
print("Response:")
response["response"]

Response:


"You can log a system metric in Comet using the `log_system_info` method. Here's a basic example:\n\n```python\nimport comet_ml\n\ncomet_ml.login()\nexp = comet_ml.start()\n\nexp.log_system_info(key='metric_name', value='metric_value')\n\nexp.end()\n```\n\nReplace `'metric_name'` and `'metric_value'` with your desired key and value."

# Creating a Dataset: Comet Questions

Below we define a standard set of questions that we would like to evaluate the assistant on.

In [None]:
dataset_items = [
  {
    "question": "How do I log a hyperparameter to Comet?",
    "expected_answer": "You can log a hyperparameter to a Comet experiment with the log_parameter() method. Example: experiment.log_parameter('learning-rate', .02) "
  },
  {
    "question": "How do I log a metric to my Comet experiment?",
    "expected_answer": "You can log a metric to a Comet experiment with the log_metric() method. Example: experiment.log_metric('accuracy', .95)"
  },
  {
    "question": "How do I tag my Comet experiment?",
    "expected_answer": "You can tag your Comet experiment with the add_tag() method. Example: experiment.add_tag('baseline')"
  },
  {
    "question": "How do I rename my Comet experiment?",
    "expected_answer": "You can rename your Comet experiment with the set_name() method. Example: experiment.set_name('experiment-1')"
  },
  {
    "question": "How do I use an existing artifact in a new Comet experiment?",
    "expected_answer": "You can use the get_artifact() method to get an existing artifact in a new Comet experiment. Example: experiment.get_artifact('my-artifact', version_or_alias = '1.0.0')"
  }]

Now that we have our dataset, we can create a dataset in Opik and insert the questions into it.

In [None]:
# Get or create a dataset
client = opik.Opik()

dataset = client.get_or_create_dataset(name="Comet_Questions",
                                       description="Questions about the Comet SDK")

# Inserting will not duplicate entries
dataset.insert(dataset_items)
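
The comment above notes that inserting does not duplicate entries. How Opik implements this is not shown in this notebook; the sketch below only illustrates one common approach, content-based deduplication, where each item is fingerprinted from its canonical JSON form. Both function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def item_fingerprint(item: Dict[str, Any]) -> str:
    """Hash an item's canonical JSON so that key order does not matter."""
    canonical = json.dumps(item, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each distinct item, preserving order."""
    seen = set()
    unique = []
    for item in items:
        fingerprint = item_fingerprint(item)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(item)
    return unique


sample = [
    {"question": "How do I tag my Comet experiment?", "expected_answer": "add_tag()"},
    {"expected_answer": "add_tag()", "question": "How do I tag my Comet experiment?"},
]
print(len(deduplicate(sample)))
```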

# Evaluating the Assistant

To check that our RAG application works correctly and to decide which system prompt to use in production, we will test it on our dataset with two different system prompts.

For this we use the `evaluate` function from the `opik` library. We evaluate the application on two metrics: LevenshteinRatio and AnswerRelevance.

**Step 1: Fetch the dataset for evaluation**

In [None]:
client = opik.Opik()

dataset = client.get_dataset(name="Comet_Questions")

**Step 2: Define the system prompt to test**

In [None]:
system_prompt = opik.Prompt(
    name="Comet SDK Assistant - System Prompt",
    prompt="""
        You are an instructor for technical executives who want to extract value from AI models.
        If you know the answer to the question, respond by stating that it is possible to do what is being asked,
        but without going into technical details on how to do it.
        Make sure you include in your answer:
        - A description of the lifecycle of a machine learning model
        - Where in this lifecycle the current question is relevant
        - The business benefits of implementing the provided answer
        - An estimation of the time and cost of implementing the provided answer
        """.rstrip().lstrip()
)

**Step 3: Define Evaluation Task**

The evaluation task maps each input to the retrieved context and LLM output. These values will be used by Opik when calculating the metrics defined in the next step.

In [None]:
def evaluation_task(x):
    full_response = rag_bot.get_answer(x['question'], system_prompt.format())
    response = full_response["response"]
    context = full_response["context"]
    return {
        "response": response,
        "context": context
    }

**Step 4: Define Metrics**

Here we use Opik's built-in [Levenshtein Ratio](https://www.comet.com/docs/opik/evaluation/metrics/heuristic_metrics#levenshteinratio) and [AnswerRelevance](https://www.comet.com/docs/opik/evaluation/metrics/answer_relevance) metrics. We use `azure/gpt-4o` as the model that computes the answer relevance metric.


In [None]:
# Define the model
model = models.LiteLLMChatModel(model_name="azure/gpt-4o") # azure endpoint & api version already provided in the environment

# Define the metrics
answerrelevance_metric = AnswerRelevance(name="AnswerRelevance", model=model)
levenshteinratio_metric = LevenshteinRatio(name="LevenshteinRatio")
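
Unlike AnswerRelevance, the Levenshtein ratio is a heuristic metric that needs no LLM. The sketch below computes the edit distance between two strings and normalizes it by the longer string's length. This is one common normalization, chosen here for illustration; the exact formula used by Opik's `LevenshteinRatio` may differ, and both function names are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Map the edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("kitten", "sitting"))
```

Because the score is purely character-based, a correct answer phrased differently from `expected_answer` can still score low, which is one reason to pair it with an LLM-judged metric.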

**Step 5: Run the evaluation**

Input the dataset, experiment config, evaluation task, and metrics into Opik's `evaluate` to run the evaluation.

In [None]:
TEST_ID = "50"

experiment_config = {"model": "gpt-4o-mini"}
experiment_name = f"comet-assistant-{TEST_ID}"

res = evaluate(
    dataset=dataset,
    experiment_name=experiment_name,
    experiment_config=experiment_config,
    project_name=f"{PROJECT_NAME}-{TEST_ID}",
    task=evaluation_task,
    prompt=system_prompt,
    scoring_metrics=[answerrelevance_metric,
                     levenshteinratio_metric],
    scoring_key_mapping={
        "input": "question",
        "output": "response",
        "reference": "expected_answer"
    }
)

Evaluation:   0%|          | 0/5 [00:00<?, ?it/s]OPIK: Started logging traces to the "comet-assistant-50" project at https://www.comet.com/opik/benjtlv/redirect/projects?name=comet-assistant-50.
Evaluation: 100%|██████████| 5/5 [00:06<00:00,  1.27s/it]


The evaluation results are now uploaded to the Opik platform and can be viewed in the UI.
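
The `scoring_key_mapping` argument used above connects metric argument names (`input`, `output`, `reference`) to the keys found in the dataset items and the task output. The sketch below shows how such a remapping could work; it is an assumption inferred from how the mapping is used in this notebook, not Opik's actual implementation, and `build_scoring_inputs` is a hypothetical name.

```python
from typing import Any, Dict


def build_scoring_inputs(
    dataset_item: Dict[str, Any],
    task_output: Dict[str, Any],
    scoring_key_mapping: Dict[str, str],
) -> Dict[str, Any]:
    """Merge item and task output, then expose values under the metric argument names."""
    merged = {**dataset_item, **task_output}
    for metric_arg, source_key in scoring_key_mapping.items():
        merged[metric_arg] = merged[source_key]
    return merged


inputs = build_scoring_inputs(
    {"question": "How do I tag my Comet experiment?", "expected_answer": "add_tag()"},
    {"response": "Use add_tag().", "context": ["doc text"]},
    {"input": "question", "output": "response", "reference": "expected_answer"},
)
print(sorted(inputs))
```

Under this reading, `context` needs no mapping entry because the task already returns it under the name the metric expects.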

# Evaluating the Assistant (II)

Prompt engineering is an iterative process. Let's try a different system prompt and compare the results with the previous experiment.

In [None]:
system_prompt = opik.Prompt(
    name="Comet SDK Assistant - System Prompt",
    prompt="""
        You are a Comet expert and know how to explain Comet SDK concepts in simple terms.
        Keep the answers short and don't try to make up answers that you don't know.
        """.rstrip().lstrip()
)

In [None]:
TEST_ID = "51"

experiment_config = {"model": "gpt-4o-mini"}
experiment_name = f"comet-assistant-{TEST_ID}"

res = evaluate(
    dataset=dataset,
    experiment_name=experiment_name,
    experiment_config=experiment_config,
    project_name=f"{PROJECT_NAME}-{TEST_ID}",
    task=evaluation_task,
    prompt=system_prompt,
    scoring_metrics=[answerrelevance_metric,
                     levenshteinratio_metric],
    scoring_key_mapping={
        "input": "question",
        "output": "response",
        "reference": "expected_answer"
    }
)

Evaluation:   0%|          | 0/5 [00:00<?, ?it/s]OPIK: Started logging traces to the "comet-assistant-51" project at https://www.comet.com/opik/benjtlv/redirect/projects?name=comet-assistant-51.
Evaluation: 100%|██████████| 5/5 [00:08<00:00,  1.60s/it]
