### Sampling traces for hallucination detection with RAGAS 


Building on part 1, we'll now add sampling to the Ragas faithfulness evaluator. Sampling lets you configure how often evaluators run on particular spans.

### Setup and Prerequisites

Make sure you've followed the instructions in the `README` file to set up your environment to enable LLM Observability.
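Before continuing, a quick check that the expected credentials are present can save a confusing failure later. The sketch below is a minimal example; the variable names (`DD_API_KEY` and `DD_SITE` for agentless mode, `OPENAI_API_KEY` for the OpenAI client) and the `missing_env_vars` helper are assumptions, so adjust them to whatever the `README` asks you to set.

```python
import os

# Assumed variable names: DD_API_KEY and DD_SITE for agentless mode,
# OPENAI_API_KEY for the OpenAI client. Adjust to match your README setup.
REQUIRED_VARS = ("DD_API_KEY", "DD_SITE", "OPENAI_API_KEY")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Environment looks ready.")
```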

We'll also need to install some dependencies for this tutorial:

In [2]:
%pip install llama-index=="0.10.42" ragas --quiet

Note: you may need to restart the kernel to use updated packages.


### Enabling sampling for Ragas

In addition to enabling the `ragas_faithfulness` evaluator, we'll also specify two sampling rules.

1. Rule 1 - the `ragas_faithfulness` evaluation should run 50% of the time on the LLM span named `augmented_generation`.

`{'sample_rate': 0.5, 'evaluator_label': 'ragas_faithfulness', 'span_name': 'augmented_generation'}`


2. Rule 2 - don't run any evaluations on any other LLM spans.

`{'sample_rate': 0}`
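To make the intent of the two rules concrete, here is a minimal sketch of first-match rule evaluation. It is an illustration only, not ddtrace's implementation: the `should_evaluate` helper, the idea that rules are checked in order, and the treatment of a missing key as a wildcard are assumptions inferred from the catch-all role of rule 2.

```python
import random

# Hypothetical illustration of first-match sampling; NOT ddtrace's actual code.
RULES = [
    {"sample_rate": 0.5, "evaluator_label": "ragas_faithfulness", "span_name": "augmented_generation"},
    {"sample_rate": 0},
]


def should_evaluate(rules, evaluator_label, span_name, rng=random):
    """Return True if the first matching rule samples this span in."""
    for rule in rules:
        # A key missing from a rule is treated as a wildcard (assumption).
        if rule.get("evaluator_label", evaluator_label) != evaluator_label:
            continue
        if rule.get("span_name", span_name) != span_name:
            continue
        return rng.random() < rule["sample_rate"]
    # Assumption: with no matching rule, the evaluator always runs.
    return True


rng = random.Random(0)
sampled = sum(
    should_evaluate(RULES, "ragas_faithfulness", "augmented_generation", rng)
    for _ in range(1000)
)
print(f"augmented_generation: {sampled}/1000 spans sampled")
# Any other span falls through to the catch-all rule, whose rate of 0 never samples.
print(should_evaluate(RULES, "ragas_faithfulness", "some_other_span", rng))
```

Under these assumptions, rule order matters: placing the catch-all rule first would suppress evaluations on every span, including `augmented_generation`.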

We'll set these rules via the `_DD_LLMOBS_EVALUATOR_SAMPLING_RULES` environment variable.

In [1]:
import os

os.environ["_DD_LLMOBS_EVALUATORS"] = "ragas_faithfulness"
os.environ["_DD_LLMOBS_EVALUATOR_SAMPLING_RULES"] = (
    '[{"sample_rate": 0.5, "evaluator_label": "ragas_faithfulness", "span_name": "augmented_generation"}, {"sample_rate": 0}]'
)
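Hand-writing a JSON string inside Python quotes is easy to get wrong. As an alternative sketch, the same value can be built from Python dictionaries with `json.dumps`; the `rules` and `rules_json` names are illustrative only.

```python
import json
import os

# Same two rules as above, written as Python data instead of a raw string.
rules = [
    {"sample_rate": 0.5, "evaluator_label": "ragas_faithfulness", "span_name": "augmented_generation"},
    {"sample_rate": 0},
]

# Serialize once; json.dumps takes care of quoting and escaping.
rules_json = json.dumps(rules)
os.environ["_DD_LLMOBS_EVALUATOR_SAMPLING_RULES"] = rules_json

# Round-trip check: the stored string parses back to the original rules.
assert json.loads(os.environ["_DD_LLMOBS_EVALUATOR_SAMPLING_RULES"]) == rules
print(rules_json)
```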

### Enabling tracing

In [3]:
from ddtrace.llmobs import LLMObs

LLMObs.enable(ml_app="support-bot", agentless_enabled=True)

[EvaluatorRunnerSamplingRule(sample_rate=0.5, evaluator_label=ragas_faithfulness, span_name=augmented_generation), EvaluatorRunnerSamplingRule(sample_rate=0.0, evaluator_label=<object object at 0x10695be60>, span_name=<object object at 0x10695be60>)]


### Create & instrument your RAG Application

Create & instrument the RAG App just like we did in part 1.

In [4]:
doc_names = [
    "_index",
    "api",
    "auto_instrumentation",
    "core_concepts",
    "quickstart",
    "sdk",
    "span_kinds",
    "submit_evaluations",
    "trace_an_llm_application",
]
raw_doc_source_url = "https://raw.githubusercontent.com/DataDog/documentation/master/content/en/llm_observability"

import requests
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import MarkdownNodeParser

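# Download each Markdown doc and wrap it in a llama-index Document.
raw_doc_texts = []
for doc_name in doc_names:
    doc = requests.get(f"{raw_doc_source_url}/{doc_name}.md")
    doc.raise_for_status()  # fail fast instead of indexing an HTTP error page
    raw_doc_texts.append(Document(text=doc.text))
# Split the documents into nodes along their Markdown structure.
parser = MarkdownNodeParser()
base_nodes = parser.get_nodes_from_documents(raw_doc_texts)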

TOP_K = 2

base_index = VectorStoreIndex(base_nodes)
base_retriever = base_index.as_retriever(similarity_top_k=TOP_K)

In [5]:
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow
from ddtrace.llmobs.utils import Prompt


from openai import OpenAI

oai_client = OpenAI()

prompt_template = """
You are an engineer meant to answer support questions about a software product.
The product is LLM Observability by Datadog, a monitoring solution for LLM applications.

You have access to the following reference information: "{context}"
"""


def augmented_generation(question, context):
    with LLMObs.annotation_context(
        prompt=Prompt(variables={"context": context}),
        name="augmented_generation",
    ):
        answer = (
            oai_client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[
                    {
                        "role": "system",
                        "content": prompt_template.format(context=context),
                    },
                    {
                        "role": "user",
                        "content": question,
                    },
                ],
            )
            .choices[0]
            .message.content
        )
        return answer


@workflow
def ask_docs(question):
    nodes = base_retriever.retrieve(question)
    context = " ".join([node.text for node in nodes])
    answer = augmented_generation(question, context)
    LLMObs.annotate(input_data=question, output_data=answer)
    return answer

### Run the RAG App

Let's use another LLM call to generate a batch of questions that will be passed into our RAG workflow.

This question-generation LLM call will also be auto-instrumented, but no Ragas faithfulness evaluations will be tied to it. Of the `augmented_generation` LLM calls, only about 50% will have a Ragas faithfulness score attached.
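Because sampling is random, the number of evaluated spans in a short run will rarely be exactly half. As a rough sketch, assuming each span is sampled independently at the configured rate (an assumption about the sampler, not something the source states), the count follows a binomial distribution:

```python
import math

n_calls = 50       # number of augmented_generation calls in the loop below
sample_rate = 0.5  # sample_rate from rule 1

# Mean and standard deviation of a binomial(n_calls, sample_rate) count.
expected = n_calls * sample_rate
std_dev = math.sqrt(n_calls * sample_rate * (1 - sample_rate))

print(f"expected evaluations: {expected:.0f} +/- {std_dev:.1f}")
```

So a run of 50 questions that yields anywhere from roughly 20 to 30 faithfulness scores is consistent with a 50% sample rate.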

In [6]:
def generate_question():
    answer = (
        oai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {
                    "role": "user",
                    "content": "generate a question about how to setup & best use a SaaS tool to observe LLM-powered applications",
                }
            ],
        )
        .choices[0]
        .message.content
    )
    return answer


for i in range(50):
    question = generate_question()
    print(f"Question {i+1}: {question}")
    answer = ask_docs(question)
    print(f"Answer {i+1}: {answer}")

Question 1: How can one effectively set up and utilize a SaaS tool to observe and monitor applications powered by machine learning models (LLM) for optimal performance and efficiency?
Answer 1: To effectively set up and utilize a SaaS tool like LLM Observability by Datadog to monitor applications powered by machine learning models (LLM) for optimal performance and efficiency, you can follow these steps:

1. **Integration**: Begin by integrating your LLM applications with the monitoring tool. This may involve installing the necessary agents or SDKs provided by the monitoring tool within your application environment.

2. **Instrumentation**: Ensure that your machine learning models and LLM applications are properly instrumented to collect relevant metrics and logs. This could include metrics related to cost, latency, performance, usage trends, and other key indicators of application health.

3. **Dashboard Configuration**: Leverage the out-of-the-box dashboards provided by the monitoring

AttributeError: 'NoneType' object has no attribute 'choices'
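
The run above stops with an `AttributeError` because a value that should have been a chat completion response was `None`; the output does not show the cause. As a hedged sketch, a small hypothetical helper (`extract_content`, which is not part of the OpenAI or ddtrace APIs) could replace the chained `.choices[0].message.content` access and raise a more descriptive error:

```python
from types import SimpleNamespace


def extract_content(response):
    """Return the first choice's message content, or raise a descriptive error."""
    if response is None:
        raise RuntimeError("chat completion call returned None; check the client and tracing setup")
    if not response.choices:
        raise RuntimeError("chat completion response contained no choices")
    return response.choices[0].message.content


# Stand-in object shaped like a chat completion response, for demonstration only.
fake = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="hello"))])
print(extract_content(fake))
```

This does not fix the underlying problem; it only makes the failure easier to diagnose when it happens partway through a long loop.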