# LLMOps with DeepEval

LLMOps covers:
1. **Testing**
2. **Evaluation**
3. **Monitoring**
4. Infrastructure
5. Orchestration
6. Deployment

We will discuss how to use the [DeepEval library](https://deepeval.com/) (and the Confident AI platform) to help with items 1, 2, and 3: testing, evaluation, and monitoring.

**Resources**
- https://github.com/confident-ai/deepeval/blob/main/examples/notebooks/langgraph.ipynb


In [1]:
import os
# Raises KeyError early if the key is not set in the environment
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

## 1. Quickstart

In [2]:
from deepeval import assert_test
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
from deepeval.metrics import GEval

def test_correctness():
    correctness_metric = GEval(
        name="Correctness",
        criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
        evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
        threshold=0.5
    )
    test_case = LLMTestCase(
        input="I have a persistent cough and fever. Should I be worried?",
        # Replace this with the actual output from your LLM application
        actual_output="A persistent cough and fever could be a viral infection or something more serious. See a doctor if symptoms worsen or don't improve in a few days.",
        expected_output="A persistent cough and fever could indicate a range of illnesses, from a mild viral infection to more serious conditions like pneumonia or COVID-19. You should seek medical attention if your symptoms worsen, persist for more than a few days, or are accompanied by difficulty breathing, chest pain, or other concerning signs."
    )
    assert_test(test_case, [correctness_metric])
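
The `threshold=0.5` passed to `GEval` is the minimum passing score: the metric produces a score between 0 and 1, and the test case passes when that score meets the threshold. A minimal plain-Python sketch of that gate follows. `passes_threshold` and the sample scores are hypothetical illustrations, not part of the DeepEval API.

```python
def passes_threshold(score: float, threshold: float = 0.5) -> bool:
    """Return True when a 0-1 metric score meets the minimum passing threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0, 1], got {score}")
    return score >= threshold


# Hypothetical scores, for illustration only
for score in (0.3, 0.5, 0.8):
    print(score, passes_threshold(score))
```

Raising the threshold makes the test stricter; lowering it tolerates answers that only partly match the expected output.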


## 2. RAG Agent

In [4]:
!curl -o src/manual.txt "https://confident-bucket.s3.us-east-1.amazonaws.com/manual.txt"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18482  100 18482    0     0  43839      0 --:--:-- --:--:-- --:--:-- 43900


In [6]:
from src.langgraph_rag_agent import app

In [8]:
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage

initial_state = {
    "query": "",
    "selected_tools": [],
    "retrieved_context": "",
    "tool_outputs": [],
    "next_action": "",
}


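def run_rag_query(query: str) -> str:
    """Run a query through the RAG system and return the final message text."""

    # Build a fresh state per call instead of mutating the shared template
    state = {**initial_state, "messages": [HumanMessage(content=query)]}
    result = app.invoke(state)
    final_message = result["messages"][-1]
    return final_message.content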


run_rag_query("What is the average heart rate of the user?")

'The average heart rate of the user is 94 beats per minute.'

In [9]:
run_rag_query("What are the top tips for fitness?")

"While the context provided does not contain specific tips for fitness, I can offer some general advice based on widely accepted fitness principles:\n\n1. **Set Clear Goals**: Define what you want to achieve, whether it's weight loss, muscle gain, improved endurance, or overall health.\n\n2. **Create a Balanced Routine**: Incorporate a mix of cardiovascular exercise, strength training, and flexibility workouts. This helps improve overall fitness and reduces the risk of injury.\n\n3. **Stay Consistent**: Regular exercise is key to achieving and maintaining fitness. Aim for at least 150 minutes of moderate aerobic activity or 75 minutes of vigorous activity each week, along with strength training on two or more days.\n\n4. **Listen to Your Body**: Pay attention to how your body feels during and after workouts. Rest and recovery are just as important as the workouts themselves.\n\n5. **Stay Hydrated**: Drink plenty of water before, during, and after exercise to maintain hydration and supp

In [16]:
from deepeval.integrations.langchain import CallbackHandler
from deepeval.metrics import TaskCompletionMetric


def run_rag_query(query: str) -> str:
    """Run a query through the RAG system, scoring the run with DeepEval."""

    # Build a fresh state per call instead of mutating the shared template
    state = {**initial_state, "messages": [HumanMessage(content=query)]}

    result = app.invoke(
        state,
        config={
            # Metrics are passed to the callback handler
            "callbacks": [
                CallbackHandler(
                    metrics=[
                        TaskCompletionMetric(strict_mode=True, async_mode=False)
                    ]
                )
            ]
        },
    )

    final_message = result["messages"][-1]
    return final_message.content


In [24]:
from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(goldens=[
    Golden(input="How many steps should I take a day?"),
    Golden(input="What are some ways to get more steps in on a day?"),
    Golden(input="What is a good resting heart rate?"),
    Golden(input="What range of heart rate is normal?")
])

# Pulling a dataset from Confident AI needs an account created with a corporate
# email, so the goldens are defined inline above instead.
# dataset = EvaluationDataset()
# dataset.pull(alias="health_rag_queries")
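
If pulling from Confident AI is not an option, the query strings can live in a local file instead. The sketch below uses only the standard library: it reads a JSON list of queries and drops blanks and duplicates. The helper `load_golden_inputs` and the file name `health_rag_queries.json` are assumptions made for illustration; wrapping each string in `Golden(input=...)` is left as a comment so the block runs without DeepEval installed.

```python
import json
import tempfile
from pathlib import Path


def load_golden_inputs(path):
    """Read a JSON list of query strings, dropping blanks and duplicates."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    seen, inputs = set(), []
    for item in raw:
        text = str(item).strip()
        if text and text not in seen:
            seen.add(text)
            inputs.append(text)
    return inputs


# Demo with a temporary file standing in for a checked-in query list
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "health_rag_queries.json"  # hypothetical file name
    queries = [
        "How many steps should I take a day?",
        "What is a good resting heart rate?",
        "What is a good resting heart rate?",  # duplicate, dropped on load
        "   ",  # blank, dropped on load
    ]
    path.write_text(json.dumps(queries), encoding="utf-8")
    golden_inputs = load_golden_inputs(path)

print(golden_inputs)

# In the notebook, the strings would then be wrapped as goldens:
# dataset = EvaluationDataset(goldens=[Golden(input=q) for q in golden_inputs])
```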

In [18]:
for golden in dataset.evals_iterator():
    run_rag_query(golden.input)


IndexError: list index out of range

In [22]:
for golden in dataset.evals_iterator():
    print(golden)

TypeError: 'generator' object is not subscriptable

TypeError: object of type 'generator' has no len()
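
The two `TypeError` messages above are what Python raises when a generator is indexed or passed to `len()`. The sketch below reproduces both with a stand-in generator; `goldens_iterator` is a hypothetical substitute for `dataset.evals_iterator()`, and nothing here calls DeepEval. It only illustrates the Python behavior, not what DeepEval does internally.

```python
def goldens_iterator(items):
    """Stand-in generator: yields items one at a time, like any generator."""
    for item in items:
        yield item


gen = goldens_iterator(["How many steps should I take a day?",
                        "What is a good resting heart rate?"])

# Indexing a generator fails
try:
    gen[0]
except TypeError as exc:
    subscript_error = str(exc)

# So does asking for its length
try:
    len(gen)
except TypeError as exc:
    len_error = str(exc)

print(subscript_error)
print(len_error)

# A generator can only be consumed by iterating; a list supports both operations
items = list(gen)
print(len(items), items[0])
```

For inspecting or counting the goldens in the notebook, the list held by the dataset (`dataset.goldens`) should be the safer thing to index, since it does not depend on how the iterator is consumed.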