## Using DeepEval with OpenAI

This guide shows how to evaluate LLM calls made with the OpenAI SDK, both as standalone LLM calls and as part of an LLM application. DeepEval's OpenAI integration generates LLM spans for OpenAI SDK calls and is fully compatible with the native `observe` decorator.

In [None]:
!pip install openai -U deepeval ipywidgets

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"

In [None]:
from deepeval.openai import OpenAI
client = OpenAI()

### Evaluating OpenAI SDK as a standalone LLM call

Evaluating the OpenAI SDK as a standalone LLM call takes three steps:

#### Create an evaluation dataset with goldens.

In [None]:
from deepeval.dataset import Golden, EvaluationDataset

goldens = [
    Golden(input="What are the top 5 most popular places to eat in New York City?"),
    Golden(input="What is the weather in Paris, France?"),
]

dataset = EvaluationDataset(goldens=goldens)

#### Select the metrics to evaluate.

Note: The current integration only supports metrics whose required arguments are limited to `input`, `output` and `tools_called`. You can still set other test case parameters, such as `expected_output` or `context`, in the next step.

In [None]:
from deepeval.metrics import AnswerRelevancyMetric, BiasMetric

metrics = [AnswerRelevancyMetric(), BiasMetric()]
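
The eligibility rule in the note above amounts to a subset check. The sketch below is a minimal illustration in plain Python, not part of DeepEval: `SUPPORTED_PARAMS`, `REQUIRED_PARAMS`, `is_eligible`, and `ExampleContextMetric` are hypothetical names, and the required-parameter sets are written by hand for the example rather than read from the library.

```python
# Parameters the integration can populate, per the note above.
SUPPORTED_PARAMS = {"input", "output", "tools_called"}

# Hand-written, illustrative table of what each metric needs (assumption:
# these sets are not read from DeepEval and may not match its internals).
REQUIRED_PARAMS = {
    "AnswerRelevancyMetric": {"input", "output"},
    "BiasMetric": {"input", "output"},
    "ExampleContextMetric": {"input", "output", "context"},  # hypothetical metric
}


def is_eligible(metric_name: str) -> bool:
    """Return True if every required parameter is one the integration supplies."""
    return REQUIRED_PARAMS[metric_name] <= SUPPORTED_PARAMS


# Keep only the metrics whose requirements fit inside the supported set.
eligible = sorted(name for name in REQUIRED_PARAMS if is_eligible(name))
print(eligible)
```

A metric that also needs `context` falls outside the supported set, so it is filtered out even though `context` can still be attached to the test case.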

#### Run the evals.

The `evals_iterator` method of `EvaluationDataset` returns a generator of goldens. Iterate through the goldens and run the evals. To set additional test case parameters, pass them to the `LlmSpanContext` object.

In [None]:
from deepeval.tracing import trace, LlmSpanContext

for golden in dataset.evals_iterator():
    # run OpenAI client
    with trace(
        llm_span_context=LlmSpanContext(
            metrics=metrics,
            expected_output=golden.expected_output,
        )
    ):
        client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": golden.input}
            ],
        )
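
Because `evals_iterator` is described as returning a generator, the loop body runs between successive yields. The sketch below illustrates that general pattern in plain Python. `FakeGolden`, `iterate_goldens`, and `on_finish` are hypothetical names, and nothing here reflects how DeepEval implements `evals_iterator` internally.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class FakeGolden:
    """Hypothetical stand-in for a golden; only the fields used here."""
    input: str
    expected_output: Optional[str] = None


def iterate_goldens(
    goldens: List[FakeGolden],
    on_finish: Callable[[List[FakeGolden]], None],
) -> Iterator[FakeGolden]:
    """Yield goldens one at a time, then call on_finish with those consumed."""
    seen: List[FakeGolden] = []
    for golden in goldens:
        seen.append(golden)
        yield golden  # the caller's loop body runs here, between yields
    on_finish(seen)  # runs only after the caller's loop is exhausted


finished: List[str] = []
for golden in iterate_goldens(
    [FakeGolden("q1"), FakeGolden("q2", expected_output="a2")],
    on_finish=lambda seen: finished.extend(g.input for g in seen),
):
    # A real loop would call the LLM here; this sketch just prints the fields.
    print(golden.input, golden.expected_output)
```

A golden without an `expected_output` simply carries `None`, which is what the loop above would pass along if the field were never set.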

### Evaluating OpenAI SDK as part of an LLM application

In [None]:
from deepeval.tracing import observe

@observe()
def retrieve_docs(query):
    return [
        "Paris is the capital and most populous city of France.",
        "It has been a major European center of finance, diplomacy, commerce, and science."
    ]

@observe()
def llm_app(input):
    with trace(
        llm_span_context=LlmSpanContext(
            metrics=[AnswerRelevancyMetric(), BiasMetric()],
        ),
    ):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": '\n'.join(retrieve_docs(input)) + "\n\nQuestion: " + input}
            ]
        )
    return response.choices[0].message.content
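
The user message in `llm_app` is built by joining the retrieved documents with newlines and appending the question after a blank line. The sketch below isolates that step so the resulting prompt can be inspected without calling the API. `build_user_message` is a hypothetical helper name, not part of DeepEval or the OpenAI SDK.

```python
from typing import List


def build_user_message(docs: List[str], question: str) -> str:
    """Join retrieved documents, then append the question after a blank line."""
    return "\n".join(docs) + "\n\nQuestion: " + question


docs = [
    "Paris is the capital and most populous city of France.",
    "It has been a major European center of finance, diplomacy, commerce, and science.",
]
message = build_user_message(docs, "What is the weather in Paris, France?")
print(message)
```

Keeping the formatting in one small function makes it easier to check what context the model actually receives for each golden.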

In [None]:
# Create dataset
dataset = EvaluationDataset(
    goldens=[
        Golden(input="What are the top 5 most popular places to eat in New York City?"),
        Golden(input="What is the weather in Paris, France?"),
    ]
)

# Iterate through goldens
for golden in dataset.evals_iterator():
    # run your LLM application
    llm_app(input=golden.input)