# LLM Evals

Here we give a basic introduction to using [LangSmith](https://docs.smith.langchain.com/) for LLM evaluations. LangSmith can evaluate both standalone LLM calls and RAG applications. We follow the LangSmith quick-start guide (https://docs.smith.langchain.com/) and dissect it step by step. API keys are created on the LangSmith settings page: https://smith.langchain.com/o/0ab3186a-c8bd-52b6-bc67-11d9436130bb/settings.


In [None]:
# Both packages are needed: langsmith for tracing/evaluation, openai for the LLM calls
!pip install -U langsmith openai

Successfully installed jiter-0.5.0 openai-1.50.2


## LangSmith and OpenAI Setup

In [None]:
import os
from getpass import getpass

# Enable LangSmith tracing and provide the API keys.
# Prompt for the keys instead of hard-coding them in the notebook.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
if not os.environ.get("LANGCHAIN_API_KEY"):
    os.environ["LANGCHAIN_API_KEY"] = getpass("LangSmith API key: ")
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")

# Verify the variables are set without printing the secret values
for var in ["LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "OPENAI_API_KEY"]:
    print(f"{var}:", "set" if os.getenv(var) else "NOT SET")


In [None]:
import openai
from langsmith import traceable
from langsmith.wrappers import wrap_openai

# Wrap the OpenAI client so every LLM call is traced in LangSmith.
# Named openai_client so it is not shadowed by the LangSmith Client below.
openai_client = wrap_openai(openai.OpenAI())

@traceable  # Trace this function as a parent run that contains the LLM call
def pipeline(user_input: str) -> str:
    result = openai_client.chat.completions.create(
        messages=[{"role": "user", "content": user_input}],
        model="gpt-3.5-turbo",
    )
    return result.choices[0].message.content

pipeline("Hello, world!")

'Hello there! How can I assist you today?'
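To make "auto-trace this function" concrete, here is a toy, offline sketch of the idea behind a tracing decorator: wrap the function, record its name, inputs, output (or error) and latency, and store that record. `toy_traceable`, `TRACE_LOG` and `fake_pipeline` are hypothetical names invented for illustration. This is not LangSmith's implementation, which additionally nests child runs (such as the wrapped OpenAI call) under the parent run and sends them to the LangSmith backend rather than to a Python list.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory store standing in for the LangSmith backend.
TRACE_LOG: list[dict[str, Any]] = []

def toy_traceable(func: Callable) -> Callable:
    """Record name, inputs, output or error, and latency of each call (illustrative only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {"name": func.__name__, "inputs": {"args": args, "kwargs": kwargs}}
        start = time.perf_counter()
        try:
            record["output"] = func(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call leaves a trace record.
            record["latency_s"] = time.perf_counter() - start
            TRACE_LOG.append(record)
    return wrapper

@toy_traceable
def fake_pipeline(user_input: str) -> str:
    # Stand-in for the OpenAI call so the sketch runs offline.
    return f"Echo: {user_input}"

fake_pipeline("Hello, world!")
print(TRACE_LOG[-1]["name"], TRACE_LOG[-1]["output"])
# prints: fake_pipeline Echo: Hello, world!
```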

## Evaluation Run

In [None]:
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()

# Define the dataset: these are your test cases.
# Create it only once; re-running the cell would otherwise fail because the name already exists.
dataset_name = "Sample Dataset"
if not client.has_dataset(dataset_name=dataset_name):
    dataset = client.create_dataset(dataset_name, description="A sample dataset in LangSmith.")
    client.create_examples(
        inputs=[
            {"postfix": "to LangSmith"},
            {"postfix": "to Evaluations in LangSmith"},
        ],
        outputs=[
            {"output": "Welcome to LangSmith"},
            {"output": "Welcome to Evaluations in LangSmith"},
        ],
        dataset_id=dataset.id,
    )

# Define your evaluator: `run` holds the target's outputs, `example` holds the reference outputs.
# A plain string returned by the target is stored under the "output" key.
def exact_match(run, example):
    return {"score": run.outputs["output"] == example.outputs["output"]}

experiment_results = evaluate(
    lambda inputs: "Welcome " + inputs["postfix"],  # Your AI system goes here
    data=dataset_name,  # The dataset to predict and grade over
    evaluators=[exact_match],  # The evaluators that score each result
    experiment_prefix="sample-experiment",  # Prefix for the generated experiment name
    metadata={
        "version": "1.0.0",
        "revision_id": "beta",
    },
)

View the evaluation results for experiment: 'sample-experiment-93ec6f81' at:
https://smith.langchain.com/o/0ab3186a-c8bd-52b6-bc67-11d9436130bb/datasets/92e1360e-6ddd-4b1f-b718-2c20ac3441f2/compare?selectedSessions=d1517b4b-83af-409b-992a-ce495f815068
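Because an evaluator is just a Python function of `(run, example)`, it can be unit-tested locally before an experiment is launched. The sketch below assumes the evaluators only read the `.outputs` attribute of each argument, so `types.SimpleNamespace` objects can stand in for LangSmith's run and example objects. `normalized_match` is a hypothetical, looser evaluator added for comparison; its optional `"key"` field is assumed to name the metric in the results.

```python
from types import SimpleNamespace

def exact_match(run, example):
    # Same evaluator as in the notebook: compare the target output with the reference.
    return {"score": run.outputs["output"] == example.outputs["output"]}

def normalized_match(run, example):
    # Hypothetical looser evaluator: ignore case and surrounding whitespace.
    predicted = run.outputs["output"].strip().lower()
    expected = example.outputs["output"].strip().lower()
    return {"key": "normalized_match", "score": predicted == expected}

# Stand-ins exposing only the `.outputs` attribute the evaluators read.
run = SimpleNamespace(outputs={"output": "welcome to LangSmith "})
example = SimpleNamespace(outputs={"output": "Welcome to LangSmith"})

print(exact_match(run, example))       # {'score': False}
print(normalized_match(run, example))  # {'key': 'normalized_match', 'score': True}
```

Which evaluator is appropriate depends on the task: exact match is strict and cheap, while normalized comparisons tolerate formatting noise that does not change the meaning of the answer.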
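Conceptually, `evaluate` runs the target on every example, scores each result with every evaluator, and aggregates the scores per metric. The following is a simplified, offline sketch of that loop under stated assumptions: `toy_evaluate` is a hypothetical helper, plain-string outputs are wrapped as `{"output": ...}` to match how `exact_match` reads `run.outputs`, and the summary is a simple mean. The real function also records the runs and scores in LangSmith as a named experiment.

```python
from statistics import mean
from types import SimpleNamespace
from typing import Callable

def toy_evaluate(target: Callable, examples: list[dict], evaluators: list[Callable]) -> dict[str, float]:
    """Hypothetical offline stand-in for langsmith's evaluate(): mean score per metric."""
    scores: dict[str, list[float]] = {}
    for ex in examples:
        output = target(ex["inputs"])
        # Wrap plain return values in a dict so evaluators can read outputs["output"].
        outputs = output if isinstance(output, dict) else {"output": output}
        run = SimpleNamespace(inputs=ex["inputs"], outputs=outputs)
        example = SimpleNamespace(inputs=ex["inputs"], outputs=ex["outputs"])
        for evaluator in evaluators:
            result = evaluator(run, example)
            # Use the evaluator's "key" if given, else its function name.
            key = result.get("key", evaluator.__name__)
            scores.setdefault(key, []).append(float(result["score"]))
    return {key: mean(values) for key, values in scores.items()}

def exact_match(run, example):
    return {"score": run.outputs["output"] == example.outputs["output"]}

# The same two test cases as the LangSmith dataset in the notebook.
examples = [
    {"inputs": {"postfix": "to LangSmith"}, "outputs": {"output": "Welcome to LangSmith"}},
    {"inputs": {"postfix": "to Evaluations in LangSmith"},
     "outputs": {"output": "Welcome to Evaluations in LangSmith"}},
]

summary = toy_evaluate(lambda inputs: "Welcome " + inputs["postfix"], examples, [exact_match])
print(summary)  # {'exact_match': 1.0}

# A target with a one-character bug (missing space) fails every test case.
buggy = toy_evaluate(lambda inputs: "Welcome" + inputs["postfix"], examples, [exact_match])
print(buggy)  # {'exact_match': 0.0}
```

The second call shows how a small bug in the target drives the exact-match score to zero, which is the kind of regression a dataset of test cases is meant to catch.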