# LangSmith

LangSmith is an observability and evaluation platform for LLM-powered applications, built by the LangChain team. It helps you debug, monitor, measure, evaluate, and optimize the entire LLM pipeline, from prompts, models, retrievers, tools, and agents to the full workflow.

### Setting up LangSmith

When using LangSmith, we need to set up our environment variables and provide our API key.

In [None]:
import os
from getpass import getpass

os.environ["LANGSMITH_API_KEY"] = os.getenv("LANGSMITH_API_KEY") or \
    getpass("Enter LangSmith API Key: ")

# Keep this enabled to use LangChain's tracing features
os.environ["LANGSMITH_TRACING"] = "true"
# EU-region endpoint; change this if your LangSmith account is hosted in another region
os.environ["LANGSMITH_ENDPOINT"] = "https://eu.api.smith.langchain.com"
# Project name for LangSmith
os.environ["LANGSMITH_PROJECT"] = "langsmith_testing"

In most cases, this is all we need to start seeing logs and traces in the LangSmith UI. By default, LangChain will trace LLM calls, chains, etc. We'll take a look at a quick example of this below.
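Before running anything that should be traced, it can help to confirm the settings are actually present. The following is a minimal sketch, not part of LangSmith: `EXPECTED_VARS` and `missing_langsmith_settings` are hypothetical names, and the list simply mirrors the variables set in the cell above. It never prints the API key itself.

```python
import os
from typing import Mapping

# Variable names mirrored from the setup cell above (an assumption of this sketch,
# not an official list of what LangSmith requires).
EXPECTED_VARS = ("LANGSMITH_API_KEY", "LANGSMITH_TRACING", "LANGSMITH_PROJECT")


def missing_langsmith_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of expected variables that are unset or empty."""
    return [name for name in EXPECTED_VARS if not env.get(name)]


missing = missing_langsmith_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("LangSmith settings found for project:", os.environ["LANGSMITH_PROJECT"])
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without touching the real environment.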

### Default Tracing

As mentioned, LangSmith traces a lot of data without us needing to do anything.

In [None]:
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass(
    "Enter OpenAI API Key: "
)

openai_model = "gpt-4o-mini"

In [6]:
from langchain_openai import ChatOpenAI

# Temperature 0 for more deterministic, less random responses
llm = ChatOpenAI(temperature=0.0, model=openai_model)

In [None]:
# This call errors out (see the output below), but it still exercises LangSmith tracing
llm.invoke("Hello")

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

By default, LangSmith captures plenty. However, it won't capture functions from outside of LangChain.

### Tracing Non-LangChain Code

LangSmith can trace functions that are not part of LangChain; we just need to add the `@traceable` decorator. Let's try this for a few simple functions.

In [8]:
from langsmith import traceable
import random
import time


@traceable
def generate_random_number():
    return random.randint(0, 100)

@traceable
def generate_string_delay(input_str: str):
    number = random.randint(1, 5)
    time.sleep(number)
    return f"{input_str} ({number})"

@traceable
def random_error():
    number = random.randint(0, 1)
    if number == 0:
        raise ValueError("Random error")
    else:
        return "No error"

Let's run these a few times and see what happens.

In [9]:
from tqdm.auto import tqdm

for _ in tqdm(range(10)):
    generate_random_number()
    generate_string_delay("Hello")
    try:
        random_error()
    except ValueError:
        pass

100%|██████████| 10/10 [00:33<00:00,  3.31s/it]
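As a rough local cross-check for what the UI shows, the sketch below counts outcomes for an undecorated copy of `random_error`. The names `random_error_plain` and `tally_outcomes` are hypothetical and exist only for this illustration; nothing here is sent to LangSmith, and the counts vary from run to run.

```python
import random
from collections import Counter


def random_error_plain() -> str:
    """Undecorated copy of random_error, so this runs without LangSmith."""
    if random.randint(0, 1) == 0:
        raise ValueError("Random error")
    return "No error"


def tally_outcomes(n_runs: int) -> Counter:
    """Call random_error_plain n_runs times and count each outcome."""
    outcomes = Counter()
    for _ in range(n_runs):
        try:
            random_error_plain()
            outcomes["ok"] += 1
        except ValueError:
            outcomes["error"] += 1
    return outcomes


counts = tally_outcomes(10)
print(f"ok={counts['ok']} error={counts['error']}")
```

Since each call fails with probability one half, roughly half of the `random_error` runs in the project should appear as errored, though any single batch of ten can deviate noticeably.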


### OpenRouter test

In [None]:
os.environ["OPENROUTER_API_KEY"] = os.getenv("OPENROUTER_API_KEY") or getpass(
    "Enter OpenRouter API Key: "
)

openrouter_model = "openai/gpt-oss-120b:free"

In [11]:
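llm = ChatOpenAI(
    temperature=0.0,
    model=openrouter_model,
    api_key=os.environ["OPENROUTER_API_KEY"],  # key collected in the previous cell
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
)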

In [12]:
llm.invoke("Hello")

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 41, 'prompt_tokens': 70, 'total_tokens': 111, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 27, 'rejected_prediction_tokens': None, 'image_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0, 'video_tokens': 0}, 'cost': 0, 'is_byok': False, 'cost_details': {'upstream_inference_cost': None, 'upstream_inference_prompt_cost': 0, 'upstream_inference_completions_cost': 0}}, 'model_name': 'openai/gpt-oss-120b:free', 'system_fingerprint': None, 'id': 'gen-1765517562-c9c7RblEeNnlKLmqlb6S', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--d39151f3-9a34-453b-96d3-4944c484e24a-0', usage_metadata={'input_tokens': 70, 'output_tokens': 41, 'total_tokens': 111, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'rea

In [13]:
# Wrap the LLM call in a traceable function
@traceable
def run():
    # Return the response so the trace records an output
    return llm.invoke("Hello")

run()

Those traces should now be visible in the LangSmith UI, again under the same project.

We can see various metrics here for each run. First, of course, is the run name. We can also see the inputs and outputs of each run, whether the run raised any errors, its start time, and its latency. Inside the UI we can also filter for specific runs.

Finally, we can also modify our traceable names if we'd like to make them more readable inside the UI. For example:

In [14]:
@traceable(name="Chitchat Maker")
def error_generation_function(question: str):
    delay = random.randint(0, 3)
    time.sleep(delay)
    number = random.randint(0, 1)
    if number == 0:
        raise ValueError("Random error")
    else:
        return "I'm great how are you?"

In [15]:
for _ in tqdm(range(10)):
    try:
        error_generation_function("How are you today?")
    except ValueError:
        pass

100%|██████████| 10/10 [00:18<00:00,  1.81s/it]
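To make the recorded fields concrete, here is a toy stand-in for a tracing decorator. It is not how `langsmith.traceable` is implemented; `record_run` and `RUNS` are hypothetical names, and runs are kept in a local list rather than sent anywhere. It only illustrates the idea of capturing a name, inputs, outputs, errors, start time, and latency, and of accepting an optional custom name.

```python
import functools
import time
from datetime import datetime, timezone

RUNS = []  # hypothetical in-memory stand-in for a tracing backend


def record_run(func=None, *, name=None):
    """Toy decorator usable as @record_run or @record_run(name="...")."""
    if func is None:
        # Called with arguments: return a decorator that remembers `name`.
        return functools.partial(record_run, name=name)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        run = {
            "name": name or func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "start_time": datetime.now(timezone.utc),
            "outputs": None,
            "error": None,
        }
        started = time.perf_counter()
        try:
            run["outputs"] = func(*args, **kwargs)
            return run["outputs"]
        except Exception as exc:
            run["error"] = repr(exc)
            raise  # re-raise so callers still see the original exception
        finally:
            # Runs on both success and failure, so every call is recorded.
            run["latency_s"] = time.perf_counter() - started
            RUNS.append(run)

    return wrapper


@record_run(name="Chitchat Maker")
def reply(question: str) -> str:
    return "I'm great how are you?"


@record_run
def always_fails():
    raise ValueError("Random error")


reply("How are you today?")
try:
    always_fails()
except ValueError:
    pass

for run in RUNS:
    print(run["name"], "| error:", run["error"], "| latency:", f"{run['latency_s']:.4f}s")
```

The first run is stored under its custom name while the second falls back to the function name, which mirrors the difference between the named and unnamed traceable functions in this notebook.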
