In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
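import uuid
# One ID shared by every call below, so the runs can be grouped into a single thread.
# Stored as a string so the metadata value is plain JSON.
thread_id = str(uuid.uuid4())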
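
Every call later in this notebook passes the same `thread_id` through the `metadata` key of `langsmith_extra`. A minimal sketch of a helper that builds that dictionary in one place; `thread_extra` is a hypothetical name introduced here for illustration, not part of LangSmith:

```python
import uuid
from typing import Optional


def thread_extra(thread_id: Optional[object] = None) -> dict:
    """Build the langsmith_extra payload that tags a run with a thread ID.

    Hypothetical helper: it only assembles a dictionary and calls no API.
    """
    if thread_id is None:
        # No ID supplied: start a new thread.
        thread_id = uuid.uuid4()
    # str() yields a plain string whether a UUID or a str was passed in.
    return {"metadata": {"thread_id": str(thread_id)}}


extra = thread_extra("demo-thread")
print(extra)
```

Under that assumption, a call would read `langsmith_rag(question, langsmith_extra=thread_extra(thread_id))`.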

In [4]:
from langsmith import traceable
from openai import OpenAI
from typing import List
import nest_asyncio
from utils import get_vector_db_retriever

openai_client = OpenAI()
nest_asyncio.apply()
retriever = get_vector_db_retriever()

@traceable(run_type="chain")
def retrieve_documents(question: str):
    return retriever.invoke(question)

@traceable(run_type="chain")
def generate_response(question: str, documents):
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    rag_system_prompt = """You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context to answer the latest question in the conversation. 
    If you don't know the answer, just say that you don't know. 
    Use three sentences maximum and keep the answer concise.
    """
    messages = [
        {
            "role": "system",
            "content": rag_system_prompt
        },
        {
            "role": "user",
            "content": f"Context: {formatted_docs} \n\n Question: {question}"
        }
    ]
    return call_openai(messages)

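@traceable(run_type="llm")
def call_openai(
    messages: List[dict], model: str = "gpt-4o-mini", temperature: float = 0.0
):
    # Returns the full ChatCompletion object, not a string;
    # langsmith_rag reads the text from response.choices[0].message.content.
    return openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )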

@traceable(run_type="chain")
def langsmith_rag(question: str):
    documents = retrieve_documents(question)
    response = generate_response(question, documents)
    return response.choices[0].message.content
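
The prompt-assembly step in `generate_response` can be checked without calling OpenAI or the vector store. A minimal offline sketch, assuming documents only need a `page_content` attribute; `FakeDoc` and `build_rag_messages` are hypothetical names used for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only page_content is used."""
    page_content: str


def build_rag_messages(
    question: str, documents: List[FakeDoc], system_prompt: str
) -> List[dict]:
    # Same joining and message layout as generate_response above.
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": f"Context: {formatted_docs} \n\n Question: {question}",
        },
    ]


# Made-up documents, used only to inspect the resulting prompt.
docs = [FakeDoc("First retrieved passage."), FakeDoc("Second retrieved passage.")]
messages = build_rag_messages("What is a trace?", docs, "Answer from the context.")
print(messages[1]["content"])
```

Printing the user message this way makes it easy to see exactly which context the model receives when an answer looks off-topic.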

In [6]:
question = "To start, what is the primary purpose of LangSmith, and what main problem does it solve for developers working with LLMs?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"thread_id": thread_id}})
print(ai_answer)

The primary purpose of LangSmith is to provide a platform for building production-grade LLM applications, focusing on observability and evaluation. It helps developers monitor and evaluate their applications to ensure reliability and confidence in deployment. By offering tools for tracing, evaluating, and testing prompts, it addresses the challenges of debugging and improving AI applications over time.


In [7]:
question = "You mentioned it helps with observability. Could you elaborate on what a 'trace' in LangSmith actually captures from an LLM application's execution?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"thread_id": thread_id}})
print(ai_answer)

In LangSmith, a 'trace' captures detailed information about the execution of an LLM application, including inputs, outputs, and the sequence of function calls. It allows for monitoring specific steps, such as retrieval and generation, and can include user feedback attached to any child run of the trace. This comprehensive logging aids in evaluating and optimizing the application's performance.


In [8]:
question = "So, if a trace shows the entire lifecycle of a request, how can I use that information to specifically debug a RAG (Retrieval-Augmented Generation) pipeline that's returning irrelevant information?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"thread_id": thread_id}})
print(ai_answer)

To debug a RAG pipeline returning irrelevant information, you can filter the traces to focus on runs where user feedback indicates dissatisfaction or where specific tool calls were invoked. Analyze the filtered traces to identify patterns or errors in the retrieval or generation stages. Additionally, you can examine the metadata associated with these runs to understand the context and improve the relevance of the responses.
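
The answer above mentions filtering traces and examining run metadata. A hedged sketch of how the runs tagged with this notebook's `thread_id` might be pulled back with the LangSmith SDK. The filter syntax follows the form I recall from the LangSmith documentation on threads and should be treated as an assumption; `thread_filter`, `list_thread_runs`, and the project name passed to it are hypothetical:

```python
import json
from typing import Iterable


def thread_filter(thread_id: str, keys: Iterable[str] = ("thread_id",)) -> str:
    """Build a filter string matching runs whose metadata carries thread_id.

    Assumption: LangSmith accepts filters of the form
    and(in(metadata_key, [...]), eq(metadata_value, "...")).
    """
    key_list = json.dumps(list(keys))
    return (
        f"and(in(metadata_key, {key_list}), "
        f"eq(metadata_value, {json.dumps(thread_id)}))"
    )


def list_thread_runs(project_name: str, thread_id: str):
    """Fetch the LLM runs of one thread (needs network access and an API key)."""
    # Imported lazily so the filter builder works without langsmith installed.
    from langsmith import Client

    client = Client()
    return list(
        client.list_runs(
            project_name=project_name,
            filter=thread_filter(thread_id),
            run_type="llm",
        )
    )


print(thread_filter("demo-thread"))
```

Only the filter builder runs here; `list_thread_runs` is defined but not called, since it depends on a real project and credentials.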


In [9]:
question = "Besides debugging, how does LangSmith help me evaluate and monitor the performance of my application over time, especially after I've deployed it to production?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"thread_id": thread_id}})
print(ai_answer)

LangSmith helps evaluate and monitor application performance by providing a Usage Graph that tracks tracing spend by workspace (tenant_id), allowing you to analyze costs associated with data retention. It also offers insights into data retention settings, enabling you to optimize them for cost-effectiveness without losing historical observability. By managing retention defaults at both the organization and project levels, you can ensure efficient data handling as your application scales.


In [10]:
question = "You mentioned creating datasets for evaluation. Can I use the traces I've already collected in LangSmith to automatically create these datasets, or do I need to build them from scratch?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"thread_id": thread_id}})
print(ai_answer)

Yes, you can use the traces you've already collected in LangSmith to automatically create datasets. You can filter interesting traces based on evaluation criteria and add them to a dataset without needing to build them from scratch. This allows for efficient dataset creation using existing data.
