# RAG Application

![Simple RAG](../../images/simple_rag.png)

In this notebook, we're going to set up a simple RAG application that we'll be using as we learn more about LangSmith.

RAG (Retrieval-Augmented Generation) is a popular technique that retrieves documents relevant to a user's question and passes them to the LLM as context, so the model can answer more accurately.
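
The retrieve-then-generate flow can be sketched without any external services. This is only a toy illustration: the corpus, the word-overlap scoring, and the names `DOCS`, `tokenize`, `retrieve`, and `build_prompt` are all made up for this example and are not part of the application built in this notebook, which relies on a vector database instead.

```python
import re

# Toy corpus standing in for indexed documentation (illustrative text only).
DOCS = [
    "LangSmith traces record the inputs and outputs of each step in an application.",
    "A vector store holds document embeddings so similar text can be looked up.",
    "Bananas are rich in potassium.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents that share the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(question_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """Join the retrieved documents into a context block placed before the question."""
    context = "\n\n".join(docs)
    return f"Context: {context}\n\nQuestion: {question}"


question = "What do LangSmith traces record?"
top_docs = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

A real retriever would typically compare embeddings rather than raw word overlap, but the shape is the same: pick the most relevant documents, then hand them to the model alongside the question.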

In our case, we are going to index some LangSmith documentation!

LangSmith makes it easy to trace any LLM application, no LangChain required!

### Setup

Make sure you set your environment variables, including your OpenAI API key.

In [1]:
# You can set them inline!
import os
os.environ["OPENAI_API_KEY"] = ""
os.environ["LANGSMITH_API_KEY"] = ""
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = "langsmith-academy"

In [1]:
# Or you can use a .env file
from dotenv import load_dotenv
load_dotenv(dotenv_path="../../.env", override=True)

True
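
Before moving on, it can help to confirm that the variables are actually populated, since an empty string (as in the inline cell above) is easy to overlook. The helper below is a small sketch using only the standard library; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here, not part of LangSmith or this repository.

```python
import os

# Variable names taken from the setup cell above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "LANGSMITH_API_KEY",
    "LANGSMITH_TRACING",
    "LANGSMITH_PROJECT",
]


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or set to an empty string."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```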

### Simple RAG application

In [2]:
from langsmith import traceable
from openai import OpenAI
from typing import List
import nest_asyncio
from utils import get_vector_db_retriever

MODEL_PROVIDER = "openai"
MODEL_NAME = "gpt-4o-mini"
APP_VERSION = 1.0
RAG_SYSTEM_PROMPT = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the latest question in the conversation. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.
"""

openai_client = OpenAI()
nest_asyncio.apply()
retriever = get_vector_db_retriever()

"""
retrieve_documents
- Returns documents fetched from a vectorstore based on the user's question
"""
@traceable(run_type="chain")
def retrieve_documents(question: str):
    return retriever.invoke(question)

"""
generate_response
- Calls `call_openai` to generate a model response after formatting inputs
"""
@traceable(run_type="chain")
def generate_response(question: str, documents):
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    messages = [
        {
            "role": "system",
            "content": RAG_SYSTEM_PROMPT
        },
        {
            "role": "user",
            "content": f"Context: {formatted_docs} \n\n Question: {question}"
        }
    ]
    return call_openai(messages)

"""
call_openai
- Returns the chat completion output from OpenAI
"""
@traceable(run_type="llm")
def call_openai(
    messages: List[dict], model: str = MODEL_NAME, temperature: float = 0.0
):
    # Returns the full chat completion object (not a str); langsmith_rag extracts the text.
    return openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )

"""
langsmith_rag
- Calls `retrieve_documents` to fetch documents
- Calls `generate_response` to generate a response based on the fetched documents
- Returns the model response
"""
@traceable(run_type="chain")
def langsmith_rag(question: str):
    documents = retrieve_documents(question)
    response = generate_response(question, documents)
    return response.choices[0].message.content


USER_AGENT environment variable not set, consider setting it to identify your requests.
Fetching pages: 100%|##########| 197/197 [01:10<00:00,  2.78it/s]


This should take roughly a minute (1:10 in the run above). We are indexing and storing LangSmith documentation in an SKLearn vector database.
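
To see what `generate_response` sends to the model without calling OpenAI or building the index, the message construction can be reproduced on its own. This is a sketch under assumptions: `FakeDocument` is a hypothetical stand-in that exposes only the `page_content` attribute the code above reads, and `build_messages` is an illustrative name that mirrors the formatting logic but is not defined in this notebook.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a retrieved document; only `page_content` is used."""
    page_content: str


def build_messages(system_prompt: str, question: str, documents) -> list[dict]:
    """Mirror the message construction in `generate_response`, minus the model call."""
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Context: {formatted_docs} \n\n Question: {question}"},
    ]


docs = [FakeDocument("First chunk."), FakeDocument("Second chunk.")]
messages = build_messages("You are an assistant.", "What is LangSmith used for?", docs)
for message in messages:
    print(message["role"], "->", repr(message["content"]))
```

Printing the messages this way makes it easy to check that the retrieved chunks and the question end up in the user turn, which is also what appears as the input of the `call_openai` run in a trace.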

In [3]:
question = "What is LangSmith used for?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"website": "www.google.com"}})
print(ai_answer)

LangSmith is a platform for building production-grade LLM applications, allowing users to monitor and evaluate their applications for reliability. It provides tools for tracing application requests, measuring application quality over time, and testing prompts with version control and collaboration features. LangSmith is framework agnostic, compatible with or without LangChain’s open-source frameworks.


In [4]:
# A question related to the LangSmith documentation
question = (
    "In a complex agent application, if a specific tool call fails deep within a chain, "
    "how does LangSmith's tracing structure, specifically its Run Tree, enable a developer "
    "to quickly pinpoint the failure, examine the exact inputs to the failing component, "
    "and isolate the cause without modifying the application's runtime code?"
)
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"website": "www.google.com"}})
print(ai_answer)

LangSmith's tracing structure, particularly the Run Tree, allows developers to visualize the sequence of operations and their hierarchical relationships within the application. When a tool call fails, the Run Tree provides a clear path to the specific run associated with that failure, enabling developers to examine the inputs and outputs at each step leading up to the error. This detailed tracing facilitates quick identification and isolation of the failure's cause without the need to alter the application's runtime code.


### Let's take a look in LangSmith!