# RAG Application

### Setup

Set up the environment variables, including a Groq API key, since this notebook uses Groq as the model provider.

In [1]:
# Load environment variables (e.g. GROQ_API_KEY) from a local .env file
from dotenv import load_dotenv
load_dotenv()

True
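`load_dotenv()` returns `True` when it loaded at least one variable from `.env`. That does not confirm that every key the notebook needs is present. A small sanity check can catch a missing key before the first model call. This is a minimal sketch, and the variable names are assumptions: `GROQ_API_KEY` is what `ChatGroq` reads by default, while the LangSmith names depend on your SDK version (older releases used `LANGCHAIN_API_KEY` and `LANGCHAIN_TRACING_V2`). `missing_env_vars` is a hypothetical helper.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumed variable names: GROQ_API_KEY for ChatGroq, plus the LangSmith
# tracing variables used by recent SDK versions. Adjust to your setup.
REQUIRED_VARS = ("GROQ_API_KEY", "LANGSMITH_API_KEY", "LANGSMITH_TRACING")


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running it right after `load_dotenv()` turns a confusing authentication error later on into an explicit list of missing keys.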

### Simple RAG application

In [6]:
from typing import List

import nest_asyncio
from langchain_groq import ChatGroq  # Groq chat model wrapper for LangChain
from langsmith import traceable
from utils import get_vector_db_retriever  # local helper that builds the retriever

MODEL_PROVIDER = "groq"
MODEL_NAME = "llama-3.1-8b-instant"  # free, fast Groq model
APP_VERSION = 1.0

RAG_SYSTEM_PROMPT = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the latest question in the conversation. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.
"""

# Default Groq client; ChatGroq reads GROQ_API_KEY from the environment
groq_client = ChatGroq(model=MODEL_NAME, temperature=0.0)

# Allow nested event loops in Jupyter, in case the retriever setup runs asyncio code
nest_asyncio.apply()
retriever = get_vector_db_retriever()


@traceable(run_type="chain")
def retrieve_documents(question: str):
    """Return documents fetched from the vector store for the user's question."""
    return retriever.invoke(question)


@traceable(run_type="chain")
def generate_response(question: str, documents):
    """Format the retrieved documents into a prompt and call `call_groq`."""
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    messages = [
        {"role": "system", "content": RAG_SYSTEM_PROMPT},
        {"role": "user", "content": f"Context: {formatted_docs} \n\n Question: {question}"},
    ]
    return call_groq(messages)


@traceable(run_type="llm")
def call_groq(
    messages: List[dict], model: str = MODEL_NAME, temperature: float = 0.0
) -> str:
    """Return the chat completion text from Groq."""
    # Reuse the default client unless a different model or temperature is requested
    if model == MODEL_NAME and temperature == 0.0:
        client = groq_client
    else:
        client = ChatGroq(model=model, temperature=temperature)
    # ChatGroq is a LangChain chat model: call .invoke() instead of the
    # OpenAI-style client, and read the plain text from .content
    response = client.invoke(messages)
    return response.content


@traceable(run_type="chain")
def langsmith_rag(question: str):
    """Retrieve documents, generate an answer from them, and return the model response."""
    documents = retrieve_documents(question)
    response = generate_response(question, documents)
    return response


Running the `In [6]` cell should take a little less than a minute, because `get_vector_db_retriever()` indexes and stores the LangSmith documentation in an SKLearn vector store.
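Indexing typically means embedding each chunk of documentation once and storing the vectors, so that a query can later be matched against them by similarity. The `utils` helper is not shown here, so the sketch below does not reproduce it. It is a toy, dependency-free illustration of the same idea: a bag-of-words count vector stands in for a real embedding model, and brute-force cosine similarity stands in for scikit-learn's nearest-neighbor search. `ToyVectorStore`, `embed`, and `cosine` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: index texts once, then return the k most similar."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Counter]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Indexing step: embed each chunk once and keep the vector next to its text
        for text in texts:
            self._entries.append((text, embed(text)))

    def similarity_search(self, query: str, k: int = 2) -> List[str]:
        # Retrieval step: score every stored chunk against the query, highest first
        q = embed(query)
        ranked = sorted(self._entries, key=lambda entry: cosine(q, entry[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add_texts([
        "LangSmith tracing logs every step of an LLM application.",
        "Datasets and evaluators help you test prompts.",
        "Install the SDK with pip.",
    ])
    print(store.similarity_search("How does tracing work in LangSmith?", k=1))
```

The expensive part, embedding every chunk, happens once at indexing time, which is why setting up the retriever takes noticeably longer than answering a single question.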

In [7]:
question = "What is LangSmith used for?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"website": "www.google.com"}})
print(ai_answer)

LangSmith is used for storing and processing trace data in a simple format. This allows for easy export and import of data. It is designed to collect data from various sources, including monitoring infrastructure.
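`langsmith_rag` is just three steps wired together: retrieve, format the context into a prompt, call the model. The same wiring can be exercised without network calls, for example to unit-test the prompt formatting. The sketch below is hypothetical and dependency-free: `Doc`, `fake_retriever`, and `echo_llm` are illustrative stand-ins for the vector-store retriever and `ChatGroq`, and `build_messages` mirrors the message layout used in `generate_response`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

SYSTEM_PROMPT = "You are an assistant for question-answering tasks."  # shortened stand-in


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


def build_messages(question: str, documents: List[Doc]) -> List[Dict[str, str]]:
    """Format retrieved documents into the same system/user layout as generate_response."""
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context: {formatted_docs} \n\n Question: {question}"},
    ]


def rag(
    question: str,
    retrieve: Callable[[str], List[Doc]],
    llm: Callable[[List[Dict[str, str]]], str],
) -> str:
    """Retrieve -> format -> generate, with the retriever and model injected."""
    documents = retrieve(question)
    return llm(build_messages(question, documents))


def fake_retriever(question: str) -> List[Doc]:
    # Hypothetical retriever that always returns the same two snippets
    return [Doc("LangSmith traces LLM apps."), Doc("It also supports evaluation.")]


def echo_llm(messages: List[Dict[str, str]]) -> str:
    # Hypothetical "model" that echoes the user message so the prompt can be inspected
    return messages[-1]["content"]


if __name__ == "__main__":
    print(rag("What is LangSmith?", fake_retriever, echo_llm))
```

Injecting the retriever and model as arguments is a design choice for testability; the notebook itself uses module-level globals, which is simpler for interactive use.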


### Additional examples

In [8]:
question = "How does LangSmith help developers with their applications?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"website": "www.google.com"}})
print(ai_answer)

LangSmith helps developers by providing a platform for building production-grade LLM applications, allowing them to monitor and evaluate their application with confidence. It also offers features such as tracing, evaluating, and testing prompts, which can help developers debug faster and build more reliable AI applications. Additionally, LangSmith provides automatic version control and collaboration features.


In [9]:
question = "What is tracing in LangSmith?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"website": "www.google.com"}})
print(ai_answer)

Tracing in LangSmith refers to the process of logging and storing data about the execution of a program or workflow. This data is used to understand and analyze the flow of a program, identify performance issues, and optimize its execution. LangSmith uses OpenTelemetry to automatically log traces from a CrewAI workflow with minimal changes to the existing code.
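The system prompt asks for three sentences at most, but the model is not guaranteed to comply. A rough way to spot violations is to count the sentences in each answer. This is a minimal heuristic sketch: `split_sentences` and `within_sentence_limit` are hypothetical helpers, and the regex splitter will miscount text that contains abbreviations such as "e.g." followed by a space.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def within_sentence_limit(answer: str, max_sentences: int = 3) -> bool:
    """Heuristically check that an answer respects the prompt's sentence limit."""
    return len(split_sentences(answer)) <= max_sentences


if __name__ == "__main__":
    sample = "First sentence. Second one! Third? Fourth."
    print(len(split_sentences(sample)), within_sentence_limit(sample))
```

In the notebook, `within_sentence_limit(ai_answer)` can be run after any of the cells above.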
