LangSmith is a platform from LangChain for tracing, monitoring, and debugging applications built on large language models. It records the inputs, outputs, and errors of each step in a run, so you can analyze performance, find issues, and improve your prompts or application logic. Think of it as a way to see exactly what your application did on every run, which makes its behavior easier to understand and optimize.
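To make "records the inputs, outputs, and errors" concrete, here is a toy tracing decorator in plain Python. It is not LangSmith's API or implementation. `toy_trace`, `RUNS`, and `fake_llm` are hypothetical names, and the run log is kept in a local list instead of being sent to a tracing service.

```python
import functools
import time

# Hypothetical local run log; a real tracing service stores runs remotely.
RUNS = []

def toy_trace(fn):
    """Record inputs, output or error, and latency for every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        run = {"name": fn.__name__, "inputs": {"args": args, "kwargs": kwargs}}
        start = time.perf_counter()
        try:
            run["output"] = fn(*args, **kwargs)
            return run["output"]
        except Exception as exc:
            run["error"] = repr(exc)  # capture the failure, then re-raise it
            raise
        finally:
            run["latency_s"] = time.perf_counter() - start
            RUNS.append(run)
    return wrapper

@toy_trace
def fake_llm(prompt: str) -> str:
    # Stand-in for a model call; fails on empty prompts to show error capture.
    if not prompt:
        raise ValueError("empty prompt")
    return prompt.upper()

fake_llm("hello")
try:
    fake_llm("")
except ValueError:
    pass

for run in RUNS:
    print(run["name"], run.get("output"), run.get("error"))
```

Each call leaves one record behind whether it succeeds or fails, which is the property that makes traces useful for debugging.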

Smart RAG

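1. Loads the blog post and splits it into chunks.
2. Tags each chunk as beginning, middle, or end, based on its position in the post.
3. Uses Gemini to:
   - analyze the question and predict which section to focus on,
   - retrieve only the relevant chunks from that section (Gemini embeddings plus a metadata filter),
   - generate a final answer from the filtered content.

It’s "smart" because it filters by section before retrieval, so the answer is built only from the part of the post the model judges most relevant. This can make responses more focused, although a wrong section guess will also exclude useful chunks.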
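The filtered retrieval step can be sketched without any model calls, under three assumptions: chunks are plain dicts with a `section` key, word overlap stands in for embedding similarity, and the section is passed in by hand instead of being predicted by Gemini. `filtered_retrieve` and `overlap_score` are hypothetical helpers. The point is the order of operations: filter by section first, then rank whatever survives.

```python
from typing import Optional

# Toy chunks tagged with a positional "section", mirroring the notebook's metadata.
CHUNKS = [
    {"text": "agents use planning and task decomposition", "section": "beginning"},
    {"text": "memory stores past observations", "section": "middle"},
    {"text": "challenges include reliability of task planning", "section": "end"},
]

def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of distinct words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def filtered_retrieve(query: str, section: Optional[str] = None, k: int = 2) -> list:
    """Keep only chunks from `section` (if given), then return the top-k by score."""
    candidates = [c for c in CHUNKS if section is None or c["section"] == section]
    ranked = sorted(candidates, key=lambda c: overlap_score(query, c["text"]), reverse=True)
    return ranked[:k]

print([c["section"] for c in filtered_retrieve("task planning agents", k=1)])
print([c["section"] for c in filtered_retrieve("task planning agents", section="end", k=1)])
```

Without a section, the best match for "task planning agents" comes from the beginning; restricting the search to "end" returns the end chunk instead, even though it scores lower overall. That is both the benefit and the risk of filtering first.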

Loading Env and Initializing Models

In [1]:
import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_core.documents import Document

# Load environment variables
load_dotenv()
google_api_key = os.getenv("GOOGLE_API_KEY")
assert google_api_key, "GOOGLE_API_KEY not found in .env"

# Optional: enable LangSmith tracing only when an API key is configured
if os.getenv("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_TRACING"] = "true"

# Gemini LLM
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=google_api_key
)

# Embedding model
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=google_api_key
)

# Create an empty vector store (we'll add docs later)
vector_store = InMemoryVectorStore(embedding=embeddings)
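At query time, the vector store embeds the query and ranks the stored chunk vectors by their similarity to it. The sketch below shows that ranking step using cosine similarity over hand-written 3-dimensional vectors. `ToyVectorStore` is a hypothetical stand-in, not `InMemoryVectorStore`'s implementation, and real embeddings from the Gemini embedding model are far longer.

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """Minimal in-memory store: keeps (text, vector) pairs and ranks them by cosine."""

    def __init__(self) -> None:
        self.items = []

    def add(self, text: str, vector: list) -> None:
        self.items.append((text, vector))

    def search(self, query_vector: list, k: int = 4) -> list:
        scored = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in scored[:k]]

store = ToyVectorStore()
# Hand-made "embeddings" for illustration only.
store.add("planning", [1.0, 0.0, 0.0])
store.add("memory", [0.0, 1.0, 0.0])
store.add("tool use", [0.7, 0.7, 0.0])
print(store.search([0.9, 0.1, 0.0], k=2))
```

Because cosine similarity compares direction rather than length, "tool use" (which points partly toward "planning") ranks second, ahead of "memory".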


Setting USER_AGENT and Loading Web Data for RAG

In [2]:
# Set USER_AGENT before importing WebBaseLoader so outgoing requests identify this app
os.environ["USER_AGENT"] = "my-langchain-rag-app/1.0"

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing import Literal, Annotated
from typing_extensions import TypedDict, List

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Tag each chunk with a positional section (thirds by index) for metadata filtering
total_documents = len(all_splits)
third = total_documents // 3
for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"

# Add documents to vector store
_ = vector_store.add_documents(documents=all_splits)
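The positional split uses integer division, so any remainder lands in "end", and with fewer than three chunks every chunk is tagged "end". The helper below reproduces the loop's arithmetic on plain integers so those edge cases are easy to check; `tag_sections` is a hypothetical name, not part of LangChain.

```python
def tag_sections(n_chunks: int) -> list:
    """Label chunk indices as beginning/middle/end using the same thirds rule as the loop."""
    third = n_chunks // 3
    labels = []
    for i in range(n_chunks):
        if i < third:
            labels.append("beginning")
        elif i < 2 * third:
            labels.append("middle")
        else:
            labels.append("end")  # also collects the remainder when n_chunks % 3 != 0
    return labels

labels = tag_sections(10)
print({s: labels.count(s) for s in ("beginning", "middle", "end")})
print(tag_sections(2))
```

For 10 chunks this gives 3 beginning, 3 middle, and 4 end. If a more even split matters, the section boundaries would need to distribute the remainder explicitly.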


Setting Up Prompt and Defining Application Logic with LangGraph


In [3]:
# Pull RAG prompt
prompt = hub.pull("rlm/rag-prompt")

# Define structured query schema
class Search(TypedDict):
    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[Literal["beginning", "middle", "end"], ..., "Section to query."]

# Define state structure
class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str

# Step 1: Analyze query and extract structured search info
def analyze_query(state: State):
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}

# Step 2: Retrieve documents based on query and section
def retrieve(state: State):
    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=lambda doc: doc.metadata.get("section") == query["section"]
    )
    return {"context": retrieved_docs}

# Step 3: Generate answer from context
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

# Build the graph: analyze_query -> retrieve -> generate
graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()

# Run test question
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])


Task decomposition involves breaking down complex tasks into smaller, simpler steps. This can be achieved through prompting the model to "think step by step" or by using task-specific instructions. It enables efficient handling of complex tasks.
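With `add_sequence`, each node receives the current state and returns a dict containing only the keys it sets; since `State` declares no reducers, LangGraph overwrites those keys with the new values. The plain-Python sketch below mimics that flow. `run_sequence` and the `toy_*` steps are hypothetical stand-ins with no LLM calls, not LangGraph's implementation.

```python
from typing import Callable

Step = Callable[[dict], dict]

def run_sequence(steps: list, state: dict) -> dict:
    """Run steps in order, merging each step's partial update into a copy of the state."""
    state = dict(state)  # copy so the caller's input dict is not mutated
    for step in steps:
        state.update(step(state))  # keys returned by the step overwrite existing ones
    return state

# Hypothetical stand-ins for analyze_query, retrieve, and generate.
def toy_analyze(state: dict) -> dict:
    return {"query": {"query": state["question"].lower(), "section": "beginning"}}

def toy_retrieve(state: dict) -> dict:
    return {"context": [f"chunk from {state['query']['section']}"]}

def toy_generate(state: dict) -> dict:
    return {"answer": f"answer based on {len(state['context'])} chunk(s)"}

final = run_sequence([toy_analyze, toy_retrieve, toy_generate],
                     {"question": "What is Task Decomposition?"})
print(final["answer"])
print(sorted(final))
```

The final state holds every key written along the way (`question`, `query`, `context`, `answer`), which is why the notebook can read `response["answer"]` after a single `graph.invoke` call.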
