# Retrieval Augmented Generation application

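This notebook uses an LLM to build a Retrieval Augmented Generation (RAG) application.
The document parsed by the application is

https://lilianweng.github.io/posts/2023-06-23-agent/

The document's chunks are embedded and stored with PGVector in a PostgreSQL database running in a Docker container.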

In [2]:
# load environment variables (API keys/tokens) from the .env file
from dotenv import load_dotenv
load_dotenv(override=True)

True
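
The `True` above only indicates that `load_dotenv` set at least one variable; it does not confirm that the specific keys the notebook needs are present. A minimal sketch of an explicit check follows. The helper name `missing_env_vars` is hypothetical, and the assumption that the Gemini integration reads its key from `GOOGLE_API_KEY` should be checked against the installed package version.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Assumption: the Gemini chat and embedding models read their key from GOOGLE_API_KEY.
REQUIRED_VARS = ["GOOGLE_API_KEY"]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Failing early here tends to be easier to debug than an authentication error raised later by the first model call.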

In [3]:
# import chat model, embedding model and vectorstore
from langchain.chat_models import init_chat_model
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_postgres import PGVector

In [4]:
import bs4 # documents handler
from langchain import hub # prompt
from langchain_community.document_loaders import WebBaseLoader # load documents 
from langchain_core.documents import Document # document class used in State
from langchain_text_splitters import RecursiveCharacterTextSplitter # to split documents in chunks
from langgraph.graph import START, StateGraph # to create a graph
from typing_extensions import List, TypedDict # List and TypedDict types, used to define State

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [5]:
# instantiate the chat model and the embedding model
llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [6]:
# PGVector running in a Docker container
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"  # Uses psycopg3!
collection_name = "my_docs"

vector_store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
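
The connection string above packs the driver, credentials, host, port, and database name into a single URL. As a rough sketch, it can be assembled from its parts, which also makes it easier to escape credentials that contain characters such as `@` or `/`. The helper `build_pg_url` is hypothetical and not part of `langchain_postgres`.

```python
from urllib.parse import quote


def build_pg_url(user, password, host, port, database, driver="psycopg"):
    """Assemble a SQLAlchemy-style PostgreSQL URL, percent-encoding the credentials."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql+{driver}://{safe_user}:{safe_password}@{host}:{port}/{database}"


# Same values as the notebook's connection string.
print(build_pg_url("langchain", "langchain", "localhost", 6024, "langchain"))
# prints: postgresql+psycopg://langchain:langchain@localhost:6024/langchain
```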

In [7]:
# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks: embed them and write them to the vector store.
# add_documents returns the list of stored IDs, which is not kept here.
vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
# N.B. for non-US LangSmith endpoints, you may need to specify
# api_url="https://api.smith.langchain.com" in hub.pull.
prompt = hub.pull("rlm/rag-prompt")

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Define application steps (nodes of the graph)
def retrieve(state: State):
    # Given a state, look at "question" and retrieve from the document chunks a context that
    # (if possible) contains the answer.
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    # Given a state, use "question" and "context" to build a prompt. The prompt is passed to the LLM
    # to generate the answer, which populates the "answer" key of the state.
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
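
To make the retrieve → generate flow in the cell above more concrete, here is a dependency-free sketch of the same idea. Everything in it is a stand-in: `embed` is a toy word-count embedding rather than a real embedding model, `CHUNKS` holds made-up sentences rather than text from the blog post, `toy_generate` replaces the LLM call with a template, and `run_sequence` only imitates how each node's partial update is merged into the state.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up chunks for illustration only.
CHUNKS = [
    "task decomposition breaks a complex task into smaller steps",
    "memory lets an agent recall information over long horizons",
    "tool use lets an agent call external APIs",
]


def toy_retrieve(state, k=2):
    """Rank chunks by similarity to the question and keep the top k."""
    query = embed(state["question"])
    ranked = sorted(CHUNKS, key=lambda c: cosine(query, embed(c)), reverse=True)
    return {"context": ranked[:k]}


def toy_generate(state):
    """Stub for the LLM call: echoes the best-ranked chunk."""
    return {"answer": f"Based on the context: {state['context'][0]}"}


def run_sequence(state, steps):
    """Run the steps in order, merging each partial update into the state."""
    state = dict(state)
    for step in steps:
        state.update(step(state))
    return state


result = run_sequence({"question": "what is task decomposition"}, [toy_retrieve, toy_generate])
print(result["answer"])
```

Each step returns only the keys it produces, so the question set at the start is still present in the final state next to `context` and `answer`.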

In [8]:
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])

Task decomposition involves breaking down a complex task into smaller, simpler steps. This is often achieved by instructing a model to "think step by step". Task decomposition transforms big tasks into multiple manageable tasks and sheds light into an interpretation of the model’s thinking process.
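
Because `graph.invoke` returns the whole final state, the retrieved chunks are available under `response["context"]` alongside the answer. A small sketch of a helper for checking which chunks backed an answer is shown here. The name `summarize_context` is hypothetical, the stand-in objects only mimic the `page_content` and `metadata` attributes of `Document`, and the presence of a `"source"` key in the metadata is an assumption about what the loader records.

```python
from types import SimpleNamespace


def summarize_context(docs, width=60):
    """Return one line per retrieved chunk: its source and a short preview."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace so multi-line chunks fit on one line.
        preview = " ".join(doc.page_content.split())[:width]
        lines.append(f"[{i}] {source}: {preview}")
    return lines


# Stand-ins for Document objects so the sketch runs without the graph.
fake_docs = [
    SimpleNamespace(page_content="First chunk\nof text.", metadata={"source": "https://example.com/a"}),
    SimpleNamespace(page_content="Second chunk.", metadata={}),
]
for line in summarize_context(fake_docs):
    print(line)
```

In the notebook, the equivalent call would be `summarize_context(response["context"])`.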
