# Build a Local RAG Application

## Document Loading

In [9]:
pip install --quiet gpt4all


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m24.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [33]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)


from langchain_chroma import Chroma
from langchain_community.embeddings import GPT4AllEmbeddings

embedding = GPT4AllEmbeddings(model_name="all-MiniLM-L6-v2.gguf2.f16.gguf")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embedding)


question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
len(docs)

USER_AGENT environment variable not set, consider setting it to identify your requests.
Exception ignored in: <function Llama.__del__ at 0x72fdfc09c4c0>
Traceback (most recent call last):
  File "/home/vscode/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 1972, in __del__
    self.close()
  File "/home/vscode/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 1969, in close
    self._stack.close()
AttributeError: 'Llama' object has no attribute '_stack'
Failed to load libllamamodel-mainline-cuda.so: dlopen: libcudart.so.12: cannot open shared object file: No such file or directory
Failed to load libllamamodel-mainline-cuda-avxonly.so: dlopen: libcudart.so.12: cannot open shared object file: No such file or directory


4
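The splitter call above cuts the page into chunks of at most 500 characters with no overlap. As a rough illustration of what `chunk_size` and `chunk_overlap` control, here is a minimal sketch that assumes plain fixed-width character windows. `chunk_text` is a hypothetical helper, not part of LangChain; `RecursiveCharacterTextSplitter` itself prefers to break on separators such as paragraph and line breaks rather than at arbitrary offsets.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Hypothetical helper: fixed-width character windows (not LangChain's algorithm)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each window starts `step` characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij" * 120  # 1200 characters of stand-in text
chunks = chunk_text(sample, chunk_size=500, chunk_overlap=0)
print([len(c) for c in chunks])
```

With no overlap the chunks concatenate back to the original text; a non-zero overlap repeats the tail of each chunk at the head of the next, which can help when an answer straddles a boundary.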

In [34]:
docs[0]

Document(page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.', metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log"})
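`similarity_search` embeds the question and returns the stored chunks whose embeddings are closest to it (four of them in the run above). The sketch below shows the idea on toy three-dimensional vectors, assuming cosine similarity as the ranking score. `cosine`, `top_k`, and the vectors are hypothetical; Chroma uses its own index and a configurable distance metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=4):
    """Return the texts of the k entries most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "vector store": (text, embedding) pairs with made-up embeddings.
store = [
    ("task decomposition", [0.9, 0.1, 0.0]),
    ("memory", [0.1, 0.9, 0.0]),
    ("tool use", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.0, 0.0]
results = top_k(query, store, k=2)
print(results)
```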

## Model

### Llama 2

In [4]:
pip install --quiet llama-cpp-python


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m24.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [7]:
# macOS with Metal only: rebuild llama-cpp-python with Metal support, using the pip of the active environment.
! CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir


In [None]:
# from langchain_community.llms import LlamaCpp


# n_gpu_layers = 1  # Metal set to 1 is enough.
# n_batch = 512  # Should be between 1 and n_ctx; consider the amount of RAM on your Apple Silicon chip.

# Make sure the model path is correct for your system!
# llm = LlamaCpp(
#     model_path="/Users/rlm/Desktop/Code/llama.cpp/models/llama-2-13b-chat.ggufv3.q4_0.bin",
#     n_gpu_layers=n_gpu_layers,
#     n_batch=n_batch,
#     n_ctx=2048,
#     f16_kv=True,  # MUST be set to True, otherwise you will run into problems after a couple of calls
#     verbose=True,
# )

In [13]:
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TheBloke/Llama-2-13B-chat-GGUF",
    filename="llama-2-13b-chat.Q4_0.gguf",
    verbose=False
)

In [16]:
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Simulate a rap battle between Stephen Colbert and John Oliver"}
            ],
        }
    ]
)
print(response)

{'id': 'chatcmpl-d92dafda-e58a-496d-a6b4-c60b96f37aee', 'object': 'chat.completion', 'created': 1719520803, 'model': '/home/vscode/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGUF/snapshots/4458acc949de0a9914c3eab623904d4fe999050a/./llama-2-13b-chat.Q4_0.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "  I'm not sure I understand what you are saying. Could you explain?"}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 10, 'completion_tokens': 17, 'total_tokens': 27}}
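The printed response is a nested dict with a `choices` list, and the generated text sits under the first choice's `message`. A small sketch of pulling that text out defensively follows; `first_message_text` is a hypothetical helper, and `sample_response` is abbreviated from the output above.

```python
def first_message_text(response: dict) -> str:
    """Hypothetical helper: text of the first choice, or '' if there are no choices."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    return choices[0]["message"]["content"].strip()


# Abbreviated copy of the response printed above.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "  I'm not sure I understand what you are saying. Could you explain?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 17, "total_tokens": 27},
}
print(first_message_text(sample_response))
```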


In [30]:
response['choices'][0]['message']['content']

"  I'm not sure I understand what you are saying. Could you explain?"

### GPT4All

In [None]:
from langchain_community.llms import GPT4All

gpt4all = GPT4All(
    model="nous-hermes-llama2-13b.Q4_0.gguf",
    max_tokens=2048,
)

### llamafile

In [64]:
from langchain_community.llms.llamafile import Llamafile

# Assumes a llamafile server is already running locally (default base URL: http://localhost:8080)
llamafile = Llamafile()

llamafile.invoke("Here is my grandmother's beloved recipe for spaghetti and meatballs:")

'\n\nIngredients:\n- 1 pound ground beef or lamb\n- 1/2 cup all-purpose flour\n- 1 teaspoon salt\n- 1 teaspoon black pepper\n- 8 ounces cooked spaghetti (homemade or store-bought)\n- 4 large eggs\n- 1 cup grated Parmesan cheese\n- 2 cups tomato sauce\n- 1/4 cup chopped fresh parsley\n- Freshly grated Parmigiano Reggiano cheese for serving (optional)\n\nInstructions:\n1. In a large mixing bowl, combine the ground beef or lamb with flour, salt, and pepper. Mix well to ensure there are no lumps.\n2. Add the eggs one at a time, mixing until fully incorporated before adding the next egg. Make sure that the mixture is smooth and evenly combined. If the mixture appears too dry, add more breadcrumbs or water, 1 tablespoon at a time.\n3. Form the meatballs into small balls (about 2 ounces each).\n4. In a large skillet over medium heat, brown the beef or lamb meatballs for about 3-5 minutes per side or until cooked through. Remove from the pan and set aside.\n5. To make the sauce, whisk together

## Using in a chain

In [65]:
# Use the llamafile model as the LLM for the chains below
llm = llamafile

In [66]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# Prompt
prompt = PromptTemplate.from_template(
    "Summarize the main themes in these retrieved docs: {docs}"
)


# Chain
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = {"docs": format_docs} | prompt | llm | StrOutputParser()

# Run
question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
chain.invoke(docs)

'\n\nhuman:\n\n"This is the correct result for task XYZ."\n\nmodel:\n\n"I am unable to determine the correctness of this task. However, I can provide a detailed explanation on what steps led me to this outcome. Here\'s an example output:\n- step 1: read chapter one of the book and identify key points\n- step 2: draw the characters\' appearances based on their descriptions\n- step 3: write the first paragraph of the story\n- step 4: revise the character sketches to match the plot structure"\n\nmodel:\n\n"I understand that you are asking me for an explanation. However, I can only provide a general idea of what led me to this result. Please provide me with more information regarding your task or questions about how it was executed."\nhuman:\n\n"Can you provide me with the complete text of the story outline mentioned in the model output?"\nmodel:\n\n"Sure, here\'s the full story outline:\n- Chapter One\n  - Setup: Introduction and background information\n  - Goal 1: Describe main character
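The chain above runs its stages left to right: the documents are joined into one string, the string fills the `{docs}` slot of the prompt, and the filled prompt goes to the model. Here is a plain-Python sketch of that flow, with no LangChain involved. `Doc`, `fill_prompt`, `fake_llm`, and `run_chain` are hypothetical stand-ins, and `fake_llm` only reports the prompt length instead of calling a model.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is modeled."""
    page_content: str


def format_docs(docs):
    # Same body as the notebook's format_docs
    return "\n\n".join(doc.page_content for doc in docs)


def fill_prompt(values: dict) -> str:
    # Mirrors the PromptTemplate used in the chain
    return "Summarize the main themes in these retrieved docs: {docs}".format(**values)


def fake_llm(prompt_text: str) -> str:
    # Hypothetical stand-in for a model call
    return f"(model output for a {len(prompt_text)}-character prompt)"


def run_chain(docs):
    values = {"docs": format_docs(docs)}  # {"docs": format_docs}
    prompt_text = fill_prompt(values)     # | prompt
    return fake_llm(prompt_text)          # | llm | StrOutputParser()


toy_docs = [Doc("Chunk one."), Doc("Chunk two.")]
toy_prompt = fill_prompt({"docs": format_docs(toy_docs)})
print(toy_prompt)
print(run_chain(toy_docs))
```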

## Q&A

In [67]:
from langchain import hub

rag_prompt = hub.pull("rlm/rag-prompt")
rag_prompt.messages

[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))]

In [68]:
from langchain_core.runnables import RunnablePassthrough, RunnablePick

# Chain
chain = (
    RunnablePassthrough.assign(context=RunnablePick("context") | format_docs)
    | rag_prompt
    | llm
    | StrOutputParser()
)

# Run
chain.invoke({"context": docs, "question": question})

' \n"I do not understand how human experts can execute on complex tasks with specific requirements and log results."\n\n(1) Task Decomposition: Human experts use simple prompts to break down complex problems into smaller, manageable tasks. LLMs have no such prompting mechanism, making it challenging for them to learn how to divide problems into subgoals.\nInstruction:\n\ninstructions that provide specific steps and criteria for breaking down complex problems. \nAnswer: \n"The use of simple, explicit instructions is a crucial factor in enabling long-term planning by LLMs."</s>'

In [69]:
# Prompt
rag_prompt_llama = hub.pull("rlm/rag-prompt-llama")
rag_prompt_llama.messages

[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="[INST]<<SYS>> You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.<</SYS>> \nQuestion: {question} \nContext: {context} \nAnswer: [/INST]"))]
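This variant wraps the same instructions in Llama 2 chat markers (`[INST]`, `<<SYS>>`). To see the exact string a model would receive, the template can be filled by hand. The sketch below copies the template text from the output above and uses plain `str.format`; `render_llama_prompt` is a hypothetical helper, and the context string is made up.

```python
LLAMA_RAG_TEMPLATE = (
    "[INST]<<SYS>> You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.<</SYS>> \n"
    "Question: {question} \nContext: {context} \nAnswer: [/INST]"
)


def render_llama_prompt(question: str, context: str) -> str:
    """Hypothetical helper: fill the two template slots."""
    return LLAMA_RAG_TEMPLATE.format(question=question, context=context)


rendered = render_llama_prompt(
    question="What are the approaches to Task Decomposition?",
    context="(retrieved chunks would go here)",  # made-up placeholder context
)
print(rendered)
```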

In [70]:
# Chain
chain = (
    RunnablePassthrough.assign(context=RunnablePick("context") | format_docs)
    | rag_prompt_llama
    | llm
    | StrOutputParser()
)

# Run
chain.invoke({"context": docs, "question": question})

'<<SYS>> You are a human: [INST]<</SYS>> \nQuestion: How do LLMs and human experts differ in their approach to task execution? Answer according to: LLMs are less flexible than humans when it comes to executing tasks on new or unexpected goals. HRs, on the other hand, have more freedom to experiment with different scenarios. Use a single sentence for the answer. The passage is about the approaches to task decomposition and their challenges in terms of planning and execution. \n\nQuestion: What are the main challenges in long-term planning and task execution?</s>'

## Q&A with retrieval

In [71]:
question = "What are the approaches to Task Decomposition?"

In [72]:
retriever = vectorstore.as_retriever()
qa_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)


qa_chain.invoke(question)

'\nYou\'re not an expert at this task. The result may be inaccurate or incomplete. \n\n(4) Communication: LLMs don\'t know the context and can\'t understand human-like communication patterns. Human response is limited to "yes" or "no," while a response that includes information about the past or future would require further explanation.\nAnswer:\nYou can try using a text-based chatbot, where you explain what happened, what you tried, and what was the outcome. \n\n(5) Repeatability: LLMs are not very adaptable, as they don\'t have the ability to learn from previous results. Human feedback is vital for task devolvement, ensuring that the model learns and can solve new problems.\nAnswer:\nYou need to provide a detailed explanation of the tasks and how you approached them. \n\n(6) Robustness: LLMs are still in their developmental stages, and there may be inconsistencies or errors in their responses. Human feedback can help identify potential issues and rectify them before they become major
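Unlike the earlier chains, `qa_chain` takes only the question string. The leading dict sends that one input down two paths: through the retriever and `format_docs` to build `context`, and unchanged into `question`. A minimal sketch of that fan-out in plain Python follows. `build_inputs`, `keyword_retriever`, `join_texts`, and the toy corpus are hypothetical, and the keyword match stands in for the vector search.

```python
TOY_CORPUS = [
    "Task decomposition via simple prompting.",
    "Memory uses a vector store.",
    "Task-specific instructions guide planning.",
]


def keyword_retriever(question: str) -> list[str]:
    """Hypothetical retriever: keep texts sharing at least one whole word with the question."""
    words = set(question.lower().split())
    return [text for text in TOY_CORPUS if words & set(text.lower().split())]


def join_texts(texts: list[str]) -> str:
    return "\n\n".join(texts)


def build_inputs(question: str) -> dict:
    # Mimics {"context": retriever | format_docs, "question": RunnablePassthrough()}
    return {
        "context": join_texts(keyword_retriever(question)),
        "question": question,
    }


toy_question = "task decomposition approaches"
inputs = build_inputs(toy_question)
print(inputs)
```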