# libraries
pip install bs4
pip install langchain-huggingface
pip install -qU langchain-community faiss-cpu
pip install "transformers[torch]"
pip install tf-keras

Detailed walkthrough
Let’s go through the code step by step to understand what each stage of the pipeline does.

# 1. Indexing: Load
We need to first load the blog post contents. We can use DocumentLoaders for this, which are objects that load in data from a source and return a list of Documents. A Document is an object with some page_content (str) and metadata (dict).
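To make the shape of a Document concrete, here is a minimal stand-in built with a plain dataclass. `SimpleDocument` is a hypothetical name used only for illustration; it is not LangChain's own class, which carries more behaviour than this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""
    page_content: str                              # the text itself
    metadata: dict = field(default_factory=dict)   # e.g. source URL, start index


doc = SimpleDocument(
    page_content="LLM Powered Autonomous Agents",
    metadata={"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"},
)
print(len(doc.page_content), doc.metadata["source"])
```

A loader returns a list of such objects, which is why the cells below index into `docs[0]` before reading `page_content`.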

In this case we’ll use the WebBaseLoader, which uses urllib to load HTML from web URLs and BeautifulSoup to parse it to text. We can customize the HTML -> text parsing by passing in parameters to the BeautifulSoup parser via bs_kwargs (see BeautifulSoup docs). In this case only HTML tags with class “post-content”, “post-title”, or “post-header” are relevant, so we’ll remove all others.

In [1]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

len(docs[0].page_content)

USER_AGENT environment variable not set, consider setting it to identify your requests.


43130

The loaded document contains 43,130 characters.

In [None]:
print(docs[0].page_content[:500])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In


# 2. Indexing: Split
Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even for those models that could fit the full post in their context window, models can struggle to find information in very long inputs.

To handle this we’ll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.

In this case we’ll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the RecursiveCharacterTextSplitter, which will recursively split the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.

We set add_start_index=True so that the character index at which each split Document starts within the initial Document is preserved as metadata attribute “start_index”.
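The interplay of chunk size, overlap, and start index can be seen in a simplified sketch. The helper `split_with_overlap` below is hypothetical and uses fixed sliding windows; unlike RecursiveCharacterTextSplitter it ignores separators, so it only illustrates the bookkeeping, not the library's actual algorithm.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Return (start_index, chunk) pairs from fixed-size sliding windows.

    Simplified sketch: separators are ignored, so words may be cut in half.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap   # how far each new window advances
    pieces = []
    start = 0
    while start < len(text):
        pieces.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return pieces


text = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
for start, chunk in pieces:
    print(start, chunk)
```

Each chunk begins with the last three characters of the previous one, and slicing the original text at a chunk's start index recovers that chunk, which is what the "start_index" metadata makes possible.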

In [2]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
# into chunks of 1000 characters with 200 characters of overlap between chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

len(all_splits)

66

The document was split into 66 chunks total.
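As a rough sanity check on that number, the arithmetic below estimates how many chunks fixed 1000-character windows with exactly 200 characters of overlap would need. This is a back-of-the-envelope sketch, not how the splitter computes anything; the real splitter breaks on separators, so its chunks are often shorter than chunk_size and the actual count comes out higher.

```python
import math

doc_length = 43130      # characters in the loaded post (from the output above)
chunk_size = 1000
chunk_overlap = 200

step = chunk_size - chunk_overlap   # each new chunk advances 800 characters
# Number of fixed windows needed to cover the document end to end.
estimated_chunks = math.ceil((doc_length - chunk_overlap) / step)
print(step, estimated_chunks)
```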

In [4]:
len(all_splits[0].page_content)

969

The first chunk contains 969 characters, close to but below the chunk_size of 1000.
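Chunks tend to land below the limit because the splitter prefers to break at separators rather than mid-sentence. The sketch below shows that idea with one separator only: the hypothetical helper `merge_paragraphs` greedily packs whole paragraphs into chunks. It has no overlap and no recursion into smaller separators, so it is a simplification of what RecursiveCharacterTextSplitter does.

```python
def merge_paragraphs(text, chunk_size, separator="\n\n"):
    """Greedily pack whole paragraphs into chunks no longer than chunk_size.

    Simplified sketch: one separator, no overlap, and a paragraph longer than
    chunk_size is kept whole rather than split further.
    """
    chunks, current = [], ""
    for paragraph in text.split(separator):
        candidate = paragraph if not current else current + separator + paragraph
        if len(candidate) <= chunk_size or not current:
            current = candidate          # the paragraph still fits in this chunk
        else:
            chunks.append(current)       # close the chunk and start a new one
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "\n\n".join(["a" * 40, "b" * 30, "c" * 50, "d" * 20])
sample_chunks = merge_paragraphs(sample, chunk_size=80)
print([len(c) for c in sample_chunks])
```

With a limit of 80, the third paragraph does not fit after the first two, so the first chunk closes early and stays under the limit.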

In [5]:
all_splits[10].metadata

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'start_index': 7055}

all_splits[10] is the 11th chunk, because list indexing is zero-based.

The chunk comes from the same source URL as the original document.

It starts at character offset 7055 (zero-based) in the original document.

# 3. Indexing: Store
Now we need to index our 66 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of “similarity” search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is cosine similarity — we measure the cosine of the angle between each pair of embeddings (which are high dimensional vectors).
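A minimal sketch of cosine similarity in plain Python follows. The function name `cosine_similarity` and the toy vectors are illustrative assumptions; a vector store computes this (or another measure such as Euclidean distance) with optimized native code instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

The score depends only on direction, not length: vectors pointing the same way score close to 1, perpendicular vectors score 0, and opposite vectors score close to -1.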

We can embed and store all of our document splits in a single command using the FAISS vector store and All-MiniLM-L6-v2 model from Hugging Face.

In [3]:
from langchain_community.vectorstores import FAISS  # current import path for the FAISS integration
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

vectorstore = FAISS.from_documents(documents=all_splits, embedding=embedding_model)



This completes the Indexing portion of the pipeline. At this point we have a query-able vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question.

In [11]:
for i, chunk in enumerate(all_splits[:3]):  # Just first 3
    print(f"\n--- Chunk {i} ---")
    print(chunk.page_content[:200], "...")  # show part of the text
    vector = embedding_model.embed_query(chunk.page_content)
    print("Vector (first 5 dims):", vector[:5])



--- Chunk 0 ---
LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool con ...
Vector (first 5 dims): [-0.007049907464534044, -0.0005903498386032879, 0.03333946689963341, 0.015461482107639313, -0.01270086970180273]

--- Chunk 1 ---
Memory

Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.
Long-term memory: This provides the agent with th ...
Vector (first 5 dims): [0.0036475660745054483, 0.011537445709109306, -0.055276770144701004, 0.05251268297433853, 0.07744487375020981]

--- Chunk 2 ---
Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposit ...
Vector (first 5 dims): [0.03652041405439377, 0.03538128361105919

## 3.1 Query the vector database
We can embed a query and retrieve the chunks whose vectors are closest to it.
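Conceptually, a similarity search scores every stored vector against the query vector and keeps the best k. The brute-force sketch below uses made-up three-dimensional vectors and hypothetical names (`top_k`, `toy_store`, `toy_query`); FAISS achieves the same result with far more efficient index structures.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, store, k):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_store = [
    ("chunk about planning", [0.9, 0.1, 0.0]),
    ("chunk about memory", [0.1, 0.9, 0.0]),
    ("chunk about tool use", [0.0, 0.2, 0.9]),
]
toy_query = [0.8, 0.3, 0.0]   # hypothetical embedding of a planning question

toy_results = top_k(toy_query, toy_store, k=2)
for score, text in toy_results:
    print(round(score, 3), text)
```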

In [15]:
query = "What is agent reflection and refinement?"
# Use a separate name so the loaded `docs` list from step 1 is not overwritten.
results = vectorstore.similarity_search(query, k=2)

for i, doc in enumerate(results):
    print(f"Result {i+1}: {doc.page_content}\n")


Result 1: Fig. 3. Illustration of the Reflexion framework. (Image source: Shinn & Labash, 2023)
The heuristic function determines when the trajectory is inefficient or contains hallucination and should be stopped. Inefficient planning refers to trajectories that take too long without success. Hallucination is defined as encountering a sequence of consecutive identical actions that lead to the same observation in the environment.
Self-reflection is created by showing two-shot examples to LLM and each example is a pair of (failed trajectory, ideal reflection for guiding future changes in the plan). Then reflections are added into the agent’s working memory, up to three, to be used as context for querying LLM.

Result 2: Memory stream: is a long-term memory module (external database) that records a comprehensive list of agents’ experience in natural language.

Each element is an observation, an event directly provided by the agent.
- Inter-agent communication can trigger new natural langu

# 4. Retrieval and Generation: Retrieve

In [21]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})

retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

len(retrieved_docs)

3

In [33]:
print(retrieved_docs[0].page_content)

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.


In [32]:
for i, doc in enumerate(retrieved_docs):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])  # You can remove [:500] to show full chunk


--- Chunk 1 ---
Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big ta

--- Chunk 2 ---
Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Ste

# 5. Retrieval and Generation: Generate

Let’s put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.
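The flow of that chain can be sketched in plain Python before any LangChain objects are involved. Every function below is a hypothetical stand-in: `fake_retrieve` returns canned snippets instead of searching, and `fake_llm` echoes the question instead of generating text. Only the order of the steps mirrors the real chain.

```python
def fake_retrieve(question):
    """Hypothetical retriever: returns canned snippets instead of searching."""
    return [
        "Task decomposition breaks a complicated task into smaller steps.",
        "Chain of thought prompts the model to think step by step.",
    ]


def format_context(snippets):
    """Join retrieved snippets into one context string."""
    return "\n\n".join(snippets)


def build_prompt(context, question):
    """Fill a prompt template with the context and the question."""
    return f"Use the context to answer.\n\n{context}\n\nQuestion: {question}\n\nHelpful Answer:"


def fake_llm(prompt):
    """Hypothetical model: echoes the question line rather than generating text."""
    question_line = [line for line in prompt.splitlines() if line.startswith("Question:")][0]
    return f"  (stub answer to '{question_line[len('Question: '):]}')  "


def parse_output(raw):
    """Strip surrounding whitespace from the raw model output."""
    return raw.strip()


def answer(question):
    context = format_context(fake_retrieve(question))   # retrieve, then format
    prompt = build_prompt(context, question)            # construct the prompt
    return parse_output(fake_llm(prompt))               # generate, then parse


reply = answer("What is Task Decomposition?")
print(reply)
```

The `|` operator in the LangChain version expresses the same left-to-right hand-off between steps.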

We’ll use a local model served through GPT4All, but any LangChain LLM or ChatModel could be substituted in.

## 5.1 llm

In [None]:
%pip install gpt4all
#%pip install -qU langchain-community llama-cpp-python


In [4]:
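from gpt4all import GPT4All
# Instantiating the native client downloads the 4.66GB model file if it is not already cached.
# The next cell rebinds both GPT4All and llm to the LangChain wrapper.
llm = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")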


In [5]:
from langchain_community.llms import GPT4All

# This should match your downloaded model path
llm = GPT4All(
    model="Meta-Llama-3-8B-Instruct.Q4_0.gguf", 
    backend="llama", 
    verbose=True
)


In [34]:
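from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough


def format_docs(docs):
    # Join the retrieved chunks into a single context string for the prompt.
    return "\n\n".join(doc.page_content for doc in docs)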

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition")

' \nTask Decomposition refers to breaking down complex tasks into smaller, manageable steps or subgoals that can be planned and executed individually. This process helps agents like LLM-powered autonomous systems plan ahead and make decisions more effectively.\nThanks for asking! '

In [36]:
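# rag_chain returns only a string, so fetch the supporting chunks from the retriever directly.
context_docs = retriever.invoke("What is Task Decomposition")
for document in context_docs:
    print(document)
    print()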

page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}

page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS 

In [37]:
retrieved_docs = retriever.invoke("What is task Decomposition?")
for i, doc in enumerate(retrieved_docs):
    print(f"\n--- Chunk {i+1} ---\n{doc.page_content[:400]}")



--- Chunk 1 ---
Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more

--- Chunk 2 ---
Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task dec

--- Chunk 3 ---
Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and

# Example 2: Indexing a local text file

In [7]:
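from langchain_community.document_loaders import TextLoader

# Read the file as UTF-8 so characters such as curly apostrophes are decoded correctly.
loader = TextLoader("example.txt", encoding="utf-8")
documents = loader.load()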


In [8]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=150, chunk_overlap=30)
chunks = splitter.split_documents(documents)


In [9]:
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Embed every chunk of the text file so we can inspect the raw vectors.
chunk_vectors = embedding_model.embed_documents([chunk.page_content for chunk in chunks])



In [10]:
print(chunk_vectors[0][:5])  # preview first 5 dimensions of first vector

[0.032221630215644836, 0.006504561752080917, 0.0596463568508625, 0.07347435504198074, 0.010449625551700592]


In [11]:
from langchain_community.vectorstores import FAISS

# Build a FAISS index from the text-file chunks (not the blog-post splits) and save it.
db = FAISS.from_documents(chunks, embedding_model)
db.save_local("faiss_index")  # Save the index locally

# To reload later (recent versions require opting in to pickle deserialization):
# db = FAISS.load_local("faiss_index", embedding_model, allow_dangerous_deserialization=True)


In [13]:
query = "What is climate"
docs = db.similarity_search(query, k=2)

for i, doc in enumerate(docs):
    print(f"Result {i+1}: {doc.page_content}\n")


Result 1: Title: Climate Change in Latin America

Result 2: Climate change is increasingly affecting Latin America’s biodiversity, water resources, and agricultural productivity. Rising temperatures have led

