# Our own RAG system

* The document we will index is the text of this post: https://lilianweng.github.io/posts/2023-06-23-agent/

* All configurable variables must be extracted into the configuration cell.





In [1]:
# Configuration variables

from typing import Literal

LLM_NAME = "gpt-3.5-turbo"
EMBEDDING_NAME = "BAAI/bge-large-en-v1.5"

CHUNK_SIZE = 1000
CHUNK_OVERLAP = 250

SEMANTIC_BREAKPOINT_TYPE: Literal[
    "percentile", "standard_deviation", "interquartile", "gradient"
] = "percentile"

SEMANTIC_BREAKPOINT_VALUE = 95  # the default value; lower values produce more splits
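
To make the effect of `SEMANTIC_BREAKPOINT_VALUE` more concrete, here is a minimal sketch of a percentile-based breakpoint rule. It is not `SemanticChunker`'s actual code: the distances are made-up numbers standing in for embedding distances between consecutive sentences, and `percentile` and `find_breakpoints` are hypothetical helpers written for this illustration.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def find_breakpoints(distances, pct):
    """Return the indices of the gaps whose distance exceeds the percentile threshold."""
    threshold = percentile(distances, pct)
    return [i for i, d in enumerate(distances) if d > threshold]


# Made-up distances between consecutive sentences (illustrative only).
toy_distances = [0.10, 0.12, 0.80, 0.11, 0.09, 0.55, 0.13, 0.10, 0.60, 0.12]

splits_at_95 = find_breakpoints(toy_distances, 95)
splits_at_50 = find_breakpoints(toy_distances, 50)
print(splits_at_95, splits_at_50)
```

With these toy numbers, the lower percentile marks more gaps as breakpoints, which is the sense in which a lower value produces more splits.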

In [2]:
from dotenv import load_dotenv, find_dotenv
import os
import warnings
from IPython.display import display, Markdown  # render output text as Markdown

warnings.filterwarnings('ignore')
_ = load_dotenv(find_dotenv())  # read local .env file

# LANGSMITH_API_KEY is read from the .env file
os.environ["LANGSMITH_TRACING"] = "true"



In [3]:
import bs4  # from the beautifulsoup4 package
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

print(f"Amount of documents loaded: {len(docs)}")
len(docs[0].page_content)

USER_AGENT environment variable not set, consider setting it to identify your requests.


Amount of documents loaded: 1


43131

* Create our own `Document` from the text extracted from the HTML.

In [5]:
from langchain_core.documents import Document

doc = Document(docs[0].page_content, metadata={
               "id": "fake_id", "user": "manuel", "collection": "blog"})

* We first use `SemanticChunker` to split the text into semantically coherent chunks, then apply `RecursiveCharacterTextSplitter` to cap the chunk size (we start with `chunk_size=1000` and `chunk_overlap=250`).
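
As a rough picture of what `chunk_size` and `chunk_overlap` mean, the sketch below cuts a string into fixed character windows that share a few characters with their neighbours. This is a deliberately simplified stand-in: `RecursiveCharacterTextSplitter` splits on separators and merges the pieces, so its chunk boundaries will differ. `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters sharing chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


toy_text = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = split_with_overlap(toy_text, chunk_size=10, chunk_overlap=4)
print(toy_chunks)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the tail of one chunk is repeated at the head of the next.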

In [7]:
# %pip install --quiet langchain_experimental
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_NAME)

semantic_splitter = SemanticChunker(
    embeddings,
    breakpoint_threshold_type=SEMANTIC_BREAKPOINT_TYPE,
    breakpoint_threshold_amount=SEMANTIC_BREAKPOINT_VALUE,
)

docs_split = semantic_splitter.split_documents([doc]) 
# semantic_splitter takes 110 seconds to run on a 43131 character document

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP,
    separators=["\n\n", "\n", "."])

all_splits = text_splitter.split_documents(docs_split)

len(all_splits)

81

In [8]:
Markdown(all_splits[20].page_content)

LSH (Locality-Sensitive Hashing): It introduces a hashing function such that similar input items are mapped to the same buckets with high probability, where the number of buckets is much smaller than the number of inputs. ANNOY (Approximate Nearest Neighbors Oh Yeah): The core data structure are random projection trees, a set of binary trees where each non-leaf node represents a hyperplane splitting the input space into half and each leaf stores one data point. Trees are built independently and at random, so to some extent, it mimics a hashing function. ANNOY search happens in all the trees to iteratively search through the half that is closest to the query and then aggregates the results. The idea is quite related to KD tree but a lot more scalable. HNSW (Hierarchical Navigable Small World): It is inspired by the idea of small world networks where most nodes can be reached by any other nodes within a small number of steps; e.g. “six degrees of separation” feature of social networks

* We will use Qdrant as the vector store and its hybrid search to retrieve documents.

* We store the original ID of each document in the `id` metadata field of every chunk. To delete a document from the Qdrant vector store, we first filter by this field to collect the point IDs of its chunks (exposed as the `_id` metadata field), then delete the points with those IDs.
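
The two-step deletion described above can be sketched without any Qdrant calls. In this toy version, `toy_store` is a plain dictionary standing in for the collection, and `add_chunks` and `delete_document` are hypothetical helpers; only the logic (filter by the original ID, then delete by point ID) is meant to carry over.

```python
import uuid


def add_chunks(store, chunks, original_id):
    """Store each chunk under a fresh point ID, tagged with its source document ID."""
    for text in chunks:
        store[uuid.uuid4().hex] = {"text": text, "metadata": {"id": original_id}}


def delete_document(store, original_id):
    """Delete every chunk that came from original_id and return the removed point IDs."""
    # Step 1: filter by the metadata field to collect the matching point IDs.
    point_ids = [pid for pid, point in store.items()
                 if point["metadata"]["id"] == original_id]
    # Step 2: delete the points by ID.
    for pid in point_ids:
        del store[pid]
    return point_ids


toy_store = {}
add_chunks(toy_store, ["chunk a", "chunk b", "chunk c"], original_id="doc-1")
add_chunks(toy_store, ["chunk d"], original_id="doc-2")

removed = delete_document(toy_store, "doc-1")
print(len(removed), len(toy_store))
```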

In [9]:

# %pip install -qU langchain-qdrant
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode

qdrant_vectorstore = QdrantVectorStore.from_documents(
    [],
    embedding=embeddings,
    sparse_embedding=FastEmbedSparse(model_name="Qdrant/bm25"),
    path='qdrant_db_hybrid1',
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.HYBRID,
)

# TODO: check how FastEmbedSparse behaves


Fetching 29 files: 100%|██████████| 29/29 [00:00<00:00, 325226.78it/s]
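
Hybrid retrieval has to merge a dense (embedding) ranking and a sparse (BM25-style) ranking into a single list. Reciprocal rank fusion (RRF) is one common way to do this. Whether this particular setup uses RRF, and with which constant, is an assumption here, so the sketch below should be read as an illustration of the idea and not as a description of what the library does internally. The chunk IDs are made up, and `reciprocal_rank_fusion` is a hypothetical helper.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs; each appearance contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_ranking = ["c1", "c2", "c3"]   # hypothetical dense (embedding) results
sparse_ranking = ["c3", "c1", "c4"]  # hypothetical sparse (keyword) results

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Chunks that appear near the top of both rankings end up first, while chunks found by only one of the two searches still make it into the fused list.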


In [10]:
qdrant_vectorstore.add_documents(all_splits)

['19d0a46422c44e3f948e8f4268cedac0',
 'd01ba413fb7843f6b8fcee8181fa7c6d',
 'd4f94e3168cb4558b1e983b93252da18',
 '38393a10f106471a88abdedb2ffe5bc7',
 '29b49a2ae87c4bdb87750d1d45578e75',
 '38bc2ca2876947f19a7c17a8fdfb33df',
 '4beb0a0fee09445dafdf1324aa0facaa',
 'c421637b217f4e808d465f8d84580a7b',
 'ab145d9e02c24966b07f4321ce503c7c',
 'b2d056bc0a9e4e89870641f2a2981b40',
 'f0a0e3dc54984133b18cd6d1da34aed1',
 '1720a8a764a446d7a408f39fdf5f03f6',
 '3cdbcbbbb1e7402aac450985cdc2b898',
 '2debf4d04a5c4ef496e89d958ec288f6',
 '398b4032266e472cb0e1064597c6b2f8',
 '0567ce32422c4ceaaf9c6708b0e467e4',
 '068bf1a7819b4c2b840e12cf47a8e7a5',
 '2e1b13e869be461cb79ecccb170cd868',
 'a2d953fa3f2d4aea988826cb3c0a62e2',
 '222098499d5f4f18a62ab308184a52c6',
 'd55abff921cb4122b089c36c07e4d036',
 'bd10f87389c844f38cf0a4e988d137d3',
 'eaa7e1f6e5ed439db96f7db32a656ab3',
 '2cd933cca52746e485f913fdb4518616',
 'b6124a98599144228cfd1671e8e0bb6f',
 'a8aa8c607e1e42cb9b4daf67fcafc330',
 '9623e3824a3047749362cba3ac352609',
 

In [43]:
qdrant_vectorstore.client.count(collection_name="my_documents")


CountResult(count=81)

We’ll use the LCEL Runnable protocol to define the chain, which allows us to:

* pipe together components and functions in a transparent way
* automatically trace our chain in `LangSmith`
* get streaming, async, and batched calling out of the box.
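
To give a feel for what piping components together means, here is a toy class that composes steps with `|` and offers `invoke` and `batch`. It is not LangChain's `Runnable` implementation, and it leaves out tracing, streaming, and async. `Step` and the three example steps are hypothetical names used only for this sketch.

```python
class Step:
    """Toy runnable: wraps a function and composes with `|` (illustration only)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

    def batch(self, values):
        return [self.invoke(value) for value in values]


strip = Step(str.strip)
shout = Step(str.upper)
exclaim = Step(lambda text: text + "!")

toy_chain = strip | shout | exclaim
print(toy_chain.invoke("  hello "))
print(toy_chain.batch([" a", "b "]))
```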

In [44]:
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from qdrant_client.http.models import models
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model=LLM_NAME)


def retriever_with_score(input_dict):
    """Run a filtered search and return the query, the sources, and their scores."""
    list_results = qdrant_vectorstore.similarity_search_with_score(
        input_dict["query"],
        k=3,
        # Only search chunks that belong to this user and collection.
        filter=models.Filter(
            must=[
                models.FieldCondition(
                    key="metadata.user",
                    match=models.MatchValue(value="manuel"),  # exact match
                ),
                models.FieldCondition(
                    key="metadata.collection",
                    match=models.MatchValue(value="blog"),  # exact match
                ),
            ]
        ),
    )
    scores = [score for _, score in list_results]
    sources = [doc for doc, _ in list_results]

    return {"query": input_dict["query"],
            "sources": sources,
            "scores": scores,
            }


def format_docs(input_dict):
    """Add a "context" key that joins the page content of all sources."""
    context = "\n\n".join(
        doc.page_content for doc in input_dict["sources"])
    # Return a new dict instead of mutating the input.
    return {**input_dict, "context": context}


system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Keep the answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{query}"),
    ]
)


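# Not used by rag_chain below.
question_answer_chain = create_stuff_documents_chain(llm, prompt)

# Retrieve sources and scores first, then add an "answer" key generated from them.
rag_chain = RunnableLambda(retriever_with_score).assign(
    answer=(format_docs | prompt | llm | StrOutputParser()))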

In [47]:
output = rag_chain.invoke({"query": "Tell me about the approaches to Task Decomposition?"})
# output = rag_chain.invoke({"query": "Locality-Sensitive Hashing"})
output

{'query': 'Tell me about the approaches to Task Decomposition?',
 'sources': [Document(metadata={'id': 'fake_id', 'user': 'manuel', 'collection': 'blog', '_id': '29b49a2ae87c4bdb87750d1d45578e75', '_collection_name': 'my_documents'}, page_content='. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs. Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural 

In [48]:
Markdown(output["answer"])

Task decomposition can be done in several ways:

1. **Simple Prompting**: Using prompts like "Steps for XYZ.\n1." or "What are the subgoals for achieving XYZ?" to guide the LLM in breaking down tasks.
2. **Task-Specific Instructions**: Providing specific instructions tailored to the task, such as "Write a story outline." for writing.
3. **Human Inputs**: Involving human guidance to assist in the decomposition process.
4. **LLM+P Approach**: Utilizing an external classical planner with the Planning Domain Definition Language (PDDL). This involves translating the problem into PDDL, generating a plan with a classical planner, and then converting the plan back into natural language. 

These methods enable effective planning and management of complex tasks.
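
The `sources` and `scores` lists in the chain output are aligned by position, so they can be zipped together for a quick inspection. The sketch below uses a made-up result dictionary, and `FakeDoc` is a hypothetical stand-in with only the two `Document` attributes it needs. It also assumes that a higher score means a better match.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two Document attributes used below."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(result, preview_chars=40):
    """Return (score, point ID, text preview) tuples, highest score first."""
    rows = [
        (score, doc.metadata.get("_id"), doc.page_content[:preview_chars])
        for doc, score in zip(result["sources"], result["scores"])
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)


toy_result = {
    "query": "toy question",
    "sources": [FakeDoc("first chunk", {"_id": "a"}),
                FakeDoc("second chunk", {"_id": "b"})],
    "scores": [0.4, 0.9],  # made-up scores
}

for row in summarize_sources(toy_result):
    print(row)
```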