# YouTube Channel

**Link:** https://www.youtube.com/@TwoSetAI

# Demystifying the Retrieval-Augmented Generation (RAG) Approach

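**What is RAG?** RAG pairs a generative model with a retrieval step: passages relevant to the question are fetched from an external knowledge source and passed to the model as context, which improves the quality and relevance of the generated text.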
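To make the idea concrete before the real implementation, here is a toy sketch of the retrieve-then-augment flow. The word-overlap scoring and the names `retrieve_by_overlap` and `build_toy_prompt` are illustrative assumptions that stand in for real embedding search and prompting; no model is called.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_by_overlap(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_toy_prompt(question: str, context_docs: list[str]) -> str:
    """Place the retrieved text in front of the question for the generator."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


toy_documents = [
    "Qdrant is a vector database.",
    "STORM is a writing system for the pre-writing stage.",
]
toy_question = "What is STORM?"
toy_prompt = build_toy_prompt(toy_question, retrieve_by_overlap(toy_question, toy_documents))
print(toy_prompt)
```

The rest of the notebook replaces each toy piece with a real one: embeddings instead of word overlap, Qdrant instead of a Python list, and an LLM call on the final prompt.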

## RAG Architecture
<img src="https://mallahyari.github.io/rag-ebook/diagrams/rag_architecture.png" width="800" />

## Data Ingestion Pipeline

<img src="https://mallahyari.github.io/rag-ebook/diagrams/rag_data_pipeline.png" />

## Implementation

In [60]:
# pip install -qU langchain-text-splitters
# pip install qdrant-client
# pip install pypdf

# OPENAI_API_KEY = ""

import os
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

In [83]:
from pypdf import PdfReader

FILE_PATH = os.path.join("data","storm_paper.pdf")
reader = PdfReader(FILE_PATH)
number_of_pages = len(reader.pages)

# Concatenate the text of every page into one string
entire_text = ""
for page in reader.pages:
    entire_text += page.extract_text()

entire_text[:200]

'Assisting in Writing Wikipedia-like Articles From Scratch\nwith Large Language Models\nYijia Shao Yucheng Jiang Theodore A. Kanell Peter Xu\nOmar Khattab Monica S. Lam\nStanford University\n{shaoyj, yuchen'

## 1. Split text into chunks

In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

In [4]:
text_chunks = text_splitter.split_text(entire_text)
print(f"Total chunks: {len(text_chunks)}")

Total chunks: 215
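To see how `chunk_size` and `chunk_overlap` interact, the sketch below uses a plain sliding window. This is a simplification and an assumption for illustration: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and newlines, which this hypothetical `sliding_window_chunks` helper does not do.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


demo_chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last three characters of one chunk reappear at the start of the next.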


In [5]:
text_chunks[:2]

['Assisting in Writing Wikipedia-like Articles From Scratch\nwith Large Language Models\nYijia Shao Yucheng Jiang Theodore A. Kanell Peter Xu\nOmar Khattab Monica S. Lam\nStanford University\n{shaoyj, yuchengj, tkanell, peterxu, okhattab}@stanford.edu\nlam@cs.stanford.edu\nAbstract\nWe study how to apply large language models\nto write grounded and organized long-form ar-\nticles from scratch, with comparable breadth\nand depth to Wikipedia pages. This underex-\nplored problem poses new challenges at the',
 'plored problem poses new challenges at the\npre-writing stage, including how to research\nthe topic and prepare an outline prior to writ-\ning. We propose STORM , a writing system\nfor the Synthesis of Topic Outlines through\nRetrieval and Multi-perspective Question Ask-\ning. STORM models the pre-writing stage by\n(1) discovering diverse perspectives in research-\ning the given topic, (2) simulating conversa-\ntions where writers carrying different perspec-']
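The chunks above still contain words that the PDF layout split across lines, such as `ar-\nticles`. The notebook embeds the text as extracted; as an optional cleanup step, a regex along these lines could rejoin them before splitting. The helper name is hypothetical, and the rule is a heuristic that would also merge a genuinely hyphenated compound if it happened to break at the end of a line.

```python
import re


def join_hyphenated_line_breaks(text: str) -> str:
    """Rejoin words that were split by a hyphen followed by a line break."""
    # A word character, a hyphen, a newline, then another word character
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


demo_text = "long-form ar-\nticles from scratch"
print(join_hyphenated_line_breaks(demo_text))
```

Hyphens that are not followed by a line break, as in `long-form`, are left untouched.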

## LlamaIndex Split by sentence

In [6]:
from llama_index.core.node_parser import SentenceSplitter

llamaindex_splitter = SentenceSplitter(chunk_size=500, chunk_overlap=20)
llamaindex_text_chunks = llamaindex_splitter.split_text(entire_text)

## 2. Embedding Chunks

In [7]:
# !pip install sentence-transformers

import torch
from sentence_transformers import SentenceTransformer

# Check if a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "BAAI/bge-small-en-v1.5"
# model_name = "all-MiniLM-L6-v2"

embedding_model = SentenceTransformer(model_name, device=device)

In [8]:
embeddings = embedding_model.encode(text_chunks, show_progress_bar=True)

Batches:   0%|          | 0/7 [00:00<?, ?it/s]

In [9]:
embeddings[0].shape

(384,)

## 3. Store in the Vector Database

> We use Qdrant as the vector database. See the [Qdrant repository](https://github.com/qdrant/qdrant) for documentation.

### How to run Qdrant with Docker

```bash
docker pull qdrant/qdrant

docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant
```

In [10]:
# !pip install qdrant-client

# Import client library
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient("http://localhost:6333")

In [11]:
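collection_name = "qa_index"

# Start from a clean slate: drop any existing collection with this name
client.delete_collection(collection_name)

client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(
        # 384 for bge-small-en-v1.5; read from the model rather than hard-coded
        size=embedding_model.get_sentence_embedding_dimension(),
        distance=Distance.COSINE,
    ),
)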

True
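The collection is configured with `Distance.COSINE`, so search results are ranked by cosine similarity between the query vector and the stored vectors. As a minimal sketch of what that measure computes (plain Python for illustration, not Qdrant's implementation):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
print(same_direction, orthogonal)
```

Only the direction of the vectors matters, not their length, which is why vectors pointing the same way score the maximum regardless of magnitude.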

### Create payloads and ids

In [12]:
ids = []
payload = []

# Use the chunk position as the point id (and avoid shadowing the built-in `id`)
for chunk_id, text in enumerate(text_chunks):
    ids.append(chunk_id)
    payload.append({"source": FILE_PATH, "content": text})

payload[0]

{'source': 'data/storm_paper.pdf',
 'content': 'Assisting in Writing Wikipedia-like Articles From Scratch\nwith Large Language Models\nYijia Shao Yucheng Jiang Theodore A. Kanell Peter Xu\nOmar Khattab Monica S. Lam\nStanford University\n{shaoyj, yuchengj, tkanell, peterxu, okhattab}@stanford.edu\nlam@cs.stanford.edu\nAbstract\nWe study how to apply large language models\nto write grounded and organized long-form ar-\nticles from scratch, with comparable breadth\nand depth to Wikipedia pages. This underex-\nplored problem poses new challenges at the'}
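The notebook uses the chunk position as the point id. Qdrant also accepts UUIDs as ids, so one alternative, sketched here as an assumption rather than something the notebook does, is to derive a deterministic UUID from the source path and chunk index. The `stable_point_id` helper and its `#chunk-` naming scheme are hypothetical.

```python
import uuid


def stable_point_id(source: str, chunk_index: int) -> str:
    """Derive the same UUID every time for a given source file and chunk position."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}#chunk-{chunk_index}"))


first_id = stable_point_id("data/storm_paper.pdf", 0)
repeat_id = stable_point_id("data/storm_paper.pdf", 0)
next_id = stable_point_id("data/storm_paper.pdf", 1)
print(first_id == repeat_id, first_id == next_id)
```

Ids like these stay unique when several files share one collection. The trade-off is that integer ids, as used in this notebook, can be used directly as an index into `text_chunks`.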

In [13]:
client.upload_collection(
    collection_name=collection_name,
    vectors=embeddings,
    payload=payload,
    ids=ids,
    batch_size=256,  # How many vectors will be uploaded in a single request?
)

In [14]:
client.count(collection_name)

CountResult(count=215)

## Recap
1. Read the PDF file and extract the text
2. Split the text into chunks
3. Embed the chunks
4. Store the embeddings and metadata in the Qdrant vector DB

## Embedding and Storing using Langchain

In [46]:
# pip install langchain-community

from langchain_community.vectorstores import Qdrant
from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name=model_name)

docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    )]

vectorstore = Qdrant.from_documents(
    docs,
    embeddings,
    path="/tmp/local_qdrant_storage",  # on-disk local mode, no server needed
    collection_name="my_documents",
)

## Retrieval Component

In [16]:
def search(text: str, top_k: int):
    query_embedding = embedding_model.encode(text).tolist()
    
    search_result = client.search(
        collection_name=collection_name,
        query_vector=query_embedding,
        query_filter=None,  
        limit=top_k
    )
    return search_result

In [84]:
question = "what is storm framework"
results = search(question, top_k=5)
results

[ScoredPoint(id=130, version=0, score=0.72634983, payload={'content': 'In §3, we introduce STORM, a framework that au-\ntomates the pre-writing stage by discovering differ-\nent perspectives, simulating information-seeking\nconversations, and creating a comprehensive out-\nline. Algorithm 1 displays the skeleton of STORM.\nWe implement STORM with zero-shot prompt-\ning using the DSPy framework (Khattab et al.,\n2023). Listing 1 and 2 show the prompts used\nin our implementation. We highlight that STORM\noffers a general framework designed to assist the', 'source': 'data/storm_paper.pdf'}, vector=None, shard_key=None),
 ScoredPoint(id=164, version=0, score=0.6753985, payload={'content': '50$ for our study.for each article. Figure 7 shows the screenshot of\nour web application and the full article produced\nby STORM is included in Table 12. For human\nevaluation, we use a 1 to 7 scale for more fine-\ngrained evaluation. The grading rubric is included\nin Table 10.\nWe collected the pairw

In [18]:
text_chunks[130]

'In §3, we introduce STORM, a framework that au-\ntomates the pre-writing stage by discovering differ-\nent perspectives, simulating information-seeking\nconversations, and creating a comprehensive out-\nline. Algorithm 1 displays the skeleton of STORM.\nWe implement STORM with zero-shot prompt-\ning using the DSPy framework (Khattab et al.,\n2023). Listing 1 and 2 show the prompts used\nin our implementation. We highlight that STORM\noffers a general framework designed to assist the'
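The top hit answers the question directly, while the second, lower-scoring hit is about the human evaluation. One way to keep weaker matches out of the context is to drop results below a minimum score before building the prompt. The sketch below is an assumption for illustration: it uses stand-in objects with the two scores shown above, placeholder payload text, and an arbitrary threshold of 0.7 that would need tuning on real data.

```python
from types import SimpleNamespace


def filter_by_score(points, min_score: float):
    """Keep only the results whose similarity score reaches min_score."""
    return [point for point in points if point.score >= min_score]


# Stand-ins for ScoredPoint objects; the payload text is a placeholder
demo_results = [
    SimpleNamespace(id=130, score=0.72634983, payload={"content": "placeholder"}),
    SimpleNamespace(id=164, score=0.6753985, payload={"content": "placeholder"}),
]

kept = filter_by_score(demo_results, min_score=0.7)
print([point.id for point in kept])
```

A threshold that is too strict can leave the context empty, in which case the prompt's "I don't know" instruction becomes the expected answer.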

## Retrieval using Langchain

In [None]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
retrieved_docs = retriever.invoke(question)

## Response Generation

In [47]:
system_prompt = """You are an assistant for question-answering tasks. Answer the question according only to the given context.
If question cannot be answered using the context, simply say I don't know. Do not make stuff up.

Context: {context}
"""

user_prompt = """
Question: {question}

Answer:"""

references = [obj.payload["content"] for obj in results]


context = "\n\n".join(references)
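The prompt assembly can also be wrapped in a small helper so that it is reusable across questions. The sketch below is one possible shape, not the notebook's code: `build_messages` is a hypothetical name, the system text is shortened, and the chunks are numbered so an answer could point back to a specific reference.

```python
def build_messages(question: str, references: list[str]) -> list[dict[str, str]]:
    """Assemble chat messages with the retrieved chunks as numbered context."""
    numbered_context = "\n\n".join(
        f"[{number}] {ref}" for number, ref in enumerate(references, start=1)
    )
    system_message = (
        "Answer the question according only to the given context. "
        "If it cannot be answered from the context, say I don't know.\n\n"
        f"Context: {numbered_context}"
    )
    user_message = f"Question: {question}\n\nAnswer:"
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]


demo_messages = build_messages("what is storm framework", ["first chunk", "second chunk"])
print(demo_messages[0]["content"])
```

The returned list has the same structure as the `messages` argument expected by chat-completion APIs.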

In [86]:
from litellm import completion

response = completion(
  api_key=OPENAI_API_KEY,
  model="gpt-3.5-turbo",
  messages=[{"content": system_prompt.format(context=context),"role": "system"}, {"content": user_prompt.format(question=question),"role": "user"}]
)

In [87]:
print(response.choices[0].message.content)

STORM is a framework that automates the pre-writing stage by discovering different perspectives, simulating information-seeking conversations, and creating a comprehensive outline. It is implemented using zero-shot prompting with the DSPy framework and is designed to assist in the article writing process.


## Response with References

In [56]:
print(f"ANSWER: {response.choices[0].message.content}\n\n")
print("REFERENCES:\n")
for index, ref in enumerate(references):
    print(f"Reference: [{index + 1}]: {ref}\n")

ANSWER: STORM is a framework introduced in the text that automates the pre-writing stage by discovering different perspectives, simulating information-seeking conversations, and creating a comprehensive outline. It is implemented with zero-shot prompting using the DSPy framework and offers a general framework designed to assist in the question-asking process.


REFERENCES:

Reference: [1]: In §3, we introduce STORM, a framework that au-
tomates the pre-writing stage by discovering differ-
ent perspectives, simulating information-seeking
conversations, and creating a comprehensive out-
line. Algorithm 1 displays the skeleton of STORM.
We implement STORM with zero-shot prompt-
ing using the DSPy framework (Khattab et al.,
2023). Listing 1 and 2 show the prompts used
in our implementation. We highlight that STORM
offers a general framework designed to assist the

Reference: [2]: 50$ for our study.for each article. Figure 7 shows the screenshot of
our web application and the full article p

## Streaming Response

In [89]:
response = completion(
  api_key=OPENAI_API_KEY,
  model="gpt-3.5-turbo",
  messages=[{ "content": system_prompt.format(context=context),"role": "system"}, { "content": user_prompt.format(question=question),"role": "user"}],
  stream=True
)

for chunk in response:
    # The final chunk carries no text (content is None), so fall back to ""
    print(chunk.choices[0].delta.content or "", end="")

STORM is a framework introduced in the context that automates the pre-writing stage by discovering different perspectives, simulating information-seeking conversations, and creating a comprehensive outline. It is implemented with zero-shot prompting using the DSPy framework and is designed to assist in the question-asking process.
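Streamed chunks carry their text in `delta.content`, and the last chunk typically has `content` set to `None`, which shows up as the literal text `None` if it is printed directly. If the full answer is needed afterwards (for example, to log it), the pieces can be collected while skipping empty ones. The sketch below uses fake chunk objects so it runs without an API call; `collect_stream` and `fake_chunk` are hypothetical helpers.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text pieces of a streamed response, skipping empty or None pieces."""
    parts = []
    for chunk in chunks:
        piece = chunk.choices[0].delta.content
        if piece:  # the final chunk usually has content=None
            parts.append(piece)
    return "".join(parts)


def fake_chunk(content):
    """Build an object with the same attribute path as a streamed chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk("STORM is "), fake_chunk("a framework."), fake_chunk(None)]
full_answer = collect_stream(fake_stream)
print(full_answer)
```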

## Use Local models via Ollama

In [91]:
response = completion(
  model="ollama/llama3",
  messages=[{"content": system_prompt.format(context=context),"role": "system"}, {"content": user_prompt.format(question=question),"role": "user"}],
  api_base="http://localhost:11434",
  stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

The STORM framework is an automated pre-writing stage discovery tool that simulates information-seeking conversations and creates a comprehensive outline, as introduced in §3 of the given context.

## Response Generation using Langchain

In [None]:
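# pip install langchain langchain-openai

from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI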

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is storm?")
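The chain above pipes the question through retrieval, prompt formatting, the model, and output parsing. As a rough mental model only, and not how LangChain is implemented, the same data flow can be written as ordinary function calls. Every `toy_` function here is a hypothetical stand-in; no retriever or model is actually called.

```python
def toy_retriever(question: str) -> list[str]:
    """Stand-in retriever that always returns one fixed passage."""
    return ["STORM automates the pre-writing stage."]


def toy_format_docs(docs: list[str]) -> str:
    """Join the retrieved passages, mirroring format_docs in the chain."""
    return "\n\n".join(docs)


def toy_prompt(inputs: dict[str, str]) -> str:
    """Fill the context and question into a prompt string."""
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"


def toy_llm(prompt_text: str) -> str:
    """Stand-in model that only reports the size of the prompt it received."""
    return f"(model output for a prompt of {len(prompt_text)} characters)"


def toy_rag_chain(question: str) -> str:
    # The dict step: the question feeds the retriever and is also passed through as-is
    inputs = {
        "context": toy_format_docs(toy_retriever(question)),
        "question": question,
    }
    # Each later step receives the previous step's output
    return toy_llm(toy_prompt(inputs))


toy_answer = toy_rag_chain("What is storm?")
print(toy_answer)
```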

## Langchain and Ollama

In [None]:
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3")


## Advanced RAG Topics

- Query routing
- Multi-document queries
- Multi-modal queries
- etc
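As a taste of the first topic, query routing means choosing which index (or tool) should handle a question before retrieval runs. The sketch below is a deliberately simple keyword router using the two collection names from this notebook. The keyword lists and the `route_query` helper are assumptions for illustration; practical routers often rely on a classifier or an LLM instead.

```python
def route_query(query: str, routes: dict[str, list[str]], default: str) -> str:
    """Return the first collection whose keywords appear in the query."""
    words = set(query.lower().split())
    for collection, keywords in routes.items():
        if words & set(keywords):
            return collection
    return default  # no keyword matched


# Hypothetical keyword lists for the two collections created earlier
demo_routes = {
    "qa_index": ["storm", "wikipedia"],
    "my_documents": ["movie", "dinosaurs"],
}

chosen = route_query("what is storm framework", demo_routes, default="qa_index")
print(chosen)
```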