<a href="https://colab.research.google.com/github/anujdutt9/Talks_and_Presentations/blob/main/Decoding_the_Giants/Demo_3_RAG_for_Website_PDF_QA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple RAG for PDF QA using Hugging Face Zephyr and LangChain

In [1]:
!pip3 install -U transformers accelerate bitsandbytes sentence-transformers faiss-gpu langchain langchain-community langchain-huggingface beautifulsoup4 -q

In [2]:
# If running in Google Colab, you may need to run this cell to make sure you're
# using UTF-8 locale to install LangChain
import locale

locale.getpreferredencoding = lambda: "UTF-8"

# Load Webpage Text

In [3]:
# Import necessary libraries
from langchain_community.document_loaders import WebBaseLoader

# Replace with your webpage URL
url = "https://abc.xyz/assets/ff/7c/06d6f493f6462caf08e8502ffc33/596de1b094c32cf0592a08edfe84ae74.html"
loader = WebBaseLoader(url)

# Load the webpage content
docs = loader.load()



In [4]:
# # Alternatively, load the PDF file (PyPDFLoader requires the `pypdf` package)
# from langchain_community.document_loaders import PyPDFLoader

# pdf_path = "596de1b094c32cf0592a08edfe84ae74.pdf"
# loader = PyPDFLoader(file_path=pdf_path)

# docs = loader.load()

In [5]:
docs



# Chunk Documents into Pieces for Embedding Model

Overlap adjacent chunks slightly so that text cut at a chunk boundary keeps some of its surrounding context in both chunks.

**Text Splitter** - `RecursiveCharacterTextSplitter`

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=30
    )

chunked_docs = splitter.split_documents(docs)
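
To make the effect of `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences). The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Each chunk repeats the last character of the previous chunk
example_chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(example_chunks)
```

With `chunk_size=512` and `chunk_overlap=30` as in the cell above, each window would advance by 482 characters in this simplified scheme.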

In [7]:
chunked_docs[82]

Document(metadata={'source': 'https://abc.xyz/assets/ff/7c/06d6f493f6462caf08e8502ffc33/596de1b094c32cf0592a08edfe84ae74.html', 'title': 'goog-20231231', 'language': 'No language found.'}, page_content='to invest in new businesses, products, services and technologies, and systems, as well as to continue to invest in acquisitions and strategic investments;•our pace of hiring and our plans to provide competitive compensation programs;•our expectation that our cost of revenues, research and development (R&D) expenses, sales and marketing expenses, and general and administrative expenses may increase in amount and/or may increase as a percentage of revenues and may be affected by a number of')

In [8]:
chunked_docs[105]

Document(metadata={'source': 'https://abc.xyz/assets/ff/7c/06d6f493f6462caf08e8502ffc33/596de1b094c32cf0592a08edfe84ae74.html', 'title': 'goog-20231231', 'language': 'No language found.'}, page_content='features to our users as we continue to deliver on our mission to organize the world’s information and make it universally accessible and useful.While we have been integrating AI into our products for years, we are now embedding the power of generative AI to continue helping our users express themselves and get things done. For example, Duet AI in Google Workspace helps users write, organize, visualize, accelerate workflows, and have richer meetings. Bard allows users to collaborate with experimental AI')

# Create the Embeddings and The Retriever

**Text Embedding Model** - `BAAI/bge-base-en-v1.5` from HuggingFace

See the `Massive Text Embedding Benchmark (MTEB) Leaderboard` for other embedding models.

**Vector Database** - `FAISS`. This library offers efficient similarity search and clustering of dense vectors, which is what we need here.
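
The idea behind similarity search over dense vectors can be sketched in a few lines of plain Python. This is a brute-force illustration using cosine similarity on tiny made-up vectors. It is not FAISS's implementation, and the metric an actual FAISS index uses depends on how the index is configured. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Made-up 2-dimensional "embeddings" purely for illustration
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vector = [1.0, 0.1]
best = top_k(query_vector, doc_vectors, k=2)
print(best)
```

A real embedding model such as `BAAI/bge-base-en-v1.5` produces vectors with hundreds of dimensions, which is where an optimized library becomes necessary.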

In [9]:
from langchain_community.vectorstores import FAISS
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

# Vector Database
db = FAISS.from_documents(
    chunked_docs,
    HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
    )


In [10]:
db.similarity_search("How does Alphabet plan to continue its growth in AI and cloud services?")

[Document(metadata={'source': 'https://abc.xyz/assets/ff/7c/06d6f493f6462caf08e8502ffc33/596de1b094c32cf0592a08edfe84ae74.html', 'title': 'goog-20231231', 'language': 'No language found.'}, page_content='In 2015, we implemented a holding company reorganization, and as a result, Alphabet Inc. ("Alphabet") became the successor issuer to Google.We generate revenues by delivering relevant, cost-effective online advertising; cloud-based solutions that provide enterprise customers with infrastructure and platform services as well as communication and collaboration tools; sales of other products and services, such as fees received for consumer subscription-based products, apps and in-app purchases, and'),
 Document(metadata={'source': 'https://abc.xyz/assets/ff/7c/06d6f493f6462caf08e8502ffc33/596de1b094c32cf0592a08edfe84ae74.html', 'title': 'goog-20231231', 'language': 'No language found.'}, page_content='and video. Our teams across Alphabet will leverage Gemini, as well as other AI models we

# Using Vector Database as Retriever

In [11]:
retriever = db.as_retriever(
    search_type="similarity",   # Use similarity search between Query and Documents
    search_kwargs={"k": 10}      # Return top 10 results
)

In [12]:
retriever.invoke("How does Alphabet plan to manage its traffic acquisition costs (TAC) as mentioned in the latest filing?")

[Document(metadata={'source': 'https://abc.xyz/assets/ff/7c/06d6f493f6462caf08e8502ffc33/596de1b094c32cf0592a08edfe84ae74.html', 'title': 'goog-20231231', 'language': 'No language found.'}, page_content='Google Network properties to Google Search & other properties. The TAC rate on Google Search & other revenues and the TAC rate on Google Network revenues were both substantially consistent from 2022 to 2023.The increase in other cost of revenues from 2022 to 2023 was primarily due to increases in content acquisition costs, largely for YouTube, and compensation expenses, which included $479 million of charges related to employee severance associated with the reduction in our workforce. Additionally, other'),
 Document(metadata={'source': 'https://abc.xyz/assets/ff/7c/06d6f493f6462caf08e8502ffc33/596de1b094c32cf0592a08edfe84ae74.html', 'title': 'goog-20231231', 'language': 'No language found.'}, page_content='Report on Form 10-K.As of January\xa023, 2024, there were 5,893 million shares 

# The Model

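**Model** - `google/gemma-2-2b-it` is what the code below loads. The notebook title and the prompt template further down follow the chat format of `HuggingFaceH4/zephyr-7b-beta`.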

Check the `Open-source LLM leaderboard` for the latest models.

In [13]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "google/gemma-2-2b-it"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             quantization_config=bnb_config,
                                             device_map={"":0}
                                             )
tokenizer = AutoTokenizer.from_pretrained(model_name)
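
As a rough illustration of why 4-bit loading helps on a small GPU, the sketch below estimates the memory taken by the weights alone at different precisions. It is back-of-the-envelope arithmetic, not a measurement: it ignores quantization constants, activations, and the KV cache, and the parameter count is a round number chosen for illustration rather than the exact size of any model. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed round figure for illustration only
ASSUMED_PARAMS = 2_000_000_000

fp16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)
int4_gb = weight_memory_gb(ASSUMED_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{int4_gb:.1f} GB")
```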


# Setup LLM Chain

```
Input Question + Context Text  ->  Prompt Template (Prompt)  ->  LLM Model  ->  LLM Output Parser
```

In [14]:
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from transformers import pipeline
from langchain_core.output_parsers import StrOutputParser

In [15]:
# Setup the Huggingface Text Generation Pipeline
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=400,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
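
The pipeline above samples with `temperature=0.2`. A minimal sketch of what temperature does to a next-token distribution follows: logits are divided by the temperature before the softmax, so values below 1 concentrate probability on the highest-scoring token. The logits here are made up, and the function name `softmax_with_temperature` is hypothetical; this is an illustration of the general idea, not the library's sampling code.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - max_scaled) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up logits for three candidate tokens
logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, temperature=0.2)
flat = softmax_with_temperature(logits, temperature=1.0)
print("temperature 0.2:", [round(p, 3) for p in sharp])
print("temperature 1.0:", [round(p, 3) for p in flat])
```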


In [16]:
# Prompt Template with structured response guidance
prompt_template = """
<|system|>
Answer the question based on your knowledge. Use the following context to help:

{context}

Provide the answer in a clear, structured format with bullet points or short paragraphs for better readability.

</s>
<|user|>
{question}
</s>
<|assistant|>

"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

In [17]:
# LLM Chain
llm_chain = prompt | llm | StrOutputParser()
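
The `|` operator chains steps so that the output of one becomes the input of the next. The toy sketch below mimics that behavior with a small class to show the data flow. It is not LangChain's implementation: the `Step` class, the template, and the fake model are all hypothetical stand-ins.

```python
class Step:
    """A callable wrapper that supports `a | b` composition."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Run this step first, then feed its result to the next step
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


template = "Context: {context}\nQuestion: {question}"

fill_prompt = Step(lambda inputs: template.format(**inputs))
fake_llm = Step(lambda prompt_text: prompt_text + "\nAnswer: (model output would go here)")
strip_output = Step(lambda text: text.strip())

toy_chain = fill_prompt | fake_llm | strip_output
toy_result = toy_chain.invoke({"context": "", "question": "Hi?"})
print(toy_result)
```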

# Combine Retriever with Model llm_chain to create a RAG Chain


```
Retriever  ->  Context  ->
                            Prompt Template (Prompt)  ->  LLM Model  ->  LLM Output Parser
Question  ---------------->  
```
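
In the diagram, whatever the retriever returns becomes the `{context}` in the prompt. If the retriever's list of documents is passed through unchanged, the prompt receives the printed `Document(...)` objects, metadata included. One possible refinement, sketched here under assumptions, is a small formatting step that keeps only the text of each chunk. The `Doc` dataclass is a hypothetical stand-in for a document object with a `page_content` attribute, and `format_docs` is a name chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Join the text of retrieved chunks, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    Doc("First retrieved chunk.", {"source": "example"}),
    Doc("Second retrieved chunk.", {"source": "example"}),
]
context_text = format_docs(retrieved)
print(context_text)
```

A function like this could be placed between the retriever and the prompt so that only chunk text, not metadata, consumes the model's context window.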

# Inference without Retriever (Context)

In [18]:
question = "What is the total number of outstanding shares of Alphabet’s Class A, Class B, and Class C stock as of January 23, 2024?"

In [19]:
raw_response = llm_chain.invoke({"context": "", "question": question})
print(raw_response)




<|system|>
Answer the question based on your knowledge. Use the following context to help:



Provide the answer in a clear, structured format with bullet points or short paragraphs for better readability.

</s>
<|user|>
What is the total number of outstanding shares of Alphabet’s Class A, Class B, and Class C stock as of January 23, 2024?
</s>
<|assistant|>

I do not have access to real-time information, including financial data like the number of outstanding shares of Alphabet's stock.  To get that information, you would need to consult reliable sources such as:

* **Alphabet Investor Relations Website:** This website will likely provide the most up-to-date information on their stock holdings.
* **Financial News Websites:** Reputable financial news websites like Bloomberg, Reuters, or Yahoo Finance often report on company financials, including share counts.
* **Securities and Exchange Commission (SEC) Filings:** Public companies are required to file regular reports with the SEC, whi
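
Because the pipeline was created with `return_full_text=True`, the printed response repeats the whole prompt before the answer. A minimal sketch for keeping only the text after the last `<|assistant|>` marker is shown below. The function name `extract_answer` is hypothetical, and the approach assumes the model's answer does not itself contain the marker.

```python
ASSISTANT_MARKER = "<|assistant|>"


def extract_answer(full_text: str, marker: str = ASSISTANT_MARKER) -> str:
    """Return the text after the last marker, or the whole text if it is absent."""
    _, found, answer = full_text.rpartition(marker)
    return answer.strip() if found else full_text.strip()


sample_output = "<|system|>\nSome context\n</s>\n<|user|>\nA question?\n</s>\n<|assistant|>\n\nAn answer."
clean_answer = extract_answer(sample_output)
print(clean_answer)
```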

# Inference with Retriever (Context)

In [20]:
question = "What is the total number of outstanding shares of Alphabet’s Class A, Class B, and Class C stock as of January 23, 2024?"

In [21]:
from langchain_core.runnables import RunnablePassthrough

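# Note: this re-creates the retriever with default settings,
# replacing the k=10 retriever defined earlier
retriever = db.as_retriever()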
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | llm_chain
raw_response = rag_chain.invoke(question)
print(raw_response)


<|system|>
Answer the question based on your knowledge. Use the following context to help:

[Document(metadata={'source': 'https://abc.xyz/assets/ff/7c/06d6f493f6462caf08e8502ffc33/596de1b094c32cf0592a08edfe84ae74.html', 'title': 'goog-20231231', 'language': 'No language found.'}, page_content='Report on Form 10-K.As of January\xa023, 2024, there were 5,893 million shares of Alphabet’s Class\xa0A stock outstanding, 869 million shares of Alphabet’s Class\xa0B stock outstanding, and 5,671 million shares of the Alphabet’s Class\xa0C stock outstanding.___________________________________________DOCUMENTS INCORPORATED BY REFERENCEPortions of the registrant’s Proxy Statement for the 2024 Annual Meeting of Stockholders are incorporated herein by reference in Part\xa0III of this Annual Report on Form 10-K to the extent'), Document(metadata={'source': 'https://abc.xyz/assets/ff/7c/06d6f493f6462caf08e8502ffc33/596de1b094c32cf0592a08edfe84ae74.html', 'title': 'goog-20231231', 'language': 'No lang