## Building a RAG Q&A system that answers questions about the content of a PDF [(link)](https://arxiv.org/pdf/2404.07143.pdf)

In [1]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-4.2.0-py3-none-any.whl.metadata (7.4 kB)
Downloading pypdf-4.2.0-py3-none-any.whl (290 kB)
Installing collected packages: pypdf
Successfully installed pypdf-4.2.0





In [3]:
!pip install langchain_community

Collecting langchain_community
  Downloading langchain_community-0.0.34-py3-none-any.whl.metadata (8.5 kB)
Collecting SQLAlchemy<3,>=1.4 (from langchain_community)
  Downloading SQLAlchemy-2.0.29-cp311-cp311-win_amd64.whl.metadata (9.8 kB)
Collecting aiohttp<4.0.0,>=3.8.3 (from langchain_community)
  Downloading aiohttp-3.9.5-cp311-cp311-win_amd64.whl.metadata (7.7 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl.metadata (25 kB)
Collecting langchain-core<0.2.0,>=0.1.45 (from langchain_community)
  Downloading langchain_core-0.1.45-py3-none-any.whl.metadata (5.9 kB)
Collecting langsmith<0.2.0,>=0.1.0 (from langchain_community)
  Downloading langsmith-0.1.50-py3-none-any.whl.metadata (13 kB)
Collecting aiosignal>=1.1.2 (from aiohttp<4.0.0,>=3.8.3->langchain_community)
  Downloading aiosignal-1.3.1-py3-none-any.whl.metadata (4.0 kB)
Collecting frozenlist>=1.1.1 (from aiohttp<4.0.0,>=3.8.3->langchain_community)
  




In [6]:
!pip install -U langchain-text-splitters

Collecting langchain-text-splitters
  Downloading langchain_text_splitters-0.0.1-py3-none-any.whl.metadata (2.0 kB)
Downloading langchain_text_splitters-0.0.1-py3-none-any.whl (21 kB)
Installing collected packages: langchain-text-splitters
Successfully installed langchain-text-splitters-0.0.1





In [40]:
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from IPython.display import Markdown as md

In [7]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("https://arxiv.org/pdf/2404.07143.pdf")
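# load_and_split() loads the PDF page by page and, when no splitter is passed,
# re-splits the text with a default RecursiveCharacterTextSplitter, so each
# element is a Document but not necessarily exactly one page.
pages = loader.load_and_split()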

In [15]:
pages[0].page_content

'Preprint. Under review.\nLeave No Context Behind:\nEfficient Infinite Context Transformers with Infini-attention\nTsendsuren Munkhdalai, Manaal Faruqui and Siddharth Gopal\nGoogle\ntsendsuren@google.com\nAbstract\nThis work introduces an efficient method to scale Transformer-based Large\nLanguage Models (LLMs) to infinitely long inputs with bounded memory\nand computation. A key component in our proposed approach is a new at-\ntention technique dubbed Infini-attention. The Infini-attention incorporates\na compressive memory into the vanilla attention mechanism and builds\nin both masked local attention and long-term linear attention mechanisms\nin a single Transformer block. We demonstrate the effectiveness of our\napproach on long-context language modeling benchmarks, 1M sequence\nlength passkey context block retrieval and 500K length book summarization\ntasks with 1B and 8B LLMs. Our approach introduces minimal bounded\nmemory parameters and enables fast streaming inference for LLMs

## Chunking the pages

In [9]:
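from langchain_text_splitters import NLTKTextSplitter

# Split on sentence boundaries (needs the nltk package and its punkt tokenizer data)
# and pack sentences into chunks of about 500 characters with up to 100 characters of overlap.
text_splitter = NLTKTextSplitter(chunk_size=500, chunk_overlap=100)

chunks = text_splitter.split_documents(pages)

print(len(chunks))  # number of chunks

print(type(chunks[0]))  # each chunk is still a LangChain Document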

Created a chunk of size 568, which is longer than the specified 500
Created a chunk of size 506, which is longer than the specified 500
Created a chunk of size 633, which is longer than the specified 500


110
<class 'langchain_core.documents.base.Document'>
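`NLTKTextSplitter` first splits the text into sentences and then greedily merges them into chunks of up to `chunk_size` characters, carrying trailing text forward as overlap. A single "sentence" (as the tokenizer sees it) that is longer than the limit is never cut, which is why a few chunks above exceed 500 characters. The sketch below mimics that behaviour under stated assumptions: a naive regex sentence splitter stands in for NLTK's punkt tokenizer, and `split_sentences` / `sentence_chunks` are hypothetical helpers, not the library's implementation.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    # NLTKTextSplitter uses NLTK's trained punkt tokenizer instead.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters.

    Trailing sentences totalling at most chunk_overlap characters are carried
    into the next chunk. A sentence longer than chunk_size becomes its own
    oversize chunk, because sentences are never cut in the middle.
    """
    chunks: list[str] = []
    current: list[str] = []

    def length(parts: list[str]) -> int:
        # Joined length, counting one space between sentences.
        return sum(len(p) for p in parts) + max(len(parts) - 1, 0)

    for sentence in split_sentences(text):
        if current and length(current + [sentence]) > chunk_size:
            chunks.append(" ".join(current))
            # Drop leading sentences until what remains fits the overlap budget
            # and still leaves room for the new sentence.
            while current and (length(current) > chunk_overlap
                               or length(current + [sentence]) > chunk_size):
                current.pop(0)
        current.append(sentence)

    if current:
        chunks.append(" ".join(current))
    return chunks


text = "Alpha beta. Gamma delta. Epsilon zeta."
print(sentence_chunks(text, chunk_size=30, chunk_overlap=12))
# → ['Alpha beta. Gamma delta.', 'Gamma delta. Epsilon zeta.']
```

The second chunk repeats "Gamma delta." because it fits within the 12-character overlap budget; raising `chunk_overlap` trades extra storage and embedding calls for better continuity between neighbouring chunks.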


## Loading the API key and creating an embedding model

In [17]:
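# Raw string so the backslashes in the Windows path are not read as escape sequences;
# strip() removes a trailing newline from the key file.
with open(r"C:\csv\key1.txt") as f:
    key = f.read().strip()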
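Reading the key from a hard-coded local path ties the notebook to one machine. A minimal sketch, assuming you would rather read the key from an environment variable and fall back to the file; the variable name `GOOGLE_API_KEY` and the helper `load_api_key` are assumptions for illustration. Stripping whitespace matters because a trailing newline in the file would otherwise be sent as part of the key.

```python
import os
from pathlib import Path


def load_api_key(path: str, env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from env_var if it is set, otherwise from the file at path."""
    # Hypothetical helper: the environment variable takes precedence over the file.
    value = os.environ.get(env_var)
    if value:
        return value.strip()
    # Strip whitespace so a trailing newline is not treated as part of the key.
    return Path(path).read_text(encoding="utf-8").strip()
```

Usage in the notebook would then be `key = load_api_key(r"C:\csv\key1.txt")`, which keeps working on machines where only the environment variable is set.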

In [22]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embedding_model = GoogleGenerativeAIEmbeddings(google_api_key=key, 
                                               model="models/embedding-001")

## Storing the chunks in a vector store

In [26]:
from langchain_community.vectorstores import Chroma

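# Embed every chunk and store the vectors on disk under ./chroma_db_
db = Chroma.from_documents(chunks, embedding_model, persist_directory="./chroma_db_")

# An explicit persist() is only needed with older chromadb releases; from 0.4.x
# onward, data in persist_directory is saved automatically.
db.persist()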

In [29]:
# Reopen the persisted store (e.g. in a new session) with the same embedding model
db_connection = Chroma(persist_directory="./chroma_db_", embedding_function=embedding_model)

## Setting up the vector store as a retriever

In [31]:
# Return the 5 chunks most similar to each query
retriever = db_connection.as_retriever(search_kwargs={"k": 5})
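With `search_kwargs={"k": 5}`, the retriever embeds the query and returns the 5 stored chunks whose vectors are closest to it. The sketch below illustrates the idea with exhaustive cosine similarity over toy 3-dimensional vectors; Chroma uses its own index and may use a different distance function by default, and `cosine` / `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(store, key=lambda doc_id: cosine(query_vec, store[doc_id]), reverse=True)
    return ranked[:k]


# Toy "embeddings" (assumption): real embedding-001 vectors have hundreds of dimensions.
store = {
    "chunk-attention": [0.9, 0.1, 0.0],
    "chunk-memory": [0.7, 0.6, 0.1],
    "chunk-benchmarks": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], store, k=2))  # → ['chunk-attention', 'chunk-memory']
```

A larger `k` gives the model more context to work with but also more irrelevant text and a longer, costlier prompt.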

In [34]:
chat_template = ChatPromptTemplate.from_messages([
    # Fixed system message (a plain SystemMessage, not a template)
    SystemMessage(content="""You are a Helpful AI Bot. 
    You take the context and question from user. Your answer should be based on the specific context."""),
    # Human message template with {context} and {question} placeholders
    HumanMessagePromptTemplate.from_template("""Answer the question based on the given context.
    Context:
    {context}
    
    Question: 
    {question}
    
    Answer: """)
])
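At run time the template substitutes `{context}` and `{question}` and produces a list of messages: the fixed system message followed by the filled-in human message. In the notebook, `chat_template.format_messages(context=..., question=...)` shows the rendered messages directly. The sketch below reproduces the substitution with `str.format` and plain `(role, text)` tuples so the final prompt can be inspected without LangChain; the tuple representation and `render_prompt` are assumptions, not LangChain's internal types.

```python
SYSTEM_TEXT = (
    "You are a Helpful AI Bot. You take the context and question from user. "
    "Your answer should be based on the specific context."
)
HUMAN_TEMPLATE = """Answer the question based on the given context.
Context:
{context}

Question:
{question}

Answer: """


def render_prompt(context: str, question: str) -> list[tuple[str, str]]:
    # Hypothetical stand-in for ChatPromptTemplate: fill the placeholders and
    # return (role, text) pairs in the order the chat model receives them.
    return [
        ("system", SYSTEM_TEXT),
        ("human", HUMAN_TEMPLATE.format(context=context, question=question)),
    ]


for role, text in render_prompt("Infini-attention incorporates a compressive memory.",
                                "What does Infini-attention add?"):
    print(f"[{role}]\n{text}\n")
```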

## LLM

In [36]:
from langchain_google_genai import ChatGoogleGenerativeAI

chat_model = ChatGoogleGenerativeAI(google_api_key=key, 
                                   model="gemini-1.5-pro-latest")

In [37]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [38]:
from langchain_core.runnables import RunnablePassthrough

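def format_docs(docs):
    # Join the retrieved chunks into one context string, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    # Send the question to the retriever (then format_docs) and, unchanged, to "question"
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_template   # fill {context} and {question}
    | chat_model      # generate the answer with Gemini
    | output_parser   # extract the reply text as a str
)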
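The pipe syntax can be read as function composition: the input question goes both to the retriever (whose documents `format_docs` joins into one string) and, unchanged via `RunnablePassthrough`, to the `question` slot; LangChain runs those two dict branches in parallel. The resulting dict fills the prompt, the chat model answers, and `StrOutputParser` returns the text. The sketch below spells out that data flow with hypothetical stubs (`Doc`, `fake_retriever`, `fake_chat_model`) in place of Chroma and Gemini so it runs offline; it is not LangChain's execution engine, which also handles batching, streaming and async calls.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Minimal stand-in for langchain_core's Document (assumption).
    page_content: str


PROMPT = "Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer: "


def format_docs(docs: list[Doc]) -> str:
    # Same as the notebook: join the retrieved chunks with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question: str) -> list[Doc]:
    # Hypothetical stub for retriever.invoke(question).
    return [Doc("Infini-attention incorporates a compressive memory."),
            Doc("It uses bounded memory and computation.")]


def fake_chat_model(prompt: str) -> str:
    # Hypothetical stub for chat_model followed by StrOutputParser.
    return f"(answer generated from a {len(prompt)}-character prompt)"


def build_prompt(question: str) -> str:
    # Step 1: the dict branch {"context": retriever | format_docs, "question": passthrough}
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # Step 2: the prompt template fills the placeholders.
    return PROMPT.format(**inputs)


def rag_chain_invoke(question: str) -> str:
    # Steps 3 and 4: the model generates a reply, returned as a plain string.
    return fake_chat_model(build_prompt(question))


print(build_prompt("What is Infini-attention?"))
print(rag_chain_invoke("What is Infini-attention?"))
```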

## Retrieving relevant content for a user query and generating an answer

In [39]:
response = rag_chain.invoke("What is this paper Leave No Context Behind all about")

response

'## Leave No Context Behind: Understanding the Paper\'s Core Idea\n\nBased on the provided context, the paper "Leave No Context Behind" appears to focus on introducing a novel attention mechanism for LLMs (Large Language Models) called **Infini-attention**. This mechanism aims to tackle the challenge of effectively handling both long and short-range contextual dependencies in text processing, a crucial aspect for accurate and nuanced language understanding.\n\nHere\'s a breakdown of the paper\'s key points:\n\n**Problem Addressed:**\n\n* Current LLMs lack efficient and practical methods for incorporating compressive memory, which is essential for capturing long-range context. \n* Existing attention mechanisms struggle to balance simplicity with quality, often leading to complex implementations or compromised performance.\n\n**Proposed Solution: Infini-attention**\n\n* Combines local and global context states, similar to multi-head attention but with an added long-term memory component.

## Rendering the response as Markdown

In [42]:
md(response)

## Leave No Context Behind: Understanding the Paper's Core Idea

Based on the provided context, the paper "Leave No Context Behind" appears to focus on introducing a novel attention mechanism for LLMs (Large Language Models) called **Infini-attention**. This mechanism aims to tackle the challenge of effectively handling both long and short-range contextual dependencies in text processing, a crucial aspect for accurate and nuanced language understanding.

Here's a breakdown of the paper's key points:

**Problem Addressed:**

* Current LLMs lack efficient and practical methods for incorporating compressive memory, which is essential for capturing long-range context. 
* Existing attention mechanisms struggle to balance simplicity with quality, often leading to complex implementations or compromised performance.

**Proposed Solution: Infini-attention**

* Combines local and global context states, similar to multi-head attention but with an added long-term memory component.
* This allows the model to effectively access and utilize information from both recent and distant parts of the input sequence.
*  Maintains a balance between practicality and performance by introducing minimal changes to the standard scaled dot-product attention.

**Key Contributions:**

1. **Powerful and Practical Attention:** Infini-attention offers an efficient solution for modeling long and short-range context, enhancing the LLM's ability to understand complex relationships within text.
2. **Plug-and-Play Design:**  The mechanism seamlessly integrates with existing LLM architectures, supporting continual pre-training and adaptation to long-context scenarios.

**Overall Significance:**

The paper presents a promising approach to improve LLM performance by effectively addressing the limitations of current attention mechanisms. By incorporating long-term memory and simplifying the attention process, Infini-attention has the potential to enhance various NLP tasks requiring a deep understanding of context, such as text summarization, question answering, and machine translation. 
