# **Building a QnA system**


In [1]:
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

chat_template = ChatPromptTemplate.from_messages([
    # System Message Prompt Template
    SystemMessage(content="""You are a Helpful AI Bot. 
    You take the question from user and answer if you have the specific information related to the question. """),
    # Human Message Prompt Template
    HumanMessagePromptTemplate.from_template("""Answer the following question: {question}
    Answer: """)
])

chat_template

ChatPromptTemplate(input_variables=['question'], messages=[SystemMessage(content='You are a Helpful AI Bot. \n    You take the question from user and answer if you have the specific information related to the question. '), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='Answer the following question: {question}\n    Answer: '))])

In [2]:
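# Read the API key; the context manager closes the file and
# strip() drops any trailing newline stored with the key
with open("/Users/eleshalapravalika/Downloads/GEMINI AI TUTOR/gemini_key.txt") as key_file:
    key = key_file.read().strip()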
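
A minimal sketch of an alternative way to load the key, assuming you would rather not hard-code a file path: check an environment variable first and fall back to the key file. The helper name `load_api_key` is hypothetical and not part of any library; `GOOGLE_API_KEY` is used as the default variable name because it is the one `langchain_google_genai` looks up when no key is passed explicitly.

```python
import os
from pathlib import Path


def load_api_key(path, env_var="GOOGLE_API_KEY"):
    """Hypothetical helper: prefer the environment variable, else read the file."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    # strip() removes the trailing newline most editors append to the file
    return Path(path).read_text(encoding="utf-8").strip()
```

With this in place, `key = load_api_key("gemini_key.txt")` would replace the explicit file read above.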

In [3]:
from langchain_google_genai import ChatGoogleGenerativeAI

chat_model = ChatGoogleGenerativeAI(google_api_key=key, 
                                   model="gemini-1.5-pro-latest")

In [4]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [5]:
chain = chat_template | chat_model | output_parser
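
The `|` operator chains the three steps so that each one's output becomes the next one's input. The toy sketch below illustrates that idea only; the `Step` class and the fake model are hypothetical stand-ins, not LangChain's actual implementation.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and composes with |."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b builds a new step that runs a first, then feeds its output to b
        return Step(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.fn(x)


# Three toy steps mirroring prompt -> model -> parser
template = Step(lambda d: f"Answer the following question: {d['question']}")
fake_model = Step(lambda prompt: {"content": prompt.upper()})  # pretend LLM
parser = Step(lambda message: message["content"])

toy_chain = template | fake_model | parser
result = toy_chain.invoke({"question": "what is RAG?"})
print(result)
```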

In [6]:
user_input = "Can you tell me about the Leave no context behind paper"

print(chain.invoke({"question": user_input}))

## Leave No Context Behind: A Potential Analysis

Unfortunately, with my knowledge cutoff in November 2023, I don't have specific information about a paper titled "Leave No Context Behind." There are a few possibilities and approaches we can take to find out more:

**1. Identifying the Paper:**

* **Title Search:** Try searching online databases like Google Scholar, Semantic Scholar, or research repositories like arXiv using the exact title "Leave No Context Behind." This might lead you to the paper directly or related works.
* **Keyword Search:**  If the exact title search doesn't work, try searching for keywords related to the paper's potential topic. For example, if you suspect the paper is about natural language processing, you could search for "contextual language models," "contextual embeddings," or "context-aware NLP."
* **Author Search:** If you know the author(s) of the paper, try searching for their names and see if the paper appears in their list of publications.

**2. Explo

In [7]:
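# Load the PDF. load_and_split() also applies the loader's default
# text splitter (RecursiveCharacterTextSplitter) to the pages.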

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/Users/eleshalapravalika/Downloads/LEAVENOCONTEXTBEHIND.pdf")
pages = loader.load_and_split()

In [8]:
# Split the document into chunks

from langchain_text_splitters import NLTKTextSplitter

text_splitter = NLTKTextSplitter(chunk_size=500, chunk_overlap=100)

chunks = text_splitter.split_documents(pages)

print(len(chunks))

print(type(chunks[0]))

Created a chunk of size 568, which is longer than the specified 500
Created a chunk of size 506, which is longer than the specified 500
Created a chunk of size 633, which is longer than the specified 500


110
<class 'langchain_core.documents.base.Document'>
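
The "longer than the specified 500" warnings are what you would expect from a splitter that works on whole sentences: sentences are packed into a chunk until the size limit is reached, and a piece that cannot be broken down further ends up in an oversized chunk of its own. The sketch below is a simplified illustration under that assumption. It uses a naive regex sentence splitter and ignores `chunk_overlap`; it is not the code `NLTKTextSplitter` runs.

```python
import re


def pack_sentences(text, chunk_size):
    """Greedily pack whole sentences into chunks of at most chunk_size characters."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= chunk_size or not current:
            # Either it still fits, or the chunk is empty and we must accept
            # the sentence even if it alone exceeds chunk_size
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


text = "Short one. " + "A very long sentence " * 3 + "ends here. Tail."
chunks = pack_sentences(text, chunk_size=30)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The middle sentence is longer than 30 characters, so it is emitted as one oversized chunk rather than being cut mid-sentence.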


In [9]:
# Create the embedding model for the chunks
# (Google Generative AI embeddings)

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embedding_model = GoogleGenerativeAIEmbeddings(google_api_key=key, 
                                               model="models/embedding-001")

# To embed manually, pass a list of strings:
# vectors = embedding_model.embed_documents([chunk.page_content for chunk in chunks])

In [10]:
# Store the chunks in vector store
from langchain_community.vectorstores import Chroma

# Embed each chunk and load it into the vector store
db = Chroma.from_documents(chunks, embedding_model, persist_directory="./chroma_db_")

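# Persist the database to disk
# (with Chroma 0.4+ the data is saved automatically when persist_directory
# is set, and persist() is deprecated)
db.persist()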

In [11]:
# Reconnect to the persisted Chroma database
db_connection = Chroma(persist_directory="./chroma_db_", embedding_function=embedding_model)

In [12]:
# Convert the Chroma connection into a retriever that returns the top 5 chunks
retriever = db_connection.as_retriever(search_kwargs={"k": 5})

print(type(retriever))

<class 'langchain_core.vectorstores.VectorStoreRetriever'>
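
Conceptually, a retriever with `k=5` embeds the query and returns the five stored chunks whose vectors are closest to it. The sketch below shows that ranking step on toy 2-D vectors. It assumes cosine similarity purely for illustration; the metric the vector store actually uses depends on its configuration, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k vectors most similar to the query."""
    ranked = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in ranked[:k]]


# Toy "embeddings": the first two point roughly the same way as the query
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
nearest = top_k([1.0, 0.0], doc_vecs, k=2)
print(nearest)
```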


Now let’s write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.

In [13]:
retrieved_docs = retriever.invoke(user_input)

In [14]:
len(retrieved_docs)

5

In [15]:
print(retrieved_docs[0].page_content)

Preprint.

Under review.
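
The top hit here appears to be a page footer rather than paper content, which wastes one of the five retrieval slots. One possible mitigation, sketched below on plain strings, is to drop very short chunks before indexing. The helper name and the 50-character threshold are assumptions to tune for your document, not values from this notebook.

```python
def drop_short_chunks(texts, min_chars=50):
    """Hypothetical filter: keep only chunks with at least min_chars of text."""
    return [text for text in texts if len(text.strip()) >= min_chars]


sample = [
    "Preprint.\n\nUnder review.",
    "Infini-attention incorporates a compressive memory into the vanilla attention mechanism.",
]
kept = drop_short_chunks(sample)
print(len(kept))
```

With `Document` objects, the same check would compare `len(chunk.page_content.strip())` before the chunks are passed to the vector store.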


In [16]:
chat_template = ChatPromptTemplate.from_messages([
    # System Message Prompt Template
    SystemMessage(content="""You are a Helpful AI Bot. 
    You take the question related to Context from user and answer if you have the specific information related to the question."""),
    # Human Message Prompt Template
    HumanMessagePromptTemplate.from_template("""Answer the question based on the given context.
    Context:
    {context}
    
    Question: 
    {question}
    
    Answer: """)
])


In [17]:
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_template
    | chat_model
    | output_parser
)
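
The dictionary at the head of `rag_chain` is treated as a parallel step: every value receives the same input (the question string), and the results are collected under their keys to fill `{context}` and `{question}` in the prompt. The sketch below mimics that fan-out in plain Python. `Doc`, `fake_retriever`, and `fan_out` are hypothetical stand-ins, and the document texts are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a document with a page_content field."""
    page_content: str


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question):
    # Placeholder results; a real retriever would search the vector store
    return [
        Doc("Infini-attention adds a compressive memory."),
        Doc("Segments are processed in order."),
    ]


def fan_out(branches, value):
    """Give every branch the same input and collect the outputs by key."""
    return {name: branch(value) for name, branch in branches.items()}


prompt_inputs = fan_out(
    {
        "context": lambda q: format_docs(fake_retriever(q)),  # retriever | format_docs
        "question": lambda q: q,  # passthrough: the question is forwarded unchanged
    },
    "What is Infini-attention?",
)
print(prompt_inputs)
```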

In [18]:
response = rag_chain.invoke("Can you tell me about the Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention")

response

'## Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention\n\nBased on the context you provided, here\'s a summary of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention":\n\n**Main Idea:**\n\nThis work introduces **Infini-attention**, a novel attention mechanism designed to efficiently handle both long and short-range contextual dependencies within Transformer models. This enables the processing of infinitely long contexts, overcoming limitations of standard attention mechanisms.\n\n**Key Contributions:**\n\n1. **Infini-attention Mechanism:** This powerful attention mechanism combines:\n    * **Long-term compressive memory:**  Stores and retrieves relevant information from extensive past contexts.\n    * **Local causal attention:** Focuses on recent context for capturing local dependencies. \n2. **Minimal Modification:**  Infini-attention integrates seamlessly with existing Transformer architectures, requiri

In [19]:
from IPython.display import Markdown as md

md(response)

## Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Based on the context you provided, here's a summary of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention":

**Main Idea:**

This work introduces **Infini-attention**, a novel attention mechanism designed to efficiently handle both long and short-range contextual dependencies within Transformer models. This enables the processing of infinitely long contexts, overcoming limitations of standard attention mechanisms.

**Key Contributions:**

1. **Infini-attention Mechanism:** This powerful attention mechanism combines:
    * **Long-term compressive memory:**  Stores and retrieves relevant information from extensive past contexts.
    * **Local causal attention:** Focuses on recent context for capturing local dependencies. 
2. **Minimal Modification:**  Infini-attention integrates seamlessly with existing Transformer architectures, requiring minimal changes to the standard scaled dot-product attention.
3. **Plug-and-Play:**  It supports continual pre-training and long-context adaptation, allowing models to learn from ever-growing datasets and adapt to new information efficiently.

**How it Works:**

* Infini-attention incorporates a compressive memory into the vanilla attention mechanism.
* It utilizes both masked local attention and long-term linear attention within a single Transformer block.
* This enables the model to access and process information from both recent and distant past contexts effectively.

**Benefits:**

* **Efficient Long-Context Modeling:**  Handles infinitely long sequences of data, overcoming limitations of traditional models.
* **Improved Performance:**  Leads to better performance on tasks requiring long-range context understanding. 
* **Continual Learning:**  Enables models to continuously learn and adapt to new information without forgetting past knowledge. 

**Overall, Infini-attention presents a significant advancement in Transformer models, allowing them to efficiently process and learn from infinitely long contexts, opening doors to more powerful and versatile natural language processing applications.** 


In [20]:
response = rag_chain.invoke("What is Compressive Memory?")

md(response)

## Compressive Memory Explained:

Based on the context you provided, **compressive memory** is a unique approach to storing and recalling information that draws inspiration from the plasticity of biological neurons. Unlike traditional memory systems (like arrays) that grow in size with the amount of data, compressive memory utilizes a fixed number of parameters. This allows it to maintain bounded storage and computation costs, making it efficient even when dealing with large amounts of data. 

**Here's how it works:**

* **Parameterized functions as memory:** Instead of simply storing data directly, compressive memory uses parameterized functions (mathematical representations with adjustable parameters) to encode and represent the information. 
* **Adding new information:** When new information is introduced, the parameters of these functions are adjusted and updated. The objective is to modify the functions in such a way that the original information can be accurately recovered later.
* **Benefits:** This approach offers several advantages:
    * **Efficiency:** By using a fixed number of parameters, compressive memory avoids the ever-increasing memory demands of traditional methods.
    * **Bounded costs:**  Storage and computational costs remain manageable, making it suitable for resource-constrained environments.
    * **Adaptability:** The ability to adjust parameters allows the memory to continuously learn and adapt to new information.

**Current Challenges:**

While the concept of compressive memory holds great promise, the context mentions that current large language models (LLMs) haven't yet found a way to implement it effectively in a practical setting. The challenge lies in striking the right balance between simplicity and the quality of information storage and retrieval.


In [21]:
response = rag_chain.invoke("Explain LLM Continual Pre-training in detail")

md(response)

## LLM Continual Pre-training Explained: Adapting to Long-Context Information

LLM Continual Pre-training focuses on adapting existing Large Language Models (LLMs) to handle **long-context information** effectively. This is crucial because standard LLMs often struggle with processing and understanding lengthy sequences of text.

Here's a breakdown of the key elements involved:

**1. Extending Attention Mechanisms:**

*   Traditional LLMs use "dot-product attention" which has limitations when dealing with long sequences. 
*   This method is replaced with mechanisms like **Infini-attention** that are better suited for long-context scenarios.

**2. Continued Pre-training on Long Sequences:**

*   Existing LLMs are further trained on datasets containing text sequences exceeding 4,000 tokens. 
*   Examples of such datasets include PG19, Arxiv-math corpus, and lengthy sections from the C4 text dataset.

**3. Segmenting Long Sequences:**

*   To manage the computational challenges of processing extensive text sequences, the input is divided into segments. 
*   In the given context, a segment length (N) of 2,000 tokens is used throughout the experiments.

**4. Lightweight Adaptation:**

*   The pre-training process is designed to be lightweight, meaning it efficiently adapts the existing LLM without requiring extensive resources or retraining from scratch.

**Benefits of LLM Continual Pre-training:**

*   **Improved Performance on Long-Context Tasks:** LLMs become capable of understanding and responding to prompts or questions that require processing lengthy information sequences. 
*   **Enhanced Comprehension and Reasoning:** By considering a broader context, LLMs can achieve deeper comprehension and provide more insightful responses.
*   **Efficient Adaptation:** Existing LLMs can be adapted to long-context scenarios without the need for extensive retraining, saving time and resources.

**Examples of Applications:**

*   **Summarizing lengthy documents or research papers**
*   **Answering complex questions that require considering extensive background information**
*   **Generating coherent and contextually relevant text in creative writing or dialogue systems** 


In [22]:
response = rag_chain.invoke("What are Efficient Infinite Context Transformers?")

md(response)

## Efficient Infinite Context Transformers: A Summary Based on the Context

The provided text seems to be discussing "Efficient Infinite Context Transformers," likely referring to a specific model or architecture within the realm of Transformer models in machine learning. While the full details are not available, we can glean some key points:

**Key Features:**

* **Unbounded Context Window:** This model appears to address the limitations of traditional Transformers with fixed-length context windows. It can handle and process input sequences of theoretically infinite length, which is crucial for tasks requiring long-range dependencies, such as long document summarization or analyzing extensive time-series data. 
* **Bounded Memory Footprint:** Despite handling unbounded context, the model maintains a controlled memory footprint. This is achieved through efficient memory management techniques, making it practical for real-world applications where memory limitations are a concern.
* **Infini-Attention Mechanism:**  The core of this model likely involves a novel attention mechanism called "Infini-attention." This mechanism appears to combine both local and global context states, similar to multi-head attention but with the ability to handle extended sequences.

**Possible Applications:**

* **Long Document Summarization:** Analyzing and summarizing lengthy documents like research papers or books.
* **Time-Series Analysis:** Processing and forecasting extensive time-series data, such as financial markets or climate patterns.
* **Natural Language Understanding:**  Tasks involving understanding complex language structures and long-range dependencies within text. 
* **Code Generation and Analysis:**  Analyzing and generating code where understanding dependencies across long code sequences is crucial.

**Additional Notes:**

* The provided context mentions "segment-level memory models" and comparisons, suggesting that this model might be an improvement over existing approaches for handling long sequences.
* The reference to Figure 1 and Table 1 implies that the full document likely contains visual illustrations and detailed comparisons with other models, which would provide a more comprehensive understanding. 
