# Task
Build a Retrieval Augmented Generation (RAG) system using the PDF document `/content/attention-is-all-you-need.pdf`, demonstrating its functionality by answering a question based on the document's content, and provide a summary of the implementation.

## Load and Process Document

### Subtask:
Load the PDF document '/content/attention-is-all-you-need.pdf' and extract its text content.


**Reasoning**:
The first instruction is to install the `PyPDF2` library. This is a prerequisite for extracting text from a PDF document.



In [1]:
pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


**Reasoning**:
Now that PyPDF2 is installed, the next step is to load the PDF document and extract its text content by iterating through each page as per the subtask instructions.



In [2]:
import PyPDF2

pdf_path = '/content/attention-is-all-you-need.pdf'
page_texts = []

with open(pdf_path, 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    # Iterate over the pages directly; use '' when a page yields no text
    for page in reader.pages:
        page_texts.append(page.extract_text() or '')

# Concatenate the per-page text into a single string
document_text = ''.join(page_texts)

print("Successfully extracted text from the PDF. The first 500 characters are:")
print(document_text[:500])

Successfully extracted text from the PDF. The first 500 characters are:
Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.comNoam Shazeer∗
Google Brain
noam@google.comNiki Parmar∗
Google Research
nikip@google.comJakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.comAidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.eduŁukasz Kaise


## Chunk Document Text

### Subtask:
Split the extracted text into smaller, manageable chunks with overlap.


**Reasoning**:
`RecursiveCharacterTextSplitter` is part of the LangChain ecosystem, so the first step is to install the `langchain` package before importing the splitter for text chunking.



In [3]:
pip install langchain



**Reasoning**:
The splitter is distributed in the separate `langchain-text-splitters` package, so that package must be installed before `RecursiveCharacterTextSplitter` can be imported and applied to `document_text`.



In [5]:
pip install langchain-text-splitters

Collecting langchain-text-splitters
  Downloading langchain_text_splitters-1.1.0-py3-none-any.whl.metadata (2.7 kB)
Downloading langchain_text_splitters-1.1.0-py3-none-any.whl (34 kB)
Installing collected packages: langchain-text-splitters
Successfully installed langchain-text-splitters-1.1.0


**Reasoning**:
Now that `langchain-text-splitters` is installed, I need to import `RecursiveCharacterTextSplitter` from the correct path, instantiate it with the specified `chunk_size` and `chunk_overlap`, and then apply it to the `document_text` variable to create `text_chunks` as per the subtask instructions. The import path has changed based on the new package installation.



In [6]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Instantiate the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)

# Split the document text into chunks
text_chunks = text_splitter.split_text(document_text)

print(f"Number of chunks: {len(text_chunks)}")
print("First chunk:\n", text_chunks[0][:500])

Number of chunks: 50
First chunk:
 Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.comNoam Shazeer∗
Google Brain
noam@google.comNiki Parmar∗
Google Research
nikip@google.comJakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.comAidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.eduŁukasz Kaise
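
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch in plain Python. It is a simplification, not what `RecursiveCharacterTextSplitter` does internally: the real splitter prefers to break on separators such as paragraphs, lines, and spaces, so its chunk boundaries will differ. The function name `chunk_with_overlap` and the sample string are illustrative assumptions.

```python
def chunk_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Tiny illustrative input so the overlap is easy to see
sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(demo_chunks)
```

Each chunk starts with the last four characters of the previous one, which is the property that keeps a sentence split across a boundary retrievable from either side.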


## Generate Embeddings

### Subtask:
Generate numerical embeddings for each of the text chunks.


**Reasoning**:
The first step is to install the `langchain-google-genai` library, which is necessary to access Google's generative AI models for embedding generation.



In [7]:
pip install langchain-google-genai

Collecting langchain-google-genai
  Downloading langchain_google_genai-4.2.0-py3-none-any.whl.metadata (2.7 kB)
Collecting filetype<2.0.0,>=1.2.0 (from langchain-google-genai)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Downloading langchain_google_genai-4.2.0-py3-none-any.whl (66 kB)
Downloading filetype-1.2.0-py2.py3-none-any.whl (19 kB)
Installing collected packages: filetype, langchain-google-genai
Successfully installed filetype-1.2.0 langchain-google-genai-4.2.0


**Reasoning**:
Now that `langchain-google-genai` is installed, I will import `GoogleGenerativeAIEmbeddings`, initialize the embedding model with the `GOOGLE_API_KEY`, and then generate embeddings for each `text_chunk` as per the subtask instructions. It's important to ensure the `GOOGLE_API_KEY` environment variable is set before running this code.



In [None]:
import os
from getpass import getpass

# Prompt for the key instead of hard-coding it; never commit a real key to a notebook.
if 'GOOGLE_API_KEY' not in os.environ:
    os.environ['GOOGLE_API_KEY'] = getpass('Enter your Google API key: ')


In [10]:
import google.generativeai as genai

# Configure the API key
# Assuming os.environ['GOOGLE_API_KEY'] is already set from previous steps
if 'GOOGLE_API_KEY' not in os.environ:
    raise ValueError("GOOGLE_API_KEY environment variable not set. Please set it to your Google API Key.")

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

# List all available models
all_models = genai.list_models()

# Filter for embedding models
embedding_models = [m for m in all_models if 'embedContent' in m.supported_generation_methods]

print("Available Embedding Models:")
for model in embedding_models:
    print(f"- {model.name}")



All support for the `google.generativeai` package has ended. It will no longer be receiving 
updates or bug fixes. Please switch to the `google.genai` package as soon as possible.
See README for more details:

https://github.com/google-gemini/deprecated-generative-ai-python/blob/main/README.md

  loader.exec_module(module)


Available Embedding Models:
- models/gemini-embedding-001


**Reasoning**:
The previous step identified the correct embedding model name as 'models/gemini-embedding-001'. I will now update the code to use this correct model name for `GoogleGenerativeAIEmbeddings` to successfully generate embeddings for each text chunk.



In [11]:
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Initialize the embedding model with the correct model name
embeddings_model = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")

# Generate embeddings for each text chunk
text_embeddings = embeddings_model.embed_documents(text_chunks)

print(f"Number of embeddings generated: {len(text_embeddings)}")
print(f"Dimension of first embedding: {len(text_embeddings[0])}")

Number of embeddings generated: 50
Dimension of first embedding: 3072
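
Each chunk is now a 3072-dimensional vector, and retrieval works by comparing such vectors. A common comparison is cosine similarity; the sketch below shows the arithmetic on made-up three-dimensional vectors. The vectors and names are illustrative assumptions, and the metric the vector store actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for an all-zero vector
    return dot / (norm_a * norm_b)


# Toy vectors standing in for a query embedding and two chunk embeddings
toy_query = [1.0, 0.0, 1.0]
toy_chunks = {"close": [0.9, 0.1, 1.1], "far": [-1.0, 1.0, 0.0]}

for name, vector in toy_chunks.items():
    print(name, round(cosine_similarity(toy_query, vector), 3))
```

With real embeddings the same function could be applied to `embeddings_model.embed_query(...)` and any entry of `text_embeddings`.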


## Set Up Vector Store

### Subtask:
Create a vector store (e.g., an in-memory vector database) to store the generated embeddings and their corresponding text chunks.


**Reasoning**:
The first instruction is to install the `faiss-cpu` and `langchain-community` libraries, which are necessary for creating an in-memory vector store with LangChain.



In [12]:
pip install faiss-cpu langchain-community

Collecting faiss-cpu
  Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Collecting langchain-community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-classic<2.0.0,>=1.0.0 (from langchain-community)
  Downloading langchain_classic-1.0.1-py3-none-any.whl.metadata (4.2 kB)
Collecting requests<3.0.0,>=2.32.5 (from langchain-community)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7.0,>=0.6.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7.0,>=0.6.7->langchain-community)
  Downloading marshmallow-3.26.2-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7.0,>=0.6.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting mypy-extension

**Reasoning**:
Now that `faiss-cpu` and `langchain-community` are installed, I will import the `FAISS` class from `langchain_community.vectorstores` and then create an in-memory vector store using the `text_chunks` and `embeddings_model` as specified in the subtask instructions.



In [2]:
import os
import PyPDF2
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS

# Re-load document text (from previous steps, to ensure document_text is available)
pdf_path = '/content/attention-is-all-you-need.pdf'
document_text = ''

with open(pdf_path, 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    for page_num in range(len(reader.pages)):
        page = reader.pages[page_num]
        document_text += page.extract_text()

# Re-split document text into chunks (from previous steps, to ensure text_chunks is available)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)
text_chunks = text_splitter.split_text(document_text)

# Ensure GOOGLE_API_KEY is available in this execution context.
# Prompt for it rather than hard-coding a real key in the notebook.
if 'GOOGLE_API_KEY' not in os.environ:
    from getpass import getpass
    os.environ['GOOGLE_API_KEY'] = getpass('Enter your Google API key: ')

# Initialize embedding model by passing the api_key directly
embeddings_model = GoogleGenerativeAIEmbeddings(
    model="models/gemini-embedding-001",
    api_key=os.environ.get('GOOGLE_API_KEY')
)

# Create a FAISS vector store from the text chunks and embeddings model
vector_store = FAISS.from_texts(text_chunks, embeddings_model)

print("FAISS vector store created successfully.")
print(f"Vector store contains {vector_store.index.ntotal} vectors.")

FAISS vector store created successfully.
Vector store contains 50 vectors.
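
Conceptually, an in-memory vector store keeps each text next to its vector and answers a query by ranking the stored vectors by distance. The class below is a brute-force sketch of that idea using squared Euclidean distance. It is not FAISS and not LangChain's wrapper; `ToyVectorStore` and the sample vectors are hypothetical names and data used only for illustration.

```python
class ToyVectorStore:
    """Brute-force nearest-neighbour store: a stand-in for the idea behind a vector index."""

    def __init__(self):
        self._texts = []
        self._vectors = []

    def add(self, text, vector):
        """Store a text together with its embedding vector."""
        self._texts.append(text)
        self._vectors.append(list(vector))

    def search(self, query_vector, k=3):
        """Return the k texts whose vectors are closest to query_vector."""
        scored = []
        for text, vector in zip(self._texts, self._vectors):
            distance = sum((q - v) ** 2 for q, v in zip(query_vector, vector))
            scored.append((distance, text))
        scored.sort(key=lambda pair: pair[0])  # smallest distance first
        return [text for _, text in scored[:k]]


toy_store = ToyVectorStore()
toy_store.add("attention chunk", [1.0, 0.0])
toy_store.add("training chunk", [0.0, 1.0])
toy_store.add("results chunk", [-1.0, -1.0])
print(toy_store.search([0.9, 0.1], k=2))
```

A library such as FAISS performs the same ranking with optimized index structures, which matters once the collection grows well beyond the 50 vectors used here.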


## Implement Retrieval Function

### Subtask:
Develop a function that takes a user query, generates its embedding, and then queries the vector store to retrieve the most semantically similar text chunks from the document.


**Reasoning**:
I will define the `retrieve_documents` function as instructed, which will use the `vector_store.similarity_search` method to find relevant text chunks. Then, I will call this function with a sample query and print the results to demonstrate its functionality.



In [3]:
def retrieve_documents(query, k=5):
    """
    Retrieves the most semantically similar text chunks from the vector store
    based on a user query.

    Args:
        query (str): The user's query string.
        k (int): The number of top similar documents to retrieve.

    Returns:
        list: A list of Document objects representing the retrieved text chunks.
    """
    # The similarity_search method internally uses the embeddings_model
    # to embed the query and compare it to the stored document embeddings.
    retrieved_docs = vector_store.similarity_search(query, k=k)
    return retrieved_docs

# Call the function with a sample query
sample_query = "What is the Transformer model?"
retrieved_documents = retrieve_documents(sample_query, k=3)

print(f"\nRetrieved documents for query: '{sample_query}'")
for i, doc in enumerate(retrieved_documents):
    print(f"--- Document {i+1} ---")
    print(doc.page_content[:500] + "...") # Print first 500 characters of each document


Retrieved documents for query: 'What is the Transformer model?'
--- Document 1 ---
aligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate
self-attention and discuss its advantages over models such as [17, 18] and [9].
3 Model Architecture
Most competitive neural sequence transduction models have an encoder-decoder structure [ 5,2,35].
Here, the encoder maps an input sequence of symbol representations (x1, ..., x n)to a sequence
of continuous representations z= (z1, ..., z n). Given z, the decoder then generates an output
sequence (y1...
--- Document 2 ---
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to
be superior in quality while being more parallelizable and requiring significantly
less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-
to-G
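
When tuning `k`, it can help to see how close each retrieved chunk is to the query. LangChain vector stores expose `similarity_search_with_score`, which returns `(Document, score)` pairs; the sketch below wraps it. The helper name `retrieve_with_scores`, the `StandInStore` class, and its scores are hypothetical and exist only so the snippet runs without an API key. The meaning of the score depends on the index; with a distance-based index, lower values would indicate closer matches, which is an assumption to verify for your setup.

```python
from types import SimpleNamespace


def retrieve_with_scores(store, query, k=3, preview_chars=80):
    """Return (score, text preview) pairs for the top-k chunks of a vector store."""
    results = store.similarity_search_with_score(query, k=k)
    return [(float(score), doc.page_content[:preview_chars]) for doc, score in results]


class StandInStore:
    """Hypothetical stand-in mimicking the vector store interface with made-up scores."""

    def similarity_search_with_score(self, query, k=4):
        docs = [
            (SimpleNamespace(page_content="We propose a new simple network architecture"), 0.31),
            (SimpleNamespace(page_content="3 Model Architecture"), 0.42),
        ]
        return docs[:k]


for score, preview in retrieve_with_scores(StandInStore(), "What is the Transformer model?", k=2):
    print(f"{score:.2f}  {preview}")
```

In the notebook, pass `vector_store` instead of `StandInStore()` to inspect real scores.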

## Integrate with Language Model

### Subtask:
Set up a Large Language Model (LLM) that can take the user's query and the retrieved document chunks as context to generate an informed answer.


**Reasoning**:
The first step to setting up the LLM is to import the necessary class, `ChatGoogleGenerativeAI`, from `langchain_google_genai` as specified in the instructions. This will allow us to initialize the generative AI model.



In [4]:
from langchain_google_genai import ChatGoogleGenerativeAI
import os

# Initialize the LLM with the 'gemini-pro' model
# Assuming GOOGLE_API_KEY is already set as an environment variable
llm = ChatGoogleGenerativeAI(model="gemini-pro", api_key=os.environ.get('GOOGLE_API_KEY'))

print("ChatGoogleGenerativeAI model (gemini-pro) initialized.")

ChatGoogleGenerativeAI model (gemini-pro) initialized.


## Demonstrate RAG System

### Subtask:
Provide a demonstration by posing a sample question to the implemented RAG system and displaying the generated answer based on the document content.


**Reasoning**:
I need to import `PromptTemplate` and `RunnablePassthrough`, define a prompt template, create a `PromptTemplate` object, define the retrieval chain using the previously created `retrieve_documents` function and format the retrieved documents, then combine these components with the initialized `llm` into a RAG chain, and finally invoke this chain with a sample question to demonstrate its functionality.



In [5]:
pip install langchain-prompts

ERROR: Could not find a version that satisfies the requirement langchain-prompts (from versions: none)
ERROR: No matching distribution found for langchain-prompts

**Reasoning**:
The previous installation of `langchain-prompts` failed, indicating it's not a standalone package. The `PromptTemplate` class is likely found within `langchain_core.prompts`. I will update the import statement to reflect this and re-execute the code to define the RAG chain.



In [28]:
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

# 2. Define a prompt template string
# The prompt should instruct the LLM to answer the question based *only* on the provided context.
rag_prompt_template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:"""

# 3. Create a PromptTemplate object
rag_prompt = PromptTemplate.from_template(rag_prompt_template)

# 4. Define a retrieval chain
# The chain first retrieves relevant documents with retrieve_documents.
# format_docs then joins the page_content of the retrieved Document objects
# with blank lines, producing a single string for the prompt's {context} variable.

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

retrieval_chain = (
    RunnablePassthrough.assign(
        context=lambda x: format_docs(retrieve_documents(x["question"]))
    )
)

# 5. Combine the retrieval chain, the prompt, and the llm into a final RAG chain
rag_chain = (
    retrieval_chain
    | rag_prompt
    | llm
)

# 6. Invoke the RAG chain with a sample question and print the generated answer.
sample_question = "What is the main innovation of the Transformer model?"
response = rag_chain.invoke({"question": sample_question})

print(f"Question: {sample_question}")
print(f"Answer: {response.content[0]['text']}") # Extract only the text content

sample_question_2 = "How does the Transformer model handle long-range dependencies?"
response_2 = rag_chain.invoke({"question": sample_question_2})

print(f"Question: {sample_question_2}")
print(f"Answer: {response_2.content[0]['text']}") # Extract only the text content

sample_question_3 = "What are the advantages of using self-attention in the Transformer?"
response_3 = rag_chain.invoke({"question": sample_question_3})

print(f"Question: {sample_question_3}")
print(f"Answer: {response_3.content[0]['text']}") # Extract only the text content

Question: What is the main innovation of the Transformer model?
Answer: The Transformer is the first sequence transduction model based entirely on attention mechanisms, specifically replacing recurrent layers and convolutions with multi-headed self-attention. By eschewing recurrence, the architecture allows for significantly more parallelization and faster training times compared to previous models. This design enables the model to draw global dependencies between input and output without relying on sequence-aligned RNNs.
Question: How does the Transformer model handle long-range dependencies?
Answer: The Transformer handles long-range dependencies by relying entirely on an attention mechanism that connects arbitrary input or output positions with a constant number of operations. This approach allows the model to draw global dependencies regardless of the distance between positions in the sequence. Additionally, Multi-Head Attention is used to counteract the reduced effective resolutio
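
Stripped of the LangChain plumbing, the chain above performs two steps before calling the model: it joins the retrieved chunks into one context string and fills the prompt template. The sketch below reproduces those steps in plain Python. The template is a shortened paraphrase, and `build_prompt` and the sample chunks are illustrative assumptions rather than part of the notebook.

```python
RAG_TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def build_prompt(question, chunk_texts):
    """Join retrieved chunk texts with blank lines and fill the prompt template."""
    context = "\n\n".join(chunk_texts)
    return RAG_TEMPLATE.format(question=question, context=context)


# Made-up chunks standing in for the output of the retrieval step
example_prompt = build_prompt(
    "What is the main innovation of the Transformer model?",
    ["Chunk about attention.", "Chunk about parallelization."],
)
print(example_prompt)
```

Printing the assembled prompt like this is a simple way to check what the model actually sees when an answer looks wrong.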

### Create `streamlit_app.py`

Now, let's create a file named `streamlit_app.py` with the following content. This script contains all the necessary RAG components and the Streamlit UI.

**Note**: The `%%writefile` cell further below saves this content to `streamlit_app.py` in the Colab working directory. Alternatively, create the file manually: click the folder icon in the left panel, create a new file named `streamlit_app.py`, paste the code into it, and save.

In [11]:
pip install streamlit

Collecting streamlit
  Downloading streamlit-1.54.0-py3-none-any.whl.metadata (9.8 kB)
Collecting cachetools<7,>=5.5 (from streamlit)
  Downloading cachetools-6.2.6-py3-none-any.whl.metadata (5.6 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.54.0-py3-none-any.whl (9.1 MB)
Downloading cachetools-6.2.6-py3-none-any.whl (11 kB)
Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Installing collected packages: cachetools, pydeck, streamlit
  Attempting uninstall: cachetools
    Found existing installation: cachetools 7.0.0
    Uninstalling cachetools-7.0.0:
      Successfully uninstalled cachetools-7.0.0
Successfully installed cachetools-6.2.6 pydeck-0

In [40]:
%%writefile streamlit_app.py

import streamlit as st
import os
import PyPDF2
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

# --- Configuration --- #
pdf_path = '/content/attention-is-all-you-need.pdf'

# Read the Google API key from the environment; do not hard-code it in the script.
if not os.environ.get('GOOGLE_API_KEY'):
    st.error("GOOGLE_API_KEY is not set. Export it before launching the app.")
    st.stop()

# --- RAG System Setup (Self-contained for Streamlit app) --- #

@st.cache_resource
def load_and_process_document(path):
    document_text = ''
    with open(path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        for page_num in range(len(reader.pages)):
            page = reader.pages[page_num]
            document_text += page.extract_text()
    return document_text

@st.cache_resource
def chunk_document_text(text):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
        is_separator_regex=False,
    )
    text_chunks = text_splitter.split_text(text)
    return text_chunks

@st.cache_resource
def get_embeddings_model():
    return GoogleGenerativeAIEmbeddings(
        model="models/gemini-embedding-001",
        api_key=os.environ.get('GOOGLE_API_KEY')
    )

@st.cache_resource
def create_vector_store(chunks, _embeddings_model):
    return FAISS.from_texts(chunks, _embeddings_model)

@st.cache_resource
def get_llm():
    return ChatGoogleGenerativeAI(model="gemini-pro-latest", api_key=os.environ.get('GOOGLE_API_KEY')) # Changed model to gemini-pro-latest

# Load and process
document_text = load_and_process_document(pdf_path)
text_chunks = chunk_document_text(document_text)
embeddings_model = get_embeddings_model()
vector_store = create_vector_store(text_chunks, embeddings_model)
llm = get_llm()

def retrieve_documents(query, k=5):
    return vector_store.similarity_search(query, k=k)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Define the RAG chain
rag_prompt_template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:"""
rag_prompt = PromptTemplate.from_template(rag_prompt_template)

retrieval_chain = (
    RunnablePassthrough.assign(
        context=lambda x: format_docs(retrieve_documents(x["question"]))
    )
)

rag_chain = (
    retrieval_chain
    | rag_prompt
    | llm
)

# --- Streamlit UI --- #
st.title("RAG System for PDF Document")
st.write("Ask a question about the 'Attention Is All You Need' PDF document.")

query_input = st.text_input("Your Question:", "What is the main innovation of the Transformer model?")

if st.button("Get Answer"):
    if query_input:
        with st.spinner("Retrieving and generating answer..."):
            response = rag_chain.invoke({"question": query_input})
            st.subheader("Answer:")
            st.write(response.content[0]['text']) # Extract only the text content
    else:
        st.warning("Please enter a question.")

Overwriting streamlit_app.py
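
The app reads the answer with `response.content[0]['text']`, which assumes the model returns a list of content blocks. A LangChain chat message can also carry a plain string as its content, in which case that indexing would fail. The helper below is a defensive sketch that handles both shapes; `message_text` is a hypothetical name, and the block layout (dicts with a `text` key) is an assumption based on the indexing used above.

```python
def message_text(content):
    """Return the text of a chat message whose content is a string or a list of blocks."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type", "text") == "text":
            # Keep only text blocks; other block types are skipped
            parts.append(block.get("text", ""))
    return "".join(parts)


print(message_text("plain answer"))
print(message_text([{"type": "text", "text": "block answer"}]))
```

In the app, `st.write(message_text(response.content))` would then work for either shape.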


### Run the Streamlit App

To run the Streamlit app, you'll need to expose it using `ngrok`, since a server running inside a Colab runtime is not directly reachable from the outside. Run the following cell:

In [42]:
import sys
import asyncio
!{sys.executable} -m pip install ngrok nest_asyncio

import nest_asyncio
nest_asyncio.apply()

from ngrok import ngrok

# Terminate any existing ngrok tunnels
ngrok.kill()

# Run ngrok.connect in an async manner to get the URL
async def get_ngrok_url():
    # Read the ngrok auth token from the environment instead of hard-coding it
    import os
    ngrok.set_auth_token(os.environ["NGROK_AUTHTOKEN"])
    tunnel = await ngrok.connect(8501)
    return tunnel.url()

# Get the event loop and run the async function
loop = asyncio.get_event_loop()
public_url = loop.run_until_complete(get_ngrok_url())

print(f"Streamlit App URL: {public_url}")

# Run the Streamlit app in the background
!streamlit run streamlit_app.py &>/dev/null&

Streamlit App URL: https://premandibular-pokable-blanca.ngrok-free.dev
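
Because the cell prints the public URL before Streamlit is launched in the background, the first visit may arrive before the app is listening. One way to guard against that is to poll the local port until it accepts connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and port 8501 is taken from the `ngrok.connect(8501)` call above.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8501, timeout=30.0, interval=0.5):
    """Poll until host:port accepts a TCP connection; return False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means the server is up and listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not ready yet; wait and retry
    return False
```

Calling `wait_for_port()` after the `streamlit run` line and printing the URL only when it returns `True` would avoid sharing a link that is not yet live.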
