In [1]:
# Simple RAG: processing the paper "Attention Is All You Need"
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import os

load_dotenv()

True

In [2]:
llm = init_chat_model(model="gpt-4o-mini", temperature=0)
response = llm.invoke("What is the Transformer architecture?")

response.pretty_print()


The Transformer architecture is a type of neural network architecture that was introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017. It has become a foundational model for various natural language processing (NLP) tasks and has also been adapted for other domains such as computer vision and audio processing. The key features of the Transformer architecture include:

### 1. **Self-Attention Mechanism:**
   - The core innovation of the Transformer is the self-attention mechanism, which allows the model to weigh the importance of different words in a sentence relative to each other. This enables the model to capture contextual relationships without relying on sequential processing, as seen in recurrent neural networks (RNNs).

### 2. **Positional Encoding:**
   - Since Transformers do not have a built-in notion of sequence order (unlike RNNs), they use positional encodings to inject information about the position of each token in the input sequence. This allows t

# **Build RAG** 

In [3]:
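# Set up RAG
cache = "../faiss_cache/transformer_paper"

# Load the PDF and split it into overlapping chunks
docs = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(
    PyPDFLoader("../pdfs/Paper.pdf").load()
)

# Vector store with an on-disk cache.
# save_local() writes index.faiss and index.pkl inside the `cache` folder.
embeddings = OpenAIEmbeddings()
if os.path.exists(os.path.join(cache, "index.faiss")):
    # Loading unpickles data, so only enable this flag for index files you created yourself
    vectorstore = FAISS.load_local(cache, embeddings, allow_dangerous_deserialization=True)
else:
    vectorstore = FAISS.from_documents(docs, embeddings)
    vectorstore.save_local(cache)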
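
The `chunk_size=1000, chunk_overlap=200` settings above mean that each chunk holds at most about 1000 characters and that consecutive chunks can share up to 200 characters. A minimal sketch of that idea with a fixed-width sliding window follows. It is a simplification: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks rather than at fixed offsets, and the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-width windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small numbers make the overlap visible: each chunk repeats the last
# three characters of the previous one.
sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_chunks(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

The overlap is what lets a sentence that straddles a chunk boundary still appear whole in at least one chunk, at the cost of embedding some text twice.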


In [4]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

prompt = ChatPromptTemplate.from_template(
    "Based on the paper 'Attention Is All You Need':\n\nContext: {context}\n\nQuestion: {question}\n\nAnswer:"
)

rag = (
    {"context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)), "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)

print("✅ RAG ready")

✅ RAG ready
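
Conceptually, the retriever with `k=3` embeds the question, looks up the three stored chunks whose vectors are closest to it, and the lambda in the chain joins their text into the `{context}` slot of the prompt. The sketch below reproduces that flow with hand-made 2-D vectors instead of OpenAI embeddings and a brute-force Euclidean search instead of a FAISS index. The names `top_k` and `format_context` and the toy data are hypothetical, and Euclidean distance is an assumption about the metric in use.

```python
import math


def top_k(query_vec, store, k=3):
    """Return the k (text, vector) entries closest to query_vec by Euclidean distance."""
    ranked = sorted(store, key=lambda item: math.dist(query_vec, item[1]))
    return ranked[:k]


def format_context(hits):
    """Join chunk texts with blank lines, as the lambda in the chain does."""
    return "\n\n".join(text for text, _ in hits)


# Toy "vector store": (chunk text, made-up 2-D embedding)
store = [
    ("chunk A (about attention)", (0.9, 0.1)),
    ("chunk B (about encoder layers)", (0.2, 0.8)),
    ("chunk C (about training data)", (0.1, 0.2)),
    ("chunk D (about multi-head attention)", (0.8, 0.2)),
]

# A made-up query vector that points toward the "attention" chunks
hits = top_k((1.0, 0.0), store, k=2)
print(format_context(hits))
```

A real index avoids scanning every vector, but the contract is the same: a question goes in, the `k` nearest chunks come out, and only those chunks reach the model.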


## 🔍 **Ask questions**

In [5]:
# Ask directly
rag.invoke("What is the Transformer architecture?")


'The Transformer architecture is a novel model design introduced in the paper "Attention Is All You Need," which relies entirely on attention mechanisms rather than recurrence or convolutions. This architecture is particularly effective for tasks such as machine translation. \n\nKey components of the Transformer architecture include:\n\n1. **Encoder-Decoder Structure**: The Transformer consists of an encoder and a decoder, each made up of a stack of identical layers. The encoder processes the input data, while the decoder generates the output.\n\n2. **Stacked Layers**: The encoder is composed of N = 6 identical layers, each containing two main sub-layers:\n   - A multi-head self-attention mechanism that allows the model to focus on different parts of the input sequence simultaneously.\n   - A position-wise fully connected feed-forward network that processes the output of the attention mechanism.\n\n3. **Residual Connections and Layer Normalization**: Each sub-layer in the encoder and d

In [6]:
# Or with a helper function
def ask(question, show_sources=False):
    """Run the RAG chain, print the answer, and optionally list the source pages."""
    result = rag.invoke(question)
    print(f"💡 {result}\n")
    if show_sources:
        # Runs retrieval again to list the pages the chunks came from
        docs = retriever.invoke(question)
        print(f"📚 Sources ({len(docs)} docs):")
        for i, doc in enumerate(docs, 1):
            print(f"  {i}. Page {doc.metadata.get('page', '?')}")
    return result

ask("What is multi-head attention?")


💡 Multi-head attention is a mechanism used in the Transformer model that allows the model to focus on different parts of the input sequence simultaneously by employing multiple attention heads. Each attention head operates independently and learns to attend to different representation subspaces of the input data. 

In multi-head attention, the input consists of queries (Q), keys (K), and values (V). Each attention head computes its own set of projections for these inputs using learned parameter matrices. Specifically, for each head \(i\), the queries, keys, and values are projected into lower-dimensional spaces defined by \(d_k\) and \(d_v\). The outputs of all heads are then concatenated and projected back to the original dimensionality \(d_{model}\) using another learned parameter matrix \(W_O\).

The formula for multi-head attention can be expressed as:

\[
\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h) W_O
\]

where each head is computed as:

\[


'Multi-head attention is a mechanism used in the Transformer model that allows the model to focus on different parts of the input sequence simultaneously by employing multiple attention heads. Each attention head operates independently and learns to attend to different representation subspaces of the input data. \n\nIn multi-head attention, the input consists of queries (Q), keys (K), and values (V). Each attention head computes its own set of projections for these inputs using learned parameter matrices. Specifically, for each head \\(i\\), the queries, keys, and values are projected into lower-dimensional spaces defined by \\(d_k\\) and \\(d_v\\). The outputs of all heads are then concatenated and projected back to the original dimensionality \\(d_{model}\\) using another learned parameter matrix \\(W_O\\).\n\nThe formula for multi-head attention can be expressed as:\n\n\\[\n\\text{MultiHead}(Q, K, V) = \\text{Concat}(\\text{head}_1, \\ldots, \\text{head}_h) W_O\n\\]\n\nwhere each he
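
Note that `ask(..., show_sources=True)` runs retrieval twice: once inside `rag.invoke` and again for the source list. One way to avoid that is to retrieve first and pass the same documents on to the generation step. The sketch below shows only the control flow, with plain callables standing in for the retriever and for the `prompt | llm | StrOutputParser()` part of the chain. `ask_once`, `retrieve`, `generate`, `Doc`, and the stubs are hypothetical and make no API calls.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def ask_once(question, retrieve, generate):
    """Retrieve once, then reuse the same docs for the answer and the source list."""
    docs = retrieve(question)
    context = "\n\n".join(d.page_content for d in docs)
    answer = generate({"context": context, "question": question})
    pages = [d.metadata.get("page", "?") for d in docs]
    return answer, pages


# Stubs so the sketch runs without a vector store or an LLM
calls = []


def fake_retrieve(question):
    calls.append(question)  # record each retrieval so the count can be checked
    return [Doc("chunk one", {"page": 2}), Doc("chunk two")]


def fake_generate(inputs):
    return f"answer built from {len(inputs['context'])} context characters"


answer, pages = ask_once("What is multi-head attention?", fake_retrieve, fake_generate)
print(answer)
print(pages, "retrievals:", len(calls))
```

With the objects from this notebook, `retrieve` could be `retriever.invoke` and `generate` could be `(prompt | llm | StrOutputParser()).invoke`, so the answer and the listed pages are guaranteed to come from the same retrieved chunks.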