In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

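# Fail early with a clear message if the key is missing;
# assigning None to os.environ would raise a TypeError.
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment.")
os.environ["OPENAI_API_KEY"] = api_key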

In [2]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode
from typing_extensions import TypedDict

In [3]:
# Define state for LangGraph
class State(TypedDict):
    question: str
    answer: str

In [4]:
pdf_path = "data/attention-is-all-you-need-Paper.pdf"

loader = PyPDFLoader(pdf_path)
docs = loader.load()

print(f"Loaded {len(docs)} documents")
print("First document content preview:\n", docs[0].page_content[:300])


Loaded 11 documents
First document content preview:
 Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Jakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.com
Aidan N. Gomez∗†
University of Toronto
aid


In [5]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectordb = FAISS.from_documents(docs, embeddings)

# Save for reuse
vectordb.save_local("faiss_index")

print("Vector store created and saved.")


Vector store created and saved.
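
For intuition about what a vector store does at query time, here is a minimal pure-Python sketch of nearest-neighbour search. It is an illustration only: the 3-dimensional vectors, the `toy_index` list, and the `euclidean` and `top_k` helpers are all hypothetical, and Euclidean distance is an assumption made for the example. Real embeddings have far more dimensions, and FAISS uses optimized index structures rather than a Python loop.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, index, k=2):
    """Return the texts of the k entries whose vectors are closest to query_vec."""
    ranked = sorted(index, key=lambda item: euclidean(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]

# Hypothetical (text, embedding) pairs standing in for embedded PDF pages.
toy_index = [
    ("page about attention", [0.9, 0.1, 0.0]),
    ("page about training", [0.1, 0.8, 0.1]),
    ("references page", [0.0, 0.1, 0.9]),
]

# A query vector that points mostly along the first axis.
nearest = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(nearest)
```

The closest entry is the one whose vector points in nearly the same direction as the query, which is the behaviour `similarity_search` relies on.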


In [6]:
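# The saved index includes pickled metadata, so loading requires an explicit opt-in.
# Only load index files that you created yourself.
vectordb = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)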

# Quick retrieval check
results = vectordb.similarity_search("What is the document about?", k=2)
for i, res in enumerate(results, 1):
    print(f"Result {i}:\n", res.page_content[:200], "\n")


Result 1:
 Table 3: Variations on the Transformer architecture. Unlisted values are identical to those of the base
model. All metrics are on the English-to-German translation development set, newstest2013. Liste 

Result 2:
 References
[1] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint
arXiv:1607.06450, 2016.
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine 
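
Both results above are whole pages (a results table and the reference list), since each PDF page was embedded as a single document. One common mitigation, which this notebook does not apply, is to split pages into smaller overlapping chunks before embedding. The following is a minimal pure-Python sketch of that idea; the `split_text` helper and its default sizes are hypothetical, and a real pipeline would more likely use a library text splitter.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# Hypothetical 50-character sample standing in for one page of text.
sample = "abcdefghij" * 5
chunks = split_text(sample, chunk_size=20, overlap=5)
for i, chunk in enumerate(chunks, 1):
    print(i, chunk)
```

The overlap keeps text near a chunk boundary available in both neighbouring chunks, at the cost of embedding some characters twice.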



In [7]:
retriever = vectordb.as_retriever()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

# Quick test
query = "Summarize the document in two sentences."
# Use .invoke(); calling the chain object directly is deprecated.
response = qa_chain.invoke({"query": query})

print("Answer:\n", response["result"])
print("\nSources:\n", [doc.metadata for doc in response["source_documents"]])


Answer:
 The document presents the Transformer, a novel neural network architecture that relies solely on attention mechanisms, eliminating the need for recurrent or convolutional layers. It demonstrates superior performance in machine translation tasks, achieving state-of-the-art results on the WMT 2014 English-to-German and English-to-French translation benchmarks while being more efficient in training time.

Sources:
 [{'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan 

In [8]:
class State(TypedDict):
    question: str
    answer: str

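def rag_node(state: State) -> State:
    """Answer the question in the state with the retrieval QA chain."""
    # Use .invoke(); calling the chain object directly is deprecated.
    result = qa_chain.invoke({"query": state["question"]})
    return {"question": state["question"], "answer": result["result"]}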

graph = StateGraph(State)
graph.add_node("rag", rag_node)
graph.add_edge(START, "rag")
graph.add_edge("rag", END)

app = graph.compile()
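
As a conceptual analogy for the single-node graph above, the sketch below runs a list of node functions in order and merges each returned dictionary into the state. This is not LangGraph's implementation: `run_linear_graph` and `fake_rag_node` are hypothetical names, and the stub node returns a canned string instead of calling the QA chain.

```python
def run_linear_graph(nodes, state):
    """Run node functions in order, merging each node's returned dict into the state."""
    current = dict(state)  # copy so the caller's dict is not mutated
    for node in nodes:
        current.update(node(current))
    return current

def fake_rag_node(state):
    """Stub node: returns only the key it wants to update."""
    return {"answer": f"stub answer to: {state['question']}"}

initial = {"question": "What is attention?", "answer": ""}
final_state = run_linear_graph([fake_rag_node], initial)
print(final_state)
```

In this sketch a node only needs to return the keys it changes; the untouched `question` key carries through from the initial state.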


In [9]:
state = {"question": "What are the main points discussed in the document?", "answer": ""}
result = app.invoke(state)

print("Question:", result["question"])
print("Answer:", result["answer"])


Question: What are the main points discussed in the document?
Answer: The document discusses the Transformer model, a novel architecture for sequence transduction that relies entirely on attention mechanisms instead of recurrent or convolutional layers. Here are the main points:

1. **Model Architecture**: The Transformer consists of an encoder-decoder structure with stacked self-attention and fully connected layers. It allows for significant parallelization during training.

2. **Performance**: The Transformer achieves state-of-the-art BLEU scores on English-to-German and English-to-French translation tasks, outperforming previous models while requiring less training time and cost.

3. **Training Details**: The models were trained on the WMT 2014 datasets, with specific configurations for hyperparameters, dropout rates, and optimization techniques. The training utilized 8 NVIDIA P100 GPUs.

4. **Regularization Techniques**: The document describes the use of label smoothing and dropout