# Can you read my screen?

What are the parts of any RAG pipeline?

1. Load the documents (PDFs, text files, Markdown, HTML).
2. Ingest the data into the LLM, either directly in the prompt or via chunking and embeddings.
3. Retrieve the relevant context and answer questions.
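
Step 2 mentions chunking. As a rough illustration only, here is a minimal sketch of character-based chunking with overlap. The helper name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical and are not part of LangChain; real pipelines typically use a library text splitter instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Made-up sample text, only to exercise the helper
sample = "The Transformer relies entirely on attention mechanisms. " * 10
chunks = chunk_text(sample, chunk_size=120, overlap=20)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.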

In [1]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chat_models import init_chat_model
# Document loaders live in langchain_community in recent LangChain releases
from langchain_community.document_loaders import PyPDFLoader

In [8]:
prompt_template = """
You answer questions about papers. Here is a paper:
{paper_contents}.
Query asked: {query}
Your answer:

"""
prompt = ChatPromptTemplate.from_template(prompt_template)

In [3]:
model = init_chat_model(model="gpt-4o", temperature=0)

In [5]:
paper_path = "./assets-resources/attention-paper.pdf"
docs = PyPDFLoader(paper_path).load_and_split()

In [6]:
docs

[Document(metadata={'source': './assets-resources/attention-paper.pdf', 'page': 0}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network archite

In [9]:
from langchain_core.output_parsers import StrOutputParser
chain_rag_basic = prompt | model | StrOutputParser()

In [10]:
paper_contents = " ".join([doc.page_content for doc in docs])
paper_contents

'Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence and con

In [11]:
query = "What is this paper about?"
chain_rag_basic.invoke({"paper_contents": paper_contents, "query":query})

'The paper "Attention Is All You Need" introduces a novel neural network architecture called the Transformer, which is designed for sequence transduction tasks such as machine translation. Unlike traditional models that rely on recurrent or convolutional neural networks, the Transformer is based entirely on attention mechanisms, eliminating the need for recurrence and convolutions. This architecture allows for greater parallelization and faster training times.\n\nThe paper demonstrates the effectiveness of the Transformer on machine translation tasks, achieving state-of-the-art results on the WMT 2014 English-to-German and English-to-French translation tasks. The Transformer model outperforms previous models, including ensembles, in terms of translation quality while requiring significantly less training time.\n\nThe authors also explore the generalization capabilities of the Transformer by applying it to English constituency parsing, where it performs competitively with existing model

In [12]:
def ask_paper(paper_contents: str, query: str) -> str:
    """Answer a query using the full paper text as context."""
    return chain_rag_basic.invoke({"paper_contents": paper_contents, "query": query})

query = "How does self-attention work?"
ask_paper(paper_contents, query)

'Self-attention, also known as intra-attention, is a mechanism that allows a model to weigh the importance of different elements within a single sequence when computing a representation of that sequence. It is a key component of the Transformer architecture, which is used for tasks like machine translation and other sequence transduction problems.\n\nHere\'s how self-attention works:\n\n1. **Input Representation**: Each element in the input sequence is represented as a vector. For example, in a sentence, each word is represented as a vector.\n\n2. **Query, Key, and Value Vectors**: For each input element, three vectors are computed: a query vector, a key vector, and a value vector. These vectors are obtained by multiplying the input vector by three different learned weight matrices.\n\n3. **Attention Scores**: The attention score for a pair of elements is computed by taking the dot product of the query vector of one element with the key vector of another element. This score indicates h

In [13]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [14]:
embeddings.embed_query("Lucas is a silly bald instructor")

[-0.01610441878437996,
 -0.0035802386701107025,
 -0.0042423829436302185,
 -0.024755554273724556,
 -0.015319162048399448,
 -0.0013800222659483552,
 -0.02160121686756611,
 -0.014094694517552853,
 0.002453927416354418,
 -0.009755817241966724,
 0.024010224267840385,
 0.009163547307252884,
 0.011772197671234608,
 0.012457633391022682,
 0.019365230575203896,
 -0.019591491669416428,
 0.039262838661670685,
 -0.009136929176747799,
 0.03460453823208809,
 -0.02616635337471962,
 0.0010273221414536238,
 0.003952902741730213,
 -0.03119732066988945,
 -0.005077550187706947,
 0.008544659242033958,
 -0.005829533562064171,
 0.026672111824154854,
 -0.024941885843873024,
 0.0177281703799963,
 -0.014866641722619534,
 0.030159184709191322,
 0.0009973759297281504,
 0.016037872061133385,
 -0.012763750739395618,
 -0.023477846756577492,
 -0.03795851394534111,
 0.010561038739979267,
 -0.009156893007457256,
 -0.0005020153475925326,
 -0.0016162648098543286,
 0.01695622317492962,
 -0.013209616765379906,
 -0.01115996

In [15]:
sentence1 = "Lucas is teaching about LLMs"
sentence2 = "A Whale likes to dance"
sentence3 = "A lion likes to sing"

embedding1 = embeddings.embed_query(sentence1)
embedding2 = embeddings.embed_query(sentence2)
embedding3 = embeddings.embed_query(sentence3)

In [16]:
import numpy as np

def cosine_similarity(embedding1, embedding2):
    """
    Calculate cosine similarity between two embeddings.
    
    Args:
        embedding1: First embedding vector
        embedding2: Second embedding vector
        
    Returns:
        float: Cosine similarity score between -1 and 1
    """
    # Convert to numpy arrays if they aren't already
    vec1 = np.array(embedding1)
    vec2 = np.array(embedding2)
    
    # Calculate dot product
    dot_product = np.dot(vec1, vec2)
    
    # Calculate magnitudes
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    
    # Calculate cosine similarity
    similarity = dot_product / (norm1 * norm2)
    
    return similarity

# Example usage with the embeddings from previous cell
similarity_1_2 = cosine_similarity(embedding1, embedding2)
similarity_1_3 = cosine_similarity(embedding1, embedding3)
similarity_2_3 = cosine_similarity(embedding2, embedding3)

print(f"Similarity between sentence 1 and 2: {similarity_1_2:.4f}")
print(f"Similarity between sentence 1 and 3: {similarity_1_3:.4f}")
print(f"Similarity between sentence 2 and 3: {similarity_2_3:.4f}")


Similarity between sentence 1 and 2: 0.7208
Similarity between sentence 1 and 3: 0.7533
Similarity between sentence 2 and 3: 0.8584
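
In the output above, sentences 2 and 3 (both about an animal that likes to do something) score highest, while even the unrelated pairs score above 0.7. So the ranking is more informative than the absolute value. The retrieval step of a RAG pipeline uses that ranking: embed the query, score every stored chunk, keep the best ones. The sketch below shows the idea with the standard library only. The three-dimensional vectors are invented toy values, not real embeddings, and `rank_by_similarity` is a hypothetical helper name.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Return (text, score) pairs sorted from most to least similar."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, made up for illustration; real embeddings have far more dimensions
toy_vectors = {
    "A whale likes to dance": [0.9, 0.1, 0.3],
    "A lion likes to sing": [0.8, 0.2, 0.4],
    "Lucas is teaching about LLMs": [0.1, 0.9, 0.2],
}
query_vec = [0.85, 0.15, 0.35]  # stands in for an embedded animal-related query

ranking = rank_by_similarity(query_vec, toy_vectors)
for text, score in ranking:
    print(f"{score:.4f}  {text}")
```

With real embeddings, `query_vec` would come from `embeddings.embed_query(...)` and the candidates would be the embedded chunks of the paper.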
