In [26]:
# !pip install langchain
# !pip install langchain-openai
# !pip install pypdf
# !pip install chromadb
# !pip install langchainhub

In [27]:
import os

# Set the OpenAI API key directly:
# os.environ["OPENAI_API_KEY"] = "your openai key"

# OR load it from a .env file (requires the python-dotenv package):
# from dotenv import load_dotenv
# load_dotenv("./.env")

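Let's set up a study workflow using Jupyter notebooks, LLMs, and LangChain: we load a PDF, index it for retrieval, ask questions about it, and then quiz ourselves on the content.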

In [28]:
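# Loaders and vector stores live in langchain_community in recent LangChain releases;
# the older `langchain.document_loaders` / `langchain.vectorstores` paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI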

In [29]:
pdf_path = "./assets-resources/attention-paper.pdf"

loader = PyPDFLoader(pdf_path) # LOAD
pdf_docs = loader.load_and_split() # SPLIT
embeddings = OpenAIEmbeddings() # EMBED
vectordb = Chroma.from_documents(pdf_docs, embedding=embeddings) # STORE
retriever = vectordb.as_retriever()
llm = ChatOpenAI(model="gpt-4o")
pdf_qa = RetrievalQA.from_llm(llm=llm, retriever=retriever) # RETRIEVE
pdf_qa

RetrievalQA(combine_documents_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=ChatPromptTemplate(input_variables=['context', 'question'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], template="Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n{context}")), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='{question}'))]), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x3306e3190>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x334e15690>, model_name='gpt-4o', openai_api_key=SecretStr('**********'), openai_api_base='https://api.openai.com/v1', openai_proxy='')), document_prompt=PromptTemplate(input_variables=['page_content'], template='Context:\n{page_content}'), document_variable_name='context'), retriever=Vect
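
To make the EMBED, STORE, and RETRIEVE steps above more concrete, here is a minimal standard-library sketch. It is not how `OpenAIEmbeddings` or `Chroma` work internally: a word-count vector stands in for a learned embedding, a plain list stands in for the vector store, and the names `embed`, `cosine`, `retrieve`, and the sample chunks are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (assumption, not a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(store, query, k=1):
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(item[1], query_vec), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical chunks standing in for the pages returned by load_and_split().
chunks = [
    "The encoder is a stack of identical layers with self-attention.",
    "Positional encodings inject information about token order.",
    "The model was trained on the WMT 2014 English-German dataset.",
]

# STORE: keep each chunk next to its vector.
store = [(chunk, embed(chunk)) for chunk in chunks]

# RETRIEVE: the closest chunk is what would be handed to the LLM as context.
top = retrieve(store, "How do positional encodings represent token order?", k=1)
print(top[0])
```

In the real chain, the retrieved chunks are inserted into the `{context}` slot of the prompt shown in the output above before the question is sent to the model.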

In [30]:
query = "What are the key components of the transformer architecture?"
# RetrievalQA reads only the "query" key, so no chat history needs to be passed.
result = pdf_qa.invoke({"query": query})

In [31]:
print(result["result"])

The key components of the transformer architecture are:

1. **Encoder and Decoder Stacks**:
   - **Encoder**: The encoder is composed of a stack of 6 identical layers, each with two sub-layers:
     - Multi-head self-attention mechanism
     - Position-wise fully connected feed-forward network
     - Residual connections around each of the sub-layers followed by layer normalization
   - **Decoder**: The decoder also has a stack of 6 identical layers, with three sub-layers:
     - Multi-head self-attention mechanism
     - Position-wise fully connected feed-forward network
     - Multi-head attention over the encoder stack output
     - Similar residual connections and layer normalization as in the encoder
     - Modified self-attention to prevent positions from attending to subsequent positions

2. **Attention Mechanisms**:
   - **Self-Attention**: Relates different positions of a single sequence to compute a representation of the sequence.
   - **Multi-Head Attention**: Improves the m

In [32]:
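def ask_pdf(pdf_qa, query):
    """Run a query through the RetrievalQA chain, print it, and return the answer text."""
    print("QUERY: ", query)
    # RetrievalQA reads only the "query" key, so no chat history needs to be passed.
    result = pdf_qa.invoke({"query": query})
    answer = result["result"]
    print("ANSWER", answer)
    return answer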


ask_pdf(pdf_qa,"How does the self-attention mechanism in transformers differ from traditional sequence alignment methods?")

QUERY:  How does the self-attention mechanism in transformers differ from traditional sequence alignment methods?
ANSWER The self-attention mechanism in transformers differs from traditional sequence alignment methods, such as those used in recurrent neural networks (RNNs) and convolutional neural networks (CNNs), in several key ways:

1. **Parallelization**:
   - **Traditional Sequence Alignment (RNNs)**: These models process sequences sequentially, where the computation at each position \( t \) depends on the previous position \( t-1 \). This inherently sequential nature limits parallelization during training and inference.
   - **Self-Attention (Transformers)**: Self-attention mechanisms allow for parallel computation of representations at all positions in the input sequence. This enables significant parallelization, making the training process much faster.

2. **Dependency Modeling**:
   - **Traditional Sequence Alignment (RNNs)**: Dependencies between distant positions in the sequ

'The self-attention mechanism in transformers differs from traditional sequence alignment methods, such as those used in recurrent neural networks (RNNs) and convolutional neural networks (CNNs), in several key ways:\n\n1. **Parallelization**:\n   - **Traditional Sequence Alignment (RNNs)**: These models process sequences sequentially, where the computation at each position \\( t \\) depends on the previous position \\( t-1 \\). This inherently sequential nature limits parallelization during training and inference.\n   - **Self-Attention (Transformers)**: Self-attention mechanisms allow for parallel computation of representations at all positions in the input sequence. This enables significant parallelization, making the training process much faster.\n\n2. **Dependency Modeling**:\n   - **Traditional Sequence Alignment (RNNs)**: Dependencies between distant positions in the sequence are typically harder to learn because each position depends on the prior positions linearly through time

In [33]:
quiz_questions = ask_pdf(pdf_qa, "Quiz me with 3 simple questions on the positional encodings and the role they play in transformers.")

quiz_questions

QUERY:  Quiz me with 3 simple questions on the positional encodings and the role they play in transformers.
ANSWER Sure! Here are three simple questions about positional encodings and their role in transformers:

1. **What is the primary purpose of positional encodings in the Transformer model?**

2. **What type of positional encoding does the base Transformer model use?**

3. **What was observed when sinusoidal positional encoding was replaced with learned positional embeddings in the transformer model, according to the provided context?**

Feel free to answer these questions, and I can provide feedback if you'd like!


"Sure! Here are three simple questions about positional encodings and their role in transformers:\n\n1. **What is the primary purpose of positional encodings in the Transformer model?**\n\n2. **What type of positional encoding does the base Transformer model use?**\n\n3. **What was observed when sinusoidal positional encoding was replaced with learned positional embeddings in the transformer model, according to the provided context?**\n\nFeel free to answer these questions, and I can provide feedback if you'd like!"

In [34]:
llm = ChatOpenAI(model="gpt-4o", temperature=0.0)

In [35]:
from langchain_core.prompts.chat import SystemMessagePromptTemplate, HumanMessagePromptTemplate

# Plain string: there are no placeholders here, so no f-string prefix is needed.
template = "You take in text and spit out Python code doing what the user wants"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_message_prompt = HumanMessagePromptTemplate.from_template("Return ONLY a PYTHON list containing the questions in this text: {questions}")

In [36]:
from langchain_core.prompts import ChatPromptTemplate

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt,human_message_prompt])

In [37]:
quiz_chain = chat_prompt | llm

In [38]:
quiz_chain.invoke({"questions": quiz_questions})

AIMessage(content='```python\nquestions = [\n    "What is the primary purpose of positional encodings in the Transformer model?",\n    "What type of positional encoding does the base Transformer model use?",\n    "What was observed when sinusoidal positional encoding was replaced with learned positional embeddings in the transformer model, according to the provided context?"\n]\n```', response_metadata={'token_usage': {'completion_tokens': 65, 'prompt_tokens': 134, 'total_tokens': 199}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_d576307f90', 'finish_reason': 'stop', 'logprobs': None}, id='run-7bcef4d0-40bb-4f23-9b6f-378fb93f5785-0', usage_metadata={'input_tokens': 134, 'output_tokens': 65, 'total_tokens': 199})

In [39]:
import re

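def extract_python_code(markdown_string):
    """Return the contents of the first python-tagged fenced code block, or None."""
    pattern = r'```python\n(.*?)\n```'
    # re.DOTALL lets `.` match newlines, so multi-line code blocks are captured.
    matches = re.findall(pattern, markdown_string, re.DOTALL)
    return matches[0] if matches else None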
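
The pattern above finds nothing when the model omits the `python` tag or the fence altogether. A slightly more tolerant variant might look like the sketch below. The function name `extract_code_block` and the fallback behaviour are assumptions for illustration, not part of the original workflow.

```python
import re

# Built programmatically so this example contains no literal fence markers.
FENCE = "`" * 3
# Optional language tag after the opening fence, then a lazy match up to the closing fence.
BLOCK_PATTERN = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)\n" + FENCE, re.DOTALL)


def extract_code_block(markdown_string):
    """Return the first fenced code block, or the stripped text if no fence is found."""
    match = BLOCK_PATTERN.search(markdown_string)
    if match:
        return match.group(1)
    # Fallback (assumption): treat the whole reply as code when no fence is present.
    return markdown_string.strip()


tagged = FENCE + "python\nquestions = ['a?', 'b?']\n" + FENCE
untagged = "Here you go:\n" + FENCE + "\nquestions = ['a?']\n" + FENCE
bare = "questions = []\n"

print(extract_code_block(tagged))
print(extract_code_block(untagged))
print(extract_code_block(bare))
```

Whether falling back to the raw reply is desirable depends on how much you trust the model to follow the "Return ONLY" instruction; returning `None` and retrying is an equally reasonable choice.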

In [40]:
from langchain_core.runnables import RunnableLambda


quiz_chain = chat_prompt | llm | RunnableLambda(lambda x: x.content) | extract_python_code

Note: we haven't discussed runnables at length, but they are the core building blocks of the LCEL (LangChain Expression Language) interface.

`RunnableLambda` wraps a function so that it can sit inside a chain: it receives the output of the previous step, applies the transformation defined in the function, and passes the result along to the next step.
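
As a rough mental model of what the `|` operator does in the chain above, the sketch below composes plain functions left to right using only the standard library. The `Step` class is hypothetical and far simpler than LangChain's actual `Runnable` interface; it only illustrates that each stage receives the previous stage's output.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Accept either another Step or a bare callable on the right-hand side.
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Each stage transforms the output of the stage before it.
fill_prompt = Step(lambda d: f"List the questions in: {d['questions']}")
fake_llm = Step(lambda prompt: {"content": prompt.upper()})  # stands in for the model call
get_content = Step(lambda message: message["content"])

# The last stage is a bare function, wrapped automatically by __or__.
chain = fill_prompt | fake_llm | get_content | str.strip
result = chain.invoke({"questions": "q1? q2?"})
print(result)
```

This mirrors the shape of `chat_prompt | llm | RunnableLambda(lambda x: x.content) | extract_python_code`: the lambda pulls the text out of the message object so the final function receives a plain string.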

In [41]:
questions_list = quiz_chain.invoke({"questions": quiz_questions})

In [42]:
questions_list

'questions = [\n    "What is the primary purpose of positional encodings in the Transformer model?",\n    "What type of positional encoding does the base Transformer model use?",\n    "What was observed when sinusoidal positional encoding was replaced with learned positional embeddings in the transformer model, according to the provided context?"\n]'

In [43]:
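# Caution: exec() runs model-generated code. Inspect `questions_list` (printed above) before running this.
exec(questions_list)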
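
`exec` runs whatever code the model returned, which is risky if the output is ever something other than a plain list assignment. One possible alternative, sketched below with the standard-library `ast` module, parses the string and accepts only a single assignment whose right-hand side is a literal list. The helper name `parse_list_assignment` and the sample string are hypothetical.

```python
import ast


def parse_list_assignment(code_string):
    """Parse a `name = [...]` statement without executing it and return the list."""
    tree = ast.parse(code_string)
    # Accept exactly one statement, and only if it is a simple assignment.
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        raise ValueError("expected a single assignment statement")
    # literal_eval evaluates only literals (strings, numbers, lists, ...), never calls.
    value = ast.literal_eval(tree.body[0].value)
    if not isinstance(value, list):
        raise ValueError("expected the assigned value to be a list")
    return value


sample = 'questions = [\n    "What is attention?",\n    "Why use positional encodings?"\n]'
parsed = parse_list_assignment(sample)
print(parsed)
```

With this approach the result is assigned explicitly (for example `questions = parse_list_assignment(questions_list)`) instead of relying on a variable name chosen by the model.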

In [44]:
questions

['What is the primary purpose of positional encodings in the Transformer model?',
 'What type of positional encoding does the base Transformer model use?',
 'What was observed when sinusoidal positional encoding was replaced with learned positional embeddings in the transformer model, according to the provided context?']

In [45]:
# `questions` was defined by the exec() call above, which ran the assignment stored as a string in `questions_list`.
answers = []
for q in questions:
    answers.append(ask_pdf(pdf_qa,q))

QUERY:  What is the primary purpose of positional encodings in the Transformer model?
ANSWER The primary purpose of positional encodings in the Transformer model is to provide information about the position of each token in the input sequence. Since the Transformer model eschews recurrence and convolution mechanisms, which naturally handle sequential data by design, it lacks inherent positional information. Positional encodings are added to the input embeddings to ensure that the model can take the order of the sequence into account, allowing it to learn and utilize the relative positions of the tokens effectively.
QUERY:  What type of positional encoding does the base Transformer model use?
ANSWER The base Transformer model uses sinusoidal positional encoding.
QUERY:  What was observed when sinusoidal positional encoding was replaced with learned positional embeddings in the transformer model, according to the provided context?
ANSWER When sinusoidal positional encoding was replaced w

In [46]:
evaluations = []

for q, a in zip(questions, answers):
    # Ask the same QA chain to grade each answer against the paper.
    evaluations.append(ask_pdf(pdf_qa, f"Is this: {a} the correct answer to this question: {q} according to the paper? Return ONLY '''YES''' or '''NO'''. Output:"))

evaluations

QUERY:  Is this: The primary purpose of positional encodings in the Transformer model is to provide information about the position of each token in the input sequence. Since the Transformer model eschews recurrence and convolution mechanisms, which naturally handle sequential data by design, it lacks inherent positional information. Positional encodings are added to the input embeddings to ensure that the model can take the order of the sequence into account, allowing it to learn and utilize the relative positions of the tokens effectively. the correct answer to this question: What is the primary purpose of positional encodings in the Transformer model? according to the paper? Return ONLY '''YES''' or '''NO'''. Output:
ANSWER YES
QUERY:  Is this: The base Transformer model uses sinusoidal positional encoding. the correct answer to this question: What type of positional encoding does the base Transformer model use? according to the paper? Return ONLY '''YES''' or '''NO'''. Output:
ANSWE

['YES', 'YES', 'YES']

In [47]:
scores = []

yes_count = evaluations.count('YES')
scores.append(yes_count)
print(scores)

[3]
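
`evaluations.count('YES')` counts exact matches only, so a reply such as `'Yes.'` or `'YES\n'` would be missed. A more forgiving tally might normalize each verdict first. This is a sketch: the helper `is_yes` and the sample replies are made up for illustration and are not outputs from the run above.

```python
import re


def is_yes(verdict):
    """True when the reply, ignoring case, quotes, whitespace, and punctuation, is 'yes'."""
    cleaned = re.sub(r"[^a-z]", "", verdict.lower())
    return cleaned == "yes"


# Hypothetical grader replies, including some that an exact match would miss.
sample_evaluations = ["YES", "Yes.", "'''NO'''", " yes\n", "Not sure"]

yes_count = sum(is_yes(v) for v in sample_evaluations)
score = yes_count / len(sample_evaluations)
print(yes_count, score)
```

Reporting a fraction rather than a raw count also makes scores comparable across quizzes with different numbers of questions.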
