In [None]:
# !pip install langchain
# !pip install langchain-openai
# !pip install pypdf
# !pip install chromadb
# !pip install langchainhub

In [None]:
import os

# Set the OpenAI API key

# os.environ["OPENAI_API_KEY"] = "your openai key"

# OR (load from .env file)

# from dotenv import load_dotenv
# make sure you have python-dotenv installed
# load_dotenv("./.env")

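Let's set up a study workflow using Jupyter notebooks, LLMs, and LangChain: load a paper, ask questions about it, generate a quiz from it, and check the answers against the paper.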

In [2]:
import os
from langchain.document_loaders import PyPDFLoader
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

In [3]:
pdf_path = "./assets-resources/attention-paper.pdf"

loader = PyPDFLoader(pdf_path) # LOAD
pdf_docs = loader.load_and_split() # SPLIT
embeddings = OpenAIEmbeddings() # EMBED
vectordb = Chroma.from_documents(pdf_docs, embedding=embeddings) # STORE
retriever = vectordb.as_retriever()
llm = ChatOpenAI(model="gpt-4-1106-preview")
pdf_qa = RetrievalQA.from_llm(llm=llm, retriever=retriever) # RETRIEVE
pdf_qa

RetrievalQA(combine_documents_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=ChatPromptTemplate(input_variables=['context', 'question'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], template="Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n{context}")), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='{question}'))]), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x15343d890>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x1536dbe50>, model_name='gpt-4-1106-preview', openai_api_key=SecretStr('**********'), openai_api_base='https://api.openai.com/v1', openai_proxy='')), document_prompt=PromptTemplate(input_variables=['page_content'], template='Context:\n{page_content}'), document_variable_name='context'), re
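To build intuition for the EMBED, STORE, and RETRIEVE steps, here is a minimal standard-library sketch. It is not what `OpenAIEmbeddings` or `Chroma` do internally: the "embedding" is just a bag-of-words count vector, and the names `embed`, `cosine`, `retrieve`, and the toy chunks are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (not a neural embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Invented stand-ins for the chunks a PDF loader would produce.
toy_chunks = [
    "The encoder is a stack of identical layers with self-attention.",
    "Positional encodings inject information about token order.",
    "The model was trained on a translation dataset.",
]
top = retrieve("How is token order represented?", toy_chunks, k=1)
print(top)
```

In the real chain, the retrieved chunks are inserted into the `{context}` slot of the prompt shown in the output above before the question is sent to the model.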

In [4]:
query = "What are the key components of the transformer architecture?"
# RetrievalQA only reads the "query" key, so no chat history is needed here.
result = pdf_qa.invoke({"query": query})

In [5]:
print(result["result"])

The Transformer architecture, as described in the provided context, consists of several key components:

1. **Encoder and Decoder Stacks**: Both the encoder and decoder are composed of a stack of six identical layers. 

   - **Encoder**: Each encoder layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a position-wise fully connected feed-forward network. Each of these sub-layers is followed by a residual connection and layer normalization. 

   - **Decoder**: Each decoder layer also has two sub-layers similar to the encoder, plus a third sub-layer that performs multi-head attention over the encoder's output. The self-attention mechanism in the decoder is modified to prevent positions from attending to subsequent positions (also known as masked self-attention).

2. **Attention**: The attention function in the Transformer is a mapping of a query and a set of key-value pairs to an output, with all components being vectors. The output is computed

In [6]:
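def ask_pdf(pdf_qa, query):
    """Send a query to the RetrievalQA chain, print the exchange, and return the answer text."""
    print("QUERY: ", query)
    # RetrievalQA only reads the "query" key; the answer comes back under "result".
    result = pdf_qa.invoke({"query": query})
    answer = result["result"]
    print("ANSWER", answer)
    return answer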


ask_pdf(pdf_qa,"How does the self-attention mechanism in transformers differ from traditional sequence alignment methods?")

QUERY:  How does the self-attention mechanism in transformers differ from traditional sequence alignment methods?
ANSWER The self-attention mechanism in transformers differs from traditional sequence alignment methods in the following ways:

1. Global Dependency Modeling: Self-attention allows the model to directly compute dependencies between any two positions in the sequence, regardless of their distance. Traditional sequence alignment methods like those in RNNs and CNNs process the sequence step-by-step or in local receptive fields, which can make it harder to capture long-range dependencies.

2. Parallelization: Self-attention mechanisms enable parallel computation across all positions in a sequence because they do not require sequential processing. This is in contrast to RNNs, which process elements sequentially and therefore cannot be parallelized across the steps of a sequence.

3. Fixed Number of Operations: The Transformer reduces the number of operations required to relate tw

'The self-attention mechanism in transformers differs from traditional sequence alignment methods in the following ways:\n\n1. Global Dependency Modeling: Self-attention allows the model to directly compute dependencies between any two positions in the sequence, regardless of their distance. Traditional sequence alignment methods like those in RNNs and CNNs process the sequence step-by-step or in local receptive fields, which can make it harder to capture long-range dependencies.\n\n2. Parallelization: Self-attention mechanisms enable parallel computation across all positions in a sequence because they do not require sequential processing. This is in contrast to RNNs, which process elements sequentially and therefore cannot be parallelized across the steps of a sequence.\n\n3. Fixed Number of Operations: The Transformer reduces the number of operations required to relate two arbitrary positions in a sequence to a constant, while in RNNs and CNNs, this number grows with the distance bet

In [7]:
quiz_questions = ask_pdf(pdf_qa, "Quiz me on the positional encodings and the role they play in transformers.")

quiz_questions

QUERY:  Quiz me on the positional encodings and the role they play in transformers.
ANSWER Sure, let's start with a quiz on positional encodings in the context of the Transformer model:

1. What is the purpose of positional encodings in the Transformer architecture?
   a) To provide a unique identifier for each word in the vocabulary.
   b) To allow the model to take into account the order of the words in the sequence.
   c) To replace the self-attention mechanism in the model.
   d) To increase the computational efficiency of the model.

2. How are positional encodings combined with the input embeddings in the Transformer model?
   a) By concatenating the positional encoding with the input embedding vector.
   b) By adding the positional encoding to the input embedding vector.
   c) By multiplying the positional encoding with the input embedding vector.
   d) By using the positional encoding as an input to the self-attention mechanism.

3. Which of the following statements about posit

"Sure, let's start with a quiz on positional encodings in the context of the Transformer model:\n\n1. What is the purpose of positional encodings in the Transformer architecture?\n   a) To provide a unique identifier for each word in the vocabulary.\n   b) To allow the model to take into account the order of the words in the sequence.\n   c) To replace the self-attention mechanism in the model.\n   d) To increase the computational efficiency of the model.\n\n2. How are positional encodings combined with the input embeddings in the Transformer model?\n   a) By concatenating the positional encoding with the input embedding vector.\n   b) By adding the positional encoding to the input embedding vector.\n   c) By multiplying the positional encoding with the input embedding vector.\n   d) By using the positional encoding as an input to the self-attention mechanism.\n\n3. Which of the following statements about positional encodings is true for the original Transformer model proposed by Vaswa

In [8]:
llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0.0)

In [9]:
from langchain_core.prompts.chat import SystemMessagePromptTemplate, HumanMessagePromptTemplate

# Plain string, not an f-string: the prompt template handles any {placeholders} itself.
template = "You take in text and spit out Python code doing what the user wants"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_message_prompt = HumanMessagePromptTemplate.from_template("Return ONLY a PYTHON list containing the questions in this text: {questions}")

In [10]:
from langchain_core.prompts import ChatPromptTemplate

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt,human_message_prompt])

In [11]:
quiz_chain = chat_prompt | llm

In [12]:
quiz_chain.invoke({"questions": quiz_questions})

AIMessage(content='```python\nquestions = [\n    "What is the purpose of positional encodings in the Transformer architecture?",\n    "How are positional encodings combined with the input embeddings in the Transformer model?",\n    "Which of the following statements about positional encodings is true for the original Transformer model proposed by Vaswani et al.?",\n    "How do positional encodings enable the model to determine the position of each word in the sequence?",\n    "In the context of the provided paper, what variation related to positional encodings was experimented with by the authors?"\n]\n```', response_metadata={'token_usage': {'completion_tokens': 112, 'prompt_tokens': 504, 'total_tokens': 616}, 'model_name': 'gpt-4-1106-preview', 'system_fingerprint': 'fp_94f711dcf6', 'finish_reason': 'stop', 'logprobs': None}, id='run-f13d7eaa-1516-427c-b154-96468e3f0d6c-0')

In [13]:
import re

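def extract_python_code(markdown_string):
    """Return the body of the first Python fenced code block, or None if there is none."""
    # Non-greedy match with DOTALL so the captured body can span multiple lines.
    pattern = r'```python\n(.*?)\n```'
    matches = re.findall(pattern, markdown_string, re.DOTALL)
    return matches[0] if matches else None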
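`extract_python_code` returns `None` whenever the model leaves out the fence or the `python` tag. A slightly more tolerant variant might look like the sketch below. The name `extract_code_block` and the fallback behaviour (return the stripped text as-is) are assumptions made for illustration, not part of the original notebook.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this example easy to paste

# Optional "python"/"py" tag after the opening fence, then a lazily matched body.
_PATTERN = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)\n?" + FENCE, re.DOTALL)


def extract_code_block(text):
    """Return the body of the first fenced block, or the stripped text if no fence is found."""
    match = _PATTERN.search(text)
    return match.group(1) if match else text.strip()


tagged = FENCE + "python\nquestions = ['a']\n" + FENCE
untagged = FENCE + "\nx = 1\n" + FENCE
print(extract_code_block(tagged))
print(extract_code_block(untagged))
print(extract_code_block("  y = 2  "))
```

Whether falling back to the raw text is the right choice depends on how much you trust the model to follow the "Return ONLY" instruction. Returning `None` and failing loudly is the stricter option.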

In [14]:
from langchain_core.runnables import RunnableLambda


quiz_chain = chat_prompt | llm | RunnableLambda(lambda x: x.content) | extract_python_code

Note: we haven't discussed runnables at length yet, but they are the core building blocks of the LCEL (LangChain Expression Language) interface.

`RunnableLambda` wraps an ordinary function so it can sit inside a chain: it receives the output of the previous step, applies the transformation defined in the function (here, a lambda that pulls out `.content`), and passes the result on to the next step.
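To see what the `|` composition is doing conceptually, here is a plain-Python analogy. It is not LangChain's implementation: `Step` and `FakeMessage` are hypothetical stand-ins that only mimic the pipe operator and the `.content` attribute of a model response.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Accept either another Step or a plain callable on the right-hand side.
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda x: nxt.func(self.func(x)))

    def invoke(self, value):
        return self.func(value)


class FakeMessage:
    """Mimics only the `.content` attribute of a chat model response."""

    def __init__(self, content):
        self.content = content


# A pretend "model" that upper-cases its input and wraps it in a message object.
fake_llm = Step(lambda prompt: FakeMessage(prompt.upper()))

# model -> pull out .content -> post-process with a plain function
toy_chain = fake_llm | Step(lambda msg: msg.content) | str.strip
toy_result = toy_chain.invoke("  hello  ")
print(toy_result)
```

The last link is a bare function (`str.strip`), mirroring how `extract_python_code` is piped in directly in the cell above without an explicit wrapper.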

In [15]:
questions_list = quiz_chain.invoke({"questions": quiz_questions})

In [16]:
questions_list

'questions = [\n    "What is the purpose of positional encodings in the Transformer architecture?",\n    "How are positional encodings combined with the input embeddings in the Transformer model?",\n    "Which of the following statements about positional encodings is true for the original Transformer model proposed by Vaswani et al.?",\n    "How do positional encodings enable the model to determine the position of each word in the sequence?",\n    "In the context of the provided paper, what variation related to positional encodings was experimented with by the authors?"\n]'

In [17]:
exec(questions_list)

In [18]:
questions

['What is the purpose of positional encodings in the Transformer architecture?',
 'How are positional encodings combined with the input embeddings in the Transformer model?',
 'Which of the following statements about positional encodings is true for the original Transformer model proposed by Vaswani et al.?',
 'How do positional encodings enable the model to determine the position of each word in the sequence?',
 'In the context of the provided paper, what variation related to positional encodings was experimented with by the authors?']
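`exec` runs whatever code the model returned, which is convenient but executes arbitrary text. If the output is expected to be a single list assignment, as it is here, one alternative is to parse it with the standard `ast` module instead of executing it. The helper name `parse_question_list` and the sample string below are illustrative assumptions.

```python
import ast


def parse_question_list(code):
    """Parse source text of the form `name = [...]` into a list without executing it."""
    tree = ast.parse(code)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            # literal_eval only accepts literals, so function calls etc. raise ValueError.
            value = ast.literal_eval(node.value)
            if isinstance(value, list):
                return value
    raise ValueError("no list assignment found")


# Made-up sample in the same shape as the model output above.
sample = 'questions = [\n    "What is A?",\n    "How does B work?"\n]'
parsed = parse_question_list(sample)
print(parsed)
```

With this approach the result is returned explicitly, rather than appearing as a `questions` variable created as a side effect of `exec`.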

In [19]:
# `questions` was defined by the exec() call above, which ran the code string stored in `questions_list`.
answers = []
for q in questions:
    answers.append(ask_pdf(pdf_qa, q))

QUERY:  What is the purpose of positional encodings in the Transformer architecture?
ANSWER Positional encodings in the Transformer architecture serve the purpose of providing information about the order of the tokens in the sequence. Since the Transformer model does not use recurrence or convolution, it does not inherently account for the sequential nature of the input data. Without positional encodings, the model would treat the input as a set of tokens without any sense of word order, which is critical for many tasks such as language understanding and translation.

The positional encodings are added to the input embeddings at the bottom of the encoder and decoder stacks. This way, the model can learn and leverage the sequence order - which word came first, which came second, and so forth. The original Transformer paper proposes the use of sinusoidal functions to generate these positional encodings, providing a unique encoding for each position that can be easily extended to sequence

In [20]:
evaluations = []

for q,a in zip(questions, answers):
    # Check for results
    evaluations.append(ask_pdf(pdf_qa,f"Is this: {a} the correct answer to this question: {q} according to the paper? Return ONLY '''YES''' or '''NO'''. Output:"))

evaluations

QUERY:  Is this: Positional encodings in the Transformer architecture serve the purpose of providing information about the order of the tokens in the sequence. Since the Transformer model does not use recurrence or convolution, it does not inherently account for the sequential nature of the input data. Without positional encodings, the model would treat the input as a set of tokens without any sense of word order, which is critical for many tasks such as language understanding and translation.

The positional encodings are added to the input embeddings at the bottom of the encoder and decoder stacks. This way, the model can learn and leverage the sequence order - which word came first, which came second, and so forth. The original Transformer paper proposes the use of sinusoidal functions to generate these positional encodings, providing a unique encoding for each position that can be easily extended to sequence lengths unseen during training. However, learned positional embeddings are

['YES', 'YES', 'YES', 'YES', 'YES']

In [21]:
scores = []

yes_count = evaluations.count('YES')
scores.append(yes_count)
print(scores)

[5]
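Counting exact `'YES'` strings works when the model follows the output format to the letter. A hedged sketch of a more forgiving scorer is below: it normalises quoting, case, and trailing punctuation, and reports how many verdicts were usable. The function names and the toy replies are assumptions for illustration, not output from the run above.

```python
def normalize_verdict(raw):
    """Map a free-form model reply to 'YES', 'NO', or None when it is neither."""
    cleaned = raw.strip().strip("'\"`.").upper()
    return cleaned if cleaned in {"YES", "NO"} else None


def score(evaluations):
    """Return (number of YES verdicts, number of usable verdicts)."""
    verdicts = [normalize_verdict(e) for e in evaluations]
    usable = [v for v in verdicts if v is not None]
    return usable.count("YES"), len(usable)


# Invented replies showing the kinds of variation a model might produce.
toy_evaluations = ["YES", "'''YES'''", "no.", "Yes", "It depends"]
yes_count, total = score(toy_evaluations)
print(f"{yes_count}/{total} judged correct")
```

Keep in mind that the same chain both answers and grades here, so a high score mainly shows that the model is consistent with itself. It is not an independent check of correctness.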
