In [None]:
# !pip install langchain
# !pip install langchain-community
# !pip install langchain-openai
# !pip install pypdf
# !pip install chromadb
# !pip install langchainhub

In [None]:
import os

# # Set OPENAI API Key

# os.environ["OPENAI_API_KEY"] = "your openai key"

# OR (load from .env file)

# from dotenv import load_dotenv
# make sure you have python-dotenv installed
# load_dotenv("./.env")

Let's set up a study workflow using Jupyter Notebooks, LLMs, and LangChain.

In [2]:
import os
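# PyPDFLoader and Chroma live in the langchain-community package in current LangChain releases
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma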
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

In [3]:
pdf_path = "./assets-resources/attention-paper.pdf"

loader = PyPDFLoader(pdf_path) # LOAD
pdf_docs = loader.load_and_split() # SPLIT
embeddings = OpenAIEmbeddings() # EMBED
vectordb = Chroma.from_documents(pdf_docs, embedding=embeddings) # STORE
retriever = vectordb.as_retriever()
llm = ChatOpenAI(model="gpt-4-1106-preview")
pdf_qa = RetrievalQA.from_llm(llm=llm, retriever=retriever) # RETRIEVE
pdf_qa

RetrievalQA(combine_documents_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=ChatPromptTemplate(input_variables=['context', 'question'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], template="Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n{context}")), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='{question}'))]), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x15343d890>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x1536dbe50>, model_name='gpt-4-1106-preview', openai_api_key=SecretStr('**********'), openai_api_base='https://api.openai.com/v1', openai_proxy='')), document_prompt=PromptTemplate(input_variables=['page_content'], template='Context:\n{page_content}'), document_variable_name='context'), re
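
To make the EMBED, STORE, and RETRIEVE comments concrete, here is a toy sketch that uses only the standard library. Word counts stand in for OpenAI embeddings and a plain list stands in for Chroma. The `embed`, `cosine`, and `retrieve` helpers and the sample chunks are illustrative assumptions, not LangChain APIs.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real embeddings are dense floats)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, store, k=1):
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Made-up sample chunks standing in for the split PDF pages
chunks = [
    "The encoder is composed of a stack of identical layers.",
    "Positional encodings inject information about token order.",
    "We trained on the WMT 2014 English-German dataset.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]  # STORE: keep each chunk with its vector
top = retrieve("How do positional encodings work?", store)  # RETRIEVE
print(top)
```

In the real chain, the retrieved chunks are then placed into the `{context}` slot of the prompt shown in the output above.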

In [4]:
query = "What are the key components of the transformer architecture?"
# RetrievalQA only reads the "query" key, so no chat history is needed
result = pdf_qa.invoke({"query": query})

In [5]:
print(result["result"])

The Transformer architecture, as described in the provided context, consists of several key components:

1. **Encoder and Decoder Stacks**: Both the encoder and decoder are composed of a stack of six identical layers. 

   - **Encoder**: Each encoder layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a position-wise fully connected feed-forward network. Each of these sub-layers is followed by a residual connection and layer normalization. 

   - **Decoder**: Each decoder layer also has two sub-layers similar to the encoder, plus a third sub-layer that performs multi-head attention over the encoder's output. The self-attention mechanism in the decoder is modified to prevent positions from attending to subsequent positions (also known as masked self-attention).

2. **Attention**: The attention function in the Transformer is a mapping of a query and a set of key-value pairs to an output, with all components being vectors. The output is computed

In [6]:
def ask_pdf(pdf_qa, query):
    """Run `query` through the RetrievalQA chain, print both, and return the answer text."""
    print("QUERY: ", query)
    # RetrievalQA only reads the "query" key, so no chat history is needed
    result = pdf_qa.invoke({"query": query})
    answer = result["result"]
    print("ANSWER", answer)
    return answer


ask_pdf(pdf_qa, "How does the self-attention mechanism in transformers differ from traditional sequence alignment methods?")

QUERY:  How does the self-attention mechanism in transformers differ from traditional sequence alignment methods?
ANSWER The self-attention mechanism in transformers differs from traditional sequence alignment methods in the following ways:

1. Global Dependency Modeling: Self-attention allows the model to directly compute dependencies between any two positions in the sequence, regardless of their distance. Traditional sequence alignment methods like those in RNNs and CNNs process the sequence step-by-step or in local receptive fields, which can make it harder to capture long-range dependencies.

2. Parallelization: Self-attention mechanisms enable parallel computation across all positions in a sequence because they do not require sequential processing. This is in contrast to RNNs, which process elements sequentially and therefore cannot be parallelized across the steps of a sequence.

3. Fixed Number of Operations: The Transformer reduces the number of operations required to relate tw

'The self-attention mechanism in transformers differs from traditional sequence alignment methods in the following ways:\n\n1. Global Dependency Modeling: Self-attention allows the model to directly compute dependencies between any two positions in the sequence, regardless of their distance. Traditional sequence alignment methods like those in RNNs and CNNs process the sequence step-by-step or in local receptive fields, which can make it harder to capture long-range dependencies.\n\n2. Parallelization: Self-attention mechanisms enable parallel computation across all positions in a sequence because they do not require sequential processing. This is in contrast to RNNs, which process elements sequentially and therefore cannot be parallelized across the steps of a sequence.\n\n3. Fixed Number of Operations: The Transformer reduces the number of operations required to relate two arbitrary positions in a sequence to a constant, while in RNNs and CNNs, this number grows with the distance bet

In [22]:
quiz_questions = ask_pdf(pdf_qa, "Quiz me with 3 simple questions on the positional encodings and the role they play in transformers.")

quiz_questions

QUERY:  Quiz me with 3 simple questions on the positional encodings and the role they play in transformers.
ANSWER Sure, here are three simple questions focused on positional encodings in the context of Transformer models:

1. What is the main purpose of positional encodings in the Transformer architecture?
   A) To indicate the order of words in the input sequence.
   B) To help the model perform better on longer sequences.
   C) To reduce the training time of the model.
   D) To provide a unique identifier for each word in the vocabulary.

2. How do positional encodings enable the Transformer model to take into account the order of words in the input sequence?
   A) By adding a unique positional signal to each word embedding, which helps the model distinguish the position of each word.
   B) By using a separate recurrent neural network to track the positions.
   C) By sorting the words in alphabetical order before processing.
   D) By creating a separate attention mechanism specifica

"Sure, here are three simple questions focused on positional encodings in the context of Transformer models:\n\n1. What is the main purpose of positional encodings in the Transformer architecture?\n   A) To indicate the order of words in the input sequence.\n   B) To help the model perform better on longer sequences.\n   C) To reduce the training time of the model.\n   D) To provide a unique identifier for each word in the vocabulary.\n\n2. How do positional encodings enable the Transformer model to take into account the order of words in the input sequence?\n   A) By adding a unique positional signal to each word embedding, which helps the model distinguish the position of each word.\n   B) By using a separate recurrent neural network to track the positions.\n   C) By sorting the words in alphabetical order before processing.\n   D) By creating a separate attention mechanism specifically for positions.\n\n3. In the original Transformer paper, what type of positional encoding is mentio

In [23]:
llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0.0)

In [24]:
from langchain_core.prompts.chat import SystemMessagePromptTemplate, HumanMessagePromptTemplate

template = "You take in text and spit out Python code doing what the user wants"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_message_prompt = HumanMessagePromptTemplate.from_template("Return ONLY a PYTHON list containing the questions in this text: {questions}")

In [25]:
from langchain_core.prompts import ChatPromptTemplate

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt,human_message_prompt])

In [26]:
quiz_chain = chat_prompt | llm

In [27]:
quiz_chain.invoke({"questions": quiz_questions})

AIMessage(content='```python\nquestions = [\n    "What is the main purpose of positional encodings in the Transformer architecture?",\n    "How do positional encodings enable the Transformer model to take into account the order of words in the input sequence?",\n    "In the original Transformer paper, what type of positional encoding is mentioned as an alternative to the sinusoidal positional encoding, and what was the observed effect when it was used?"\n]\n```', response_metadata={'token_usage': {'completion_tokens': 84, 'prompt_tokens': 314, 'total_tokens': 398}, 'model_name': 'gpt-4-1106-preview', 'system_fingerprint': 'fp_94f711dcf6', 'finish_reason': 'stop', 'logprobs': None}, id='run-cd46aeac-badd-408b-8f5d-bff511988afe-0')

In [28]:
import re

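def extract_python_code(markdown_string):
    """Return the contents of the first ```python fenced block, or None if there is none."""
    pattern = r'```python\n(.*?)\n```'
    # re.DOTALL lets `.` match newlines so multi-line code blocks are captured
    matches = re.findall(pattern, markdown_string, re.DOTALL)

    if matches:
        python_code = matches[0]
        return python_code
    else:
        return None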
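
A quick offline check of this helper, assuming the model's reply looks like the fenced `AIMessage` content shown earlier. The function is restated so the snippet runs on its own, and the sample reply string is made up for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to avoid nesting fences in this example

def extract_python_code(markdown_string):
    """Return the contents of the first python fenced block, or None if there is none."""
    pattern = FENCE + r"python\n(.*?)\n" + FENCE
    matches = re.findall(pattern, markdown_string, re.DOTALL)
    return matches[0] if matches else None

reply = f'{FENCE}python\nquestions = ["Q1?", "Q2?"]\n{FENCE}'  # hypothetical model reply
code = extract_python_code(reply)
plain = extract_python_code("no fenced block here")
print(code)
print(plain)
```

The `None` result for an unfenced reply is worth handling before the value is passed further down a chain.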

In [29]:
from langchain_core.runnables import RunnableLambda


quiz_chain = chat_prompt | llm | RunnableLambda(lambda x: x.content) | extract_python_code

Note: We haven't discussed runnables at length yet, but they form the core of the LCEL (LangChain Expression Language) interface.

`RunnableLambda` wraps a function so it can sit inside a chain: it receives the output of the previous step, applies the transformation defined in its lambda function, and passes the result on to the next step.
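
As a rough mental model of what the `|` operator does, the toy class below chains plain functions left to right. It is a simplified stand-in written for this note, not LangChain's actual `Runnable` implementation; the `Step` class and the sample functions are hypothetical.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Accept another Step or a bare callable, similar in spirit to how the
        # chain above mixes a RunnableLambda with a plain function.
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda x: nxt.func(self.func(x)))

    def invoke(self, value):
        """Run the wrapped function (or the whole composed chain) on `value`."""
        return self.func(value)

fake_llm = Step(lambda prompt: {"content": prompt.upper()})  # pretend model reply
get_content = Step(lambda message: message["content"])       # plays the role of `lambda x: x.content`
chain = fake_llm | get_content | str.strip

result = chain.invoke("  hello  ")
print(result)
```

Each `|` builds a new step whose function feeds the left side's output into the right side, which is why the order of the pieces in the chain matters.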

In [30]:
questions_list = quiz_chain.invoke({"questions": quiz_questions})

In [31]:
questions_list

'questions = [\n    "What is the main purpose of positional encodings in the Transformer architecture?",\n    "How do positional encodings enable the Transformer model to take into account the order of words in the input sequence?",\n    "In the original Transformer paper, what type of positional encoding is mentioned as an alternative to the sinusoidal positional encoding, and what was the observed effect when it was used?"\n]'

In [32]:
exec(questions_list)

In [33]:
questions

['What is the main purpose of positional encodings in the Transformer architecture?',
 'How do positional encodings enable the Transformer model to take into account the order of words in the input sequence?',
 'In the original Transformer paper, what type of positional encoding is mentioned as an alternative to the sinusoidal positional encoding, and what was the observed effect when it was used?']
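
`exec` runs whatever code the model returned, which is risky if the reply ever contains more than a list literal. One hedged alternative, assuming the reply has the form `name = [...]` as in the string printed above, is to parse it with `ast` and evaluate only the literal. The `parse_list_assignment` helper and the sample string are illustrative and not part of the original notebook.

```python
import ast

def parse_list_assignment(code):
    """Return the list literal assigned in `code` without executing the code."""
    tree = ast.parse(code)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            # literal_eval accepts literals only and raises ValueError for anything else
            value = ast.literal_eval(node.value)
            if isinstance(value, list):
                return value
    raise ValueError("no list assignment found")

# Made-up sample in the same shape as the model output
sample = 'questions = [\n    "What is attention?",\n    "Why use positional encodings?"\n]'
parsed_questions = parse_list_assignment(sample)
print(parsed_questions)
```

With this approach the list is returned as a value, so no variable is created implicitly in the notebook's namespace.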

In [34]:
# `questions` was defined by the exec() call above, which ran the assignment stored as a string in `questions_list`.
answers = []
for q in questions:
    answers.append(ask_pdf(pdf_qa, q))

QUERY:  What is the main purpose of positional encodings in the Transformer architecture?
ANSWER The main purpose of positional encodings in the Transformer architecture is to provide some information about the order of the tokens in the sequence. Since the Transformer relies entirely on attention mechanisms without any recurrent or convolutional layers, it does not inherently capture the sequential order of the input tokens. Positional encodings enable the model to take into account the position of each token within the sequence, which is crucial for tasks where the order of inputs affects the meaning and hence the output, such as in language modeling and machine translation.

The positional encodings are added to the input embeddings at the bottoms of the encoder and decoder stacks, ensuring that the model can consider the order of the tokens when processing the sequence. This allows the Transformer to maintain high parallelization while still being sensitive to the order of the toke

In [35]:
evaluations = []

for q, a in zip(questions, answers):
    # Ask the same chain to grade each answer against the paper
    evaluations.append(ask_pdf(pdf_qa, f"Is this: {a} the correct answer to this question: {q} according to the paper? Return ONLY '''YES''' or '''NO'''. Output:"))

evaluations

QUERY:  Is this: The main purpose of positional encodings in the Transformer architecture is to provide some information about the order of the tokens in the sequence. Since the Transformer relies entirely on attention mechanisms without any recurrent or convolutional layers, it does not inherently capture the sequential order of the input tokens. Positional encodings enable the model to take into account the position of each token within the sequence, which is crucial for tasks where the order of inputs affects the meaning and hence the output, such as in language modeling and machine translation.

The positional encodings are added to the input embeddings at the bottoms of the encoder and decoder stacks, ensuring that the model can consider the order of the tokens when processing the sequence. This allows the Transformer to maintain high parallelization while still being sensitive to the order of the tokens, which is vital for understanding and generating sequences in a meaningful wa

['YES', 'YES', 'YES']

In [36]:
scores = []

yes_count = evaluations.count('YES')
scores.append(yes_count)
print(scores)

[3]
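
Counting exact `'YES'` strings works when the model follows the format strictly, but replies such as `'''YES'''` or `Yes.` would be missed. A small normalization step, sketched below under the assumption that verdicts are short free-text strings, makes the count more tolerant. The `normalize_verdict` and `score` helpers and the sample replies are hypothetical.

```python
import re

def normalize_verdict(text):
    """Map a free-text reply to 'YES', 'NO', or None when it is ambiguous."""
    words = re.findall(r"[A-Za-z]+", text.upper())
    has_yes, has_no = "YES" in words, "NO" in words
    if has_yes == has_no:  # neither word present, or both
        return None
    return "YES" if has_yes else "NO"

def score(replies):
    """Return (number of YES verdicts, fraction of replies that were YES)."""
    verdicts = [normalize_verdict(r) for r in replies]
    yes = verdicts.count("YES")
    return yes, (yes / len(verdicts) if verdicts else 0.0)

# Made-up replies covering a few formats a model might return
sample_replies = ["YES", "'''YES'''", "No.", "I am not sure"]
yes_count, fraction = score(sample_replies)
print(yes_count, fraction)
```

Returning `None` for ambiguous replies keeps them out of the YES count while still counting them in the denominator.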
