### Summarizing Using the `refine` Chain

In [1]:
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains import summarize
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredPDFLoader
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv(), override=True)  # load API keys (e.g. OPENAI_API_KEY) from the nearest .env file

True

In [2]:
loader = UnstructuredPDFLoader('../2 QA on Private Documents/files/attention_is_all_you_need.pdf')
data = loader.load()

In [3]:
data[0].page_content

'Attention Is All You Need\n\n7 1 0 2 c e D 6\n\nAshish Vaswani∗ Google Brain avaswani@google.com\n\nLlion Jones∗ Google Research llion@google.com\n\nNoam Shazeer∗ Google Brain noam@google.com\n\nNiki Parmar∗ Google Research nikip@google.com\n\nJakob Uszkoreit∗ Google Research usz@google.com\n\nAidan N. Gomez∗ † University of Toronto aidan@cs.toronto.edu\n\nŁukasz Kaiser∗ Google Brain lukaszkaiser@google.com\n\nIllia Polosukhin∗ ‡ illia.polosukhin@gmail.com\n\n] L C . s c [\n\n5 v 2 6 7 3 0 . 6 0 7 1 : v i X r a\n\n1\n\nAbstract\n\nThe dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to 

In [4]:
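# chunk_size and chunk_overlap are counted in characters (the splitter's default length function is len)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)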
chunks = text_splitter.split_documents(data)
len(chunks)

5
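To see what `chunk_size` and `chunk_overlap` control, here is a simplified, pure-Python sketch of overlapping fixed-width chunking. It is an illustration under assumptions, not LangChain's implementation: `RecursiveCharacterTextSplitter` first tries to break on separators such as paragraphs, lines and spaces, so its chunks will not line up exactly with these. The helper name `split_with_overlap` is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows of chunk_size that share chunk_overlap characters.

    Simplified illustration only: it cuts at raw character positions and
    ignores the separator-aware splitting that RecursiveCharacterTextSplitter does.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return pieces


demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```

Each window repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.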

In [5]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [6]:
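def print_embedding_cost(texts):
    # Despite the name, no embeddings are created here: this counts the tokens in the
    # chunks and prices them at $0.002 per 1K tokens (the gpt-3.5-turbo rate assumed here).
    # It is a lower bound for the refine chain, which also re-sends the running summary
    # and the prompt wording on every call and is billed for output tokens too.
    import tiktoken
    enc = tiktoken.encoding_for_model('gpt-3.5-turbo')
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f'Total Tokens: {total_tokens}')
    print(f'Embedding Cost in USD: {total_tokens / 1000 * 0.002:.6f}')


print_embedding_cost(chunks)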

Total Tokens: 10045
Embedding Cost in USD: 0.020090
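The figure above prices each chunk once. A `refine` chain sends more than that: every call after the first also carries the running summary and the prompt wording. The sketch below makes that overhead explicit. It is a rough estimate under stated assumptions: `estimate_refine_input_tokens` is a hypothetical helper, it uses a single `prompt_tokens` value for both prompts, the per-chunk split of the 10045 tokens and the 50/150 token sizes are made-up numbers, and output tokens are ignored.

```python
from typing import List


def estimate_refine_input_tokens(chunk_tokens: List[int], prompt_tokens: int,
                                 summary_tokens: int) -> int:
    """Rough input-token count for a refine chain (hypothetical helper).

    Call 1 sends chunk 1 plus the prompt wording; every later call also
    re-sends the running summary. Output tokens are not counted.
    """
    if not chunk_tokens:
        return 0
    total = chunk_tokens[0] + prompt_tokens
    for tokens in chunk_tokens[1:]:
        total += tokens + prompt_tokens + summary_tokens
    return total


# Assumed numbers: a hypothetical 5-way split of 10045 chunk tokens,
# ~50 tokens of prompt wording and a ~150-token running summary.
estimated_tokens = estimate_refine_input_tokens([2000, 2000, 2000, 2000, 2045],
                                                prompt_tokens=50, summary_tokens=150)
print(estimated_tokens)  # 10895
print(f"{estimated_tokens / 1000 * 0.002:.6f} USD at $0.002 per 1K input tokens")
```

The overhead grows with the number of chunks, which is one reason to prefer fewer, larger chunks with this chain type.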


In [7]:
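chain = summarize.load_summarize_chain(
    llm=llm,
    chain_type='refine',  # summarize chunk 1, then revise that summary once per remaining chunk
    verbose=True          # log every formatted prompt sent to the LLM
)
output_summary = chain.run(chunks)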



[1m> Entering new RefineDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"Attention Is All You Need

7 1 0 2 c e D 6

Ashish Vaswani∗ Google Brain avaswani@google.com

Llion Jones∗ Google Research llion@google.com

Noam Shazeer∗ Google Brain noam@google.com

Niki Parmar∗ Google Research nikip@google.com

Jakob Uszkoreit∗ Google Research usz@google.com

Aidan N. Gomez∗ † University of Toronto aidan@cs.toronto.edu

Łukasz Kaiser∗ Google Brain lukaszkaiser@google.com

Illia Polosukhin∗ ‡ illia.polosukhin@gmail.com

] L C . s c [

5 v 2 6 7 3 0 . 6 0 7 1 : v i X r a

1

Abstract

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based 


[1m> Finished chain.[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mYour job is to produce a final summary
We have provided an existing summary up to a certain point: The paper introduces a new network architecture called the Transformer, which is based solely on attention mechanisms and does not use recurrent or convolutional neural networks. The Transformer model achieves superior results in machine translation tasks, is more parallelizable, and requires less training time compared to existing models. The paper also discusses the advantages of self-attention and describes the architecture of the Transformer, including the encoder and decoder stacks and the attention mechanism used.
We have the opportunity to refine the existing summary(only if needed) with some more context below.
------------
Attention(Q, K, V ) = softmax(

QK T √ dk

)V

(1)

The two most commonly used attention functions are additive attention [2], and dot-product (multi- pl


[1m> Finished chain.[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mYour job is to produce a final summary
We have provided an existing summary up to a certain point: The paper introduces the Transformer, a network architecture based solely on attention mechanisms. The Transformer model achieves superior results in machine translation tasks, is more parallelizable, and requires less training time compared to existing models. The paper also discusses the advantages of self-attention and describes the architecture of the Transformer, including the encoder and decoder stacks and the attention mechanism used. The paper further compares self-attention layers to recurrent and convolutional layers in terms of computational complexity, parallelizability, and the length of paths between long-range dependencies in the network.
We have the opportunity to refine the existing summary(only if needed) with some more context below.
------------
As side beneﬁt, s


[1m> Finished chain.[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mYour job is to produce a final summary
We have provided an existing summary up to a certain point: The paper introduces the Transformer, a network architecture based solely on attention mechanisms, which achieves superior results in machine translation tasks, is more parallelizable, and requires less training time compared to existing models. The paper also discusses the advantages of self-attention and describes the architecture of the Transformer, including the encoder and decoder stacks and the attention mechanism used. Additionally, the paper explores the interpretability of self-attention and presents examples of attention distributions. The training regime, hardware, schedule, optimizer, and regularization techniques used for training the models are also described. The paper further evaluates the performance of the Transformer on English constituency parsing and demonstrate


[1m> Finished chain.[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mYour job is to produce a final summary
We have provided an existing summary up to a certain point: The paper introduces the Transformer, a network architecture based solely on attention mechanisms, which achieves superior results in machine translation tasks, is more parallelizable, and requires less training time compared to existing models. The paper also discusses the advantages of self-attention and describes the architecture of the Transformer, including the encoder and decoder stacks and the attention mechanism used. Additionally, the paper explores the interpretability of self-attention and presents examples of attention distributions. The training regime, hardware, schedule, optimizer, and regularization techniques used for training the models are also described. The paper further evaluates the performance of the Transformer on English constituency parsing and demonstrate

In [8]:
print(output_summary)

The paper introduces the Transformer, a network architecture based solely on attention mechanisms, which achieves superior results in machine translation tasks, is more parallelizable, and requires less training time compared to existing models. The paper also discusses the advantages of self-attention and describes the architecture of the Transformer, including the encoder and decoder stacks and the attention mechanism used. Additionally, the paper explores the interpretability of self-attention and presents examples of attention distributions. The training regime, hardware, schedule, optimizer, and regularization techniques used for training the models are also described. The paper further evaluates the performance of the Transformer on English constituency parsing and demonstrates its ability to generalize to other tasks. Overall, the Transformer model shows promising results and outperforms previous state-of-the-art models in various translation tasks. The authors are excited about
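The verbose log shows how `refine` works: the first call summarizes chunk 1 with the "Write a concise summary" prompt, and every later call receives the existing summary plus the next chunk and is asked to refine it. The loop below is a minimal plain-Python sketch of that idea, not LangChain's `RefineDocumentsChain` code. `refine_summarize` and the stand-in `fake_llm` are hypothetical names; a real run would call the chat model instead of `fake_llm`.

```python
from typing import Callable, List


def refine_summarize(chunks: List[str], llm: Callable[[str], str],
                     question_template: str, refine_template: str) -> str:
    """Sketch of the refine strategy: summarize the first chunk, then fold in
    each later chunk together with the summary produced so far."""
    if not chunks:
        raise ValueError("need at least one chunk")
    summary = llm(question_template.format(text=chunks[0]))
    for chunk in chunks[1:]:
        summary = llm(refine_template.format(existing_answer=summary, text=chunk))
    return summary


prompts_seen = []


def fake_llm(prompt: str) -> str:
    # Stand-in for a chat model: records the prompt and returns a numbered "summary".
    prompts_seen.append(prompt)
    return f"summary-{len(prompts_seen)}"


result = refine_summarize(
    ["part A", "part B", "part C"],
    fake_llm,
    question_template="Summarize: {text}",
    refine_template="Existing: {existing_answer}\nNew context: {text}",
)
print(result)           # summary-3
print(prompts_seen[1])  # the second call received "summary-1" plus "part B"
```

Because each step needs the previous summary, the calls cannot run in parallel: 5 chunks mean 5 sequential LLM requests, matching the 5 `LLMChain` entries in the log.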

### `refine` with Custom Prompts

In [9]:
prompt_template = """
Write a concise summary of the following extracting the key information:
Text: `{text}`
CONCISE SUMMARY:
"""
# Prompt for the first chunk; {text} is filled with that chunk's content.
initial_prompt = PromptTemplate(
    template=prompt_template,
    input_variables=['text']
)


refine_template = """
Your job is to produce a final summary.
I have provided an existing summary up to a certain point: {existing_answer}.
Please refine the existing summary with some more context below.
------------
{text}
------------
Start the final summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
"""

# Prompt for every later chunk: {existing_answer} is the running summary, {text} the next chunk.
refine_prompt = PromptTemplate(
    template=refine_template,
    input_variables=['existing_answer', 'text']
)
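By default, the refine summarize chain fills `{text}` with the current chunk and `{existing_answer}` with the running summary, so the custom templates must use exactly those placeholder names, and `input_variables` must list them. A mismatch is easy to introduce when editing a long template. Below is a small standard-library check of which placeholders a template actually contains; `template_variables` is a hypothetical helper and `demo_refine` is a shortened copy of the refine template, used only to keep the example self-contained.

```python
from string import Formatter
from typing import Set


def template_variables(template: str) -> Set[str]:
    """Names of the {placeholders} in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


demo_refine = (
    "I have provided an existing summary up to a certain point: {existing_answer}.\n"
    "Please refine the existing summary with some more context below.\n"
    "------------\n{text}\n------------\n"
)
declared = {"existing_answer", "text"}
print(sorted(template_variables(demo_refine)))  # ['existing_answer', 'text']
print(template_variables(demo_refine) ^ declared)  # set(): nothing missing or extra
```

In the notebook itself you could compare `template_variables(refine_template)` with `set(refine_prompt.input_variables)` in the same way.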

In [10]:
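chain = summarize.load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,   # used for the first chunk
    refine_prompt=refine_prompt,      # used for every later chunk
    return_intermediate_steps=False   # True also returns per-step summaries, but then .run() can't be used (two outputs)
)
output_summary = chain.run(chunks)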

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)).
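The warning above is LangChain retrying the OpenAI request after a dropped connection, waiting 4.0 seconds first. The sketch below shows the general idea of capped exponential backoff in plain Python. It is not LangChain's retry code: `call_with_backoff` is a hypothetical helper, the 4 s base delay only mirrors the logged wait, and the 10 s cap and attempt limit are assumptions. With the real client you would pass its connection-error class (OpenAI's `APIConnectionError` is not a subclass of the built-in `ConnectionError`) as `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 4.0, max_delay: float = 10.0,
                      retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient failures with doubling delays capped at max_delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))


delays = []
attempts = {"n": 0}


def flaky():
    # Fails twice like a reset connection, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionResetError("connection forcibly closed")
    return "ok"


print(call_with_backoff(flaky, sleep=delays.append))  # ok
print(delays)  # [4.0, 8.0]
```

If you only need to tune how persistent the notebook's model is, `ChatOpenAI` also accepts a `max_retries` argument.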


In [11]:
print(output_summary)

Introduction:
The paper introduces the Transformer, a network architecture based solely on attention mechanisms, which achieves superior results in machine translation tasks and offers advantages in terms of computational efficiency and parallelizability. The model uses attention in three different ways and incorporates multi-head attention and positional encodings.

Bullet Points:
- The Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers.
- It outperforms previous state-of-the-art models in machine translation tasks.
- The authors plan to apply attention-based models to other tasks and extend the Transformer to handle input and output modalities other than text.
- The code used to train and evaluate the models is available online.
- Many of the attention heads in the Transformer exhibit behavior related to the structure of the sentence.

Conclusion:
Overall, the Transformer architecture presents a promising approach for various