# **Summarization** in LangChain

## Outline

* Map Reduce
* Stuff
* Refine

In [None]:
!pip install --upgrade langchain openai tiktoken python-dotenv -q

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

### Setting up Summarization Chain

In [None]:
from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.docstore.document import Document
from langchain.chains.mapreduce import MapReduceChain
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0)

In [None]:
# Load your document
with open('/content/how_to_win_friends.txt') as f:
    how_to_win_friends = f.read()

### Text Splitting
* CharacterTextSplitter
* RecursiveCharacterTextSplitter

In [None]:
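# Option 1: use CharacterTextSplitter for small documents

text_splitter = CharacterTextSplitter()  # split_text is an instance method
texts = text_splitter.split_text(how_to_win_friends)
# Keep only the first four chunks, wrapped as Document objects
docs = [Document(page_content=t) for t in texts[:4]]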

In [None]:
# Option 2: use RecursiveCharacterTextSplitter for large documents

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=200,
    length_function=len,
)
# create_documents expects a list of strings and returns Document objects
texts = text_splitter.create_documents([how_to_win_friends])
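
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraphs and sentences). The function name `sliding_chunks` is hypothetical and only illustrates how consecutive chunks share characters.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

The last four characters of each chunk reappear at the start of the next one, which is what lets a summary of one chunk keep some context from its neighbour.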

## Map Reduce
This method involves **an initial prompt on each chunk of data** (for summarization tasks, this could be a summary of that chunk; for question-answering tasks, it could be an answer based solely on that chunk). **Then a different prompt is run to combine all the initial outputs.** This is implemented in LangChain as the MapReduceDocumentsChain.

**Pros:** Can scale to larger documents (and more documents) than StuffDocumentsChain. The calls to the LLM on individual documents are independent and can therefore be parallelized.

**Cons:** Requires many more calls to the LLM than StuffDocumentsChain. Loses some information during the final combining call.
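
The two phases can be sketched without LangChain at all. The snippet below is an illustration of the idea only, not the MapReduceDocumentsChain implementation: `map_reduce_summarize` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that records each prompt instead of calling a model. The sketch also skips the case where the combined summaries are themselves too long for one call.

```python
from typing import Callable, List

SUMMARY_TEMPLATE = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'


def map_reduce_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    # Map step: every chunk is summarized on its own, so these calls
    # do not depend on each other and could run in parallel.
    partial_summaries = [llm(SUMMARY_TEMPLATE.format(text=chunk)) for chunk in chunks]
    # Reduce step: one more call that only sees the partial summaries.
    combined = "\n\n".join(partial_summaries)
    return llm(SUMMARY_TEMPLATE.format(text=combined))


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: records the prompt, returns a label."""
    calls.append(prompt)
    return f"summary {len(calls)}"


result = map_reduce_summarize(["chunk A", "chunk B", "chunk C"], fake_llm)
print(len(calls), "calls made; final output:", result)
```

Three chunks cost four calls here (three map calls plus one combine call), and the final call never sees the original text, which is where information can get lost.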

In [None]:
from langchain.chains.summarize import load_summarize_chain
import textwrap

In [None]:
chain = load_summarize_chain(llm, 
                             chain_type="map_reduce")


output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, width=100)
print(wrapped_text)

 Dale Carnegie wrote this book to provide practical, common-sense training to adults in how to
effectively deal with people in everyday business and social contacts. It was developed over 15
years of research and experimentation, and the rules it contains have been proven to work like
magic. People have seen their lives revolutionized by applying these principles, from increased
profits and pay to improved relationships with employees and family members. The book is about
discovering, developing, and profiting from dormant and unused assets, and its purpose is to help
readers become better equipped to meet life's situations.


In [None]:
# for summarizing each part
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [None]:
# for combining the parts
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [None]:
chain = load_summarize_chain(llm, 
                             chain_type="map_reduce",
                             verbose=True
                             )


output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)



> Entering new MapReduceDocumentsChain chain...
Prompt after formatting:
Write a concise summary of the following:


"How This Book Was Written-And Why 
by 

Dale Carnegie 

During the first thirty-five years of the twentieth century, the 
publishing houses of America printed more than a fifth of a million 
different books. Most of them were deadly dull, and many were 
financial failures. "Many," did I say? The president of one of the 
largest publishing houses in the world confessed to me that his 
company, after seventy-five years of publishing experience, still lost 
money on seven out of every eight books it published. 

Why, then, did I have the temerity to write another book? And, after 
I had written it, why should you bother to read it? 


Fair questions, both; and I'll try to answer them. 


I have, since 1912, been conducting educational courses for business 
and professional men and women in New York. At first, I conducted 
courses in public speaking o

## Stuffing
Stuffing is the simplest method, whereby you simply stuff all the related data into the prompt as context to pass to the language model. This is implemented in LangChain as the StuffDocumentsChain.

**Pros:** Only makes a single call to the LLM. When generating text, the LLM has access to all the data at once.

**Cons:** Most LLMs have a limited context length, so for large documents (or many documents) this will not work: the resulting prompt is larger than the context length.

The main downside of this method is that **it only works on smaller pieces of data.** Once you are working with many pieces of data, this approach is no longer feasible. The other two approaches, Map Reduce and Refine, are designed to help deal with that.
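
As a rough sketch of what stuffing amounts to, the snippet below joins all chunks into one prompt and refuses to build it when it exceeds a budget. This is not the StuffDocumentsChain implementation. `build_stuff_prompt` is a hypothetical helper, and the character budget `max_chars` is an assumption standing in for a real token limit, which you would normally measure with a tokenizer.

```python
from typing import List

STUFF_TEMPLATE = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'


def build_stuff_prompt(chunks: List[str], max_chars: int) -> str:
    """Put every chunk into a single prompt; fail if it exceeds the budget."""
    prompt = STUFF_TEMPLATE.format(text="\n\n".join(chunks))
    if len(prompt) > max_chars:
        raise ValueError(f"prompt has {len(prompt)} characters, budget is {max_chars}")
    return prompt


# Small input: everything fits into one prompt, so one LLM call is enough.
prompt = build_stuff_prompt(["First chunk.", "Second chunk."], max_chars=500)
print(prompt)

# Large input: the single prompt would exceed the budget.
try:
    build_stuff_prompt(["x" * 1000], max_chars=500)
    too_long_rejected = False
except ValueError as err:
    too_long_rejected = True
    print("Rejected:", err)
```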



In [None]:
chain = load_summarize_chain(llm, chain_type="stuff")

In [None]:
prompt_template = """Write a concise bullet point summary of the following:


{text}


CONCISE SUMMARY IN BULLET POINTS:"""

BULLET_POINT_PROMPT = PromptTemplate(template=prompt_template, 
                        input_variables=["text"])


In [None]:
chain = load_summarize_chain(llm, 
                             chain_type="stuff", 
                             prompt=BULLET_POINT_PROMPT)

output_summary = chain.run(docs)

wrapped_text = textwrap.fill(output_summary, 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

 

• Dale Carnegie wrote this book to provide practical advice on how to deal with people in
everyday business and social contacts. 

• Research revealed that 85% of success in any profession
is due to skill in human engineering, or the ability to lead people. 

• Carnegie conducted courses
for adults in New York and found that no practical, working handbook on human relations existed. 

•
He read everything he could find on the subject, interviewed successful people, and conducted a
survey to determine what adults wanted to study. 

• The survey revealed that people wanted to
understand and get along with people, make people like them, and win others to their way of
thinking. 

• Carnegie wrote this book based on his research, interviews, and courses, and it has
been tested in business and social contacts with successful results.


### Map Reduce with our custom prompt

In [None]:
chain = load_summarize_chain(llm, 
                             chain_type="map_reduce",
                             map_prompt=BULLET_POINT_PROMPT, 
                             combine_prompt=BULLET_POINT_PROMPT)

# chain.llm_chain.prompt= BULLET_POINT_PROMPT
# chain.combine_document_chain.llm_chain.prompt= BULLET_POINT_PROMPT

output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)


• Dale Carnegie wrote this book to provide practical, common-sense training in dealing with people
• Research revealed that 85% of financial success is due to skill in human engineering
• University
of Chicago and United Y.M.C.A. Schools conducted a survey to determine what adults want to study
•
This book was developed over 15 years of experiment and research
• Harvard graduate declared he had
learned more in 14 weeks through a system of training about influencing people than he had in 4
years of college
• William James of Harvard said humans live far within their limits and possess
powers they habitually fail to use
• The purpose of this book is to help discover, develop and
profit from these dormant and unused assets
• Education is the ability to meet life's situations and
the aim of education is not knowledge but action


In [None]:
# with a custom prompt
prompt_template = """Write a concise summary of the following:


{text}


CONCISE SUMMARY IN BULLET POINTS:"""

PROMPT = PromptTemplate(template=prompt_template, 
                        input_variables=["text"])

# with intermediate steps
chain = load_summarize_chain(OpenAI(temperature=0), 
                             chain_type="map_reduce", 
                             return_intermediate_steps=True, 
                             map_prompt=PROMPT, 
                             combine_prompt=PROMPT)

output_summary = chain({"input_documents": docs}, return_only_outputs=True)
wrapped_text = textwrap.fill(output_summary['output_text'], 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)


• Dale Carnegie wrote this book to provide practical, common-sense training in the art of getting
along with people in everyday business and social contacts. 
• Research done by the Carnegie
Foundation for the Advancement of Teaching revealed that 85% of one's financial success is due to
skill in human engineering-to personality and the ability to lead people. 
• The University of
Chicago and the United Y.M.C.A. Schools conducted a survey to determine what adults want to study,
which revealed that health and understanding people were the prime interests of adults. 
• This book
was developed over 15 years of research and experimentation and has been seen to revolutionize the
lives of many people. 
• The purpose of this book is to help readers discover, develop and profit
from their dormant and unused assets and equip them to better meet life's situations.


In [None]:
wrapped_text = textwrap.fill(output_summary['intermediate_steps'][2], 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)


• This book was developed over 15 years of research and experimentation. 
• The rules set down in
the book have been seen to revolutionize the lives of many people. 
• Examples of success stories
include an employer with 314 employees, salespeople, executives, and spouses. 
• People are often
astonished at the results they achieve, which seem like magic. 
• One man was so inspired by the
principles that he stayed up for three days discussing them. 
• A letter from a German aristocrat
expressed religious fervor for the principles.


## Refine
This method involves **an initial prompt on the first chunk of data, generating some output. For the remaining documents, that output is passed in, along with the next document**, asking the LLM to refine the output based on the new document.

**Pros:** Can pull in more relevant context, and may be less lossy than MapReduceDocumentsChain.

**Cons:** Requires many more calls to the LLM than StuffDocumentsChain. The calls are also NOT independent, meaning they cannot be parallelized like MapReduceDocumentsChain. There are also potential dependencies on the ordering of the documents.
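
The sequential nature of this method is easiest to see as a loop. The snippet below is a sketch of the idea only, not the RefineDocumentsChain implementation: `refine_summarize` and `fake_llm` are hypothetical names, the two templates are simplified for illustration, and `fake_llm` records prompts instead of calling a model.

```python
from typing import Callable, List

INITIAL_TEMPLATE = "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
REFINE_TEMPLATE = (
    "Existing summary: {existing_answer}\n"
    "Refine it (only if needed) with the new context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
)


def refine_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    if not chunks:
        raise ValueError("need at least one chunk")
    # The first chunk produces the initial summary.
    summary = llm(INITIAL_TEMPLATE.format(text=chunks[0]))
    # Each later call needs the previous output, so calls must run in order.
    for chunk in chunks[1:]:
        summary = llm(REFINE_TEMPLATE.format(existing_answer=summary, text=chunk))
    return summary


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: records the prompt, returns a label."""
    calls.append(prompt)
    return f"summary {len(calls)}"


result = refine_summarize(["chunk A", "chunk B", "chunk C"], fake_llm)
print(len(calls), "calls made; final output:", result)
```

Each prompt after the first carries the previous summary, which is why reordering the chunks can change the result.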

In [None]:
chain = load_summarize_chain(llm, chain_type="refine")

output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, width=100)
print(wrapped_text)

  Dale Carnegie wrote this book to provide practical, common-sense training to adults in how to
effectively deal with people in everyday business and social contacts. He has been conducting
educational courses for business and professional men and women since 1912 and has found that even
in technical lines such as engineering, 85% of one's financial success is due to skill in human
engineering. He believes that every college should provide courses to develop this ability, but has
not found one that does. To create this book, Carnegie conducted a survey to determine what adults
wanted to study, read extensively on the subject, and interviewed successful people to discover the
techniques they used in human relations. The book was developed out of the experiences of thousands
of adults in a laboratory of human relationships, the first of its kind. Carnegie's book provides
practical advice on how to revolutionize one's life and relationships through the application of his
principles. He ha

In [None]:
prompt_template = """Write a concise summary of the following extracting the key information:


{text}


CONCISE SUMMARY:"""
PROMPT = PromptTemplate(template=prompt_template, 
                        input_variables=["text"])

refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary.\n"
    "If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=refine_template,
)
chain = load_summarize_chain(OpenAI(temperature=0), 
                             chain_type="refine", 
                             return_intermediate_steps=True, 
                             question_prompt=PROMPT, 
                             refine_prompt=refine_prompt)


In [None]:
output_summary = chain({"input_documents": docs}, return_only_outputs=True)
wrapped_text = textwrap.fill(output_summary['output_text'], 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)



Dale Carnegie wrote this book to provide practical, common-sense training to adults in how to
effectively deal with people in everyday business and social contacts. He conducted educational
courses for business and professional men and women in New York and realized that they needed more
training in this area. Research done a few years ago revealed that even in technical lines such as
engineering, 85% of one's financial success is due to skill in human engineering - personality and
the ability to lead people. Carnegie wanted to provide this training to adults, as it is not offered
in any college in the land. To do this, he conducted a survey to determine what adults wanted to
study, and the results revealed that health and understanding and getting along with people were the
prime interests. Since no practical textbook on the subject existed, Carnegie wrote one himself,
drawing on his own experience, research, and interviews with successful people. The book grew out of
the experience

In [None]:
wrapped_text = textwrap.fill(output_summary['intermediate_steps'][0], 
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

 Dale Carnegie wrote this book to provide practical, common-sense training to adults in how to
effectively deal with people in everyday business and social contacts. He conducted educational
courses for business and professional men and women in New York and realized that they needed more
training in this area. Research done a few years ago revealed that even in technical lines such as
engineering, 85% of one's financial success is due to skill in human engineering - personality and
the ability to lead people. Carnegie wanted to provide this training to adults, as it is not offered
in any college in the land.
