LangChain offers three main ways to summarize documents with an LLM: Stuff, Map-Reduce, and Refine. This hands-on guide walks through each method in turn.

## Option 1: Stuff
This method involves stuffing all your documents into a single prompt and passing it to an LLM.

1. Import the necessary modules and define the prompt template.
2. Create an LLM chain with the defined prompt.
3. Define a StuffDocumentsChain that takes the LLM chain and combines all documents into a single prompt.
4. Run the summarization.
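To make the idea of "stuffing" concrete, here is a minimal sketch of what happens to the text before it reaches the model. It does not call LangChain or an LLM: the `Page` class and `build_stuff_prompt` helper are hypothetical stand-ins, and the blank-line separator between pages is an assumption.

```python
from dataclasses import dataclass

PROMPT_TEMPLATE = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""


@dataclass
class Page:
    """Hypothetical stand-in for one loaded document page."""
    page_content: str


def build_stuff_prompt(pages, template=PROMPT_TEMPLATE, separator="\n\n"):
    """Join the text of every page and substitute it into one prompt."""
    combined = separator.join(page.page_content for page in pages)
    return template.format(text=combined)


pages = [Page("First page text."), Page("Second page text.")]
stuffed_prompt = build_stuff_prompt(pages)
print(stuffed_prompt)
```

Because everything lands in one prompt, this approach only works while the combined text fits in the model's context window.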

In [3]:
pip install pypdf



In [2]:
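from langchain_community.document_loaders import PyPDFLoader

# Point the loader at the PDF to summarize
loader = PyPDFLoader("./data/Raptor-Agreement.pdf")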


In [3]:
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import ChatOpenAI

# Define prompt
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# Define LLM chain
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")

llm_chain = LLMChain(llm=llm, prompt=prompt)

# Define StuffDocumentsChain
stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")


docs = loader.load()
print(stuff_chain.run(docs))




 This is a Subscription Agreement between Subscriber and Raptor Technologies, LLC (Raptor) for the use of Raptor's Subscription Services. The agreement outlines the terms of the limited, non-exclusive license granted to Subscriber to access and use the services, payment terms, termination clauses, disclaimers, and miscellaneous provisions.

Subscriber will pay fees for the use of the Subscription Services, which include registered sex offender information and custom alerts. Raptor does not guarantee or warrant the accuracy, integrity, or quality of the third-party information provided. The agreement also includes sections on termination, disclaimers, miscellaneous provisions, and contact information for written notice.

The agreement may only be amended with a written agreement between both parties, and it is binding upon and enforceable by the Parties and their respective successors and permitted assigns. Raptor will not be in default of this Agreement for any performance failure caus

## Option 2: Map-Reduce
This method involves summarizing each document individually (map) and then combining these summaries into a final summary (reduce).

1. Define the map and reduce prompts.
2. Create an LLM chain that maps each document to an individual summary.
3. Use a ReduceDocumentsChain to combine the summaries.
4. Tie the two stages together with a MapReduceDocumentsChain, then split the documents into chunks and run it.
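The steps above follow a simple pattern that can be sketched without LangChain. In this minimal illustration, `first_words` is a hypothetical stand-in for the LLM call (it just truncates text), and `map_reduce_summarize` is an illustrative helper, not a LangChain function.

```python
def first_words(text, n):
    """Hypothetical stand-in for an LLM call: keep only the first n words."""
    return " ".join(text.split()[:n])


def map_reduce_summarize(chunks, map_fn, reduce_fn):
    """Summarize each chunk on its own, then summarize the joined summaries."""
    # Map: every chunk is summarized independently of the others
    partial_summaries = [map_fn(chunk) for chunk in chunks]
    # Reduce: the partial summaries are combined and summarized once more
    combined = "\n".join(partial_summaries)
    return partial_summaries, reduce_fn(combined)


chunks = ["alpha beta gamma delta", "one two three four"]
partials, final_summary = map_reduce_summarize(
    chunks,
    map_fn=lambda chunk: first_words(chunk, 2),
    reduce_fn=lambda text: first_words(text, 3),
)
print(partials)
print(final_summary)
```

Since the map calls do not depend on each other, they can run in parallel, which is the main practical advantage over processing the chunks sequentially.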

In [7]:


from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.text_splitter import CharacterTextSplitter

# Reuses llm, LLMChain, PromptTemplate, StuffDocumentsChain, and docs from the Option 1 cells

# Map
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes 
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Reduce
reduce_template = """The following is a set of summaries:
{docs}
Take these and distill them into a final, consolidated summary of the main themes.
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)

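# LLM chain that condenses a batch of summaries
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Stuff the summaries into the reduce prompt's {docs} variable
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="docs"
)

# Combine the mapped summaries, collapsing them in groups first
# whenever together they exceed token_max tokens
reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain,
    collapse_documents_chain=combine_documents_chain,
    token_max=4000,
)

# Map each chunk with map_chain, then hand the results to the reducer
map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name="docs",
    return_intermediate_steps=False,
)

# Split the loaded pages into chunks of roughly 1,000 tokens
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs)

# Run the chain
print(map_reduce_chain.run(split_docs))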


 This document outlines the terms and conditions of an Agreement between Raptor Technologies, LLC (Raptor) and a Subscriber organization for access to Raptor's Subscription Services. The key themes include:

1. License and Terms: Raptor grants a limited, non-exclusive license to the Subscriber to use its Subscription Services subject to certain terms and conditions. The Subscriber is responsible for providing their own Internet access and equipment to use the Subscription Services.
2. Confidentiality: The Subscriber agrees to keep confidential any information related to the Subscription Services and Equipment provided by Raptor, except as expressly permitted.
3. Data Collection and Distribution: The Subscriber is prohibited from disclosing or making public individual's personally identifying information obtained through the Subscription Services except as required in the ordinary course of business or by applicable law.
4. Fees and Term: The Agreement has an initial term of one year, d
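The token_max=4000 setting in the Map-Reduce cell limits how much summary text reaches the reduce prompt at once: when the mapped summaries exceed the budget, they are collapsed in groups first. The sketch below illustrates that idea only and is not LangChain's implementation. Word counts stand in for tokens, and `count_tokens`, `group_by_budget`, `halve`, and `collapse_summaries` are all hypothetical names.

```python
def count_tokens(text):
    """Assumption: approximate tokens by whitespace-separated words."""
    return len(text.split())


def group_by_budget(summaries, token_max):
    """Greedily pack summaries into groups that stay within token_max."""
    groups, current, used = [], [], 0
    for summary in summaries:
        size = count_tokens(summary)
        if current and used + size > token_max:
            groups.append(current)
            current, used = [], 0
        current.append(summary)
        used += size
    if current:
        groups.append(current)
    return groups


def halve(text):
    """Hypothetical stand-in for an LLM call: keep the first half of the words."""
    words = text.split()
    return " ".join(words[: max(1, len(words) // 2)])


def collapse_summaries(summaries, token_max, summarize):
    """Collapse groups of summaries until the total fits within token_max."""
    while True:
        total = sum(count_tokens(s) for s in summaries)
        if total <= token_max:
            return summaries
        groups = group_by_budget(summaries, token_max)
        collapsed = [summarize(" ".join(group)) for group in groups]
        # Stop if a pass makes no progress, to avoid looping forever
        if sum(count_tokens(s) for s in collapsed) >= total:
            return collapsed
        summaries = collapsed


summaries = ["a b c d", "e f g h", "i j k l"]
result = collapse_summaries(summaries, token_max=8, summarize=halve)
print(result)
```

A lower budget triggers more collapse passes, which costs extra LLM calls and can drop more detail from the final summary.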

## Option 3: Refine
This method builds the summary iteratively: it summarizes the first chunk, then updates that summary with each subsequent chunk in turn.

1. Define the prompt template for refining.
2. Load the summarize chain with the refine chain type.
3. Run the summarization on the split documents.
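The refine loop itself is short enough to sketch in plain Python. This is an illustration under stated assumptions rather than LangChain's code: `refine_summarize` is a hypothetical helper, and the two lambdas stand in for the LLM calls that would produce the initial summary and each refinement.

```python
def first_words(text, n):
    """Hypothetical stand-in for an LLM call: keep only the first n words."""
    return " ".join(text.split()[:n])


def refine_summarize(chunks, initial_fn, refine_fn):
    """Summarize the first chunk, then fold each later chunk into the summary."""
    if not chunks:
        return ""
    summary = initial_fn(chunks[0])
    for chunk in chunks[1:]:
        # Each step sees the running summary plus one new chunk
        summary = refine_fn(summary, chunk)
    return summary


chunks = [
    "License terms apply here",
    "Fees are due annually",
    "Termination requires written notice",
]
result = refine_summarize(
    chunks,
    initial_fn=lambda chunk: first_words(chunk, 3),
    refine_fn=lambda summary, chunk: f"{summary}; {first_words(chunk, 2)}",
)
print(result)
```

Unlike Map-Reduce, each call depends on the previous result, so the chunks must be processed one after another.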

In [9]:
from langchain.chains.summarize import load_summarize_chain
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

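# Note: prompt is defined above but not passed here, so the chain uses the default refine prompts;
# pass question_prompt=prompt to apply it to the first chunk.
chain = load_summarize_chain(llm, chain_type="refine")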
chain.run(split_docs)

" This document outlines the terms of a subscription agreement between Subscriber (district/school or organization) and Raptor Technologies LLC (Raptor) for access to Raptor's Subscription Services. The agreement grants Subscriber a limited, non-exclusive license to use the services in accordance with the agreement and applicable laws. Confidential information provided by Raptor must be kept confidential and not disclosed to third parties without prior written consent. Individual's personally identifying information obtained through the services must not be disclosed except as required by law or in the ordinary course of business. Subscriber is responsible for providing its own Internet access and equipment to use the services, and fees are payable annually in advance. The agreement has an initial term of one year, with automatic renewal unless written notice of non-renewal is given.\n\nRaptor disclaims all responsibility for determinations of an individual’s registered sex offender st