# Text Summarization of Large Documents using LangChain 🦜🔗

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/language/use-cases/document-summarization/summarization_large_documents_langchain.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/use-cases/document-summarization/summarization_large_documents_langchain.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/blob/main/language/use-cases/document-summarization/summarization_large_documents_langchain.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
</table>


## Overview

Text summarization is an NLP task that creates a concise and informative summary of a longer text. LLMs can be used to create summaries of news articles, research papers, technical documents, and other types of text.

Summarizing large documents can be challenging. To create summaries, you need to apply summarization strategies to your indexed documents. You have already seen some of these strategies in the previous notebooks. If you haven't completed them, we recommend doing so to get a basic understanding of how to summarize large documents.

In this notebook, you will use LangChain, a framework for developing LLM applications, to apply some summarization strategies. The notebook covers several examples of how to summarize large documents.

### Objective

In this tutorial, you learn how to use LangChain with a large language model (LLM) to summarize large documents by working through the following examples:

- Stuffing method
- MapReduce method
- Refine method

### Costs

This tutorial uses billable components of Google Cloud:
- Vertex AI Generative AI Studio

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Getting Started

### Install dependencies and import libraries

In [None]:
!pip install openai tiktoken chromadb langchain

Collecting openai
  Downloading openai-1.3.3-py3-none-any.whl (220 kB)
Collecting tiktoken
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Collecting chromadb
  Downloading chromadb-0.4.17-py3-none-any.whl (496 kB)
Collecting langchain
  Downloading langchain-0.0.338-py3-none-any.whl (2.0 MB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.25.1-py3-none-any.whl (75 kB)

In [None]:
!pip install openai==0.28.1

Collecting openai==0.28.1
  Downloading openai-0.28.1-py3-none-any.whl (76 kB)
Installing collected packages: openai
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
Successfully installed openai-0.28.1


In [None]:
!pip show openai

Name: openai
Version: 1.3.3
Summary: The official Python library for the openai API
Home-page: 
Author: 
Author-email: OpenAI <support@openai.com>
License: 
Location: /usr/local/lib/python3.10/dist-packages
Requires: anyio, distro, httpx, pydantic, tqdm, typing-extensions
Required-by: llmx


In [None]:
import warnings
from pathlib import Path as p

import pandas as pd
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

warnings.filterwarnings("ignore")

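### Load the model

You load a base causal language model with `transformers`, apply the fine-tuned LoRA adapter `falcon_1ft_20231127_134856-z4pob5q6` with PEFT, and wrap the result in a custom LangChain `LLM` class so that the summarization chains can call it.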

In [None]:
from typing import Any, List, Mapping, Optional

import torch
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "falcon_1ft_20231127_134856-z4pob5q6"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model referenced by the adapter config, plus its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)


class CustomLLM(LLM):
    """LangChain wrapper around the fine-tuned model loaded above."""

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
        output = model.generate(
            input_ids=input_ids,
            temperature=0.7,
            do_sample=True,
            top_p=0.95,
            top_k=40,
            max_new_tokens=1024,
        )
        # Decode only the newly generated tokens so the prompt is not echoed into the summary
        generated = output[0][input_ids.shape[1]:]
        return tokenizer.decode(generated, skip_special_tokens=True)

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"peft_model_id": peft_model_id}

In [None]:
llm_gpt_turbo = CustomLLM()

## Summarization with Large Documents

### Preparing data files

To begin, you load the Markdown file to summarize (here `README_sizelim.md`), convert it to plain text, and wrap it in a LangChain `Document`.

In [None]:
import re

import markdown
from bs4 import BeautifulSoup
from langchain.schema import Document


# Function to extract plain text from a Markdown file
def extract_text_from_readme(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        markdown_content = file.read()

    # Convert the Markdown to HTML, then strip the HTML tags to get plain text
    html_content = markdown.markdown(markdown_content)
    soup = BeautifulSoup(html_content, "html.parser")
    return soup.get_text()


# Specify the path to your README.md file
# readme_path = 'drive/MyDrive/README.md'
readme_path = "README_sizelim.md"

# Extract text from the README.md file
readme_text = extract_text_from_readme(readme_path)

# Collapse runs of three or more line breaks into a single blank line
readme_text = re.sub(r"[\r\n][\r\n]{2,}", "\n\n", readme_text)

readme_doc = Document(
    page_content=readme_text,
    metadata={"source": readme_path},
)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Method 1: Stuffing

Stuffing is the simplest method to pass data to a language model. It "stuffs" all of the text into the prompt as context, so that the model can process all of the relevant information in a single call.

In LangChain, you can use `StuffDocumentsChain` as part of the `load_summarize_chain` method. All you need to do is set `stuff` as the `chain_type` of your chain.
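As an illustration of what the `stuff` chain type does, here is a minimal, library-free sketch of the stuffing pattern. It is not LangChain's implementation: the `llm` argument is a hypothetical stand-in (any function from a prompt string to a completion), and the prompt mirrors the bullet-point prompt used in this notebook. All documents are joined into one context and sent to the model in a single call.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM: any function mapping a prompt string to a completion.
LLMFn = Callable[[str], str]

STUFF_PROMPT = (
    "Write a concise summary of the following text delimited by triple backquotes.\n"
    "Return your response in bullet points which covers the key points of the text.\n"
    "```{text}```\n"
    "BULLET POINT SUMMARY:"
)


def stuff_summarize(docs: List[str], llm: LLMFn, template: str = STUFF_PROMPT) -> str:
    """Stuff every document into one prompt and make exactly one LLM call."""
    context = "\n\n".join(docs)  # blank line between documents
    return llm(template.format(text=context))


# Usage with a fake LLM that only reports how long its prompt was.
print(stuff_summarize(["First page.", "Second page."], lambda prompt: f"- {len(prompt)} characters"))
```

Because everything goes into one call, the whole prompt (instructions plus all documents) has to fit in the model's context window.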

### Prompt design with `Stuffing` chain

In [None]:
prompt_template = """Write a concise summary of the following text delimited by triple backquotes.
              Return your response in bullet points which covers the key points of the text.
              ```{text}```
              BULLET POINT SUMMARY:
  """

prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

### Generate summaries using Stuffing method
Initiate a chain using the `stuff` method and run it on the README document.

In [None]:
stuff_chain = load_summarize_chain(llm_gpt_turbo, chain_type="stuff", prompt=prompt)

In [None]:
stuff_chain.run([readme_doc])

'- LangChain is a library for building applications with Large Language Models (LLMs) through composability.\n- LangSmith is a developer platform for building, testing, and monitoring LLM applications.\n- Select chains (SQLDatabase) will be moved to langchain_experimental for a leaner and safer LangChain.\n- Quick installation can be done with pip install langchain or pip install langsmith && conda install langchain -c conda-forge.\n- LangChain aims to assist in the development of applications such as question answering, chatbots, and agents.\n- LangChain provides documentation, examples, and resources for getting started and using the library.\n- LangChain helps with prompt management, chains, data augmented generation, agents, memory, and evaluation.\n- Contributions to the project are welcome.'

In [None]:
try:
    # The chain expects a list of Documents; passing a bare Document raises
    # "'tuple' object has no attribute 'page_content'"
    print(stuff_chain.run([readme_doc]))
except Exception as e:
    print(
        "The chain failed, for example because the document exceeds the model's context length: ",
        e,
    )


### Considerations

The `stuffing` method is a way to summarize text by feeding the entire document to a large language model (LLM) in a single call. This method has both pros and cons.

The stuffing method only requires a single call to the LLM, which can be faster than other methods that require multiple calls. When summarizing text, the LLM has access to all the data at once, which can result in a better summary.

However, LLMs have a limited context length, which is the maximum number of tokens they can process in a single call. If the document is longer than the context length, the stuffing method will not work. Even for large documents that do fit, stuffing can be slow and may not produce a good summary.
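One way to decide whether stuffing is viable is to count the prompt's tokens before calling the model. The sketch below is a hedged illustration: `count_tokens_approx` is a hypothetical whitespace-based estimate, not a real tokenizer, and the context-window numbers in the usage line are made up. In practice you would pass a real counter, for example LangChain's `llm.get_num_tokens` or `lambda s: len(tokenizer(s).input_ids)` with the Hugging Face tokenizer loaded earlier. For a decoder-only model, the prompt and the generated tokens share the same window, so the sketch reserves room for `max_new_tokens`.

```python
from typing import Callable


def count_tokens_approx(text: str) -> int:
    """Hypothetical estimate: one token per whitespace-separated word (not a real tokenizer)."""
    return len(text.split())


def fits_in_context(
    prompt: str,
    context_window: int,
    max_new_tokens: int,
    count_tokens: Callable[[str], int] = count_tokens_approx,
) -> bool:
    """True if the prompt plus the tokens reserved for generation fit in the context window."""
    return count_tokens(prompt) + max_new_tokens <= context_window


def choose_chain_type(
    prompt: str,
    context_window: int,
    max_new_tokens: int,
    count_tokens: Callable[[str], int] = count_tokens_approx,
) -> str:
    """Pick "stuff" when everything fits in one call, else fall back to "map_reduce"."""
    if fits_in_context(prompt, context_window, max_new_tokens, count_tokens):
        return "stuff"
    return "map_reduce"


# Usage with made-up numbers: a 2,000-word prompt does not fit in a 2,048-token window
# once 1,024 tokens are reserved for the answer.
print(choose_chain_type("word " * 2000, context_window=2048, max_new_tokens=1024))
```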

Let's explore other approaches that help when the text is longer than the context length limit of the LLM.

## Method 2: MapReduce

The `MapReduce` method implements a multi-stage summarization. It is a technique for summarizing large pieces of text by first summarizing smaller chunks of text and then combining those summaries into a single summary.

In LangChain, you can use `MapReduceDocumentsChain` as part of the `load_summarize_chain` method. All you need to do is set `map_reduce` as the `chain_type` of your chain.
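The two stages described above can be sketched without LangChain as follows. This is a simplified illustration, not the library's implementation: `llm` is a hypothetical prompt-to-completion function, the prompts mirror the ones used later in this notebook, and LangChain's extra "collapse" step for very long lists of summaries is left out. Because each chunk is summarized independently, the map calls can run in parallel (here with a thread pool); the returned keys mirror the `intermediate_steps` and `output_text` keys of the chain outputs shown later.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Union

# Hypothetical stand-in for an LLM: any function mapping a prompt string to a completion.
LLMFn = Callable[[str], str]

MAP_PROMPT = (
    "Write a summary of this chunk of text that includes the main points and any important details.\n"
    "{text}"
)
COMBINE_PROMPT = (
    "Write a summary of around 400 words for the following text delimited by triple backquotes.\n"
    "```{text}```\n"
    "SUMMARY:"
)


def map_reduce_summarize(
    chunks: List[str], llm: LLMFn, max_workers: int = 4
) -> Dict[str, Union[str, List[str]]]:
    # Map: summarize each chunk independently; pool.map keeps the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial = list(pool.map(lambda chunk: llm(MAP_PROMPT.format(text=chunk)), chunks))
    # Reduce: combine all partial summaries with a second prompt in one final call.
    final = llm(COMBINE_PROMPT.format(text="\n\n".join(partial)))
    return {"intermediate_steps": partial, "output_text": final}
```

With `n` chunks this makes `n + 1` model calls: one per chunk plus the final combine call.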

### Prompt design with `MapReduce` chain

In this example, you summarize the README document prepared above.

With LangChain, the `map_reduce` chain runs the first-stage (map) prompt you define on each input document to generate a summary of that document. Note that `load_summarize_chain` does not split text by itself: it maps over the documents you pass in, so a long document should first be split into chunks (for example with a LangChain text splitter). In the example below, you use the following map prompt.

```
Write a summary of this chunk of text that includes the main points and any important details.
{text}
```

Once summaries for all of the chunks are generated, the chain runs a different prompt to combine those summaries into a single summary. In the example below, you use the following second-stage (combine) prompt.

````
Write a summary of around 400 words for the following text delimited by triple backquotes.
```{text}```
SUMMARY:
````
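To give the map step several chunks to work on, a long document has to be split first. Below is a minimal, library-free sketch of a fixed-window splitter with overlap. It is an assumption-laden simplification: it measures length in characters rather than tokens and cuts at arbitrary positions. LangChain's `RecursiveCharacterTextSplitter` (configured with `chunk_size` and `chunk_overlap`, and offering a `split_documents` method) also measures characters by default but prefers to cut on paragraph and line boundaries.

```python
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of at most chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap repeats a little text at each boundary, so a sentence cut between two chunks still appears whole in at least one of them when it is shorter than the overlap.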

In [None]:
map_prompt_template = """
                      Write a summary of this chunk of text that includes the main points and any important details.
                      {text}
                      """

map_prompt = PromptTemplate(template=map_prompt_template, input_variables=["text"])

combine_prompt_template = """
                      Write a summary of around 400 words for the following text delimited by triple backquotes.
                      ```{text}```
                      SUMMARY:
                      """
combine_prompt = PromptTemplate(
    template=combine_prompt_template, input_variables=["text"]
)

### Generate summaries using MapReduce method

After defining prompts, you initialize the associated `map_reduce_chain`.

In [None]:
map_reduce_chain = load_summarize_chain(
    llm_gpt_turbo,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
    return_intermediate_steps=True,
)

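Then, you generate summaries using the chain. Note that, by default, LangChain counts tokens with the GPT-2 tokenizer from the `transformers` library. That tokenizer reports a maximum sequence length of 1024 tokens, so you may see warnings about sequences longer than 1024 tokens when long inputs are counted.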
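Token counting matters in the reduce stage: when the partial summaries are too long to combine in one call, they have to be merged in batches first. LangChain's `ReduceDocumentsChain` uses a similar budget (its `token_max` setting), but the sketch below is only a simplified, library-free illustration of the idea, not its implementation. `count_tokens_approx` is a hypothetical whitespace-based counter and `llm` is a hypothetical prompt-to-completion function.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM: any function mapping a prompt string to a completion.
LLMFn = Callable[[str], str]


def count_tokens_approx(text: str) -> int:
    # Stand-in tokenizer (assumption): count whitespace-separated words.
    return len(text.split())


def total_tokens(texts: List[str]) -> int:
    return sum(count_tokens_approx(t) for t in texts)


def collapse(summaries: List[str], llm: LLMFn, token_max: int) -> List[str]:
    """Merge summaries batch by batch until they fit in token_max together."""
    while len(summaries) > 1 and total_tokens(summaries) > token_max:
        # Greedily pack consecutive summaries into batches of at most token_max tokens.
        batches: List[List[str]] = [[]]
        for s in summaries:
            if batches[-1] and total_tokens(batches[-1] + [s]) > token_max:
                batches.append([])
            batches[-1].append(s)
        collapsed = [llm("\n\n".join(batch)) for batch in batches]
        if total_tokens(collapsed) >= total_tokens(summaries):
            raise ValueError("collapsing did not shrink the summaries; use smaller chunks")
        summaries = collapsed
    return summaries


# Usage with a fake LLM that keeps only the first word of its input.
print(collapse(["a b c", "d e f", "g h i", "j k l"], lambda p: p.split()[0], token_max=6))
```

The no-progress check stops the loop from running forever if the model's summaries do not get shorter.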

In [None]:
map_reduce_outputs = map_reduce_chain({"input_documents": [readme_doc]})

In [None]:
map_reduce_outputs

{'input_documents': [Document(page_content='Size Limit \n\nSize Limit is a performance budget tool for JavaScript. It checks every commit\non CI, calculates\xa0the real cost of\xa0your JS for end-users and throws an error\nif the cost exceeds the\xa0limit.\n\nES modules and tree-shaking support.\nAdd Size Limit to GitHub Actions, Circle CI or another CI system\n  to know if a pull request adds a\xa0massive\xa0dependency.\nModular to fit different use cases: big JS applications\n  that use their own bundler or\xa0small\xa0npm\xa0libraries\xa0with\xa0many files.\nCan calculate the time it would take a browser\n  to download and execute your JS. Time\xa0is\xa0a\xa0much\xa0more\xa0accurate\n  and\xa0understandable metric compared to the size in bytes.\nCalculations include all dependencies and polyfills\n  used in your JS.\n\n\n\n\nWith GitHub action Size Limit will post bundle size changes as a comment\nin pull request discussion.\n\n\n\nWith --why, Size Limit can tell you why your librar

In [None]:
map_reduce_outputs["input_documents"][0].page_content

In [None]:
map_reduce_outputs['output_text']

'Size Limit is a performance budget tool for JavaScript that helps developers keep track of the size of their JavaScript files. It is designed to ensure that the size of the JavaScript code does not exceed a predefined limit, thus optimizing the performance of the application.\n\nThe tool works by analyzing the size of JavaScript files and comparing them to a specified limit. If the size exceeds the limit, Size Limit throws an error, alerting the developer to the issue. This allows developers to catch and address any potential performance bottlenecks early on in the development process.\n\nOne of the key features of Size Limit is its simplicity. It is easy to set up and use, making it accessible to developers of all skill levels. The tool can be integrated into existing projects with minimal effort, and it provides clear and concise error messages that make it easy to identify and fix any size-related issues.\n\nSize Limit also offers flexibility in terms of configuration. Developers c

After summaries are generated, you can validate them by organizing the input documents and their associated outputs in a pandas DataFrame.

In [None]:
final_mp_data = []
for i, (doc, out) in enumerate(
    zip(map_reduce_outputs["input_documents"], map_reduce_outputs["intermediate_steps"])
):
    output = {}
    output["file_name"] = p(doc.metadata["source"]).stem
    output["file_type"] = p(doc.metadata["source"]).suffix
    # Markdown input has no "page" metadata; fall back to the chunk position
    output["page_number"] = doc.metadata.get("page", i)
    output["chunks"] = doc.page_content
    output["concise_summary"] = out
    final_mp_data.append(output)

In [None]:
pdf_mp_summary = pd.DataFrame.from_dict(final_mp_data)
pdf_mp_summary = pdf_mp_summary.sort_values(
    by=["file_name", "page_number"]
)  # sorting the dataframe by filename and page_number
pdf_mp_summary.reset_index(inplace=True, drop=True)
pdf_mp_summary.head()

In [None]:
index = 0  # must be a valid row; a single input document produces only one row
print("[Context]")
print(pdf_mp_summary["chunks"].iloc[index])
print("\n\n [Simple Summary]")
print(pdf_mp_summary["concise_summary"].iloc[index])
print("\n\n [Page number]")
print(pdf_mp_summary["page_number"].iloc[index])
print("\n\n [Source: file_name]")
print(pdf_mp_summary["file_name"].iloc[index])

### Considerations

With the `MapReduce` method, the model can summarize a large document despite the context limit that constrains the `Stuffing` method, and the map step can run in parallel.

However, `MapReduce` requires multiple calls to the model and can lose context between chunks, because each chunk is summarized independently.

To deal with this challenge, you can try another method that carries the summary from one chunk to the next.

## Method 3: Refine

The Refine method is an alternative method to deal with large document summarization. It works by first running an initial prompt on a small chunk of data, generating some output. Then, for each subsequent document, the output from the previous document is passed in along with the new document, and the LLM is asked to refine the output based on the new document.

In LangChain, you can use `RefineDocumentsChain` as part of the `load_summarize_chain` method. All you need to do is set `refine` as the `chain_type` of your chain.
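The refine loop described above can be sketched without LangChain as follows. This is a simplified illustration under stated assumptions: `llm` is a hypothetical prompt-to-completion function, and the prompts are shortened versions of the ones used in this notebook. The running summary is passed as `existing_answer`, which is the variable name LangChain's refine chain uses for it by default. Each call depends on the previous one, so the calls must run sequentially.

```python
from typing import Callable, Dict, List, Union

# Hypothetical stand-in for an LLM: any function mapping a prompt string to a completion.
LLMFn = Callable[[str], str]

QUESTION_PROMPT = "Please provide a summary of the following text.\nTEXT: {text}\nSUMMARY:"
REFINE_PROMPT = (
    "Here is an existing summary: {existing_answer}\n"
    "Refine it with the following additional text.\n"
    "TEXT: {text}\nSUMMARY:"
)


def refine_summarize(chunks: List[str], llm: LLMFn) -> Dict[str, Union[str, List[str]]]:
    if not chunks:
        raise ValueError("need at least one chunk")
    # Initial pass: summarize only the first chunk.
    summary = llm(QUESTION_PROMPT.format(text=chunks[0]))
    steps = [summary]
    # Each later call sees the running summary plus the next chunk.
    for chunk in chunks[1:]:
        summary = llm(REFINE_PROMPT.format(existing_answer=summary, text=chunk))
        steps.append(summary)
    return {"intermediate_steps": steps, "output_text": summary}


def echo_llm(prompt: str) -> str:
    # Fake LLM for illustration: returns the text between "TEXT: " and "\nSUMMARY:".
    return prompt.split("TEXT: ", 1)[1].split("\nSUMMARY:", 1)[0]


print(refine_summarize(["part 1", "part 2"], echo_llm)["intermediate_steps"])  # ['part 1', 'part 2']
```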

### Prompt design with `Refine` chain

With LangChain, the `refine` chain requires two prompts:

- The question prompt generates the initial summary from the first document.
- The refine prompt updates the existing summary based on each subsequent document. LangChain passes the running summary to this prompt as `existing_answer` and the new document as `text`.

In this example, the question prompt is:

```
Please provide a summary of the following text.
TEXT: {text}
SUMMARY:
```

and the refine prompt is:

````
Here is an existing summary: {existing_answer}
Refine the existing summary with the following additional text delimited by triple backquotes.
Return your response in bullet points that cover the key points of the text.
```{text}```
BULLET POINT SUMMARY:
````


In [None]:
question_prompt_template = """
                  Please provide a summary of the following text.
                  TEXT: {text}
                  SUMMARY:
                  """

question_prompt = PromptTemplate(
    template=question_prompt_template, input_variables=["text"]
)

# The refine prompt must include {existing_answer}; otherwise the previous summary never reaches the model
refine_prompt_template = """
              Here is an existing summary: {existing_answer}
              Refine the existing summary with the following additional text delimited by triple backquotes.
              Return your response in bullet points that cover the key points of the text.
              ```{text}```
              BULLET POINT SUMMARY:
              """

refine_prompt = PromptTemplate(
    template=refine_prompt_template, input_variables=["existing_answer", "text"]
)

### Generate summaries using Refine method

After defining the prompts, you initialize a summarization chain with the `refine` chain type.

In [None]:
refine_chain = load_summarize_chain(
    llm_gpt_turbo,
    chain_type="refine",
    question_prompt=question_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
)

Then, you use the summarization chain to summarize the document with the Refine method.

In [None]:
refine_outputs = refine_chain({"input_documents": [readme_doc]})

In [None]:
refine_outputs

{'input_documents': [Document(page_content='🦜️🔗 LangChain\n⚡ Building applications with LLMs through composability ⚡\n\n\n\n\n\n\n\n\n\n\n\n\nLooking for the JS/TS version? Check out LangChain.js.\nTo help you ship LangChain apps to production faster, check out LangSmith. \nLangSmith is a unified developer platform for building, testing, and monitoring LLM applications. \nFill out this form to get off the waitlist or speak with our sales team\n🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23\nIn an effort to make langchain leaner and safer, we are moving select chains to langchain_experimental.\nThis migration has already started, but we are remaining backwards compatible until 7/28.\nOn that date, we will remove functionality from langchain.\nRead more about the motivation and the progress here.\nRead how to migrate your code here.\nQuick Install\npip install langchain\nor\npip install langsmith && conda install langchain -c conda-forge\n🤔 What is this?\nLarge language mod

Below you can see the resulting summaries.

In [None]:
final_refine_data = []
for i, (doc, out) in enumerate(
    zip(refine_outputs["input_documents"], refine_outputs["intermediate_steps"])
):
    output = {}
    output["file_name"] = p(doc.metadata["source"]).stem
    output["file_type"] = p(doc.metadata["source"]).suffix
    # Markdown input has no "page" metadata; fall back to the chunk position
    output["page_number"] = doc.metadata.get("page", i)
    output["chunks"] = doc.page_content
    output["concise_summary"] = out
    final_refine_data.append(output)

In [None]:
pdf_refine_summary = pd.DataFrame.from_dict(final_refine_data)
pdf_refine_summary = pdf_refine_summary.sort_values(
    by=["file_name", "page_number"]
)  # sorting the dataframe by filename and page_number
pdf_refine_summary.reset_index(inplace=True, drop=True)
pdf_refine_summary.head()

In [None]:
index = 0  # must be a valid row; a single input document produces only one row
print("[Context]")
print(pdf_refine_summary["chunks"].iloc[index])
print("\n\n [Simple Summary]")
print(pdf_refine_summary["concise_summary"].iloc[index])
print("\n\n [Page number]")
print(pdf_refine_summary["page_number"].iloc[index])
print("\n\n [Source: file_name]")
print(pdf_refine_summary["file_name"].iloc[index])

### Considerations

In short, the Refine method for text summarization with LLMs can pull in more relevant context and may be less lossy than MapReduce. However, it requires many more calls to the LLM than Stuffing, and these calls are not independent, meaning they cannot be parallelized. Additionally, the result can depend on the order of the documents: because the summary is rewritten at every step, the method can suffer from recency bias and give more weight to the later documents.

## Conclusion


In this notebook, you learned about different techniques to summarize long documents with LangChain. What you have seen are only some of the possibilities. For example, there is another method called Map-Rerank, which runs an initial prompt on each chunk of data that not only tries to complete a task but also gives a score for how certain it is in its answer. The responses are then ranked according to this score, and the highest-scoring response is returned.
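The Map-Rerank idea can be sketched without LangChain as follows (LangChain implements it as `MapRerankDocumentsChain`, mainly for question answering). This is a hedged illustration, not the library's implementation: `llm` is a hypothetical prompt-to-completion function, and the prompt wording and the `"<answer>\nScore: <number>"` output format are assumptions made for this sketch. A real model's output would need a similar parsing step.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical stand-in for an LLM: any function mapping a prompt string to a completion.
LLMFn = Callable[[str], str]


def parse_answer_and_score(completion: str) -> Tuple[str, float]:
    # Assumed output format: "<answer>\nScore: <number>"; unparseable output scores 0.
    match = re.search(r"^(.*?)\s*Score:\s*(\d+(?:\.\d+)?)\s*$", completion, re.DOTALL)
    if match is None:
        return completion.strip(), 0.0
    return match.group(1).strip(), float(match.group(2))


def map_rerank(chunks: List[str], question: str, llm: LLMFn) -> Tuple[str, float]:
    """Ask the same question of every chunk and keep the highest-scoring answer."""
    if not chunks:
        raise ValueError("need at least one chunk")
    scored = [
        parse_answer_and_score(
            llm(f"Context: {chunk}\nQuestion: {question}\nAnswer, then give a line 'Score: 0-100'.")
        )
        for chunk in chunks
    ]
    return max(scored, key=lambda pair: pair[1])
```

Unlike MapReduce, nothing is combined: the answer from a single chunk wins, which suits questions whose answer lives in one place in the document.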

With that being said, depending on your needs, you may consider using a foundation model directly with a custom framework to build your generative AI application.

Here are some of the benefits of using a foundational model with a custom framework:

 - More flexibility to implement your application with different LLMs, prompting templates, document handling strategies and more.

 - More control to customize your generative applications based on your scenario.

 - Better performance to improve latency and scalability of your application.
