<a href="https://colab.research.google.com/github/aakash563/GenAI-Project/blob/main/Summarization_using_OpenSource_LLM_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/summarization.ipynb)

## Use case

Suppose you have a set of documents (PDFs, Notion pages, customer questions, etc.) and you want to summarize the content.

LLMs are a great tool for this given their proficiency in understanding and synthesizing text.

In this walkthrough we'll go over how to perform document summarization using LLMs.

![Image description](https://github.com/langchain-ai/langchain/blob/master/docs/static/img/summarization_use_case_1.png?raw=1)

## Overview

A central question for building a summarizer is how to pass your documents into the LLM's context window. Two common approaches for this are:

1. `Stuff`: Simply "stuff" all your documents into a single prompt. This is the simplest approach (see [here](/docs/modules/chains#lcel-chains) for more on the `create_stuff_documents_chain` constructor, which is used for this method).

2. `Map-reduce`: Summarize each document on its own in a "map" step and then "reduce" the summaries into a final summary (see [here](/docs/modules/chains#legacy-chains) for more on the `MapReduceDocumentsChain`, which is used for this method).
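To make the difference concrete, here is a minimal, library-free sketch of both strategies. `fake_llm` and `CountingLLM` are hypothetical stand-ins for a real model call (they only truncate the prompt). The sketch shows the shape of the calls: stuff makes one call over all documents, while map-reduce makes one call per document plus one final combining call.

```python
from typing import Callable, List


def fake_llm(prompt: str, max_words: int = 12) -> str:
    """Hypothetical stand-in for a model call: keeps the first few words."""
    return " ".join(prompt.split()[:max_words])


class CountingLLM:
    """Wraps fake_llm and records how many times the 'model' is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return fake_llm(prompt)


def stuff_summarize(docs: List[str], llm: Callable[[str], str]) -> str:
    # Stuff: all documents go into a single prompt, so exactly one model call.
    prompt = "Write a concise summary of the following:\n" + "\n\n".join(docs)
    return llm(prompt)


def map_reduce_summarize(docs: List[str], llm: Callable[[str], str]) -> str:
    # Map: summarize each document on its own (one call per document).
    partial_summaries = [llm("Summarize: " + doc) for doc in docs]
    # Reduce: merge the partial summaries with one more call.
    return llm("Combine these summaries:\n" + "\n".join(partial_summaries))


sample_docs = ["First document text.", "Second document text.", "Third document text."]
stuff_llm, map_reduce_llm = CountingLLM(), CountingLLM()
stuff_summarize(sample_docs, stuff_llm)
map_reduce_summarize(sample_docs, map_reduce_llm)
print(stuff_llm.calls, map_reduce_llm.calls)  # 1 4
```

Stuff is cheapest in calls but only works while the combined text fits the context window; map-reduce scales to many documents at the cost of extra calls.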

![Image description](https://github.com/langchain-ai/langchain/blob/master/docs/static/img/summarization_use_case_2.png?raw=1)

## Quickstart

To give you a sneak preview, either pipeline can be wrapped in a single object: `load_summarize_chain`.

Suppose we want to summarize a PDF document. We can do this in a few lines of code.

First set environment variables and install packages:

In [None]:
!pip install --upgrade --quiet langchain-openai tiktoken chromadb langchain

# Only needed for OpenAI models: set the OPENAI_API_KEY env var
# or load it from a .env file
# import dotenv

# dotenv.load_dotenv()


We can use `chain_type="stuff"`, especially if using larger context window models such as:

* 16k token OpenAI `gpt-3.5-turbo-1106`
* 100k token Anthropic [Claude-2](https://www.anthropic.com/index/claude-2)

We can also supply `chain_type="map_reduce"` or `chain_type="refine"` (read more [here](/docs/modules/chains/document/refine)).
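A rough way to choose between these options is to estimate whether the whole input fits in the model's context window. The sketch below is hypothetical: `estimate_tokens` uses the common rule of thumb of roughly 4 characters per token for English text, and `reserve_for_output` leaves room for the generated summary. For real budgets, count tokens with the model's own tokenizer.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Assumption: ~4 characters per token, a rough average for English text.
    return max(1, len(text) // 4)


def pick_chain_type(
    docs: List[str], context_window: int, reserve_for_output: int = 512
) -> str:
    """Return "stuff" if everything fits in one prompt, else "map_reduce"."""
    total_tokens = sum(estimate_tokens(doc) for doc in docs)
    budget = context_window - reserve_for_output  # room left for the input
    return "stuff" if total_tokens <= budget else "map_reduce"


# A ~1,000-token input fits a 16k window; ~50,000 tokens does not.
print(pick_chain_type(["x" * 4_000], context_window=16_000))       # stuff
print(pick_chain_type(["x" * 40_000] * 5, context_window=16_000))  # map_reduce
```

`refine` is another option for long inputs; it processes chunks one after another instead of in a parallel map step.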

In [None]:
# Load the LLM: an open-source model (Zephyr-7B-beta) quantized to 4 bits
!pip install -q torch transformers accelerate bitsandbytes sentence-transformers
!pip install -q langchain langchain-community huggingface_hub
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "HuggingFaceH4/zephyr-7b-beta"

# 4-bit NF4 quantization with double quantization to reduce GPU memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline

# Low-temperature sampling keeps summaries focused; max_new_tokens caps their length
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=400,
)

# Wrap the transformers pipeline so LangChain chains can use it as an LLM
llm = HuggingFacePipeline(pipeline=text_generation_pipeline)


In [None]:
!pip install pypdf

from langchain_community.document_loaders import PyPDFLoader

# PyPDFLoader returns one Document per PDF page
loader = PyPDFLoader("/content/Aakash_Singh_Resume_IITB.pdf")
docs = loader.load()



In [None]:
print(len(docs))
docs[0]

1


Document(page_content='Aakash Kumar Singh Email : aakash0563@gmail.com\nhttps://www.linkedin.com/in/aakash-singh-70a426171/ Mobile : +91-6204468101\nEducation\n•Indian Institute of Technology Bombay Mumbai, India\nBachelor of Technology; Jun. 2018 - Jun. 2022\nProfessional Experience\n•HDFC Life Bangalore , India\nSenior Data Scientist July 2022 - Present\n◦Intelligent Document Summarization and Retrieval System : Orchestrated integration of Flask, MongoDB,\nand AWS services for efficient document processing.\n∗Implemented threading for parallel indexing and retrieval, enhancing system performance and scalability.\n∗Leveraged advanced NLP techniques to develop a document summarization tool using Hugging Face models.\n∗Integrated MongoDB for seamless data storage and retrieval, ensuring data integrity and reliability.\n∗Engineered a scalable architecture for rapid document summarization and retrieval, streamlining workflow processes.\n◦Flask-powered StableBeluga Conversational AI Bot : 

In [None]:
from langchain.chains.summarize import load_summarize_chain
chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(docs)



' Aakash Kumar Singh is a Senior Data Scientist at HDFC Life with expertise in NLP and computer vision. He has developed intelligent document summarization and retrieval systems using Flask, MongoDB, and AWS services, implemented threading for parallel indexing and retrieval, and utilized advanced NLP techniques to develop a document summarization tool using Hugging Face models. He has also built a Flask-powered StableBeluga Conversational AI Bot, employed quantization techniques to optimize model performance and ensure efficient memory usage, and integrated conversation persistence to maintain dialogue continuity and context awareness. Additionally, he has developed a Flask-based QnA bot, engineered a real-time text summarization web application using Facebook’s BART model, and created a robust document classifier using DenseNet121, achieving exceptional accuracy by fine-tuning and customizing model architecture. His technical skills include proficiency in Python, Langchain, ChromaDB,

## Option 1. Stuff

When we use `load_summarize_chain` with `chain_type="stuff"`, we will use the [StuffDocumentsChain](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.StuffDocumentsChain.html#langchain.chains.combine_documents.stuff.StuffDocumentsChain).

The chain takes a list of documents, inserts them all into a prompt, and passes that prompt to an LLM:

In [None]:
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate

# Define prompt
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# Define LLM chain

llm_chain = LLMChain(llm=llm, prompt=prompt)

# Define StuffDocumentsChain
stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")

docs = loader.load()
print(stuff_chain.run(docs))



 Aakash Kumar Singh is a Senior Data Scientist at HDFC Life with expertise in NLP, computer vision, and deep learning. He has developed intelligent document summarization and retrieval systems, conversational AI bots, and text summarization applications using Flask, MongoDB, AWS, and Hugging Face models. His projects include real-time text summarization web applications, report analyzers, and multi-model document classification systems with exceptional accuracy. He has also taken relevant courses in large language models, generative AI, NLP, machine learning, deep learning, and computer vision.



Great! The result closely matches the earlier `load_summarize_chain` output; small differences come from sampling (`do_sample=True`, `temperature=0.2`).
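Under the hood, the stuff step is mostly string assembly: the page contents are joined into one string and substituted into the `{text}` slot of the prompt. The sketch below reproduces that idea without LangChain. `Doc` is a hypothetical stand-in for LangChain's `Document`, and joining pages with a blank line (`"\n\n"`) is our assumption about the default separator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

PROMPT_TEMPLATE = 'Write a concise summary of the following:\n"{text}"\nCONCISE SUMMARY:'


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_stuff_prompt(docs: List[Doc], separator: str = "\n\n") -> str:
    # Concatenate every page, then drop the result into the {text} slot.
    combined = separator.join(doc.page_content for doc in docs)
    return PROMPT_TEMPLATE.format(text=combined)


pages = [Doc("Page one.", {"page": 0}), Doc("Page two.", {"page": 1})]
print(build_stuff_prompt(pages))
```

Because everything lands in a single prompt, that prompt plus the `max_new_tokens=400` budget for the answer must fit in the model's context window.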

### Go deeper

* You can easily customize the prompt.
* You can easily try different LLMs (e.g., [Claude](/docs/integrations/chat/anthropic)) via the `llm` parameter.

In [None]:
!pip install pypdf
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/Fundamental_Rights.pdf")
docs = loader.load()
print(len(docs))
docs[0]

38


Document(page_content='CHAPTER 3  \n  \nFUNDAMENTAL RIGHTS, DIRECTIVE PRINCIPLES AND \nFUNDAMENTAL DUTIES  \n  \nCONTENTS  \n  \n  A. Fundamentals of the Constitution  \n  \n3.2 \n3.3  B. Vision of Socio -Economic Change  \n\uf0d8      The Preamble  \n\uf0d8      The Socio -Economic Agenda  \n  \n  \n3.4 \n3.5 \n3.6 \n3.7 \n3.8 \n3.9 \n  \n3.10 \n  \n3.11 C. Fundamental Rights  \n\uf0d8      Background and Approach  \n\uf0d8      Definition of „the State‟  \n\uf0d8      Heads of Discrimination  \n\uf0d8      Reservation for Minorities  \n\uf0d8      Freedom of Press and Freedom of Information  \n\uf0d8      Rights against Torture and Inhuman, Degrading and Cruel Treatment and \nPunishment  \n\uf0d8      Right to Compensation for being Illegally Deprived of one‟s Right to Life or \nLiberty  \n\uf0d8      Right to Travel Abroad and Return to one‟s Country  ', metadata={'source': '/content/Fundamental_Rights.pdf', 'page': 0})

## Option 2. Map-Reduce

Let's unpack the map-reduce approach. For this, we'll first map each document to an individual summary using an `LLMChain`. Then we'll use a `ReduceDocumentsChain` to combine those summaries into a single global summary.

First, we specify the LLMChain to use for mapping each document to an individual summary:

In [None]:
from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.text_splitter import CharacterTextSplitter


# Map
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)
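Conceptually, the map chain above is applied to each chunk independently, so every call sees only one chunk. The sketch below shows that loop in plain Python with the same map template; `echo_docs_line` is a hypothetical stub that returns the chunk line instead of calling a model.

```python
from typing import Callable, List

MAP_TEMPLATE = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes
Helpful Answer:"""


def map_step(chunks: List[str], llm: Callable[[str], str]) -> List[str]:
    # One independent model call per chunk; the calls could run in parallel.
    return [llm(MAP_TEMPLATE.format(docs=chunk)) for chunk in chunks]


def echo_docs_line(prompt: str) -> str:
    """Hypothetical stub model: returns the line holding the chunk (line 2)."""
    return prompt.splitlines()[1]


print(map_step(["chunk A", "chunk B"], echo_docs_line))  # ['chunk A', 'chunk B']
```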

We can also use the Prompt Hub to store and fetch prompts.

This will work with your [LangSmith API key](https://docs.smith.langchain.com/).

For example, see the map prompt [here](https://smith.langchain.com/hub/rlm/map-prompt).

In [None]:
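# Colab workaround: after some libraries load, `!pip` can fail with a
# "UTF-8 locale is required" error; force the preferred encoding to UTF-8.
import locale
locale.getpreferredencoding = lambda: "UTF-8"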

In [None]:
!pip install langchainhub
from langchain import hub

map_prompt = hub.pull("rlm/map-prompt")
map_chain = LLMChain(llm=llm, prompt=map_prompt)



The `ReduceDocumentsChain` takes the results of the map step and reduces them into a single output. It wraps a generic `CombineDocumentsChain` (like `StuffDocumentsChain`) but adds the ability to collapse documents before passing them to the `CombineDocumentsChain` if their cumulative size exceeds `token_max`. In this example, we can reuse the chain that combines our docs to also collapse them.

So if the cumulative number of tokens in our mapped documents exceeds 4000 tokens, then we'll recursively pass in the documents in batches of < 4000 tokens to our `StuffDocumentsChain` to create batched summaries. And once those batched summaries are cumulatively less than 4000 tokens, we'll pass them all one last time to the `StuffDocumentsChain` to create the final summary.
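The collapse logic can be sketched without LangChain as follows. This is a simplified illustration, not LangChain's source: word counts stand in for tokens, `first_three_words` is a hypothetical stub model, and `max_rounds` is our own guard against a model whose output never gets shorter.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def split_into_batches(summaries: List[str], token_max: int) -> List[List[str]]:
    """Greedily group summaries so each batch stays within token_max."""
    batches: List[List[str]] = []
    current: List[str] = []
    for summary in summaries:
        if count_tokens(summary) > token_max:
            raise ValueError("A single summary is longer than token_max.")
        candidate = current + [summary]
        if current and sum(count_tokens(s) for s in candidate) > token_max:
            batches.append(current)  # close the full batch
            current = [summary]      # start a new batch with this summary
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def collapse_and_reduce(
    summaries: List[str],
    llm: Callable[[str], str],
    token_max: int,
    max_rounds: int = 10,
) -> str:
    # Collapse batches until the total fits, then make one final "stuff" call.
    for _ in range(max_rounds):
        if sum(count_tokens(s) for s in summaries) <= token_max:
            return llm("\n".join(summaries))
        batches = split_into_batches(summaries, token_max)
        summaries = [llm("\n".join(batch)) for batch in batches]
    raise RuntimeError("Summaries did not shrink below token_max.")


def first_three_words(prompt: str) -> str:
    """Hypothetical stub model: 'summarizes' by keeping three words."""
    return " ".join(prompt.split()[:3])


print(collapse_and_reduce(["a b c d e"] * 4, first_three_words, token_max=8))  # a b c
```

Here four 5-word summaries (20 "tokens") exceed `token_max=8`, so they are collapsed over two rounds before the final call.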

In [None]:
# Reduce
reduce_template = """The following is a set of summaries:
{docs}
Take these and distill them into a final, consolidated summary of the main themes.
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)

In [None]:
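# Note: we can also get a prompt from the hub, as noted above. This line pulls
# the *map* prompt ("rlm/map-prompt") and overwrites the reduce_prompt defined
# above, as the output below shows. To keep the reduce-specific prompt, skip this cell.
reduce_prompt = hub.pull("rlm/map-prompt")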

In [None]:
reduce_prompt

ChatPromptTemplate(input_variables=['docs'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['docs'], template='The following is a set of documents:\n{docs}\nBased on this list of docs, please identify the main themes \nHelpful Answer:'))])

In [None]:
# Run chain
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="docs"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is the final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # If documents exceed context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=4000,
)

Combining our map and reduce chains into one:

In [None]:
# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    llm_chain=map_chain,
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=False,
)

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs[:5])
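The cell above splits the first five pages into chunks of roughly 1,000 tokens with no overlap before running the map-reduce chain. As a rough, hypothetical illustration of that kind of splitting (not the `CharacterTextSplitter` implementation), the sketch below splits on blank lines, which we understand to be that splitter's default separator, and greedily merges paragraphs up to `chunk_size`, counting words instead of tiktoken tokens.

```python
from typing import List


def split_text(text: str, chunk_size: int) -> List[str]:
    """Split on blank lines, then merge paragraphs into chunks of <= chunk_size words.

    A single paragraph longer than chunk_size is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for para in paragraphs:
        para_len = len(para.split())  # word count as a stand-in for tokens
        if current and current_len + para_len > chunk_size:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += para_len
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "one two three\n\nfour five\n\nsix seven eight nine"
print(split_text(sample, chunk_size=5))
# ['one two three\n\nfour five', 'six seven eight nine']
```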

In [None]:
print(map_reduce_chain.run(split_docs))





The main themes that emerge from this list of documents include:

1. Protection of fundamental rights, such as freedom of speech, religion, and movement, as well as due process and equal protection under the law.
2. Commitment to social justice and economic democracy, as reflected in directive principles aimed at promoting social security, health, and housing.
3. Obligations imposed on citizens to promote national integration and the spirit of harmony and dignity of women, known as fundamental duties.
4. Recognition of the importance of constitutional justice based on liberty, equality, fraternity, and justice.
5. Integration of fundamental rights and directive principles to realize socio-economic goals through constitutionalization of social and economic rights by the judiciary.
6. Faith in all classes of people, followers of all religions, and traditionally underprivileged to work for harmony, progress, prosperity, and nation building.
7. Bold attempt to base constitutional foundat

### Go deeper

**Customization**

* As shown above, you can customize the LLMs and prompts for map and reduce stages.

**Real-world use-case**

* See [this blog post](https://blog.langchain.dev/llms-to-improve-documentation/) for a case study on analyzing user interactions (questions about LangChain documentation)!
* The blog post and associated [repo](https://github.com/mendableai/QA_clustering) also introduce clustering as a means of summarization.
* This opens up a third path beyond the `stuff` or `map-reduce` approaches that is worth considering.

![Image description](https://github.com/langchain-ai/langchain/blob/master/docs/static/img/summarization_use_case_3.png?raw=1)

## Option 3. Refine

[RefineDocumentsChain](/docs/modules/chains#legacy-chains) is similar to map-reduce:

> The refine documents chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.

This can be easily run with the `chain_type="refine"` specified.
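The refine loop is inherently sequential, since each step needs the previous summary. Below is a minimal sketch, assuming any callable that maps a prompt string to a completion; the two templates are simplified illustrations, not LangChain's default refine prompts.

```python
from typing import Callable, List, Tuple

QUESTION_TEMPLATE = "Write a concise summary of the following:\n{text}\nCONCISE SUMMARY:"
REFINE_TEMPLATE = (
    "Existing summary: {existing_answer}\n"
    "New context: {text}\n"
    "Refined summary:"
)


def refine_summarize(
    chunks: List[str], llm: Callable[[str], str]
) -> Tuple[str, List[str]]:
    """Summarize the first chunk, then fold in each later chunk one at a time."""
    if not chunks:
        raise ValueError("Need at least one chunk.")
    summary = llm(QUESTION_TEMPLATE.format(text=chunks[0]))
    intermediate_steps = [summary]
    for chunk in chunks[1:]:
        # Each call sees the latest summary plus exactly one new chunk.
        summary = llm(REFINE_TEMPLATE.format(existing_answer=summary, text=chunk))
        intermediate_steps.append(summary)
    return summary, intermediate_steps


# An identity "model" makes the nesting visible: each step wraps the previous one.
final, steps = refine_summarize(["a", "b", "c"], lambda prompt: prompt)
print(len(steps))  # 3
```

Unlike the map step, these calls cannot run in parallel, so latency grows linearly with the number of chunks.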

In [None]:
chain = load_summarize_chain(llm, chain_type="refine")
chain.run(split_docs)



'\nGenerate according to: We have provided an existing summary up to a certain point: \nThe summary should be concise and highlight the key points discussed under each sub-heading. Avoid using technical jargon or overly complex language. Use bullet points or numbered lists where appropriate to make the summary easy to read and understand.\nWe have the opportunity to refine the existing summary (only if needed) with some more context below.\n------------\na. Introduction\n  \n• Brief overview of the topic being discussed\n• Importance of the issue\n• Preview of main points to be covered\n\nb. Background information\n  \n• Historical context\n• Key players involved\n• Significant events leading up to current situation\n\nc. Current state of affairs\n  \n• Overview of current situation\n• Statistics or data supporting current state\n• Analysis of current issues and challenges\n\nd. Potential solutions or recommendations\n  \n• List of potential solutions or recommendations\n• Explanation 

It's also possible to supply a prompt and return intermediate steps.
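One detail to watch when a prompt is built from adjacent string literals, as the refine template in the next cell is: Python joins them with no separator, so each fragment must end with its own space or `\n`, or words run together in the prompt the model sees.

```python
# Adjacent string literals inside parentheses are joined with NO separator.
broken = (
    "We have the opportunity to refine the existing summary"
    "(only if needed) with some more context below.\n"
)
fixed = (
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
)

print(repr(broken))  # ...existing summary(only if needed)...
print(repr(fixed))   # ...existing summary (only if needed)...
```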

In [None]:
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary in Italian. "
    "If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate.from_template(refine_template)
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
)
result = chain({"input_documents": split_docs}, return_only_outputs=True)



In [None]:
print(result["output_text"])


------------------
Capp. 3 - Diritto fondamentale, principi direttivi e doveri fondamentali della Costituzione indiana

Nel terzo capitolo del testo, vengono esaminate tre caratteristiche essenziali della Costituzione indiana: i diritti fondamentali, i principi direttivi e i doveri fondamentali. Il capitolo comincia descrivere le caratteristiche principali della Costituzione, tra cui la sua visione per il cambiamento sociale e economico, come descritto nel preambolo e nel programma sociale e economico. L'autore poi approfondisce i diritti fondamentali, che servono a proteggere le persone dall'azione arbitraria dello Stato. Questi diritti includono la protezione contro la discriminazione, la libertà di stampa e dell'informazione, la protezione contro la tortura e il trattamento crudel, e il diritto alla risarcizione per la privazione arbitria della vita o della libertà. Infine, il capitolo brevemente tocca il diritto di lasciare il paese e tornarvi. In generale, questa sezione fornisce

In [None]:
print("\n\n".join(result["intermediate_steps"][:3]))


In Chapter 3, the author discusses three important aspects of the Indian Constitution: fundamental rights, directive principles, and fundamental duties. The chapter begins by introducing the fundamentals of the Constitution, which include its vision for socio-economic change as outlined in the preamble and socio-economic agenda. The author then moves on to discussing fundamental rights, which are defined as those that protect individuals from arbitrary action by the state. These rights include protection against discrimination, freedom of press and information, rights against torture and cruel treatment, and the right to compensation for unlawful deprivation of life or liberty. Finally, the chapter touches upon the right to travel abroad and return to one's country. Overall, this chapter provides an overview of the key protections afforded to individuals under the Indian Constitution.


The third chapter of the text focuses on three crucial aspects of the Indian Constitution: fundamen

## Splitting and summarizing in a single chain
For convenience, we can wrap both the splitting of our long document and the summarization in a single `AnalyzeDocumentChain`.

In [None]:
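from langchain.chains import AnalyzeDocumentChain

# `chain` is the refine chain defined above; it returns two outputs
# (output_text and intermediate_steps), so use invoke() instead of run().
summarize_document_chain = AnalyzeDocumentChain(
    combine_docs_chain=chain, text_splitter=text_splitter
)
result = summarize_document_chain.invoke({"input_document": docs[0].page_content})
print(result["output_text"])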