# Abstractive Text Summarization with Amazon Titan

> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

## Overview
Large documents pose several challenges: the input text might not fit into the model's context window, the model may hallucinate, or out-of-memory errors may occur.

To address these problems, this notebook shows an architecture based on chunking and prompt chaining. It uses [LangChain](https://python.langchain.com/docs/get_started/introduction.html), a popular framework for developing applications powered by language models.

### Architecture

![](../../imgs/42-text-summarization-2.png)

In this architecture:

1. A large document (or a single file made by concatenating smaller ones) is loaded
2. A LangChain utility is used to split it into multiple smaller chunks (chunking)
3. The first chunk is sent to the model; the model returns the corresponding summary
4. LangChain takes the next chunk, appends it to the returned summary, and sends the combined text to the model as a new request; this repeats until all chunks are processed
5. In the end, you have a final summary based on the entire content
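
The steps above can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: `chained_summary` and `fake_model` are hypothetical names, and `fake_model` stands in for a real model call so the control flow can be run without Bedrock access.

```python
from typing import Callable, List


def chained_summary(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize chunks one at a time, carrying the running summary forward."""
    if not chunks:
        return ""
    # Step 3: summarize the first chunk on its own
    summary = summarize(chunks[0])
    # Step 4: append each following chunk to the running summary and re-summarize
    for chunk in chunks[1:]:
        summary = summarize(summary + "\n\n" + chunk)
    # Step 5: the last result has seen every chunk
    return summary


# Hypothetical stand-in for a model call: records its input and returns a placeholder
calls: List[str] = []


def fake_model(text: str) -> str:
    calls.append(text)
    return f"<summary of {len(text)} chars>"


final = chained_summary(["chunk one", "chunk two", "chunk three"], fake_model)
print(len(calls), final)
```

The model is called once per chunk, and every call after the first receives the previous summary followed by the next chunk.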

### Use case
This approach can be used to summarize call transcripts, meeting transcripts, books, articles, blog posts, and other relevant content.

### Pre-requisites

Install LangChain pre-requisites

In [17]:
%pip install -U --no-cache-dir boto3
%pip install -U --no-cache-dir  \
    "langchain>=0.1.11" \
    langchain-aws==0.1.0 \
    sqlalchemy -U \
    "faiss-cpu>=1.7,<2" \
    "pypdf>=3.8,<4" \
    pinecone-client==2.2.4 \
    apache-beam==2.52.0 \
    tiktoken==0.5.2 \
    "ipywidgets>=7,<8" \
    matplotlib==3.8.2 \
    anthropic==0.9.0
%pip install -U --no-cache-dir transformers

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.



## Setup

⚠️ ⚠️ ⚠️ Before running this notebook, ensure you've run the [Bedrock boto3 setup notebook](../00_Intro/bedrock_boto3_setup.ipynb#Prerequisites). ⚠️ ⚠️ ⚠️


## Summarize long text 

### Configuring LangChain with Boto3

Unless you pass a client explicitly, LangChain picks up boto3 session information (credentials and region) from your environment.

You need to specify an LLM for the LangChain `BedrockLLM` class. You can also pass inference arguments such as `temperature` and `topP`. Here you specify Amazon Titan Text Large in `model_id` and pass Titan's inference parameters in `model_kwargs`; these correspond to the fields of Titan's `textGenerationConfig`.

In [18]:
from langchain_aws import BedrockLLM

modelId = "amazon.titan-tg1-large"
llm = BedrockLLM(
    model_id=modelId,
    model_kwargs={
        "maxTokenCount": 4096,
        "stopSequences": [],
        "temperature": 0,
        "topP": 1,
    },
    # client=boto3_bedrock,
)
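
To make the relationship between `model_kwargs` and `textGenerationConfig` concrete, the sketch below builds a Titan Text request body by hand, with `inputText` and `textGenerationConfig` as top-level fields. The helper name `titan_request_body` is hypothetical, and it is an assumption that LangChain assembles an equivalent payload internally; no request is sent here.

```python
import json


def titan_request_body(prompt: str, model_kwargs: dict) -> str:
    """Build a JSON body in the Titan Text request shape (illustrative only)."""
    return json.dumps(
        {
            "inputText": prompt,
            # The inference parameters sit under textGenerationConfig
            "textGenerationConfig": dict(model_kwargs),
        }
    )


body = titan_request_body(
    "Summarize the following text: ...",
    {"maxTokenCount": 4096, "stopSequences": [], "temperature": 0, "topP": 1},
)
print(body)
```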

### Loading a text file with many tokens

In the `letters` directory, you can find a text file of [Amazon's CEO letter to shareholders in 2022](https://www.aboutamazon.com/news/company-news/amazon-ceo-andy-jassy-2022-letter-to-shareholders). The following cell loads the text file and counts the number of tokens in it.

You will see a warning indicating that the number of tokens in the text file exceeds the maximum number of tokens for this model.

In [19]:
# You may have to run this block twice
shareholder_letter = "./letters/2022-letter.txt"

with open(shareholder_letter, "r") as file:
    letter = file.read()
    
llm.get_num_tokens(letter)

6526

### Splitting the long text into chunks

The text is too long to fit in the context window of the Titan model we've chosen, so we will split it into smaller chunks.
`RecursiveCharacterTextSplitter` in LangChain splits long text into chunks recursively until each chunk is no larger than `chunk_size`. The text is separated with `separators=["\n\n", "\n"]`, which avoids breaking a paragraph across multiple chunks where possible.

Using 4,000 characters per chunk, we can get summaries for each portion separately. The number of tokens, or word pieces, in a chunk depends on the text.

In [20]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=4000, chunk_overlap=100
)

docs = text_splitter.create_documents([letter])
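
To illustrate the idea behind recursive splitting, here is a simplified sketch in plain Python. It is not LangChain's implementation: `split_recursively` is a hypothetical function, it ignores `chunk_overlap`, and it hard-cuts text once it runs out of separators. It only shows how paragraphs are kept together when they fit and how the next separator is used when they do not.

```python
from typing import List


def split_recursively(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split on the first separator, merge pieces up to chunk_size, recurse on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to fixed-width cuts (a simplification)
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # The piece still fits: keep growing the current chunk
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the next separator
            chunks.extend(split_recursively(piece, rest, chunk_size))
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccccccccc\ndd"
chunks = split_recursively(sample, ["\n\n", "\n"], chunk_size=10)
print(chunks)
```

In the sample, the first two paragraphs fit into one chunk together, while the third paragraph is too long and is split again on single newlines.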

In [21]:
num_docs = len(docs)

num_tokens_first_doc = llm.get_num_tokens(docs[0].page_content)

print(
    f"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens"
)

Now we have 10 documents and the first one has 439 tokens
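
The cell above only reports the token count of the first chunk. A small helper can check every chunk instead. This is a sketch: `largest_chunk` and `approx_tokens` are hypothetical names, and the whitespace-based counter is only a stand-in so the example runs without a model.

```python
from typing import Callable, Iterable


def largest_chunk(chunks: Iterable[str], count_tokens: Callable[[str], int]) -> int:
    """Return the highest token count among the chunks (0 if there are none)."""
    return max((count_tokens(chunk) for chunk in chunks), default=0)


def approx_tokens(text: str) -> int:
    """Stand-in counter: whitespace-separated words, not real model tokens."""
    return len(text.split())


print(largest_chunk(["one two three", "four five"], approx_tokens))
```

In this notebook, the same helper could be called as `largest_chunk((d.page_content for d in docs), llm.get_num_tokens)` to use the model's own token counter.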


### Summarizing chunks and combining them

Assuming that the number of tokens is consistent in the other docs, we should be good to go. Let's use LangChain's [load_summarize_chain](https://python.langchain.com/en/latest/use_cases/summarization.html) to summarize the text. `load_summarize_chain` provides three ways of summarization: `stuff`, `map_reduce`, and `refine`.
- `stuff` puts all the chunks into one prompt. With a long document, this would exceed the model's maximum number of tokens.
- `map_reduce` summarizes each chunk, combines the summaries, and summarizes the combined summary. If the combined summary is too large, it raises an error.
- `refine` summarizes the first chunk, and then summarizes the second chunk together with the first summary. The same process repeats until all chunks are summarized.

`map_reduce` and `refine` invoke the LLM multiple times, so they take time to produce the final summary.
Let's try `map_reduce` here.
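
Before using the LangChain chain, the `map_reduce` flow can be sketched in plain Python. This is an illustration under stated assumptions, not what `load_summarize_chain` does internally: `map_reduce_summary` and `stub_llm` are hypothetical names, and `stub_llm` returns placeholders instead of calling a model.

```python
from typing import Callable, List


def map_reduce_summary(
    chunks: List[str],
    summarize: Callable[[str], str],
    map_prompt: str,
    combine_prompt: str,
) -> str:
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    partial_summaries = [summarize(map_prompt.format(text=chunk)) for chunk in chunks]
    combined = "\n\n".join(partial_summaries)
    return summarize(combine_prompt.format(text=combined))


# Hypothetical stand-in for a model call: records each prompt and returns a placeholder
prompts_seen: List[str] = []


def stub_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    return f"[summary {len(prompts_seen)}]"


result = map_reduce_summary(
    ["first chunk", "second chunk", "third chunk"],
    stub_llm,
    map_prompt="Summarize this chunk: {text}",
    combine_prompt="Combine: {text}",
)
print(len(prompts_seen), result)
```

With three chunks the model is called four times: once per chunk, plus once for the combined summaries. The map calls are independent of each other, which is what distinguishes this flow from `refine`.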

In [33]:
from langchain.prompts import PromptTemplate
map_prompt_template = """Write a summary of this chunk of text that includes the main points and any important details: {text}"""
map_prompt = PromptTemplate(template=map_prompt_template, input_variables=["text"])
combined_prompt_template = """Write a concise summary of the following text: {text}"""
combined_prompt = PromptTemplate(template=combined_prompt_template, input_variables=["text"])

In [38]:
from langchain.chains.summarize import load_summarize_chain

# Set verbose=True if you want to see the prompts being used
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type="map_reduce",
    verbose=False,
    map_prompt=map_prompt,
    combine_prompt=combined_prompt,
)

> ⏰ **Note:** Depending on your number of documents, Bedrock request rate quota, and configured retry settings, the chain below may take some time to run.

In [39]:
%%time
output = ""
try:
    output = summary_chain.invoke({"input_documents": docs})
except ValueError as error:
    if "AccessDeniedException" in str(error):
        # Print the error on a red background, followed by troubleshooting links
        print(
            f"\x1b[41m{error}\n"
            "To troubleshoot this issue please refer to the following resources.\n"
            "https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\n"
            "https://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
        )

        # Stop the notebook cell without rendering a traceback
        class StopExecution(ValueError):
            def _render_traceback_(self):
                pass

        raise StopExecution
    else:
        raise

CPU times: user 58.4 ms, sys: 11.2 ms, total: 69.6 ms
Wall time: 2min 19s


In [46]:
output.get('output_text')

"\nAmazon has had a successful year despite facing challenges in 2022. The company has grown demand, innovated in its largest businesses, and made adjustments to its investment decisions. Amazon operates in large, dynamic, global market segments with many capable and well-funded competitors, and the company has experienced constant change over the past 25 years. In 1997, Amazon was a books-only retailer, and today it sells nearly every physical and digital retail item. Similarly, building a business around technology infrastructure services in the cloud was not obvious in 2003, and Amazon has since launched AWS and Kindle.\n\nAWS has made significant structural changes to deliver lower costs and faster speed. This included reevaluating the US fulfillment network, which had one national network that distributed inventory from fulfillment centers spread across the country. Last year, AWS started rearchitecting its inventory placement strategy and leveraging its larger fulfillment center 