# Summarizing texts

Large Language Models are very good at processing large amounts of text and paying attention to the most important parts. We can use this ability to summarize texts.

In this notebook, we will use an LLM to summarize the [Federalist Papers](https://en.wikipedia.org/wiki/The_Federalist_Papers).


## Loading our API key

At this point you should have set up a file named `secrets.env` with your OpenAI API key. We will now use a lightweight Python package called `dotenv` to read in this file and set its contents as environment variables:


In [None]:
from dotenv import load_dotenv
import os

load_dotenv("../../secrets.env")

# Check that the key was loaded. Do not print the key itself! We want to keep it secret
os.getenv("OPENAI_API_KEY") is not None

## Loading text documents

If you are running this on Dartmouth's JupyterHub, the documents are already stored for you as individual text files in a shared dataset. If you are running this anywhere else, [download and extract the dataset](https://git.dartmouth.edu/lib-digital-strategies/RDS/datasets/federalist-papers-dataset/-/archive/main/federalist-papers-dataset-main.zip) and change the path in the next cell accordingly.


In [None]:
from pathlib import Path

docs_dir = Path(
    "~/shared/RR-workshop-data/federalist-papers-dataset/split"
).expanduser()

We could use regular Python to read each of the papers like so:


In [None]:
with open(docs_dir / "federalist_1.txt") as f:
    doc = f.read()

print(doc[:200])

However, we can benefit from _LangChain_'s document loaders, which read all the files in the directory with just a few lines of code:


In [None]:
from langchain.document_loaders import DirectoryLoader

doc_loader = DirectoryLoader(docs_dir, show_progress=True)
docs = doc_loader.load()
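
To make it clearer what the loader hands back, here is a rough plain-Python sketch of the same idea. It is not how `DirectoryLoader` is implemented; the `SimpleDoc` class and the `load_text_files()` helper are hypothetical names made up for illustration. What the sketch mirrors is the shape of the result: a list of objects that each carry the text in `page_content` and the file path in `metadata["source"]`.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a LangChain document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_files(directory, pattern="*.txt"):
    """Read every file matching `pattern` in `directory` into a SimpleDoc."""
    loaded = []
    # Sort the paths so the order of the documents is reproducible
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        loaded.append(SimpleDoc(page_content=text, metadata={"source": str(path)}))
    return loaded
```

Calling `load_text_files(docs_dir)` would then return one `SimpleDoc` per paper, under the assumption that the files are UTF-8 encoded.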

## Creating a summary

### Stuffing

The most straightforward way to ask a model to summarize a piece of text is to send the entire text and ask the LLM to summarize it. This approach is called _stuffing_:


In [None]:
from langchain.chat_models import ChatOpenAI
from openai.error import InvalidRequestError

llm = ChatOpenAI(model="gpt-3.5-turbo")

try:
    for doc in docs[:3]:
        print("-" * 30)
        print(
            llm.predict(
                "Summarize the following text in 200 words: \n" + doc.page_content
            )
        )
except InvalidRequestError as e:
    print(e._message)

As we can see, the model successfully summarizes the first document, but fails to summarize the second one.

The reason for this is the length of the text: A Large Language Model cannot process arbitrarily long input strings. The maximum number of tokens (one token is roughly 3/4 of a word) an LLM can take into account while producing the next output token is called the _context window_ (or _context length_). You can think of this as the "attention span" of the model.
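
The rule of thumb above can be turned into a quick back-of-the-envelope check. The sketch below is only an approximation based on the 3/4-word heuristic; the helper names `estimate_tokens()` and `fits_context()`, the `reserved_for_output` budget, and the example window sizes are assumptions made for illustration. For exact counts, a real tokenizer would be needed.

```python
def estimate_tokens(text, words_per_token=0.75):
    """Roughly estimate the token count from the word count (heuristic only)."""
    n_words = len(text.split())
    return round(n_words / words_per_token)


def fits_context(text, context_window, reserved_for_output=500):
    """Check whether the text plausibly fits, leaving room for the model's reply.

    `reserved_for_output` is an assumed budget for the generated summary.
    """
    return estimate_tokens(text) + reserved_for_output <= context_window


sample = "word " * 3000  # a 3,000-word text
print(estimate_tokens(sample))  # 4000
print(fits_context(sample, context_window=4096))  # False
print(fits_context(sample, context_window=16384))  # True
```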

Checking [the official documentation](https://platform.openai.com/docs/models/gpt-3-5), we confirm that the context window for the base `gpt-3.5-turbo` model is 4,096 tokens. However, there is a model called `gpt-3.5-turbo-16k` that offers four times the context window. So let's switch to that and try again:


In [None]:
llm = ChatOpenAI(model="gpt-3.5-turbo-16k")

for doc in docs[:3]:
    print("-" * 30)
    print(
        llm.predict("Summarize the following text in 200 words: \n" + doc.page_content)
    )

### Using a chain

Since summarizing documents is a popular use case, _LangChain_ offers a pre-configured chain for it, `StuffDocumentsChain`:


In [None]:
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

# Define prompt
prompt_template = "Write a concise summary of the following:{text}"

prompt = PromptTemplate.from_template(prompt_template)

# Define LLM chain
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
llm_chain = LLMChain(llm=llm, prompt=prompt)

# Define StuffDocumentsChain
stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")

print(stuff_chain.run(docs[:1]))
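
Conceptually, stuffing boils down to joining the document texts and inserting them into the prompt. The following plain-Python sketch illustrates the idea only; it is not LangChain's implementation, and the `stuff_prompt()` helper and the blank-line separator are assumptions chosen for illustration.

```python
def stuff_prompt(texts, template, separator="\n\n"):
    """Join all texts and insert them into the template's {text} slot."""
    return template.format(text=separator.join(texts))


example_texts = ["First document.", "Second document."]
example_prompt = stuff_prompt(
    example_texts, "Write a concise summary of the following:{text}"
)
print(example_prompt)
```

The resulting string is what would be sent to the model in a single request, which is why the combined length of all documents has to fit into the context window.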

### Convenience wrapper

We can simplify this even more by using the convenient wrapper `load_summarize_chain()`, which comes with a predefined prompt suitable for this task:


In [None]:
from langchain.chains.summarize import load_summarize_chain

chain = load_summarize_chain(llm=llm, chain_type="stuff")

chain.run(docs[:1])

## Map-reduce

Stuffing works great for shorter texts and models with a large context window. But what if we want to process longer documents, or a much larger number of documents?

<div class="alert alert-block alert-info">

The size of the context window appears to be a major area of concern for the big AI companies: OpenAI has recently released a [new version of GPT-4](https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo) with a context window of 128,000 tokens, and [Anthropic's Claude](https://www.anthropic.com/product) can process 100,000 tokens. These language models can therefore process entire books in a single prompt. Worrying about short context windows may therefore seem obsolete, but even some of the big names, like [Google's PaLM 2](https://developers.generativeai.google/products/palm), only offer relatively small context windows (e.g., 8,000 tokens). In the open-source community, models generally tend to have even smaller context windows (e.g., [Llama 2](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) with only about 4,000 tokens). The effects of context length on a model's performance are also under active investigation.

Of course, all of this may change given the rapid development speed in the field. So watch this space!

</div>

One technique to consider in this case is _Map-reduce_. It consists of two steps:

1. **Map**: Chunk the texts to summarize into smaller units (e.g., shorter documents, paragraphs) and summarize each chunk
2. **Reduce**: Treat the summaries like new documents and summarize the summaries with the LLM
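
For the chunking part of the _Map_ step, one simple option is to split a text at paragraph boundaries and pack consecutive paragraphs into chunks up to a word budget. The sketch below assumes that paragraphs are separated by blank lines; the function name `chunk_by_paragraphs()` and the `max_words` parameter are hypothetical and not part of _LangChain_.

```python
def chunk_by_paragraphs(text, max_words=500):
    """Pack consecutive paragraphs into chunks of at most `max_words` words.

    A single paragraph longer than `max_words` becomes its own oversized chunk.
    """
    chunks, current, current_len = [], [], 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        n_words = len(paragraph.split())
        # Start a new chunk if adding this paragraph would exceed the budget
        if current and current_len + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(paragraph)
        current_len += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

In this notebook, the papers are already stored as individual files, so each document can serve as one chunk and no further splitting is strictly needed.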

Given what we have learned so far, we have all the tools we need to implement this technique. Here is what we need:

1. A summarization chain that produces one summary for each document.
2. A second summarization chain that produces a final summary from the intermediate summaries.

Theoretically, we could even use the same summarization chain for each step, but it may be helpful to keep them separate so that we can provide a more specific prompt to each of them.


### Exercise

1. Implement the first chain (`map_chain`) using a basic `LLMChain`. You can use the following prompt:

```
"Write a concise summary of the following document: {text}"
```

2. Implement the second chain (`reduce_chain`) using another `LLMChain`. Here, you can use the following prompt:

```
"Create a consolidated summary based on the following summaries of individual letters from the Federalist Papers: {summaries}"
```


In [None]:
prompt_template = "Write a concise summary of the following document: {text}"
prompt = PromptTemplate.from_template(prompt_template)

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
map_chain = LLMChain(llm=llm, prompt=prompt)

In [None]:
prompt_template = "Create a consolidated summary based on the following summaries of individual letters from the Federalist Papers: {summaries}"
prompt = PromptTemplate.from_template(prompt_template)
reduce_chain = LLMChain(llm=llm, prompt=prompt)

Now we can put it all together: we iterate over our documents and produce the intermediate summaries with the `map_chain`, stuff all summaries into a single string, and produce the final summary with the `reduce_chain`.


In [None]:
# Create intermediate summaries
summaries = []
for doc in docs[:2]:
    # Pass only the text, not the whole document object with its metadata
    summary = map_chain.run(doc.page_content)
    summaries.append(summary)

# Stuff all summaries into the context
stuffed_summaries = "\n---\n".join(summaries)
print(stuffed_summaries)

# Create final summary
summary = reduce_chain.run(stuffed_summaries)
print("********** Final summary **********")
print(summary)

Not bad! And definitely faster than if we had produced it manually. So there you have it: Speedreading with LLMs!

<div class="alert alert-block alert-info">

Our manual implementation works just fine in this case, because all of the intermediate summaries combined fit into the context window of the model used in the _reduce_ step. This is not generally the case, though. If the sum of the intermediate summaries is still too long, we can use an iterative approach: For example, we can reduce just a few of the intermediate summaries at a time, and then reduce those summarized summaries further. We can repeat this process as many times as needed.

_LangChain_ offers a few convenient objects to implement this behavior. Using them takes you deeper into the weeds of the framework, however, so it is beyond the scope of this workshop. If you want to take a peek at what that might look like, check out [this notebook](./02x-map_reduce.ipynb) as a jumping-off point.

</div>
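
The iterative approach described in the note above can be sketched in plain Python. To keep the example runnable without an API key, `summarize` is any function that maps a string to a shorter string. The names `reduce_iteratively()` and `batch_size` are hypothetical, and the sketch is not how _LangChain_ implements this behavior.

```python
def reduce_iteratively(summaries, summarize, batch_size=3, separator="\n---\n"):
    """Reduce summaries in batches until a single summary remains."""
    if not summaries:
        raise ValueError("Need at least one summary to reduce.")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 to make progress.")
    current = list(summaries)
    while len(current) > 1:
        # Summarize each batch of intermediate summaries into one new summary
        current = [
            summarize(separator.join(current[i : i + batch_size]))
            for i in range(0, len(current), batch_size)
        ]
    return current[0]
```

With the chains defined earlier, a call might look like `reduce_iteratively(summaries, reduce_chain.run, batch_size=5)`, where the batch size would have to be chosen so that each batch fits into the model's context window.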


In our [next notebook](./03-text-analysis-with-llms.ipynb), we will step away from text summarization and explore how we can perform many of the analyses we introduced in Session 2 using a Large Language Model!


<table >
<tbody>
  <tr>
    <td style="padding:0px;border-width:0px;vertical-align:center">    
    Created by Simon Stone for Dartmouth College Library under <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons CC BY-NC 4.0 License</a>.<br>For questions, comments, or improvements, email <a href="mailto:researchdatahelp@groups.dartmouth.edu">Research Data Services</a>.
    </td>
    <td style="padding:0 0 0 1em;border-width:0px;vertical-align:center"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by/4.0/88x31.png"/></td>
  </tr>
</tbody>
</table>
