## Summarizing a document

First, we will load the document and extract its text.

In [None]:
import pdfplumber

with pdfplumber.open("long_pdf.pdf") as pdf:
    pdf_text = ""
    for page in pdf.pages:
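        # extract_text() may return None for a page without text;
        # fall back to "" so the literal string "None" is never appended
        pdf_text += page.extract_text() or ""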

pdf_text

In [5]:
len(pdf_text)

804000

Next, we need to count the number of tokens.

To do this, we will use OpenAI's `tiktoken` library with the GPT-4 tokenizer.

In [6]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")
tokens = encoding.encode(pdf_text)

f'Amount of GPT-4 tokens: {len(tokens)}'

'Amount of GPT-4 tokens: 258487'

The goal is to implement a cheap and efficient solution.

Since the document is very long (250k+ tokens), we will analyze three different approaches:

- `Prompt Stuffing`
- `Map Reduce`
- `Refine`

## Prompt Stuffing

> The stuff documents chain ("stuff" as in "to stuff" or "to fill") is the most straightforward of the document chains. It takes a list of documents, inserts them all into a prompt and passes that prompt to an LLM.

In the context of document summarization, prompt stuffing simply means passing the full text of all the documents as context in a single prompt, for example:
```py
"""
Summarize the document:

{document_text}
""" 
```

Pros:

- Efficient (Single LLM call)
- No context loss

Limitations:

- `Context window`: The context window is the number of tokens an LLM can receive as input. Most models cannot accept 250k+ tokens.
- Probably expensive, since it requires a more capable model with a larger context window.
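The prompt template and the context window limitation above can be sketched together as follows. This is a minimal illustration, not a definitive implementation: `build_stuff_prompt`, `fits_context_window`, and the `reserved_for_output` budget are hypothetical names, and the sketch assumes that the model's answer shares the same context window as the prompt.

```python
from typing import Callable

STUFF_TEMPLATE = """Summarize the document:

{document_text}
"""


def build_stuff_prompt(document_text: str) -> str:
    """Insert the whole document into a single prompt."""
    return STUFF_TEMPLATE.format(document_text=document_text)


def fits_context_window(
    prompt: str,
    count_tokens: Callable[[str], int],
    context_window: int,
    reserved_for_output: int = 1_000,
) -> bool:
    """Return True if the prompt still leaves room for the model's answer.

    Assumption: input and output tokens share one context window.
    """
    return count_tokens(prompt) + reserved_for_output <= context_window
```

With the `encoding` object created earlier, `count_tokens` could be `lambda s: len(encoding.encode(s))`. For a document of roughly 258k tokens, the check fails for any window smaller than that, which is the limitation described above.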

> Google already has models (in preview) that support a large number of input tokens. However, prompts longer than 128k tokens are charged at double the price.
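To get a feel for what the doubled price means, the arithmetic can be sketched as below. The function name, the 128k threshold default, and the 2x multiplier are taken from the note above or invented for illustration; the price per million tokens is a placeholder to fill in from the provider's pricing page. The sketch also assumes that the higher rate applies to the whole prompt once it crosses the threshold, which should be checked against the actual pricing terms.

```python
def estimate_input_cost(
    n_tokens: int,
    price_per_million: float,
    long_prompt_threshold: int = 128_000,
    long_prompt_multiplier: float = 2.0,
) -> float:
    """Estimate the input cost of one prompt under two-tier pricing.

    Assumption: once the prompt exceeds the threshold, the higher rate
    applies to every token in the prompt, not only to the excess.
    """
    rate = price_per_million
    if n_tokens > long_prompt_threshold:
        rate *= long_prompt_multiplier
    return n_tokens / 1_000_000 * rate
```

Note that the 258,487 figure computed earlier comes from the GPT-4 tokenizer, so it is only a rough proxy for how another provider would count the same text.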

## Map Reduce

The Map Reduce technique is a method for processing large datasets in parallel across a cluster of computers. In the context of document summarization, Map Reduce can be used to break down a large document into smaller chunks, process each chunk independently, and then combine the results to produce a summary.

The "Map" step involves dividing the document into smaller chunks and processing each chunk in parallel. This can be done by creating a prompt for each chunk and passing it to a language model to generate a summary.
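The Map step described above could look like the following sketch. It splits by characters for simplicity (splitting by tokens would work the same way), and `summarize` is a hypothetical callable that sends a prompt to whichever LLM is used and returns its text. `MAP_TEMPLATE`, `split_into_chunks`, and `map_summaries` are illustrative names, not part of any library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAP_TEMPLATE = "Summarize this part of the document:\n\n{chunk}"


def split_into_chunks(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into windows of at most chunk_size characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def map_summaries(
    chunks: List[str],
    summarize: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Summarize every chunk in parallel, keeping the original order."""
    prompts = [MAP_TEMPLATE.format(chunk=chunk) for chunk in chunks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(summarize, prompts))
```

Threads are a reasonable fit here because LLM calls are typically network-bound. A small `overlap` lets neighboring chunks share some text, which may soften the hard cut between them.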

The "Reduce" step involves combining the summaries from each chunk to produce a final summary. This can be done by concatenating the summaries, ranking them, or using other methods to combine the results.
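One way to implement the Reduce step is to concatenate the partial summaries and ask the model for a combined summary, repeating in rounds when the concatenation is itself too long for one prompt. The sketch below assumes a hypothetical `summarize` callable (prompt in, text out) and a character budget `max_chars` standing in for the context window; `REDUCE_TEMPLATE` and `reduce_summaries` are illustrative names.

```python
from typing import Callable, List

REDUCE_TEMPLATE = "Combine these partial summaries into one summary:\n\n{summaries}"


def reduce_summaries(
    summaries: List[str],
    summarize: Callable[[str], str],
    max_chars: int = 8_000,
) -> str:
    """Collapse partial summaries into one, in several rounds if needed."""
    if not summaries:
        return ""
    while len(summaries) > 1:
        # Group consecutive summaries so that each group stays within max_chars
        groups: List[List[str]] = []
        current: List[str] = []
        for summary in summaries:
            if current and sum(len(s) for s in current) + len(summary) > max_chars:
                groups.append(current)
                current = []
            current.append(summary)
        groups.append(current)
        # If nothing could be grouped, merge everything at once so the loop ends
        if len(groups) == len(summaries):
            groups = [summaries]
        summaries = [
            summarize(REDUCE_TEMPLATE.format(summaries="\n\n".join(group)))
            for group in groups
        ]
    return summaries[0]
```

Each round produces fewer summaries than it received, so the loop always terminates. A single input summary is returned unchanged, without calling the model.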

By breaking the document into smaller chunks, each chunk fits within the model's context window and the chunks can be summarized in parallel, so the final summary covers the whole document.

Pros:
- Parallelism (at the cost of multiple LLM calls)

Limitations:
- Context loss: each chunk is processed separately, so the chunk summaries are not truly connected


## Refine

The Refine technique builds the summary iteratively. The LLM generates an initial summary from the first chunk, then refines it with each subsequent chunk, adding new context and removing redundant information as it goes.
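One common form of Refine is sequential: summarize the first chunk, then ask the model to update that summary with each following chunk. The sketch below illustrates that reading under stated assumptions. `summarize` is a hypothetical callable (prompt in, text out), and the two templates and the `refine_summary` name are made up for this example.

```python
from typing import Callable, List

INITIAL_TEMPLATE = "Summarize the following text:\n\n{chunk}"
REFINE_TEMPLATE = (
    "Here is the summary so far:\n\n{summary}\n\n"
    "Refine it using the additional text below. "
    "If the text adds nothing useful, return the summary unchanged.\n\n{chunk}"
)


def refine_summary(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Build a summary from the first chunk, then refine it chunk by chunk."""
    if not chunks:
        return ""
    summary = summarize(INITIAL_TEMPLATE.format(chunk=chunks[0]))
    for chunk in chunks[1:]:
        # Each call sees the running summary plus one new chunk
        summary = summarize(REFINE_TEMPLATE.format(summary=summary, chunk=chunk))
    return summary
```

Because every call depends on the previous result, the calls cannot run in parallel. That is the main trade-off against Map Reduce: more context is carried from chunk to chunk, at the cost of sequential processing.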

Pros:
- Improved summary quality

Limitations:
- Requires additional processing steps


