# Demo: Summarizing PDFs with Map-Reduce Technique

**Objective:**

This notebook demonstrates a workflow that uses the `Map-Reduce` technique to generate coherent summaries of PDF documents.

The workflow consists of three sequential steps:
1. **Load and split PDF**: Load the desired PDF and create text chunks for each page.
2. **Map-Step**: For each page, generate a concise summary (~150 words) using `GPT-3.5-turbo`.
3. **Reduce-Step**: Create a final, consolidated summary using `GPT-4-turbo` from the set of summaries generated in the `Map-Step`.
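
The three steps above can be sketched as a small, model-agnostic skeleton. This is an illustration under stated assumptions, not the notebook's implementation: `map_reduce_summarize` and the `summarize` callable are hypothetical names, and the stand-in summarizer only truncates text so that the example runs without an API key.

```python
from typing import Callable, Dict, List


def map_reduce_summarize(
    pages: Dict[int, str],
    summarize: Callable[[str, int], str],
    map_max_words: int = 150,
    reduce_max_words: int = 300,
) -> Dict[str, object]:
    """Summarize each page (map), then summarize the joined summaries (reduce)."""
    # Map step: one independent summary per page, in page order.
    map_summaries: List[str] = [
        summarize(pages[number], map_max_words) for number in sorted(pages)
    ]
    # Reduce step: condense the intermediate summaries into one final summary.
    combined = "\n".join(map_summaries)
    final_summary = summarize(combined, reduce_max_words)
    return {"final_summary": final_summary, "map_summaries": map_summaries}


def truncate_words(text: str, max_words: int) -> str:
    """Stand-in for an LLM call (assumption): keep only the first `max_words` words."""
    return " ".join(text.split()[:max_words])


demo_pages = {1: "alpha beta gamma delta", 2: "epsilon zeta eta theta"}
result = map_reduce_summarize(demo_pages, truncate_words, map_max_words=2, reduce_max_words=3)
print(result["map_summaries"])  # ['alpha beta', 'epsilon zeta']
print(result["final_summary"])  # alpha beta epsilon
```

Passing the summarizer in as a callable keeps the skeleton independent of any particular model, so a cheaper model can serve the map step and a stronger one the reduce step.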

**Key Advantages:**

This modular approach, which breaks down a complex task into manageable sub-tasks, offers several advantages over directly generating a summary from the whole PDF:
- **Handling of Large Data Volumes**: Enables efficient handling of larger documents by breaking them down into manageable pages, which allows for parallel processing.
- **Cost Savings with Cheaper LLMs**: Allows for the use of more cost-effective LLMs for generating intermediate summaries in the `Map-Step`, reducing overall operational costs while maintaining effectiveness.
- **Improved Summary Quality**: Each page is summarized individually, leading to more nuanced and accurate summaries, with the flexibility to customize and adjust the process for different use cases.
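
Because each page is summarized independently, the map step can in principle run concurrently. The notebook itself processes pages sequentially; the sketch below only illustrates the idea with Python's standard `ThreadPoolExecutor`. The names `parallel_map` and `fake_summarize` are hypothetical, and the stand-in function replaces a real, network-bound LLM request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parallel_map(texts: List[str], summarize: Callable[[str], str], max_workers: int = 4) -> List[str]:
    """Apply `summarize` to every text concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # even if individual calls finish out of order.
        return list(pool.map(summarize, texts))


def fake_summarize(text: str) -> str:
    """Stand-in for a network-bound LLM request (assumption)."""
    return text.upper()


page_texts = ["page one", "page two", "page three"]
parallel_summaries = parallel_map(page_texts, fake_summarize)
print(parallel_summaries)  # ['PAGE ONE', 'PAGE TWO', 'PAGE THREE']
```

Threads are a reasonable fit here because API calls spend most of their time waiting on the network. In practice, the number of workers would likely need to respect the provider's rate limits.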

**Disclaimer:**

This notebook is strictly for educational purposes. Users should employ it responsibly, ensuring accuracy, respecting privacy and copyright, and avoiding the dissemination of misinformation. Always consider the ethical implications and context of the generated content.

## Setup: Load dependencies and set parameters

As an initial step, we load the required dependencies and set the parameters for the `Map-Reduce` workflow. In this demo, we summarize the main content of the seminal `"Attention Is All You Need"` paper by Google Research.

In [1]:
import logging
import os
from tqdm.notebook import tqdm

os.chdir('d:/genai/GenAIPy/') # Change directory to root

from genaipy.extractors.pdf import extract_pages_text
from genaipy.openai_apis.chat import get_chat_response
from genaipy.prompts.generate_summaries import build_summary_prompt, REDUCE_SUMMARY_PROMPT_TPL

In [2]:
# Logging settings
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# PDF settings
PDF_URL = "demos/data/transformers_attention_paper.pdf" # https://arxiv.org/abs/1706.03762
START = 1
END = 10 # exclude references and appendix

# LLM settings
MAP_LLM = "gpt-3.5-turbo"
REDUCE_LLM = "gpt-4-1106-preview"
SYS_MESSAGE = "You are a generative AI expert. You explain complex AI concepts in simple language so regular users can understand your answers."
MAP_MAX_WORDS = 150
REDUCE_MAX_WORDS = 300

## Step [1/3]: Load and split PDF

In the first step, we load the PDF article and split the text content of each page in separate chunks. This enables us to generate an individual summary per page in the following `Map-Step`.
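
The Map-Step later reads `pages[page]["content"]`, which suggests that `extract_pages_text` returns a mapping from page number to a record with a `content` field. The sketch below mimics that shape for a list of already extracted page texts. `build_page_chunks` is a hypothetical helper, not the `genaipy` implementation, and the 1-based, inclusive page range is an assumption inferred from `START = 1`, `END = 10`, and the ten pages reported in the output.

```python
from typing import Dict, List


def build_page_chunks(page_texts: List[str], start_page: int, end_page: int) -> Dict[int, Dict[str, str]]:
    """Return {page_number: {"content": text}} for the 1-based, inclusive page range."""
    if start_page < 1 or end_page > len(page_texts) or start_page > end_page:
        raise ValueError("page range is outside the document")
    return {
        number: {"content": page_texts[number - 1].strip()}
        for number in range(start_page, end_page + 1)
    }


raw_pages = ["Title page ", " Introduction", "References"]
chunks = build_page_chunks(raw_pages, start_page=1, end_page=2)
print(len(chunks))  # 2
print(chunks[2]["content"])  # Introduction
```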

In [3]:
pages = extract_pages_text(pdf_path=PDF_URL, start_page=START, end_page=END)
print(f"Successfully loaded text from {len(pages)} PDF pages.")

Successfully loaded text from 10 PDF pages.


## Step [2/3]: Map-Step

In the second step, we execute the mapping loop of the `Map-Reduce` technique. Specifically, we generate a summary of each page's text content with `GPT-3.5-turbo` and store the resulting intermediate summaries in a list.
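
To make the role of the prompt concrete, here is one way a summary prompt with a word limit might be assembled. The template text and the `build_prompt` function are assumptions made for illustration; the actual `build_summary_prompt` in `genaipy` may use a different template and signature.

```python
MAP_TEMPLATE = (
    "Summarize the following text in at most {max_words} words.\n\n"
    "Text:\n{text}"
)


def build_prompt(text: str, max_words: int, template: str = MAP_TEMPLATE) -> str:
    """Fill a prompt template with the text to summarize and the word limit."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    return template.format(text=text.strip(), max_words=max_words)


prompt = build_prompt("  Attention mechanisms relate positions in a sequence.  ", max_words=150)
print(prompt)
```

Keeping the template as a parameter mirrors how the reduce step passes a different template to the same builder.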

In [5]:
map_summaries = []
for page in tqdm(pages, desc="Generating Map Summaries"):
    try:
        map_prompt = build_summary_prompt(pages[page]["content"], max_words=MAP_MAX_WORDS)
        summary = get_chat_response(map_prompt, sys_message=SYS_MESSAGE, model=MAP_LLM)
        map_summaries.append(summary)
        logging.info("Map Summary #%d: %s", page, summary)
    except Exception as e:
        # Log the exception itself; `summary` may be undefined or stale if the request failed.
        logging.error("An error occurred while generating summary #%d: %s", page, e)


Generating Map Summaries:   0%|          | 0/10 [00:00<?, ?it/s]

2023-11-27 11:13:44,413 - INFO - Successfully completed Chat API request. Total token usage: 889
2023-11-27 11:13:44,414 - INFO - Map Summary #1: The text sample introduces the concept of the Transformer, a network architecture based solely on attention mechanisms. Unlike traditional models that use recurrent or convolutional neural networks, the Transformer connects an encoder and a decoder through attention mechanisms. The authors conducted experiments on machine translation tasks and found that the Transformer outperformed existing models in terms of quality, parallelizability, and training time. It achieved a BLEU score of 28.4 on the English-to-German translation task and established a state-of-the-art BLEU score of 41.8 on the English-to-French translation task. The authors also demonstrated that the Transformer can generalize well to other tasks, such as English constituency parsing.
2023-11-27 11:13:53,540 - INFO - Successfully completed Chat API request. Total token usage: 107

## Step [3/3]: Reduce-Step

In the last step of the workflow, we join the individual intermediate summaries into a single text and distill it into a cohesive final summary in the `Reduce-Step` with `GPT-4-turbo`. As the output below shows, the resulting final summary gives a structured overview of the paper's main points.
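
Before the reduce call, the intermediate summaries have to be merged into one input text. The sketch below shows one possible way to do that, with a label per summary and a rough size check. `combine_summaries` and its `max_words` limit are hypothetical, and the word count is only a crude proxy for the model's token limit.

```python
from typing import List


def combine_summaries(summaries: List[str], max_words: int = 3000) -> str:
    """Join intermediate summaries, one labeled line per page summary."""
    blocks = [
        f"Summary {index}: {' '.join(summary.split())}"  # collapse internal whitespace/newlines
        for index, summary in enumerate(summaries, start=1)
    ]
    combined = "\n".join(blocks)
    word_count = len(combined.split())
    if word_count > max_words:
        raise ValueError(f"combined text has {word_count} words; limit is {max_words}")
    return combined


combined_text = combine_summaries(["First page.\n\nMore detail.", "Second page."])
print(combined_text)
# Summary 1: First page. More detail.
# Summary 2: Second page.
```

If the combined text exceeds the limit, one option is to reduce the summaries in smaller groups first and then reduce those results again.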

In [6]:
# Process generated map summaries for reduce step

#map_summaries = ['The text introduces a new network architecture called the Transformer for sequence transduction tasks, such as machine translation. Unlike existing models that use recurrent or convolutional neural networks, the Transformer is solely based on attention mechanisms, eliminating the need for recurrence and convolutions. The experiments show that the Transformer models outperform existing models in terms of quality, parallelizability, and training time. For example, the Transformer achieves a BLEU score of 28.4 on the WMT 2014 English-to-German translation task, surpassing previous results by more than 2 BLEU. It also achieves a state-of-the-art BLEU score of 41.8 on the WMT 2014 English-to-French translation task. Additionally, the Transformer shows good generalization to other tasks, such as English constituency parsing.', 'The text discusses the limitations of recurrent models in language modeling and machine translation tasks, such as slow sequential computation and difficulty in learning dependencies between distant positions. It introduces the Transformer model, which relies on an attention mechanism to establish global dependencies between input and output. The Transformer allows for more parallelization, achieves state-of-the-art translation quality in a short training time, and reduces the complexity of relating signals from different positions. The model architecture includes an encoder-decoder structure with stacked self-attention and fully connected layers. The Transformer is the first transduction model to use self-attention exclusively, without relying on recurrent networks or convolution.', "The text describes the architecture and components of the Transformer model, which is used in natural language processing tasks. The architecture consists of an encoder and decoder stack. The encoder has multiple layers, each containing a self-attention mechanism and a feed-forward network. 
# The decoder stack is similar, with an additional sub-layer for attention over the encoder's output. The self-attention mechanism calculates weights for each value based on the compatibility of the query with the corresponding key. Residual connections and layer normalization are used to improve information flow and prevent positions from attending to subsequent positions. The model produces outputs of dimension 512.", 'The text discusses two key components of attention mechanisms in artificial intelligence. The first is Scaled Dot-Product Attention, which computes the dot products of queries and keys, divides them by a scaling factor, and applies a softmax function to obtain weights on values. Scaled Dot-Product Attention is faster and more space-efficient than additive attention, which uses a feed-forward network. The second component is Multi-Head Attention, which involves linearly projecting queries, keys, and values multiple times before performing the attention function. The resulting output values are concatenated and projected again to obtain the final values. This technique improves the performance of attention mechanisms when working with high-dimensional data.', 'The multi-head attention mechanism allows a model to pay attention to different information at different positions. It combines attention heads to achieve this. The Transformer model uses multi-head attention in three ways: encoder-decoder attention, self-attention in the encoder, and self-attention in the decoder. The encoder-decoder attention allows every position in the decoder to attend to all positions in the input sequence. Self-attention in the encoder and decoder allows each position to attend to all positions in the previous layer. The model also includes position-wise feed-forward networks, which apply a fully connected network to each position separately and identically. The model uses embeddings to convert input and output tokens to vectors of a specified dimension.
# Positional encoding is used to inject information about the order of the sequence into the model.', 'The text discusses the complexity and efficiency of different layer types in sequence transduction tasks, such as self-attention, recurrent, and convolutional layers. Self-attention layers have a constant number of sequential operations and shorter maximum path lengths compared to recurrent layers. They are faster when the sequence length is smaller than the representation dimensionality. To improve performance with longer sequences, self-attention can be restricted to a neighborhood size. The text also mentions the use of positional encodings to incorporate the position information of tokens in the sequence. Sine and cosine functions of different frequencies are used as positional encodings. The choice between learned and fixed positional encodings does not significantly impact the results. The rationale behind using self-attention layers is their computational efficiency, parallelizability, and ability to learn long-range dependencies.', 'The text discusses various aspects related to the training and complexity of models in the context of convolutional and self-attention layers. It mentions that using a single convolutional layer may not connect all input and output positions and requires multiple layers to do so. However, separable convolutions decrease complexity considerably. The text also highlights the potential interpretability of models achieved through self-attention. \n\nIn terms of training, the models were trained on a specific dataset using byte-pair encoding, and the training batches consisted of approximately 25,000 source tokens and 25,000 target tokens.
# The text then describes the hardware, schedule, optimizer, and regularization techniques used during training.', 'The Transformer model achieves better BLEU scores than previous state-of-the-art models on English-to-German and English-to-French translation tests, while also having a lower training cost. The big Transformer model performs particularly well, outperforming all previous models on both translation tasks. This model achieved a BLEU score of 28.4 on English-to-German translation and a BLEU score of 41.0 on English-to-French translation. The base model also surpasses previous models and ensembles. The Transformer model uses label smoothing during training, which improves accuracy and BLEU score but hurts perplexity. The results are summarized in Table 2, comparing translation quality and training costs to other model architectures. Different variations of the base model were also evaluated, and the importance of attention heads and dimensions was highlighted.', "The text provides a table showing variations on the Transformer architecture and their impact on model performance in English-to-German translation. It also mentions that bigger models with dropout tend to perform better. Additionally, the text mentions that replacing sinusoidal positional encoding with learned positional embeddings does not significantly affect the model's performance. The text then briefly mentions that the Transformer architecture also shows good results in English constituency parsing, performing well compared to other models and approaches.", 'The text discusses the performance of the Transformer model, which is a sequence transduction model based entirely on attention. It explains that the Transformer outperforms recurrent neural network (RNN) models and the Berkeley-Parser even with small amounts of training data. The paper presents results from experiments using the Wall Street Journal dataset and high-confidence and BerkleyParser corpora.
# The model achieves state-of-the-art results in translation tasks, surpassing previously reported models. The paper also mentions plans to extend the Transformer to handle other input/output modalities, such as images, audio, and video. Code used to train and evaluate the models is available on GitHub.']
string = "\n".join(map_summaries)
text = string.replace("\n\n", "") # clean potential double newlines in joined map summaries
print(text)

The text sample introduces the concept of the Transformer, a network architecture based solely on attention mechanisms. Unlike traditional models that use recurrent or convolutional neural networks, the Transformer connects an encoder and a decoder through attention mechanisms. The authors conducted experiments on machine translation tasks and found that the Transformer outperformed existing models in terms of quality, parallelizability, and training time. It achieved a BLEU score of 28.4 on the English-to-German translation task and established a state-of-the-art BLEU score of 41.8 on the English-to-French translation task. The authors also demonstrated that the Transformer can generalize well to other tasks, such as English constituency parsing.
The text sample introduces the concept of the Transformer model, which is a type of neural network architecture that does not rely on recurrent connections or convolutional layers. Instead, it uses self-attention mechanisms to capture depende

In [9]:
# Execute reduce step to generate final summary

try:
    reduce_prompt = build_summary_prompt(text=text, max_words=REDUCE_MAX_WORDS, template=REDUCE_SUMMARY_PROMPT_TPL)
    final_summary = get_chat_response(prompt=reduce_prompt, sys_message=SYS_MESSAGE, model=REDUCE_LLM, max_tokens=1024)
    logging.info("Final Reduce Summary:\n%s", final_summary)
except Exception as e:
    logging.error(f"An error occurred while generating final summary: {e}")
else:
    output = {"final_summary": final_summary, "map_summaries": map_summaries}

2023-11-27 11:23:48,243 - INFO - Successfully completed Chat API request. Total token usage: 1739
2023-11-27 11:23:48,244 - INFO - Final Reduce Summary:
### Overview of the Transformer Model

#### Architecture:
- The Transformer is an innovative neural network design for handling sequence-to-sequence tasks, such as language translation.
- It comprises two main components: an encoder to process the input and a decoder to generate the output.
- Multiple layers within both the encoder and decoder use self-attention and feed-forward networks.

#### Self-Attention Mechanisms:
- Self-attention enables the model to weigh the importance of different words within the input sequence, regardless of their position.
- It replaces recurrent and convolutional layers, allowing for higher computational efficiency and better handling of long-range dependencies.
- Multi-Head Attention is a key feature that allows the model to focus on different segments simultaneously, capturing a wider range of informat

In [10]:
try:
    reduce_prompt = build_summary_prompt(text=text, max_words=500, template=REDUCE_SUMMARY_PROMPT_TPL)
    final_summary = get_chat_response(prompt=reduce_prompt, sys_message=SYS_MESSAGE, model=REDUCE_LLM, max_tokens=1024)
    logging.info("Final Reduce Summary:\n%s", final_summary)
except Exception as e:
    logging.error(f"An error occurred while generating final summary: {e}")
else:
    output = {"final_summary": final_summary, "map_summaries": map_summaries}

2023-11-27 11:44:54,971 - INFO - Successfully completed Chat API request. Total token usage: 2065
2023-11-27 11:44:54,973 - INFO - Final Reduce Summary:
# Final Summary of the Transformer Model

## Introduction to the Transformer
The Transformer is an innovative neural network architecture that has significantly impacted the field of natural language processing (NLP). Unlike traditional models that relied on recurrent or convolutional layers, the Transformer solely utilizes attention mechanisms to handle sequences of data.

## Core Components of the Transformer
- **Encoder and Decoder:** The model is structured with an encoder that processes the input sequence and a decoder that produces the output sequence.
- **Self-Attention Mechanism:** At the heart of the Transformer is the self-attention mechanism, which allows the model to weigh the importance of different parts of the input when producing a specific part of the output.
- **Scaled Dot-Product Attention:** This attention function 