# Generative AI on GCP


## Overview

In research, you often need to read several papers to understand a new method or new findings, which can be taxing when you only need the gist of an article or want to skim it quickly. In this tutorial we will use four different generative AI methods to summarize long documents while still preserving the most important information.

This tutorial is based on the [GCP generative AI tutorials](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/examples/document-summarization/summarization_large_documents.ipynb).

### Objective

In this tutorial, you will learn how to use generative models to summarize information from text by working through the following examples:

- [Stuffing method](#stuffing)
- [MapReduce method](#map)
- [MapReduce with Overlapping Chunks method](#mapo)
- [MapReduce with Rolling Summary method](#mapr)
- [Summarizing with Generative AI Studio](#genstudio)

### Costs

This tutorial uses billable components of Google Cloud:
- Vertex AI Generative AI Studio

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), [Generative AI pricing](https://cloud.google.com/vertex-ai/pricing#generative_ai_models), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Getting Started

### Install Vertex AI SDK, other packages and their dependencies

In [None]:
!pip install google-cloud-aiplatform PyPDF2 ratelimit backoff --upgrade --quiet --user

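If a warning asks you to add `~/.local/bin` to your PATH, run the following cell. A `!` shell command runs in its own subshell and would not persist the change, so the cell updates the PATH of the running kernel instead.

In [None]:
import os
os.environ["PATH"] += os.pathsep + os.path.expanduser("~/.local/bin")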

### Authenticating your notebook environment
* If you are using **Colab** to run this notebook, follow the instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).
* If you are using **Vertex AI Workbench**, check out the setup below.

Download the Generative AI repository from the GCP GitHub.

In [None]:
!git clone https://github.com/GoogleCloudPlatform/generative-ai.git

In [None]:
!pip install google-cloud-aiplatform --upgrade

Enter your project ID and the location nearest to you.

In [None]:
PROJECT_ID = "<PROJECT ID>"
LOCATION = "<LOCATION>" #e.g. us-central1

import vertexai
vertexai.init(project=PROJECT_ID, location=LOCATION)

### Import libraries

In [None]:
import re
import urllib.request
import warnings
from pathlib import Path

import backoff
import pandas as pd
import PyPDF2
import ratelimit
from google.api_core import exceptions
from tqdm import tqdm
from vertexai.language_models import TextGenerationModel

warnings.filterwarnings("ignore")

If you receive an error saying **'PyPDF2 is not found'**, shut down and restart your notebook, then run the cell again.

### Import models

Here you load the pre-trained text generation model called `text-bison@001`.


In [None]:
generation_model = TextGenerationModel.from_pretrained("text-bison@001")

### Preparing data files

Next, download a PDF file for the summarization tasks below. The article used in this tutorial describes how gut microbiota affects Alzheimer's disease through the gut-brain-microbiota axis.

In [None]:
# Define a folder to store the files
data_folder = "data"
Path(data_folder).mkdir(parents=True, exist_ok=True)

# Define a pdf link to download and place to store the download file
pdf_url = "https://www.aging-us.com/article/102930/pdf"
pdf_file = Path(data_folder, pdf_url.split("/")[-1])

# Download the file using `urllib` library
urllib.request.urlretrieve(pdf_url, pdf_file)

Here you will take a peek at a few pages of the downloaded PDF file.

In [None]:
# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Print three pages from the pdf
for i in range(3):
    text = pages[i].extract_text().strip()
    print(f"Page {i}: {text} \n\n")

## Method 1: Stuffing <a name="stuffing"></a>

The simplest way to pass data to a language model is to "stuff" it all into the prompt as context. This means simply including all of the relevant information in the prompt, in the order that you want the model to process it.

Here you will extract the text from all the pages in the pdf file.

In [None]:
# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Empty string to concatenate all the extracted texts
concatenated_text = ""

# Loop through the pages
for page in tqdm(pages):

    # Extract the text from the page and remove any leading or trailing whitespace
    text = page.extract_text().strip()

    # Append the extracted text to the concatenated text
    concatenated_text += text

print(f"There are {len(concatenated_text)} characters in the pdf")

You will now create a prompt template that can be used later in the notebook.

In [None]:
prompt_template = """
    Write a concise summary of the following text delimited by triple backquotes.
    Return your response in bullet points which covers the key points of the text.

    ```{text}```

    BULLET POINT SUMMARY:
"""

Here you will use the LLM via the API to summarize the extracted text. Note that LLMs currently have an input text limit, so a large stuffed prompt might not be accepted. You can read more about quotas and limits [here](https://cloud.google.com/vertex-ai/docs/quotas).

For explanations of the parameters **prompt, max_output_tokens, temperature, top_p, and top_k**, see the article [here](https://cloud.google.com/vertex-ai/docs/generative-ai/text/test-text-prompts#generative-ai-test-text-prompt-drest).
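
To give a feel for what `top_k` and `top_p` do, the following is a minimal sketch on a toy probability table, not the service's actual implementation. It assumes the common description of these parameters: keep the `top_k` most likely tokens, then keep the most likely of those until their cumulative probability reaches `top_p`. The function name and the toy probabilities are hypothetical.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the `top_k` most likely tokens, then the shortest prefix whose
    cumulative probability reaches `top_p`, and renormalize what is left."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token probabilities, for illustration only
toy_probs = {"the": 0.5, "a": 0.3, "gut": 0.15, "zebra": 0.05}
print(top_k_top_p_filter(toy_probs, top_k=3, top_p=0.7))
```

In this toy case `top_k=3` drops the least likely token and `top_p=0.7` then keeps only the two most likely ones, so smaller values of either parameter narrow the set of tokens the model can sample from.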


The following code will cause **an exception**!

In [None]:
# Define the prompt using the prompt template
prompt = prompt_template.format(text=concatenated_text)

# Use the model to summarize the text using the prompt
summary = generation_model.predict(prompt=prompt, max_output_tokens=1024, temperature= 0.2, top_p=0.95, top_k=40).text

print(summary)

#### Retrying

The model responds with an error message, either **400 Request contains an invalid argument** or **400 The model supports up to 8192 input tokens, but received 19043 tokens**, because the extracted text is too long for the generative model to process.

To avoid this issue, you will input only a chunk of the extracted text (e.g. the first 30,000 characters).

In [None]:
# Define the prompt using the prompt template
prompt = prompt_template.format(text=concatenated_text[:30000])

# Use the model to summarize the text using the prompt
summary = generation_model.predict(prompt=prompt, max_output_tokens=1024, temperature= 0.2, top_p=0.95, top_k=40).text

print(summary)
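
The model's limit is expressed in tokens, while the slice above is measured in characters. As a rough pre-check before sending a prompt, the sketch below estimates a token count from the character count. The ratio of about four characters per token is only a rule of thumb for English text, assumed here for illustration, and the helper names are hypothetical; the API's own error message remains the authoritative check.

```python
# Rough pre-check of prompt size. CHARS_PER_TOKEN is an assumed rule of thumb,
# not a value taken from the Vertex AI documentation.
CHARS_PER_TOKEN = 4
MAX_INPUT_TOKENS = 8192  # limit quoted in the error message above


def estimate_tokens(text):
    """Estimate the number of tokens in `text` from its character count."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_prompt(text, limit=MAX_INPUT_TOKENS):
    """Return True if the estimated token count is within `limit`."""
    return estimate_tokens(text) <= limit


sample = "a" * 30000  # stand-in for concatenated_text[:30000]
print(estimate_tokens(sample), fits_in_prompt(sample))
```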

Let's try increasing the temperature parameter and see if you receive a different output.

In [None]:
# Define the prompt using the prompt template
prompt = prompt_template.format(text=concatenated_text[:30000])

# Use the model to summarize the text using the prompt
summary = generation_model.predict(prompt=prompt, max_output_tokens=1024, temperature= 1, top_p=0.95, top_k=40).text

print(summary)

As you can see, the output changes; in this run it becomes shorter and more to the point. This happens because the temperature parameter controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic and less creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic, meaning that the highest probability response is always selected.
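
The effect of temperature can be illustrated without calling the model. The sketch below applies a temperature-scaled softmax to a few made-up scores; the function name and the scores are hypothetical, and this shows the general idea of temperature sampling rather than the model's internal code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 corresponds to always picking the top score")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lower temperature almost all of the probability sits on the top-scoring token, while at the higher temperature the other candidates have a realistic chance of being picked.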

### Recap

Although the full text is too large for the model, you have managed to create a concise, bulleted list of the most important information from a portion of the PDF. This method is the simplest and is ideal for shorter documents, but it can still be used on longer ones if you limit the number of characters the model reads.

## Method 2: MapReduce <a name="map"></a>

This method works by first splitting the large data into chunks, then running a prompt on each chunk of text. For summarization tasks, the output from the initial prompt would be a summary of that chunk. Once all the initial outputs have been generated, a different prompt is run to combine them.
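
The flow described above can be sketched independently of any model. In the following minimal example the two callables are hypothetical stand-ins for model calls (one per chunk, one for the final combine), so the structure can be run and inspected without using the API.

```python
from typing import Callable, List


def map_reduce_summarize(
    chunks: List[str],
    summarize_chunk: Callable[[str], str],
    combine: Callable[[str], str],
) -> str:
    """Summarize each chunk independently (map), then summarize the joined results (reduce)."""
    partial_summaries = [summarize_chunk(chunk) for chunk in chunks]
    return combine("\n".join(partial_summaries))


# Hypothetical stand-ins for model calls, so the sketch runs without the API
def first_sentence(text: str) -> str:
    return text.split(".")[0].strip() + "."


def as_bullets(text: str) -> str:
    return "\n".join(f"- {line}" for line in text.splitlines() if line)


toy_pages = ["Page one intro. More detail.", "Page two result. Extra notes."]
print(map_reduce_summarize(toy_pages, first_sentence, as_bullets))
```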

This method is a bit more complex than the first method, but it can be more effective for large datasets. Here you will prepare two prompt templates: one for the initial summary step and another for the final combine step. You will be using these two templates later in this notebook.

In [None]:
initial_prompt_template = """
    Write a concise summary of the following text delimited by triple backquotes.

    ```{text}```

    CONCISE SUMMARY:
"""

final_prompt_template = """
    Write a concise summary of the following text delimited by triple backquotes.
    Return your response in bullet points which covers the key points of the text.

    ```{text}```

    BULLET POINT SUMMARY:
"""

### Adding rate limit to model calls

When you use MapReduce or other similar methods, you will be making multiple API calls to the model in a short period of time. There is a limit on the number of API calls you can make per minute, so you will need to add a safety measure to your code to prevent exceeding the limit. This will help to ensure that your code runs smoothly and does not encounter any errors.

For this method, here are a few specific things that you will do:
1. You will make use of a Python library called [ratelimit](https://pypi.org/project/ratelimit/) to limit the number of API calls per minute
2. You will make use of a Python library called [backoff](https://pypi.org/project/backoff/) to retry until the maximum time limit is reached

The following function improves the API call process by limiting the number of calls to **20 per minute**. It also backs off and retries calling the API after encountering a **Resource Exhausted** exception. The wait duration grows **exponentially until the 5-minute mark**, after which the function gives up on retrying.

In [None]:
CALL_LIMIT = 20  # Number of calls to allow within a period
ONE_MINUTE = 60  # One minute in seconds
FIVE_MINUTE = 5 * ONE_MINUTE

# A function to print a message when the function is retrying
def backoff_hdlr(details):
    print(
        "Backing off {} seconds after {} tries".format(
            details["wait"], details["tries"]
        )
    )


@backoff.on_exception(  # Retry with exponential backoff strategy when exceptions occur
    backoff.expo,
    (
        exceptions.ResourceExhausted,
        ratelimit.RateLimitException,
    ),  # Exceptions to retry on
    max_time=FIVE_MINUTE,
    on_backoff=backoff_hdlr,  # Function to call when retrying
)
@ratelimit.limits(  # Limit the number of calls to the model per minute
    calls=CALL_LIMIT, period=ONE_MINUTE
)
def model_with_limit_and_backoff(**kwargs):
    # Call `generation_model.predict`, retrying if one of the exceptions defined above occurs
    return generation_model.predict(**kwargs)
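
To make the retry behaviour more concrete, here is a minimal, library-free sketch of exponential backoff with a time budget. It is not how the `backoff` package is implemented; the function name and parameters are hypothetical, and it omits the random jitter that retry libraries commonly add. The `sleep` argument is injectable so the example can record the waits instead of actually pausing.

```python
import time


def retry_with_backoff(func, retry_on, max_time, base=1.0, factor=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call `func` until it succeeds, multiplying the wait by `factor` after each failure.

    Re-raises the last exception once the next wait would exceed `max_time` seconds.
    """
    start = clock()
    wait = base
    while True:
        try:
            return func()
        except retry_on:
            if clock() - start + wait > max_time:
                raise
            sleep(wait)
            wait *= factor


# Simulated call that fails three times before succeeding
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 4:
        raise TimeoutError("simulated quota error")
    return "ok"


waits = []  # records the requested waits instead of sleeping
result = retry_with_backoff(flaky, TimeoutError, max_time=300, sleep=waits.append)
print(result, waits)
```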

#### Map step

In this section, you will read the PDF file again and use the model to summarize each page individually using the initial prompt template.

In [None]:
# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Create an empty list to store the summaries
initial_summary = []

# Iterate over the pages and generate a summary for each page
for page in tqdm(pages):

    # Extract the text from the page and remove any leading or trailing whitespace
    text = page.extract_text().strip()

    # Create a prompt for the model using the extracted text and a prompt template
    prompt = initial_prompt_template.format(text=text)

    # Generate a summary using the model and the prompt
    summary = model_with_limit_and_backoff(prompt=prompt, max_output_tokens=1024, temperature= 0.2, top_p=0.95, top_k=40).text

    # Append the summary to the list of summaries
    initial_summary.append(summary)

Take a look at the first few summaries from the initial Map phase.

In [None]:
print("\n\n".join(initial_summary[:10]))

Here you will count the number of characters in the initial summaries to see if they are small enough to fit in a prompt.

In [None]:
len("\n".join(initial_summary))

Because you managed to fit 30,000 characters in a prompt previously, you can pass the joined summaries, which have fewer characters, to a prompt directly too. You will do that in the next step.
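
For a longer document the joined summaries might not fit in one prompt. One possible workaround, sketched below under stated assumptions, is to reduce in several rounds: pack the summaries into batches that stay within a character budget, summarize each batch, and repeat until one summary remains. The helper names, the `max_chars` budget, and the truncating stand-in for the model call are all hypothetical.

```python
def pack_into_batches(items, max_chars, sep="\n"):
    """Greedily group consecutive strings so each joined batch stays within
    `max_chars` (a single oversized string still becomes its own batch)."""
    batches, current = [], []
    for item in items:
        if current and len(sep.join(current + [item])) > max_chars:
            batches.append(sep.join(current))
            current = []
        current.append(item)
    if current:
        batches.append(sep.join(current))
    return batches


def reduce_in_batches(summaries, summarize, max_chars, max_rounds=10):
    """Summarize batches of summaries repeatedly until one summary remains."""
    summaries = list(summaries)
    for _ in range(max_rounds):
        batches = pack_into_batches(summaries, max_chars)
        summaries = [summarize(batch) for batch in batches]
        if len(summaries) <= 1:
            break
    else:
        raise RuntimeError("summaries did not converge to one within max_rounds")
    return summaries[0] if summaries else ""


# Hypothetical stand-in for a model call: keep the first 20 characters
def truncate(text):
    return text[:20]


toy_summaries = ["a" * 30, "b" * 30, "c" * 30]
print(reduce_in_batches(toy_summaries, truncate, max_chars=65))
```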

#### Reduce step

Here you will create a reduce function that concatenates the summaries from the initial summarization step (Map step) and uses the final prompt template to summarize the summaries again.

In [None]:
# Define a function to create a summary of the summaries
def reduce(initial_summary, prompt_template):

    # Concatenate the summaries from the initial step
    concat_summary = "\n".join(initial_summary)

    # Create a prompt for the model using the concatenated text and a prompt template
    prompt = prompt_template.format(text=concat_summary)

    # Generate a summary using the model and the prompt
    summary = model_with_limit_and_backoff(prompt=prompt, max_output_tokens=1024, temperature= 0.2, top_p=0.95, top_k=40).text

    return summary

You are now ready to combine all the summaries into an even smaller summary using the final prompt template and the function that you created earlier.

In [None]:
# Use defined `reduce` function to summarize the summaries
summary = reduce(initial_summary, final_prompt_template)

print(summary)

#### Recap

You just summarized the whole paper into a few bullet points using the MapReduce method. This method is better suited to large documents, and because each page of the PDF is summarized independently, the Map step can be parallelized. The same independence, however, can lead to a loss of context between pages.
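
Because the per-page summaries do not depend on each other, the Map step could be run concurrently. The sketch below shows one way to do that with a thread pool; the function names are hypothetical and the summarizer is a stand-in for a model call. In practice the workers would still share the same API quota, so the rate limit discussed earlier continues to apply.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map_summaries(chunks, summarize_chunk, max_workers=4):
    """Summarize chunks concurrently; results come back in the original chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarize_chunk, chunks))


# Hypothetical stand-in for a model call: keep the first 10 characters
def shorten(text):
    return text[:10]


print(parallel_map_summaries(["first page text", "second page text"], shorten))
```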

In the next section, you will try another method which makes use of more than one chunk (page) per prompt to summarize.

## Method 3: MapReduce with Overlapping Chunks <a name="mapo"></a>

This method is similar to MapReduce, but with one key difference: overlapping chunks. This means that a few pages will be summarized together, rather than each page being summarized separately. This helps to preserve more context or information between chunks, which can improve the accuracy of the results.

It is important to note that combining chunks may sometimes exceed the token limit imposed by the model. If this occurs, you can either split the chunks further or solve the issue creatively (e.g. by removing a few initial chunks).
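
To see which pages end up in each overlapping chunk, the following sketch computes the page-index windows on their own. It assumes windows advance one page at a time and stop once a window contains the last page; the function name is hypothetical.

```python
def overlapping_windows(n_pages, chunk_size):
    """Return page-index windows of `chunk_size` pages that advance one page at
    a time, stopping once a window reaches the last page."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if n_pages <= 0:
        return []
    last_start = max(n_pages - chunk_size, 0)
    return [
        list(range(start, min(start + chunk_size, n_pages)))
        for start in range(last_start + 1)
    ]


print(overlapping_windows(5, 2))  # [[0, 1], [1, 2], [2, 3], [3, 4]]
```

Each page except the first and last appears in two windows, which is where the shared context between neighbouring chunks comes from.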

#### Map step

In this section, you will read the PDF file again and use the model to summarize <b>a few pages</b> together using the initial prompt template that you defined earlier.

In [None]:
# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Create an empty list to store the extracted text from the pages
text_from_pages = []

# Iterate over the pages and extract the text from each page
for page in tqdm(pages):

    # Extract the text from the page and remove any leading or trailing whitespace
    text = page.extract_text().strip()

    # Append the extracted text to the list of extracted text
    text_from_pages.append(text)

Here you will define the chunk size (number of pages to combine in this example) and summarize the chunks.

In [None]:
CHUNK_SIZE = 2  # number of pages to combine in each overlapping chunk

# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Create an empty list to store the summaries
initial_summary = []

# Iterate over the pages and generate a summary for a few pages as one chunk based on `CHUNK_SIZE`
for i in tqdm(range(len(pages))):

    # Select a list of pages to merge as one chunk
    pages_to_merge = [x for x in range(i, i + CHUNK_SIZE) if x < len(pages)]

    extracted_texts = [text_from_pages[x] for x in pages_to_merge]

    # Concatenate the extracted texts of the selected pages
    text = "\n".join(extracted_texts)

    # Create a prompt for the model using the concatenated text and a prompt template
    prompt = initial_prompt_template.format(text=text)

    # Generate a summary using the model and the prompt
    summary = model_with_limit_and_backoff(prompt=prompt, max_output_tokens=1024, temperature= 0.2, top_p=0.95, top_k=40).text

    # Append the summary to the list of summaries
    initial_summary.append(summary)

    # If the chunk includes the last page, break the loop
    if pages_to_merge[-1] == len(pages) - 1:
        break

Take a look at the first few summaries from the initial Map phase.

In [None]:
print("\n\n".join(initial_summary[:10]))

#### Reduce step

You are now ready to combine all the summaries into an even smaller summary using the final prompt template and the function that you created earlier.

In [None]:
# Use defined `reduce` function to summarize the summaries
summary = reduce(initial_summary, final_prompt_template)

print(summary)

#### Recap

The model was able to summarize the whole paper into a few bullet points using the MapReduce with Overlapping Chunks method, but you will notice that the summary is longer than the ones we had before. With this method the process can be parallelized without losing context, but it is slower because multiple calls need to be made to the model.


In the next section, you will try a different approach that makes use of a summary from the previous page instead of the entire text.

## Method 4: MapReduce with Rolling Summary (Refine) <a name="mapr"></a>

On some occasions, a combination of a few pages might be too large to summarize. To resolve that issue, we will try a different approach that uses the summary from the previous step along with the next page in each prompt. This helps keep the summary complete and accurate, as it takes into account the context of the previous page.

In [None]:
initial_prompt_template = """
    Taking the following context delimited by triple backquotes into consideration:

    ```{context}```

    Write a concise summary of the following text delimited by triple backquotes.

    ```{text}```

    CONCISE SUMMARY:
"""

In [None]:
# Read the PDF file and create a list of pages.
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Create an empty list to store the summaries.
initial_summary = []

# Iterate over the pages and generate a summary
for idx, page in enumerate(tqdm(pages)):

    # Extract the text from the page and remove any leading or trailing whitespace.
    text = page.extract_text().strip()

    if idx == 0:  # if current page is the first page, no previous context
        prompt = initial_prompt_template.format(context="", text=text)

    else:  # if current page is not the first page, previous context is the summary of the previous page
        prompt = initial_prompt_template.format(
            context=initial_summary[idx - 1], text=text
        )

    # Generate a summary using the model and the prompt
    summary = model_with_limit_and_backoff(prompt=prompt, max_output_tokens=1024, temperature= 0.2, top_p=0.95, top_k=40).text

    # Append the summary to the list of summaries
    initial_summary.append(summary)
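
The dependency between consecutive pages is easier to see in a model-free sketch. Below, a hypothetical stand-in summarizer carries part of the previous summary forward, so each result visibly depends on the one before it; the function names are illustrative only.

```python
def rolling_summaries(pages, summarize_with_context):
    """Summarize pages in order, passing the previous page's summary as context."""
    summaries = []
    for index, text in enumerate(pages):
        context = summaries[index - 1] if index > 0 else ""
        summaries.append(summarize_with_context(context, text))
    return summaries


# Hypothetical stand-in for a model call: chains the carried context
# with the first word of the new page to make the dependency visible.
def toy_summarizer(context, text):
    carried = context.split()[-1] if context else "start"
    return f"{carried}->{text.split()[0]}"


print(rolling_summaries(["alpha page", "beta page", "gamma page"], toy_summarizer))
```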

Here you will list out a few entries from the initial summary list.

In [None]:
initial_summary[:10]

It is expected that there will be a few duplicate entries in the list, as you are rolling in context from previous pages to the next. You can remove these duplicates with `dict.fromkeys`, which, unlike `set`, keeps the summaries in page order.

In [None]:
initial_summary = list(dict.fromkeys(initial_summary))  # removes duplicate items, keeping the first occurrence of each

#### Reduce step
You are now ready to combine all the summaries into an even smaller summary using the final prompt template and the function that you created earlier.

In [None]:
# Use defined `reduce` function to summarize the summaries
summary = reduce(initial_summary, final_prompt_template)

print(summary)

#### Recap

The model was able to summarize the whole paper into a few bullet points using the MapReduce with Rolling Summary method. Less context is lost with this method because sequential pages are summarized using the context from previous pages. However, it does not work well with parallel processing, as the summary of each page depends on the one before it.

## Summarizing with Generative AI Studio <a name="genstudio"></a>

Go to the Generative AI Studio console [here](https://console.cloud.google.com/vertex-ai/generative/language?_ga=2.182664366.923116401.1692009977-1042353744.1691708677).

Scroll down to **Summarization** and click on the model **Article Summary**. You will see a prompt window where you need to paste the contents of your article, as the console does not allow you to upload files. For best results, leave out the abstract and reference sections of your article; if errors still come up, shorten the article further by removing other sections such as the introduction.

 <img src="images/GCPGenStudio2.png" width="500" height="500">

To the left you can control the parameters that we have been using; this is a great way to test what each parameter does and how they affect each other. Once you are done, click **submit**; you should see output similar to that below.

 <img src="images/GCPGenStudio3.png" width="600" height="600">
 


## Conclusion

You have successfully summarized a long document using four different methods. It's important to remember that each method has its advantages and disadvantages when deciding which one is right for you.


Summarizing a long document can be challenging and time-consuming, as you still need to ensure that your model has correctly identified the main points of the article in a concise and coherent way. This can be especially difficult if the document is complex or technical. While these methods allow you to interact with LLMs and summarize long documents in a flexible way, you may sometimes want to speed up the process by using bootstrapping or pre-built methods. This is where libraries like LangChain come in. You can read more about LangChain support on Vertex AI [here](https://python.langchain.com/en/latest/modules/models/llms/integrations/google_vertex_ai_palm.html).