# **Use Case: Report summarizations**

**Demonstration only, not for production**

### **CUJ**
As a Business Analyst, I want to read a summary of reports to inform business decision-making.

### **Description**
This demo extracts text from each page of the [Google 2022 Environmental Report](https://www.responsibilityreports.com/HostedData/ResponsibilityReports/PDF/NASDAQ_GOOG_2022.pdf) with the Document AI Form Parser. An LLM (`text-bison@001`) then summarizes the first five pages one at a time and condenses those page summaries into a single summary.
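
One way to summarize page-by-page text like this is a map-reduce pattern: summarize each page on its own, then summarize the joined page summaries. The following is a minimal sketch of that pattern, extended to merge summaries in batches so that it also copes with many pages. `summarize_in_batches`, `batch_size`, and `fake_summarize` are hypothetical names, not part of this notebook or of any library, and `summarize_fn` stands in for the LLM call so the example runs offline.

```python
from typing import Callable, List


def summarize_in_batches(
    texts: List[str],
    summarize_fn: Callable[[str, int], str],
    batch_size: int = 5,
    per_batch_sentences: int = 2,
    final_sentences: int = 4,
) -> str:
    """Summarize each text, then merge the summaries batch by batch."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 so each round shrinks the list")
    if not texts:
        raise ValueError("texts must not be empty")

    # Map step: summarize every page on its own.
    summaries = [summarize_fn(text, per_batch_sentences) for text in texts]

    # Reduce step: merge batches of summaries until a single batch is left.
    while len(summaries) > batch_size:
        summaries = [
            summarize_fn("\n".join(summaries[i:i + batch_size]), per_batch_sentences)
            for i in range(0, len(summaries), batch_size)
        ]

    # Final step: one last call produces the overall summary.
    return summarize_fn("\n".join(summaries), final_sentences)


def fake_summarize(text: str, num_sentences: int) -> str:
    # Stand-in for the model (assumption): keep the first `num_sentences` lines.
    return "\n".join(text.splitlines()[:num_sentences])


pages = [f"page {i}" for i in range(12)]
result = summarize_in_batches(pages, fake_summarize)
print(result)
```

The `summarize` function defined later in this notebook has the matching `(text, num_sentences)` signature, so it could be passed as `summarize_fn`. Each call to it is a separate model request.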

### **Instructions**
To run the demo, run all of the cells in order.

### **Example Output**
Google's Environmental Report 2022 details the company's progress in reducing its environmental impact. Google has been carbon neutral for its operations since 2007 and has matched 100% of its annual electricity use with renewable energy for five consecutive years. The company is working towards a carbon-free and circular economy, and has set goals to achieve net-zero emissions across all of its operations and value chain by 2030, become the first major company to run on carbon-free energy 24/7, enable 5 gigawatts of new carbon-free energy, and help more than 500 cities and local governments reduce an aggregate of 1 gigaton of carbon emissions annually. Google is also committed to helping 1 billion people make more sustainable choices by the end of 2022 through its core products.


In [None]:
from google.colab import auth as google_auth
google_auth.authenticate_user()

In [None]:
!pip install PyPDF2
!pip install google-cloud-documentai



In [None]:

!pip install google-cloud-aiplatform "shapely<2.0.0"



In [None]:
import vertexai

PROJECT_ID = "pe2024"  # @param {type:"string"}
vertexai.init(project=PROJECT_ID, location="us-central1")

In [None]:
from vertexai.preview.language_models import TextGenerationModel
generation_model = TextGenerationModel.from_pretrained("text-bison@001")


In [None]:
!pip install gcsfs



In [None]:
import io

import gcsfs
from google.cloud import documentai_v1beta3 as documentai
from PyPDF2 import PdfReader, PdfWriter

client = documentai.DocumentProcessorServiceClient()
docs = []

gcs_file_system = gcsfs.GCSFileSystem(project="pe2024")
# Download the report from
# https://www.responsibilityreports.com/HostedData/ResponsibilityReports/PDF/NASDAQ_GOOG_2022.pdf
# and upload it to this Cloud Storage path before running the cell.
gcs_pdf_path = "gs://pe2024/NASDAQ_GOOG_2022.pdf"

# A processor must already exist in Document AI
# (https://console.cloud.google.com/ai/document-ai/processors),
# e.g. a Document OCR processor. This is its resource name.
processor_name = "projects/pe2024/locations/us/processors/ef1a578c06fa9ca3"

# Keep the file open while iterating, because PdfReader reads pages from the stream.
with gcs_file_system.open(gcs_pdf_path, "rb") as f_object:
  pdf = PdfReader(f_object)
  for page in pdf.pages:
    # Write each page to its own in-memory, single-page PDF.
    buf = io.BytesIO()
    writer = PdfWriter()
    writer.add_page(page)
    writer.write(buf)
    process_request = {
      "name": processor_name,
      "raw_document": {
        "content": buf.getvalue(),
        "mime_type": "application/pdf",
      },
    }
    # One Document AI request per page; keep the parsed Document.
    docs.append(client.process_document(request=process_request).document)
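
The processor resource name used in the request above has the form `projects/{project}/locations/{location}/processors/{processor_id}`. As a minimal sketch, a small helper can assemble it from its parts so that the project, location, and processor ID are easy to change. `build_processor_name` is a hypothetical helper written for illustration, not part of the Document AI client library.

```python
def build_processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Assemble a Document AI processor resource name from its parts.

    Hypothetical helper: it only formats a string and does not check
    that the processor actually exists.
    """
    parts = (
        ("project_id", project_id),
        ("location", location),
        ("processor_id", processor_id),
    )
    for label, value in parts:
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty string without '/': {value!r}")
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


# Rebuild the resource name used in this notebook.
name = build_processor_name("pe2024", "us", "ef1a578c06fa9ca3")
print(name)
```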


In [None]:
# Number words used in the prompt; valid sentence counts are 0 through 10.
map_nums = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]

def summarize(text, num_sentences):
  """Ask the model for a summary of `text` that is about `num_sentences` sentences long."""
  lines = [f"Provide a summary with about {map_nums[num_sentences]} sentences for the following text: ", text]
  prompt = "\n".join(lines)

  # A low temperature makes the output more deterministic.
  resp = generation_model.predict(prompt, temperature=0.2, max_output_tokens=256, top_k=40, top_p=0.8).text
  return resp
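
`map_nums` only covers zero through ten, so `summarize` raises an `IndexError` for a larger `num_sentences`, and a negative value silently wraps around (`map_nums[-1]` is `"ten"`). A minimal sketch of a prompt builder that checks the range first, assuming the same prompt wording as above. `build_summary_prompt` and `NUMBER_WORDS` are hypothetical names, and the model call is left out so the snippet runs without Vertex AI.

```python
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def build_summary_prompt(text: str, num_sentences: int) -> str:
    """Build the summarization prompt, rejecting unsupported sentence counts."""
    if not 0 <= num_sentences < len(NUMBER_WORDS):
        raise ValueError(
            f"num_sentences must be between 0 and {len(NUMBER_WORDS) - 1}, got {num_sentences}"
        )
    instruction = (
        f"Provide a summary with about {NUMBER_WORDS[num_sentences]} "
        "sentences for the following text: "
    )
    # Instruction on the first line, source text below it.
    return "\n".join([instruction, text])


prompt = build_summary_prompt("Google has been carbon neutral since 2007.", 2)
print(prompt)
```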




In [None]:
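# Summarize each of the first five pages in about two sentences.
summaries = []
for page in docs[:5]:
  summaries.append(summarize(page.text, 2))

# Condense the page summaries into one summary of about four sentences.
summarize("\n".join(summaries), 4)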

"Google's Environmental Report 2022 details the company's progress in reducing its environmental impact. Google has been carbon neutral for its operations since 2007 and has matched 100% of its annual electricity use with renewable energy for five consecutive years. The company is working towards a carbon-free and circular economy, and has set goals to achieve net-zero emissions across all of its operations and value chain by 2030, become the first major company to run on carbon-free energy 24/7, enable 5 gigawatts of new carbon-free energy, and help more than 500 cities and local governments reduce an aggregate of 1 gigaton of carbon emissions annually. Google is also committed to helping 1 billion people make more sustainable choices by the end of 2022 through its core products."