In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Game Review Analysis Workflow with Vertex AI Extensions

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/vertex_ai_extensions/notebooks/game_review_analysis_vertexai_extensions.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fmain%2Fgenai-on-vertex-ai%2Fvertex_ai_extensions%2Fnotebooks%2Fgame_review_analysis_vertexai_extensions.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/applied-ai-engineering-samples/main/genai-on-vertex-ai/vertex_ai_extensions/game_review_analysis_vertexai_extensions.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/vertex_ai_extensions/notebooks/game_review_analysis_vertexai_extensions.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

| | |
|----------|-------------|
| Author(s)   | [Meltem Subasioglu](https://github.com/5Y5TEM)|
| Reviewer(s) | Yan Sun, Michael Sherman |
| Last updated | 2024-04-15: Documentation Changes |

# Overview

[Vertex AI Extensions](https://cloud.google.com/vertex-ai/docs/generative-ai/extensions/private/overview) is a platform for creating and managing extensions that connect large language models to external systems via APIs. These external systems can provide LLMs with real-time data and perform data processing actions on their behalf.

In this tutorial, you'll use Vertex AI Extensions to complete a review analysis of a Steam game:

- Retrieve 50 reviews about the game from Steam
- Create a pre-built Code Interpreter extension in your project
- Use Code Interpreter to analyze the reviews and generate plots
- Retrieve 10 websites with more detailed reviews on the game
- Create and use the Vertex AI Search extension to research and summarize the website reviews
- Use Code Interpreter to build a report with all the generated assets
- Convert the report to PDF and upload to your Google Drive  
- **[Optional]:** Send the PDF Report as an attachment via Gmail

▶ If you're already familiar with Google Cloud and the Vertex AI Extensions Code Interpreter Extension, you can skip reading between here and the "**Getting Started**" section.

## Vertex AI Extensions

[Vertex AI Extensions](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/overview) is a platform for creating and managing extensions that connect large language models to external systems via APIs. These external systems can provide LLMs with real-time data and perform data processing actions on their behalf. You can use pre-built or third-party extensions in Vertex AI Extensions.

## Vertex AI Extensions Code Interpreter Extension

The [Code Interpreter](https://console.cloud.google.com/vertex-ai/generative-ai/docs/extensions/google-extensions.md#google_code_interpreter_extension) extension provides access to a Python interpreter with a sandboxed, secure execution environment that can be used with any model in the Vertex AI Model Garden. This extension can generate and execute code in response to a user query or workflow. It allows the user or LLM agent to perform various tasks such as data analysis and visualization on new or existing data files.

You can use the Code Interpreter extension to:

* Generate and execute code.
* Perform a wide variety of mathematical calculations.
* Sort, filter, select the top results, and otherwise analyze data (including data acquired from other tools and APIs).
* Create visualizations, plot charts, draw graphs, shapes, print results, etc.

## Vertex AI Extensions Search Extension

The Vertex AI [Search](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/google-extensions#vertex_ai_search_extension) extension lets you access and search website corpuses and unstructured data to provide relevant responses to natural language questions, such as:

* "How did the competitive threats for the company change from Q1 of last year to Q1 of this year?"
* "What parts of the company are growing the fastest? How fast?"

## Using this Notebook

Colab is recommended for running this notebook, but it can run in any IPython environment where you can connect to Google Cloud, install pip packages, etc.

If you're running outside of Colab, depending on your environment you may need to install pip packages (like pandas) that are included in the Colab environment by default but are not part of the Python Standard Library. You'll also notice some comments in code cells that look like `#@something`. These are Colab form annotations and may contain informative text.

This tutorial uses the following Google Cloud services and resources:

* Vertex AI Extensions
* Google Cloud Storage Client
* Google Drive Client
* Gmail API Client

This notebook has been tested in the following environment:

* Python version = 3.10.12 & 3.12.0
* [google-cloud-aiplatform](https://pypi.org/project/google-cloud-aiplatform/) version = 1.47.0

**Note:** Vertex AI Extensions requires google-cloud-aiplatform version >= 1.47.0
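
A quick way to confirm this requirement is to compare the installed version against the minimum. The following is a minimal sketch, not part of the original notebook: `meets_minimum` is a hypothetical helper, and it only understands plain dotted numeric versions such as `1.47.0`.

```python
from importlib import metadata


def meets_minimum(installed: str, minimum: str = "1.47.0") -> bool:
    """Compare dotted numeric versions, e.g. '1.50.1' >= '1.47.0'."""
    def as_tuple(version: str) -> tuple:
        return tuple(int(part) for part in version.split("."))
    return as_tuple(installed) >= as_tuple(minimum)


try:
    installed_version = metadata.version("google-cloud-aiplatform")
    status = "OK" if meets_minimum(installed_version) else "too old"
    print(installed_version, status)
except metadata.PackageNotFoundError:
    print("google-cloud-aiplatform is not installed")
except ValueError:
    # Raised for versions with non-numeric parts (for example release candidates).
    print("Unrecognized version format; check the version manually")
```

Comparing tuples of integers avoids the pitfall of string comparison, where `"1.100.0"` would sort before `"1.47.0"`.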

## Useful Tips

1. This notebook uses Generative AI capabilities. Re-running a cell that uses Generative AI capabilities may produce similar but not identical results.
2. Because of #1, it is possible that an output from Code Interpreter produces errors. If that happens, re-run the cell that produced the coding error. The newly generated code will likely be bug free. The `run_code_interpreter` method below helps automate this, but you still may need to rerun cells that generate working code that doesn't perfectly follow the instructions in the prompt.
3. The use of Extensions and other Generative AI capabilities is subject to service quotas. Running the notebook using "Run All" may exceed your queries per minute (QPM) limitations. Run the notebook manually and, if you get a quota error, pause for up to 1 minute before retrying that cell. Code Interpreter defaults to Gemini on the backend and is subject to the Gemini quotas, [view your Gemini quotas here](https://console.cloud.google.com/iam-admin/quotas?pageState=(%22allQuotasTable%22:(%22f%22:%22%255B%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22base_model_5C_22_22%257D_2C%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22gemini_5C_22_22%257D%255D%22%29%29&e=13802955&mods=logs_tg_staging).
4. The Code Interpreter Extension is stateless: a request has no knowledge of previous operations or of files ingested or produced in previous steps. Every request to Code Interpreter must therefore include all the files and instructions it needs to complete successfully.
5. The Code Interpreter runs in a sandbox environment. Avoid prompts that need additional Python packages to run, or tell the Code Interpreter to ignore anything that needs packages beyond the built-in ones.
6. Tell the Code Interpreter to catch and print any exceptions for you, and to suppress UserWarnings and FutureWarnings.
7. When debugging the output of the Vertex Code Interpreter extension, it usually helps to copy the error message into the prompt and tell the extension to handle that error properly.
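
Tips 4 to 7 can be folded into a small prompt builder so that every request carries the same guardrails. This is only a sketch: `build_prompt` and the `GUARDRAILS` wording are hypothetical and not part of the notebook, so adjust them freely.

```python
# Hypothetical guardrail text distilled from tips 5-7.
GUARDRAILS = (
    "Only use packages that are already available; skip any step that needs others.",
    "Catch and print any exceptions instead of stopping.",
    "Suppress UserWarnings and FutureWarnings.",
)


def build_prompt(task: str, previous_error: str = "") -> str:
    """Return a self-contained prompt: task, guardrails, and optionally the last error."""
    lines = [task.strip(), *GUARDRAILS]
    if previous_error:
        # Tip 7: feed the error message back so the next attempt can handle it.
        lines.append(f"Make sure to avoid and handle this error: {previous_error}")
    return "\n".join(lines)


print(build_prompt("Create a pie chart of the 'rating' column in reviews.csv."))
```

Because each request is stateless (tip 4), the full prompt and every input file have to be sent again on each call, including retries.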

# Getting Started

The following steps are necessary to run this notebook, no matter what notebook environment you're using.

If you're entirely new to Google Cloud, [get started here](https://cloud.google.com/docs/get-started).

## Google Cloud Project Setup

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.
1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).
1. [Enable the Service Usage API](https://console.cloud.google.com/apis/library/serviceusage.googleapis.com)
1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).
1. [Enable the Cloud Storage API](https://console.cloud.google.com/flows/enableapi?apiid=storage.googleapis.com).
1. [Enable the Google Drive API](https://console.cloud.google.com/flows/enableapi?apiid=drive.googleapis.com).
1. [Enable the Gmail API](https://console.cloud.google.com/flows/enableapi?apiid=gmail.googleapis.com).
1. [Enable the Discovery Engine API for your project](https://console.cloud.google.com/marketplace/product/google/discoveryengine.googleapis.com)
1. [Enable the Agent Builder API](https://console.cloud.google.com/gen-app-builder/start)

## Google Cloud Permissions

**To run the complete notebook, including the optional section, you will need Owner permissions on the project.**

If you want to skip the optional section, you need at least the following [roles](https://cloud.google.com/iam/docs/granting-changing-revoking-access):
* **`roles/serviceusage.serviceUsageAdmin`** to enable APIs
* **`roles/iam.serviceAccountAdmin`** to modify service agent permissions
* **`roles/discoveryengine.admin`** to modify discoveryengine assets
* **`roles/aiplatform.user`** to use AI Platform components
* **`roles/storage.objectAdmin`** to modify and delete GCS buckets





## Install Vertex AI SDK and other required packages


In [None]:
!pip install google-cloud-discoveryengine --upgrade
!pip install google-cloud-aiplatform --upgrade
!pip install xhtml2pdf

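## If you're running outside of Colab, make sure to install the following modules as well:
# !pip install pandas
# !pip install tabulate        # needed by pandas' DataFrame.to_markdown
# !pip install beautifulsoup4  # provides the bs4 module
# !pip install google
# !pip install google-api-python-client
# !pip install google-oauth
# !pip install google-auth-oauthlib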

### Restart runtime

To use the newly installed packages in this Jupyter runtime, you may need to restart the runtime. You can do this by running the cell below, which restarts the current kernel.

You may see the restart reported as a crash, but it is working as-intended -- you are merely restarting the runtime.

The restart might take a minute or longer. After it's restarted, continue to the next step.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


## Authenticate (Colab)

If you're using Colab, run the code in the next cell. Follow the popups and authenticate with an account that has access to your Google Cloud [project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects).


In [None]:
import sys
from google.auth import default

# google.colab only exists inside Colab, so import it conditionally.
if "google.colab" in sys.modules:
    from google.colab import auth as google_auth
    google_auth.authenticate_user()

creds, _ = default()

## Authenticate (Outside Colab)

If you're running this notebook somewhere besides Colab, make sure your environment has the right Google Cloud access. If that's a new concept to you, consider looking into [Application Default Credentials for your local environment](https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev) and [initializing the Google Cloud CLI](https://cloud.google.com/docs/authentication/gcloud). More authentication options are discussed [here](https://cloud.google.com/docs/authentication).

Once the Google Cloud CLI is properly installed on your system, follow the instructions in the next cells to set up your ADC.

### Setting up Application Default Credentials

Outside of Colab, you can authenticate through Google Cloud via Application Default Credentials.
It is recommended that you set up a new configuration to run this notebook.

To do so, open a terminal and run:

`$ gcloud config configurations create CONFIG_NAME`

This creates a new config with the specified name.


💡 **NOTE:** You can list all available configurations by running
`$ gcloud config configurations list` 💡



The configuration should be activated automatically.
Next, log in with your account by running

`$ gcloud auth login EMAIL_ADDRESS`

Set your project:

`$ gcloud config set project PROJECT_ID`

You will likely get a warning that the active project doesn't match the quota project.
To change this, run:

`$ gcloud auth application-default set-quota-project PROJECT_ID`

When prompted, enter Y to confirm that the cloudresourcemanager.googleapis.com API will be enabled.


**Your ADC is all set now. Fetch your credentials by running the next cell:**

In [None]:
from google.auth import default
creds, _ = default()

## Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and enable all the APIs mentioned in the 'Getting Started' section of this notebook.

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).


In [None]:
import vertexai

PROJECT_ID = "YOUR_PROJECT_ID"  # @param {type:"string"}
REGION = "us-central1"  # @param {type: "string"}
API_ENV = "aiplatform.googleapis.com"  # @param {type:"string"}

!gcloud config set project {PROJECT_ID}


vertexai.init(
    project=PROJECT_ID,
    location=REGION,
    api_endpoint=f"{REGION}-{API_ENV}",
)

## Create a public Google Cloud Storage bucket

You will also need a Google Cloud Storage (GCS) bucket. For the scope of this notebook, we will create a public bucket by running the cells below.

**Note:** This is needed to embed generated images into the pdf report further below. Alternatively, you can prompt the Code Interpreter to input the image links into the report instead of embedding them directly, and use your own non-public bucket instead.
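
Objects in a publicly readable bucket can be fetched over HTTPS without credentials, which is what allows a report to reference images by URL. The sketch below shows the public URL format for Cloud Storage objects; `public_object_url` is a hypothetical helper and the object name is made up for illustration.

```python
from urllib.parse import quote


def public_object_url(bucket_name: str, object_name: str) -> str:
    """Build the public HTTPS URL of an object in a publicly readable bucket."""
    # Object names may contain spaces or other characters that need escaping;
    # "/" is kept so that folder-like prefixes stay readable.
    return f"https://storage.googleapis.com/{bucket_name}/{quote(object_name, safe='/')}"


print(public_object_url("my_testbucket", "plots/rating pie.png"))
# https://storage.googleapis.com/my_testbucket/plots/rating%20pie.png
```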

In [None]:
# @markdown Select a **unique** name for your bucket
GCS_BUCKET = "my_testbucket"  # @param {type:"string"}


In [None]:
from google.cloud import storage

# Create a client object
client = storage.Client(project=PROJECT_ID)

# Create the bucket with public access
bucket = client.create_bucket(GCS_BUCKET)
bucket.make_public(future=True)  # Make the bucket publicly accessible

print(f"Public bucket {GCS_BUCKET} created successfully.")

## Import libraries

# Using Vertex AI Extensions to Analyze Game Reviews - Tutorial

## Step 1: Create a Code Interpreter Extension

Now you can create the extension. The following cell uses the Python SDK to import the extension (thereby creating it) in Vertex AI Extensions.

In [None]:
from vertexai.preview import extensions

extension_code_interpreter = extensions.Extension.from_hub("code_interpreter")
extension_code_interpreter

### Code Interpreter Helper Functions

These functions make it easier to inspect Code Interpreter's output, assemble Code Interpreter requests, and run generated code.

#### `process_response`

`process_response` displays the generated code and any output files, shows the output from code execution, surfaces code execution errors, and saves output files.

If the output of `process_response` looks strange, try making your notebook window wider. This will help keep the HTML layout organized.

**To use this functionality** call `process_response(response)`, where `response` is the Code Interpreter `response` object.
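
The helper reads four keys from the response: `generated_code`, `execution_result`, `execution_error`, and `output_files` (each file having a `name` and base64-encoded `contents`). The sketch below builds a mock response by hand to show that shape. The values are made up, and real responses may carry additional fields.

```python
import base64

# A hand-built stand-in for a Code Interpreter response (illustrative values only).
mock_response = {
    "generated_code": "print('hello')",
    "execution_result": "hello\n",
    "execution_error": "",
    "output_files": [
        {"name": "notes.txt", "contents": base64.b64encode(b"some text").decode()},
    ],
}

# Output files arrive base64-encoded and must be decoded before use.
for output_file in mock_response["output_files"]:
    decoded = base64.b64decode(output_file["contents"])
    print(output_file["name"], "->", decoded.decode())
```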


In [None]:
import base64
import json
import pprint
import pandas
import sys
import IPython
from io import StringIO

css_styles = """
<style>
.main_summary {
  font-weight: bold;
  font-size: 14px; color: #4285F4;
  background-color:rgba(221, 221, 221, 0.5); padding:8px;}
.main_summary:hover {background-color: rgba(221, 221, 221, 1);}
details {
  background-color:#fff;
  border: 1px solid #E8EAED;
  padding:0px;
  margin-bottom:2px; }
details img {width:50%}
details > div {padding:10px; }
div#left > * > div {
    overflow:auto;
    max-height:400px; }

div#right > pre {
    overflow:auto;
    max-height:600px;
    background-color: ghostwhite;
    padding: 10px; }
details details > div { overflow: scroll; max-height:400px}
details details {
  background-color:rgba(246, 231, 217, 0.2);
  border: 1px solid #FBBC04;}
details details > summary {
  padding: 8px;
  background-color:rgba(255, 228, 196, 0.6); }
details details > summary:hover { background-color:rgba(255, 228, 196, 0.9); }
div#left {width: 64%; padding:0 1%;  }
div#right {
  border-left: 1px solid silver;
  width: 30%;
  float: right;
  padding:0 1%; }
body {color: #000; background-color: white; padding:10px 10px 40px 10px; }
#main { border: 1px solid #FBBC04; padding:10px 0; display: flow-root; }
h3 {color: #000; }
code  { font-family: monospace; color: #900; padding: 0 2px; font-size: 105%; }
</style>
        """

# Parser to visualise the content of returned files as HTML.
def parse_files_to_html(outputFiles, save_files_locally = True):
    IMAGE_FILE_EXTENSIONS = set(["jpg", "jpeg", "png"])
    file_list = []
    details_tml = """<details><summary>{name}</summary><div>{html_content}</div></details>"""

    if not outputFiles:
      return "No Files generated from the code"
    # Sort output_files so images are displayed before other files such as JSON.
    for output_file in sorted(
        outputFiles,
        key=lambda x: x["name"].split(".")[-1] not in IMAGE_FILE_EXTENSIONS,
    ):
        file_name = output_file.get("name")
        file_contents = base64.b64decode(output_file.get("contents"))
        if save_files_locally:
          # Use a context manager so the file handle is closed promptly.
          with open(file_name, "wb") as f:
            f.write(file_contents)

        if file_name.split(".")[-1] in IMAGE_FILE_EXTENSIONS:
            # Render Image
            file_html_content = ('<img src="data:image/png;base64, '
                                f'{output_file.get("contents")}" />')
        elif file_name.endswith(".json"):
            # Pretty print JSON
            json_pp = pprint.pformat(
                        json.loads(file_contents.decode()),
                        compact=False,
                        width=160)
            file_html_content =  (f'<span>{json_pp}</span>')
        elif file_name.endswith(".csv"):
            # CSV
            csv_md = pandas.read_csv(
                  StringIO(file_contents.decode())).to_markdown(index=False)
            file_html_content = f'<span>{csv_md}</span>'
        elif file_name.endswith(".pkl"):
            # PKL
            file_html_content = f'<span>Preview N/A</span>'
        else:
            file_html_content = f"<span>{file_contents.decode()}</span>"

        file_list.append({'name': file_name, "html_content": file_html_content})

    buffer_html = [ details_tml.format(**_file) for _file in file_list ]
    return "".join(buffer_html)

# Processing code interpreter response to html visualization.
def process_response(response: dict, save_files_locally = True) -> None:

  result_template = """
  <details open>
    <summary class='main_summary'>{summary}:</summary>
    <div><pre>{content}</pre></div>
  </details>
  """

  result = ""
  code = response.get('generated_code')
  if 'execution_result' in response and response['execution_result']!="":
    result = result_template.format(
        summary="Executed Code Output",
        content=response.get('execution_result'))
  else:
    result = result_template.format(
      summary="Executed Code Output",
      content="Code does not produce printable output.")

  if response.get('execution_error', None):
    result += result_template.format(
        summary="Generated Code Raised a (Possibly Non-Fatal) Exception",
        content=response.get('execution_error', None))

  result += result_template.format(
    summary="Files Created <u>(Click on filename to view content)</u>",
    content=parse_files_to_html(
        response.get('output_files', []),
        save_files_locally=save_files_locally))

  display(
      IPython.display.HTML(
        ( f"{css_styles}"
f"""
<div id='main'>
    <div id="right">
      <h3>Generated Code by Code Interpreter</h3>
      <pre><code>{code}</code></pre>
    </div>
    <div id="left">
      <h3>Code Execution Results</h3>
      {result}
    </div>
</div>
"""
        )
      )
  )

#### `run_code_interpreter`
`run_code_interpreter` eases calling Code Interpreter by encoding files to base 64 (a Code Interpreter requirement) and submitting the files alongside the instructions. It also automates retries (5 by default) if the generated code doesn't execute or if Code Interpreter fails due to exceeding Gemini (time-based) quotas. Additionally, a global `CODE_INTERPRETER_WRITTEN_FILES` variable is populated by `run_code_interpreter` to aid with cleaning up files created by Code Interpreter.

**To use this functionality**  call `run_code_interpreter(instructions, filenames, retry_num, retry_wait_time)`
where `instructions` is the prompt for Code Interpreter, `filenames` is a list of local files in the working directory to submit to Code Interpreter, optionally `retry_num` if you want to change the default number of retries from 5, and optionally `retry_wait_time` if you want to change the default 15 second wait between retries.
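
The retry behaviour described above can be sketched in isolation with a stand-in for the Code Interpreter call. `call_with_retries` is a hypothetical name used only for this illustration; it is not the notebook's helper.

```python
from time import sleep
from typing import Callable


def call_with_retries(call: Callable[[], dict],
                      retry_num: int = 5,
                      retry_wait_time: float = 15) -> dict:
    """Call `call` until its result has no 'execution_error' or retries run out."""
    result = {}
    for attempt in range(retry_num + 1):  # one initial try plus retry_num retries
        result = call()
        if not result.get("execution_error"):
            return result
        if attempt < retry_num:
            print(f"Attempt {attempt + 1} failed: {result['execution_error']}")
            sleep(retry_wait_time)
    return result  # still failing; the caller can inspect the last error


# Stand-in for a Code Interpreter call that fails twice, then succeeds.
outcomes = iter([
    {"execution_error": "boom"},
    {"execution_error": "boom"},
    {"execution_result": "ok"},
])
print(call_with_retries(lambda: next(outcomes), retry_wait_time=0))
```

Waiting between attempts matters for quota errors in particular, since an immediate retry is likely to hit the same per-minute limit.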

In [None]:
from time import sleep
from typing import Optional

# Names of every file Code Interpreter has written, kept for later cleanup.
CODE_INTERPRETER_WRITTEN_FILES = []

def run_code_interpreter(instructions: str,
                         filenames: Optional[list[str]] = None,
                         retry_num: int = 5,
                         retry_wait_time: int = 15) -> dict:

  global CODE_INTERPRETER_WRITTEN_FILES

  # Code Interpreter expects file contents as base64-encoded strings.
  file_arr = []
  for filename in filenames or []:
    with open(filename, "rb") as f:
      file_arr.append({
          "name": filename,
          "contents": base64.b64encode(f.read()).decode()
      })

  attempts = 0
  res = {}

  # One initial attempt plus up to retry_num retries.
  while attempts <= retry_num:
    attempts += 1

    res = extension_code_interpreter.execute(
        operation_id = "generate_and_execute",
        operation_params = {
            "query": instructions,
            "files": file_arr
        },
    )

    CODE_INTERPRETER_WRITTEN_FILES.extend(
        [item['name'] for item in res.get('output_files', [])])

    if not res.get('execution_error', None):
      return res
    elif attempts <= retry_num:
      print(f"The generated code produced an error {res.get('execution_error')}"
            f" - Automatic retry attempt # {attempts}/{retry_num}")
      # Wait before retrying so time-based quotas can recover.
      sleep(retry_wait_time)

  # All attempts failed; return the last response so callers can inspect it.
  return res

#### `run_locally`
`run_locally` executes code generated by Code Interpreter.

**To use this functionality**  call `run_locally(response)` with the `response` object returned by Code Interpreter.

Note: to avoid unexpected issues you should always inspect generated code before you run it locally.
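
One way to inspect the code first is to strip the wrapper lines and print the result before deciding whether to execute it. The sketch below assumes the generated code arrives wrapped in a Markdown code fence, which is what the `[1:-1]` slice in `run_locally` implies; the sample response is made up.

```python
FENCE = "`" * 3  # three backticks, built this way to keep this example readable

# Illustrative response; real generated code will differ.
sample_response = {"generated_code": f"{FENCE}python\nprint('hello')\n{FENCE}"}

# Drop the first and last lines (the fence markers), as run_locally does.
code_lines = sample_response["generated_code"].split("\n")[1:-1]
code_to_run = "\n".join(code_lines)

print(code_to_run)  # inspect before deciding whether to execute it
```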

In [None]:
def run_locally(response):
  my_code = "\n".join(response['generated_code'].split('\n')[1:-1])
  exec(my_code)

## Step 2: Use Code Interpreter to Analyze Steam Reviews

In this section, you will specify a game title and scrape some Steam reviews for that title from steamcommunity.com.
Using the Code Interpreter extension, you will then perform automated analysis on the reviews.

In [None]:
#@markdown Specify the name of the game
game = "Palworld"  # @param {type: "string"}

### Prepare the Reviews Dataset

Now, grab the Steam App ID for the game, if the game is available on the platform. To do this, we run a Google Search to retrieve the game's Steam URL and parse the ID out of that URL.

**Note:** if you are facing errors with importing googlesearch, make sure that you don't have any conflicting packages installed. This is the googlesearch module that's installed when running `pip install google`.
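
The parsing step relies on Steam store URLs carrying the App ID right after `/app/`. A minimal sketch of that extraction with a regular expression follows; `extract_app_id` is a hypothetical helper, and `12345` is a placeholder ID rather than a real game.

```python
import re
from typing import Optional


def extract_app_id(url: str) -> Optional[str]:
    """Return the numeric ID that follows '/app/' in a Steam URL, or None."""
    match = re.search(r"/app/(\d+)", url)
    return match.group(1) if match else None


# 12345 is a placeholder ID, not a real game.
print(extract_app_id("https://store.steampowered.com/app/12345/Some_Game/"))
print(extract_app_id("https://example.com/no-id-here"))
```

Returning `None` for URLs without an ID makes the "game not found on Steam" case explicit instead of relying on an exception.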

In [None]:
# Fetch steam review URL and the games App ID
from googlesearch import search

query = f"{game} steampowered.com "
steam_url = list()

for j in search(query, tld="com", num=1, stop=1, pause=1):
    print("URL: ",j)
    steam_url.append(j)

try:
  steam_url = steam_url[0].split('app/')[1]
  steam_appId = steam_url.split('/')[0]

  print("App ID: ", steam_appId)

# IndexError covers both an empty result list and a URL without 'app/'.
except IndexError:
  print("Could not parse the Steam ID out of the URL. The game is likely not supported on Steam.")
  steam_appId = None

Now, grab some reviews from Steam.
The Steam reviews page uses infinite scrolling and does not support paging through the results via the URL, so each request is limited to about 10 reviews for now.
To work around this, we will request the reviews with five different filters:
1. Top rated reviews of all time
2. Trending reviews today
3. Trending reviews this week
4. Trending reviews this month  
5. Most recent reviews

This will give us a total of 50 reviews to work with.
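
The five requests differ only in the `browsefilter` query parameter. The sketch below builds the URLs without fetching anything; `review_url` is a hypothetical helper that mirrors the URL pattern used in this notebook, and `12345` is a placeholder App ID.

```python
# Filter names as passed to the review scraper in this notebook.
FILTERS = ["toprated", "trendday", "trendweek", "trendmonth", "mostrecent"]


def review_url(app_id: str, browse_filter: str) -> str:
    """Build the first-page review URL for one filter."""
    return f"https://steamcommunity.com/app/{app_id}/reviews/?p=1&browsefilter={browse_filter}"


# 12345 is a placeholder App ID.
urls = [review_url("12345", name) for name in FILTERS]
for url in urls:
    print(url)
```

Note that the same review can appear under more than one filter, so the combined list may contain duplicates.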


In [None]:
import requests
from bs4 import BeautifulSoup
import json

def get_steam_reviews(filter, num_reviews=10):
    """
    Fetches Steam reviews for a given filter and number of reviews.

    Args:
        filter (str): The filter type (e.g., 'toprated', 'trendweek').
        num_reviews (int): The desired number of reviews to fetch. Defaults to 10.

    Returns:
        list: A list of dictionaries, each representing a review with
            'author', 'content', 'rating', 'date', and 'hours_played' keys.
    """
    url = f'https://steamcommunity.com/app/{steam_appId}/reviews/?p=1&browsefilter={filter}'

    print("URL: ", url)

    # Fetch the page once. Requesting the same URL again would only return the
    # same cards (or loop forever if the page contains no reviews).
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')

    def text_of(parent, class_name):
        # Return the stripped text of the first matching div, or None.
        node = parent.find('div', class_=class_name) if parent else None
        return node.text.strip() if node else None

    reviews = []
    # Each review lives in one 'apphub_Card' div; keep at most num_reviews.
    for block in soup.find_all('div', class_='apphub_Card')[:num_reviews]:
        content_block = block.find('div', class_='apphub_CardTextContent')

        # The posting date is nested inside the content block.
        date = text_of(content_block, 'date_posted')
        if date:
            date = date.replace('Posted:', '').strip()

        # Missing fields are stored as None instead of reusing stale values.
        reviews.append({
            'author': text_of(block, 'apphub_CardContentAuthorName'),
            'content': content_block.text.strip() if content_block else None,
            'rating': text_of(block, 'title'),
            'date': date,
            'hours_played': text_of(block, 'hours'),
        })

    return reviews

topRated_reviews = get_steam_reviews('toprated')
trendWeek_reviews = get_steam_reviews('trendweek')
trendMonth_reviews = get_steam_reviews('trendmonth')
trendDay_reviews = get_steam_reviews('trendday')
mostRecent_reviews = get_steam_reviews('mostrecent')


Concatenate all the reviews in one single list:

In [None]:
all_reviews = (topRated_reviews + trendWeek_reviews + trendMonth_reviews
               + trendDay_reviews + mostRecent_reviews)

Write the reviews into a .csv file so you can parse it with the Code Interpreter extension.

In [None]:
import csv

filename = 'reviews.csv'

# Reviews often contain non-ASCII characters, so set the encoding explicitly.
with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
    # Determine field names (header row)
    fieldnames = all_reviews[0].keys()

    # Create a DictWriter
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    # Write the header
    writer.writeheader()

    # Write the data rows
    writer.writerows(all_reviews)

Get the reviews in a pandas dataframe, so you can take a look into its content and inspect the reviews.

In [None]:
import pandas as pd

df = pd.read_csv('reviews.csv')
df.head(10)

### Let Code Interpreter do its Magic

Write a helper function to collect all of the assets created by a Vertex AI Extension. This will help later when generating the PDF Report and with cleaning up the generated files afterwards. For this purpose, this function collects the file names of any generated images from Code Interpreter Extension as well as the text outputs generated by the Vertex AI Search Extension.

In [None]:
output_list = []

def is_string(value):
    return isinstance(value, str)

def grab_outs(response):
  # Check if response is a string from Search Extension
  if is_string(response):
    output_list.append(response)

  # Else it's a dict output from Code Interpreter Extension
  else:
    # Avoid naming the loop variable 'dict', which would shadow the builtin.
    for output_file in response['output_files']:
      output_list.append(output_file["name"])

You can call the Vertex AI Code Interpreter Extension to generate plots and graphs on your dataset. However, you can also ask the Code Interpreter extension to take a look at the dataset for you and generate a few ideas for insightful visualizations. The following cell prompts the Code Interpreter extension to save some plot ideas in the ideas.txt file:

In [None]:
response = run_code_interpreter(instructions=f"""
You are given a dataset of reviews. I want you to come up with some ideas for relevant visualization for this dataset.
Create natural language **instructions** and save them into the file ideas.txt
Please put your ideas as natural language **instructions** into the file ideas.txt
Do not generate any plots yourself.
""", filenames= ['reviews.csv'])
process_response(response)

That looks interesting! You could go ahead and parse these ideas automatically by another Code Interpreter extension call. We will see an optional cell below on how to do that. But for now, we want to reformulate things a bit, so let's go ahead and plot some of the ideas above:

In [None]:
response = run_code_interpreter(instructions=f"""
    You are given a dataset of reviews. Create a pie chart showing the following:
    - how many ratings have 'recommended' vs 'not recommended'?
    Save the plot with a descriptive name.
""", filenames= ['reviews.csv'])
process_response(response)


In [None]:
# Grab the output if it looks good.
grab_outs(response)

Easy peasy. But what if we want to generate a more complex plot with the Code Interpreter extension? You can try that with the next cell:

In [None]:
response = run_code_interpreter(instructions=f"""
    You are given a dataset of reviews. The hours_played column contains information on the total hours played, in the format '3,650.6 hrs on record' or '219.6 hrs on record'.
    Avoid and handle conversion errors, e.g. 'could not convert string to float: '3,650.6''.
    Make a plot that shows the relationship between hours played and the count of the ratings 'Not Recommended'.
    Put the hours_played into the different buckets 0-50, 50-100, 100-1000, >1000.
    Save the plot with a descriptive name.

    Make sure Plots have visible numbers or percentages when applicable, and labels.
    Make sure to avoid and handle the error 'Expected value of kwarg 'errors' to be one of ['raise', 'ignore']. Supplied value is 'coerce' '.
    Use >>> import warnings
    warnings.simplefilter(action='ignore', category=FutureWarning) <<< to avoid any FutureWarnings from pandas.

    """, filenames= ['reviews.csv'])
process_response(response)


In [None]:
# Grab the output if it looks good.
grab_outs(response)
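
The parsing and bucketing that the prompt above asks Code Interpreter to perform can be sketched locally as follows. This is only an illustration under stated assumptions: the helper names `parse_hours` and `bucket_hours` and the sample rows are hypothetical, the bucket ranges come from the prompt, and the way boundary values (50, 100, 1000) are assigned is a choice made here. The code that Code Interpreter actually generates may look different.

```python
import re
from collections import Counter


def parse_hours(text):
    """Parse strings like '3,650.6 hrs on record' into a float (None on failure)."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(text))
    if match is None:
        return None
    # Remove thousands separators before converting to float
    return float(match.group().replace(",", ""))


def bucket_hours(hours):
    """Map hours played to the buckets named in the prompt."""
    if hours < 50:
        return "0-50"
    if hours < 100:
        return "50-100"
    if hours <= 1000:
        return "100-1000"
    return ">1000"


# Hypothetical sample rows: (hours_played, rating)
rows = [
    ("3,650.6 hrs on record", "Not Recommended"),
    ("219.6 hrs on record", "Recommended"),
    ("12.0 hrs on record", "Not Recommended"),
    ("unknown", "Not Recommended"),
]

counts = Counter()
for raw_hours, rating in rows:
    hours = parse_hours(raw_hours)
    # Skip unparseable values instead of raising a conversion error
    if hours is None or rating != "Not Recommended":
        continue
    counts[bucket_hours(hours)] += 1

print(dict(counts))
```

Returning `None` for unparseable values keeps the conversion error mentioned in the prompt from interrupting the whole run.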

### Optional: Plotting ideas.txt

**OPTIONAL**: You can also parse the plot ideas that the Code Interpreter extension created and use them directly as instructions in another Code Interpreter call.
Code Interpreter may generate some odd plots at this step, usually because the generated instructions are not clearly defined.

💡**Tip**: Read the instructions from the ideas.txt file and paste them into the prompt instead of passing the file in `filenames`. Code Interpreter does not parse instructions from attached files.


In [None]:
with open('ideas.txt', 'r', encoding='utf-8') as file:
    ideas = file.read()

response = run_code_interpreter(instructions=f"""
    Create and save the following plots.
    Make sure each plot is in its own file and do not overlay multiple plots, so for every plot reset the process.
    Save the plot with a descriptive name.
    Make sure plots have visible numbers or percentages, when applicable, and labels.
    Do not use the library 'wordcloud', it's not available. Skip an idea if it uses wordcloud.
    Make sure to avoid 'Rectangle.set() got an unexpected keyword argument 'kind''
    Make sure to suppress any user warnings.
    **If any of the following produces an exception make sure you catch and print it, and continue to the next item in the list**:
    {str(ideas)}
""", filenames= ['reviews.csv'])
process_response(response)

## Step 3: Use Vertex AI Search Extension to do a Qualitative Analysis of the Reviews

For using the Vertex AI Search Extension, please grant the [Vertex AI Extension Service agent](https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents) the [permission needed](https://cloud.google.com/vertex-ai/docs/general/access-control#home-project) by following the UI instructions or by running the next cell.

To do so in the UI:
1. Go to https://console.cloud.google.com/iam-admin/iam
2. Make sure you're in the right project.
3. Enable the checkbox `Include Google-provided role grants`. This will show you the active service accounts in your project.
4. Locate the service agent with the name **Vertex AI Extension Service Agent**.
5. Click on the pen icon to edit the roles for this service agent.
6. Click on `add another role` and add **Discovery Engine Editor**.
7. Save the changes.


**Alternatively, run the next cell to assign the role to the Service Agent programmatically:**

In [None]:
%%bash -s "$PROJECT_ID"

# Get project number using gcloud
PROJECT_NUMBER=$(gcloud projects describe $1 --format="value(projectNumber)")

# Service agent email
SERVICE_AGENT_EMAIL="service-$PROJECT_NUMBER@gcp-sa-vertex-ex.iam.gserviceaccount.com"

# Role to add
ROLE="roles/discoveryengine.editor"

# Add the role using gcloud CLI (with the correct service agent email)
gcloud projects add-iam-policy-binding $1 \
    --member="serviceAccount:$SERVICE_AGENT_EMAIL" \
    --role=$ROLE


### Set Up Qualitative Review Dataset

Grab some more detailed reviews of the game for qualitative analysis. For this, you can use Google Search to get the URLs of the top 10 results for the game's reviews.

In [None]:
from googlesearch import search

# Search
query = f"{game} Reviews"
urls = list()

for j in search(query, tld="com", num=10, stop=10, pause=2):
    print(j)
    urls.append(j)

We want the Vertex AI Search extension to summarize the contents for us and to answer our questions. To do this, we could manually take the URLs above and set up a website data store in the Google Cloud Console.

For cleaner results, however, the following cell first fetches the text content of each website, writes it to a .txt file, and then uploads the files to your Google Cloud Storage (GCS) bucket.

In [None]:
import requests
import os
from bs4 import BeautifulSoup
from google.cloud import storage

def url_txt_to_gcs(id, url, filename, bucket_name):

    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract all text content
    all_text = soup.get_text(separator='\n', strip=True)

    # Save to .txt file
    with open(filename, "w", encoding='utf-8') as file:
        file.write(id +"\n"+ all_text)

    # Upload
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(filename)

    # The file was written to the current working directory above
    blob.upload_from_filename(filename)

    print(f"File uploaded to gs://{bucket_name}/{filename}")


# Upload the website content .txt files into GCS
txt_files = []

for idx, url in enumerate(urls):
  id = "doc-"+str(idx)
  filename = f"website_text_{idx}.txt"
  txt_files.append(f"website_text_{idx}.txt")
  url_txt_to_gcs(id, url, filename, GCS_BUCKET)
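
Review pages often contain inline scripts and style sheets that add noise to the extracted text. As a hedged sketch of how the extraction step could skip them, the following standard-library example collects only the text outside `script`, `style`, and `noscript` tags. The class `VisibleTextParser` and the function `html_to_text` are hypothetical names, and this is an alternative to the `BeautifulSoup.get_text` call above, not what the notebook itself uses.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, ignoring the contents of script/style/noscript tags."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside of skipped tags
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample_html = (
    "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><h1>Great game</h1><p>Fun combat.</p></body></html>"
)
print(html_to_text(sample_html))
```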

### Create a Search Data Store and Ingest your Files

The Vertex AI Search extension needs a **Data Store** and **Vertex Search Engine** to run. [You can learn more about Data Stores and Vertex Search Engines here](https://cloud.google.com/generative-ai-app-builder/docs/create-datastore-ingest).

The following cells will help you in the setup.

In [None]:
# @markdown Specify an ID for your data store. Use only lowercase letters, digits, and hyphens.
data_store_id = "gamereview-extensions" # @param {type:"string"}

Use the following bash command to **create** your Data Store:

In [None]:
%%bash -s "$PROJECT_ID" "$data_store_id"

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: $1" \
"https://discoveryengine.googleapis.com/v1alpha/projects/$1/locations/global/collections/default_collection/dataStores?dataStoreId=$2" \
-d '{
  "displayName": "GameReview-Extensions-Store",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED"
}'
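
If you would rather assemble the request in Python, building the body with `json.dumps` guarantees syntactically valid JSON. The sketch below only constructs the URL and the payload from the values used in the curl command above; it does not send anything. The function name `build_create_datastore_request` is hypothetical.

```python
import json


def build_create_datastore_request(project_id, data_store_id):
    """Return the (url, json_body) pair for the data store creation call."""
    url = (
        "https://discoveryengine.googleapis.com/v1alpha"
        f"/projects/{project_id}/locations/global"
        "/collections/default_collection/dataStores"
        f"?dataStoreId={data_store_id}"
    )
    body = {
        "displayName": "GameReview-Extensions-Store",
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
        "contentConfig": "CONTENT_REQUIRED",
    }
    # json.dumps never emits trailing commas, so the payload is always valid JSON
    return url, json.dumps(body)


url, payload = build_create_datastore_request("my-project", "gamereview-extensions")
print(url)
print(payload)
```

Here `"my-project"` is a placeholder project ID.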

🎉 Your Data Store is all set! You can inspect it under: https://console.cloud.google.com/gen-app-builder/data-stores

Now you just need to **ingest** your .txt files with the website contents into it by running the cell below.

**This process can take 5 to 10 minutes.** The cell finishes running once the ingestion is done.

In [None]:
from typing import Optional

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

def import_documents_sample(
    project_id: str,
    location: str,
    data_store_id: str,
    gcs_uri: Optional[str] = None,
) -> str:
    """Imports documents into a Vertex AI data store from GCS.

    This function imports documents into a specified data store within Vertex AI Agent Builder
    from a GCS bucket. It uses the incremental reconciliation
    mode, which adds new documents and updates existing ones.

    Args:
        project_id: The ID of the Google Cloud project.
        location: The location of the data store (e.g., "global", "us", or "eu").
        data_store_id: The ID of the data store.
        gcs_uri: The GCS URI of the documents to import (e.g., "gs://my-bucket/docs/*.txt").

    Returns:
        str: The name of the long-running operation that imports the documents.

    Raises:
        google.api_core.exceptions.GoogleAPICallError: If the API call fails.

    """

    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        gcs_source=discoveryengine.GcsSource(
            input_uris=[gcs_uri], data_schema="content"
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )


    # Make the request
    operation = client.import_documents(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # Once the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name


gcs_uri = f"gs://{GCS_BUCKET}/*.txt" # grabs all the .txt files we generated
import_documents_sample(PROJECT_ID, 'global', data_store_id, gcs_uri)
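
To make the two naming rules in the function above easier to see, here is a small sketch that reproduces them as plain string helpers: the endpoint selection for non-global locations and the branch resource name format quoted in the code comment. The helper names are hypothetical, and the real client's `branch_path` output may differ between library versions, so treat this as an illustration rather than a replacement.

```python
def discovery_engine_endpoint(location):
    """Return the location-specific API endpoint, or None for the global default."""
    if location == "global":
        return None
    return f"{location}-discoveryengine.googleapis.com"


def branch_resource_name(project_id, location, data_store_id, branch="default_branch"):
    """Build a branch name in the format shown in the comment above."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/dataStores/{data_store_id}/branches/{branch}"
    )


print(discovery_engine_endpoint("global"))
print(discovery_engine_endpoint("eu"))
print(branch_resource_name("my-project", "global", "gamereview-extensions"))
```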

### Connect Data Store to a Vertex AI Search Engine

The following cell lets you create a Vertex AI Search Engine and connect it to your newly created Data Store. For the Vertex AI Search Extension to work, you need to enable Enterprise features by setting `"searchTier": "SEARCH_TIER_ENTERPRISE"` and Advanced LLM Features by setting `"searchAddOns": ["SEARCH_ADD_ON_LLM"]` in the code cell below.

**These settings will be set automatically by running the cell below.**






In [None]:
%%bash -s "$PROJECT_ID" "$data_store_id"

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: $1" \
"https://discoveryengine.googleapis.com/v1/projects/$1/locations/global/collections/default_collection/engines?engineId=$2" \
-d '{
  "displayName": "game-review-engine",
  "dataStoreIds": ["'$2'"],
  "solutionType": "SOLUTION_TYPE_SEARCH",
  "searchEngineConfig": {
     "searchTier": "SEARCH_TIER_ENTERPRISE",
     "searchAddOns": ["SEARCH_ADD_ON_LLM"]
   }
}'

### Set up the Vertex AI Search Extension

Your Data Store and Search Engine are all set. Now you just need to create an instance of the Vertex AI Search Extension by running the cell below.


In [None]:
# Construct an object that points to the relevant data store
DATASTORE = f"projects/{PROJECT_ID}/locations/global/collections/default_collection/dataStores/{data_store_id}/servingConfigs/default_search"

# Instantiate extension
extension_vertex_ai_search = extensions.Extension.from_hub(
    "vertex_ai_search",
    runtime_config={
        "vertex_ai_search_runtime_config": {
            "serving_config_name": DATASTORE,
        }
    })

extension_vertex_ai_search

The following is a helper function. We can let the Vertex AI Search Engine generate an answer for our prompt directly. However, for a more descriptive response, we can retrieve the segment matches provided by the search engine and let Gemini generate an answer over it.

In [None]:
from vertexai.preview.generative_models import GenerativeModel, Part
import vertexai.preview.generative_models as generative_models
model = GenerativeModel("gemini-1.0-pro-001")

# Helper function
def get_vertexSearch_response(QUERY, mode):
  """Queries Vertex AI Search and generates a response using either Vertex Search or Gemini.

  This function takes a query and a mode as input. It first sends the query to Vertex AI Search.
  Depending on the specified mode, it either:

  - Returns the extractive answers directly from Vertex AI Search (mode='vertex').
  - Uses the extractive segments from Vertex AI Search as context for Gemini to generate a more
    comprehensive response (mode='gemini').

  Args:
      QUERY: The query string to send to Vertex AI Search.
      mode: The response generation mode, either 'vertex' or 'gemini'.

  Returns:
      str: The generated response, either from Vertex AI Search or Gemini.

  Raises:
      ValueError: If the `mode` is not 'vertex' or 'gemini'.
      Exception: Errors raised by the Gemini API call are propagated unchanged.
  """
  vertex_ai_search_response = extension_vertex_ai_search.execute(
    operation_id = "search",
    operation_params = {"query": QUERY},
  )

  # Let Vertex Search Extension generate a response
  if mode == 'vertex':
    list_extractive_answers = []
    for i in vertex_ai_search_response:
      list_extractive_answers.append(i["extractive_answers"][0])
    # Return after the loop so that every search result is included
    return list_extractive_answers


  # Let Gemini generate a response over the Vertex Search Extension segments
  elif mode == 'gemini':
    list_extractive_segments = []

    for i in vertex_ai_search_response:
      list_extractive_segments.append(i["extractive_segments"][0])

    prompt = f"""
    Prompt: {QUERY};
    Contents: {str(list_extractive_segments)}
    """

    res = model.generate_content(
        prompt,
        generation_config={
            "max_output_tokens": 2048,
            "temperature": 0.1,
            "top_p": 1
        },
        safety_settings={
              generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
              generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
              generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
              generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        },
        stream=False,
      )

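    return res.text

  else:
    # Reject unsupported modes, as documented in the docstring
    raise ValueError(f"mode must be 'vertex' or 'gemini', got {mode!r}")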
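
The helper indexes `[0]` into each result's `extractive_answers` or `extractive_segments` list, which fails if a result has no entries. A more defensive variant of that collection step and of the prompt assembly might look like the sketch below. It assumes the search response is a list of dictionaries shaped like the ones the helper iterates over; the function names `first_items` and `build_prompt` and the sample results are hypothetical.

```python
def first_items(results, key):
    """Collect the first entry under `key` from each result, skipping empty ones."""
    items = []
    for result in results:
        values = result.get(key) or []
        if values:
            items.append(values[0])
    return items


def build_prompt(query, segments):
    """Combine the query and the retrieved segments into one prompt string."""
    return f"Prompt: {query};\nContents: {segments}"


# Hypothetical search results for illustration only
fake_results = [
    {"extractive_segments": ["Combat feels rewarding."]},
    {"extractive_segments": []},
    {"title": "No segments in this result"},
]

segments = first_items(fake_results, "extractive_segments")
print(build_prompt("List positive review points", segments))
```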

### Use Vertex AI Search Extension to answer Questions and retrieve Summaries

Now you can run Vertex AI Search Extension. The cell below demonstrates an output of Vertex AI Search Engine without Gemini.

ㅤ

❗❗❗ **NOTE:** if you are facing the following error:

`FailedPrecondition: 400 Cannot use enterprise edition features (website search, multi-modal search, extractive answers/segments, etc.) in a standard edition search engine...`


when running the cell below, simply wait a few minutes and try to run the cell again. That means the settings from the Vertex AI Search Engine creation have not yet propagated to the system. ❗❗❗

In [None]:
QUERY = f"What are some negative review points for {game}?" # @param {type:"string"}

search_res = get_vertexSearch_response(QUERY, mode='vertex')

search_res

The following cell highlights the differences between the pure Vertex AI Search Extension output above, and the hybrid response generated with Gemini below:

In [None]:
QUERY = f"List 10 positive review points for {game}"

response = get_vertexSearch_response(QUERY, mode='gemini')

print(response)

# Grab the output for report generation
grab_outs(response)

Looks good. Collect more information from the website contents by giving the extension some more prompts:

In [None]:
QUERY = f"List 10 negative review points for {game}"

response = get_vertexSearch_response(QUERY, mode='gemini')

print(response)

# Grab the output for report generation
grab_outs(response)

In [None]:
QUERY = f"Provide a summary description of the game {game}"

response = get_vertexSearch_response(QUERY, mode='gemini')

print(response)

# Grab the output for report generation
grab_outs(response)

## Step 4: Populate your Results in a PDF Report

Now it's time to put everything together. We have collected the generated responses (both images and texts) from Vertex AI Code Interpreter and Search Extensions.



In [None]:
output_list

Next you need to fetch the image filenames from the output_list:

In [None]:
imgs_files = []
other_files = []
txt_outs = []

for element in output_list:
  if ".png" in element or ".jpg" in element or ".jpeg" in element:

    # Set aside images with code_execution in the filename (these are duplicates)
    if "code_execution" in element:
      other_files.append(element)

    else:
      # Grab image filenames
      imgs_files.append(element)

  else:
    # Get text outputs
    txt_outs.append(element)
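
The substring checks above would also classify a text output as an image if it merely mentions ".png" somewhere in a sentence. A stricter variant, sketched here under the assumption that `output_list` holds plain strings, compares the actual file extension instead. The function name `split_outputs` and the sample values are hypothetical.

```python
import os

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def split_outputs(outputs):
    """Split outputs into (images, duplicate_images, texts) by file extension."""
    images, duplicates, texts = [], [], []
    for element in outputs:
        extension = os.path.splitext(element)[1].lower()
        if extension in IMAGE_EXTENSIONS:
            # Files with code_execution in the name are duplicates of saved plots
            if "code_execution" in element:
                duplicates.append(element)
            else:
                images.append(element)
        else:
            texts.append(element)
    return images, duplicates, texts


sample_outputs = [
    "recommended_pie_chart.png",
    "code_execution_image_1.png",
    "Players said the .png icons look dated",
]
print(split_outputs(sample_outputs))
```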

Upload the images to GCS and collect their public URLs. The URLs are only reachable if the objects in your bucket are publicly readable:

In [None]:
from google.cloud import storage

def upload_to_gcs(local_file, bucket_name, blob_name):
    """
    Upload a file to GCS bucket.
    """
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(local_file)

def get_public_url(bucket_name, blob_name):
    """
    Get the public URL of a file in the GCS bucket.
    """
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    return blob.public_url

# Upload to GCS
gcs_img_files = [] #collect public image urls in this list

for image in imgs_files:
  # Upload Image
  upload_to_gcs(image, GCS_BUCKET, image)

  # Get Image public URL
  public_image_url = get_public_url(GCS_BUCKET, image)
  gcs_img_files.append(public_image_url)
  print(public_image_url)
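
For reference, the public URL is derived from the bucket and object names and does not by itself grant access. The sketch below builds such a URL with the standard library, assuming the usual `https://storage.googleapis.com/<bucket>/<object>` form. The function name `gcs_public_url` and the sample names are hypothetical.

```python
from urllib.parse import quote


def gcs_public_url(bucket_name, blob_name):
    """Build the public URL of an object, percent-encoding the object name."""
    encoded_name = quote(blob_name, safe="/~")
    return f"https://storage.googleapis.com/{bucket_name}/{encoded_name}"


print(gcs_public_url("my-bucket", "plots/pie chart.png"))
```

Percent-encoding matters when a plot was saved with spaces in its filename, since the report embeds these URLs in HTML.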

### Generate the Report with Vertex AI Code Interpreter Extension

With the collected text outputs and the public URLs of the images, you can ask Code Interpreter extension to generate a compelling PDF Report. For this, let it generate a .html file first - you can convert it to PDF in the next cells.

In [None]:
response = run_code_interpreter(instructions=f"""
    You are a report generator. Given a list of filenames and strings, create an interesting report in html language and save it to report.html.
    The report revolves around reviews for the game {game}.

    Structure the report with proper headings. Don't use 'String' as a heading.
    Write the whole report in natural language. You are allowed to use bullet points.
    Start the report with a summary of the game {game}
    Embed the images directly in the html and include image descriptions.

    The contents you can use are these, including images (the filenames indicate the image content):
    {gcs_img_files}

    And string contents:
    {txt_outs}
    """)
process_response(response)
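
Because the report is generated by a model, its layout can vary between runs. If you need a predictable fallback, you could assemble a simple HTML file yourself from the same inputs. The following is a minimal sketch, not the notebook's method: the function `build_report_html` and the sample values are hypothetical, and it assumes you have already grouped your text outputs into (heading, text) pairs.

```python
from html import escape


def build_report_html(title, sections, image_urls):
    """Build a minimal HTML report from text sections and image URLs."""
    parts = ["<html><body>", f"<h1>{escape(title)}</h1>"]
    for heading, text in sections:
        parts.append(f"<h2>{escape(heading)}</h2>")
        parts.append(f"<p>{escape(text)}</p>")
    for url in image_urls:
        # escape() also encodes quotes, which keeps the attribute value intact
        parts.append(f'<img src="{escape(url)}" alt="Generated plot">')
    parts.append("</body></html>")
    return "\n".join(parts)


report = build_report_html(
    "Reviews & Ratings",
    [("Summary", "Players enjoy the <open world>.")],
    ["https://storage.googleapis.com/my-bucket/pie_chart.png"],
)
print(report)
```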


Convert the html to a .pdf file:

In [None]:
import xhtml2pdf.pisa as pisa

with open("report.html") as infile, open("report.pdf", "w+b") as outfile:
    pisa.CreatePDF(infile, outfile)

Your report.pdf is now generated and saved within your (Colab) environment.

## [OPTIONAL] Step 5: Google Workspace APIs (Outside Colab)

This section shows how you can store your generated PDF report in your Google Drive, and how you can send the report as an attachment via Gmail.

🚨 **You will need to run this section outside of Colab in a local environment, as we will set up the API Credentials for a Desktop App.**🚨

For this, you need to configure the Google Workspace API and credentials first.

You can check out the [Python Quick Start Guide](https://developers.google.com/gmail/api/quickstart/python) for more details.

ㅤ

👣 **Steps for setting up the scopes:**
1. [Go to the OAuth consent screen in your project](https://console.cloud.google.com/apis/credentials/consent)
1. For User type select external, then click Create.
1. Complete the app registration form by adding an app name, and adding your email to the user support email & developer contact information, then click Save and Continue.
1. Click on `Add or Remove Scopes`
1. In the filter search bar of the selected scopes window, search for drive and enable the Scope https://www.googleapis.com/auth/drive
1. Now search for Gmail and enable the Scope https://www.googleapis.com/auth/gmail.send
1. Click on Save and Continue.
1. In the Test Users window, add your own Google email address as a User by clicking `Add Users`, then click on Save and Continue.
1. Review your app registration summary. To make changes, click Edit. If the app registration looks OK, click Back to Dashboard.

ㅤ


👣  **Steps for retrieving authorized credentials:**
1. Go to [Credentials](https://console.cloud.google.com/apis/credentials) in the GCP console.
1. Click Create Credentials > OAuth client ID.
1. Click Application type > Desktop app.
1. In the Name field, type a name for the credential. This name is only shown in the Google Cloud console.
1. Click Create. The OAuth client created screen appears, showing your new Client ID and Client secret.
1. Click OK. The newly created credential appears under OAuth 2.0 Client IDs.
1. Save the downloaded JSON file as credentials.json, and move the file to your working directory.




After that, you can run the following cell to get your creds variable by parsing the credentials.json file:

In [None]:
import os
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2 import credentials

SCOPES = ['https://mail.google.com/', 'https://www.googleapis.com/auth/gmail.send', 'https://www.googleapis.com/auth/drive']

creds = None
# Token file typically stores credentials for reuse
token_file = 'token.json'

# Check if authorized credentials exist
if os.path.exists(token_file):
    creds = credentials.Credentials.from_authorized_user_file(token_file, SCOPES)
# If not, or credentials are invalid, trigger the authorization flow
if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file(
            "credentials.json", SCOPES
        )
        creds = flow.run_local_server(port=0)
    # Save the credentials for the next run
    with open(token_file, "w") as token:
        token.write(creds.to_json())




### Uploading Report to Google Drive
This section lets you upload the generated PDF report to your Google Drive. It first creates a new folder for you (specify the folder name in the next cell) and then uploads the PDF file to that folder.

In [None]:
# @markdown Provide the folder name on Google Drive where the PDF should be saved into:

folder_name = 'extensions-demo' # @param {type:"string"}

The following function lets you create a new folder in Google Drive:

In [None]:
import os
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

def create_folder(folder_name):
    """Creates a folder in Google Drive.
    This function uses the Google Drive API to create a new folder with the specified name.

    Args:
        folder_name: The name of the folder to create.

    Returns:
        str: The ID of the newly created folder.
    """
    drive_service = build('drive', 'v3', credentials=creds)

    file_metadata = {
        'name': folder_name,
        'mimeType': 'application/vnd.google-apps.folder'
    }
    folder = drive_service.files().create(body=file_metadata, fields='id').execute()
    return folder.get('id')



In [None]:
# Create your folder
folder_id = create_folder(folder_name)

Lastly, upload your report.pdf to your new Google Drive Folder. The next function will help you upload a specified file to your newly created folder:

In [None]:
def upload_file(file_path, folder_id):
    """Uploads a file to a specific folder in Google Drive.

    This function uses the Google Drive API to upload a file from the local filesystem
    to a specified folder in Google Drive. It automatically determines the appropriate
    MIME type based on the file extension.

    Args:
        file_path: The path to the file to upload.
        folder_id: The ID of the folder to upload the file to.

    Returns:
        str: The ID of the uploaded file.
    """
    # Build the Drive API service object
    drive_service = build('drive', 'v3', credentials=creds)

    file_metadata = {
        'name': os.path.basename(file_path),
        'parents': [folder_id]
    }

    # Determine MIME type based on file extension
    extension = os.path.splitext(file_path)[1].lower()
    if extension in ['.jpg', '.jpeg']:
        mime_type = 'image/jpeg'
    elif extension == '.png':
        mime_type = 'image/png'
    elif extension == '.pdf':
        mime_type = 'application/pdf'
    else:
        mime_type = 'application/octet-stream'  # Generic fallback

    media = MediaFileUpload(file_path, mimetype=mime_type, resumable=True)
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    print(f'File uploaded to Drive: {file.get("id")}')

    return file.get("id")
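
Instead of maintaining the extension-to-MIME-type mapping by hand, you could rely on Python's standard `mimetypes` module. This is a sketch of that alternative, with a hypothetical helper name `guess_mime_type`; it keeps the same generic fallback as the function above.

```python
import mimetypes


def guess_mime_type(file_path, default="application/octet-stream"):
    """Guess the MIME type from the file extension, with a generic fallback."""
    mime_type, _encoding = mimetypes.guess_type(file_path)
    return mime_type or default


for name in ["report.pdf", "plot.png", "photo.jpg", "data.zzzunknown"]:
    print(name, "->", guess_mime_type(name))
```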

In [None]:
# Upload file to Google Drive folder
file_id = upload_file('report.pdf', folder_id)

### Sending the Report via Gmail
The following sections show how to attach the generated PDF report to an email and send it to a recipient with the Gmail API.

Grab the contents of the pdf report:

In [None]:
import os

def read_pdf_file(filename):
    with open(filename, 'rb') as f:
        pdf_data = f.read()
    return pdf_data

pdf_filename = "report.pdf"  # Path to your PDF in the working directory
pdf_data = read_pdf_file(pdf_filename)


Function to build a raw message from the PDF contents for the e-mail attachment:

In [None]:
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders
import base64

def create_message_with_attachment(sender, to, subject, body, filename, attachment):
    message = MIMEMultipart()
    message['to'] = to
    message['from'] = sender
    message['subject'] = subject

    msg_body = MIMEText(body, 'plain')
    message.attach(msg_body)

    part = MIMEBase('application', 'pdf')  # MIME type for the PDF attachment
    part.set_payload(attachment)
    encoders.encode_base64(part)
    # Pass the filename as a parameter so that it is quoted correctly in the header
    part.add_header('Content-Disposition', 'attachment', filename=filename)
    message.attach(part)

    raw_message = base64.urlsafe_b64encode(message.as_bytes()).decode()
    return {'raw': raw_message}
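
The same raw message can also be built with the newer `email.message.EmailMessage` API from the standard library, which handles the attachment encoding and headers for you. The sketch below is an alternative to the function above, not a required change. The function name `build_raw_message` and the sample values are hypothetical, and it assumes the attachment is a PDF passed in as bytes.

```python
import base64
from email.message import EmailMessage


def build_raw_message(sender, to, subject, body, filename, attachment):
    """Build a Gmail API message body with a PDF attachment."""
    message = EmailMessage()
    message["To"] = to
    message["From"] = sender
    message["Subject"] = subject
    message.set_content(body)
    # add_attachment base64-encodes the bytes and sets Content-Disposition
    message.add_attachment(
        attachment, maintype="application", subtype="pdf", filename=filename
    )
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
    return {"raw": raw}


example = build_raw_message(
    "me", "recipient@domain.com", "Report", "See attachment.",
    "report.pdf", b"%PDF-1.4 example bytes",
)
print(sorted(example.keys()))
```

The returned dictionary has the same `{'raw': ...}` shape as the one produced by `create_message_with_attachment`.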


#### Setting up e-mail configuration
Provide the recipient's e-mail address in the next cell. The Gmail API calls reuse the `creds` variable you created above.

In [None]:
# Provide the details for constructing your e-mail

recipient = 'recipient@domain.com' #@param {type: 'string'}

#### Send the e-mail
📧 Now you can send the e-mail with the attached pdf report:

In [None]:
from googleapiclient.discovery import build

# Build the Gmail API service object
service = build('gmail', 'v1', credentials=creds)

# Provide the details for constructing your e-mail
subject = f"{game} Review Analysis Report"
body = f"Attached is the Report on the Review Analysis for {game}"

# Construct e-mail
message = create_message_with_attachment('me', recipient,
                                          subject, body,
                                          pdf_filename, pdf_data)

# Send e-mail
service.users().messages().send(userId='me', body=message).execute()
print("Email sent!")

# 🧹 Cleaning up

Clean up resources created in this notebook.

Remove the extension instances created in this notebook by running the cell below:

In [None]:
extension_code_interpreter.delete()
extension_vertex_ai_search.delete()

You can run the next cell to get a list of all other remaining Vertex AI Extension Instances in your environment:

In [None]:
extensions.Extension.list()

Alternatively, you can uncomment the following code block to delete all active extensions in your project. It collects the IDs from the list above and deletes each extension:

In [None]:
#clean_ids = []

#for element in extensions.Extension.list():
    #clean_ids.append(str(element).split("extensions/")[1])

#for id in clean_ids:
   #extension = extensions.Extension(id)
   #extension.delete()

Run the cells below to delete your public GCS bucket by first deleting all files in it, then deleting the bucket itself:

❗❗❗ Only run the below cells if you created a new bucket just for this notebook ❗❗❗

In [None]:
from google.cloud import storage

def empty_bucket(bucket_name):
    """Deletes all objects in the specified GCS bucket."""
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)

    blobs = bucket.list_blobs()  # List all blobs (objects)
    for blob in blobs:
        blob.delete()  # Delete each blob

    print(f"Bucket {bucket_name} emptied.")

In [None]:
## Empty the bucket by deleting all files in it
empty_bucket(GCS_BUCKET)

## Create a client object
client = storage.Client(project=PROJECT_ID)

## Get the bucket object
bucket = client.get_bucket(GCS_BUCKET)

## Delete the bucket
bucket.delete()

print(f"Bucket {GCS_BUCKET} deleted successfully.")

Now, delete all the assets generated by the Vertex AI extensions. First, let's get the filenames:

In [None]:
files = imgs_files + other_files

for i in range(10):
  files.append(f'website_text_{i}.txt')

files.append('report.html')
files.append('report.pdf')
files.append('reviews.csv')
files.append('ideas.txt')
files

Next, delete the files:

In [None]:
import os

for file in files:
  try:
    os.remove(file)
  except FileNotFoundError as e:
    print(e)
    print('Skipping.')
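
If you prefer to get back a list of what was actually deleted, a `pathlib`-based variant of the loop above could look like this. It is a sketch with a hypothetical function name `remove_files`; the demonstration runs in a temporary directory so that it does not touch your real files.

```python
import tempfile
from pathlib import Path


def remove_files(paths):
    """Delete the given files if they exist and return the paths that were removed."""
    removed = []
    for path in map(Path, paths):
        if path.is_file():
            path.unlink()
            removed.append(str(path))
    return removed


# Demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    existing = Path(tmp_dir) / "report.html"
    existing.write_text("<html></html>", encoding="utf-8")
    missing = Path(tmp_dir) / "ideas.txt"
    removed_files = remove_files([existing, missing])
    print(len(removed_files), "file(s) removed")
```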

Delete your newly created Google Drive folder and the file in it:

In [None]:
from googleapiclient.discovery import build

# Build the Drive API service object once (assumes 'creds' is set up)
drive_service = build('drive', 'v3', credentials=creds)

# Delete the file with file_id
drive_service.files().delete(fileId=file_id).execute()
print(f"File with ID {file_id} deleted.")

# Delete the folder with folder_id
drive_service.files().delete(fileId=folder_id).execute()
print(f"Folder with ID {folder_id} deleted.")

Delete your Google Cloud CLI ADC Configuration, if you no longer need it, by running:

`$ gcloud config configurations delete CONFIG_NAME`


❗❗❗ Don't forget to delete any other created assets if you don't need them, e.g.

*   Your Vertex Search Engine: https://console.cloud.google.com/gen-app-builder/apps
*   Your Data Store: https://console.cloud.google.com/gen-app-builder/data-stores
