In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Game Review Analysis Workflow with Vertex Extensions

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/vertex_extensions/business_analyst_workflow_vertex_extensions.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fmain%2Fgenai-on-vertex-ai%2Fvertex_extensions%2Fbusiness_analyst_workflow_vertex_extensions.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/applied-ai-engineering-samples/main/genai-on-vertex-ai/vertex_extensions/business_analyst_workflow_vertex_extensions.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/tree/main/genai-on-vertex-ai/vertex_extensions/business_analyst_workflow_vertex_extensions.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

| | |
|----------|-------------|
| Author(s)   | [Meltem Subasioglu](https://github.com/5Y5TEM)|
| Reviewer(s) | Yan Sun |
| Last updated | 2024-04-08: Initial Publication |

# Overview

[Vertex AI Extensions](https://cloud.google.com/vertex-ai/docs/generative-ai/extensions/private/overview) is a platform for creating and managing extensions that connect large language models to external systems via APIs. These external systems can provide LLMs with real-time data and perform data processing actions on their behalf.

In this tutorial, you'll use Vertex AI Extensions to analyze reviews of a Steam game:

- Retrieve 50 reviews of the game from Steam
- Create a pre-built Code Interpreter extension in your project
- Use Code Interpreter to analyze the reviews and generate plots
- Retrieve 10 websites with more detailed reviews of the game
- Create and use the Vertex AI Search extension to research and summarize the website reviews
- Use Code Interpreter to build a report with all the generated assets
- Convert the report to PDF and upload it to your Google Drive

▶ If you're already familiar with Google Cloud and the Vertex Extensions Code Interpreter Extension, you can skip reading between here and the "**Getting Started**" section.

## Vertex AI Extensions

[Vertex AI Extensions](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/overview) is a platform for creating and managing extensions that connect large language models to external systems via APIs. These external systems can provide LLMs with real-time data and perform data processing actions on their behalf. You can use pre-built or third-party extensions in Vertex AI Extensions.

## Vertex AI Extensions Code Interpreter Extension

The [Code Interpreter](https://console.cloud.google.com/vertex-ai/generative-ai/docs/extensions/google-extensions.md#google_code_interpreter_extension) extension provides access to a Python interpreter with a sandboxed, secure execution environment that can be used with any model in the Vertex AI Model Garden. This extension can generate and execute code in response to a user query or workflow. It allows the user or LLM agent to perform various tasks such as data analysis and visualization on new or existing data files.

You can use the Code Interpreter extension to:

* Generate and execute code.
* Perform a wide variety of mathematical calculations.
* Sort, filter, select the top results, and otherwise analyze data (including data acquired from other tools and APIs).
* Create visualizations, plot charts, draw graphs, shapes, print results, etc.

## Using this Notebook

Colab is recommended for running this notebook, but it can run in any IPython environment where you can connect to Google Cloud and install pip packages.

If you're running outside of Colab, depending on your environment you may need to install pip packages (like pandas) that are included in the Colab environment by default but are not part of the Python Standard Library. You'll also notice some comments in code cells that look like `#@something`; these are Colab form annotations and may contain informative text.

This tutorial uses the following Google Cloud services and resources:

* Vertex AI Extensions
* Google Cloud Storage Client
* Google Drive Client

This notebook has been tested in the following environment:

* Python version = 3.10.12
* [google-cloud-aiplatform](https://pypi.org/project/google-cloud-aiplatform/) version = 1.46.0

## Useful Tips

1. This notebook uses Generative AI capabilities. Re-running a cell that uses Generative AI capabilities may produce similar but not identical results.
2. Because of #1, output from Code Interpreter may occasionally contain errors. If that happens, re-run the cell that produced the coding error. The newly generated code will likely be bug-free.
3. If you see a session error when using the extension, try re-running the cell.
4. The use of Extensions and other Generative AI capabilities is subject to service quotas. Running the notebook using "Run All" may exceed your queries per minute (QPM) limits. Run the notebook manually, and if you get a quota error, pause for up to 1 minute before retrying that cell. Code Interpreter uses Gemini on the backend and is subject to the Gemini quotas; [view your Gemini quotas here](https://console.cloud.google.com/iam-admin/quotas?pageState=(%22allQuotasTable%22:(%22f%22:%22%255B%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22base_model_5C_22_22%257D_2C%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22gemini_5C_22_22%257D%255D%22%29%29&e=13802955&mods=logs_tg_staging).
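
The pause-and-retry advice in tip 4 can be sketched as a small wrapper. This is a minimal sketch, not part of the original notebook: `call_with_pause` is a hypothetical helper, and it retries on any `Exception` because the exact exception type raised for a quota error is not specified here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_pause(
    fn: Callable[[], T],
    max_attempts: int = 3,
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), pausing between attempts whenever it raises.

    Hypothetical helper: retries on any Exception, since the concrete
    quota-error type is an unknown here. Narrow the except clause once
    you know which exception your environment raises.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            print(f"Attempt {attempt} failed ({exc}); pausing {pause_seconds}s")
            sleep(pause_seconds)
    raise AssertionError("unreachable")
```

A cell could then wrap an extension call, for example `call_with_pause(lambda: run_code_interpreter(instructions="...", filenames=["reviews.csv"]))`, assuming the helper functions defined later in this notebook. The `sleep` parameter exists only so the pause can be replaced in tests.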


💁 Extra tips:
- The Code Interpreter runs in a sandbox environment, so avoid prompts that need additional Python packages, or tell the Code Interpreter to ignore anything that needs packages beyond the built-in ones.
- Tell the Code Interpreter to catch and print any exceptions for you.
- To debug the output of the Vertex Code Interpreter extension, it usually helps to copy the error message into the prompt and tell the extension to handle that error properly.
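
One way to apply these tips consistently is to append them to every prompt. The sketch below is illustrative only: `GUARDRAILS` and `build_instructions` are hypothetical names, and the guardrail wording is an assumption rather than text required by the extension.

```python
from typing import Optional, Sequence

# Hypothetical guardrail sentences based on the tips above.
GUARDRAILS = (
    "Only use packages that are already available; skip any step that needs others.",
    "Catch and print any exceptions instead of stopping.",
)


def build_instructions(task: str, known_errors: Optional[Sequence[str]] = None) -> str:
    """Return the task followed by the guardrails and any known error messages."""
    lines = [task.strip(), *GUARDRAILS]
    # Feed previously seen error messages back into the prompt.
    for error in known_errors or ():
        lines.append(f"Avoid and handle this error: {error}")
    return "\n".join(lines)


prompt = build_instructions(
    "Create a pie chart of the ratings.",
    known_errors=["could not convert string to float"],
)
print(prompt)
```

The resulting string can be passed as the `instructions` argument of whatever function you use to call the extension.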

# Getting Started

The following steps are necessary to run this notebook, no matter what notebook environment you're using.

If you're entirely new to Google Cloud, [get started here](https://cloud.google.com/docs/get-started).

## Google Cloud Project Setup

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.
1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).
1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

## Google Cloud Permissions
Make sure you have been [granted the following roles](https://cloud.google.com/iam/docs/granting-changing-revoking-access) for the GCP project you'll access from this notebook:
* [`roles/aiplatform.user`](https://cloud.google.com/vertex-ai/docs/general/access-control#aiplatform.user)

## Install Vertex AI SDK and other required packages


In [None]:
!pip install google-cloud-discoveryengine --upgrade
!pip install google-cloud-aiplatform --upgrade
!pip install weasyprint

### Restart runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.

You may see the restart reported as a crash, but it is working as intended -- you are merely restarting the runtime.

The restart might take a minute or longer. After it's restarted, continue to the next step.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


## Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the cell below to authenticate your environment.

In [None]:
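import sys
from google.auth import default

# Import and use the Colab auth helper only when running inside Colab;
# the google.colab package is not available in other environments.
if "google.colab" in sys.modules:
    from google.colab import auth as google_auth

    google_auth.authenticate_user()

creds, _ = default()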

## Outside of Colab: Install the Google Cloud CLI

If you are running this notebook in your own environment, you need to install the [Cloud SDK](https://cloud.google.com/sdk) (aka `gcloud`).

## Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

You will further need a GCS bucket. Learn more about [creating a Google Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets).

**Note:** For the scope of this notebook, set the access permissions for your newly created bucket to public. This is needed to embed generated images into the PDF report further below. Alternatively, you can prompt the Code Interpreter to put the image links into the report instead of embedding the images directly.
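
Objects in a publicly readable bucket can be addressed at `https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME`. As a minimal sketch of how an image link for the report could be built, assuming the object is public (the helper names `public_gcs_url` and `html_image_tag` are hypothetical):

```python
from html import escape
from urllib.parse import quote


def public_gcs_url(bucket_name: str, blob_name: str) -> str:
    """Build the public URL of an object in a publicly readable bucket."""
    # quote() percent-encodes spaces and similar characters but keeps "/".
    return f"https://storage.googleapis.com/{bucket_name}/{quote(blob_name)}"


def html_image_tag(bucket_name: str, blob_name: str, alt_text: str = "") -> str:
    """Return an <img> tag that points at the public object."""
    url = public_gcs_url(bucket_name, blob_name)
    return f'<img src="{url}" alt="{escape(alt_text)}">'


print(html_image_tag("my-bucket", "plots/pie chart.png", "Ratings"))
```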

In [None]:
import vertexai

PROJECT_ID = "MY_PROJECT_ID"  # @param {type:"string"}
REGION = "us-central1"  # @param {type: "string"}
API_ENV = "aiplatform.googleapis.com"  # @param {type:"string"}
GCS_BUCKET = "MY_GCS_BUCKET_NAME"  # @param {type:"string"}

!gcloud config set project {PROJECT_ID}


vertexai.init(
    project=PROJECT_ID,
    location=REGION,
    api_endpoint=f"{REGION}-{API_ENV}",
)

## Import libraries

In [None]:
from vertexai.preview import extensions
from vertexai.generative_models import GenerativeModel

# Using Extensions to Analyze Game Reviews - Tutorial

## Step 1: Create a Code Interpreter Extension

Now you can create the extension. The following cell uses the Python SDK to import the extension (thereby creating it) in Vertex AI Extensions.

In [None]:
extension_code_interpreter = extensions.Extension.from_hub("code_interpreter")
extension_code_interpreter

### Helper Functions
These functions are optional when using Code Interpreter but make it easier to inspect Code Interpreter's output, assemble Code Interpreter requests, and run generated code.
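
The helpers in this section read four keys from the extension's response: `generated_code`, `execution_result`, `execution_error`, and `output_files` (a list of entries with a `name` and base64-encoded `contents`). The sketch below decodes the files of a hand-made stand-in response; the sample values are invented for illustration and are not real extension output.

```python
import base64


def decode_output_files(response: dict) -> dict:
    """Map each output file name to its decoded bytes."""
    return {
        output_file["name"]: base64.b64decode(output_file["contents"])
        for output_file in response.get("output_files", [])
    }


# Hand-made stand-in for a Code Interpreter response (illustrative values only).
sample_response = {
    "generated_code": "print('hi')",
    "execution_result": "hi\n",
    "execution_error": "",
    "output_files": [
        {"name": "ideas.txt", "contents": base64.b64encode(b"plot ratings").decode()},
    ],
}

print(decode_output_files(sample_response))
```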

In [None]:
import base64
import io
import json
from PIL import Image
import pprint
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Struct
import pandas
import sys
import os
import IPython
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

css_styles = """
<style>
.main_summary { font-weight: bold; font-size: 14px; color: #4285F4;  background-color:rgba(221, 221, 221, 0.5); padding:8px;}
.main_summary:hover {background-color: rgba(221, 221, 221, 1);}
details { background-color:#fff; border: 1px solid #E8EAED; padding:0px; margin-bottom:2px; }
details img {width:50%}
details > div {padding:10px; }
div#left > details:first-of-type > div {overflow:auto; max-height:400px; }
div#right > pre {overflow:auto; max-height:600px; background-color: ghostwhite; padding: 10px; }
details details > div { overflow: scroll; max-height:400px}
details details { background-color:rgba(246, 231, 217, 0.2);  border: 1px solid #FBBC04;}
details details > summary { padding: 8px; background-color:rgba(255, 228, 196, 0.6); }
details details > summary:hover { background-color:rgba(255, 228, 196, 0.9); }
div#left {width: 64%; padding:0 1%;  }
div#right {border-left: 1px solid silver; width: 30%; float: right; padding:0 1%; }
body {color: #000; background-color: white; padding:10px 10px 40px 10px; }
#main { border: 1px solid #FBBC04; padding:10px 0; display: flow-root; }
h3 {color: #000; }
code  { font-family: monospace; color: #900; padding: 0 2px; font-size: 105%; }
</style>
        """
# Parser that helps visualise the content of the returned files as HTML
def parse_files_to_html(outputFiles, save_files_locally = True):
    IMAGE_FILE_EXTENSIONS = set(["jpg", "jpeg", "png"])
    file_list = []

    if not outputFiles:
      return "No Files generated from the code"
    # Sort the output_files so images are displayed before other files such as JSON
    for output_file in sorted(
        outputFiles,
        key=lambda x: x["name"].split(".")[-1] not in IMAGE_FILE_EXTENSIONS,
    ):
        file_name = output_file.get("name")
        file_contents = base64.b64decode(output_file.get("contents"))
        if save_files_locally:
          # Use a context manager so the file handle is always closed
          with open(file_name, "wb") as local_file:
            local_file.write(file_contents)

        if file_name.split(".")[-1] in IMAGE_FILE_EXTENSIONS:
            # Render Image
            file_html_content = f'<img src="data:image/png;base64, {output_file.get("contents")}" />'
        elif file_name.endswith(".json"):
            # Pretty print JSON
            file_html_content =  f'<span>{pprint.pformat(json.loads(file_contents.decode()), compact=False, width=160)}</span>'
        elif file_name.endswith(".csv"):
            # CSV
            file_html_content = f'<span>{pandas.read_csv(StringIO(file_contents.decode())).to_markdown(index=False)}</span>'
        elif file_name.endswith(".pkl"):
            # PKL
            file_html_content = f'<span>Preview N/A</span>'
        else:
            file_html_content = f"<span>{file_contents.decode()}</span>"


        file_list.append({'name': file_name, "html_content": file_html_content})

    buffer_html = [ f"<details><summary>{_file.get('name')}</summary><div>{_file.get('html_content')}</div></details>" for _file in file_list ]


    return "".join(buffer_html)


# Comments
# @title #### Helper function process_response(response) { form-width: "35%", display-mode: "both" }
# @markdown Here we are defining functions that help processing and visualising the response from our extension within colab environment. \
# @markdown This is an optional process but it helps you: \
# @markdown - Visualise the code generated by Code Interpreter
# @markdown - Visualise the executed code output or Exceptions
# @markdown - Visualise any files generated from Code Interpreter \
# @markdown - Save all generated files locally \

# @markdown To use this functionality simply call **process_response(response)** \
# @markdown where **response** is the code interpreter response object
# @markdown
def process_response(response: dict, save_files_locally = True) -> None:

  result_template = "<details open><summary class='main_summary'>{summary}:</summary><div><pre>{content}</pre></div></details>"
  result = ""
  code = response.get('generated_code')
  if response.get('execution_error', None):
    result = result_template.format(summary="An error occurred when executing code", content=response.get('execution_error', None))
  else:
    if 'execution_result' in response and response['execution_result']!="":
      result = result_template.format(summary="Executed Code Output", content=response.get('execution_result'))
    else:
      result = result_template.format(summary="Executed Code Output", content="Code did not produce printable output")

    result += result_template.format(summary="Files Created <u>(Click on filename to view content)</u>", content=parse_files_to_html(response.get('output_files', []),  save_files_locally = save_files_locally))

  display(
      IPython.display.HTML(
        ( f"{css_styles}"
f"""
<div id='main'>
    <div id="right"><h3>Generated Code by Code Interpreter</h3><pre><code>{code}</code></pre></div>
    <div id="left"><h3>Code Execution Results</h3>{result}</div>
</div>
"""
        )
      )
  )



In [None]:
# Comments
# @title #### Helper function to call code interpreter { form-width: "35%", display-mode: "both" }
# @markdown ### **run_code_interpreter(instructions: str, filenames: list[str])**
# @markdown run_code_interpreter helps you call the code interpreter by submitting local files and instructions on what to do. \
# @markdown The function base64-encodes the file contents and adds them to the request payload. \
# @markdown Additionally, if there is an error, the function retries up to `retry_num` times (default is 2). \

def run_code_interpreter(instructions: str, filenames: list[str] = [], retry_num = 2):
  file_arr = [{"name": filename, "contents":  base64.b64encode(open(filename, "rb").read()).decode()} for filename in filenames]

  attempts = 0
  while attempts <= retry_num:
    attempts += 1
    res = extension_code_interpreter.execute(
        operation_id = "generate_and_execute",
        operation_params = {
            "query": instructions,
            "files": file_arr
        },
    )
    if not res.get('execution_error'):
      return res

  return res

# @markdown \
# @markdown ### **run_locally(instructions: str, filenames: list[str])**
# @markdown run_locally executes the code generated by code interpreter locally (helps with understanding the code's behaviour and with debugging)
def run_locally(instructions: str, filenames: list[str]):
  response = run_code_interpreter(instructions= instructions, filenames= filenames)
  my_code = "\n".join(response['generated_code'].split('\n')[1:-1])
  exec(my_code)




## Step 2: Use Code Interpreter to Analyze Steam Reviews

In this section, you will specify a game title and parse some Steam reviews for the title from store.steampowered.com.
Using the Code Interpreter Extension, you will then perform some automated analysis on the reviews.

In [None]:
#@markdown Specify the name of the game
game = "Palworld"  # @param {type: "string"}

### Prepare the Reviews Dataset

Now, get the Steam App ID for the game, if the game is available on the platform. To do this, run a Google Search to retrieve the game's Steam URL, then parse the ID out of the URL.

**Note:** If you run into errors importing `googlesearch`, make sure that you don't have any conflicting packages installed. This is the `googlesearch` module that's installed when running `pip install google`.
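
The ID-parsing step can also be expressed with a regular expression. This is a minimal sketch under the assumption that the store URL contains an `/app/<digits>` segment; `extract_steam_app_id` is a hypothetical helper, and the URL and ID in the example are made up.

```python
import re
from typing import Optional

# Assumes store URLs contain an "/app/<digits>" segment.
_APP_ID_PATTERN = re.compile(r"/app/(\d+)")


def extract_steam_app_id(url: str) -> Optional[str]:
    """Return the numeric App ID from a Steam store URL, or None if absent."""
    match = _APP_ID_PATTERN.search(url)
    return match.group(1) if match else None


# Made-up URL for illustration.
print(extract_steam_app_id("https://store.steampowered.com/app/123456/Some_Game/"))
print(extract_steam_app_id("https://example.com/not-a-steam-page"))
```

Returning `None` instead of raising makes it easy to skip games that are not on Steam.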

In [None]:
# Fetch steam review URL and the games App ID
from googlesearch import search

query = f"{game} steampowered.com "
steam_url = list()

for j in search(query, tld="com", num=1, stop=1, pause=1):
    print("URL: ",j)
    steam_url.append(j)

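try:
  # The App ID is the path segment that follows 'app/' in the store URL
  steam_appId = steam_url[0].split('app/')[1].split('/')[0]

  print("App ID: ", steam_appId)

except IndexError:
  # Raised when no URL was found or the URL has no 'app/' segment
  print("Could not parse the steam ID out of the URL. The game is likely not supported on Steam.")
  steam_appId = None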

Now, grab some reviews from Steam.
The Steam reviews page uses infinite scrolling and does not let you page through the results via the URL, so each request is limited to 10 hits for now.
To work around this, we will use five different filters to get the reviews:
1. Top rated reviews of all time
2. Trending reviews today
3. Trending reviews this week
4. Trending reviews this month  
5. Most recent reviews

This will give us a total of 50 reviews to work with.
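
The same review could plausibly appear under more than one filter (an assumption; the notebook does not check for this). If that matters for your analysis, a small de-duplication pass is one option, at the cost of possibly ending up with fewer than 50 reviews. `dedupe_reviews` is a hypothetical helper that treats two reviews with the same author and content as identical.

```python
def dedupe_reviews(reviews: list) -> list:
    """Drop repeated reviews, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for review in reviews:
        # Assumption: author plus content identifies a review.
        key = (review.get("author"), review.get("content"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(review)
    return unique


# Made-up sample data for illustration.
sample = [
    {"author": "a", "content": "Great"},
    {"author": "b", "content": "Meh"},
    {"author": "a", "content": "Great"},
]
print(len(dedupe_reviews(sample)))
```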


In [None]:
import requests
from bs4 import BeautifulSoup
import json

def get_steam_reviews(browse_filter, num_reviews):
    """Scrape up to num_reviews reviews for steam_appId using the given filter."""

    url = f'https://steamcommunity.com/app/{steam_appId}/reviews/?p=1&browsefilter={browse_filter}'

    print("URL: ", url)

    reviews = []

    # The URL always returns the first page, so fetch it only once.
    # Requesting it again would add duplicates, or loop forever on an empty page.
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    review_blocks = soup.find_all('div', class_='apphub_Card')

    for block in review_blocks[:num_reviews]:
        # Reset all fields so a missing element never reuses the previous review's value
        author = rating = content = date = hours_played = None

        # Author
        author_block = block.find('div', class_='apphub_CardContentAuthorName')
        if author_block:
            author = author_block.text.strip()

        # Rating
        rating_block = block.find('div', class_='title')
        if rating_block:
            rating = rating_block.text.strip()

        # Review Content
        content_block = block.find('div', class_='apphub_CardTextContent')
        if content_block:
            content = content_block.text.strip()

            # Review Date (nested inside the content block)
            date_block = content_block.find('div', class_='date_posted')
            if date_block:
                date = date_block.text.replace('Posted:', '').strip()

        # Total Hours Played
        hours_block = block.find('div', class_='hours')
        if hours_block:
            hours_played = hours_block.text.strip()

        reviews.append({'author': author, 'content': content, 'rating': rating, 'date': date, 'hours_played' : hours_played})

    return reviews

topRated_reviews = get_steam_reviews('toprated', 10)
trendWeek_reviews = get_steam_reviews('trendweek', 10)
trendMonth_reviews = get_steam_reviews('trendmonth', 10)
trendDay_reviews = get_steam_reviews('trendday', 10)
mostRecent_reviews = get_steam_reviews('mostrecent', 10)


Concatenate all the reviews in one single list:

In [None]:
all_reviews = topRated_reviews + trendWeek_reviews + trendMonth_reviews + trendDay_reviews + mostRecent_reviews

Write the reviews into a .csv file so you can parse it with the Code Interpreter Extension.

In [None]:
import csv

filename = 'reviews.csv'

with open(filename, 'w', newline='') as csvfile:
    # Determine field names (header row)
    fieldnames = all_reviews[0].keys()

    # Create a DictWriter
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    # Write the header
    writer.writeheader()

    # Write the data rows
    writer.writerows(all_reviews)

Get the reviews in a pandas dataframe, so you can take a look into its content and inspect the reviews.

In [None]:
import pandas as pd

df = pd.read_csv('reviews.csv')
df.head(10)

### Let Code Interpreter do its Magic

Write a helper function to collect all of the assets created by a Vertex Extension. This will help later when generating the PDF Report.

In [None]:
output_list = []

def is_string(value):
    return isinstance(value, str)

def grab_outs(response):
  if is_string(response):
    output_list.append(response)
  else:
    for output_file in response.get('output_files', []):
      # Skip files whose name contains 'code_execution'
      if "code_execution" in output_file["name"]: continue
      output_list.append(output_file["name"])

You can call the Vertex Code Interpreter Extension to generate plots and graphs on your dataset. However, you can also ask the Extension to take a look at the dataset for you and generate a few ideas for insightful visualizations. The following cell prompts the Code Interpreter Extension to save some plot ideas in the ideas.txt file:

In [None]:
response = run_code_interpreter(instructions=f"""
You are given a dataset of reviews. I want you to come up with some ideas for relevant visualization for this dataset.
Create natural language **instructions** and save them into the file ideas.txt
Please put your ideas as natural language **instructions** into the file ideas.txt
Do not generate any plots yourself.
""", filenames= ['reviews.csv'])
process_response(response)

That looks interesting! You could go ahead and parse these ideas automatically by another Code Interpreter call. We will see an optional cell below on how to do that. But for now, we want to reformulate things a bit, so let's go ahead and plot some of the ideas above:

In [None]:
response = run_code_interpreter(instructions=f"""
    You are given a dataset of reviews. Create a pie chart showing the following:
    - how many ratings have 'recommended' vs 'not recommended'?
    Save the plot with a descriptive name.
""", filenames= ['reviews.csv'])
process_response(response)


In [None]:
# Grab the output if it looks good.
grab_outs(response)

Easy peasy. But what if we want to generate a more complex plot with the Extension? You can try that with the next cell:
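
To sanity-check what the extension produces for this plot, the same parsing and bucketing can be done locally. The sketch below is not the code the extension generates: `parse_hours`, `bucket_hours`, and `count_not_recommended` are hypothetical helpers, and the handling of the bucket boundaries (for example, whether exactly 50 hours falls into 0-50 or 50-100) is an assumption.

```python
import re
from collections import Counter
from typing import Optional

# Matches values such as '3,650.6 hrs on record' or '219.6 hrs on record'.
_HOURS_PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*hrs")


def parse_hours(text: Optional[str]) -> Optional[float]:
    """Extract the hours as a float, or return None if the text does not match."""
    match = _HOURS_PATTERN.search(text or "")
    if not match:
        return None
    # Remove thousands separators before converting.
    return float(match.group(1).replace(",", ""))


def bucket_hours(hours: float) -> str:
    """Assign hours to one of the four buckets (boundary handling is an assumption)."""
    if hours < 50:
        return "0-50"
    if hours < 100:
        return "50-100"
    if hours <= 1000:
        return "100-1000"
    return ">1000"


def count_not_recommended(rows: list) -> Counter:
    """Count 'Not Recommended' ratings per hours bucket."""
    counts = Counter()
    for row in rows:
        hours = parse_hours(row.get("hours_played"))
        if row.get("rating") == "Not Recommended" and hours is not None:
            counts[bucket_hours(hours)] += 1
    return counts


# Made-up rows for illustration.
rows = [
    {"rating": "Not Recommended", "hours_played": "3,650.6 hrs on record"},
    {"rating": "Recommended", "hours_played": "219.6 hrs on record"},
    {"rating": "Not Recommended", "hours_played": "12.0 hrs on record"},
]
print(count_not_recommended(rows))
```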

In [None]:
response = run_code_interpreter(instructions=f"""
    You are given a dataset of reviews. The hours_played column contains information on the total hours played, in the format '3,650.6 hrs on record' or '219.6 hrs on record'.
    Avoid and handle conversion errors, e.g. 'could not convert string to float: '3,650.6''.
    Make a plot that shows the relationship between hours played and the count of the ratings 'Not Recommended'.
    Put the hours_played into the different buckets 0-50, 50-100, 100-1000, >1000.
    Save the plot with a descriptive name.

    Make sure Plots have visible numbers or percentages when applicable, and labels.
    Make sure to avoid and handle the error 'Expected value of kwarg 'errors' to be one of ['raise', 'ignore']. Supplied value is 'coerce' '.
    Use >>> import warnings
    warnings.simplefilter(action='ignore', category=FutureWarning) <<< to avoid any FutureWarnings from pandas.

    """, filenames= ['reviews.csv'])
process_response(response)


In [None]:
# Grab the output if it looks good.
grab_outs(response)

#### Optional: Plotting ideas.txt

**OPTIONAL**: You can also feed the plot ideas that Code Interpreter created back into it and prompt it to generate the plots described in ideas.txt.
Code Interpreter may generate some weird plots at this step - this is usually because the instructions are not clearly defined.

💡**Tip**: It works better if you grab the instructions from the ideas.txt file and put them directly in the prompt, instead of passing the file in the filenames.


In [None]:
with open('ideas.txt', 'r', encoding='utf-8') as file:
    ideas = file.read()

response = run_code_interpreter(instructions=f"""
    Create and save the following plots.
    Make sure each plot is in its own file and do not overlay multiple plots, so reset the plotting process for every plot.
    Save the plot with a descriptive name.
    Make sure plots have visible numbers or percentages, when applicable, and labels.
    Do not use the library 'wordcloud', it's not available. Skip an idea if it uses wordcloud.
    Make sure to avoid 'Rectangle.set() got an unexpected keyword argument 'kind''
    Make sure to suppress any user warnings.
    **If any of the following produces an exception make sure you catch and print it, and continue to the next item in the list**:
    {str(ideas)}
""", filenames= ['reviews.csv'])
process_response(response)

## Step 3: Use Vertex Search Extension to do a Qualitative Analysis of the Reviews

### Set Up Qualitative Review Dataset

Grab some more detailed reviews of the game for qualitative analysis. For this, you can use Google Search to get the URLs of the top 10 results for the game's reviews.

In [None]:
from googlesearch import search

# Search
query = f"{game} Reviews"
urls = list()

for j in search(query, tld="com", num=10, stop=10, pause=2):
    print(j)
    urls.append(j)

We want Vertex AI Search to summarize the contents for us and to answer our questions. To do this, we could manually grab the above URLs and set up a data store for websites in the Google Cloud Console.

However, we want to ensure cleaner results. For this reason, first fetch the text contents from the websites, then store the .txt files in your Google Cloud Storage bucket.

The following cell lets you grab the contents from the websites and write them into .txt files. Then, these files will be uploaded to your GCS bucket.
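
For reference, the text-extraction step can also be sketched with the Python standard library alone. This is a swapped-in technique, not what the notebook's cell does: it uses `html.parser.HTMLParser` and explicitly skips the contents of `script`, `style`, and `noscript` tags. `visible_text` is a hypothetical helper, and the HTML snippet is made up.

```python
from html.parser import HTMLParser


class _VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring anything inside script/style/noscript."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the text content of an HTML document, one chunk per line."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Made-up HTML for illustration.
page = (
    "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><h1>Great game</h1><p>Loved it.</p></body></html>"
)
print(visible_text(page))
```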

In [None]:
import requests
import os
from bs4 import BeautifulSoup
from google.cloud import storage

def url_txt_to_gcs(id, url, filename, bucket_name):

    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract all text content
    all_text = soup.get_text(separator='\n', strip=True)

    # Save to .txt file
    with open(filename, "w", encoding='utf-8') as file:
        file.write(id +"\n"+ all_text)

    # Upload
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(filename)

    # Assuming the file is in the root of your Colab temp directory
    local_temp_path = os.path.join(filename)
    blob.upload_from_filename(local_temp_path)

    print(f"File uploaded to gs://{bucket_name}/{filename}")


# Upload the website content .txt files into GCS
txt_files = []

for idx, url in enumerate(urls):
  id = "doc-"+str(idx)
  filename = f"website_text_{idx}.txt"
  txt_files.append(f"website_text_{idx}.txt")
  url_txt_to_gcs(id, url, filename, GCS_BUCKET)

### Create a Search Data Store and Ingest your Files

The Vertex Search Extension needs a Data Store to run. The following cells will help you in the setup.

In [None]:
# @markdown Specify an ID for your data store. It should only use lowercase letters, digits, and hyphens.
data_store_id = "gamereview-extensions" # @param {type:"string"}

Use the following bash command to **create** your Data Store:

In [None]:
%%bash -s "$PROJECT_ID" "$data_store_id"

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: $1" \
"https://discoveryengine.googleapis.com/v1alpha/projects/$1/locations/global/collections/default_collection/dataStores?dataStoreId=$2" \
-d '{
  "displayName": "GameReview-Extensions-Store",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED"
}'

🎉 Your Data Store is all set! You can inspect it under: https://console.cloud.google.com/gen-app-builder/data-stores
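
The request sent by the curl command above can also be assembled in Python, which makes the URL and payload easier to inspect. The sketch below only builds the request and does not send it: `build_create_data_store_request` is a hypothetical helper, and you would still need to add an `Authorization: Bearer <token>` header before posting it with an HTTP client of your choice.

```python
import json


def build_create_data_store_request(project_id: str, data_store_id: str):
    """Return (url, headers, body) mirroring the curl command above."""
    url = (
        "https://discoveryengine.googleapis.com/v1alpha/"
        f"projects/{project_id}/locations/global/collections/default_collection/"
        f"dataStores?dataStoreId={data_store_id}"
    )
    headers = {
        "Content-Type": "application/json",
        "X-Goog-User-Project": project_id,
        # An Authorization header with an access token must be added by the caller.
    }
    body = {
        "displayName": "GameReview-Extensions-Store",
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
        "contentConfig": "CONTENT_REQUIRED",
    }
    # json.dumps always produces syntactically valid JSON.
    return url, headers, json.dumps(body)


url, headers, body = build_create_data_store_request("my-project", "gamereview-extensions")
print(url)
print(body)
```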

Now you just need to **ingest** your .txt files with the website contents into it by running the cell below.

**This process can take 5 to 10 minutes.**

In [None]:
from typing import Optional

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

def import_documents_sample(
    project_id: str,
    location: str,
    data_store_id: str,
    gcs_uri: Optional[str] = None,
    bigquery_dataset: Optional[str] = None,
    bigquery_table: Optional[str] = None,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        gcs_source=discoveryengine.GcsSource(
            input_uris=[gcs_uri], data_schema="content"
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )


    # Make the request
    operation = client.import_documents(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # Once the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name


gcs_uri = "gs://extensions-demo-bucket/*.txt"
import_documents_sample(PROJECT_ID, 'global', data_store_id, gcs_uri)
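
To make the two naming rules in the function above easier to see, here is a string-only sketch of the endpoint selection and the branch resource name. It does not call the Discovery Engine client. `api_endpoint_for` and `branch_resource_name` are hypothetical helpers that simply mirror the logic and the format documented in the comments of the cell above.

```python
from typing import Optional


def api_endpoint_for(location: str) -> Optional[str]:
    """Return the regional endpoint, or None to keep the client's default for 'global'."""
    if location == "global":
        return None
    return f"{location}-discoveryengine.googleapis.com"


def branch_resource_name(
    project_id: str,
    location: str,
    data_store_id: str,
    branch: str = "default_branch",
) -> str:
    """Mirror the resource-name format quoted in the comment of the import function."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/dataStores/{data_store_id}/branches/{branch}"
    )


# Placeholder project ID for illustration only
print(api_endpoint_for("eu"))
print(branch_resource_name("my-project", "global", "gamereview-extensions"))
```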

### Connect Data Store to a Vertex Search Engine

The following cell lets you create a Vertex Search Engine on top of your newly created Data Store. For the Vertex Search Extension to work, you need to enable Enterprise features by setting `"searchTier": "SEARCH_TIER_ENTERPRISE"` and Advanced LLM Features by setting `"searchAddOns": ["SEARCH_ADD_ON_LLM"]`.

⏰ **Once you run this cell, wait an additional 5 to 10 minutes so that the Vertex Search Engine is set up properly. Otherwise, your Vertex Search Extension will not be able to detect that the Enterprise features are enabled.**





In [None]:
%%bash -s "$PROJECT_ID" "$data_store_id"

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: $1" \
"https://discoveryengine.googleapis.com/v1/projects/$1/locations/global/collections/default_collection/engines?engineId=$2" \
-d '{
  "displayName": "game-review-engine",
  "dataStoreIds": ["'$2'"],
  "solutionType": "SOLUTION_TYPE_SEARCH",
  "searchEngineConfig": {
     "searchTier": "SEARCH_TIER_ENTERPRISE",
     "searchAddOns": ["SEARCH_ADD_ON_LLM"]
   }
}'
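
If you want to double-check an engine definition before relying on it, a small predicate like the one below can confirm that both required settings are present. This is a sketch that inspects a plain dictionary. It assumes the engine resource uses the same field names as the request body above, and `has_enterprise_llm_features` is a hypothetical helper.

```python
def has_enterprise_llm_features(engine: dict) -> bool:
    """Check for the Enterprise tier and the LLM add-on in an engine definition."""
    config = engine.get("searchEngineConfig", {})
    return (
        config.get("searchTier") == "SEARCH_TIER_ENTERPRISE"
        and "SEARCH_ADD_ON_LLM" in config.get("searchAddOns", [])
    )


# Same body as in the curl call above, with a placeholder data store ID
engine_body = {
    "displayName": "game-review-engine",
    "dataStoreIds": ["gamereview-extensions"],
    "solutionType": "SOLUTION_TYPE_SEARCH",
    "searchEngineConfig": {
        "searchTier": "SEARCH_TIER_ENTERPRISE",
        "searchAddOns": ["SEARCH_ADD_ON_LLM"],
    },
}
print(has_enterprise_llm_features(engine_body))
```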

### Set up the Vertex Search Extension

Your Data Store is all set. Now you just need to create an instance of the Vertex Search Extension by running the cell below.

Before you do, grant the [Vertex AI Extension Service agent](https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents) the [permission needed](https://cloud.google.com/vertex-ai/docs/general/access-control#home-project). In this case, the service agent needs permission to use Discovery Engine.

In [None]:
# Build the serving config resource name that points to the relevant data store
DATASTORE = f"projects/{PROJECT_ID}/locations/global/collections/default_collection/dataStores/{data_store_id}/servingConfigs/default_search"

# Instantiate extension
extension_vertex_ai_search = extensions.Extension.from_hub(
    "vertex_ai_search",
    runtime_config={
        "vertex_ai_search_runtime_config": {
            "serving_config_name": DATASTORE,
        }
    })

extension_vertex_ai_search

The following is a helper function. We can let the Vertex Search Engine generate an answer for our prompt directly. However, for a more descriptive response, we retrieve the segment matches provided by the Vertex Search Engine and let Gemini generate an answer over them.

In [None]:
from vertexai.preview.generative_models import GenerativeModel, Part
import vertexai.preview.generative_models as generative_models
model = GenerativeModel("gemini-1.0-pro-001")

# Helper function
def get_vertexSearch_response(QUERY, mode):
  vertex_ai_search_response = extension_vertex_ai_search.execute(
    operation_id = "search",
    operation_params = {"query": QUERY},
  )

  # Let Vertex Search Extension generate a response
  if mode == 'vertex':
    list_extractive_answers = []
    for i in vertex_ai_search_response:
      list_extractive_answers.append(i["extractive_answers"][0])
    # Return after the loop so that every result is included, not only the first
    return list_extractive_answers


  # Let Gemini generate a response over the Vertex Search Extension segments
  else:
    list_extractive_segments = []

    for i in vertex_ai_search_response:
      list_extractive_segments.append(i["extractive_segments"][0])

    prompt = f"""
    Prompt: {QUERY};
    Contents: {str(list_extractive_segments)}
    """

    res = model.generate_content(
        prompt,
        generation_config={
            "max_output_tokens": 2048,
            "temperature": 0.1,
            "top_p": 1
        },
        safety_settings={
              generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
              generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
              generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
              generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        },
        stream=False,
      )

    return res.text
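
The helper above indexes `[0]` on every result, which raises an error if a result has no extractive answers or segments. The sketch below shows one defensive way to gather the first entry per result and to assemble the prompt, using mock data instead of a live extension call. The mock response shape is an assumption based only on the keys the helper reads, and `collect_first` and `build_prompt` are hypothetical names.

```python
def collect_first(results, key):
    """Take the first entry under `key` from each result, skipping results without one."""
    collected = []
    for result in results:
        entries = result.get(key) or []
        if entries:
            collected.append(entries[0])
    return collected


def build_prompt(query, segments):
    """Combine the user query and the retrieved segments, as the helper above does."""
    return f"Prompt: {query};\nContents: {segments}"


# Mock search results (assumed shape, for illustration only)
mock_results = [
    {
        "extractive_answers": [{"content": "Great soundtrack."}],
        "extractive_segments": [{"content": "Reviewers praise the soundtrack."}],
    },
    {
        "extractive_answers": [],
        "extractive_segments": [{"content": "Controls feel clunky."}],
    },
    {"title": "result without extractive fields"},
]

answers = collect_first(mock_results, "extractive_answers")
segments = collect_first(mock_results, "extractive_segments")
print(build_prompt("List review points", segments))
```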

### Use Vertex Search Extension to answer Questions and retrieve Summaries

Now you can run Vertex Search Extension. The cell below demonstrates an output of Vertex Search Engine without Gemini:

In [None]:
QUERY = f"What are some negative review points for {game}?" # @param {type:"string"}

search_res = get_vertexSearch_response(QUERY, mode='vertex')

search_res

The output below highlights the differences between the pure Vertex Search Extension output, and the hybrid response generated with Gemini:

In [None]:
QUERY = f"List 10 positive review points for {game}" # @param {type:"string"}

response = get_vertexSearch_response(QUERY, mode='gemini')

print(response)

# Grab the output for report generation
grab_outs(response)

Looks good. Collect more information from the website contents by giving the extension some more prompts:

In [None]:
QUERY = f"List 10 negative review points for {game}" # @param {type:"string"}

response = get_vertexSearch_response(QUERY, mode='gemini')

print(response)

# Grab the output for report generation
grab_outs(response)

In [None]:
QUERY = f"Provide a summary description of the game {game}" # @param {type:"string"}

response = get_vertexSearch_response(QUERY, mode='gemini')

print(response)

# Grab the output for report generation
grab_outs(response)

## Step 4: Populate your Results in a PDF Report

Now it's time to put everything together. We have collected the generated responses (both images and texts) from Vertex Code Interpreter and Search Extensions.



In [None]:
output_list

Separate the image filenames from the text outputs, then upload the images to GCS to get public URLs.

In [None]:
imgs_files = []
txt_outs = []

for element in output_list:
  if ".png" in element or ".jpg" in element or ".jpeg" in element:
    # Get image filenames
    imgs_files.append(element)
  else:
    # Get text outputs
    txt_outs.append(element)
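
The substring check above also treats a text output as an image if it merely mentions ".png" somewhere in a sentence. A stricter variant, sketched below with sample data, matches on the file suffix instead. `split_outputs` is a hypothetical helper, and the sample list is made up for illustration.

```python
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg")


def split_outputs(outputs):
    """Separate image filenames (matched by suffix) from all other outputs."""
    images, texts = [], []
    for item in outputs:
        if isinstance(item, str) and item.lower().endswith(IMAGE_SUFFIXES):
            images.append(item)
        else:
            texts.append(item)
    return images, texts


sample = [
    "plot_ratings.png",
    "The review mentions a .png screenshot bug.",
    "Chart.JPG",
]
images, texts = split_outputs(sample)
print(images)
print(texts)
```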

In [None]:
from google.cloud import storage

def upload_to_gcs(local_file, bucket_name, blob_name):
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(local_file)

def get_public_url(bucket_name, blob_name):
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    return blob.public_url

# Upload to GCS
bucket_name = GCS_BUCKET

gcs_img_files = []

for image in imgs_files:
  # Upload Image
  upload_to_gcs(image, bucket_name, image)

  # Get Image public URL
  public_image_url = get_public_url(bucket_name, image)
  gcs_img_files.append(public_image_url)
  print(public_image_url)
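
Keep in mind that `blob.public_url` only returns the URL form of the object. The image is reachable there only if the object is publicly readable. The sketch below reproduces the URL shape offline, assuming the `https://storage.googleapis.com/<bucket>/<object>` layout with the object name percent-encoded. `expected_public_url` is a hypothetical helper, not part of the storage client.

```python
from urllib.parse import quote


def expected_public_url(bucket_name: str, blob_name: str) -> str:
    """Compose the assumed public URL for an object, percent-encoding its name."""
    return f"https://storage.googleapis.com/{bucket_name}/{quote(blob_name, safe='/~')}"


# Placeholder bucket and object names for illustration only
print(expected_public_url("my-bucket", "plots/rating histogram.png"))
```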

### Generate the Report with Vertex Code Interpreter Extension

With the collected text outputs and the public URLs of the images, you can ask the Code Interpreter Extension to generate a compelling PDF report. First, let it generate an .html file; you will convert it to PDF in the next cells.

In [None]:
response = run_code_interpreter(instructions=f"""
    You are a report generator. Given a list of filenames and strings, create an interesting report in html language and save it to report.html.
    The report revolves around reviews for the game {game}.

    Structure the report with proper headings. Don't use 'String' as a heading.
    Write the whole report in natural language. You are allowed to use bullet points.
    Start the report with a summary of the game {game}
    Embed the images directly in the html and include image descriptions.

    The contents you can use are these, including images (the filenames indicate the image content):
    {gcs_img_files}

    And string contents:
    {txt_outs}
    """)
process_response(response)


Convert the html to a .pdf file:

In [None]:
from weasyprint import HTML, CSS

with open('report.html', 'r') as file:
    html_content = file.read()

pdf = HTML(string=html_content).write_pdf()

# Optional: Save the PDF file
with open('report.pdf', 'wb') as file:
    file.write(pdf)

Now, push your new PDF report to Google Drive. The following cells set up a new folder for your asset and upload the report into it.

In [None]:
# @markdown Provide the name of the Google Drive folder where the PDF should be saved:

folder_name = 'extensions-demo-assets' # @param {type:"string"}

In [None]:
import os
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

def create_folder(folder_name):
    drive_service = build('drive', 'v3', credentials=creds)  # Assuming credentials are set up

    file_metadata = {
        'name': folder_name,
        'mimeType': 'application/vnd.google-apps.folder'
    }
    folder = drive_service.files().create(body=file_metadata, fields='id').execute()
    return folder.get('id')

def upload_file(file_path, folder_id):
    drive_service = build('drive', 'v3', credentials=creds)

    file_metadata = {
        'name': os.path.basename(file_path),
        'parents': [folder_id]
    }

    # Determine MIME type based on file extension
    extension = os.path.splitext(file_path)[1].lower()
    if extension in ['.jpg', '.jpeg']:
        mime_type = 'image/jpeg'
    elif extension == '.png':
        mime_type = 'image/png'
    elif extension == '.pdf':
        mime_type = 'application/pdf'
    else:
        mime_type = 'application/octet-stream'  # Generic fallback

    media = MediaFileUpload(file_path, mimetype=mime_type, resumable=True)
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    print(f'File uploaded to Drive: {file.get("id")}')

    return file.get("id")
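
As an alternative to mapping extensions by hand, the standard library's `mimetypes` module can guess the MIME type from the filename. The sketch below is independent of the Drive API. `guess_mime_type` is a hypothetical helper that falls back to a generic type when the extension is unknown.

```python
import mimetypes


def guess_mime_type(file_path: str) -> str:
    """Guess the MIME type from the filename, with a generic fallback."""
    mime_type, _ = mimetypes.guess_type(file_path)
    return mime_type or "application/octet-stream"


for name in ["report.pdf", "chart.png", "photo.jpeg"]:
    print(name, guess_mime_type(name))
```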



In [None]:
# Create your folder
folder_id = create_folder(folder_name)

Lastly, upload your report.pdf to your new Google Drive Folder:

In [None]:
file_id = upload_file('report.pdf', folder_id)

## [OPTIONAL] Step 5: Send Report via Email

This section shows how to send your generated PDF report via Gmail to a specified recipient.

For this to work, you need to configure the Gmail API and credentials first.
Follow the "Quickstart Guide" for Python: https://developers.google.com/gmail/api/quickstart/python

Steps:
- Enable the Gmail API in your Google Cloud Project
- Set up OAuth as described in the guide, and set the scope for your app to allow `gmail.send`
- In the OAuth settings, set `https://localhost:8080/` in **Authorized redirect URI**
- Download the client secret JSON and rename it to `credentials.json`
- Upload the JSON file to Colab through the file system on the left panel
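
Before starting the authorization flow, you may want to sanity-check the uploaded file. The sketch below assumes the usual layout of a downloaded OAuth client file: a top-level `installed` or `web` section containing `client_id`, `client_secret`, and `redirect_uris`. `check_client_secrets` is a hypothetical helper, and the sample values are made up.

```python
import json


def check_client_secrets(raw_json: str, redirect_uri: str) -> list:
    """Return a list of problems found; an empty list means the file looks usable."""
    data = json.loads(raw_json)
    client = data.get("installed") or data.get("web")
    if client is None:
        return ["missing top-level 'installed' or 'web' section"]
    problems = []
    for field in ("client_id", "client_secret"):
        if not client.get(field):
            problems.append(f"missing {field}")
    if redirect_uri not in client.get("redirect_uris", []):
        problems.append(f"{redirect_uri} is not among redirect_uris")
    return problems


# Made-up client file contents for illustration only
sample = json.dumps(
    {
        "web": {
            "client_id": "demo-id",
            "client_secret": "demo-secret",
            "redirect_uris": ["https://localhost:8080/"],
        }
    }
)
print(check_client_secrets(sample, "https://localhost:8080/"))
```

In the notebook, you would pass the contents of `credentials.json` instead of the sample string.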


After that, you can run the following cells.

Grab the contents of the pdf report:

In [None]:
import os

def read_pdf_file(filename):
    with open(filename, 'rb') as f:
        pdf_data = f.read()
    return pdf_data

pdf_filename = "report.pdf"  # Path to your PDF in Colab
pdf_data = read_pdf_file(pdf_filename)


Parse the pdf contents into a raw message for the e-mail attachment:

In [None]:
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders
import base64

def create_message_with_attachment(sender, to, subject, body, filename, attachment):
    message = MIMEMultipart()
    message['to'] = to
    message['from'] = sender
    message['subject'] = subject

    msg_body = MIMEText(body, 'plain')
    message.attach(msg_body)

    part = MIMEBase('application', 'pdf')  # MIME type for PDF attachments
    part.set_payload(attachment)
    encoders.encode_base64(part)
    # Passing filename as a keyword lets the email package quote it correctly
    part.add_header('Content-Disposition', 'attachment', filename=filename)
    message.attach(part)

    raw_message = base64.urlsafe_b64encode(message.as_bytes()).decode()
    return {'raw': raw_message}
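
To verify offline that a raw message really carries the PDF, you can decode it again and inspect the attachment. The sketch below swaps in the standard library's `email.message.EmailMessage` API instead of the `MIMEMultipart` approach used above, and it uses made-up addresses and placeholder bytes rather than the real report. `build_raw_message` and `decode_raw_message` are hypothetical helpers.

```python
import base64
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_raw_message(sender, to, subject, body, filename, attachment):
    """Build a base64url-encoded message with a PDF attachment."""
    msg = EmailMessage()
    msg["To"] = to
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    msg.add_attachment(
        attachment, maintype="application", subtype="pdf", filename=filename
    )
    return {"raw": base64.urlsafe_b64encode(msg.as_bytes()).decode()}


def decode_raw_message(payload):
    """Reverse the encoding so that the message can be inspected."""
    raw_bytes = base64.urlsafe_b64decode(payload["raw"])
    return message_from_bytes(raw_bytes, policy=policy.default)


# Made-up addresses and placeholder bytes for illustration only
payload = build_raw_message(
    "sender@example.com",
    "recipient@example.com",
    "Review Analysis Report",
    "Attached is the report.",
    "report.pdf",
    b"%PDF-1.4 demo",
)
parsed = decode_raw_message(payload)
attachments = list(parsed.iter_attachments())
print(attachments[0].get_filename(), attachments[0].get_content_type())
```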


### Setting up e-mail configuration
Provide the recipient and run the next cell to get an API token for accessing Gmail.

In [None]:
# Provide the details for constructing your e-mail

recipient = 'msubasioglu@google.com' #@param {type: 'string'}

In [None]:
import os
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

SCOPES = ['https://mail.google.com/', 'https://www.googleapis.com/auth/gmail.send']

creds = None
# Token file typically stores credentials for reuse
token_file = 'token.json'

# Check if authorized credentials exist
if os.path.exists(token_file):
    creds = Credentials.from_authorized_user_file(token_file, SCOPES)
# If not, or credentials are invalid, trigger the authorization flow
if not creds or not creds.valid:
  if creds and creds.expired and creds.refresh_token:
      creds.refresh(Request())
  else:
      flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES, redirect_uri='https://localhost:8080/')
      auth_url = flow.authorization_url()[0]  # Get the authorization URL
      print(f"Open this URL to authorize: {auth_url}")
      print("Enter the authorization code:")
      code = input()
      # fetch_token returns the raw token dict; the Credentials object is on flow.credentials
      flow.fetch_token(code=code)
      creds = flow.credentials
  # Save the credentials so that the next run can reuse them
  with open(token_file, 'w') as token:
      token.write(creds.to_json())


# Build the Gmail API service object
service = build('gmail', 'v1', credentials=creds)


### Send the e-mail
Now that you have your API token, you can send the e-mail with the attached pdf report:

In [None]:
from googleapiclient.discovery import build

# Provide the details for constructing your e-mail
subject = f"{game} Review Analysis Report"
body = f"Attached is the Report on the Review Analysis for {game}"

# Construct e-mail
message = create_message_with_attachment('me', recipient,
                                          subject, body,
                                          pdf_filename, pdf_data)

# Send e-mail
service.users().messages().send(userId='me', body=message).execute()
print("Email sent!")


# Cleaning up

Clean up extension resources created in this notebook.

You can run the next cell to get a list of all Vertex Extension Instances in your environment:

In [None]:
extensions.Extension.list()

Remove the extension instances created in this notebook:

In [None]:
extension_code_interpreter.delete()
extension_vertex_ai_search.delete() 

Alternatively, uncomment the following code block to delete all active extensions in your project. It collects the IDs from the listing above and deletes each extension:

In [None]:
# clean_ids = []

# for element in extensions.Extension.list():
#   clean_ids.append(str(element).split("extensions/")[1])

# for id in clean_ids:
#   extension = extensions.Extension(id)
#   extension.delete()
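
The ID extraction in the commented block relies on the string form of each listed extension. If each extension exposes its full resource name as a string of the form `projects/.../extensions/<id>` (an assumption here), the ID is simply the last path segment. A string-only sketch with a made-up resource name follows. `extension_id_from_resource_name` is a hypothetical helper.

```python
def extension_id_from_resource_name(resource_name: str) -> str:
    """Return the last path segment of a resource name, which is the extension ID."""
    return resource_name.rstrip("/").rsplit("/", 1)[-1]


# Made-up resource name for illustration only
name = "projects/123/locations/us-central1/extensions/456789"
print(extension_id_from_resource_name(name))
```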

Don't forget to delete any created assets that you no longer need, for example:


*   Files in your Colab Environment
*   PDF Report in your Google Drive folder
*   Your Vertex Search Engine: https://console.cloud.google.com/gen-app-builder/apps
*   Your Data Store: https://console.cloud.google.com/gen-app-builder/data-stores
