# Chatbot with Vertex AI Extensions Code Interpreter


| | |
|----------|-------------|
| Authors   | Divya Veerapandian |
| Reviewer | Kanchana Patolla |
| Last updated | 2024-05-17: Initial release |
| |  : Complete draft |

# Overview

This notebook shows how to use the Google-provided [Code Interpreter Extension](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/google-extensions.md#code_interpreter_extension) from [Vertex AI Extensions](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/overview) to build a simple chatbot that lets a user upload a CSV file and ask questions about it.

In this notebook, you will use Code Interpreter to explore data through the chatbot.


**If you're already familiar with Google Cloud and the Vertex AI Extensions Code Interpreter Extension**, you can skip reading between here and the "Setup and Test the Code Interpreter Extension" section, but make sure to run the code cells.

## Vertex AI Extensions

[Vertex AI Extensions](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/overview) is a platform for creating and managing extensions that connect large language models to external systems via APIs. These external systems can provide LLMs with real-time data and perform data processing actions on their behalf. You can use pre-built or third-party extensions in Vertex AI Extensions.

## Vertex AI Extensions Code Interpreter Extension

The [Code Interpreter](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/google-extensions.md#code_interpreter_extension) extension provides access to a Python interpreter with a sandboxed, secure execution environment that can be used with any model in the Vertex AI Model Garden. This extension can generate and execute code in response to a user query or workflow. It allows the user or LLM agent to perform various tasks such as data analysis and visualization on new or existing data files.

You can use the Code Interpreter extension to:

* Generate and execute code.
* Perform a wide variety of mathematical calculations.
* Sort, filter, select the top results, and otherwise analyze data (including data acquired from other tools and APIs).
* Create visualizations, plot charts, draw graphs, shapes, print results, etc.

## Google Cloud Project Setup

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.
1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).
1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

## Google Cloud Permissions
Make sure you have been [granted the following roles](https://cloud.google.com/iam/docs/granting-changing-revoking-access) for the GCP project you'll access from this notebook:
* [`roles/aiplatform.user`](https://cloud.google.com/vertex-ai/docs/general/access-control#aiplatform.user)

## Install the Google Cloud Vertex AI Python SDK

Install the Google Cloud Vertex AI Python SDK, or upgrade to the latest version if it is already installed. The cells later in this notebook also import `gradio`, `pandas`, and `Pillow`, so make sure those packages are available in your environment.

In [1]:
!pip install google-cloud-aiplatform --upgrade

Collecting google-cloud-aiplatform
  Using cached google_cloud_aiplatform-1.51.0-py2.py3-none-any.whl.metadata (30 kB)
Using cached google_cloud_aiplatform-1.51.0-py2.py3-none-any.whl (5.0 MB)
Installing collected packages: google-cloud-aiplatform
  Attempting uninstall: google-cloud-aiplatform
    Found existing installation: google-cloud-aiplatform 1.50.0
    Uninstalling google-cloud-aiplatform-1.50.0:
      Successfully uninstalled google-cloud-aiplatform-1.50.0
Successfully installed google-cloud-aiplatform-1.51.0


### Restart runtime

You may need to restart your notebook runtime to use the Vertex AI SDK. You can do this by running the cell below, which restarts the current kernel.

You may see the restart reported as a crash, but it is working as-intended -- you are merely restarting the runtime.

The restart might take a minute or longer. After it has restarted, continue to the next step.

In [2]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


If you're using Colab, as long as the notebook runtime isn't deleted (even if it restarts), you don't need to re-run the previous cell.

If you're running this notebook in your own environment you shouldn't need to run the above pip cell again unless you delete your IPython kernel.

## Authenticate

If you're using Colab, run the code in the next cell. Follow the popups and authenticate with an account that has access to your Google Cloud [project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects).

If you're running this notebook somewhere besides Colab, make sure your environment has the right Google Cloud access. If that's a new concept to you, consider looking into [Application Default Credentials for your local environment](https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev) and [initializing the Google Cloud CLI](https://cloud.google.com/docs/authentication/gcloud). More authentication options are discussed [here](https://cloud.google.com/docs/authentication).

In [1]:
# Colab authentication.
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
    print('Authenticated')

# Initialize the Google Cloud Vertex AI Python SDK

Start here if your Notebook kernel restarts (but isn't deleted), though if it's been a few hours you may need to run the Authentication steps above again.

To initialize the SDK, you need to set your Google Cloud project ID and region.

If you don't know your project ID, try the [Google Cloud CLI](https://cloud.google.com/sdk) commands [`gcloud config list`](https://cloud.google.com/sdk/gcloud/reference/config/list) or [`gcloud projects list`](https://cloud.google.com/sdk/gcloud/reference/projects/list). See the support page [Locate the project ID](https://support.google.com/googleapi/answer/7014113) for more information.


### Set Your Project ID



In [2]:
PROJECT_ID = ""  # @param {type:"string"}
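Because `PROJECT_ID` starts out as an empty string, a forgotten edit tends to surface only later as an API error. The following is a minimal sketch of a fail-fast check, not part of the original notebook: the helper name `resolve_project_id` is hypothetical, and the fallback to the `GOOGLE_CLOUD_PROJECT` environment variable assumes your environment sets it, which is not guaranteed.

```python
import os


def resolve_project_id(explicit_id: str = "") -> str:
    """Return explicit_id if set, else the GOOGLE_CLOUD_PROJECT env var (assumed fallback)."""
    project_id = explicit_id.strip() or os.environ.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project_id:
        # Fail early with a clear message instead of a later API error.
        raise ValueError("Set PROJECT_ID (or GOOGLE_CLOUD_PROJECT) before calling vertexai.init().")
    return project_id
```

One way to use it would be `PROJECT_ID = resolve_project_id(PROJECT_ID)` right after the cell above.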

### Set the Region

You can also change the `REGION` variable used by Vertex AI. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [3]:
REGION = "us-central1"  # @param {type: "string"}

### Import the Vertex AI Python SDK

In [4]:
import vertexai
from vertexai.preview import extensions

vertexai.init(
    project=PROJECT_ID,
    location=REGION
)

In [5]:
import base64
import io
import json
import os
import pprint
import time
from io import StringIO

import gradio as gr
import pandas as pd
import requests
from IPython.display import display
from PIL import Image

# Setup and Test the Code Interpreter Extension

Code Interpreter is provided by Google, so you can load it directly.

In [6]:
extension_code_interpreter = extensions.Extension.from_hub("code_interpreter")
extension_code_interpreter

Creating Extension
Create Extension backing LRO: projects/1050570861770/locations/us-central1/extensions/8199541993441853440/operations/6237291454465572864
Extension created. Resource name: projects/1050570861770/locations/us-central1/extensions/8199541993441853440
To use this Extension in another session:
extension = vertexai.preview.extensions.Extension('projects/1050570861770/locations/us-central1/extensions/8199541993441853440')


<vertexai.extensions._extensions.Extension object at 0x7f1e4982df00> 
resource name: projects/1050570861770/locations/us-central1/extensions/8199541993441853440

Confirm your Code Interpreter extension is registered:

In [7]:
print("Name:", extension_code_interpreter.gca_resource.name)
print("Display Name:", extension_code_interpreter.gca_resource.display_name)
print("Description:", extension_code_interpreter.gca_resource.description)

Name: projects/1050570861770/locations/us-central1/extensions/8199541993441853440
Display Name: Code Interpreter
Description: This extension generates and executes code in the specified language
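The creation log above notes that the extension can be reused in another session by passing its resource name to `vertexai.preview.extensions.Extension(...)`. The resource names printed in this notebook follow the shape `projects/<project>/locations/<location>/extensions/<id>`. As a small sketch, assuming that shape holds (it is inferred from the output here, not from a specification), a hypothetical helper `parse_extension_name` can split a saved name into its parts, for example to check that it matches your `REGION` before reconnecting:

```python
import re

# Pattern inferred from the resource names printed in this notebook (an assumption).
_EXTENSION_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/extensions/(?P<extension_id>[^/]+)$"
)


def parse_extension_name(resource_name: str) -> dict:
    """Split an extension resource name into project, location, and extension_id."""
    match = _EXTENSION_NAME.match(resource_name)
    if match is None:
        raise ValueError(f"Unexpected extension resource name: {resource_name!r}")
    return match.groupdict()


parts = parse_extension_name(
    "projects/1050570861770/locations/us-central1/extensions/8199541993441853440"
)
print(parts["location"])
```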


## Helper Functions for Formatting Code Interpreter Responses for the Chatbot

In [24]:
# Helper functions to read and base64-encode files for Code Interpreter.
def upload_file_multiple(file_paths):
    """Return a list of {"name", "contents"} dicts, one per input file."""
    files = []
    for file_path in file_paths:
        with open(file_path, "rb") as file:
            encoded_string = base64.b64encode(file.read()).decode()
        files.append({"name": os.path.basename(file_path), "contents": encoded_string})
    return files


def upload_file(file_path):
    """Encode a single file; returns a one-element list."""
    return upload_file_multiple([file_path])


def get_response(query, files):
    """Send the query and files to Code Interpreter and format the response."""
    response = extension_code_interpreter.execute(
        operation_id="generate_and_execute",
        # Use the function arguments, not the notebook-level QUERY/FILES globals.
        operation_params={"query": query, "files": files},
    )
    return format_response(response)


def format_response(response):
    """Return (text summary, DataFrame parsed from the execution output)."""
    result = "Generated Code: \n=======================\n"
    result += (response.get("generated_code") or "") + "\n\n"

    # Either field may be missing or empty, so default to an empty string.
    error = response.get("execution_error") or ""
    if error:
        result += "Code Execution Error: \n=======================\n"
        result += '"' + error + '"\n\n'

    resultdataframe = pd.DataFrame()
    exe_result = response.get("execution_result") or ""
    if exe_result.strip():
        # Best-effort parse of the printed output as tab-separated text.
        resultdataframe = pd.read_csv(StringIO(exe_result), sep="\t")

    return result, resultdataframe
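To make the file payload concrete, here is a self-contained sketch of the `{"name", "contents"}` entry that the upload helpers above build for each file. It uses a throwaway CSV in a temporary directory and a hypothetical stand-in function `encode_file`, so it runs without the extension or any real data; the sample rows are made up for illustration.

```python
import base64
import os
import tempfile


def encode_file(file_path: str) -> dict:
    """Build one {"name", "contents"} entry, mirroring the helpers above."""
    with open(file_path, "rb") as f:
        contents = base64.b64encode(f.read()).decode()
    return {"name": os.path.basename(file_path), "contents": contents}


# Write a tiny throwaway CSV so the example does not depend on any real data.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("City,Rating\nYangon,9.1\n")
    entry = encode_file(csv_path)

print(entry["name"])
# Decoding the contents recovers the original file text.
print(base64.b64decode(entry["contents"]).decode())
```

Only the base name of the path is sent, so the generated code refers to the file by that name rather than by its local directory.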

## Test Code Interpreter

To test Code Interpreter, ask it to generate a basic plot from a small dataset.

Note that the raw Code Interpreter response object is long because it includes the base64-encoded image file returned by Code Interpreter, so the cells below print only the formatted result.

#### Test using Sample Data 

In [25]:
QUERY = """
Using the data below, construct a bar chart that includes only the height values with different colors for the bars:

tree_heights_prices = {
  \"Pine\": {\"height\": 100, \"price\": 100},
  \"Oak\": {\"height\": 65, \"price\": 135},
  \"Birch\": {\"height\": 45, \"price\": 80},
  \"Redwood\": {\"height\": 200, \"price\": 200},
  \"Fir\": {\"height\": 180, \"price\": 162},
}

Please include the data in the generated code.
"""

response = extension_code_interpreter.execute(
    operation_id = "generate_and_execute",
    operation_params = {"query": QUERY},
)

result,resultdataframe  = format_response(response) 

In [26]:
print(result)

Generated Code: 
```python
import matplotlib.pyplot as plt

# Define the tree heights data
tree_heights_prices = {
  "Pine": {"height": 100, "price": 100},
  "Oak": {"height": 65, "price": 135},
  "Birch": {"height": 45, "price": 80},
  "Redwood": {"height": 200, "price": 200},
  "Fir": {"height": 180, "price": 162},
}

# Extract height values from the dictionary
heights = [tree_data["height"] for tree_data in tree_heights_prices.values()]

# Create a bar chart with different colors for each bar
plt.bar(tree_heights_prices.keys(), heights, color=["red", "green", "blue", "purple", "orange"])

# Set chart title and labels
plt.title("Tree Heights")
plt.xlabel("Tree Species")
plt.ylabel("Height (ft)")

# Display the chart
plt.show()
```




In [27]:
print(resultdataframe)

Empty DataFrame
Columns: []
Index: []
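The DataFrame is empty here because the chart comes back as a file rather than as printed output. As a hedged sketch, assuming the response carries an `output_files` list whose entries have a `name` and base64-encoded `contents` (this field layout is an assumption, so check it against your actual response), a hypothetical helper `decode_output_files` could recover the raw bytes. A mock response is used so the example runs without calling the extension.

```python
import base64


def decode_output_files(response: dict) -> dict:
    """Map file name -> raw bytes for each entry in response["output_files"] (assumed field)."""
    decoded = {}
    for output_file in response.get("output_files") or []:
        decoded[output_file["name"]] = base64.b64decode(output_file["contents"])
    return decoded


# Mock response standing in for a real Code Interpreter result.
mock_response = {
    "generated_code": "print('hi')",
    "output_files": [
        {"name": "chart.png", "contents": base64.b64encode(b"\x89PNG fake bytes").decode()}
    ],
}
files_by_name = decode_output_files(mock_response)
print(list(files_by_name))
```

With a real response, the decoded bytes of an image could then be opened with `Image.open(io.BytesIO(data))`, using the `PIL` and `io` imports already present in this notebook.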


#### Test using Sample CSV 

In [28]:
INPUT_FILES_PATH = ["/home/jupyter/Extenstion/supermarket_sales.csv"]
FILES = upload_file_multiple(INPUT_FILES_PATH)
print("Input Files:", [f["name"] for f in FILES])

QUERY = "From CSV, Can you get me the distinct cities and their max rating where the rating > 9.6"
print("Query:", QUERY)
result,resultdataframe  = get_response(QUERY,FILES) 


Input Files: ['supermarket_sales.csv']
Query: From CSV, Can you get me the distinct cities and their max rating where the rating > 9.6


In [29]:
print(result)

Generated Code: 
```python
import pandas as pd

# Load the CSV data
data = pd.read_csv("supermarket_sales.csv")

# Filter the data to only include rows where the rating is greater than 9.6
filtered_data = data[data["Rating"] > 9.6]

# Get the distinct cities and their maximum rating
distinct_cities = filtered_data.groupby("City")["Rating"].max()

# Print the results
print(distinct_cities)
```




In [30]:
print(resultdataframe)

                           City
0             Mandalay     10.0
1             Naypyitaw    10.0
2             Yangon       10.0
3  Name: Rating, dtype: float64
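The DataFrame above has a single `City` column because the generated code printed a pandas Series, which is whitespace-aligned rather than tab-separated, so parsing it with `sep="\t"` keeps each line as one cell. The sketch below shows one possible best-effort parser for that printed form. The function name `parse_printed_series` is hypothetical, and the `printed` sample is reconstructed from the output above rather than captured from a real response.

```python
def parse_printed_series(text: str) -> list:
    """Best-effort parse of a printed pandas Series into (label, value) pairs."""
    rows = []
    for line in text.strip().splitlines():
        if line.startswith("Name:"):
            continue  # trailing "Name: ..., dtype: ..." summary line
        parts = line.rsplit(None, 1)  # split off the last whitespace-separated token
        if len(parts) != 2:
            continue  # e.g. the index-name header line ("City")
        label, value = parts
        try:
            rows.append((label.strip(), float(value)))
        except ValueError:
            continue  # not a "label  number" row
    return rows


# Reconstructed sample of the raw printed output (an assumption).
printed = """City
Mandalay     10.0
Naypyitaw    10.0
Yangon       10.0
Name: Rating, dtype: float64
"""
rows = parse_printed_series(printed)
print(rows)
```

A simpler alternative may be to ask in the query for the result to be printed as CSV, which `pd.read_csv` can parse directly, though whether the model follows that instruction is not guaranteed.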


## Final Demo Chatbot

In [31]:
def print_like_dislike(x: gr.LikeData):
    print(x.index, x.value, x.liked)


def add_message(history, message):
    for x in message["files"]:
        # Each file may be a dict with a "path" key or a plain path string,
        # depending on the gradio version.
        path = x["path"] if isinstance(x, dict) else x
        history.append(((path,), None))
    # Skip empty text so a file-only upload does not add a blank question.
    if message["text"]:
        history.append((message["text"], None))
    return history, gr.MultimodalTextbox(value=None, interactive=False, file_types=["csv"])


def bot(history):
    if not history:
        return history
    latest_qn = history[-1][0]
    # Uploaded files appear in the history as (path,) entries; use the most recent one.
    file_name = next(
        (msg[0] for msg, _ in reversed(history) if isinstance(msg, (tuple, list))),
        None,
    )
    if file_name is None or not isinstance(latest_qn, str):
        history[-1][1] = "Please upload a CSV file and ask a question about it."
        return history

    files = upload_file(file_name)
    answer, resultdataframe = get_response(latest_qn, files)
    result_text = resultdataframe.to_string(index=False)
    history[-1][1] = answer + "Result Set \n" + result_text
    return history


with gr.Blocks() as demo:
    gr.Markdown("# Talk to CSV")
    chatbot = gr.Chatbot(
        [],
        elem_id="chatbot",
        bubble_full_width=False
    )

    chat_input = gr.MultimodalTextbox(interactive=True, file_types=["csv"], placeholder="Upload your CSV file and ask away!", show_label=False)
    chat_msg = chat_input.submit(add_message, [chatbot, chat_input], [chatbot, chat_input], queue=False).then(
        bot, chatbot, chatbot, api_name="bot_response"
    )
    # Re-enable the input box once the bot has answered.
    chat_msg.then(lambda: gr.MultimodalTextbox(interactive=True), None, [chat_input], queue=False)
    chatbot.like(print_like_dislike, None, None)

demo.queue()
demo.launch(share=True, debug=True)


Running on local URL:  http://127.0.0.1:7860
IMPORTANT: You are using gradio version 4.22.0, however version 4.29.0 is available, please upgrade.
--------
Running on public URL: https://3ca4d5cc70464663dc.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://3ca4d5cc70464663dc.gradio.live


