In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# Creating a Data Scientist Agent using the Vertex AI Agent API


<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/vertex_ai_agent_api/notebooks/data_scientist_vertex_ai_agent_api.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fmain%2Fgenai-on-vertex-ai%2Fvertex_ai_agent_api%2Fnotebooks%2Fdata_scientist_vertex_ai_agent_api.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/applied-ai-engineering-samples/main/genai-on-vertex-ai/vertex_ai_agent_api/notebooks/data_scientist_vertex_ai_agent_api.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/vertex_ai_agent_api/notebooks/data_scientist_vertex_ai_agent_api.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

| | |
|----------|-------------|
| Author(s)   | Christos Aniftos |
| Reviewer(s) | Meltem Subasioglu |
| Last updated | 2024-04-10: Initial Publication |

# Overview

This notebook shows how to use the [Vertex AI Agent API](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-api/overview) with the [Code Interpreter Extension](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/code-interpreter) to ingest and analyse data from a CSV file.

A data scientist agent with a code interpreter simplifies data analysis. Whether you are a data analyst or a data scientist, this agent speeds up data exploration through a conversational format. It's like having a personal data expert who understands your questions and, following your guidance, works out how to analyse your data, performs the analysis, and presents the results in an easy-to-understand way. This approach can help you uncover insights, save time and effort, and make informed decisions based on clear, actionable results.


In this notebook you will:
- Ingest a CSV data file
- Understand the dataset by asking the agent to explain the data and plot charts
- Manipulate the data, for example by removing columns and replacing missing values
- Ask general statistics questions and request explanations
- Export your final amended dataset
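To make the manipulation and export steps concrete, here is a minimal sketch of the kind of code the agent might write through the code interpreter. It uses only the standard library `csv` module so it runs anywhere; the helper name `clean_rows` and the `"Unknown"` placeholder are illustrative assumptions, not what the agent will necessarily generate.

```python
import csv
import io


def clean_rows(csv_text, drop_column, fill_value="Unknown"):
    """Drop one column and replace missing cells with a placeholder.

    Hypothetical helper illustrating the manipulate-and-export steps.
    Returns the cleaned data as CSV text, ready to be saved or exported.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep every column except the one being dropped.
    fields = [f for f in reader.fieldnames if f != drop_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Treat the literal string "nan" and empty cells as missing values.
        writer.writerow({f: (fill_value if row[f] in ("", "nan") else row[f])
                         for f in fields})
    return out.getvalue()


sample = (
    "StudentID,Gender,ExtraActivitiesGroup,Maths\n"
    "1,Male,nan,78\n"
    "3,nan,Group A,58\n"
)
print(clean_rows(sample, drop_column="ExtraActivitiesGroup"))
```

In the notebook itself you ask for these operations in natural language, and the agent decides how to implement them.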

In addition to the Vertex AI Agent API, this notebook uses the [Code Interpreter extension](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/code-interpreter). The last step of this notebook provides a playground UI based on Gradio, so that you can interrogate your own data through a chat interface.

If you're new to Google Cloud, Vertex AI, or the Vertex AI Agent API, you may want to look at the [Getting Started with Vertex AI Agent notebook](https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/vertex_ai_agent_api/notebooks/getting_started_vertex_agent_api.ipynb.ipynb), which gives an overview of how to use Vertex AI Agent through a basic agent example, along with troubleshooting tips.


## Vertex AI Agent API

[Vertex AI Agent API](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-api/overview) is an API for creating and managing generative AI systems, called "agents", that can reason, plan, and act to perform specific tasks.

Vertex AI Agent API offers faster time to market than building agents from scratch while remaining flexible and customizable. It handles orchestration and state management, builds on Google's experience with reliable AI systems, scales securely and responsibly, and integrates with other Vertex AI and Google Cloud products.

## Using This Notebook

Colab is recommended for running this notebook, but it can run in any IPython environment where you can connect to Google Cloud, install pip packages, etc.

If you're running outside of Colab and encountering issues, the [Getting Started with Vertex AI Agent notebook](https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/vertex_ai_agent_api/notebooks/getting_started_vertex_agent_api.ipynb.ipynb) has some troubleshooting tips.

This tutorial uses the following Google Cloud services and resources:

* [Vertex AI Agent API](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-api/overview)
* [Code Interpreter Extension](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/code-interpreter)

This notebook has been tested in the following environment:

* Python version = 3.10.12
* [google-cloud-aiplatform](https://pypi.org/project/google-cloud-aiplatform/) version = 1.5.5


## Useful Tips

1. This notebook uses generative AI capabilities. Re-running a cell that uses generative AI capabilities may produce similar but not identical results.
2. Because of #1, an output may occasionally contain errors. If that happens, re-run the cell that produced the error; the re-run will likely succeed.
3. The use of generative AI capabilities is subject to service quotas. Running the notebook with "Run All" may exceed your queries per minute (QPM) limits. Run the notebook cell by cell, and if you get a quota error, pause for up to 1 minute before retrying that cell. The Vertex AI Agent API defaults to Gemini on the backend and is subject to the Gemini quotas; [view your Gemini quotas here](https://console.cloud.google.com/iam-admin/quotas?pageState=(%22allQuotasTable%22:(%22f%22:%22%255B%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22base_model_5C_22_22%257D_2C%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22gemini_5C_22_22%257D%255D%22%29%29&e=13802955&mods=logs_tg_staging).
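If you hit quota errors often, a small retry wrapper can save re-running cells by hand. The sketch below is generic and makes no assumptions about the Agent API itself: `call_with_retry` is a hypothetical helper, and `is_retryable` is where you would recognise your client's quota error (the Google Cloud client libraries typically raise `google.api_core.exceptions.ResourceExhausted` for HTTP 429, but check what your SDK version actually raises).

```python
import time


def call_with_retry(fn, *, is_retryable, max_attempts=4, base_delay=15.0,
                    max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on retryable errors.

    Hypothetical helper: waits 15s, 30s, then 60s (capped at max_delay),
    in line with the advice to pause for up to a minute on quota errors.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise non-retryable errors, and give up after the last attempt.
            if not is_retryable(exc) or attempt == max_attempts:
                raise
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)
```

For example, once `ds_agent` and `session` exist later in the notebook, you could wrap a call as `call_with_retry(lambda: ds_agent(query="...", session=session), is_retryable=lambda e: "429" in str(e))`. The string check is a rough heuristic; matching on the exception type is more robust.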


# Setup

## Enable APIs and Set Permissions

Enable the [Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Make sure you have been [granted the roles](https://cloud.google.com/iam/docs/granting-changing-revoking-access) for the GCP project you'll access from this notebook:
* [`roles/aiplatform.user`](https://cloud.google.com/vertex-ai/docs/general/access-control#aiplatform.user)



## Install the Google Cloud Vertex AI Python SDK



In [None]:
# TODO: Replace with the public Vertex AI SDK library when it is released.
# Authentication here is only needed to download the private wheel; remove it
# once the public library is used.

# Colab authentication.
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
    print('Authenticated')

!gsutil cp gs://vertex_agents_private_releases/vertex_agents/google_cloud_aiplatform-1.60.dev20240710+vertex.agents-py2.py3-none-any.whl .
!pip install --quiet --user --upgrade --force-reinstall google_cloud_aiplatform-1.60.dev20240710+vertex.agents-py2.py3-none-any.whl --no-warn-conflicts
!pip install --quiet --user -U "pandas==2.2.2"
!pip install --quiet --user -U "numpy<2"
!pip install --quiet --user gradio
!pip install --quiet --user pydub

### Restart Runtime

You may need to restart your notebook runtime to use the Vertex AI SDK. You can do this by running the cell below, which restarts the current kernel.

You may see the restart reported as a crash, but it is working as intended: you are merely restarting the runtime.

The restart might take a minute or longer. After it has restarted, continue to the next step.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


## Authenticate


In [None]:
# Colab authentication.
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
    print("Authenticated")

# Initialize the Google Cloud Vertex AI Python SDK



### Set Your Project ID



In [None]:
PROJECT_ID = "YOUR_PROJECT_ID_HERE"  # @param {type:"string"}

### Set the Region

In [None]:
REGION = "us-central1"  # @param {type: "string"}

### Import and Initialize the Vertex AI Python SDK

In [None]:
# TODO: Remove API_ENDPOINT to use the default endpoint for the public preview.

import vertexai
API_ENDPOINT = 'us-central1-autopush-aiplatform.sandbox.googleapis.com'

vertexai.init(project=PROJECT_ID, location=REGION, api_endpoint=API_ENDPOINT)


# Set Up the Code Interpreter Extension
Code Interpreter is provided by Google, so you can load it directly from the extension hub. For more information on how to use the Code Interpreter extension, refer to the official [Code Interpreter Extension documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/code-interpreter).

In [None]:
from vertexai.preview import extensions
extension_code_interpreter = extensions.Extension.from_hub("code_interpreter")
extension_code_interpreter

Confirm your Code Interpreter extension is registered:

In [None]:
print("Name:", extension_code_interpreter.gca_resource.name)
print("Display Name:", extension_code_interpreter.gca_resource.display_name)
print("Description:", extension_code_interpreter.gca_resource.description)

# Set Up the Data Science Agent with Code Interpreter
In this section, we create a data science Vertex AI agent with the Code Interpreter extension. Agents can use tools to perform specific actions. The data science agent uses the Code Interpreter extension as a tool to write and execute the Python code needed for data operations.

In [None]:
from google.cloud.aiplatform.private_preview.vertex_agents.app import App, Session
from google.cloud.aiplatform.private_preview.vertex_agents.agent import Agent

from google.cloud import aiplatform

DISPLAY_NAME = "Data Science Agent"
INSTRUCTIONS = """
You are a senior data scientist! Please help users analyse data, generate graphs and provide explanations on why you are performing specific actions.
For complicated data operations, only respond based on the responses of the tools you have available. For basic knowledge, you can use your own embedded knowledge.

Instructions:
- Use the code interpreter for operations on the data, such as analysis or manipulation, that require coding.
"""

# Create an app with a single agent that can call the Code Interpreter extension.
app = App.create_single_agent_app(agent_display_name=DISPLAY_NAME,
                                  instructions=INSTRUCTIONS,
                                  extensions={'Code Interpreter': extension_code_interpreter})

# Vertex AI Agent Helper Functions

These functions are optional when using Vertex AI agents, but they make it easier to visualise the agents' responses.

## `response_to_html`

`response_to_html` converts the response from the agent into HTML. The response might contain images and text, and HTML makes such a response easier to visualise.

**To use this functionality**, call `response_to_html(response)`, where `response` is the result object returned by the agent session (`session.run`).
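The rendering idea does not depend on the Agent API types. Once a response is converted to a plain dictionary, each part carries either `text` or `inline_data` with a `mime_type` and base64 `data`, and PNG images can be embedded as data URIs. The sketch below works on such a dictionary directly; the dictionary shape is an assumption modelled on what `pmsg.to_dict` is used to produce, and `parts_to_html` is a hypothetical name. It also escapes the text with `html.escape` so that characters such as `<` in the agent's output display literally.

```python
import base64
import html


def parts_to_html(response: dict) -> str:
    """Render a dict-shaped agent response as HTML (illustrative sketch)."""
    images, texts = [], []
    for part in response.get("parts", []):
        inline = part.get("inline_data") or {}
        if inline.get("mime_type") == "image/png" and inline.get("data"):
            # The data is assumed to already be base64 text, as after to_dict().
            images.append(f'<img src="data:image/png;base64,{inline["data"]}" />')
        if "text" in part:
            # Escape so that '<' or '&' in model output is shown literally.
            texts.append(html.escape(part["text"]))
    return "".join(images) + "<pre>" + "\n".join(texts) + "</pre>"


# Placeholder bytes standing in for a real PNG chart.
fake_png = base64.b64encode(b"\x89PNG...").decode("ascii")
example = {"parts": [{"text": "Mean Maths score < 80"},
                     {"inline_data": {"mime_type": "image/png", "data": fake_png}}]}
print(parts_to_html(example))
```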


In [None]:
from proto import Message as pmsg
import base64

def response_to_html(result):
  """Convert an agent result into HTML containing images, CSV links and text."""
  text = []
  # Convert the proto-plus message into a dict. Bytes fields such as
  # inline_data.data are returned as base64-encoded strings.
  response = pmsg.to_dict(result.content,
                          use_integers_for_enums=False,
                          including_default_value_fields=False,
                          preserving_proto_field_name=True)
  html = ""
  html_images = ""
  html_links = ""
  csv_counter = 1
  for p in response.get('parts', []):
    inline_data = p.get("inline_data")
    if inline_data:
      mime_type = inline_data.get('mime_type', None)
      c = inline_data.get('data', None)
      if mime_type == "image/png" and c is not None:
        # Embed the image directly as a base64 data URI.
        html_images += f'<img src="data:image/png;base64,{c}" />'
      elif mime_type == "text/csv" and c is not None:
        # Offer each CSV file as a download link with a unique name.
        temp_file_name = f"file_{csv_counter}.csv"
        html_links += (f'<li><a download="{temp_file_name}" '
                       f'href="data:text/csv;base64,{c}">{temp_file_name}</a></li>')
        csv_counter += 1

    if 'text' in p:
      text.append(p['text'])

  if html_images:
    html += '<h3>Images generated: </h3><pre>' + html_images + '</pre>'
  if html_links:
    html += '<h3>CSV files generated: </h3><pre><ul>' + html_links + '</ul></pre>'
  html += "<h3>Results: </h3><pre>" + "\n".join(text) + "</pre>"

  return html

## `ds_agent`

`ds_agent` forwards your query to the agent app and returns the results. Optionally, it also displays an HTML version of the results inline; this is the default behaviour.

**To use this functionality**, call `ds_agent(query: str, session, encoded_data_list: list[dict] = None, showHTML=True)`, where:
* `query` contains the request to the agent.
* `session` holds the current state of the conversation with the agent, so that the agent keeps the conversation history and all data provided up to this point.
* `encoded_data_list` is a list of base64-encoded files to submit to the agent, each given as a dictionary with `data` and `mimetype` keys. This is how you provide a CSV data file to the agent at the beginning of your interaction.
* `showHTML`, if set to `True`, displays the HTML version of the result inline.

The function returns a dictionary with the raw agent result under `row` and its HTML rendering under `html`.
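`ds_agent` reads the keys `data` and `mimetype` from every entry of `encoded_data_list`, and `mimetypes.guess_type` returns `None` for extensions it does not recognise. A small check before sending can catch both problems early. `check_encoded_data_list` is a hypothetical helper written for this notebook's data format, not part of the Agent API.

```python
import base64
import mimetypes


def check_encoded_data_list(encoded_data_list):
    """Raise ValueError if an entry is not shaped the way ds_agent expects."""
    for i, entry in enumerate(encoded_data_list or []):
        missing = {"data", "mimetype"} - set(entry)
        if missing:
            raise ValueError(f"entry {i} is missing keys: {sorted(missing)}")
        if not isinstance(entry["data"], (bytes, bytearray)):
            raise ValueError(f"entry {i}: 'data' should be bytes")
        if not entry["mimetype"]:
            # guess_type() returns (None, None) for unknown extensions.
            raise ValueError(f"entry {i}: MIME type could not be determined")
    return True
```

Calling it on `encoded_data` just before `ds_agent` turns a confusing server-side error into a clear local one.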

In [None]:
from typing import Optional

from IPython.display import HTML, display
from google.cloud.aiplatform.private_preview.vertex_agents.gapic import types

def ds_agent(query: str, session, encoded_data_list: Optional[list[dict]] = None, showHTML: bool = True) -> dict:
  """Send a query, plus optional inline files, to the agent session."""
  # The text prompt is always the first part of the request.
  content = [types.Part(text=query)]

  # Attach each encoded file as an inline data part with its MIME type.
  if encoded_data_list is not None:
    for ed in encoded_data_list:
      content.append(types.Part(
          inline_data=types.Blob(data=ed['data'], mime_type=ed['mimetype'])))
  result = session.run(content=content)

  html = response_to_html(result)
  if showHTML:
    display(HTML(html))

  # 'row' holds the raw agent result; 'html' holds its rendered version.
  return {'row': result, 'html': html}

## `gradio_agent`

`gradio_agent` is used by the Gradio playground app at the end of this notebook to send requests to the agent and get responses. The response is returned as HTML so that Gradio can display the images and text. Typing `session:reset` in the chat starts a new session.


In [None]:
import gradio as gr
import mimetypes

def gradio_agent(message, history):
  """Chat handler for the Gradio playground; returns the agent reply as HTML."""
  global session
  # Start a fresh session on first use or when the user types "session:reset".
  if globals().get("session") is None or message['text'] == "session:reset":
    session = app.start_session()
    return "<p>New session started. Please upload a CSV with the data you want to analyse with your first input.</p><hr />"
  # Base64-encode any uploaded files so they can be sent as inline data.
  encoded_data = []
  for f in message.get('files') or []:
    with open(f, "rb") as attachment:
      encoded_data.append({'data': base64.b64encode(attachment.read()),
                           'mimetype': mimetypes.guess_type(f)[0]})
  return ds_agent(query=message['text'], session=session,
                  encoded_data_list=encoded_data, showHTML=False).get('html', '')

# Create the Data
The following code writes a local CSV file of synthetic data. This is a simple dataset of students containing attributes about sleeping and eating habits along with academic performance. This dataset is fictional and does not represent reality. It is only used to demonstrate Code Interpreter capabilities.


In [None]:
%%writefile students.csv
StudentID,Gender,ExtraActivitiesGroup,EatingHabits,SleepingHabits,Reading,Writing,Maths
1,Male,nan,Healthy,Satisfactory,75,80,78
2,Female,Group B,Mixed,Non-Satisfactory,nan,70,67
3,nan,Group A,Unhealthy,Satisfactory,55,60,58
4,Female,Group C,Healthy,Non-Satisfactory,70,75,73
5,Male,Group B,Mixed,Satisfactory,60,65,63
6,Female,Group A,Unhealthy,Non-Satisfactory,50,55,53
7,Male,Group C,Healthy,Satisfactory,80,85,83
8,Female,Group B,Mixed,Non-Satisfactory,65,70,67
9,Male,Group A,Unhealthy,Satisfactory,55,60,58
10,Male,nan,Mixed,Non-Satisfactory,80,78,85
11,Female,Group B,Unhealthy,Satisfactory,65,68,70
12,Female,Group A,Healthy,Non-Satisfactory,52,57,55
13,nan,Group C,Unhealthy,Satisfactory,78,75,79
14,Female,Group B,Mixed,Non-Satisfactory,63,70,65
15,Male,Group A,Healthy,Satisfactory,82,87,80
16,Male,Group C,Unhealthy,Non-Satisfactory,57,60,54
17,Female,Group A,Mixed,Satisfactory,67,65,63
18,Male,Group B,Unhealthy,Non-Satisfactory,55,62,58
19,nan,Group C,Healthy,Satisfactory,88,85,87
20,Female,Group B,Mixed,Non-Satisfactory,67,75,68
21,Male,Group A,Unhealthy,Satisfactory,53,58,55
22,Female,Group C,Healthy,Non-Satisfactory,80,77,82
23,Male,Group A,Mixed,Satisfactory,60,63,60
24,Female,Group B,Unhealthy,Non-Satisfactory,65,62,60
25,Male,Group C,Healthy,Satisfactory,90,92,88
26,Female,Group B,Mixed,Non-Satisfactory,58,65,60
27,Male,Group A,Unhealthy,Satisfactory,67,60,65
28,Male,Group C,Healthy,Non-Satisfactory,72,78,73
29,Female,Group A,Mixed,Satisfactory,55,62,58
30,Male,Group B,Unhealthy,Non-Satisfactory,78,75,72
31,Female,Group C,Healthy,Satisfactory,85,87,83
32,Female,Group A,Mixed,Non-Satisfactory,70,65,67
33,Male,Group B,Unhealthy,Satisfactory,62,67,65
34,Male,Group C,Healthy,Non-Satisfactory,77,83,75
35,nan,Group A,Mixed,Satisfactory,65,63,60
36,Female,Group B,Unhealthy,Non-Satisfactory,72,78,70
37,Male,Group C,Healthy,Satisfactory,80,87,83
38,Female,Group A,Mixed,Non-Satisfactory,75,70,72
39,Male,Group B,Unhealthy,Satisfactory,65,67,60
40,nan,Group C,Healthy,Non-Satisfactory,82,88,80
41,Female,Group A,Mixed,Satisfactory,77,72,70
42,Male,Group B,Unhealthy,Non-Satisfactory,67,62,63
43,Male,Group C,Healthy,Satisfactory,92,90,88
44,Female,Group A,Mixed,Non-Satisfactory,80,75,77
45,nan,Group B,Unhealthy,Satisfactory,72,75,73
46,Female,Group C,Healthy,Non-Satisfactory,83,80,85
47,Male,Group A,Mixed,Satisfactory,75,72,73
48,Male,Group B,Unhealthy,Non-Satisfactory,60,63,58
49,nan,Group C,Healthy,Satisfactory,90,92,88
50,Female,Group A,Mixed,Non-Satisfactory,85,80,82
51,Male,Group B,Unhealthy,Satisfactory,70,67,65
52,Female,Group C,Healthy,Non-Satisfactory,78,83,77
53,Male,Group B,Mixed,Satisfactory,65,63,62
54,Male,Group A,Unhealthy,Non-Satisfactory,52,57,55
55,nan,Group C,Healthy,Satisfactory,75,78,73
56,Female,Group B,Mixed,Non-Satisfactory,70,77,72
57,Male,Group A,Unhealthy,Satisfactory,62,65,63
58,Female,Group C,Healthy,Non-Satisfactory,88,85,83
59,Male,Group B,Mixed,Satisfactory,78,80,77
60,nan,Group A,Unhealthy,Non-Satisfactory,67,60,65
61,Female,Group C,Healthy,Satisfactory,83,80,82
62,Male,Group B,Mixed,Non-Satisfactory,72,68,70
63,Male,Group A,Unhealthy,Satisfactory,62,57,60
64,Female,Group C,Healthy,Non-Satisfactory,90,87,88
65,Male,Group B,Mixed,Satisfactory,85,82,80
66,nan,Group A,Unhealthy,Non-Satisfactory,55,62,58
67,Female,Group C,Healthy,Satisfactory,77,85,80
68,Male,Group B,Mixed,Non-Satisfactory,65,72,67
69,Male,Group A,Unhealthy,Satisfactory,67,60,68
70,Female,Group C,Healthy,Non-Satisfactory,92,90,85
71,Male,Group B,Mixed,Satisfactory,77,85,82
72,nan,Group A,Unhealthy,Non-Satisfactory,62,55,60
73,Female,Group C,Healthy,Satisfactory,83,87,85
74,Male,Group B,Mixed,Non-Satisfactory,68,72,65
75,Male,Group A,Unhealthy,Satisfactory,53,58,55
76,nan,Group C,Healthy,Non-Satisfactory,88,83,87
77,Female,Group B,Mixed,Satisfactory,72,70,73
78,Male,Group A,Unhealthy,Non-Satisfactory,70,65,67
79,Male,Group C,Healthy,Satisfactory,80,85,80
80,Female,Group B,Mixed,Non-Satisfactory,75,72,75
81,nan,Group A,Unhealthy,Satisfactory,55,60,58
82,Female,Group C,Healthy,Non-Satisfactory,80,77,82
83,Male,Group B,Mixed,Satisfactory,68,70,68
84,Male,Group A,Unhealthy,Non-Satisfactory,62,57,63
85,Female,Group C,Healthy,Satisfactory,90,92,88
86,nan,Group B,Mixed,Non-Satisfactory,67,72,67
87,Female,Group A,Unhealthy,Satisfactory,53,60,58
88,Male,Group C,Healthy,Non-Satisfactory,75,78,73
89,Male,Group B,Mixed,Satisfactory,82,80,83
90,nan,Group A,Unhealthy,Non-Satisfactory,65,62,63
91,Female,Group C,Healthy,Satisfactory,80,83,80
92,Male,Group B,Mixed,Non-Satisfactory,85,80,82
93,Male,Group A,Unhealthy,Satisfactory,62,67,65
94,nan,Group C,Healthy,Non-Satisfactory,90,87,92
95,Female,Group B,Mixed,Satisfactory,77,75,78
96,Female,Group A,Unhealthy,Non-Satisfactory,67,60,68
97,nan,Group C,Healthy,Satisfactory,77,83,78
98,Male,Group B,Mixed,Non-Satisfactory,62,68,65
99,Male,Group A,Unhealthy,Satisfactory,52,57,58
100,Female,Group C,Healthy,Non-Satisfactory,72,75,77
101,Male,Group B,Mixed,Satisfactory,70,67,72
102,nan,Group A,Unhealthy,Non-Satisfactory,67,62,65
103,Female,Group C,Healthy,Satisfactory,83,87,85
104,Male,Group B,Mixed,Non-Satisfactory,80,77,82
105,Male,Group A,Unhealthy,Satisfactory,55,62,53
106,Female,Group C,Healthy,Non-Satisfactory,92,90,88
107,nan,Group B,Mixed,Satisfactory,78,83,78
108,Female,Group A,Unhealthy,Non-Satisfactory,72,65,70
109,Male,Group C,Healthy,Satisfactory,83,80,85
110,Female,Group B,Mixed,Non-Satisfactory,68,72,63
111,Male,Group A,Unhealthy,Satisfactory,60,63,63
112,nan,Group C,Healthy,Non-Satisfactory,72,78,73
113,Female,Group B,Mixed,Satisfactory,80,83,83
114,Male,Group A,Unhealthy,Non-Satisfactory,70,65,67
115,Female,Group C,Healthy,Satisfactory,90,87,92
116,Male,Group B,Mixed,Non-Satisfactory,85,82,80
117,Male,Group A,Unhealthy,Satisfactory,52,57,55
118,Female,Group C,Healthy,Non-Satisfactory,77,85,80
119,nan,Group B,Mixed,Satisfactory,68,70,68
120,Female,Group A,Unhealthy,Non-Satisfactory,53,60,58
121,Male,Group C,Healthy,Satisfactory,75,80,77
122,Female,Group B,Mixed,Non-Satisfactory,67,72,67
123,Male,Group B,Unhealthy,Satisfactory,70,67,72
124,Female,Group A,Mixed,Non-Satisfactory,62,57,60
125,nan,Group C,Healthy,Satisfactory,80,83,80
126,Male,Group B,Mixed,Non-Satisfactory,62,68,60
127,Male,Group A,Unhealthy,Satisfactory,55,60,58
128,Female,Group C,Healthy,Non-Satisfactory,92,90,85
129,Male,Group B,Mixed,Satisfactory,85,82,80
130,Female,Group A,Unhealthy,Non-Satisfactory,75,70,72
131,nan,Group C,Healthy,Satisfactory,77,83,78
132,Male,Group B,Mixed,Non-Satisfactory,80,77,82
133,Male,Group A,Unhealthy,Satisfactory,62,67,60
134,Female,Group C,Healthy,Non-Satisfactory,90,87,92
135,Male,Group B,Mixed,Satisfactory,78,83,78
136,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
137,Male,Group C,Healthy,Satisfactory,80,83,80
138,Male,Group B,Mixed,Non-Satisfactory,67,70,63
139,nan,Group A,Unhealthy,Satisfactory,65,62,65
140,Female,Group C,Healthy,Non-Satisfactory,88,83,87
141,Female,Group B,Mixed,Satisfactory,70,77,70
142,Male,Group A,Unhealthy,Non-Satisfactory,52,57,55
143,Male,Group C,Healthy,Satisfactory,85,80,82
144,Male,Group B,Mixed,Non-Satisfactory,82,80,83
145,nan,Group A,Unhealthy,Satisfactory,60,63,63
146,Female,Group C,Healthy,Non-Satisfactory,90,87,92
147,Female,Group B,Mixed,Satisfactory,75,72,77
148,Male,Group A,Unhealthy,Non-Satisfactory,57,60,54
149,nan,Group C,Healthy,Satisfactory,80,85,82
150,Female,Group B,Mixed,Non-Satisfactory,80,75,83
151,Male,Group A,Unhealthy,Satisfactory,78,75,79
152,Male,Group C,Healthy,Non-Satisfactory,92,90,88
153,nan,Group B,Mixed,Satisfactory,65,63,62
154,Female,Group A,Unhealthy,Non-Satisfactory,53,58,55
155,Male,Group C,Healthy,Satisfactory,83,87,82
156,Female,Group B,Mixed,Non-Satisfactory,85,80,83
157,Male,Group A,Unhealthy,Satisfactory,70,67,72
158,Male,Group C,Healthy,Non-Satisfactory,90,87,92
159,Female,Group B,Mixed,Satisfactory,68,70,68
160,Female,Group A,Unhealthy,Non-Satisfactory,67,60,70
161,nan,Group C,Healthy,Satisfactory,90,92,88
162,Male,Group B,Mixed,Non-Satisfactory,85,82,80
163,Male,Group A,Unhealthy,Satisfactory,65,62,65
164,Female,Group C,Healthy,Non-Satisfactory,83,87,85
165,nan,Group B,Mixed,Satisfactory,78,83,78
166,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
167,Male,Group C,Healthy,Satisfactory,80,83,80
168,Female,Group B,Mixed,Non-Satisfactory,67,70,63
169,Male,Group A,Unhealthy,Satisfactory,52,57,55
170,nan,Group C,Healthy,Non-Satisfactory,82,88,80
171,Male,Group B,Mixed,Satisfactory,80,83,83
172,Female,Group A,Unhealthy,Non-Satisfactory,75,70,72
173,Male,Group B,Healthy,Satisfactory,90,87,88
174,Male,Group B,Mixed,Non-Satisfactory,62,68,65
175,nan,Group A,Unhealthy,Satisfactory,62,57,63
176,Female,Group C,Healthy,Non-Satisfactory,77,85,80
177,Male,Group B,Mixed,Satisfactory,68,70,68
178,Male,Group A,Unhealthy,Non-Satisfactory,53,60,58
179,Female,Group C,Healthy,Satisfactory,90,87,92
180,Male,Group B,Mixed,Non-Satisfactory,70,67,75
181,nan,Group A,Unhealthy,Satisfactory,65,62,65
182,Female,Group C,Healthy,Non-Satisfactory,83,87,85
183,nan,Group A,Mixed,Satisfactory,75,78,77
184,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
185,Male,Group C,Healthy,Satisfactory,80,83,80
186,Male,Group A,Mixed,Non-Satisfactory,85,82,80
187,Male,Group A,Unhealthy,Satisfactory,78,75,79
188,nan,Group C,Healthy,Non-Satisfactory,80,85,83
189,Female,Group B,Mixed,Satisfactory,70,77,70
190,Male,Group A,Unhealthy,Non-Satisfactory,57,60,54
191,nan,Group C,Healthy,Satisfactory,92,90,85
192,Female,Group B,Mixed,Non-Satisfactory,80,75,83
193,Male,Group A,Unhealthy,Satisfactory,53,58,55
194,nan,Group C,Healthy,Non-Satisfactory,75,78,77
195,Female,Group B,Mixed,Satisfactory,65,63,62
196,Female,Group A,Unhealthy,Non-Satisfactory,67,60,70
197,Male,Group A,Healthy,Satisfactory,85,80,87
198,Male,Group B,Mixed,Non-Satisfactory,85,82,80
199,Male,Group A,Unhealthy,Satisfactory,72,65,70
200,nan,Group C,Healthy,Non-Satisfactory,90,87,92
201,Female,Group B,Mixed,Satisfactory,68,70,68
202,Female,Group A,Unhealthy,Non-Satisfactory,62,57,63
203,nan,Group A,Healthy,Satisfactory,82,88,80
204,Female,Group B,Mixed,Non-Satisfactory,80,77,82
205,Male,Group A,Unhealthy,Satisfactory,67,60,68
206,Male,Group A,Healthy,Non-Satisfactory,90,87,92
207,Female,Group B,Mixed,Satisfactory,78,83,78
208,Female,Group A,Unhealthy,Non-Satisfactory,72,65,70
209,nan,Group C,Healthy,Satisfactory,77,83,78
210,Male,Group B,Mixed,Non-Satisfactory,62,68,65
211,Male,Group A,Unhealthy,Satisfactory,53,58,55
212,Male,Group A,Healthy,Non-Satisfactory,92,90,85
213,Female,Group B,Mixed,Satisfactory,68,70,68
214,Female,Group A,Unhealthy,Non-Satisfactory,75,70,72
215,nan,Group B,Healthy,Satisfactory,77,83,78
216,Female,Group B,Mixed,Non-Satisfactory,67,70,63
217,Male,Group A,Unhealthy,Satisfactory,52,57,55
218,nan,Group C,Healthy,Non-Satisfactory,90,87,92
219,Female,Group B,Mixed,Satisfactory,85,82,80
220,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
221,Male,Group A,Healthy,Satisfactory,80,83,80
222,Male,Group B,Mixed,Non-Satisfactory,60,63,63
223,Male,Group A,Unhealthy,Satisfactory,78,75,79
224,Female,Group C,Healthy,Non-Satisfactory,75,78,77
225,nan,Group B,Mixed,Satisfactory,70,67,72
226,Male,Group A,Unhealthy,Non-Satisfactory,70,65,67
227,nan,Group C,Healthy,Satisfactory,90,92,88
228,Female,Group B,Mixed,Non-Satisfactory,85,82,80
229,Male,Group A,Unhealthy,Satisfactory,65,62,65
230,Female,Group C,Healthy,Non-Satisfactory,83,87,85
231,nan,Group B,Mixed,Satisfactory,75,78,77
232,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
233,Male,Group C,Healthy,Satisfactory,80,83,80
234,Male,Group B,Mixed,Non-Satisfactory,85,82,80
235,Male,Group A,Unhealthy,Satisfactory,78,75,79
236,Female,Group C,Healthy,Non-Satisfactory,83,87,85
237,nan,Group A,Mixed,Satisfactory,80,83,83
238,Female,Group B,Mixed,Non-Satisfactory,75,70,77
239,Male,Group A,Unhealthy,Non-Satisfactory,62,57,63
240,nan,Group C,Healthy,Non-Satisfactory,82,88,80
241,Female,Group B,Mixed,Satisfactory,80,77,82
242,Male,Group A,Unhealthy,Satisfactory,60,63,63
243,Female,Group C,Healthy,Non-Satisfactory,90,87,92
244,Male,Group B,Mixed,Non-Satisfactory,82,80,83
245,nan,Group C,Healthy,Satisfactory,77,83,78
246,Male,Group B,Mixed,Non-Satisfactory,72,68,70
247,Female,Group A,Unhealthy,Satisfactory,65,62,65
248,Male,Group C,Healthy,Non-Satisfactory,80,85,83
249,Female,Group A,Mixed,Non-Satisfactory,70,65,67
250,nan,Group C,Healthy,Non-Satisfactory,83,80,85
251,Female,Group B,Mixed,Satisfactory,68,70,68
252,Female,Group A,Unhealthy,Non-Satisfactory,62,57,63
253,Male,Group C,Healthy,Satisfactory,92,90,88
254,Female,Group B,Mixed,Non-Satisfactory,80,75,83
255,nan,Group C,Healthy,Satisfactory,90,92,88
256,Female,Group B,Mixed,Satisfactory,70,77,70
257,Male,Group A,Unhealthy,Non-Satisfactory,52,57,55
258,nan,Group C,Healthy,Non-Satisfactory,75,78,77
259,Female,Group B,Mixed,Non-Satisfactory,80,77,82
260,Male,Group A,Unhealthy,Satisfactory,55,62,58
261,nan,Group C,Healthy,Satisfactory,82,88,80
262,Female,Group B,Mixed,Non-Satisfactory,72,65,70
263,Male,Group A,Unhealthy,Non-Satisfactory,65,62,65
264,Female,Group C,Healthy,Non-Satisfactory,90,87,92
265,Male,Group B,Mixed,Satisfactory,77,85,82
266,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
267,nan,Group C,Healthy,Satisfactory,83,80,85
268,Female,Group B,Mixed,Non-Satisfactory,85,82,80
269,Male,Group A,Unhealthy,Satisfactory,62,57,63
270,Female,Group C,Healthy,Non-Satisfactory,77,85,80
271,nan,Group B,Mixed,Satisfactory,70,67,72
272,Male,Group A,Unhealthy,Non-Satisfactory,53,60,58
273,Male,Group C,Healthy,Satisfactory,75,80,77
274,Female,Group B,Mixed,Non-Satisfactory,80,75,83
275,Male,Group A,Unhealthy,Satisfactory,52,57,55
276,nan,Group C,Healthy,Non-Satisfactory,92,90,85
277,Female,Group B,Mixed,Satisfactory,68,72,65
278,Male,Group A,Unhealthy,Non-Satisfactory,70,65,67
279,nan,Group C,Healthy,Satisfactory,80,83,80
280,Female,Group B,Mixed,Non-Satisfactory,75,72,75
281,Male,Group A,Unhealthy,Satisfactory,57,60,54
282,Female,Group C,Healthy,Non-Satisfactory,78,83,77
283,nan,Group B,Mixed,Satisfactory,70,67,72
284,Female,Group A,Unhealthy,Non-Satisfactory,62,57,63
285,Male,Group C,Healthy,Satisfactory,90,87,88
286,Male,Group B,Mixed,Non-Satisfactory,82,80,83
287,nan,Group C,Healthy,Satisfactory,77,83,78
288,Female,Group B,Mixed,Non-Satisfactory,72,70,73
289,Male,Group A,Unhealthy,Satisfactory,65,62,65
290,Female,Group C,Healthy,Non-Satisfactory,90,87,92
291,nan,Group B,Mixed,Satisfactory,70,63,60
292,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
293,Male,Group C,Healthy,Satisfactory,75,80,77
294,Male,Group B,Mixed,Non-Satisfactory,85,82,80
295,nan,Group A,Mixed,Satisfactory,80,75,77
296,Female,Group C,Healthy,Non-Satisfactory,77,83,78
297,Female,Group B,Mixed,Non-Satisfactory,67,72,67
298,Male,Group A,Unhealthy,Satisfactory,67,60,68
299,Male,Group B,Healthy,Satisfactory,88,85,87
300,Female,Group A,Mixed,Non-Satisfactory,78,75,79
301,Male,Group C,Unhealthy,Satisfactory,75,78,72
302,Female,Group B,Mixed,Non-Satisfactory,72,65,70
303,Male,Group A,Healthy,Non-Satisfactory,85,82,80
304,Female,Group C,Healthy,Non-Satisfactory,77,83,78
305,Male,Group A,Mixed,Non-Satisfactory,72,65,70
306,Female,Group B,Unhealthy,Satisfactory,72,78,70
307,nan,Group A,Healthy,Satisfactory,82,88,80
308,Female,Group C,Mixed,Non-Satisfactory,72,75,77
309,Male,Group B,Mixed,Non-Satisfactory,62,68,65
310,Female,Group A,Unhealthy,Satisfactory,53,60,58
311,nan,Group C,Healthy,Satisfactory,90,92,88
312,Female,Group B,Mixed,Non-Satisfactory,80,77,82
313,Male,Group A,Unhealthy,Non-Satisfactory,67,60,68
314,nan,Group C,Healthy,Satisfactory,77,83,78
315,Female,Group B,Mixed,Satisfactory,75,72,75
316,Male,Group A,Unhealthy,Non-Satisfactory,52,57,55
317,Female,Group C,Healthy,Non-Satisfactory,90,87,92
318,Male,Group B,Mixed,Non-Satisfactory,85,82,80

# Load data into memory
We base64-encode the dataset so it can be submitted to the agent as inline data. The agent can then use it to answer user requests.

In [None]:
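import base64
import mimetypes
# Base64-encode the CSV bytes and record their MIME type so the file can be sent inline
encoded_data = []
with open('students.csv', "rb") as attachment:
    encoded_data.append({'data': base64.b64encode(attachment.read()), 'mimetype': mimetypes.guess_type('students.csv')[0]})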
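To see what the agent actually receives, here is a minimal round-trip sketch using only the standard library. The CSV snippet and its header are illustrative assumptions, because the real header row is not shown in this part of the notebook. The point is that `base64.b64encode` turns arbitrary bytes into ASCII text that fits in a request, and decoding restores the file exactly.

```python
import base64
import mimetypes

# A few rows in the shape of students.csv (header names are ASSUMED)
csv_bytes = b"StudentID,Gender,Math\n265,Male,77\n267,nan,83\n"

# Encode exactly as the notebook does: raw bytes in, base64 bytes out
encoded = base64.b64encode(csv_bytes)

# The payload is plain ASCII, so it can travel inline in a request body
assert encoded.isascii()

# Decoding recovers the original file byte for byte
decoded = base64.b64decode(encoded)

# guess_type looks only at the file extension; ".csv" typically maps to "text/csv"
mimetype, _encoding = mimetypes.guess_type("students.csv")
print(len(csv_bytes), len(encoded), mimetype)
```

Base64 output is always 4 characters for every 3 input bytes (rounded up), so inline payloads are roughly a third larger than the raw file.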


# Step 1: Start a new Session
A session represents a conversation history with an agent. Each new engagement with an agent needs a new session so that information from previous engagements is not reused. Likewise, to reset a conversation, start a new session.
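The idea of a session can be illustrated with a toy model. `ToySession` below is hypothetical and is not part of the Vertex AI Agent API; it only shows why a fresh session starts without the history of an earlier one.

```python
from dataclasses import dataclass, field


# HYPOTHETICAL stand-in for an agent session, not the Vertex AI API.
# It models one idea only: each session carries its own conversation history.
@dataclass
class ToySession:
    history: list = field(default_factory=list)

    def send(self, message: str) -> int:
        # Each message is appended to this session's history only
        self.history.append(message)
        # Return how many turns the "agent" can see in this conversation
        return len(self.history)


first = ToySession()
first.send("This is the dataset we will be using today")
first.send("Help me better understand the dataset")

# A new session resets the context: nothing from `first` carries over
second = ToySession()
turns_seen = second.send("Help me better understand the dataset")
print(len(first.history), turns_seen)  # 2 1
```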


In [None]:
session = app.start_session()

# Step 2: Provide the dataset to the agent
Send the agent a message that includes the dataset and states that this is the dataset for analysis.

In [None]:
result = ds_agent(query="This is the dataset we will be using today", session=session, encoded_data_list=encoded_data)

# Step 3: Data Exploration
In this section we ask the agent to explain and visualise the dataset so that we get a better understanding of it.
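It can help to know what a basic distribution summary involves before reading the agent's answer. This stdlib sketch summarises the score columns (minimum, mean, maximum) and lists the distinct levels of each categorical column for three rows copied from the CSV above. The header names (`Gender`, `Group`, `Eating`, `Sleep`, `Math`, `Reading`, `Writing`) are assumptions; only `StudentID` is named elsewhere in this notebook.

```python
import csv
import io
import statistics

# Three rows copied from students.csv above. Header names are ASSUMED,
# because the header row is not shown in this part of the notebook.
sample = """StudentID,Gender,Group,Eating,Sleep,Math,Reading,Writing
265,Male,Group B,Mixed,Satisfactory,77,85,82
266,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
267,nan,Group C,Healthy,Satisfactory,83,80,85
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Numeric columns: summarise each distribution as (min, mean, max)
summary = {}
for col in ("Math", "Reading", "Writing"):
    values = [int(row[col]) for row in rows]
    summary[col] = (min(values), statistics.mean(values), max(values))

# Categorical columns: distinct levels ("nan" is just a string to the csv module)
levels = {col: sorted({row[col] for row in rows}) for col in ("Gender", "Group", "Eating", "Sleep")}

print(summary)
print(levels)
```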

In [None]:
result = ds_agent(query="Help me better understand the dataset. Please verbally explain the columns and data distributions", session=session)

That's great! Let's see what else we can ask our agent to do.

In [None]:
query="""
I would like to see a pie chart for gender distribution
"""

result = ds_agent(query=query, session=session)

In [None]:
query="""
Hm, I noticed only 'Male' and 'Female' gender categories. In the dataset some values are Nan for gender.
Can you replace missing values with Unknown on all categorical columns and reproduce the gender pie chart?
"""

result = ds_agent(query=query, session=session)
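One plausible reason the first pie chart showed only two categories: pandas `read_csv` parses the string `nan` as a missing value, and `Series.value_counts()` drops missing values by default (`dropna=True`). The stdlib sketch below mirrors the fix requested in the prompt: missing entries in the categorical columns become `"Unknown"`, and the gender shares a pie chart would plot are recomputed. The rows come from the CSV above; the column names are assumptions.

```python
import csv
import io
from collections import Counter

# Rows 267-270 from the CSV above; header names are ASSUMED.
# csv.DictReader keeps "nan" as a literal string, unlike pandas, which reads it as NaN.
sample = """StudentID,Gender,Group
267,nan,Group C
268,Female,Group B
269,Male,Group A
270,Female,Group C
"""
rows = list(csv.DictReader(io.StringIO(sample)))

MISSING = {"", "nan", "NaN", "None"}
CATEGORICAL = ("Gender", "Group")

# Replace missing values with "Unknown" in every categorical column
for row in rows:
    for col in CATEGORICAL:
        if row[col] in MISSING:
            row[col] = "Unknown"

# Slice sizes for a gender pie chart, as percentages of all rows
counts = Counter(row["Gender"] for row in rows)
shares = {gender: 100 * n / len(rows) for gender, n in counts.items()}
print(shares)
```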

In [None]:
query="""
Now I want to see charts that show module scores (math, reading and writing) in relation to eating habits, sleeping habits and extra activities group
"""

result = ds_agent(query=query, session=session)
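Charts of scores by habit are usually built on per-group averages. The sketch below computes the mean Math, Reading and Writing score for each eating-habit level, using six rows copied from the CSV above with assumed column names; the same grouping applies to the sleep and activity-group columns. Six rows are far too few to support any conclusion about the dataset.

```python
import csv
import io
import statistics
from collections import defaultdict

# Rows 265-270 from the CSV above, reduced to eating habit and scores.
# Column names are ASSUMED.
sample = """Eating,Math,Reading,Writing
Mixed,77,85,82
Unhealthy,55,62,58
Healthy,83,80,85
Mixed,85,82,80
Unhealthy,62,57,63
Healthy,77,85,80
"""

MODULES = ("Math", "Reading", "Writing")

# habit -> module -> list of scores
scores = defaultdict(lambda: defaultdict(list))
for row in csv.DictReader(io.StringIO(sample)):
    for module in MODULES:
        scores[row["Eating"]][module].append(int(row[module]))

# habit -> module -> mean score: the bar heights of a grouped bar chart
means = {
    habit: {module: statistics.mean(values) for module, values in by_module.items()}
    for habit, by_module in scores.items()
}
print(means)
```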

# Step 4: Data manipulation
In this section we perform some data manipulation actions, such as replacing missing values, removing columns and normalising.


In [None]:
query="""
Are there any other missing data in the dataset?
"""

result = ds_agent(query=query, session=session)

In [None]:
query="""
Replace numerical missing values with the mean of that column
"""

result = ds_agent(query=query, session=session)
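Mean imputation replaces each missing numeric value with the mean of the observed values in its column. The rows below are hypothetical, with `None` marking a gap and assumed column names; the sketch first counts the gaps, then fills them.

```python
import statistics

# HYPOTHETICAL rows; None marks a missing score. Column names are ASSUMED.
rows = [
    {"Math": 77, "Reading": 85},
    {"Math": None, "Reading": 62},
    {"Math": 83, "Reading": None},
]
NUMERIC = ("Math", "Reading")

# Step 1: count the gaps per column
missing_before = {col: sum(row[col] is None for row in rows) for col in NUMERIC}

# Step 2: fill each gap with the mean of the observed values in that column
for col in NUMERIC:
    fill = statistics.mean(row[col] for row in rows if row[col] is not None)
    for row in rows:
        if row[col] is None:
            row[col] = fill

print(missing_before, rows)
```

Mean imputation keeps each column's mean unchanged but shrinks its variance, which is worth keeping in mind before the standardisation step later on.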

Okay, now there are no missing values in the dataset. Since this dataset is about students, it might contain PII (personally identifiable information). Let's ask our agent.

In [None]:
query="""
Is there any PII info in the dataset
"""

result = ds_agent(query=query, session=session)

The agent points out some potential risks and ideas for mitigation. Let's remove the StudentID column.

In [None]:
query="""
Perfect, remove the student ID column and return the first 10 rows of the
dataset to validate that the column has been removed
"""

result = ds_agent(query=query, session=session)
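Removing an identifier column amounts to rebuilding each row without that key; in pandas this is typically `df.drop(columns=["StudentID"])`. A stdlib sketch follows, in which every column name other than `StudentID` is an assumption. Dropping a direct identifier does not by itself guarantee anonymity, because combinations of the remaining columns can still narrow down individuals.

```python
# Only StudentID is named in the notebook; the other column names are ASSUMED.
rows = [
    {"StudentID": "265", "Gender": "Male", "Math": "77"},
    {"StudentID": "266", "Gender": "Female", "Math": "55"},
    {"StudentID": "267", "Gender": "Unknown", "Math": "83"},
]

# Rebuild every row without the identifier column (the original rows stay untouched)
without_id = [{key: value for key, value in row.items() if key != "StudentID"} for row in rows]

# Validate, as the prompt asks: show the first 10 rows and check the column is gone
for row in without_id[:10]:
    print(row)
assert all("StudentID" not in row for row in without_id)
```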

In [None]:
query="""
Excellent! I am a bit rusty and do not quite recall, what is the difference of standardisation vs normalisation?
"""

result = ds_agent(query=query, session=session)
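In short, standardisation rescales a column to mean 0 and standard deviation 1, z = (x - μ) / σ, while min-max normalisation maps it onto the range [0, 1], (x - min) / (max - min). The sketch applies both to the Math scores of students 265 to 269 from the CSV above.

```python
import statistics

# Math scores of students 265-269 from the CSV above
math_scores = [77, 55, 83, 85, 62]


def standardise(values):
    """z-scores: subtract the mean, then divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def normalise(values):
    """Min-max scaling: the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


z = standardise(math_scores)
mm = normalise(math_scores)
print([round(v, 2) for v in z])
print([round(v, 2) for v in mm])
```

The sketch divides by the population standard deviation (`statistics.pstdev`), which matches scikit-learn's `StandardScaler`; pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), so the two approaches can give slightly different z-scores.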

In [None]:
query="""
Okay, please apply standardisation to module columns. Print the first 5 lines of the data to validate
"""

result = ds_agent(query=query, session=session)

# Step 5: Final observations
In this section we ask the agent for a final view on this dataset and some advice for students. This is just an exercise, and the advice provided should not be taken as a golden rule. Each student is different and should consult teachers, guidance counsellors, or other trusted adults for personalised advice and support.


In [None]:
query="""
What are your observations of this dataset in terms of habits in comparison to academic performance?
"""

result = ds_agent(query=query, session=session)

In [None]:
query="""
What could be an advice to students?
"""

result = ds_agent(query=query, session=session)

In [None]:
query="""
Please return the final dataset in a csv file
"""

result = ds_agent(query=query, session=session)
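How the agent hands back a file depends on the Agent API and is not shown here. For reference, writing rows out as CSV with the standard library looks like the sketch below; the rows and column names are hypothetical, and the text goes to an in-memory buffer rather than a file.

```python
import csv
import io

# HYPOTHETICAL final rows: StudentID removed, Math already standardised
rows = [
    {"Gender": "Male", "Group": "Group B", "Math": 0.52},
    {"Gender": "Unknown", "Group": "Group C", "Math": -1.1},
]

# Write a header plus one line per row into an in-memory text buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open the file with `newline=""`, as the `csv` module documentation recommends, and pass that file object to `csv.DictWriter`.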

# Playground
I hope you enjoyed this tutorial! Here you can interact with a data science agent in a new session. You can upload your own data by attaching it to the initial message and then start asking questions. You can also download the same students.csv dataset and reuse it for experimentation.


Generate credentials for the Gradio UI:

In [None]:
import secrets

# Credentials for the Gradio UI: a fixed username and a random URL-safe password
username = "user"
password = secrets.token_urlsafe(8)

print(f"Use the following information to log in to the Gradio UI: \n\nusername: {username} \npassword: {password}")

For a better experience, open Gradio in a new tab by clicking the public Gradio URL that is generated after you run the following cell.

In [None]:
session = app.start_session()
gr.ChatInterface(gradio_agent,
    title="Data Science Agent",
    examples=[{"text":"session:reset"}],
    multimodal=True).launch(debug=True, share=True, auth=(username, password))

# Cleaning Up
In this tutorial we used the Vertex AI Agent API with the Code Interpreter Vertex AI Extension to analyse a CSV dataset.

To delete your app, uncomment and run:

In [None]:
# app.delete(app.app_name)

You also created a Code Interpreter extension in this notebook. If you want to delete it, uncomment and run:

In [None]:
# extension_code_interpreter.delete()

If you restarted the notebook, you may have additional apps and extensions that need cleaning up.

List all the apps in your project:

In [None]:
App.list_apps()

If you see more apps that you don't want, look for the `app_name` values (they look like `projects/PROJECT_NUMBER/locations/LOCATION/apps/APP_NUMBER`) and use the following line of code to delete them manually:

In [None]:
App.delete(app_name="projects/PROJECT_NUMBER/locations/LOCATION/apps/APP_NUMBER")

Run the next cell to list all Vertex AI Extension instances in your environment:

In [None]:
from vertexai.preview import extensions
extensions.Extension.list()

If you see extensions that you don't want, look for the `resource name` values (they look like `projects/PROJECT_NUMBER/locations/LOCATION/extensions/EXTENSION_NUMBER`) and use the following line of code to delete them manually:

In [None]:
extension = extensions.Extension("projects/PROJECT_NUMBER/locations/LOCATION/extensions/EXTENSION_NUMBER")
extension.delete()
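Resource names like the ones above alternate between a collection name and an identifier. The helper below is hypothetical (not part of the Vertex AI SDK); it splits a name into those pairs, which makes it easy to pull out the app or extension number you need.

```python
def parse_resource_name(name: str) -> dict:
    """Split 'projects/P/locations/L/<collection>/<id>' into {collection: id} pairs."""
    parts = name.strip("/").split("/")
    if len(parts) % 2:
        raise ValueError(f"unexpected resource name: {name!r}")
    # Even positions hold collection names, odd positions the identifiers that follow
    return dict(zip(parts[0::2], parts[1::2]))


ext = parse_resource_name("projects/PROJECT_NUMBER/locations/LOCATION/extensions/EXTENSION_NUMBER")
app_info = parse_resource_name("projects/PROJECT_NUMBER/locations/LOCATION/apps/APP_NUMBER")
print(ext["extensions"], app_info["apps"])  # EXTENSION_NUMBER APP_NUMBER
```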

Run the next cell to delete the Code Interpreter extension created in this notebook (skip it if you already deleted it above):

In [None]:
extension_code_interpreter.delete()

If you restarted the notebook runtime, you may have some stray registered Extensions. This next line of code shows you all the Extensions registered in your project:

In [None]:
extensions.Extension.list()

You can use the [Google Cloud Console](https://console.cloud.google.com/vertex-ai/extensions) to view and delete any stray registered Extensions.

If you want to delete all the extensions in your project, remove the triple quotes around the next code block and run it. **WARNING**: This cannot be undone!

In [None]:
"""
clean_ids = []

for element in extensions.Extension.list():
  clean_ids.append(str(element).split("extensions/")[1])

for id in clean_ids:
  extension = extensions.Extension(id)
  extension.delete()
"""