<div style="display: flex; align-items: left;">
    <a href="https://sites.google.com/corp/google.com/genai-solutions/home?authuser=0">
        <img src="https://storage.googleapis.com/miscfilespublic/Linkedin%20Banner%20%E2%80%93%202.png" style="margin-right">
    </a>
</div>

In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


<h1 align="center">Open Data QnA - Chat with your SQL Database</h1> 

---

This notebook first walks through the Vector Store Setup needed for running the Open Data QnA application. 

Currently supported Source DBs are: 
- PostgreSQL on Google Cloud SQL 
- BigQuery

Furthermore, the following vector stores are supported 
- pgvector on PostgreSQL 
- BigQuery vector


The setup part covers the following steps: 
> 1. Configuration: Initial GCP project, IAM permissions, environment and database setup, including logging on BigQuery for analytics

> 2. Creation of Table, Column and Known Good Query Embeddings in the Vector Store for Retrieval Augmented Generation (RAG)


Afterwards, you will be able to run the Open Data QnA Pipeline to generate SQL queries and answer questions over your data source. 

The pipeline run covers the following steps: 

> 1. Take the user question and generate SQL in the dialect of the data source

> 2. Execute the SQL query and retrieve the data

> 3. Generate a natural language response and charts to display

> 4. Clean Up resources



### 📒 Using this interactive notebook

If you have not used this IDE with Jupyter notebooks before, it will prompt you to install the Python and Jupyter extensions. Go ahead and install them.

Click the **run** icons ▶️  of each cell within this notebook.

> 💡 Alternatively, you can run the currently selected cell with `Ctrl + Enter` (or `⌘ + Enter` on a Mac).

> ⚠️ **To avoid any errors**, wait for each section to finish, in order, before clicking the next “run” icon.

This sample must be connected to a **Google Cloud project**, but nothing else is needed other than your Google Cloud project.

You can use an existing project. Alternatively, you can create a new Cloud project [with cloud credits for free.](https://cloud.google.com/free/docs/gcp-free-tier)

# 🚧 **0. Prerequisites**

Make sure that Google Cloud CLI is installed before moving to the next cell! You can refer to the link below for guidance

Installation Guide: https://cloud.google.com/sdk/docs/install

##  **0.1. Setup Poetry Environment and Setup GCP Project** 

### 💻 **Install Code Dependencies (Create and setup venv)**
Install the dependencies by running the poetry commands below.

Note: The commands below run with the default Python kernel. You will switch to the kernel from the venv after they finish.

In [None]:
# Install poetry
! pip uninstall poetry -y
! pip install poetry --quiet

# Run the poetry commands below to set up the environment
!poetry lock  # resolve dependencies (also creates the poetry venv if it does not exist)
!poetry install --quiet  # install dependencies
!poetry env info  # display the env just created and the path to it

### 📌 **Important Step: Activate your virtual environment [Run all these on Terminal]**

Once the venv has been created, either in the local directory or in the cache directory, open a terminal on the same machine where your notebooks are running and run the commands below.


```
poetry shell  # activates your venv; the prompt should show that you are inside it

## inside the activated venv shell

gcloud auth login
gcloud auth application-default login
gcloud services enable \
    serviceusage.googleapis.com \
    cloudresourcemanager.googleapis.com --project <<Enter Project Id>>
gcloud auth application-default set-quota-project <<Enter Project Id for using resources>>

```
In IDEs, adding the Jupyter extensions automatically gives you the option to change the kernel. If not, manually select the Python interpreter in your IDE. The exact path is shown in the output of the cell above and looks like, e.g., /home/admin_/Talk2Data/.venv/bin/python or ~cache/user/Talk2Data/.venv/bin/python.

**Extra Steps if you are running inside Jupyter Lab or Jupyter Environments on Workbench etc**

We need to manually add the venv as a kernel in those instances where you cannot select the interpreter path manually as described above.

Run the steps above, then continue with the commands below.

```
##still inside the activated venv shell

pip install jupyter

ipython kernel install --name "openqna-venv" --user 

```

Restart your kernel, or close the existing notebook and open it again. You should now see "openqna-venv" in the kernel drop-down.

**What did we do here?**

* Created Application Default Credentials to use for the code
* Added the venv as a kernel to select for running the notebooks (for standalone Jupyter setups like Workbench etc.)
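
After switching kernels, it can help to confirm that the notebook is really running on the venv interpreter. The snippet below is a minimal sketch using only the standard library; the helper name `in_virtualenv` is made up for illustration and is not part of the Open Data QnA repo.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


# The interpreter path should match the one reported by `poetry env info`.
print("Interpreter:", sys.executable)
print("Running inside a virtual environment:", in_virtualenv())
```

If the second line prints `False`, the notebook is most likely still attached to the default kernel.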

### 🔗 **Connect Your Google Cloud Project**
Time to connect your Google Cloud Project to this notebook. 

In [None]:
#@markdown Please fill in the value below with your GCP project ID and then run the cell.
PROJECT_ID = "my_project"

# Quick input validations.
assert PROJECT_ID, "⚠️ Please provide your Google Cloud Project ID"

# Configure gcloud.
!gcloud config set project {PROJECT_ID}
print(f'Project has been set to {PROJECT_ID}')
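
The `assert` above only checks that `PROJECT_ID` is non-empty. A stricter check is sketched below, assuming the usual project ID format (6 to 30 characters; lowercase letters, digits and hyphens; starts with a letter; does not end with a hyphen). The helper `looks_like_project_id` is hypothetical, and passing it does not prove that the project exists.

```python
import re

# Assumed format: 6-30 characters, lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(value: str) -> bool:
    """Return True when value has the shape of a Google Cloud project ID."""
    return _PROJECT_ID_RE.fullmatch(value) is not None


print(looks_like_project_id("my-project-123"))  # True
print(looks_like_project_id("my_project"))      # False: underscores are not allowed
```

Note that the placeholder value `my_project` fails this check, so it has to be replaced with a real project ID before running the cell.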

### 🔐 **Authenticate to Google Cloud**
Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.
You can do this within Google Colab or using the Application Default Credentials in the Google Cloud CLI.


In [2]:
"""Colab Auth""" 
# from google.colab import auth
# auth.authenticate_user()

import google.auth
import os

credentials, project_id = google.auth.default()

os.environ['GOOGLE_CLOUD_QUOTA_PROJECT']=PROJECT_ID
os.environ['GOOGLE_CLOUD_PROJECT']=PROJECT_ID


### ⚙️ **Enable Required API Services in the GCP Project**

In [None]:
#Enable all the required APIs for the Open Data QnA solution

!gcloud services enable \
  cloudapis.googleapis.com \
  iam.googleapis.com \
  run.googleapis.com \
  sqladmin.googleapis.com \
  aiplatform.googleapis.com \
  bigquery.googleapis.com \
  storage.googleapis.com \


# **1. Vector Store Setup** (Run once)
---

This section walks through the Vector Store Setup needed for running the Open Data QnA application. 

It covers the following steps: 
> 1. Configuration: Environment and Databases setup including logging on Bigquery for analytics

> 2. Creation of Table, Column and Known Good Query Embeddings in the Vector Store  for Retreival Augmented Generation(RAG)




## 📈 **1.1 Set Up your Data Source and Vector Store**

This section assumes that a datasource is already set up in your GCP project. If a datasource has not been set up, use the notebooks below to copy a public data set from BigQuery to Cloud SQL or BigQeury on your GCP project


Enabled Data Sources:
* PostgreSQL on Google Cloud SQL (Copy Sample Data: [0_CopyDataToCloudSqlPG.ipynb](0_CopyDataToCloudSqlPG.ipynb))
* BigQuery (Copy Sample Data: [0_CopyDataToBigQuery.ipynb](0_CopyDataToBigQuery.ipynb))

Enabled Vector Stores:
* pgvector on PostgreSQL 
* BigQuery vector


### 🤔 **Choose Data Source and Vector Store**

Fill out the parameters and configuration settings below. 
These are the parameters for connecting to the source databases and setting configurations for the vector store tables to be created. 
Additionally, you can specify whether you have and want to use known-good-queries for the pipeline run and whether you want to enable logging.

**Known good queries:** if you have known working user question <-> SQL query pairs, you can put them into the file `scripts/known_good_sql.csv`. This will be used as a caching layer and for in-context learning: If an exact match of the user question is found in the vector store, the pipeline will skip SQL Generation and output the cached SQL query. If the similarity score is between 90-100%, the known good queries will be used as few-shot examples by the SQL Generator Agent. 
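
The routing described above can be sketched as a small decision function. This is an illustration only: the function name, the return labels and the 0-to-1 similarity scale are assumptions, not the pipeline's actual implementation.

```python
def choose_kgq_strategy(exact_match: bool, similarity: float) -> str:
    """Pick how a known good query is used, following the description above.

    exact_match: whether the user question was found verbatim in the vector store.
    similarity:  best similarity score against stored questions, assumed in [0, 1].
    """
    if exact_match:
        # Cache hit: skip SQL generation and return the stored SQL.
        return "use_cached_sql"
    if similarity >= 0.90:
        # Close match: hand the stored pairs to the SQL generator as few-shot examples.
        return "few_shot_examples"
    # No usable match: generate SQL from table and column context only.
    return "generate_without_examples"


print(choose_kgq_strategy(exact_match=False, similarity=0.93))
```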

**Logging:** you can enable logging. If enabled, a dataset is created in BigQuery in your project. It holds the logging table, which stores information from each pipeline run. This is especially helpful for debugging.

In [3]:
# Data source details
DATA_SOURCE = 'bigquery' # Options: 'bigquery' and 'cloudsql-pg' i.e, PostgreSQL database on Google Cloud SQL

# Please specify what you would like to use as vector store for embeddings
VECTOR_STORE = 'bigquery-vector' # Options: 'bigquery-vector' i.e, Bigquery vector and 'cloudsql-pgvector' i.e, pgvector on PostgreSQL


# If you have chosen 'cloudsql-pg' as DATA_SOURCE; provide information below
PG_REGION = "" #@param {type:"string"}
PG_INSTANCE = ""
PG_DATABASE = ""
PG_USER = ""
PG_PASSWORD = ""
PG_SCHEMA = '' # Name of the dataset that contains all the tables


# If you have chosen 'bigquery' as DATA_SOURCE; provide information below
BQ_REGION = 'us-central1'
BQ_DATASET_NAME = 'imdb'
# You can specify the names of the BQ tables you want to query specifically. If left empty, Open Data QnA will parse all the tables in the dataset.
BQ_TABLE_LIST = None # either None or a list of table names in format ['reviews', 'ratings']


# Specify if you have example question & known-good-query pairs you want to leverage 
EXAMPLES = False 

# Please specify if you want to enable Logging. Logging will create a BQ table and store logs of the Open Data QnA Pipeline run. 
LOGGING = True 

# If Logging is enabled, specify the name for the log table. You can leave it at the default value. 
BQ_LOG_TABLE_NAME = 'audit_log_table' 

# If Logging is enabled OR you are using bigquery-vector as the vector store, a BigQuery dataset will be created to hold the tables. 
# You can rename the dataset below if you wish to do so. 
BQ_OPENDATAQNA_DATASET_NAME = 'opendataqna'

Quick input verifications below:

In [4]:

# Input verification - Source
assert DATA_SOURCE in {'bigquery', 'cloudsql-pg'}, "⚠️ Invalid DATA_SOURCE. Must be 'bigquery' or 'cloudsql-pg'"

# Input verification - Vector Store
assert VECTOR_STORE in {'bigquery-vector', 'cloudsql-pgvector'}, "⚠️ Invalid VECTOR_STORE. Must be 'bigquery-vector' or 'cloudsql-pgvector'"

if LOGGING: 
    assert BQ_LOG_TABLE_NAME, "⚠️ Please provide a name for your log table if you want to use logging"

if DATA_SOURCE == 'bigquery':
    assert BQ_REGION, "⚠️ Please provide the Data Set Region"
    assert BQ_DATASET_NAME, "⚠️ Please provide the name of the dataset on Bigquery"

    DATASET_REGION = BQ_REGION

elif DATA_SOURCE == 'cloudsql-pg':
    assert PG_REGION, "⚠️ Please provide Region of the Cloud SQL Instance"
    assert PG_INSTANCE, "⚠️ Please provide the name of the Cloud SQL Instance"
    assert PG_DATABASE, "⚠️ Please provide the name of the PostgreSQL Database on the Cloud SQL Instance"
    assert PG_USER, "⚠️ Please provide a username for the Cloud SQL Instance"
    assert PG_PASSWORD, "⚠️ Please provide the Password for the PG_USER"

    DATASET_REGION = PG_REGION


### 💾 **Save Configuration to File** 
Save the configuration set in this notebook to `config.ini`. The parameters in this file are used by the notebooks and by various modules in the repo.

In [6]:
import os
import sys
import configparser

module_path = os.path.abspath(os.path.join('..'))
sys.path.append(module_path)

config = configparser.ConfigParser()
config.read(module_path+'/config.ini')

config['GCP']['PROJECT_ID'] = PROJECT_ID
config['CONFIG']['DATA_SOURCE'] = DATA_SOURCE
config['CONFIG']['VECTOR_STORE'] = VECTOR_STORE

# Save the parameters based on Data Source and Vector Store Choices
if DATA_SOURCE == 'cloudsql-pg' or VECTOR_STORE == 'cloudsql-pgvector':
    config['PGCLOUDSQL']['PG_INSTANCE'] = PG_INSTANCE
    config['PGCLOUDSQL']['PG_DATABASE'] = PG_DATABASE
    config['PGCLOUDSQL']['PG_USER'] = PG_USER
    config['PGCLOUDSQL']['PG_PASSWORD'] = PG_PASSWORD
    config['PGCLOUDSQL']['PG_REGION'] = PG_REGION
    config['PGCLOUDSQL']['PG_SCHEMA'] = PG_SCHEMA

if DATA_SOURCE == 'bigquery':
    config['BIGQUERY']['BQ_DATASET_REGION'] = BQ_REGION
    config['BIGQUERY']['BQ_DATASET_NAME'] = BQ_DATASET_NAME
    config['BIGQUERY']['BQ_OPENDATAQNA_DATASET_NAME'] = BQ_OPENDATAQNA_DATASET_NAME
    config['BIGQUERY']['BQ_TABLE_LIST'] = str(BQ_TABLE_LIST)
    config['BIGQUERY']['BQ_LOG_TABLE_NAME'] = BQ_LOG_TABLE_NAME

if LOGGING: 
    config['BIGQUERY']['LOGGING'] = 'yes'
    config['BIGQUERY']['BQ_LOG_TABLE_NAME'] = BQ_LOG_TABLE_NAME

else: 
    config['BIGQUERY']['LOGGING'] = 'no'


with open(module_path+'/config.ini', 'w') as configfile:    # save
    config.write(configfile)

print('All configuration parameters saved to file!')

All configuration parameters saved to file!
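
Because `configparser` stores every value as a string, code that reads `config.ini` back has to convert non-string settings itself. The sketch below writes a small stand-in file to a temporary directory and reads it back. The section and key names mirror the cell above, but using `ast.literal_eval` for `BQ_TABLE_LIST` is an assumption about how a consumer might parse it, not a description of the repo's own code.

```python
import ast
import configparser
import os
import tempfile

# Write a small stand-in for config.ini so the example runs anywhere.
sample = configparser.ConfigParser()
sample["CONFIG"] = {"DATA_SOURCE": "bigquery", "VECTOR_STORE": "bigquery-vector"}
sample["BIGQUERY"] = {"BQ_TABLE_LIST": str(["reviews", "ratings"]), "LOGGING": "yes"}

path = os.path.join(tempfile.mkdtemp(), "config.ini")
with open(path, "w") as configfile:
    sample.write(configfile)

# Read the file back the way another module might.
loaded = configparser.ConfigParser()
loaded.read(path)

data_source = loaded["CONFIG"]["DATA_SOURCE"]
# The list was saved with str(), so it has to be parsed back into a Python object.
table_list = ast.literal_eval(loaded["BIGQUERY"]["BQ_TABLE_LIST"])
# getboolean() understands yes/no, true/false, on/off and 1/0.
logging_enabled = loaded["BIGQUERY"].getboolean("LOGGING")

print(data_source, table_list, logging_enabled)
```

The same `ast.literal_eval` call also turns the string `'None'` back into `None`, which covers the default `BQ_TABLE_LIST` value.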


### ⚙️ **Database Setup for Vector Store: CloudSQL-pgvector**

Create a PostgreSQL instance on Cloud SQL if 'cloudsql-pgvector' is chosen as the vector store.

Note that a PostgreSQL instance on Cloud SQL already exists if 'cloudsql-pg' is the data source. A new PostgreSQL instance is created only if a different data source is chosen.

The cell will also create a dataset on BigQuery to store the log table, **if** logging is enabled.

In [7]:
#@markdown Feel free to update PostgreSQL or BigQuery parameters.
# If not updated, we will proceed with default values!

# Create a PostgreSQL instance if the data source is not a PostgreSQL instance
if VECTOR_STORE == 'cloudsql-pgvector' and DATA_SOURCE != 'cloudsql-pg':
  # Parameters for PostgreSQL Instance
  PG_REGION = DATASET_REGION
  PG_INSTANCE = "pg15-opendataqna"
  PG_DATABASE = "opendataqna-db"
  PG_USER = "pguser"
  PG_PASSWORD = "pg123"
  PG_SCHEMA = 'pg-vector-store' 


  # check if Cloud SQL instance exists in the provided region
  database_version = !gcloud sql instances describe {PG_INSTANCE} --format="value(databaseVersion)"
  if database_version[0].startswith("POSTGRES"):
    print("Found existing Postgres Cloud SQL Instance!")
  else:
    print("Creating new Cloud SQL instance...")
    !gcloud sql instances create {PG_INSTANCE} --database-version=POSTGRES_15 \
      --region={PG_REGION} --cpu=1 --memory=4GB --root-password={PG_PASSWORD} \
      --database-flags=cloudsql.iam_authentication=On

  # Create a database on the instance and a user with password
  database_exists = !gcloud sql databases list --instance={PG_INSTANCE} | grep -z 'NAME: {PG_DATABASE}'
  if database_exists:
      print("Found existing Postgres Cloud SQL database!")
  else:
      print("Creating new Cloud SQL database...")
      !gcloud sql databases create  {PG_DATABASE} --instance={PG_INSTANCE}
  !gcloud sql users create {PG_USER} \
  --instance={PG_INSTANCE} \
  --password={PG_PASSWORD}




# Create a new data set on Bigquery to use for the logs table
if LOGGING:
  BQ_OPENDATAQNA_DATASET_NAME = "opendataqna" #@param {type:"string"} - name of the dataset in Vector Store

  from google.cloud import bigquery
  import google.api_core 
  client=bigquery.Client(project=PROJECT_ID)
  dataset_ref = f"{PROJECT_ID}.{BQ_OPENDATAQNA_DATASET_NAME}"


  # Create the dataset if it does not exist already
  try:
      client.get_dataset(dataset_ref)
      print("Destination Dataset exists")
  except google.api_core.exceptions.NotFound:
      print("Cannot find the dataset hence creating.......")
      dataset=bigquery.Dataset(dataset_ref)
      dataset.location=DATASET_REGION
      client.create_dataset(dataset)
      print(str(dataset_ref)+" is created")

Destination Dataset exists


### ⚙️  **Database Setup for Vector Store: BigQuery-vector**

Create a dataset on BigQuery to store the embeddings tables.
If BigQuery is the vector store, the same dataset is used for logging. 

In [8]:
# Create a new data set on Bigquery to use as Vector store; the same will be used for logging as well
if VECTOR_STORE == 'bigquery-vector':
  BQ_OPENDATAQNA_DATASET_NAME = "opendataqna" #@param {type:"string"} - name of the dataset in Vector Store

  from google.cloud import bigquery
  import google.api_core 
  client=bigquery.Client(project=PROJECT_ID)
  dataset_ref = f"{PROJECT_ID}.{BQ_OPENDATAQNA_DATASET_NAME}"


  # Create the dataset if it does not exist already
  try:
      client.get_dataset(dataset_ref)
      print("Destination Dataset exists")
  except google.api_core.exceptions.NotFound:
      print("Cannot find the dataset hence creating.......")
      dataset=bigquery.Dataset(dataset_ref)
      dataset.location=DATASET_REGION
      client.create_dataset(dataset)
      print(str(dataset_ref)+" is created")

Destination Dataset exists


##  **1.2. Create Embeddings in Vector Store for RAG** 

### 🖋️ **Create Table and Column Embeddings**

In this step, table and column metadata is retrieved from the data source and embeddings are generated for both.

In [10]:
# Create Table and Column Embeddings
from embeddings.retrieve_embeddings import retrieve_embeddings

if DATA_SOURCE =='bigquery':
    table_schema_embeddings, col_schema_embeddings = retrieve_embeddings(DATA_SOURCE, SCHEMA=BQ_DATASET_NAME, table_names=BQ_TABLE_LIST)
else: 
    table_schema_embeddings, col_schema_embeddings = retrieve_embeddings(DATA_SOURCE, SCHEMA=PG_SCHEMA)

print("Table and Column embeddings are created")


Source Selected is bigquery

LLM generated 8 Table Descriptions

LLM generated 0 Column Descriptions
Table and Column embeddings are created


### 💾 **Save the Table and Column Embeddings in the Vector Store**
The table and column embeddings created in the above step are saved to the chosen Vector Store.

In [11]:
from embeddings.store_embeddings import store_schema_embeddings

if VECTOR_STORE=='bigquery-vector':
    await(store_schema_embeddings(table_details_embeddings=table_schema_embeddings, 
                                  tablecolumn_details_embeddings=col_schema_embeddings, 
                                  project_id=PROJECT_ID,
                                  instance_name=None,
                                  database_name=None,
                                  schema=BQ_OPENDATAQNA_DATASET_NAME,
                                  database_user=None,
                                  database_password=None,
                                  region=BQ_REGION,
                                  VECTOR_STORE = VECTOR_STORE
                                  ))

elif VECTOR_STORE=='cloudsql-pgvector':
    await(store_schema_embeddings(table_details_embeddings=table_schema_embeddings, 
                                tablecolumn_details_embeddings=col_schema_embeddings, 
                                project_id=PROJECT_ID,
                                instance_name=PG_INSTANCE,
                                database_name=PG_DATABASE,
                                schema=None,
                                database_user=PG_USER,
                                database_password=PG_PASSWORD,
                                region=PG_REGION,
                                VECTOR_STORE = VECTOR_STORE
                                ))

print("Table and Column embeddings are saved to vector store")

Table and Column embeddings are saved to vector store


### 🗄️ **Load Known Good SQL into Vector Store**
Known Good Queries are used to create a query cache for few-shot examples. Creating a query cache is highly recommended for best outcomes! 

The following cell will load the Natural Language Question and Known Good SQL pairs into our Vector Store. These pairs are loaded from the `known_good_sql.csv` file inside the scripts folder. If you have your own Question-SQL examples, curate them in a .csv file before running the cell below. 

If no Known Good Queries are available at this time to create the query cache, you can use [3_LoadKnownGoodSQL.ipynb](3_LoadKnownGoodSQL.ipynb) to load them later. An empty table for KGQ embeddings will be created.



#### Format of the Known Good SQL File (known_good_sql.csv)

prompt | sql | database_name [3 columns]

prompt ==> User Question 

sql ==> SQL for the user question (note that the SQL should be enclosed in quotes and written on a single line; remove any line breaks)

database_name ==> This name should exactly match the schema name for a Postgres source or BQ_DATASET_NAME for a BigQuery source
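
A row in this format can be produced with the standard `csv` module, as sketched below. The example assumes a comma-delimited file with a header row named after the three columns. The SQL text is a made-up illustration, and `to_single_line` is a hypothetical helper.

```python
import csv
import io


def to_single_line(sql: str) -> str:
    """Collapse line breaks and repeated whitespace so the SQL fits on one line."""
    return " ".join(sql.split())


rows = [
    {
        "prompt": "How many movies have a rating higher than four?",
        # Illustrative SQL only; replace with a query known to work on your data.
        "sql": "SELECT COUNT(*)\nFROM imdb.title_ratings\nWHERE average_rating > 4",
        "database_name": "imdb",
    }
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["prompt", "sql", "database_name"],
    quoting=csv.QUOTE_ALL,  # enclose every field in quotes
)
writer.writeheader()
for row in rows:
    writer.writerow({**row, "sql": to_single_line(row["sql"])})

csv_text = buffer.getvalue()
print(csv_text)
```

To create a real file, replace the `io.StringIO()` buffer with `open(<path>, "w", newline="")`.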

In [12]:

if EXAMPLES :
    print("Examples are provided, creating KGQ embeddings and saving them to Vector store.....")

    from embeddings.kgq_embeddings import setup_kgq_table, load_kgq_df, store_kgq_embeddings
    import pandas as pd
    
    # Delete any old tables and create a new table for KGQ embeddings
    if VECTOR_STORE=='bigquery-vector':
        await(setup_kgq_table(project_id=PROJECT_ID,
                            instance_name=None,
                            database_name=None,
                            schema=BQ_OPENDATAQNA_DATASET_NAME,
                            database_user=None,
                            database_password=None,
                            region=BQ_REGION,
                            VECTOR_STORE = VECTOR_STORE
                            ))

    elif VECTOR_STORE=='cloudsql-pgvector':
        await(setup_kgq_table(project_id=PROJECT_ID,
                            instance_name=PG_INSTANCE,
                            database_name=PG_DATABASE,
                            schema=None,
                            database_user=PG_USER,
                            database_password=PG_PASSWORD,
                            region=PG_REGION,
                            VECTOR_STORE = VECTOR_STORE
                            ))


    # Load the contents of the known_good_sql.csv file into a dataframe
    df_kgq = load_kgq_df()



    # Add KGQ to the vector store
    if VECTOR_STORE=='bigquery-vector':
        await(store_kgq_embeddings(df_kgq,
                                   project_id=PROJECT_ID,
                                    instance_name=None,
                                    database_name=None,
                                    schema=BQ_OPENDATAQNA_DATASET_NAME,
                                    database_user=None,
                                    database_password=None,
                                    region=BQ_REGION,
                                    VECTOR_STORE = VECTOR_STORE
                                    ))

    elif VECTOR_STORE=='cloudsql-pgvector':
        await(store_kgq_embeddings(df_kgq,
                                   project_id=PROJECT_ID,
                                    instance_name=PG_INSTANCE,
                                    database_name=PG_DATABASE,
                                    schema=None,
                                    database_user=PG_USER,
                                    database_password=PG_PASSWORD,
                                    region=PG_REGION,
                                    VECTOR_STORE = VECTOR_STORE
                                    ))
    print('Done!!')

else:
    print("⚠️ WARNING: No Known Good Queries are provided to create query cache for Few shot examples!")
    print("Creating a query cache is highly recommended for best outcomes")
    print("If no Known Good Queries for the dataset are available at this time, you can use 3_LoadKnownGoodSQL.ipynb to load them later!!")


Creating a query cache is highly recommended for best outcomes
If no Known Good Queries for the dataset are available at this time, you can use 3_LoadKnownGoodSQL.ipynb to load them later!!


### 🥁 If all the above steps are executed successfully, the following should be set up:

* GCP project and all the required IAM permissions

* Environment to run the solution

* Data source and Vector store for the solution

__________________________________________________________________________________________________________________

# **2. Run the Open Data QnA Pipeline**

###  ❓ **Ask your Natural Language Question**

In [13]:
print("\033[1mData Source:- "+ DATA_SOURCE)
print("Vector Store:- "+ VECTOR_STORE)
    
# Suggested question for 'fda_food' dataset: "What are the top 5 cities with highest recalls?"
# Suggested question for 'google_dei' dataset: "How many asian men were part of the leadership workforce in 2021?"

# user_question = input(prompt_for_question) #Uncomment if you want to ask question yourself
user_question = 'How many movies have a rating higher than four?' # Or Enter Question here

print("Asked Question:- "+user_question)

Data Source:- bigquery
Vector Store:- bigquery-vector
Asked Question:- How many movies have a rating higher than four?


### 🏃 **Run the Pipeline**

In [16]:
from opendataqna import run_pipeline
import asyncio 

final_sql, response, _resp = await(run_pipeline(user_question,
                                                    VECTOR_STORE=VECTOR_STORE,
                                                    DATA_SOURCE=DATA_SOURCE,  
                                                    embedder='vertex',
                                                    RUN_DEBUGGER=True,
                                                    EXECUTE_FINAL_SQL=True,
                                                    DEBUGGING_ROUNDS = 2, 
                                                    LLM_VALIDATION=True,
                                                    Embedder_model='vertex',
                                                    SQLBuilder_model= 'gemini-1.5-pro',
                                                    SQLChecker_model= 'gemini-1.5-pro-001',
                                                    SQLDebugger_model= 'gemini-1.5-pro-001',
                                                    Responder_model= 'gemini-1.5-pro-001',
                                                    num_table_matches = 5,
                                                    num_column_matches = 10,
                                                    table_similarity_threshold = 0.1,
                                                    column_similarity_threshold = 0.1, 
                                                    example_similarity_threshold = 0.1, 
                                                    num_sql_matches=3))


Loading Agents.


Setting Data Source and Vector Store from config.ini content.
Source selected is : bigquery
Schema or Dataset Name is : imdb
Vector Store selected is : bigquery-vector
Found 5 similarity matches for table.
Found 10 similarity matches for column.


 AUDIT_TEXT: 
 
User Question : How many movies have a rating higher than four?
User Database : imdb

Get Table and Column Schema: 
Retrieved Similar Known Good Queries, Table Schema and Column Schema: 

Retrieved Tables: 

            Full Table Name : msubasioglu-main.imdb.title_ratings |
            Table Columns List: [tconst, average_rating, num_votes] |
            Table Description: - **tconst**: Alphanumeric identifier for the title.
- **average_rating**: Weighted average of all the user ratings.
- **num_votes**: Number of votes the title has received. 

            Full Table Name : msubasioglu-main.imdb.reviews |
            Table Columns List: [review, split, label, movie_id, reviewer_rating, movie_url, title] |
            Table D
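
The `num_table_matches` and `table_similarity_threshold` arguments passed to `run_pipeline` suggest a top-k retrieval with a score cutoff. The sketch below shows that idea in plain Python. It assumes cosine similarity as the score and uses toy two-dimensional vectors; the actual pipeline delegates the search to the vector store and may score matches differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query_vec, candidates, num_matches, similarity_threshold):
    """Return up to num_matches (name, score) pairs scoring at or above the threshold."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in candidates.items()]
    kept = [item for item in scored if item[1] >= similarity_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)  # best match first
    return kept[:num_matches]


# Toy embeddings, purely illustrative.
candidates = {
    "title_ratings": [1.0, 0.0],
    "reviews": [0.6, 0.8],
    "unrelated": [-1.0, 0.0],
}
matches = top_matches([1.0, 0.0], candidates, num_matches=2, similarity_threshold=0.1)
print(matches)
```

With a threshold as low as `0.1`, almost every candidate passes the cutoff, so the `num_*_matches` limits are what mainly bound the retrieved context.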

### 📊 **Create Charts for the Results** (Run only when the cells above returned proper results)
The agent suggests two Google Charts to display in a UI, targeting the element IDs chart_div and chart_div_1.

In [36]:
from agents import VisualizeAgent
Visualize = VisualizeAgent ()

chart_js=''
chart_js = Visualize.generate_charts(user_question, final_sql, response)
# print(chart_js["chart_div_1"])

Charts Suggested : ['Bar Chart', 'Table Chart']


In [40]:
from IPython.display import HTML

html_code = f'''
<script type="text/javascript" src="https://www.gstatic.com/charts/loader.js"></script>
<script type="text/javascript">
{chart_js["chart_div"]}
</script>
<div id="chart_div"></div>
'''

HTML(html_code)


In [41]:
html_code = f'''
<script type="text/javascript" src="https://www.gstatic.com/charts/loader.js"></script>
<script type="text/javascript">
{chart_js["chart_div_1"]}
</script>
<div id="chart_div_1"></div>
'''

HTML(html_code)
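
The two display cells differ only in the element ID, so the HTML wrapper can be factored into one helper. This is a minimal sketch: `build_chart_html` is a hypothetical name, and `sample_chart_js` stands in for the dictionary returned by `Visualize.generate_charts`.

```python
def build_chart_html(chart_script: str, element_id: str) -> str:
    """Wrap generated Google Charts JavaScript in the HTML needed to render it."""
    return f'''
<script type="text/javascript" src="https://www.gstatic.com/charts/loader.js"></script>
<script type="text/javascript">
{chart_script}
</script>
<div id="{element_id}"></div>
'''


# Stand-in for the chart_js dictionary produced by the agent.
sample_chart_js = {
    "chart_div": "// JavaScript for the first chart",
    "chart_div_1": "// JavaScript for the second chart",
}

pages = {
    element_id: build_chart_html(script, element_id)
    for element_id, script in sample_chart_js.items()
}
print(pages["chart_div_1"])
```

In the notebook, `HTML(build_chart_html(chart_js["chart_div"], "chart_div"))` would display the first chart.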

## 🗑 **Clean Up Notebook Resources**
Make sure to delete your Cloud SQL instance and BigQuery datasets when you are finished with this notebook to avoid further costs. 💸 💰

Uncomment and run the cell below to delete them.

In [None]:
# # delete Cloud SQL instance
# !gcloud sql instances delete {PG_INSTANCE} -q

# #delete BigQuery Dataset using bq utility
# !bq rm -r -f -d {BQ_DATASET_NAME}

# #delete BigQuery 'Open Data QnA' Vector Store Dataset using bq utility

# !bq rm -r -f -d {BQ_OPENDATAQNA_DATASET_NAME}

