# **Section 1: Preprocessing**
+ This section covers the preprocessing steps required for the rest of the notebook: installing the necessary packages, importing libraries, and initializing the Google BigQuery client.
+ The main tools for this project are `pandas` and the `bigquery` module from `google.cloud`.

In [1]:
# Install google-cloud-bigquery-storage so that BigQuery SQL runs without error
# -q suppresses verbose output for the sake of readability
!pip uninstall -q -y bigframes
!pip install -q google-cloud-bigquery-storage


In [2]:
# Import all libraries required for this project
import pandas as pd

from google.cloud import bigquery
from datetime import datetime, timedelta

In [3]:
# Initialize BigQuery client with Google Cloud's project id
project_id = "analog-delight-470708-d0"
client = bigquery.Client(project=project_id)

In [None]:
# Data modelling: add an integer app_id column and declare it the primary key
create_index = False
if create_index:
    # Backticks are needed because the project id contains hyphens
    table_id = f"{project_id}.steam.steam_game_list"
    query = f"""
    ALTER TABLE `{table_id}`
    ADD COLUMN app_id INTEGER;

    UPDATE `{table_id}`
    SET app_id = CAST(`App ID` AS INTEGER)
    WHERE TRUE;

    ALTER TABLE `{table_id}`
    ADD PRIMARY KEY (app_id) NOT ENFORCED;
    """
    # Wait for the multi-statement script to finish so that errors surface here
    client.query(query).result()

---

# **Section 2: Create Vector Indices for Tables**

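## *The master table of Steam games*
+ The table `steam.steam_game_list` is the master list of Steam games. The cell below fills its `embeddings` column with an embedding of each game's short description, tags, and categories.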

In [30]:
table_steam_game_list = "steam.steam_game_list"
embedding_steam = "steam.llm_steam"
first_creation = False

query = f"""
UPDATE `{project_id}.{table_steam_game_list}` as t
SET t.embeddings = e.ml_generate_embedding_result
FROM (
    -- DISTINCT cannot be applied to ARRAY columns, so group by content instead
    SELECT
        ANY_VALUE(ml_generate_embedding_result) AS ml_generate_embedding_result,
        content
    FROM ML.GENERATE_EMBEDDING(
        MODEL `{project_id}.{embedding_steam}`,
        (SELECT CONCAT(IFNULL(`Short Description`, ''), ' ', IFNULL(Tags, ''), ' ', IFNULL(Categories, '')) as content
          FROM `{project_id}.{table_steam_game_list}`
        )
    )
    GROUP BY content
) e
WHERE CONCAT(IFNULL(`Short Description`, ''), ' ', IFNULL(Tags, ''), ' ', IFNULL(Categories, '')) = e.content
"""
if first_creation:
    result = client.query(query)
    result.result()  # block until the UPDATE finishes so that failures are raised here

QueryJob<project=analog-delight-470708-d0, location=asia-east2, id=8f0785c4-db4f-4aae-a6c3-714d7a0df66b>

In [12]:
result.to_dataframe()
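
The `UPDATE` above matches each row to its embedding by rebuilding the same `content` string on both sides, so the expression must produce identical text in both places. The sketch below mirrors that `CONCAT(IFNULL(...))` expression in plain Python to show what the string looks like when a field is missing. The helper name `build_content` and the sample rows are hypothetical and are not taken from the real table.

```python
from typing import Optional


def build_content(short_description: Optional[str],
                  tags: Optional[str],
                  categories: Optional[str]) -> str:
    """Mirror CONCAT(IFNULL(a, ''), ' ', IFNULL(b, ''), ' ', IFNULL(c, ''))."""
    parts = [short_description, tags, categories]
    # IFNULL(x, '') turns a missing value into an empty string;
    # the single-space separators are always present, as in the SQL.
    return " ".join("" if part is None else part for part in parts)


# Hypothetical rows, for illustration only (not taken from the real table).
rows = [
    ("A stealth horror game.", "Horror,Stealth", "Single-player"),
    ("A stealth horror game.", None, "Single-player"),
    (None, None, None),
]
contents = [build_content(*row) for row in rows]
```

Note that a row with every field missing still yields a non-empty string of two spaces, so such rows all share the same `content` value.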

# **Section 3: Use Cases of Google BigQuery AI in Product Positioning**

## Use Case 1 - Search for similar Steam games given a user query on game characteristics

In [33]:
user_input = "What are first-person horror games without zombies?"
number_of_games = 10

query = f"""
SELECT *
FROM VECTOR_SEARCH(
   (SELECT * from `{project_id}.{table_steam_game_list}`
   -- You can pre-filter your query here, eg. for rows of specific users
   -- WHERE some-clause
   ),
   'embeddings',
   (SELECT ml_generate_embedding_result, content AS query
     FROM ML.GENERATE_EMBEDDING(
         MODEL `{project_id}.{embedding_steam}`,
         (SELECT '{user_input}' AS content))
   ),
   top_k => {number_of_games},
   distance_type => 'COSINE')
"""
query2 = f"""
SELECT array_length(ml_generate_embedding_result), content AS query
    FROM ML.GENERATE_EMBEDDING(
         MODEL `{project_id}.{embedding_steam}`,
         (SELECT '{user_input}' AS content)
    )
"""
# df = client.query(query2).to_dataframe()
# print(df)

   f0_                                              query
0  768  What are first-person horror games without zom...
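
The search query above interpolates `user_input` directly into the SQL string, so an input containing a single quote would break the statement. A possible alternative is to pass the text as a named query parameter. The sketch below assumes that a named parameter (`@user_input`) is accepted inside the `ML.GENERATE_EMBEDDING` subquery, and it has not been run against this project. The helper names `build_search_sql` and `search_games` are hypothetical.

```python
def build_search_sql(project_id: str, table: str, model: str, top_k: int) -> str:
    """Return the VECTOR_SEARCH statement with a @user_input placeholder."""
    if not isinstance(top_k, int) or isinstance(top_k, bool) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return f"""
SELECT *
FROM VECTOR_SEARCH(
    (SELECT * FROM `{project_id}.{table}`),
    'embeddings',
    (SELECT ml_generate_embedding_result, content AS query
     FROM ML.GENERATE_EMBEDDING(
         MODEL `{project_id}.{model}`,
         (SELECT @user_input AS content))
    ),
    top_k => {top_k},
    distance_type => 'COSINE')
"""


def search_games(client, sql: str, user_input: str):
    """Run the search, passing the user's text as a named STRING parameter."""
    # Imported here so build_search_sql stays usable without the client library.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("user_input", "STRING", user_input)
        ]
    )
    return client.query(sql, job_config=job_config).to_dataframe()


# Same identifiers as the cells above; only the user text becomes a parameter.
sql = build_search_sql(
    "analog-delight-470708-d0", "steam.steam_game_list", "steam.llm_steam", top_k=10
)
```

With the `client` from Section 1, a call such as `search_games(client, sql, user_input)` would then return the matches as a DataFrame.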
