# **Section 1: Preprocessing**
+ This section covers the preprocessing steps required for the rest of this notebook: installing the necessary packages, importing libraries, and initializing the client for Google BigQuery.
+ Our main tools for this project are `pandas` and `bigquery` from `google.cloud`.
+ Google Cloud's `bigframes` library is uninstalled because it causes a version conflict in Kaggle's default environment.

In [1]:
# Uninstall bigframes, then install google-cloud-bigquery-storage so that BigQuery SQL runs without errors
# -q suppresses verbose output for the sake of readability
!pip uninstall -q -y bigframes
!pip install -q google-cloud-bigquery-storage


In [2]:
# Import all libraries required for this project
import pandas as pd

from google.cloud import bigquery
from datetime import datetime, timedelta

## **Define project and dataset ids**
+ Creating a BigQuery client requires a project id; ours is `analog-delight-470708-d0`.
+ We also define the ids of the dataset and tables that were imported from Google Cloud Storage (GCS) buckets into BigQuery. Please refer to our blog for details of the selected datasets.

In [8]:
# Initialize BigQuery client with Google Cloud's project id
project_id = "analog-delight-470708-d0"
client = bigquery.Client(project=project_id)

# We also define dataset and table ids
dataset_id = "steam"
game_list_data = "steam_game_list"
review_data = "steam_reviews"
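The queries in this notebook repeatedly assemble a fully qualified table path from these three ids. A minimal sketch of a helper that does this in one place is shown below; `table_path` is a hypothetical name and is not part of the original notebook. It wraps the path in backticks, which is the safe way to reference a project id that contains hyphens.

```python
def table_path(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted, fully qualified BigQuery table path."""
    return f"`{project}.{dataset}.{table}`"


# Example with the ids defined above
game_list_path = table_path("analog-delight-470708-d0", "steam", "steam_game_list")
print(game_list_path)
```

With such a helper, an f-string query could interpolate `{game_list_path}` instead of repeating the three ids.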

## **Create primary keys for datasets in BigQuery**
+ We convert *App ID* in `steam.steam_game_list` from string to integer and store it in a new column called *app_id*.
+ This step facilitates joining the table with Steam's review data, `steam.steam_reviews`, in BigQuery.
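To illustrate why the cast matters, here is a small pandas sketch on toy data. It assumes the review table identifies games by an integer `app_id`; the `voted_up` column is hypothetical and only there to give the toy reviews some content. A string key and an integer key would not match in a join, so the key is converted first.

```python
import pandas as pd

# Toy stand-ins for the two BigQuery tables (review columns are hypothetical)
games = pd.DataFrame({
    "App ID": ["1565290", "1067450"],
    "Name": ["Survival Horror #8,436", "Never Let Me Awake"],
})
reviews = pd.DataFrame({
    "app_id": [1565290, 1565290, 1067450],
    "voted_up": [True, False, True],
})

# Same idea as the SQL cast: string id -> integer id in a new column
games["app_id"] = games["App ID"].astype("int64")

# The join now matches on a key of the same type on both sides
joined = reviews.merge(games[["app_id", "Name"]], on="app_id", how="left")
print(joined)
```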

In [23]:
# Check whether a column exists in the table schema
def check_column_exists(dataset_id, table_id, name):
    # get_table accepts a fully qualified "project.dataset.table" string
    table = client.get_table(f"{project_id}.{dataset_id}.{table_id}")
    return any(field.name == name for field in table.schema)

In [25]:
# Generate a primary key for game_list_data
game_list_data_pk = 'app_id'
exist_app_id = check_column_exists(dataset_id, game_list_data, game_list_data_pk)

if not exist_app_id:
    # Multi-statement script: add the column, fill it from `App ID`,
    # then declare it as a (non-enforced) primary key
    query = f"""
    alter table `{project_id}.{dataset_id}.{game_list_data}`
    add column if not exists {game_list_data_pk} integer;

    update `{project_id}.{dataset_id}.{game_list_data}`
    set {game_list_data_pk} = cast(`App ID` as integer)
    where true;

    alter table `{project_id}.{dataset_id}.{game_list_data}`
    add primary key ({game_list_data_pk}) not enforced;
    """
    result_pk = client.query(query)
    # result() blocks until the script has finished
    print(result_pk.result())

# **Section 2: Generate Embeddings and Create Vector Indices**

In [35]:
embedding_steam = "llm_steam"

def create_embeddings(embeddings_name, column_name):
    # Add an embedding column, then fill it by matching each row's text
    # against the text that was sent to the embedding model
    query = f"""
    alter table `{project_id}.{dataset_id}.{game_list_data}`
    add column if not exists {embeddings_name} array<float64>;

    update `{project_id}.{dataset_id}.{game_list_data}` as t
    set t.{embeddings_name} = e.ml_generate_embedding_result
    from (
        select
            ml_generate_embedding_result,
            content
        from ml.generate_embedding(
            model `{project_id}.{dataset_id}.{embedding_steam}`,
            -- Deduplicate the input text so that each distinct value is embedded
            -- once and every target row matches at most one source row
            (select distinct ifnull({column_name}, ' ') as content
              from `{project_id}.{dataset_id}.{game_list_data}`
            )
        )
    ) e
    where ifnull(t.{column_name}, ' ') = e.content
    """
    return client.query(query)

exist_desc = check_column_exists(dataset_id, game_list_data, "desc_embeddings")
if not exist_desc:
    result_desc = create_embeddings("desc_embeddings", "`Short Description`")
    print(result_desc.result())
    
exist_tags = check_column_exists(dataset_id, game_list_data, "tags_embeddings")
if not exist_tags:
    result_tags = create_embeddings("tags_embeddings", "tags")
    print(result_tags.result())

# **Section 3: Use Cases of Google BigQuery AI in Product Positioning**

## **Use Case 1: Search for similar Steam games given a user query on game characteristics**

In [45]:
user_input = "What are first-person horror games without zombies that ?"
number_of_games = 10
embeddings = "desc_embeddings"

# Embed the user query, then return the top-k games whose description
# embeddings are closest to it by cosine distance
query = f"""
SELECT *
FROM VECTOR_SEARCH(
    (SELECT * FROM `{project_id}.{dataset_id}.{game_list_data}`),
    '{embeddings}',
    (SELECT ml_generate_embedding_result, content AS query
     FROM ML.GENERATE_EMBEDDING(
         MODEL `{project_id}.{dataset_id}.{embedding_steam}`,
         (SELECT '{user_input}' AS content))
    ),
    top_k => {number_of_games},
    distance_type => 'COSINE')
"""

# Sanity check (defined but not run in this cell): embedding length for the user query
query2 = f"""
SELECT array_length(ml_generate_embedding_result), content AS query
FROM ML.GENERATE_EMBEDDING(
    MODEL `{project_id}.{dataset_id}.{embedding_steam}`,
    (SELECT '{user_input}' AS content)
)
"""

# Run the vector search and load the matches into a DataFrame
df = client.query(query).to_dataframe()
print(df)

                                               query  \
0  {'ml_generate_embedding_result': [0.0242799259...   
1  {'ml_generate_embedding_result': [0.0242799259...   
2  {'ml_generate_embedding_result': [0.0242799259...   
3  {'ml_generate_embedding_result': [0.0242799259...   
4  {'ml_generate_embedding_result': [0.0242799259...   
5  {'ml_generate_embedding_result': [0.0242799259...   
6  {'ml_generate_embedding_result': [0.0242799259...   
7  {'ml_generate_embedding_result': [0.0242799259...   
8  {'ml_generate_embedding_result': [0.0242799259...   
9  {'ml_generate_embedding_result': [0.0242799259...   

                                                base  distance  
0  {'App ID': '1565290', 'Name': 'Survival Horror...  0.211766  
1  {'App ID': '1067450', 'Name': 'Never Let Me Aw...  0.265053  
2  {'App ID': '1582870', 'Name': 'Slickpoo The Cl...  0.298663  
3  {'App ID': '853020', 'Name': 'Venal Soul (Chap...  0.307013  
4  {'App ID': '1653910', 'Name': 'Lost in Terra M...  0.30
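To make the `distance` column in the output above easier to interpret, the following is a brute-force sketch of a top-k search by cosine distance on toy vectors. It assumes cosine distance is defined as one minus cosine similarity, so smaller values mean closer matches. The function names and the 3-dimensional vectors are illustrative only, and this is not a description of how BigQuery executes `VECTOR_SEARCH` internally.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, rows, k):
    """rows is a list of (name, vector); return the k nearest (name, distance) pairs."""
    scored = [(name, cosine_distance(query_vec, vec)) for name, vec in rows]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy embeddings (made up for illustration)
toy_rows = [
    ("horror_fps", [0.9, 0.1, 0.0]),
    ("farming_sim", [0.0, 0.2, 0.9]),
    ("horror_puzzle", [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]

nearest = top_k(query_vec, toy_rows, k=2)
for name, dist in nearest:
    print(name, round(dist, 4))
```

Rows are ordered from smallest to largest distance, which mirrors the ascending `distance` values in the result above.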

In [55]:
# Each `base` value is one dict per matched game; expand them into columns
df_result = df['base'].tolist()
pd.DataFrame(df_result)

Unnamed: 0,App ID,Name,Short Description,Developer,Publisher,Genre,Tags,Type,Categories,Owners,...,CCU,Languages,Platforms,Release Date,Required Age,Website,Header Image,app_id,desc_embeddings,tags_embeddings
0,1565290,"Survival Horror #8,436",A first-person survival horror game,Lickbeans Interactive,Lickbeans Interactive,"Action, Free to Play","Survival Horror: 170, Horror: 161, Singleplaye...",game,"Single-player, Steam Achievements","0 .. 20,000",...,1,English,windows,2021/05/12,0,,https://cdn.akamai.steamstatic.com/steam/apps/...,1565290,"[0.02635873295366764, -0.030865009874105453, -...","[-0.016629289835691452, -0.03437075391411781, ..."
1,1067450,Never Let Me Awake,A first-person psychological survival horror game,SUZUKI PRODUCTION,SUZUKI PRODUCTION,"Action, Adventure, Indie","Adventure: 31, Action: 21, Gore: 21, Indie: 21...",game,Single-player,"0 .. 20,000",...,0,English,windows,2019/05/8,0,,https://cdn.akamai.steamstatic.com/steam/apps/...,1067450,"[0.039107076823711395, -0.03487164527177811, -...","[0.0034175594337284565, -0.01758565381169319, ..."
2,1582870,Slickpoo The Clown,Single player stealth game with a horror backg...,Ptyron,Ptyron,Simulation,"Simulation: 57, Survival Horror: 53, Psycholog...",game,Single-player,"0 .. 20,000",...,0,English,windows,2021/05/28,0,,https://cdn.akamai.steamstatic.com/steam/apps/...,1582870,"[-0.007724049035459757, -0.03689466789364815, ...","[-0.011998282745480537, 0.012152046896517277, ..."
3,853020,Venal Soul (Chapter One),An old school survival horror in First Person ...,Vanadial,Vanadial,"Action, Adventure, Indie","Action: 22, Adventure: 22, Gore: 21, Indie: 21...",game,Single-player,"0 .. 20,000",...,0,English,windows,2018/05/13,0,,https://cdn.akamai.steamstatic.com/steam/apps/...,853020,"[0.06727341562509537, -0.038571182638406754, -...","[5.516139935934916e-06, -0.021368566900491714,..."
4,1653910,Lost in Terra Mora,"A scary, adventure horror game with quest elem...",FallTrand L.W.,FallTrand L.W.,"Action, Adventure, Indie, RPG, Simulation","Horror: 555, Survival Horror: 549, Zombies: 54...",game,"Single-player, Steam Achievements","20,000 .. 50,000",...,0,"English, French, Italian, German, Spanish - Sp...",windows,2021/06/17,0,,https://cdn.akamai.steamstatic.com/steam/apps/...,1653910,"[0.02581055834889412, -0.015293313190340996, -...","[0.02549314871430397, -0.00454957690089941, -0..."
5,1887490,Dymension:Scary Horror Survival Shooter,It’s a horror first-person game where you have...,Midnight Games,Midnight Games,"Action, Adventure, Indie","Adventure: 196, Action: 190, Puzzle: 166, Acti...",game,Single-player,"0 .. 20,000",...,0,English,windows,2022/03/3,0,,https://cdn.akamai.steamstatic.com/steam/apps/...,1887490,"[0.02701568976044655, -0.008860357105731964, -...","[-0.011969579383730888, 0.006284061819314957, ..."
6,1533370,Undiscovered House,"First-person, story-based horror game. The gam...",Sysreb Games,Sysreb Games,"Action, Adventure, Indie","Horror: 110, Indie: 103, First-Person: 101, Da...",game,Single-player,"0 .. 20,000",...,0,"English, French, Italian, German, Spanish - Sp...",windows,2021/03/24,0,https://www.undiscoveredhousegame.com/,https://cdn.akamai.steamstatic.com/steam/apps/...,1533370,"[0.005753298755735159, -0.004222327843308449, ...","[0.004101963248103857, -0.026635196059942245, ..."
7,1349740,First Floor,A first person audio horror game.,Keffny Charles,Dirty Kaneez Gamez,"Casual, Indie","Casual: 60, First-Person: 50, Singleplayer: 43...",game,"Single-player, Partial Controller Support, Ste...","0 .. 20,000",...,0,English,windows,2020/07/18,0,,https://cdn.akamai.steamstatic.com/steam/apps/...,1349740,"[0.03199845924973488, 0.0026017765048891306, -...","[-0.010893996804952621, -0.02505333535373211, ..."
8,1273780,Peekaboo Collection - 3 Tales of Horror,Three horror games. Three different stories ex...,Vidas Salavejus,Vidas Salavejus,Indie,"Indie: 57, Violent: 33, Gore: 31, Horror: 15, ...",game,Single-player,"0 .. 20,000",...,0,English,windows,2020/04/14,0,,https://cdn.akamai.steamstatic.com/steam/apps/...,1273780,"[0.020020214840769768, -0.019941003993153572, ...","[0.021052107214927673, -0.03859810158610344, -..."
9,1265070,Escape From Violet Institute,2D sidescrolling 1st Person Survival Horror game,CJB Games,CJB Games,"Action, Indie","Horror: 195, Side Scroller: 190, 2D Platformer...",game,Single-player,"0 .. 20,000",...,0,English,windows,2020/04/16,0,,https://cdn.akamai.steamstatic.com/steam/apps/...,1265070,"[0.0004693367809522897, -0.06220121681690216, ...","[0.005873347632586956, -0.008186726830899715, ..."
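The table above no longer carries the `distance` value of each match. If that value is needed downstream, one possible approach is sketched below on a toy frame shaped like the search result; the two rows reuse names and distances from the output above, and the variable names are illustrative.

```python
import pandas as pd

# Toy frame shaped like the VECTOR_SEARCH result: `base` holds one dict per row
toy = pd.DataFrame({
    "base": [
        {"App ID": "1565290", "Name": "Survival Horror #8,436"},
        {"App ID": "1067450", "Name": "Never Let Me Awake"},
    ],
    "distance": [0.211766, 0.265053],
})

# Expand the dicts into columns, keeping the original row index for alignment
flat = pd.DataFrame(toy["base"].tolist(), index=toy.index)

# Re-attach the distance and order the matches from closest to farthest
flat["distance"] = toy["distance"]
flat = flat.sort_values("distance").reset_index(drop=True)
print(flat)
```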
