# About this notebook

-   ### Enhanced Text Preprocessing

    -   Unlike the main notebook, this version uses a deep-cleaning pipeline for the About the game column, removing HTML noise, special characters, and boilerplate Steam text to ensure the TF-IDF model focuses on actual gameplay descriptions.

-  ### Feature Engineering (Metadata Soup):

    -   **Unified Metadata**: Created a metadata_features column by merging Genres and Tags into unique, space-separated tokens.

    -   **Categorical Weighting:** Integrated Developers, Publishers, and Categories into the text soup to allow the model to recognize brand and functional similarities (e.g., "co-op" or "roguelike").

-  ### Hybrid Similarity Logic:

    -   TF-IDF + Cosine Similarity: Captures the semantic "vibe" and narrative of the games.

    -   Jaccard Similarity: Specifically used for Tags to ensure that games with identical community-driven labels are prioritized.

    -   Alpha Blending: A weighted scoring system that balances description-based similarity with tag-based similarity.

### Importing the dataset

In [134]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [135]:
path_to_dataset = 'dataset/games.csv'

if not os.path.isfile(path_to_dataset):
    print("not ok")

In [136]:
df = pd.read_csv(path_to_dataset)
df.head(n=3)

Unnamed: 0,AppID,Name,Release date,Estimated owners,Peak CCU,Required age,Price,DiscountDLC count,About the game,Supported languages,...,Median playtime forever,Median playtime two weeks,Developers,Publishers,Categories,Genres,Tags,Screenshots,Movies,Others
0,20200,Galactic Bowling,"Oct 21, 2008",0 - 20000,0,0,19.99,0,0,Galactic Bowling is an exaggerated and stylize...,...,0,0,0,Perpetual FX Creative,Perpetual FX Creative,"Single-player,Multi-player,Steam Achievements,...","Casual,Indie,Sports","Indie,Casual,Sports,Bowling",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
1,655370,Train Bandit,"Oct 12, 2017",0 - 20000,0,0,0.99,0,0,THE LAW!! Looks to be a showdown atop a train....,...,0,0,0,Rusty Moyher,Wild Rooster,"Single-player,Steam Achievements,Full controll...","Action,Indie","Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
2,1732930,Jolt Project,"Nov 17, 2021",0 - 20000,0,0,4.99,0,0,Jolt Project: The army now has a new robotics ...,...,0,0,0,Campi√£o Games,Campi√£o Games,Single-player,"Action,Adventure,Indie,Strategy",,https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...


### Exploring the dataset

In [137]:
df.shape

(111452, 40)

In [138]:
df.describe()

Unnamed: 0,AppID,Peak CCU,Required age,Price,DiscountDLC count,About the game,Metacritic url,Positive,Negative,Score rank,Achievements,Recommendations,Notes,Average playtime two weeks,Median playtime forever,Median playtime two weeks,Developers
count,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,44.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0
mean,1716972.0,177.7215,0.254208,7.061568,0.464209,0.44953,2.623354,0.030408,754.3525,125.859177,98.909091,17.511144,616.3715,81.24729,9.174954,72.65133,9.891038
std,920385.9,8390.462,2.035653,12.563246,3.503658,12.006677,13.736245,1.565136,21394.1,4002.844431,0.857747,150.139008,15738.54,999.935906,168.20103,1321.333137,183.232812
min,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,97.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,936255.0,0.0,0.0,0.99,0.0,0.0,0.0,0.0,0.0,0.0,98.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1665065.0,0.0,0.0,3.99,0.0,0.0,0.0,0.0,3.0,1.0,99.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2453585.0,1.0,0.0,9.99,0.0,0.0,0.0,0.0,29.0,8.0,100.0,17.0,0.0,0.0,0.0,0.0,0.0
max,3671840.0,1311366.0,21.0,999.98,92.0,2366.0,97.0,100.0,5764420.0,895978.0,100.0,9821.0,3441592.0,145727.0,19159.0,208473.0,19159.0


In [139]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111452 entries, 0 to 111451
Data columns (total 40 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   AppID                       111452 non-null  int64  
 1   Name                        111446 non-null  object 
 2   Release date                111452 non-null  object 
 3   Estimated owners            111452 non-null  object 
 4   Peak CCU                    111452 non-null  int64  
 5   Required age                111452 non-null  int64  
 6   Price                       111452 non-null  float64
 7   DiscountDLC count           111452 non-null  int64  
 8   About the game              111452 non-null  int64  
 9   Supported languages         104969 non-null  object 
 10  Full audio languages        111452 non-null  object 
 11  Reviews                     111452 non-null  object 
 12  Header image                10624 non-null   object 
 13  Website       

In [140]:
df.head(n=3)

Unnamed: 0,AppID,Name,Release date,Estimated owners,Peak CCU,Required age,Price,DiscountDLC count,About the game,Supported languages,...,Median playtime forever,Median playtime two weeks,Developers,Publishers,Categories,Genres,Tags,Screenshots,Movies,Others
0,20200,Galactic Bowling,"Oct 21, 2008",0 - 20000,0,0,19.99,0,0,Galactic Bowling is an exaggerated and stylize...,...,0,0,0,Perpetual FX Creative,Perpetual FX Creative,"Single-player,Multi-player,Steam Achievements,...","Casual,Indie,Sports","Indie,Casual,Sports,Bowling",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
1,655370,Train Bandit,"Oct 12, 2017",0 - 20000,0,0,0.99,0,0,THE LAW!! Looks to be a showdown atop a train....,...,0,0,0,Rusty Moyher,Wild Rooster,"Single-player,Steam Achievements,Full controll...","Action,Indie","Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
2,1732930,Jolt Project,"Nov 17, 2021",0 - 20000,0,0,4.99,0,0,Jolt Project: The army now has a new robotics ...,...,0,0,0,Campi√£o Games,Campi√£o Games,Single-player,"Action,Adventure,Indie,Strategy",,https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...


In [141]:
df.columns

Index(['AppID', 'Name', 'Release date', 'Estimated owners', 'Peak CCU',
       'Required age', 'Price', 'DiscountDLC count', 'About the game',
       'Supported languages', 'Full audio languages', 'Reviews',
       'Header image', 'Website', 'Support url', 'Support email', 'Windows',
       'Mac', 'Linux', 'Metacritic score', 'Metacritic url', 'User score',
       'Positive', 'Negative', 'Score rank', 'Achievements', 'Recommendations',
       'Notes', 'Average playtime forever', 'Average playtime two weeks',
       'Median playtime forever', 'Median playtime two weeks', 'Developers',
       'Publishers', 'Categories', 'Genres', 'Tags', 'Screenshots', 'Movies',
       'Others'],
      dtype='object')

In [142]:
idx = df.columns.get_loc('About the game')
idx

8

In [143]:
df.iloc[:, idx:-1] = df.iloc[:, idx+1:].values

df = df.iloc[:, :-1]

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)

  df.iloc[:, idx:-1] = df.iloc[:, idx+1:].values


In [144]:
df.head(n=3)

Unnamed: 0,AppID,Name,Release date,Estimated owners,Peak CCU,Required age,Price,DiscountDLC count,About the game,Supported languages,...,Average playtime two weeks,Median playtime forever,Median playtime two weeks,Developers,Publishers,Categories,Genres,Tags,Screenshots,Movies
0,20200,Galactic Bowling,"Oct 21, 2008",0 - 20000,0,0,19.99,0,Galactic Bowling is an exaggerated and stylize...,['English'],...,0,0,0,Perpetual FX Creative,Perpetual FX Creative,"Single-player,Multi-player,Steam Achievements,...","Casual,Indie,Sports","Indie,Casual,Sports,Bowling",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
1,655370,Train Bandit,"Oct 12, 2017",0 - 20000,0,0,0.99,0,THE LAW!! Looks to be a showdown atop a train....,"['English', 'French', 'Italian', 'German', 'Sp...",...,0,0,0,Rusty Moyher,Wild Rooster,"Single-player,Steam Achievements,Full controll...","Action,Indie","Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
2,1732930,Jolt Project,"Nov 17, 2021",0 - 20000,0,0,4.99,0,Jolt Project: The army now has a new robotics ...,"['English', 'Portuguese - Brazil']",...,0,0,0,Campi√£o Games,Campi√£o Games,Single-player,"Action,Adventure,Indie,Strategy",,https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...


In [145]:
df.to_csv("test.csv")

In [146]:
df.columns

Index(['AppID', 'Name', 'Release date', 'Estimated owners', 'Peak CCU',
       'Required age', 'Price', 'DiscountDLC count', 'About the game',
       'Supported languages', 'Full audio languages', 'Reviews',
       'Header image', 'Website', 'Support url', 'Support email', 'Windows',
       'Mac', 'Linux', 'Metacritic score', 'Metacritic url', 'User score',
       'Positive', 'Negative', 'Score rank', 'Achievements', 'Recommendations',
       'Notes', 'Average playtime forever', 'Average playtime two weeks',
       'Median playtime forever', 'Median playtime two weeks', 'Developers',
       'Publishers', 'Categories', 'Genres', 'Tags', 'Screenshots', 'Movies'],
      dtype='object')

In [147]:
df['Developers'].value_counts()

Developers
EroticGamesClub             216
Choice of Games             166
Laush Dmitriy Sergeevich    149
Boogygames Studios          145
Creobit                     138
                           ... 
Studio Binokle                1
GIBBING TREE, LLC             1
BitCore Studios LLC           1
SOMOV KIRILL                  1
Ledx                          1
Name: count, Length: 64655, dtype: int64

In [148]:
def clean_data(x):
    if isinstance(x, str):
        return x.replace(" ", "").lower()
    else:
        return ''

df['Developers_clean'] = df['Developers'].apply(clean_data)
df['Publishers_clean'] = df['Publishers'].apply(clean_data)

In [149]:
df.head(n=3)

Unnamed: 0,AppID,Name,Release date,Estimated owners,Peak CCU,Required age,Price,DiscountDLC count,About the game,Supported languages,...,Median playtime two weeks,Developers,Publishers,Categories,Genres,Tags,Screenshots,Movies,Developers_clean,Publishers_clean
0,20200,Galactic Bowling,"Oct 21, 2008",0 - 20000,0,0,19.99,0,Galactic Bowling is an exaggerated and stylize...,['English'],...,0,Perpetual FX Creative,Perpetual FX Creative,"Single-player,Multi-player,Steam Achievements,...","Casual,Indie,Sports","Indie,Casual,Sports,Bowling",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...,perpetualfxcreative,perpetualfxcreative
1,655370,Train Bandit,"Oct 12, 2017",0 - 20000,0,0,0.99,0,THE LAW!! Looks to be a showdown atop a train....,"['English', 'French', 'Italian', 'German', 'Sp...",...,0,Rusty Moyher,Wild Rooster,"Single-player,Steam Achievements,Full controll...","Action,Indie","Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...,rustymoyher,wildrooster
2,1732930,Jolt Project,"Nov 17, 2021",0 - 20000,0,0,4.99,0,Jolt Project: The army now has a new robotics ...,"['English', 'Portuguese - Brazil']",...,0,Campi√£o Games,Campi√£o Games,Single-player,"Action,Adventure,Indie,Strategy",,https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...,campi√£ogames,campi√£ogames


In [150]:
def merge_and_clean_tags(row):
    genres = str(row['Genres']).lower().split(',')
    tags = str(row['Tags']).lower().split(',')

    genres = [g.strip().replace(' ', '') for g in genres]
    tags = [t.strip().replace(' ', '') for t in tags]

    combined = list(set(genres + tags))

    return ' '.join(combined)

df['metadata_features'] = df.apply(merge_and_clean_tags, axis=1)

In [151]:
df.head(n=3)

Unnamed: 0,AppID,Name,Release date,Estimated owners,Peak CCU,Required age,Price,DiscountDLC count,About the game,Supported languages,...,Developers,Publishers,Categories,Genres,Tags,Screenshots,Movies,Developers_clean,Publishers_clean,metadata_features
0,20200,Galactic Bowling,"Oct 21, 2008",0 - 20000,0,0,19.99,0,Galactic Bowling is an exaggerated and stylize...,['English'],...,Perpetual FX Creative,Perpetual FX Creative,"Single-player,Multi-player,Steam Achievements,...","Casual,Indie,Sports","Indie,Casual,Sports,Bowling",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...,perpetualfxcreative,perpetualfxcreative,bowling sports indie casual
1,655370,Train Bandit,"Oct 12, 2017",0 - 20000,0,0,0.99,0,THE LAW!! Looks to be a showdown atop a train....,"['English', 'French', 'Italian', 'German', 'Sp...",...,Rusty Moyher,Wild Rooster,"Single-player,Steam Achievements,Full controll...","Action,Indie","Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...,rustymoyher,wildrooster,arcade difficult singleplayer blood casual act...
2,1732930,Jolt Project,"Nov 17, 2021",0 - 20000,0,0,4.99,0,Jolt Project: The army now has a new robotics ...,"['English', 'Portuguese - Brazil']",...,Campi√£o Games,Campi√£o Games,Single-player,"Action,Adventure,Indie,Strategy",,https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...,campi√£ogames,campi√£ogames,nan strategy action adventure indie


In [152]:
df.duplicated().sum()

0

In [153]:
df.drop_duplicates(inplace=True)

In [154]:
df.columns

Index(['AppID', 'Name', 'Release date', 'Estimated owners', 'Peak CCU',
       'Required age', 'Price', 'DiscountDLC count', 'About the game',
       'Supported languages', 'Full audio languages', 'Reviews',
       'Header image', 'Website', 'Support url', 'Support email', 'Windows',
       'Mac', 'Linux', 'Metacritic score', 'Metacritic url', 'User score',
       'Positive', 'Negative', 'Score rank', 'Achievements', 'Recommendations',
       'Notes', 'Average playtime forever', 'Average playtime two weeks',
       'Median playtime forever', 'Median playtime two weeks', 'Developers',
       'Publishers', 'Categories', 'Genres', 'Tags', 'Screenshots', 'Movies',
       'Developers_clean', 'Publishers_clean', 'metadata_features'],
      dtype='object')

In [155]:
df['About the game'][6]

'TD Worlds is a dynamic, highly strategical game that challenges your skill. Build an impenetrable defense and get ready to plunge into a new, unknown world to uncover its secrets. In this bizarre universe, each attempt will be unique in its own way, which provides many hours of fun to play. Clear three completely different worlds from darkness, spread your influence everywhere. unique conditions in each game; losing is an important part of game progress. Each defeat reveals something new for you; dynamic storytelling: the more you play, the more you learn about the world; get random rewards after each level; tired of playing? Feel free to leave the game, next time you will continue where you left; experiment with different tactics; Twitch integration - play with your viewers.'

In [156]:
import re
from bs4 import BeautifulSoup

In [157]:
def clean_description(text):
    if not isinstance(text, str):
        return ""

    text = BeautifulSoup(text, "html.parser").get_text(separator=" ")

    text = re.sub(r'[‚Ñ¢¬Æ¬©]', '', text)
    text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)
    text = re.sub(r'[^a-zA-Z0-9\s]', ' ', text)

    text = text.lower().strip()
    text = " ".join(text.split())

    return text

df['About the game_clean'] = df['About the game'].apply(clean_description)

  text = BeautifulSoup(text, "html.parser").get_text(separator=" ")


In [158]:
df.to_csv('test.csv')

In [159]:
df.columns

Index(['AppID', 'Name', 'Release date', 'Estimated owners', 'Peak CCU',
       'Required age', 'Price', 'DiscountDLC count', 'About the game',
       'Supported languages', 'Full audio languages', 'Reviews',
       'Header image', 'Website', 'Support url', 'Support email', 'Windows',
       'Mac', 'Linux', 'Metacritic score', 'Metacritic url', 'User score',
       'Positive', 'Negative', 'Score rank', 'Achievements', 'Recommendations',
       'Notes', 'Average playtime forever', 'Average playtime two weeks',
       'Median playtime forever', 'Median playtime two weeks', 'Developers',
       'Publishers', 'Categories', 'Genres', 'Tags', 'Screenshots', 'Movies',
       'Developers_clean', 'Publishers_clean', 'metadata_features',
       'About the game_clean'],
      dtype='object')

In [160]:
df['Categories'][1]

'Single-player,Steam Achievements,Full controller support,Steam Leaderboards,Remote Play on Phone,Remote Play on Tablet,Remote Play on TV'

In [161]:
def clean_categories(text):
    if not isinstance(text, str):
        return ""

    parts = text.split(',')
    clean_parts = [p.strip().replace(' ', '').replace('-', '').lower() for p in parts]
    return " ".join(clean_parts)

df['Categories_clean'] = df['Categories'].apply(clean_categories)

In [162]:
df['Categories_clean'][1]

'singleplayer steamachievements fullcontrollersupport steamleaderboards remoteplayonphone remoteplayontablet remoteplayontv'

In [163]:
cols_to_keep = ['AppID', 'Name', 'Positive', 'Negative', 'About the game_clean', 'metadata_features', 'Categories_clean', 'Developers_clean', 'Publishers_clean']

df = df[cols_to_keep]

In [164]:
df.head(n=3)

Unnamed: 0,AppID,Name,Positive,Negative,About the game_clean,metadata_features,Categories_clean,Developers_clean,Publishers_clean
0,20200,Galactic Bowling,6,11,galactic bowling is an exaggerated and stylize...,bowling sports indie casual,singleplayer multiplayer steamachievements par...,perpetualfxcreative,perpetualfxcreative
1,655370,Train Bandit,53,5,the law looks to be a showdown atop a train th...,arcade difficult singleplayer blood casual act...,singleplayer steamachievements fullcontrollers...,rustymoyher,wildrooster
2,1732930,Jolt Project,0,0,jolt project the army now has a new robotics p...,nan strategy action adventure indie,singleplayer,campi√£ogames,campi√£ogames


In [165]:
df.to_csv('selected_data.csv')

In [166]:
df['About the game_clean'].isna().sum()

0

In [167]:
df['Name'].isna().sum()

6

In [168]:
df = df.dropna(subset=['Name', 'About the game_clean'])

In [169]:
df[df['About the game_clean'].str.strip() == ""]

Unnamed: 0,AppID,Name,Positive,Negative,About the game_clean,metadata_features,Categories_clean,Developers_clean,Publishers_clean
105,1943590,Ê∫™È£éË∞∑‰πãÊàò Playtest,0,0,,,,,
180,1966960,Burial Stone Playtest,0,0,,,,,
214,1688630,Emperial Knights Playtest,0,0,,,,,
220,1478660,Slotracers VR Playtest,0,0,,,,,
291,1613340,Pirates of the Asteroid Belt Playtest,0,0,,,,,
...,...,...,...,...,...,...,...,...,...
111401,2078010,Legends of Immortality,47,29,,combat tacticalrpg rpg 2d strategy earlyaccess...,singleplayer steamachievements familysharing,‰∫ëÊ¢¶Â±±Â∑•‰ΩúÂÆ§,ÊñπÂùóÊ∏∏Êàè(cubegame)
111405,3670540,Boiiing Boiiing Playtest,0,0,,,,,
111417,3609290,Êúà‰πãÂÜï Playtest,0,0,,,,,
111434,3654520,Delusional Playtest,0,0,,,,,


In [170]:
df = df[df['About the game_clean'].str.strip() != ""]

In [171]:
df.shape

(104737, 9)

In [172]:
cols_to_fix = ['metadata_features', 'Categories_clean', 'Developers_clean', 'Publishers_clean']
df[cols_to_fix] = df[cols_to_fix].fillna('')

In [173]:
df.isna().sum()

AppID                   0
Name                    0
Positive                0
Negative                0
About the game_clean    0
metadata_features       0
Categories_clean        0
Developers_clean        0
Publishers_clean        0
dtype: int64

In [174]:
df.columns

Index(['AppID', 'Name', 'Positive', 'Negative', 'About the game_clean',
       'metadata_features', 'Categories_clean', 'Developers_clean',
       'Publishers_clean'],
      dtype='object')

In [175]:
df.reset_index(drop=True, inplace=True)

In [176]:
df['combined'] = (
    df['About the game_clean'] + ' ' +
    df['metadata_features'] + ' ' +
    df['Categories_clean'] + ' '
    # df['Developers_clean'] + ' ' +
    # df['Publishers_clean']
)

df['combined'] = df['combined'].apply(lambda x: " ".join(x.split()))

### TF-IDF

In [177]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [178]:
vectorizer = TfidfVectorizer(stop_words='english')
games_vector = vectorizer.fit_transform(df['combined'])

In [179]:
type(vectorizer)

sklearn.feature_extraction.text.TfidfVectorizer

### Cosine Similarity

In [180]:
from sklearn.metrics.pairwise import cosine_similarity

In [181]:
def process_tags_to_set(tags_str):
    if not isinstance(tags_str, str):
        return set()
    return set([t.strip().lower() for t in tags_str.split(',')])

df['tags_set'] = df['metadata_features'].apply(process_tags_to_set)

In [182]:
def jaccard_similarity(set1, set2):
    intersection = len(set1 & set2)
    union = len(set1 | set2)
    return intersection / union if union != 0 else 0

In [190]:
def recommend_game(title, n_recommendation=5, alpha=0.3):
    """
    alpha = 0.3: 30% conteaza Cosine Similarity (Text), 70% conteaza Jaccard (Tags).
    """
    title = title.lower().strip()

    if 'name_norm' not in df.columns:
        df['name_norm'] = df['Name'].str.lower().str.strip()

    if title not in df['name_norm'].values:
        return {"error": "Game not found. Please check the name and try again."}

    g_idx = df[df['name_norm'] == title].index[0]

    current_game_tags = df.loc[g_idx, 'tags_set']
    jaccard_scores = [
        0 if idx == g_idx else jaccard_similarity(
            current_game_tags, df.loc[idx, 'tags_set']
        )
        for idx in range(len(df))
    ]

    jaccard_scores = np.array(jaccard_scores)
    if np.max(jaccard_scores) > 0:
        jaccard_scores /= np.max(jaccard_scores)

    cosine_scores = cosine_similarity(games_vector[g_idx], games_vector).flatten()

    if np.max(cosine_scores) > 0:
        cosine_scores /= np.max(cosine_scores)

    final_score = alpha * cosine_scores + (1 - alpha) * jaccard_scores
    recomm_idx = final_score.argsort()[::-1][1:n_recommendation + 1]

    recommendations = []
    for idx in recomm_idx:
        game_name = df.iloc[idx]['Name']
        game_image = df.iloc[idx].get('Header image', '')
        score = round(final_score[idx], 3)

        recommendations.append({
            "name": game_name,
            "image": game_image,
            "score": score
        })

    return {
        "input": title,
        "recommendations": recommendations
    }

In [193]:
title = 'phasmophobia'
print(recommend_game(title))

{'input': 'phasmophobia', 'recommendations': [{'name': 'Evil Hunt - Evil never sleeps', 'image': '', 'score': 0.177}, {'name': 'Paranormal Home Invaders', 'image': '', 'score': 0.129}, {'name': 'Ghostbane', 'image': '', 'score': 0.123}, {'name': 'BE HUNTED', 'image': '', 'score': 0.12}, {'name': 'Friki', 'image': '', 'score': 0.12}]}


In [199]:
title = 'The Crew‚Ñ¢ 2'
print(recommend_game(title))

{'input': 'the crew‚Ñ¢ 2', 'recommendations': [{'name': 'Automobilista 2', 'image': '', 'score': 0.063}, {'name': 'The Crew‚Ñ¢', 'image': '', 'score': 0.05}, {'name': 'Project CARS', 'image': '', 'score': 0.049}, {'name': 'Space Haven', 'image': '', 'score': 0.046}, {'name': 'TNN Motorsports Hardcore TR', 'image': '', 'score': 0.043}]}
