# Source

https://www.kaggle.com/datasets/nikdavis/steam-store-games

# Introduction

Steam is a popular digital distribution platform developed by Valve Corporation, which offers a vast library of video games, software, and other multimedia content. It provides both a desktop application and a web platform, allowing users to purchase, download, and play their favorite games with ease. The platform is known for its vast selection, user-friendly interface, and frequent sales, making it a go-to choice for gamers worldwide.

Recommendation systems are intelligent algorithms designed to provide personalized suggestions to users based on their preferences, behavior, or other relevant factors. These systems analyze data patterns to predict and recommend items that are likely to be of interest, enhancing user experience and engagement with the platform.

Steam uses a recommendation system to suggest games that users might be interested in based on their play history, friends' activity, and other factors. However, not all users log in when browsing the Steam store through a web browser, resulting in a lack of access to personalized data such as their game library and play history. This limitation poses a challenge to provide relevant game recommendations to users who are not logged in.

## Business question

The business question addressed in this project is: <br>
How can a recommendation system for the Steam platform that relies solely on game similarities be built, without requiring users to log in or access their personal library and play history? 

This approach will enable Steam to provide valuable suggestions to users who browse the platform without logging in, potentially increasing engagement and sales.

## Dataset Description

The following three datasets from Kaggle have been used in this project to build the game similarity-based recommendation system:

### 1. steam.csv
This dataset contains general information about the games available on Steam, including their title, release date, developer, publisher, genres, and other relevant details. The dataset has the following columns:

- **name**: The title of the game.
- **release_date**: The date the game was released.
- **english**: A binary indicator of whether the game supports the English language (1) or not (0).
- **developer**: The developer responsible for creating the game.
- **publisher**: The company responsible for publishing the game.
- **platforms**: The platforms the game is available on (e.g., Windows, Mac, Linux).
- **required_age**: The minimum age requirement to play the game, according to ESRB ratings or similar guidelines.
- **categories**: A list of categories that describe the game's features (e.g., Single-player, Multi-player, Co-op).
- **genres**: The genre(s) the game belongs to.
- **steamspy_tags**: The top three tags associated with the game, according to SteamSpy data.
- **achievements**: The number of achievements available in the game.
- **positive_ratings**: The number of positive ratings the game has received from users.
- **negative_ratings**: The number of negative ratings the game has received from users.
- **average_playtime**: The average playtime of the game, in minutes, among users who have played the game.
- **median_playtime**: The median playtime of the game, in minutes, among users who have played the game.
- **owners**: An estimated range of the total number of users who own the game (e.g., "20000-50000").
- **price**: The current price of the game in USD.

This dataset is essential for the project as it provides the basic information about the games and allows for identifying the key characteristics of each game to be used in the similarity calculations.


### 2. steam_description_data.csv
This dataset provides additional information about the games in the form of descriptive texts, including a short description, detailed description, and other related texts. The columns in this dataset are:

- **steam_appid**: Unique identifier for each game, matching the appid in steam.csv.
- **detailed_description**: A more comprehensive description of the game, often including gameplay features and storyline.
- **about_the_game**: A brief overview of the game, highlighting its main features and selling points.
- **short_description**: A concise summary of the game.

The descriptive texts in this dataset are important for understanding the content and context of each game. By analyzing these texts, relevant features can be extracted to determine the similarities between games.

### 3. steamspy_tag_data.csv
This dataset contains user-generated tags for each game, which help categorize and describe the games in more detail. The columns in this dataset are:

- **appid**: Unique identifier for each game, matching the appid in steam.csv.
- **[tag_name]**: Each column represents a different tag, and the value in each cell indicates the number of times that tag has been applied to the corresponding game by users.

The user-generated tags in this dataset provide valuable insights into the popular features and characteristics of each game, as perceived by the community. Analyzing these tags helps us to determine the similarities between games based on the preferences and opinions of Steam users.


By combining the information from these three datasets, a comprehensive understanding of each game's features, content, and user perception can be created. This data will enable us to build an effective similarity-based recommendation system for users who browse the Steam platform without logging in.


# Importing Libraries and Loading Datasets

In this step, I imported the necessary libraries and load the three datasets into separate DataFrame objects. 

I also set the index of each DataFrame to the unique game identifier (`appid` or `steam_appid`) for easier data manipulation. These values are key and foreign keys of the database.


In [2]:
import pandas as pd
import numpy as np
import spacy
from langdetect import detect
from datetime import datetime
from sklearn.preprocessing import MinMaxScaler
import pickle
from time import time

# Load Spacy's English model, using the medium-sized model for a balance between accuracy and processing time
nlp = spacy.load('en_core_web_md')

In [3]:
df = pd.read_csv('steam.csv')
df.set_index('appid', inplace=True)

tag_df = pd.read_csv('steamspy_tag_data.csv')
tag_df.set_index('appid', inplace=True)

des_df = pd.read_csv('steam_description_data.csv')
des_df.set_index('steam_appid', inplace=True)


pd.set_option('display.max_columns', 400)



# Data Exploration and Preprocessing: main dataframe

Before building the recommendation system, I explored the `steam` DataFrame and preprocess the data to ensure its quality and relevance.

### Findings
- 77% of the dataset has implicit missing values for the `average_playtime` column, making it unsuitable for use in the recommendation system.
- 511 games are not in English. These will be removed to focus on English-language games.

### Preprocessing Steps
1. Drop the non-English games from the `steam` DataFrame.
2. Remove unnecessary columns that are not relevant for the recommendation system: ['platforms', 'average_playtime', 'median_playtime', 'price', 'required_age', 'achievements', 'steamspy_tags', 'english'].


In [4]:
print(df.shape)
df.head()

(27075, 17)


Unnamed: 0_level_0,name,release_date,english,developer,publisher,platforms,required_age,categories,genres,steamspy_tags,achievements,positive_ratings,negative_ratings,average_playtime,median_playtime,owners,price
appid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
10,Counter-Strike,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,124534,3339,17612,317,10000000-20000000,7.19
20,Team Fortress Classic,1999-04-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,3318,633,277,62,5000000-10000000,3.99
30,Day of Defeat,2003-05-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Valve Anti-Cheat enabled,Action,FPS;World War II;Multiplayer,0,3416,398,187,34,5000000-10000000,3.99
40,Deathmatch Classic,2001-06-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,1273,267,258,184,5000000-10000000,3.99
50,Half-Life: Opposing Force,1999-11-01,1,Gearbox Software,Valve,windows;mac;linux,0,Single-player;Multi-player;Valve Anti-Cheat en...,Action,FPS;Action;Sci-fi,0,5250,288,624,415,5000000-10000000,3.99


In [5]:
len(df[(df['average_playtime']==0)])/len(df)

0.7721144967682364

In [6]:
print(df[df['english']==0].shape[0])
df = df[df['english']!=0]

511


In [7]:
df.drop(columns=['platforms', 'average_playtime', 'median_playtime', 'price',
                 'required_age', 'achievements', 'steamspy_tags', 'english'],
                 inplace=True)
df.head()

Unnamed: 0_level_0,name,release_date,developer,publisher,categories,genres,positive_ratings,negative_ratings,owners
appid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
10,Counter-Strike,2000-11-01,Valve,Valve,Multi-player;Online Multi-Player;Local Multi-P...,Action,124534,3339,10000000-20000000
20,Team Fortress Classic,1999-04-01,Valve,Valve,Multi-player;Online Multi-Player;Local Multi-P...,Action,3318,633,5000000-10000000
30,Day of Defeat,2003-05-01,Valve,Valve,Multi-player;Valve Anti-Cheat enabled,Action,3416,398,5000000-10000000
40,Deathmatch Classic,2001-06-01,Valve,Valve,Multi-player;Online Multi-Player;Local Multi-P...,Action,1273,267,5000000-10000000
50,Half-Life: Opposing Force,1999-11-01,Gearbox Software,Valve,Single-player;Multi-player;Valve Anti-Cheat en...,Action,5250,288,5000000-10000000


# Data Exploration and Preprocessing: Description DataFrame

In this step, I explored the `des_df` DataFrame and preprocess the data to ensure its quality and relevance for building the recommendation system.

### Data Exploration Findings
- The 'about_the_game' column provides a good balance between the amount of text context and information needed for creating meaningful embeddings using Spacy's document vectorizer.
- Many HTML code sections were found in the text data. These may negatively affect the document vectorization process.

### Preprocessing Steps
1. Select the 'about_the_game' column for further processing, as it provides sufficient context and information for the recommendation system.
2. Replace HTML code sections with a whitespace to prevent joining words together and to ensure that Spacy's pipeline can handle the text data more effectively.
3. Focus on games with English descriptions, as they form the majority of the dataset and allow us to use Spacy's pre-trained English model.



In [8]:
des_df.sample(5)

Unnamed: 0_level_0,detailed_description,about_the_game,short_description
steam_appid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
744550,Top-down tactical shooter with stealth element...,Top-down tactical shooter with stealth element...,Tactical Operations is a subsystem of investig...
875240,Many of us live in a noisy urban environment. ...,Many of us live in a noisy urban environment. ...,An artistic bonsai tree simulation game that f...
758100,"<h2 class=""bb_tag""><strong>About E-Startup</st...","<h2 class=""bb_tag""><strong>About E-Startup</st...",E-Startup is a sandbox business simulation gam...
503480,"Run, jump and slash your way through an epic, ...","Run, jump and slash your way through an epic, ...",Mahluk is a hack and slash platformer game wit...
995000,You Play as Erica who follows the adventurous ...,You Play as Erica who follows the adventurous ...,"Explore, find clues, solve puzzles and make yo..."


In [9]:
des_df = des_df[['about_the_game']].copy()
des_df

Unnamed: 0_level_0,about_the_game
steam_appid,Unnamed: 1_level_1
10,Play the world's number 1 online action game. ...
20,One of the most popular online action games of...
30,Enlist in an intense brand of Axis vs. Allied ...
40,Enjoy fast-paced multiplayer gaming with Death...
50,Return to the Black Mesa Research Facility as ...
...,...
1065230,"<img src=""https://steamcdn-a.akamaihd.net/stea..."
1065570,Have you ever been so lonely that no one but y...
1065650,<strong>Super Star Blast </strong>is a space b...
1066700,Pursue a snow-white deer through an enchanted ...


In [10]:
# Print five random 'about_the_game' contents
def show_rnd_strings(des_df):
    rn_indexes = des_df['about_the_game'].sample(5).index
    for i in rn_indexes:
        print(des_df['about_the_game'].loc[i])
        print()
        
# show_rnd_strings(des_df)

In [11]:
# Remove html code
pattern = '<.*?>'
des_df['about_the_game'] = des_df['about_the_game'].replace(pattern, ' ', regex=True)

# show_rnd_strings(des_df) 

In [12]:
# remove non english content
def is_english(text):
    try:
        lang = detect(text)
        if lang == 'en':
            return True
        else:
            return False
    except:
        return False

# Filter rows that are in English
des_df = des_df[des_df['about_the_game'].apply(is_english)]

## Data Exploration and Preprocessing: Tags DataFrame

In this step, I explored the `tag_df` DataFrame and preprocess the data to ensure its quality and relevance for building the recommendation system.

### Data Exploration Findings
- There are 575 games with no tags. These games will be removed from the dataset as they do not provide any useful information for the recommendation system.

### Preprocessing Steps
- Drop the games with no tags from the `tag_df` DataFrame.



In [13]:
print(len(tag_df[tag_df.sum(axis=1)==0]), 'games have no tags')
tag_df = tag_df[tag_df.sum(axis=1) != 0]

575 games have no tags


# Selecting Games with Complete Information

To ensure the recommendation system has sufficient data for each game, I selected only the games that have complete information across all three DataFrames.

#### Selection
I performed an inner join on the three DataFrames to select only games that have all information available.



In [14]:
print('df rows:', df.shape[0])
print('tag_df rows:', tag_df.shape[0])
print('des_df rows:', des_df.shape[0])

df rows: 26564
tag_df rows: 28447
des_df rows: 27085


In [15]:
df = df.join(des_df, how='inner')
df = df.join(tag_df, how='inner')
print(df.shape)

(26507, 381)


# Creating Data Subsets

To facilitate further data processing and feature engineering, separate DataFrames are created for each relevant aspect of the games.

### Data Subsets
1. **name_df**: Contains the names of the games.
2. **gen_df**: Contains the genres of the games.
3. **cat_df**: Contains the categories of the games.
4. **tag_df**: Contains the user-generated tags for the games.
5. **rating_df**: Contains the positive and negative ratings, as well as the number of owners for the games.
6. **date_df**: Contains the release dates of the games.
7. **dev_df**: Contains the developers of the games.
8. **pub_df**: Contains the publishers of the games.
9. **des_df**: Contains the 'about_the_game' descriptions of the games.
10. **scores_df**: An empty DataFrame to store the final scores used for the recommendation system.



In [16]:
name_df = df[['name']].copy()

gen_df = df[['genres']].copy()
cat_df = df[['categories']].copy()
tag_df = tag_df.loc[:, '1980s':].copy()
rating_df = df[['positive_ratings', 'negative_ratings', 'owners']].copy()
date_df = df[['release_date']].copy()
dev_df = df[['developer']].copy()
pub_df = df[['publisher']].copy()
des_df = df[['about_the_game']].copy()

scores_df = df[[]].copy()


## Transforming Category and Genre Data

Category and genre data are stored as single strings with multiple elements separated by semicolons. To facilitate the use of Jaccard similarity later on, these strings are transformed into lists.

### Data Transformation Steps
1. Convert the 'categories' and 'genres' columns into lists by splitting the strings using the semicolon delimiter.
2. Retain only the relevant categories: 'Co-op', 'Local Co-op', 'Local Multi-Player', 'MMO', 'Multi-player', 'Online Co-op', 'Online Multi-Player', 'Shared/Split Screen', and 'Single-player'.
3. Remove redundant information: Use the 'MMO' category to represent all massively multiplayer online games, as all games with the 'MMO' tag have the 'Massively Multiplayer' category.



In [17]:
def get_all_list_elements(df, column):
    df_genres = df.explode(column)
    unique_genres = df_genres[column].unique()
    # Print the unique genres
    return (list(unique_genres))

In [18]:
cat_df['cat_list'] = cat_df['categories'].apply(lambda x: x.split(';'))
cat_df.drop(columns=['categories'], inplace=True)
print(get_all_list_elements(cat_df, 'cat_list'))

cat_df.sample(5)

['Multi-player', 'Online Multi-Player', 'Local Multi-Player', 'Valve Anti-Cheat enabled', 'Single-player', 'Steam Cloud', 'Steam Achievements', 'Steam Trading Cards', 'Captions available', 'Partial Controller Support', 'Includes Source SDK', 'Cross-Platform Multiplayer', 'Stats', 'Commentary available', 'Includes level editor', 'Steam Workshop', 'In-App Purchases', 'Co-op', 'Full controller support', 'Steam Leaderboards', 'SteamVR Collectibles', 'Online Co-op', 'Shared/Split Screen', 'Local Co-op', 'MMO', 'VR Support', 'Mods', 'Mods (require HL2)', 'Steam Turn Notifications']


Unnamed: 0,cat_list
705210,"[Single-player, Multi-player, Online Multi-Pla..."
701900,"[Single-player, Partial Controller Support]"
240360,"[Single-player, Steam Achievements, Partial Co..."
854740,[Single-player]
532840,"[Single-player, Steam Achievements]"


In [19]:
keep_set = {'Co-op', 'Local Co-op', 'Local Multi-Player', 'MMO', 'Multi-player', 'Online Co-op',
            'Online Multi-Player', 'Shared/Split Screen', 'Single-player'}

cat_df['cat_list'] = cat_df['cat_list'].apply(lambda x: list(set(x).intersection(keep_set)) if len(x)>0 else ['unknown'])
cat_df['cat_list'] = cat_df['cat_list'].apply(lambda x: x if len(x)>0 else ['unknown'])

get_all_list_elements(cat_df, 'cat_list')


['Multi-player',
 'Online Multi-Player',
 'Local Multi-Player',
 'Single-player',
 'Co-op',
 'unknown',
 'Online Co-op',
 'Shared/Split Screen',
 'Local Co-op',
 'MMO']

In [20]:
df[cat_df['cat_list'].apply(lambda x: True if 'unknown' in x else False)].shape[0]

186

In [21]:
gen_df['gen_list'] = gen_df['genres'].apply(lambda x: x.split(';'))
gen_df.drop(columns=['genres'], inplace=True)
print(get_all_list_elements(gen_df, 'gen_list'))


gen_df.sample(5)

['Action', 'Free to Play', 'Strategy', 'Adventure', 'Indie', 'RPG', 'Animation & Modeling', 'Video Production', 'Casual', 'Simulation', 'Racing', 'Violent', 'Massively Multiplayer', 'Nudity', 'Sports', 'Early Access', 'Gore', 'Utilities', 'Design & Illustration', 'Web Publishing', 'Education', 'Software Training', 'Sexual Content', 'Audio Production', 'Game Development', 'Photo Editing', 'Accounting', 'Documentary', 'Tutorial']


Unnamed: 0,gen_list
827900,"[Action, Casual, Indie]"
951010,"[Action, Adventure, Indie]"
819940,"[Action, Indie, Early Access]"
631990,"[Violent, Adventure, Indie, Strategy]"
681150,"[Action, Adventure, Casual, Indie, Racing, Spo..."


#### Exploring MMO

In [22]:
len(df[cat_df['cat_list'].apply(lambda x: True if 'MMO' in x else False)])#[['name','categories','genres']])


400

In [23]:
len(df[gen_df['gen_list'].apply(lambda x: True if 'Massively Multiplayer' in x else False)][['categories','genres']])


692

In [24]:
# Games that has MMO tag and Massively Multiplayer genre
mmo1 = len(df[(cat_df['cat_list'].apply(lambda x: True if 'MMO' in x else False))&
              (gen_df['gen_list'].apply(lambda x: True if 'Massively Multiplayer' in x else False))])
# Games that has nor MMO tag but Massively Multiplayer genre
mmo2 = len(df[(cat_df['cat_list'].apply(lambda x: False if 'MMO' in x else True))&
              (gen_df['gen_list'].apply(lambda x: True if 'Massively Multiplayer' in x else False))])
mmo1+mmo2

692

In [25]:
gen_df[(gen_df['gen_list'].apply(lambda x: True if 'Massively Multiplayer' in x else False))&
       (gen_df['gen_list'].apply(lambda x: True if len(x)==1 else False))]

Unnamed: 0,gen_list


In [26]:
i = cat_df[(gen_df['gen_list'].apply(lambda x: True if 'Massively Multiplayer' in x else False)) &
           (cat_df['cat_list'].apply(lambda x: False if 'MMO' in x else True))].index

cat_df.loc[i, 'cat_list'] = np.array([x + ['MMO'] for x in cat_df.loc[i, 'cat_list']], dtype=object)

gen_df['gen_list'] = gen_df['gen_list'].apply(lambda x: [g for g in x if g != 'Massively Multiplayer'])


In [27]:
get_all_list_elements(gen_df, 'gen_list')

['Action',
 'Free to Play',
 'Strategy',
 'Adventure',
 'Indie',
 'RPG',
 'Animation & Modeling',
 'Video Production',
 'Casual',
 'Simulation',
 'Racing',
 'Violent',
 'Nudity',
 'Sports',
 'Early Access',
 'Gore',
 'Utilities',
 'Design & Illustration',
 'Web Publishing',
 'Education',
 'Software Training',
 'Sexual Content',
 'Audio Production',
 'Game Development',
 'Photo Editing',
 'Accounting',
 'Documentary',
 'Tutorial']

## Normalizing Tag Data

To ensure that the tag data are on a consistent scale, the values in the `tag_df` DataFrame are normalized to be between 0 and 1 by dividing each value by the maximum value in its row.


In [28]:
def divide_by_max(row):
    max_val = row.max()
    if max_val != 0:
        return row / max_val
    else:
        return 0
    
tag_df = tag_df.apply(divide_by_max, axis=1)

tag_df.sample(5)

Unnamed: 0_level_0,1980s,1990s,2.5d,2d,2d_fighter,360_video,3d,3d_platformer,3d_vision,4_player_local,4x,6dof,atv,abstract,action,action_rpg,action_adventure,addictive,adventure,agriculture,aliens,alternate_history,america,animation_&_modeling,anime,arcade,arena_shooter,artificial_intelligence,assassin,asynchronous_multiplayer,atmospheric,audio_production,bmx,base_building,baseball,based_on_a_novel,basketball,batman,battle_royale,beat_em_up,beautiful,benchmark,bikes,blood,board_game,bowling,building,bullet_hell,bullet_time,crpg,capitalism,card_game,cartoon,cartoony,casual,cats,character_action_game,character_customization,chess,choices_matter,choose_your_own_adventure,cinematic,city_builder,class_based,classic,clicker,co_op,co_op_campaign,cold_war,colorful,comedy,comic_book,competitive,conspiracy,controller,conversation,crafting,crime,crowdfunded,cult_classic,cute,cyberpunk,cycling,dark,dark_comedy,dark_fantasy,dark_humor,dating_sim,demons,design_&_illustration,destruction,detective,difficult,dinosaurs,diplomacy,documentary,dog,dragons,drama,driving,dungeon_crawler,dungeons_&_dragons,dynamic_narration,dystopian_,early_access,economy,education,emotional,epic,episodic,experience,experimental,exploration,fmv,fps,faith,family_friendly,fantasy,fast_paced,feature_film,female_protagonist,fighting,first_person,fishing,flight,football,foreign,free_to_play,funny,futuristic,gambling,game_development,gamemaker,games_workshop,gaming,god_game,golf,gore,gothic,grand_strategy,great_soundtrack,grid_based_movement,gun_customization,hack_and_slash,hacking,hand_drawn,hardware,heist,hex_grid,hidden_object,historical,hockey,horror,horses,hunting,illuminati,indie,intentionally_awkward_controls,interactive_fiction,inventory_management,investigation,isometric,jrpg,jet,kickstarter,lego,lara_croft,lemmings,level_editor,linear,local_co_op,local_multiplayer,logic,loot,lore_rich,lovecraftian,mmorpg,moba,magic,management,mars,martial_arts,massively_multiplayer,masterpiece,match_3,mature,mechs,medieval,memes,metroidvania,military,mini_golf,minigames,minimalist,mining,mod,moddable,modern,motocross,motorbike,mouse_only,movie,multiplayer,multiple_endings,music,music_based_procedural_generation,mystery,mystery_dungeon,mythology,nsfw,narration,naval,ninja,noir,nonlinear,nudity,offroad,old_school,on_rails_shooter,online_co_op,open_world,otome,parkour,parody_,party_based_rpg,perma_death,philisophical,photo_editing,physics,pinball,pirates,pixel_graphics,platformer,point_&_click,political,politics,pool,post_apocalyptic,procedural_generation,programming,psychedelic,psychological,psychological_horror,puzzle,puzzle_platformer,pve,pvp,quick_time_events,rpg,rpgmaker,rts,racing,real_time_tactics,real_time,real_time_with_pause,realistic,relaxing,remake,replay_value,resource_management,retro,rhythm,robots,rogue_like,rogue_lite,romance,rome,runner,sailing,sandbox,satire,sci_fi,science,score_attack,sequel,sexual_content,shoot_em_up,shooter,short,side_scroller,silent_protagonist,simulation,singleplayer,skateboarding,skating,skiing,sniper,snow,snowboarding,soccer,software,software_training,sokoban,souls_like,soundtrack,space,space_sim,spectacle_fighter,spelling,split_screen,sports,star_wars,stealth,steam_machine,steampunk,story_rich,strategy,strategy_rpg,stylized,submarine,superhero,supernatural,surreal,survival,survival_horror,swordplay,tactical,tactical_rpg,tanks,team_based,tennis,text_based,third_person,third_person_shooter,thriller,time_attack,time_management,time_manipulation,time_travel,top_down,top_down_shooter,touch_friendly,tower_defense,trackir,trading,trading_card_game,trains,transhumanism,turn_based,turn_based_combat,turn_based_strategy,turn_based_tactics,tutorial,twin_stick_shooter,typing,underground,underwater,unforgiving,utilities,vr,vr_only,vampire,video_production,villain_protagonist,violent,visual_novel,voice_control,voxel,walking_simulator,war,wargame,warhammer_40k,web_publishing,werewolves,western,word_game,world_war_i,world_war_ii,wrestling,zombies,e_sports
appid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1,Unnamed: 142_level_1,Unnamed: 143_level_1,Unnamed: 144_level_1,Unnamed: 145_level_1,Unnamed: 146_level_1,Unnamed: 147_level_1,Unnamed: 148_level_1,Unnamed: 149_level_1,Unnamed: 150_level_1,Unnamed: 151_level_1,Unnamed: 152_level_1,Unnamed: 153_level_1,Unnamed: 154_level_1,Unnamed: 155_level_1,Unnamed: 156_level_1,Unnamed: 157_level_1,Unnamed: 158_level_1,Unnamed: 159_level_1,Unnamed: 160_level_1,Unnamed: 161_level_1,Unnamed: 162_level_1,Unnamed: 163_level_1,Unnamed: 164_level_1,Unnamed: 165_level_1,Unnamed: 166_level_1,Unnamed: 167_level_1,Unnamed: 168_level_1,Unnamed: 169_level_1,Unnamed: 170_level_1,Unnamed: 171_level_1,Unnamed: 172_level_1,Unnamed: 173_level_1,Unnamed: 174_level_1,Unnamed: 175_level_1,Unnamed: 176_level_1,Unnamed: 177_level_1,Unnamed: 178_level_1,Unnamed: 179_level_1,Unnamed: 180_level_1,Unnamed: 181_level_1,Unnamed: 182_level_1,Unnamed: 183_level_1,Unnamed: 184_level_1,Unnamed: 185_level_1,Unnamed: 186_level_1,Unnamed: 187_level_1,Unnamed: 188_level_1,Unnamed: 189_level_1,Unnamed: 190_level_1,Unnamed: 191_level_1,Unnamed: 192_level_1,Unnamed: 193_level_1,Unnamed: 194_level_1,Unnamed: 195_level_1,Unnamed: 196_level_1,Unnamed: 197_level_1,Unnamed: 198_level_1,Unnamed: 199_level_1,Unnamed: 200_level_1,Unnamed: 201_level_1,Unnamed: 202_level_1,Unnamed: 203_level_1,Unnamed: 204_level_1,Unnamed: 205_level_1,Unnamed: 206_level_1,Unnamed: 207_level_1,Unnamed: 208_level_1,Unnamed: 209_level_1,Unnamed: 210_level_1,Unnamed: 211_level_1,Unnamed: 212_level_1,Unnamed: 213_level_1,Unnamed: 214_level_1,Unnamed: 215_level_1,Unnamed: 216_level_1,Unnamed: 217_level_1,Unnamed: 218_level_1,Unnamed: 219_level_1,Unnamed: 220_level_1,Unnamed: 221_level_1,Unnamed: 222_level_1,Unnamed: 223_level_1,Unnamed: 224_level_1,Unnamed: 225_level_1,Unnamed: 226_level_1,Unnamed: 227_level_1,Unnamed: 228_level_1,Unnamed: 229_level_1,Unnamed: 230_level_1,Unnamed: 231_level_1,Unnamed: 232_level_1,Unnamed: 233_level_1,Unnamed: 234_level_1,Unnamed: 235_level_1,Unnamed: 236_level_1,Unnamed: 237_level_1,Unnamed: 238_level_1,Unnamed: 239_level_1,Unnamed: 240_level_1,Unnamed: 241_level_1,Unnamed: 242_level_1,Unnamed: 243_level_1,Unnamed: 244_level_1,Unnamed: 245_level_1,Unnamed: 246_level_1,Unnamed: 247_level_1,Unnamed: 248_level_1,Unnamed: 249_level_1,Unnamed: 250_level_1,Unnamed: 251_level_1,Unnamed: 252_level_1,Unnamed: 253_level_1,Unnamed: 254_level_1,Unnamed: 255_level_1,Unnamed: 256_level_1,Unnamed: 257_level_1,Unnamed: 258_level_1,Unnamed: 259_level_1,Unnamed: 260_level_1,Unnamed: 261_level_1,Unnamed: 262_level_1,Unnamed: 263_level_1,Unnamed: 264_level_1,Unnamed: 265_level_1,Unnamed: 266_level_1,Unnamed: 267_level_1,Unnamed: 268_level_1,Unnamed: 269_level_1,Unnamed: 270_level_1,Unnamed: 271_level_1,Unnamed: 272_level_1,Unnamed: 273_level_1,Unnamed: 274_level_1,Unnamed: 275_level_1,Unnamed: 276_level_1,Unnamed: 277_level_1,Unnamed: 278_level_1,Unnamed: 279_level_1,Unnamed: 280_level_1,Unnamed: 281_level_1,Unnamed: 282_level_1,Unnamed: 283_level_1,Unnamed: 284_level_1,Unnamed: 285_level_1,Unnamed: 286_level_1,Unnamed: 287_level_1,Unnamed: 288_level_1,Unnamed: 289_level_1,Unnamed: 290_level_1,Unnamed: 291_level_1,Unnamed: 292_level_1,Unnamed: 293_level_1,Unnamed: 294_level_1,Unnamed: 295_level_1,Unnamed: 296_level_1,Unnamed: 297_level_1,Unnamed: 298_level_1,Unnamed: 299_level_1,Unnamed: 300_level_1,Unnamed: 301_level_1,Unnamed: 302_level_1,Unnamed: 303_level_1,Unnamed: 304_level_1,Unnamed: 305_level_1,Unnamed: 306_level_1,Unnamed: 307_level_1,Unnamed: 308_level_1,Unnamed: 309_level_1,Unnamed: 310_level_1,Unnamed: 311_level_1,Unnamed: 312_level_1,Unnamed: 313_level_1,Unnamed: 314_level_1,Unnamed: 315_level_1,Unnamed: 316_level_1,Unnamed: 317_level_1,Unnamed: 318_level_1,Unnamed: 319_level_1,Unnamed: 320_level_1,Unnamed: 321_level_1,Unnamed: 322_level_1,Unnamed: 323_level_1,Unnamed: 324_level_1,Unnamed: 325_level_1,Unnamed: 326_level_1,Unnamed: 327_level_1,Unnamed: 328_level_1,Unnamed: 329_level_1,Unnamed: 330_level_1,Unnamed: 331_level_1,Unnamed: 332_level_1,Unnamed: 333_level_1,Unnamed: 334_level_1,Unnamed: 335_level_1,Unnamed: 336_level_1,Unnamed: 337_level_1,Unnamed: 338_level_1,Unnamed: 339_level_1,Unnamed: 340_level_1,Unnamed: 341_level_1,Unnamed: 342_level_1,Unnamed: 343_level_1,Unnamed: 344_level_1,Unnamed: 345_level_1,Unnamed: 346_level_1,Unnamed: 347_level_1,Unnamed: 348_level_1,Unnamed: 349_level_1,Unnamed: 350_level_1,Unnamed: 351_level_1,Unnamed: 352_level_1,Unnamed: 353_level_1,Unnamed: 354_level_1,Unnamed: 355_level_1,Unnamed: 356_level_1,Unnamed: 357_level_1,Unnamed: 358_level_1,Unnamed: 359_level_1,Unnamed: 360_level_1,Unnamed: 361_level_1,Unnamed: 362_level_1,Unnamed: 363_level_1,Unnamed: 364_level_1,Unnamed: 365_level_1,Unnamed: 366_level_1,Unnamed: 367_level_1,Unnamed: 368_level_1,Unnamed: 369_level_1,Unnamed: 370_level_1,Unnamed: 371_level_1
338290,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.478261,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.956522,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.695652,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
754310,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.954545,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.954545,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.954545,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.227273,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
639720,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.956522,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
977290,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.909091,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
627410,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.96875,0.0,0.0,0.0,0.0,0.0,0.0,0.34375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.96875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.34375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.34375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.96875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.34375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.34375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.34375,0.0,0.34375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Transforming Rating Data

To better utilize the rating data and incorporate the number of owners as a weighting method for the rating, the `rating_df` DataFrame is transformed as follows:

### Data Transformation Steps
1. Compute a new column `rating_score` by dividing the positive ratings by the total number of ratings.
2. Convert the 'owners' column into a normalized scale based on the number of owners.
3. Compute the `weighted_rating` by multiplying the `rating_score` and the normalized `owners` values.

The advantages of this approach include:
- The rating score provides a more meaningful metric to compare games by accounting for both positive and negative ratings.
- The weighted rating takes into consideration the number of owners, giving more weight to popular games and mitigating the impact of games with a smaller user base that may have skewed ratings.



In [29]:
rating_df['rating_score'] = rating_df['positive_ratings']/(rating_df['positive_ratings'] + 
                                                           rating_df['negative_ratings'])
rating_df.drop(columns=['positive_ratings', 'negative_ratings'], inplace=True)
rating_df.sample(5)

Unnamed: 0,owners,rating_score
551750,0-20000,0.785714
656740,0-20000,0.965517
957390,0-20000,0.75
304650,500000-1000000,0.828567
776000,0-20000,0.636364


In [30]:
display(rating_df['owners'].value_counts())
rating_df['owners'] = rating_df['owners'].apply(lambda x: 0.6 if x=='0-20000' else 
                                                          0.8 if x=='20000-50000' else 
                                                          0.9 if x=='50000-100000' else 
                                                          1)
display(rating_df['owners'].value_counts())

rating_df['weighted_rating'] = rating_df['rating_score']*rating_df['owners']
rating_df.drop(columns=['owners','rating_score'], inplace=True)
rating_df.sample(5)

0-20000                18129
20000-50000             3012
50000-100000            1671
100000-200000           1368
200000-500000           1268
500000-1000000           512
1000000-2000000          284
2000000-5000000          191
5000000-10000000          45
10000000-20000000         21
20000000-50000000          3
50000000-100000000         2
100000000-200000000        1
Name: owners, dtype: int64

0.6    18129
1.0     3695
0.8     3012
0.9     1671
Name: owners, dtype: int64

Unnamed: 0,weighted_rating
731520,0.6
342660,0.4
231140,0.569004
523210,0.698113
778150,0.507692


## Transforming Release Date Data

To better utilize the release date data, the `date_df` DataFrame is transformed as follows:

### Data Transformation Steps
1. Convert the 'release_date' column to a datetime format.
2. Compute 'days_since_release' by calculating the number of days between the release date and the current date.
3. Normalize the 'days_since_release' column using MinMaxScaler.



In [31]:
date_df['release_date'] = pd.to_datetime(date_df['release_date'])

reference_date = datetime.now()
date_df['days_since_release'] = (reference_date - date_df['release_date']).dt.days

scaler = MinMaxScaler()
date_df['days_norm'] = scaler.fit_transform(date_df[['days_since_release']])

date_df.drop(columns=['release_date', 'days_since_release'], inplace=True)

date_df.sample(5)

Unnamed: 0,days_norm
461350,0.07348
972740,0.021066
884410,0.035862
968470,0.021693
615910,0.086646


## Cleaning and Transforming Developer Data

The `dev_df` DataFrame contains many developer names with variations, additional characters, and sub-branches.

After exloring all developers, in order to better utilize this data, the following steps are taken:

1. Remove any irrelevant characters from the developer names.
2. Combine sub-branches or variations of major developers, such as CAPCOM, 2K, and UBISOFT, into a single developer name.
3. Convert the cleaned string of developer names into a list to utilize Jaccard similarity later on.


In [32]:
dev_df.sample(5)

Unnamed: 0,developer
849870,"MAMMOSSIX Co., Ltd."
720250,Reflect Studios
50910,Big Fish Games
709350,Mad Head Games
494990,AK84C


In [33]:
def devlist(string):
    
    for char in '•.\',()?!/&\"+:[]_-{}®©':
        string = string.replace(char, '')

    for s in ['BANDAI NAMCO Entertainment Inc','BANDAI NAMCO Studio Inc','BANDAI NAMCO Studios Inc','BANDAI NAMCO Studios Vancouver','BANDAI NAMCO Studios']:
        string = string.replace(s, 'BANDAI NAMCO')
    for s in ['Aspyr Mac Linux  Windows Update','Aspyr Linux','Aspyr Mac  Linux','Aspyr Mac']:
        string = string.replace(s, 'Aspyr')
    for s in ['2K Australia','2K Boston','2K China','2K Czech','2K Marin']:
        string = string.replace(s, '2K')
    for s in ['Ubisoft  Shanghaï','Ubisoft - San Francisco','Ubisoft Annecy','Ubisoft Belgrade','Ubisoft Blue Byte','Ubisoft Bucharest','Ubisoft Bulgaria','Ubisoft Entertainment','Ubisoft Kiev','Ubisoft Milan','Ubisoft Montpellier','Ubisoft Montreal','Ubisoft Montreal Studio','Ubisoft Montreal, Massive Entertainment, and Ubisoft Shanghai','Ubisoft Montreal, Red Storm, Shanghai, Toronto, Kiev','Ubisoft Montréal','Ubisoft Paris','Ubisoft Pune','Ubisoft Quebec','Ubisoft Quebec, in collaboration with Ubisoft Annecy, Bucharest, Kiev, Montreal, Montpellier, Shanghai, Singapore, Sofia, Toronto studios','Ubisoft Reflections','Ubisoft Romania','Ubisoft San Francisco','Ubisoft Shanghai','Ubisoft Singapore','Ubisoft Sofia','Ubisoft Toronto']:
        string = string.replace(s, 'Ubisoft')
        
    for s in ['Rockstar Leeds','Rockstar New England','Rockstar North','Rockstar North / Toronto','Rockstar Studios','Rockstar Toronto']:        
        string = string.replace(s, 'Rockstar Games')

    string = string.replace('Alternative Software Ltd','Alternative Software')        
    string = string.replace('Alternative Dreams Studios','Alternative Dreams')        
    string = string.replace('ARTDINK CORPORATION', 'ARTDINK')        
    string = string.replace('4 Fun Studio Inc','4 Fun Studio')              
    string = string.replace('FrameLineNetwork Kft','FrameLineNetwork',)
    string = string.replace('Flight Systems LLC', 'Flight Systems',)
    string = string.replace('Feral Interactive MacLinux', 'Feral interactive')
    string = string.replace('Feral interactive Mac', 'Feral interactive')
    string = string.replace('Feral Interactive Linux', 'Feral interactive')
    string = string.replace('FarSight Studios Inc', 'FarSight Studios')
    string = string.replace('FELISTELLA Co Ltd','FELISTELLA')
    string = string.replace('ERS Game Studios','ERS GStudios')
    string = string.replace('ERS Games Studio','ERS GStudios')
    string = string.replace('Evil Tortilla Games Incorporated','Evil Tortilla Games')
    string = string.replace('ECC GAMES SP Z OO',  'ECC GAMES SA')
    string = string.replace('Deceptive Games Ltd', 'Deceptive Games')
    string = string.replace('CyberConnect2 Co Ltd', 'CyberConnect2')
    string = string.replace('CAPCOM CO LTD', 'CAPCOM')
    string = string.replace('CAPCOM Co Ltd', 'CAPCOM')
    string = string.replace('VRROOM Ultimate VR Experiences  BV', 'VRROOM Ultimate VR Experiences')
    string = string.replace('Subaltern Games LLC','Subaltern Games')
    string = string.replace('Stumphead GamesLLC','Stumphead Games')
    string = string.replace('Stainless GamesLtd','Stainless Games')
    string = string.replace('Square Enix Montréal', 'Square Enix')
    string = string.replace('Spark Plug Games LLC','Spark Plug Games')
    string = string.replace('Spaces of Play UG','Spaces of Play')
    string = string.replace('Sanzaru Games Inc','Sanzaru Games')
    string = string.replace('Rocketeer Games Studio LLC','Rocketcat Games')
    string = string.replace('Random Thoughts Enterainment','Random Thoughts Entertainment')
    string = string.replace('Napoleon Games sro', 'Napoleon Games')
    string = string.replace('Monolith Productions, Inc','Monolith Productions')
    string = string.replace('Modern Dream Ltd','Modern Dream')
    string = string.replace('McMagic Productions sro','McMagic Productions')
    string = string.replace('Kverta Limited', 'Kverta')
    string = string.replace('Kool2Play Sp z oo', 'Kool2Play')
    string = string.replace('Independent Arts Software GmbH','Independent Arts Software')
    string = string.replace('Immersive VR Education Ltd','Immersive VR Education PLC')
    string = string.replace('FromSoftware Inc', 'FromSoftware')
    string = string.replace('Frima Studio Inc','Frima')
    string = string.replace('Frima Studio','Frima')

    string = string.lower()
    string = string.replace(' ', '')
                              
    return string.split(";")

In [34]:
dev_df['dev_list'] = dev_df['developer'].apply(devlist)
dev_df.drop(columns=['developer'], inplace=True)

dev_df.sample(5)

Unnamed: 0,dev_list
7220,[pendulostudios]
464500,[frontwing]
428240,[sharktreestudios]
827270,[salsawi]
6420,"[mithisgames, thqnordic]"


## Cleaning and Transforming Publisher Data

Similar to the developer data, the `pub_df` DataFrame also contains variations and additional characters in the publisher names. Therefore, to better utilize this data, the same steps are taken:

1. Remove any irrelevant characters from the publisher names.
2. Combine sub-branches or variations of major publishers, such as CAPCOM, and UBISOFT, into a single publisher name.
3. Convert the cleaned string of publisher names into a list to utilize Jaccard similarity later on.


In [35]:
def publist(string):
    
    for char in '•.\',\\()?!/&\"#+:[]_-{}=%®©@$':
        string = string.replace(char, '')
        
    for s in ['Aspyr Mac Linux','Aspyr Linux','Aspyr Mac']:
        string = string.replace(s, 'Aspyr')

    for s in ['Bandai Namco', 'Bandai Namco Entertainment', 'BANDAI NAMCO Entertainment America', 'BANDAI NAMCO Entertainment Europe', 'BANDAI NAMCO Entertainment', 'BANDAI NAMCO Entertainement',]:
        string = string.replace(s, 'BANDAI NAMCO')
       
    for s in ['Bethesda Softworks','Bethesda-Soft','Bethesda-Softworks']:
        string = string.replace(s, 'Bethesda')
 
    for s in ['Dovetail Games - Fishing','Dovetail Games - Flight','Dovetail Games - TSW','Dovetail Games - Trains',]:
        string = string.replace(s, 'Dovetail Games')
        
    for s in ['CAPCOM CO LTD','CAPCOM Co Ltd','Capcom Co Ltd','Capcom USA Inc']:
        string = string.replace(s, 'CAPCOM')
  
    for s in ['Gaijin Distribution KFT','Gaijin Entertainment','Gaijin Entertainment Corporation','Gaijin inCubator']:
        string = string.replace(s, 'Gaijin Games')
    
    for s in ['Konami Digital Entertainement GmbH', 'Konami Digital Entertainment', 'Konami Digital Entertainment GmbH','Konami Digital Entertainment Inc']:
        string = string.replace(s, 'Konami')
   
    string = string.replace('Big Ant Studios Steam', 'Big Ant Studios')
    string = string.replace( 'Big Fish Games Inc',  'Big Fish Games')
    string = string.replace('Bitbox SL', 'Bitbox Ltd')
    string = string.replace('Blazing Griffin Ltd','Blazing Griffin')
    string = string.replace('Blob Games Studio','Blob Games')
    string = string.replace('CM Softworks Inc','CM Softworks')
    string = string.replace('Cartoon Network Games', 'Cartoon Network')
    string = string.replace('CasGames', 'CasGame')
    string = string.replace('Chorus Worldwide Games Limited','Chorus Worldwide')
    string = string.replace('Circle 5 Studios','Circle 5')
    string = string.replace('Crazy Rocks Studios','Crazy Rocks')
    string = string.replace('Crazysoft Limited','Crazysoft Ltd')
    string = string.replace('Crian Soft SA','Crian Soft')
    string = string.replace('Empyrean Interactive','Empyrean')
    string = string.replace('EuroVideo Medien GmbH','EuroVideo Medien')
    string = string.replace('Fair Games Studio','Fair Games')
    string = string.replace('Fantasy Flight Publishing Inc', 'Fantasy Flight Publishing')
    string = string.replace('FarSight Studios Inc',  'FarSight Studios')
    string = string.replace('Fatbot Games s r o', 'Fatbot Games')
    string = string.replace('Fatmoth Interactive','Fatmoth')
    string = string.replace('Feral Interactive MacLinux', 'Feral Interactive')
    string = string.replace('Feral Interactive Linux', 'Feral Interactive')
    string = string.replace('Feral Interactive Mac', 'Feral Interactive')
    string = string.replace('Five Mind Creations UG haftungsbeschraenkt','Five Mind Creations')
    string = string.replace('Fixpoint Productions Ltd', 'Fixpoint Productions')
    string = string.replace('Flight Systems LLC','Flight Systems')
    string = string.replace('Forthright Entertainment LLC','Forthright Entertainment')
    string = string.replace('Fossil Games','Fossil')
    string = string.replace('Frima Originals','Frima')
    string = string.replace('Frima Studio','Frima')
    string = string.replace('FromSoftware Japan','FromSoftware')
    string = string.replace('FromSoftware Inc','FromSoftware')
    string = string.replace('Game Troopers SL','Game Troopers')
    string = string.replace('Gameforge 4D GmbHu202c', 'Gameforge 4D GmbH')
    string = string.replace('GungHo Online Entertainment America Inc','GungHo Online Entertainment America')
    string = string.replace('Hazardous Software Inc','Hazardous Software')
    string = string.replace('Idea Factory International Inc','Idea Factory')
    string = string.replace('Idea Factory International','Idea Factory')
    string = string.replace('Immanitas Entertainment GmbH','Immanitas Entertainment')
    string = string.replace('Immanitas Entertainment GmbH','Immanitas Entertainment PLC')
    string = string.replace('Kagura Games Chinese Localization','Kagura Games')
    string = string.replace('Kerberos Productions Inc','Kerberos Productions')
    string = string.replace('Kool2Play Sp z oo','Kool2Play')
    string = string.replace('LB Studios','LB')
    string = string.replace('Lemondo Entertainment', 'Lemondo Games')
    string = string.replace('Lofty','Loft')
    string = string.replace('MAGES Inc','MAGES')
    string = string.replace('MGP Studios', 'MGP')
    string = string.replace('MK game production', 'MK Games')
    string = string.replace('MK-ULTRA Games', 'MK Games')
    string = string.replace('MLBcom','MLB')
    string = string.replace('Mayflower Entertainment KR','Mayflower Entertainment')
    string = string.replace('McMagic Productions sro','McMagic Productions')
    string = string.replace('NS STUDIO','NS')
    string = string.replace('Nexon America Inc','Nexon',)
    string = string.replace('Nexon America','Nexon',)
    string = string.replace('Nexon Korea Corporation','Nexon')
    string = string.replace('Oddworld Inhabitants Inc','Oddworld Inhabitants')
    string = string.replace('Outright Games Ltd', 'Outright Games')
    string = string.replace('Perfect Square Studios LLC','Perfect Square Studios')
    string = string.replace('Praxia Entertainment Inc','Praxia Entertainment')
    string = string.replace('SelfPublished','Selfp')
    string = string.replace('Ubisoft Entertainment','Ubisoft')
    string = string.replace( 'Viva Media Inc', 'Viva Media')
    string = string.replace('Warner Bros Interactive Entertainment','Warner Bros')
    string = string.replace('Warner Bros Interactive','Warner Bros')
    string = string.replace('crowgames UG haftungsbeschränkt','crowgames')
    string = string.replace('方块游戏 Asia', '方块游戏')
    string = string.replace('方块游戏CubeGame', '方块游戏')

    string = string.lower()
    string = string.replace(' ', '')

    if string == '':
        string = 'unknown'
        
    return string.split(";")

In [36]:
pub_df['pub_list'] = pub_df['publisher'].apply(publist)
pub_df.drop(columns=['publisher'], inplace=True)
pub_df.sample(5)

Unnamed: 0,pub_list
985930,[watertemplestudio]
502500,[bandainamco]
881250,[fyrg]
544790,[aelentertainment]
997280,[flashynurav]


## Processing the description dataframe

The des_df DataFrame contains the 'about_the_game' descriptions for the games. To convert these descriptions into numerical vectors that can be used for similarity calculations, I used the doc.vector to call tex embedding vectorizer function, from  the SpaCy library, obtaining a 300-dimensional  vector representation of text data. This method is useful because it takes into account the context and the meaning of the words.

Using SpaCy's doc.vector is useful because it provides a compact numerical representation of the text, which can be utilized for similarity calculations in the recommendation system.

SpaCy is a popular open-source library for natural language processing in Python. It is designed to be fast, efficient, and easy to use, offering various capabilities such as tokenization, part-of-speech tagging, dependency parsing, and more.

The advantage of using SpaCy for processing game descriptions is that it simplifies the process of converting text data into numerical vectors while maintaining the semantic relationships between the words. This allows the recommendation system to effectively capture the similarities between games based on their descriptions.

In [37]:
def get_vector(text):
    doc = nlp(text)
    return doc.vector

start_time = time()

des_vectors = des_df['about_the_game'].apply(get_vector)

print(f"Elapsed time: {time()-start_time} seconds")

des_vectors_df = pd.DataFrame(des_vectors.to_list())
des_vectors_df = des_vectors_df.rename(columns={(i): f'description V_{i+1}' for i in range(0, len(des_vectors_df.columns)+1)})
des_vectors_df.index = des_df.index
des_vectors_df.sample(5)

Elapsed time: 839.2062339782715 seconds


Unnamed: 0,description V_1,description V_2,description V_3,description V_4,description V_5,description V_6,description V_7,description V_8,description V_9,description V_10,description V_11,description V_12,description V_13,description V_14,description V_15,description V_16,description V_17,description V_18,description V_19,description V_20,description V_21,description V_22,description V_23,description V_24,description V_25,description V_26,description V_27,description V_28,description V_29,description V_30,description V_31,description V_32,description V_33,description V_34,description V_35,description V_36,description V_37,description V_38,description V_39,description V_40,description V_41,description V_42,description V_43,description V_44,description V_45,description V_46,description V_47,description V_48,description V_49,description V_50,description V_51,description V_52,description V_53,description V_54,description V_55,description V_56,description V_57,description V_58,description V_59,description V_60,description V_61,description V_62,description V_63,description V_64,description V_65,description V_66,description V_67,description V_68,description V_69,description V_70,description V_71,description V_72,description V_73,description V_74,description V_75,description V_76,description V_77,description V_78,description V_79,description V_80,description V_81,description V_82,description V_83,description V_84,description V_85,description V_86,description V_87,description V_88,description V_89,description V_90,description V_91,description V_92,description V_93,description V_94,description V_95,description V_96,description V_97,description V_98,description V_99,description V_100,description V_101,description V_102,description V_103,description V_104,description V_105,description V_106,description V_107,description V_108,description V_109,description V_110,description V_111,description V_112,description V_113,description V_114,description V_115,description V_116,description V_117,description V_118,description V_119,description V_120,description V_121,description V_122,description V_123,description V_124,description V_125,description V_126,description V_127,description V_128,description V_129,description V_130,description V_131,description V_132,description V_133,description V_134,description V_135,description V_136,description V_137,description V_138,description V_139,description V_140,description V_141,description V_142,description V_143,description V_144,description V_145,description V_146,description V_147,description V_148,description V_149,description V_150,description V_151,description V_152,description V_153,description V_154,description V_155,description V_156,description V_157,description V_158,description V_159,description V_160,description V_161,description V_162,description V_163,description V_164,description V_165,description V_166,description V_167,description V_168,description V_169,description V_170,description V_171,description V_172,description V_173,description V_174,description V_175,description V_176,description V_177,description V_178,description V_179,description V_180,description V_181,description V_182,description V_183,description V_184,description V_185,description V_186,description V_187,description V_188,description V_189,description V_190,description V_191,description V_192,description V_193,description V_194,description V_195,description V_196,description V_197,description V_198,description V_199,description V_200,description V_201,description V_202,description V_203,description V_204,description V_205,description V_206,description V_207,description V_208,description V_209,description V_210,description V_211,description V_212,description V_213,description V_214,description V_215,description V_216,description V_217,description V_218,description V_219,description V_220,description V_221,description V_222,description V_223,description V_224,description V_225,description V_226,description V_227,description V_228,description V_229,description V_230,description V_231,description V_232,description V_233,description V_234,description V_235,description V_236,description V_237,description V_238,description V_239,description V_240,description V_241,description V_242,description V_243,description V_244,description V_245,description V_246,description V_247,description V_248,description V_249,description V_250,description V_251,description V_252,description V_253,description V_254,description V_255,description V_256,description V_257,description V_258,description V_259,description V_260,description V_261,description V_262,description V_263,description V_264,description V_265,description V_266,description V_267,description V_268,description V_269,description V_270,description V_271,description V_272,description V_273,description V_274,description V_275,description V_276,description V_277,description V_278,description V_279,description V_280,description V_281,description V_282,description V_283,description V_284,description V_285,description V_286,description V_287,description V_288,description V_289,description V_290,description V_291,description V_292,description V_293,description V_294,description V_295,description V_296,description V_297,description V_298,description V_299,description V_300
877550,-0.649386,0.224372,-0.169371,-0.092183,-0.137173,-0.00635,-0.026544,-0.126208,0.035112,1.690713,-0.165936,-0.00752,0.042527,0.10544,-0.105865,-0.072799,-0.081827,1.151265,-0.197947,-0.018468,-0.020434,0.076008,0.003608,-0.183293,-0.01691,-0.007654,-0.059822,-0.117105,-0.018925,-0.078407,-0.088125,0.01984,-0.009074,-0.058728,0.080029,0.150851,-0.093613,-0.00608,0.017362,-0.028141,0.051252,0.030481,0.048463,-0.0522,0.035033,-0.02779,0.002606,0.000645,-0.058073,0.029554,-0.091156,0.073372,0.079692,-0.028763,0.0356,-0.027811,0.007625,-0.079471,-0.028919,-0.04581,-0.080888,-0.152314,0.020351,0.124716,0.15861,-0.077904,0.036878,0.080682,0.072709,-0.083222,0.13269,0.088787,0.164616,-0.042953,0.051268,0.135773,0.104722,-0.047216,0.063846,0.277432,-0.054022,0.025642,0.028132,-0.048895,0.030594,-0.115971,0.24812,0.007796,0.132818,-0.083156,-0.076292,-0.032962,-0.100542,-0.02089,-0.04057,-0.183332,0.067861,0.008908,0.047232,0.009795,-0.097037,0.032278,-0.187841,0.076145,0.227575,-1.110301,0.081972,0.053068,-0.06573,-0.033452,-0.020543,-0.214448,0.047634,-0.101459,-0.042403,-0.018527,-0.033634,-0.112555,-0.021737,-0.062102,0.160297,0.042643,-0.017503,-0.102642,-0.058316,-0.020575,-0.000879,-0.055476,-0.13496,-0.06042,0.071668,-0.015889,0.03242,0.070398,0.104451,0.016416,-0.155071,0.030909,-0.047037,-0.036267,-1.073535,0.14803,0.119607,-0.01842,-0.042086,-0.072983,-0.048722,0.059273,0.08116,0.012449,0.096475,0.085941,-0.018025,-0.044302,0.03369,-0.050368,-0.05047,0.110347,-0.056966,-0.150318,-0.027803,-0.011558,0.064578,-0.030345,-0.144209,-0.01958,0.102926,0.056701,0.068235,-0.045244,-0.022948,-0.079385,0.007999,-0.096348,0.021549,0.007065,-0.092538,0.086048,0.100618,0.011288,0.029184,-0.04577,-0.254859,0.026787,-0.12471,0.076762,-0.038709,-0.028446,0.052395,0.084325,0.149372,-0.017406,-0.038108,-0.008935,-0.054106,-0.088284,-0.006665,0.00594,-0.132082,0.12948,0.180135,-0.183032,-0.058148,-0.022558,-0.076067,-0.034787,0.029743,0.044697,0.025878,0.150974,0.06411,0.052617,-0.057285,-0.139285,0.031531,0.030642,0.064393,0.104451,-0.163035,0.042552,-0.06195,0.000155,-0.186792,-0.160666,0.018428,-0.090888,-0.056562,-0.005985,0.093922,-0.010793,0.064704,-0.123915,0.131797,0.042035,0.013927,0.037063,0.106059,-0.056894,-0.044943,0.015089,-0.000715,-0.006664,-0.011467,0.14565,0.101,-0.093872,0.041384,-0.162348,-0.02208,0.097572,0.036966,-0.045655,0.054232,-0.100686,0.008708,0.25803,-0.062083,-0.067546,0.017579,0.056415,-0.064178,0.085817,0.026065,0.153355,0.123855,-0.081383,-0.017336,0.085599,0.483322,0.083296,0.08355,-0.070606,-0.014624,-0.171636,-0.128808,-0.002146,-0.077234,-0.028947,-0.066267,0.238523,0.014185,0.010927,-0.038806,0.051486,-0.006357,0.13962,0.066644,-0.144642,0.012529,0.057424,-0.245627,0.043884,-0.044055,-0.03139,0.033107,0.073119,-0.020698,-0.111485,0.093064,0.070821
289050,-0.678563,0.109212,-0.092267,-0.070556,-0.116048,-0.024126,0.105354,-0.193509,-0.013823,1.666345,-0.159772,-0.110628,-0.066894,-0.033297,-0.155726,0.002756,-0.02993,0.577382,-0.13593,0.015403,-0.041847,-0.009819,0.014021,-0.196949,0.014128,0.021051,-0.061195,-0.080006,0.011321,-0.095713,-0.081856,0.065284,-0.121693,0.004392,0.142251,0.102765,-0.043185,0.008789,-0.068737,-0.021715,0.042829,-0.078851,0.039093,0.046695,0.003368,-0.027253,-0.004078,0.063503,-0.03766,0.030557,-0.078272,0.192679,0.017108,-0.04916,-0.067176,0.052986,-0.029297,-0.03603,-0.000722,-0.054326,-0.128068,-0.093689,-0.058511,0.057572,0.155103,-0.092667,-0.007418,-0.026682,0.0556,-0.079925,0.126413,0.015118,0.091126,-0.034464,0.01049,0.112926,0.044572,0.00661,-0.008945,0.203252,-0.044438,0.035402,0.028809,-0.038101,-0.034707,-0.067619,0.43615,-0.238241,-0.092106,-0.062949,-0.041795,0.01779,-0.137055,-0.006438,0.077574,-0.16482,0.059735,0.000913,0.035635,0.062196,-0.052171,0.098422,-0.073205,-0.016653,0.225056,-0.821774,0.000957,0.014453,-0.04518,-0.037843,0.003215,-0.205181,0.008631,-0.137966,-0.128221,-0.026102,0.033511,-0.122218,-0.099582,-0.073982,0.086485,0.129198,-0.069175,-0.134088,-0.074349,0.016677,-0.002658,-0.065289,-0.108819,-0.036872,-0.062756,0.002367,0.050106,0.071377,0.118452,-0.101748,-0.226056,-0.029189,-0.015864,-0.012954,-1.604307,0.068893,0.162356,-0.003724,-0.063861,-0.101299,-0.003627,0.055232,0.045025,-0.032367,0.009059,0.095598,0.034792,0.008972,0.061779,0.006628,-0.063896,0.031445,0.055619,-0.180806,0.006739,-0.079237,0.107879,0.064798,-0.087633,-0.108104,0.057403,0.029834,-0.035126,-0.109145,-0.007089,-0.035582,0.06156,-0.00745,0.10845,-0.00366,-0.158402,0.109052,0.093464,0.097649,0.031098,0.004956,-0.218537,0.027296,-0.029442,0.079217,0.01268,0.122902,0.126523,0.088914,0.178515,-0.026301,0.011178,0.044579,-0.102857,-0.05103,0.026012,-0.060497,-0.090947,0.077025,0.055722,-0.074165,-0.043142,-0.048438,-0.061295,-0.026774,0.024332,0.061668,0.014311,0.014354,-0.083661,0.098248,-0.016231,-0.052086,-0.000631,0.022981,0.087404,0.037593,-0.087499,-0.097873,-0.04552,0.02242,-0.244613,-0.12865,0.073027,-0.08922,-0.037717,0.002689,0.086046,-0.054995,-0.043244,-0.063563,0.134027,0.034764,0.067069,0.046736,0.069896,-0.050472,0.029908,0.023652,-0.011208,-0.059376,-0.021085,0.113839,0.06222,-0.013458,-0.047442,-0.096473,-0.009312,0.169313,-0.034111,0.017541,0.092576,-0.105726,-0.025772,0.057717,-0.069933,-0.055698,0.017105,0.039777,-0.112217,0.040816,0.042099,0.100657,0.16497,0.007267,-0.039633,0.102633,0.287264,0.144313,-0.009256,-0.049762,-0.11417,-0.176824,-0.090248,0.018588,0.026242,-0.092021,-0.069778,0.266826,0.025888,0.032228,0.020574,0.038715,0.046362,0.090846,0.102288,-0.147711,0.067033,0.000194,-0.123144,0.071683,-0.090549,-0.020991,0.051442,0.116055,-0.024489,-0.105676,-0.041447,0.016192
949520,-0.68106,0.165796,-0.168786,-0.068867,-0.097624,-0.01324,0.002835,-0.033152,-0.0326,1.785456,-0.147226,-0.058406,-0.040769,0.073718,-0.110687,0.00864,-0.120204,0.967662,-0.147418,-0.041143,-0.112475,0.013378,-0.024542,-0.179935,-0.03818,-0.057666,0.003302,-0.030666,-0.01498,-0.0896,-0.135664,0.029013,-0.064283,-0.13213,0.116652,0.154471,-0.050914,-0.012791,-0.053942,0.00299,0.076852,-0.047015,0.028102,0.008825,0.116441,-0.089869,-0.062812,0.057587,-0.064002,0.002562,-0.091362,0.15008,0.054921,-0.029696,0.057744,-0.006113,-0.035607,-0.04276,-0.033376,0.016411,-0.09516,-0.117747,-0.006239,0.11086,0.092427,-0.124298,0.013267,0.003392,0.039793,-0.073724,0.116131,0.122982,0.084283,-0.055603,0.024551,0.092552,0.057962,0.011616,0.003328,0.100276,-0.059413,0.100089,0.018891,-0.075003,0.023544,-0.177783,0.294543,0.002146,-0.003182,-0.094055,-0.072219,-0.016578,-0.099222,0.006447,-0.017512,-0.133177,0.092617,0.000649,0.077833,0.018569,-0.056202,0.086558,-0.117845,0.087635,0.209523,-1.288744,0.008774,-0.019957,-0.053178,-0.049395,0.044294,-0.163504,-0.005612,-0.163518,-0.050044,-0.008997,0.009156,-0.155352,-0.038218,-0.083841,0.051065,0.082182,0.002979,-0.184436,-0.048802,-0.057308,0.028518,-0.062657,-0.12169,-0.033746,0.049186,-0.098152,0.019104,0.102993,0.122453,-0.008965,-0.164927,-0.00508,0.000277,-0.016095,-1.199968,0.173009,0.136468,0.003709,-0.050728,-0.071419,-0.016197,0.084068,0.060479,0.028094,0.012008,0.125171,0.06146,-0.023621,0.04884,-0.001949,-0.073082,0.111162,-0.011045,-0.232446,0.0224,-0.012358,0.092508,0.026855,-0.051834,-0.011772,0.126912,-0.010712,-0.001322,-0.080235,-0.006948,-0.092152,0.046607,0.009149,0.073571,0.027238,-0.09082,0.039793,0.113466,0.02146,0.025584,0.01916,-0.236894,0.067832,-0.06994,0.020321,-0.01469,0.010183,0.059359,0.040236,0.210519,-0.067346,0.005344,-0.000326,-0.100435,-0.045203,-0.005489,-0.019823,-0.115117,0.087028,0.1059,-0.057116,-0.055341,0.02497,-0.045354,-0.022844,0.045031,0.06149,0.05745,0.060054,-0.006977,0.043139,-0.008951,-0.122099,0.005003,-0.004666,0.095825,0.06928,-0.088716,-0.09422,-0.026982,0.070224,-0.147054,-0.172039,0.068861,-0.106193,-0.050245,-0.019685,0.016759,0.042627,0.060573,-0.088878,0.139842,0.041728,0.046463,0.028395,0.065652,-0.094737,0.004241,-0.054678,0.007783,-0.020669,-0.028855,0.140269,0.075509,-0.04938,-0.030435,-0.16057,-0.059293,0.16095,-0.021707,0.002986,0.068587,-0.061447,-0.008782,0.239292,0.004756,-0.034328,0.104294,0.09751,-0.132555,0.029582,0.036944,0.084401,0.1708,-0.067589,-0.007728,0.061411,0.295613,0.132427,0.102741,-0.075052,-0.089357,-0.198584,-0.137018,-0.000655,-0.018674,0.038456,-0.069301,0.271264,0.009263,0.004649,-0.040036,0.037825,0.01566,0.110354,0.093324,-0.182719,0.075589,0.013894,-0.217667,0.09836,-0.065591,-0.071346,-0.005297,0.171375,-0.029797,-0.141184,0.011374,-0.013089
886780,-0.723686,0.164918,-0.131152,-0.066012,-0.119403,-0.014556,0.025716,-0.052308,0.049928,1.868257,-0.163588,-0.045404,-0.020411,0.091899,-0.017153,-0.010425,-0.077777,0.890749,-0.210757,0.002479,-0.020416,0.05631,0.013753,-0.204423,-0.043834,-0.05366,0.009623,-0.063982,-0.046756,-0.119715,-0.080728,0.084166,-0.073365,-0.060282,0.129845,0.099481,-0.076116,0.016798,-0.034506,0.052107,-0.008637,-0.081529,0.023827,-0.00858,0.105114,0.00625,0.005983,0.113867,-0.089978,-0.001447,-0.034452,0.142011,0.116864,-0.049447,-0.001382,0.053121,-0.004029,-0.021927,-3.8e-05,0.001132,-0.089642,-0.091047,-0.006227,0.058648,0.130851,-0.088845,0.044357,-0.000916,0.078049,-0.10955,0.092134,0.108277,0.119134,-0.082925,-0.007826,0.099479,0.098658,-0.073429,0.00796,0.21266,-0.070546,-0.019626,0.026468,-0.030213,0.034131,-0.082151,0.216889,-0.101102,-0.012796,-0.10772,-0.013703,-0.00792,-0.087186,0.051123,-0.005351,-0.166236,0.07911,0.018162,0.029315,0.03666,-0.038597,0.030275,-0.118477,0.011278,0.212128,-1.176506,0.061443,0.027816,-0.045041,0.017962,-0.030267,-0.20267,0.019475,-0.123596,-0.064855,0.05729,0.019679,-0.148597,-0.033888,-0.085452,0.114477,0.040542,-0.017032,-0.125351,-0.027341,-0.009539,-0.01286,-0.062135,-0.13277,-0.006001,-0.012048,-0.044032,0.04729,0.070807,0.129557,-0.035963,-0.16637,-0.011365,-0.045382,-0.024614,-1.354216,0.161241,0.085712,0.010078,-0.036693,-0.007498,-0.026778,0.064163,0.073626,-0.024493,-0.03025,0.131388,0.059747,-0.007005,-0.032529,-0.073974,-0.017553,0.110213,-0.021786,-0.172281,0.000107,0.010849,0.044472,0.034493,-0.070292,-0.067825,0.073044,-0.038468,-0.008674,-0.055527,-0.028196,-0.03408,0.082122,-0.033235,0.043809,0.006419,-0.102874,0.034654,0.18136,0.024187,0.009319,-0.008791,-0.204814,0.049321,-0.106441,0.021153,-0.007461,-0.021797,0.024696,0.150996,0.135688,-0.040321,-0.030447,0.047498,-0.050681,-0.093079,0.015193,-0.094193,-0.090784,0.136584,0.130653,-0.146277,0.000718,0.007981,-0.116778,-0.024586,-0.005659,0.084167,0.033936,0.048238,0.020066,0.056296,0.025807,-0.058774,-0.022232,-0.036466,0.116304,0.043381,-0.113566,0.00766,-0.027736,0.084799,-0.187572,-0.109295,0.011042,-0.127151,-0.075888,0.021356,0.079585,0.025478,-0.013408,-0.079472,0.12821,0.004822,-0.029085,-0.003305,0.01938,-0.086531,0.022649,-0.033449,-0.025566,-0.039737,-0.020381,0.117777,0.098218,-0.044115,-0.013237,-0.080241,-0.015788,0.187272,0.077214,-0.002124,0.021758,-0.019549,-0.061797,0.223339,-0.041037,-0.023931,0.034293,0.099071,-0.108518,0.053382,0.027428,0.13572,0.100821,-0.040613,-0.037502,0.074776,0.381035,0.087978,0.097046,-0.072669,-0.175758,-0.167479,-0.111243,-0.003511,-0.053624,-0.061003,-0.044033,0.294909,-0.002508,0.044555,8.4e-05,0.048923,0.026238,0.098434,0.074174,-0.15665,0.090625,0.024589,-0.146703,0.05231,-0.161986,-0.064251,-0.03708,0.132006,-0.044512,-0.146552,0.053996,0.053079
862770,-0.682849,0.170309,-0.231356,-0.094217,-0.136815,-0.068679,-0.002768,-0.122273,0.038507,2.028709,-0.135112,-0.101387,-0.057832,0.056614,-0.177816,-0.022841,-0.096315,0.709027,-0.19933,0.033213,-0.039119,0.032293,0.02269,-0.211917,-0.023907,-0.048426,-0.097131,-0.107827,-0.002865,-0.165915,-0.122423,0.068857,-0.159815,-0.068435,0.175638,0.152144,-0.042198,-0.00129,0.013768,0.042586,0.027681,-0.054343,-0.035703,-0.006189,0.072237,-0.083859,-0.051633,0.098153,-0.164706,0.026209,-0.105801,0.238401,0.04886,-0.008619,0.004743,0.050648,-0.081439,-0.068946,0.017485,-0.03899,-0.099314,-0.118938,-0.005817,0.07376,0.242943,-0.117606,0.019671,0.008802,0.100667,-0.06644,0.137499,0.092272,0.116666,-0.049707,-0.01424,0.11097,0.099217,-0.14135,-0.004902,0.231471,-0.114457,0.005726,0.022973,-0.066244,0.006616,-0.086779,0.307084,-0.203609,-0.054981,-0.118228,-0.066743,0.013136,-0.123476,0.007522,0.018734,-0.20533,0.090131,-0.036878,0.0549,0.027547,-0.093224,0.10292,-0.087827,0.020134,0.261152,-1.172634,-0.020813,0.006972,-0.064017,-0.035322,-0.009502,-0.219104,-0.016765,-0.206596,-0.112912,0.053294,0.031006,-0.183848,-0.110365,-0.121161,0.057413,0.073227,0.011546,-0.181135,-0.085888,-0.043857,0.004942,-0.060996,-0.107527,-0.001543,-0.055558,0.004822,0.038277,0.07714,0.074018,-0.112878,-0.181409,0.026935,-0.025865,-0.071927,-1.616935,0.198859,0.207191,-0.024448,-0.091263,-0.035404,-0.033805,0.053571,0.06093,-0.034319,0.000552,0.148003,0.042058,0.065134,0.015708,-0.025825,-0.117955,0.066899,0.031643,-0.182974,-0.008936,-0.057417,0.103808,0.075194,-0.064962,-0.068457,0.103667,0.010154,0.016475,-0.095694,-0.063738,-0.097768,0.045053,-0.064301,0.157072,0.004769,-0.14981,0.102744,0.124299,-0.005614,0.012693,-0.05933,-0.275666,0.137123,-0.095945,0.071269,-0.016135,0.008665,0.101784,0.150139,0.240777,-0.070375,-0.026369,0.021059,-0.103141,-0.068717,-0.039507,-0.080657,-0.114225,0.144231,0.082233,-0.094612,0.01241,0.01293,-0.044801,-0.010337,0.029677,0.065439,0.072764,0.08832,0.028538,0.094973,0.015604,-0.056339,0.010292,-0.055299,0.061106,0.075525,-0.190454,-0.107463,-0.029405,0.074189,-0.189891,-0.157183,0.053969,-0.159083,-0.074799,0.036963,0.027582,-0.019515,0.01208,-0.103504,0.165947,0.009393,0.039939,0.05942,0.13162,-0.043044,-0.016938,-0.034947,0.064932,0.009064,0.013792,0.222273,0.090078,-0.09888,-0.014916,-0.182763,-0.019419,0.259444,-0.004355,0.001964,0.058178,-0.064742,-0.053085,0.186741,-0.075461,-0.063339,0.070639,0.15446,-0.208072,0.059682,0.01961,0.1101,0.175815,-0.023355,-0.090083,0.097967,0.37402,0.148482,0.189816,-0.110123,-0.107294,-0.240783,-0.088121,0.045111,-0.028933,-0.03514,-0.009299,0.264282,-0.040687,-0.038236,-0.023089,0.034577,0.045744,0.112811,0.111821,-0.214316,0.072317,-0.043734,-0.195971,0.073442,-0.125935,-0.047837,-0.009339,0.236287,-0.04877,-0.142264,0.024141,0.030714


In [38]:
des_vectors_df[des_vectors_df.isna().any(axis=1)]

Unnamed: 0,description V_1,description V_2,description V_3,description V_4,description V_5,description V_6,description V_7,description V_8,description V_9,description V_10,description V_11,description V_12,description V_13,description V_14,description V_15,description V_16,description V_17,description V_18,description V_19,description V_20,description V_21,description V_22,description V_23,description V_24,description V_25,description V_26,description V_27,description V_28,description V_29,description V_30,description V_31,description V_32,description V_33,description V_34,description V_35,description V_36,description V_37,description V_38,description V_39,description V_40,description V_41,description V_42,description V_43,description V_44,description V_45,description V_46,description V_47,description V_48,description V_49,description V_50,description V_51,description V_52,description V_53,description V_54,description V_55,description V_56,description V_57,description V_58,description V_59,description V_60,description V_61,description V_62,description V_63,description V_64,description V_65,description V_66,description V_67,description V_68,description V_69,description V_70,description V_71,description V_72,description V_73,description V_74,description V_75,description V_76,description V_77,description V_78,description V_79,description V_80,description V_81,description V_82,description V_83,description V_84,description V_85,description V_86,description V_87,description V_88,description V_89,description V_90,description V_91,description V_92,description V_93,description V_94,description V_95,description V_96,description V_97,description V_98,description V_99,description V_100,description V_101,description V_102,description V_103,description V_104,description V_105,description V_106,description V_107,description V_108,description V_109,description V_110,description V_111,description V_112,description V_113,description V_114,description V_115,description V_116,description V_117,description V_118,description V_119,description V_120,description V_121,description V_122,description V_123,description V_124,description V_125,description V_126,description V_127,description V_128,description V_129,description V_130,description V_131,description V_132,description V_133,description V_134,description V_135,description V_136,description V_137,description V_138,description V_139,description V_140,description V_141,description V_142,description V_143,description V_144,description V_145,description V_146,description V_147,description V_148,description V_149,description V_150,description V_151,description V_152,description V_153,description V_154,description V_155,description V_156,description V_157,description V_158,description V_159,description V_160,description V_161,description V_162,description V_163,description V_164,description V_165,description V_166,description V_167,description V_168,description V_169,description V_170,description V_171,description V_172,description V_173,description V_174,description V_175,description V_176,description V_177,description V_178,description V_179,description V_180,description V_181,description V_182,description V_183,description V_184,description V_185,description V_186,description V_187,description V_188,description V_189,description V_190,description V_191,description V_192,description V_193,description V_194,description V_195,description V_196,description V_197,description V_198,description V_199,description V_200,description V_201,description V_202,description V_203,description V_204,description V_205,description V_206,description V_207,description V_208,description V_209,description V_210,description V_211,description V_212,description V_213,description V_214,description V_215,description V_216,description V_217,description V_218,description V_219,description V_220,description V_221,description V_222,description V_223,description V_224,description V_225,description V_226,description V_227,description V_228,description V_229,description V_230,description V_231,description V_232,description V_233,description V_234,description V_235,description V_236,description V_237,description V_238,description V_239,description V_240,description V_241,description V_242,description V_243,description V_244,description V_245,description V_246,description V_247,description V_248,description V_249,description V_250,description V_251,description V_252,description V_253,description V_254,description V_255,description V_256,description V_257,description V_258,description V_259,description V_260,description V_261,description V_262,description V_263,description V_264,description V_265,description V_266,description V_267,description V_268,description V_269,description V_270,description V_271,description V_272,description V_273,description V_274,description V_275,description V_276,description V_277,description V_278,description V_279,description V_280,description V_281,description V_282,description V_283,description V_284,description V_285,description V_286,description V_287,description V_288,description V_289,description V_290,description V_291,description V_292,description V_293,description V_294,description V_295,description V_296,description V_297,description V_298,description V_299,description V_300


## Storage of Precomputed Tables

The proposed recommendation system pipeline relies on the availability of precomputed, normalized tables. <br>
This approach is particularly beneficial due to the time-consuming nature of selecting English-only content and  Spacy doc to vectors. 

In order to efficiently utilize these tables, they can be computed beforehand and stored alongside the standard Steam tables provided. This will facilitate seamless incorporation of updated scores whenever any changes occur within the dataset. 

For the purpose of storing and retrieving these tables, the Pickle library will be utilized within the pipeline.

In [39]:
stored_tables = {'scores_df': scores_df, 
                 'dev_df': dev_df, 
                 'pub_df': pub_df,
                 'tag_df': tag_df,
                 'gen_df': gen_df, 
                 'cat_df': cat_df,
                 'rating_df': rating_df,
                 'date_df': date_df,
                 'des_vectors_df': des_vectors_df,
                 'name_df': name_df,
                 'title': df['name'],
                 'rel_date': df['release_date']}


with open('steam_eng_tables', 'wb') as f:
    pickle.dump(stored_tables, f)

# Pipeline

In [40]:
import numpy as np
import pandas as pd
import pickle

with open('steam_eng_tables', 'rb') as f:
    stored_tables = pickle.load(f)

## Calculating Similarity Scores

Functions to calculate similarity scores were created for each DataFrame using different methods based on the nature of the information stored:

- **Developers and Publishers DataFrames**: We calculated a score based on the number of common developers or publishers between the target game and the compared game, taking into account the total number of developers or publishers for each. A score of 1 is given if all developers or publishers are the same, 0.8 if at least half of the developers or publishers are the same, 0.5 if there is at least one common developer or publisher, and 0 otherwise. 


- **Tags, Genres, and Categories DataFrames**: We used the Jaccard similarity index to calculate a score based on the similarity of the tags, genres, or categories between the target game and the compared game. Having a multilabel list of tags and genres fits well with Jaccard similarity because it is a measure designed to compare sets, making it a suitable choice for situations where data is in the form of unordered lists.


- **Description and Tags DataFrames**: The cosine similarity is used to calculate a score based on the similarity of the game descriptions between the target game and the compared game. Having a large number of standardized continuous values between 0 and 1 or -1 and 1 fits well with cosine similarity because it effectively captures the angle between two vectors, providing a meaningful measure of similarity of the vectors.


- **Release Date DataFrame**: We used a simple score based on the proximity of the release date of the target game and the compared game. A smoothing factor has been added, applying the power of a float number between 0 and 1 to the original score (between 0 and 1) to smooth the values. This approach ensures that small differences in release dates do not lead to a significant impact on the overall similarity score.


- **Rating DataFrame**: The rating DataFrame stayed the same because it had been normalized previously, and the weighted rating already provides a meaningful metric for comparison.


In [41]:
def calculate_dev_score(game_devs, other_game_devs):
    intersection = set(game_devs) & set(other_game_devs)
    num_common_devs = len(intersection)
    num_devs = len(game_devs)
    num_other_dev = len(other_game_devs)
    
    if num_common_devs == num_devs and num_common_devs == num_other_dev:
        return 1
    elif num_common_devs > 0 and num_common_devs >= (num_devs / 2):
        return 0.8
    elif num_common_devs > 0:
        return 0.5
    else:
        return 0

def get_dev_scores(df,game_index):
    target = df.loc[game_index][0]
    dev_scores = dev_df.iloc[:,0].apply(lambda x: calculate_dev_score(target, x))
    df.columns = ['dev_score']
    return dev_scores.to_frame()

get_dev_scores(dev_df, 10).T



Unnamed: 0,10,20,30,40,50,60,70,80,130,220,240,280,300,320,340,360,380,400,420,440,500,550,570,620,630,730,1002,1200,1250,1300,1500,1510,1520,1530,1600,1610,1630,1640,1670,1690,1700,1840,1900,1930,2100,2200,2210,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2390,2400,2450,2500,2520,2590,2600,2610,2620,2630,2640,2710,2720,2780,2800,2810,2820,2840,2850,2870,2900,2910,2920,2990,3010,3020,3050,3130,3170,3230,3260,3270,3330,3410,3480,3483,3490,3500,3510,3520,3530,3540,3560,3570,3580,3590,3600,3610,3620,3700,3710,3720,3730,3800,3810,3820,3830,3900,3910,3920,3960,3980,3990,4000,4100,4230,4290,4300,4420,4460,4470,4500,4520,4530,4560,4570,4580,4700,4720,4760,4770,4780,4800,4850,4870,4880,4890,4900,4920,6000,6010,6020,6030,6040,6060,6080,6090,6120,6200,6210,6220,6250,6270,6300,6310,6370,6400,6420,6510,6550,6800,6810,6830,6840,6850,6860,6870,6880,6900,6910,6920,6980,7000,7010,7020,7110,7200,7210,7220,7260,7340,7510,7520,7530,7600,7610,7620,7650,7660,7730,7760,7770,...,1043480,1043500,1043510,1043560,1043580,1043610,1043680,1043730,1043740,1043890,1044170,1044200,1044240,1044340,1044350,1044450,1044530,1044630,1044640,1044770,1044830,1044840,1044920,1044950,1045020,1045080,1045220,1045300,1045530,1045740,1045850,1045930,1046030,1046070,1046110,1046230,1046240,1046330,1046370,1046430,1046490,1046530,1046560,1046670,1046750,1046820,1047120,1047140,1047160,1047190,1047240,1047670,1047720,1047780,1047910,1047960,1048000,1048040,1048080,1048100,1048170,1048260,1048410,1048470,1048570,1048640,1048830,1048850,1048920,1048960,1049070,1049080,1049090,1049140,1049230,1049270,1049370,1049420,1049660,1049680,1049730,1049800,1049840,1049910,1049930,1049950,1050010,1050150,1050190,1050210,1050230,1050240,1050430,1050470,1050690,1050730,1050760,1050870,1051130,1051160,1051170,1051250,1051280,1051310,1051500,1051530,1051810,1051830,1051840,1051890,1052010,1052070,1052220,1052480,1052760,1052850,1052870,1052900,1053040,1053060,1053090,1053160,1053190,1053250,1053300,1053650,1053660,1053680,1053730,1053740,1053780,1053960,1054240,1054250,1054560,1054650,1054750,1054790,1054900,1054930,1054980,1055000,1055090,1055140,1055430,1055620,1055690,1055770,1055890,1055970,1055990,1056260,1056470,1056660,1057180,1057390,1057420,1057430,1057460,1057500,1057660,1057690,1057710,1058150,1058350,1058430,1058590,1058660,1058910,1058930,1058940,1059090,1059190,1059280,1059500,1059710,1060030,1060110,1060170,1060300,1060440,1060770,1060870,1061230,1061470,1062120,1062670,1062880,1063060,1063230,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
dev_list,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [42]:
def calculate_pub_score(game_pubs, other_game_pubs):
    intersection = set(game_pubs) & set(other_game_pubs)
    num_common_pubs = len(intersection)
    num_pubs = len(game_pubs)
    num_other_pub = len(other_game_pubs)
    
    if num_common_pubs == num_pubs and num_common_pubs == num_other_pub:
        return 1
    elif num_common_pubs > 0 and num_common_pubs >= (num_pubs / 2):
        return 0.8
    elif num_common_pubs > 0:
        return 0.5
    else:
        return 0

def get_pub_scores(df,game_index):
    target = df.loc[game_index][0]
    pub_scores = pub_df.iloc[:,0].apply(lambda x: calculate_pub_score(target, x))
    df.columns = ['pub_score']
    return pub_scores.to_frame()

get_pub_scores(pub_df, 10).T

Unnamed: 0,10,20,30,40,50,60,70,80,130,220,240,280,300,320,340,360,380,400,420,440,500,550,570,620,630,730,1002,1200,1250,1300,1500,1510,1520,1530,1600,1610,1630,1640,1670,1690,1700,1840,1900,1930,2100,2200,2210,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2390,2400,2450,2500,2520,2590,2600,2610,2620,2630,2640,2710,2720,2780,2800,2810,2820,2840,2850,2870,2900,2910,2920,2990,3010,3020,3050,3130,3170,3230,3260,3270,3330,3410,3480,3483,3490,3500,3510,3520,3530,3540,3560,3570,3580,3590,3600,3610,3620,3700,3710,3720,3730,3800,3810,3820,3830,3900,3910,3920,3960,3980,3990,4000,4100,4230,4290,4300,4420,4460,4470,4500,4520,4530,4560,4570,4580,4700,4720,4760,4770,4780,4800,4850,4870,4880,4890,4900,4920,6000,6010,6020,6030,6040,6060,6080,6090,6120,6200,6210,6220,6250,6270,6300,6310,6370,6400,6420,6510,6550,6800,6810,6830,6840,6850,6860,6870,6880,6900,6910,6920,6980,7000,7010,7020,7110,7200,7210,7220,7260,7340,7510,7520,7530,7600,7610,7620,7650,7660,7730,7760,7770,...,1043480,1043500,1043510,1043560,1043580,1043610,1043680,1043730,1043740,1043890,1044170,1044200,1044240,1044340,1044350,1044450,1044530,1044630,1044640,1044770,1044830,1044840,1044920,1044950,1045020,1045080,1045220,1045300,1045530,1045740,1045850,1045930,1046030,1046070,1046110,1046230,1046240,1046330,1046370,1046430,1046490,1046530,1046560,1046670,1046750,1046820,1047120,1047140,1047160,1047190,1047240,1047670,1047720,1047780,1047910,1047960,1048000,1048040,1048080,1048100,1048170,1048260,1048410,1048470,1048570,1048640,1048830,1048850,1048920,1048960,1049070,1049080,1049090,1049140,1049230,1049270,1049370,1049420,1049660,1049680,1049730,1049800,1049840,1049910,1049930,1049950,1050010,1050150,1050190,1050210,1050230,1050240,1050430,1050470,1050690,1050730,1050760,1050870,1051130,1051160,1051170,1051250,1051280,1051310,1051500,1051530,1051810,1051830,1051840,1051890,1052010,1052070,1052220,1052480,1052760,1052850,1052870,1052900,1053040,1053060,1053090,1053160,1053190,1053250,1053300,1053650,1053660,1053680,1053730,1053740,1053780,1053960,1054240,1054250,1054560,1054650,1054750,1054790,1054900,1054930,1054980,1055000,1055090,1055140,1055430,1055620,1055690,1055770,1055890,1055970,1055990,1056260,1056470,1056660,1057180,1057390,1057420,1057430,1057460,1057500,1057660,1057690,1057710,1058150,1058350,1058430,1058590,1058660,1058910,1058930,1058940,1059090,1059190,1059280,1059500,1059710,1060030,1060110,1060170,1060300,1060440,1060770,1060870,1061230,1061470,1062120,1062670,1062880,1063060,1063230,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
pub_list,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [43]:
# normalize the vector lenghts to 1, needed to calculate cosine similarity
def normalize_df(df):
    norm = np.linalg.norm(df, axis=1)[:, np.newaxis]
    return df / norm

# calculate cosine similarity ongly for the target row, instead of the whole matrix
def single_row_cosine_similarity(df, target_row, title):
    normalized_df = normalize_df(df)
    target_row_normalized = normalized_df.loc[target_row]
    cosine_similarities = normalized_df @ target_row_normalized
    
    return pd.DataFrame(cosine_similarities, columns = [title])


display(single_row_cosine_similarity(tag_df, 10, 'tag_score').T)
display(single_row_cosine_similarity(des_vectors_df, 10, 'des_score').T)

appid,10,20,30,40,50,60,70,80,130,220,240,280,300,320,340,360,380,400,420,440,500,550,570,620,630,730,1002,1200,1250,1300,1309,1313,1500,1510,1520,1530,1600,1610,1630,1640,1670,1690,1700,1840,1900,1930,2100,2200,2210,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2390,2400,2420,2450,2500,2520,2540,2570,2590,2600,2610,2620,2630,2640,2700,2710,2720,2780,2800,2810,2820,2840,2850,2870,2900,2910,2920,2990,3010,3020,3050,3130,3170,3230,3260,3270,3300,3310,3320,3330,3340,3350,3360,3380,3390,3400,3410,3420,3430,3440,3450,3460,3480,3483,3490,3500,3510,3520,3530,3540,3560,3570,3580,3590,3600,3610,3620,3700,3710,3720,3730,3800,3810,3820,3830,3900,3910,3920,3960,3980,3990,4000,4100,4230,4290,4300,4420,4460,4470,4500,4520,4530,4560,4570,4580,4700,4720,4760,4770,4780,4800,4850,4870,4880,4890,4900,4920,6000,6010,6020,6030,6040,6060,6080,6090,6120,6200,6210,6220,6250,6270,6300,6310,6370,6400,6420,6510,6550,6600,6800,6810,6830,6840,6850,6860,6870,6880,6900,6910,6920,...,1044340,1044350,1044450,1044530,1044630,1044640,1044770,1044830,1044840,1044920,1044950,1045010,1045020,1045080,1045130,1045140,1045220,1045300,1045530,1045650,1045740,1045850,1045930,1046030,1046070,1046110,1046230,1046240,1046330,1046370,1046430,1046490,1046530,1046560,1046670,1046750,1046770,1046820,1047120,1047140,1047160,1047190,1047240,1047670,1047680,1047720,1047780,1047910,1047960,1048000,1048040,1048080,1048100,1048170,1048260,1048320,1048410,1048470,1048570,1048640,1048830,1048850,1048920,1048960,1049040,1049070,1049080,1049090,1049140,1049230,1049270,1049370,1049420,1049660,1049680,1049730,1049800,1049840,1049910,1049930,1049950,1050010,1050150,1050190,1050210,1050230,1050240,1050430,1050470,1050690,1050730,1050760,1050870,1051130,1051160,1051170,1051250,1051280,1051310,1051500,1051530,1051810,1051830,1051840,1051890,1052010,1052070,1052220,1052480,1052760,1052850,1052870,1052900,1053040,1053060,1053090,1053160,1053190,1053250,1053300,1053650,1053660,1053680,1053730,1053740,1053780,1053960,1054240,1054250,1054560,1054650,1054750,1054790,1054900,1054930,1054980,1055000,1055090,1055140,1055430,1055620,1055690,1055770,1055890,1055970,1055990,1056010,1056260,1056470,1056500,1056660,1056710,1057180,1057390,1057420,1057430,1057460,1057500,1057660,1057690,1057710,1058000,1058150,1058350,1058430,1058590,1058660,1058910,1058930,1058940,1059090,1059190,1059280,1059500,1059710,1060030,1060110,1060170,1060300,1060440,1060770,1060870,1061230,1061470,1062120,1062240,1062670,1062880,1063060,1063230,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
tag_score,1.0,0.891156,0.776308,0.904292,0.690284,0.792197,0.726906,0.931129,0.665625,0.640264,0.916603,0.695079,0.767209,0.814604,0.639661,0.862575,0.594849,0.254944,0.636142,0.602868,0.598991,0.557709,0.282683,0.240848,0.396257,0.839739,0.060527,0.689954,0.559951,0.711145,0.702534,0.702534,0.092829,0.070651,0.168675,0.145357,0.098314,0.051564,0.11766,0.16696,0.055965,0.077115,0.237902,0.09098,0.104958,0.179465,0.405826,0.818104,0.754705,0.722644,0.674439,0.754617,0.708495,0.778428,0.768244,0.801619,0.796759,0.860224,0.685159,0.78465,0.72819,0.508267,0.394593,0.768205,0.462547,0.0,0.199774,0.0,0.722086,0.179777,0.357376,0.735132,0.727403,0.732388,0.074445,0.46261,0.383483,0.696776,0.116421,0.084983,0.098045,0.076177,0.095356,0.143625,0.055868,0.065379,0.057562,0.297489,0.07901,0.587035,0.0,0.055837,0.047751,0.062019,0.023881,0.79677,0.0,0.0,0.171936,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.138078,0.0,0.0,0.0,0.0,0.0,0.027558,0.080214,0.0,0.0,0.0,0.0,0.0,0.0,0.036365,0.0,0.0,0.062307,0.0,0.0,0.292503,0.512285,0.724974,0.058072,0.710309,0.446946,0.361765,0.359216,0.212164,0.152255,0.176574,0.125584,0.067786,0.051764,0.06958,0.375346,0.0,0.0,0.0,0.387927,0.071383,0.046246,0.076032,0.403727,0.476985,0.467016,0.336281,0.22221,0.221519,0.21342,0.406992,0.226232,0.099047,0.137993,0.056085,0.270808,0.06541,0.065355,0.062641,0.0,0.683877,0.654513,0.12687,0.551265,0.639317,0.101432,0.616421,0.251049,0.58454,0.319625,0.1449,0.054187,0.230988,0.315484,0.0,0.0,0.10295,0.372578,0.389432,0.157198,0.408609,0.348775,0.0,0.39079,0.447314,0.412948,0.383464,0.443492,0.35464,0.449777,0.474605,0.43254,0.365199,0.451944,...,0.049884,0.0,0.0,0.0,0.0,0.0,0.313318,0.405186,0.230858,0.269242,0.0,0.239286,0.0,0.120651,0.0,0.0,0.341589,0.068003,0.0,0.08797,0.0,0.0,0.0,0.037822,0.346326,0.0,0.0,0.385622,0.335769,0.0,0.330833,0.336082,0.291193,0.0,0.0,0.345946,0.0,0.0,0.0,0.264615,0.0,0.0,0.035159,0.23787,0.293063,0.0,0.57302,0.322645,0.0,0.0,0.260533,0.06642,0.23756,0.0,0.335844,0.204374,0.0,0.036448,0.0,0.254948,0.198825,0.0,0.0,0.2843,0.0,0.0,0.0,0.110498,0.0,0.0,0.0,0.258559,0.0,0.266951,0.330833,0.0,0.289811,0.246862,0.035575,0.0,0.351527,0.0,0.171247,0.0,0.0,0.325823,0.0,0.0,0.0,0.0,0.321213,0.330833,0.335844,0.325388,0.289899,0.242648,0.416006,0.047327,0.283023,0.038636,0.149316,0.212818,0.068855,0.43992,0.346326,0.049723,0.002388,0.0,0.0,0.011584,0.0,0.02938,0.29332,0.116428,0.47306,0.0,0.025955,0.36271,0.0,0.0,0.352223,0.19975,0.003882,0.336082,0.126195,0.2271,0.38637,0.0,0.0,0.0,0.0,0.0,0.335844,0.0,0.01536,0.053072,0.341089,0.04038,0.039211,0.0,0.356388,0.0,0.341314,0.036076,0.041242,0.011095,0.0,0.0,0.320078,0.0,0.0,0.0,0.228886,0.405186,0.389447,0.155952,0.0,0.0,0.069004,0.049944,0.1051,0.0,0.0,0.0,0.008012,0.070318,0.296561,0.039279,0.014396,0.0,0.304668,0.187702,0.0,0.0,0.0,0.405186,0.311302,0.0,0.049282,0.0,0.344758,0.0,0.266099,0.0,0.0,0.237425,0.0,0.289899,0.270747,0.064666,0.239978,0.0,0.09826,0.0,0.065322,0.0,0.454207,0.330833,0.0,0.0


Unnamed: 0,10,20,30,40,50,60,70,80,130,220,240,280,300,320,340,360,380,400,420,440,500,550,570,620,630,730,1002,1200,1250,1300,1500,1510,1520,1530,1600,1610,1630,1640,1670,1690,1700,1840,1900,1930,2100,2200,2210,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2390,2400,2450,2500,2520,2590,2600,2610,2620,2630,2640,2710,2720,2780,2800,2810,2820,2840,2850,2870,2900,2910,2920,2990,3010,3020,3050,3130,3170,3230,3260,3270,3330,3410,3480,3483,3490,3500,3510,3520,3530,3540,3560,3570,3580,3590,3600,3610,3620,3700,3710,3720,3730,3800,3810,3820,3830,3900,3910,3920,3960,3980,3990,4000,4100,4230,4290,4300,4420,4460,4470,4500,4520,4530,4560,4570,4580,4700,4720,4760,4770,4780,4800,4850,4870,4880,4890,4900,4920,6000,6010,6020,6030,6040,6060,6080,6090,6120,6200,6210,6220,6250,6270,6300,6310,6370,6400,6420,6510,6550,6800,6810,6830,6840,6850,6860,6870,6880,6900,6910,6920,6980,7000,7010,7020,7110,7200,7210,7220,7260,7340,7510,7520,7530,7600,7610,7620,7650,7660,7730,7760,7770,...,1043480,1043500,1043510,1043560,1043580,1043610,1043680,1043730,1043740,1043890,1044170,1044200,1044240,1044340,1044350,1044450,1044530,1044630,1044640,1044770,1044830,1044840,1044920,1044950,1045020,1045080,1045220,1045300,1045530,1045740,1045850,1045930,1046030,1046070,1046110,1046230,1046240,1046330,1046370,1046430,1046490,1046530,1046560,1046670,1046750,1046820,1047120,1047140,1047160,1047190,1047240,1047670,1047720,1047780,1047910,1047960,1048000,1048040,1048080,1048100,1048170,1048260,1048410,1048470,1048570,1048640,1048830,1048850,1048920,1048960,1049070,1049080,1049090,1049140,1049230,1049270,1049370,1049420,1049660,1049680,1049730,1049800,1049840,1049910,1049930,1049950,1050010,1050150,1050190,1050210,1050230,1050240,1050430,1050470,1050690,1050730,1050760,1050870,1051130,1051160,1051170,1051250,1051280,1051310,1051500,1051530,1051810,1051830,1051840,1051890,1052010,1052070,1052220,1052480,1052760,1052850,1052870,1052900,1053040,1053060,1053090,1053160,1053190,1053250,1053300,1053650,1053660,1053680,1053730,1053740,1053780,1053960,1054240,1054250,1054560,1054650,1054750,1054790,1054900,1054930,1054980,1055000,1055090,1055140,1055430,1055620,1055690,1055770,1055890,1055970,1055990,1056260,1056470,1056660,1057180,1057390,1057420,1057430,1057460,1057500,1057660,1057690,1057710,1058150,1058350,1058430,1058590,1058660,1058910,1058930,1058940,1059090,1059190,1059280,1059500,1059710,1060030,1060110,1060170,1060300,1060440,1060770,1060870,1061230,1061470,1062120,1062670,1062880,1063060,1063230,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
des_score,1.0,0.961512,0.945795,0.934382,0.943008,0.960876,0.957626,0.921382,0.939103,0.95993,0.921551,0.927461,0.94593,0.90659,0.898946,0.914862,0.911405,0.95099,0.931475,0.970041,0.962765,0.967606,0.970802,0.963702,0.954715,0.934344,0.943194,0.945811,0.961023,0.978808,0.925389,0.951446,0.974835,0.969522,0.906787,0.950787,0.940282,0.957382,0.963802,0.962142,0.960466,0.946349,0.942822,0.971871,0.967026,0.960562,0.95777,0.955703,0.950295,0.952328,0.958971,0.909103,0.969787,0.924597,0.95802,0.960333,0.948411,0.928311,0.962551,0.973973,0.969066,0.964309,0.942311,0.96016,0.948947,0.962135,0.966577,0.976587,0.951485,0.958576,0.97249,0.965662,0.949014,0.94475,0.957397,0.965393,0.956783,0.975227,0.930475,0.965291,0.955009,0.947873,0.961526,0.952512,0.950183,0.944986,0.962261,0.969044,0.975305,0.953375,0.962711,0.958304,0.961449,0.955212,0.956311,0.942178,0.958476,0.926531,0.942921,0.961367,0.968038,0.950098,0.955279,0.952408,0.960756,0.948867,0.952486,0.961579,0.956144,0.967617,0.943873,0.957578,0.950428,0.939503,0.947423,0.962539,0.930562,0.963361,0.959207,0.966648,0.959322,0.975176,0.959713,0.950691,0.910562,0.94832,0.963569,0.963666,0.958741,0.960476,0.970293,0.965641,0.974327,0.967185,0.966052,0.95497,0.965087,0.949277,0.928205,0.944663,0.958683,0.950566,0.925063,0.954232,0.95542,0.946145,0.964523,0.974514,0.957152,0.957979,0.946351,0.958036,0.948665,0.962144,0.947758,0.952939,0.960638,0.951802,0.945193,0.974822,0.950378,0.964237,0.950164,0.972334,0.963421,0.958661,0.958819,0.954291,0.96682,0.96698,0.953282,0.951441,0.975359,0.967495,0.958691,0.970512,0.971294,0.965467,0.964162,0.956259,0.960949,0.96019,0.970371,0.973522,0.956718,0.95578,0.952817,0.959254,0.945291,0.960193,0.967427,0.958603,0.962634,0.940812,0.955779,0.94324,0.94078,0.97287,0.961719,0.936183,...,0.946043,0.94297,0.949489,0.971296,0.96426,0.966926,0.97363,0.957489,0.94,0.970431,0.950728,0.972359,0.828483,0.936045,0.963522,0.918857,0.957226,0.941465,0.960446,0.955229,0.955674,0.960985,0.969218,0.94959,0.924427,0.958533,0.957792,0.953473,0.959179,0.940672,0.942549,0.963387,0.974729,0.951037,0.934142,0.964091,0.964987,0.967064,0.932189,0.952246,0.939736,0.96545,0.953909,0.965528,0.966015,0.958385,0.940388,0.947768,0.950597,0.962264,0.96,0.941808,0.930306,0.956229,0.937603,0.912688,0.961058,0.943022,0.966474,0.959943,0.921762,0.944196,0.954443,0.949174,0.940707,0.953197,0.961353,0.948041,0.934947,0.973124,0.965572,0.934495,0.939686,0.969127,0.952988,0.961716,0.957197,0.960871,0.962724,0.938512,0.960901,0.964545,0.947015,0.968496,0.959844,0.974989,0.615458,0.930237,0.952555,0.971246,0.95678,0.944006,0.930249,0.948925,0.939268,0.959918,0.962569,0.90985,0.958129,0.952535,0.956871,0.961064,0.9572,0.950524,0.953567,0.961812,0.949973,0.953713,0.965139,0.949604,0.967367,0.968573,0.933704,0.957237,0.957139,0.967361,0.938544,0.95639,0.899245,0.948379,0.965855,0.94801,0.946661,0.947022,0.94445,0.95593,0.961782,0.935555,0.935236,0.969378,0.897731,0.965289,0.93819,0.951056,0.960344,0.927521,0.924759,0.964051,0.962288,0.943448,0.907275,0.961646,0.967157,0.971186,0.955137,0.827653,0.959458,0.952787,0.93116,0.945404,0.966081,0.963893,0.952557,0.945036,0.961546,0.966425,0.965785,0.944068,0.96518,0.909878,0.909632,0.968595,0.94873,0.934,0.962003,0.975232,0.961227,0.961177,0.964887,0.949574,0.952748,0.949352,0.877001,0.965642,0.928912,0.962623,0.940005,0.944509,0.966472,0.96168,0.935264,0.958787,0.951599,0.958628,0.959649,0.950458,0.963108,0.958878,0.962332,0.944474,0.941093,0.961861,0.929656,0.960278,0.963284,0.960613,0.959668,0.93736,0.965141,0.965985


In [44]:
# funtion that calculate the Jaccard similarity
def jaccard_similarity(target, row2):
    intersection = len(set(target)&set(row2))
    union = len(set(target).union(set(row2)))
    return intersection / union

# Calculate the Jaccard similarity for one row instead of the whole matrix
def single_row_jaccard_similarity(df, target_row, title):
    similarities = df.iloc[:,0].apply(lambda x: jaccard_similarity(
                                                df.loc[target_row][0], x))
    similarities = similarities.to_frame()
    similarities.columns = [title]
    return similarities

display(single_row_jaccard_similarity(gen_df, 10, 'gen_score').T)
display(single_row_jaccard_similarity(cat_df, 10, 'cat_score').T)


Unnamed: 0,10,20,30,40,50,60,70,80,130,220,240,280,300,320,340,360,380,400,420,440,500,550,570,620,630,730,1002,1200,1250,1300,1500,1510,1520,1530,1600,1610,1630,1640,1670,1690,1700,1840,1900,1930,2100,2200,2210,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2390,2400,2450,2500,2520,2590,2600,2610,2620,2630,2640,2710,2720,2780,2800,2810,2820,2840,2850,2870,2900,2910,2920,2990,3010,3020,3050,3130,3170,3230,3260,3270,3330,3410,3480,3483,3490,3500,3510,3520,3530,3540,3560,3570,3580,3590,3600,3610,3620,3700,3710,3720,3730,3800,3810,3820,3830,3900,3910,3920,3960,3980,3990,4000,4100,4230,4290,4300,4420,4460,4470,4500,4520,4530,4560,4570,4580,4700,4720,4760,4770,4780,4800,4850,4870,4880,4890,4900,4920,6000,6010,6020,6030,6040,6060,6080,6090,6120,6200,6210,6220,6250,6270,6300,6310,6370,6400,6420,6510,6550,6800,6810,6830,6840,6850,6860,6870,6880,6900,6910,6920,6980,7000,7010,7020,7110,7200,7210,7220,7260,7340,7510,7520,7530,7600,7610,7620,7650,7660,7730,7760,7770,...,1043480,1043500,1043510,1043560,1043580,1043610,1043680,1043730,1043740,1043890,1044170,1044200,1044240,1044340,1044350,1044450,1044530,1044630,1044640,1044770,1044830,1044840,1044920,1044950,1045020,1045080,1045220,1045300,1045530,1045740,1045850,1045930,1046030,1046070,1046110,1046230,1046240,1046330,1046370,1046430,1046490,1046530,1046560,1046670,1046750,1046820,1047120,1047140,1047160,1047190,1047240,1047670,1047720,1047780,1047910,1047960,1048000,1048040,1048080,1048100,1048170,1048260,1048410,1048470,1048570,1048640,1048830,1048850,1048920,1048960,1049070,1049080,1049090,1049140,1049230,1049270,1049370,1049420,1049660,1049680,1049730,1049800,1049840,1049910,1049930,1049950,1050010,1050150,1050190,1050210,1050230,1050240,1050430,1050470,1050690,1050730,1050760,1050870,1051130,1051160,1051170,1051250,1051280,1051310,1051500,1051530,1051810,1051830,1051840,1051890,1052010,1052070,1052220,1052480,1052760,1052850,1052870,1052900,1053040,1053060,1053090,1053160,1053190,1053250,1053300,1053650,1053660,1053680,1053730,1053740,1053780,1053960,1054240,1054250,1054560,1054650,1054750,1054790,1054900,1054930,1054980,1055000,1055090,1055140,1055430,1055620,1055690,1055770,1055890,1055970,1055990,1056260,1056470,1056660,1057180,1057390,1057420,1057430,1057460,1057500,1057660,1057690,1057710,1058150,1058350,1058430,1058590,1058660,1058910,1058930,1058940,1059090,1059190,1059280,1059500,1059710,1060030,1060110,1060170,1060300,1060440,1060770,1060870,1061230,1061470,1062120,1062670,1062880,1063060,1063230,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
gen_score,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.5,1.0,1.0,0.333333,0.5,1.0,0.5,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.333333,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.333333,1.0,0.0,0.0,0.333333,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.5,0.5,0.333333,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,1.0,0.0,1.0,1.0,0.0,1.0,0.2,1.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.333333,0.5,0.5,0.0,0.5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.5,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.5,0.0,0.0,0.0,0.0,0.333333,0.25,0.5,0.5,0.0,0.333333,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.5,0.166667,0.333333,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.25,1.0,0.0,0.5,0.5,0.5,0.0,0.0,0.5,0.0,0.0,0.333333,0.0,0.0,0.0,0.142857,0.0,1.0,0.333333,0.0,0.0,0.2,0.0,0.2,0.0,0.333333,0.0,0.0,0.0,0.333333,0.142857,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.333333,0.0,0.25,0.166667,0.0,0.0,0.333333,0.0,0.1,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.2,0.333333,0.333333,0.333333,0.25,0.111111,0.25,0.0,0.333333,0.0,0.0,0.2,0.0,0.333333,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.5,0.0,0.0,1.0,0.0,0.0,0.5,0.2,0.0,0.333333,0.0,0.125,0.5,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.25,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.2,0.5,0.333333,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.5,0.2,0.0,0.0,0.0,0.5,0.25,0.0,0.0,0.0,0.333333,0.0,0.2,0.0,0.0,0.0,0.25,0.2,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.333333,0.333333,0.0,0.0


Unnamed: 0,10,20,30,40,50,60,70,80,130,220,240,280,300,320,340,360,380,400,420,440,500,550,570,620,630,730,1002,1200,1250,1300,1500,1510,1520,1530,1600,1610,1630,1640,1670,1690,1700,1840,1900,1930,2100,2200,2210,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2390,2400,2450,2500,2520,2590,2600,2610,2620,2630,2640,2710,2720,2780,2800,2810,2820,2840,2850,2870,2900,2910,2920,2990,3010,3020,3050,3130,3170,3230,3260,3270,3330,3410,3480,3483,3490,3500,3510,3520,3530,3540,3560,3570,3580,3590,3600,3610,3620,3700,3710,3720,3730,3800,3810,3820,3830,3900,3910,3920,3960,3980,3990,4000,4100,4230,4290,4300,4420,4460,4470,4500,4520,4530,4560,4570,4580,4700,4720,4760,4770,4780,4800,4850,4870,4880,4890,4900,4920,6000,6010,6020,6030,6040,6060,6080,6090,6120,6200,6210,6220,6250,6270,6300,6310,6370,6400,6420,6510,6550,6800,6810,6830,6840,6850,6860,6870,6880,6900,6910,6920,6980,7000,7010,7020,7110,7200,7210,7220,7260,7340,7510,7520,7530,7600,7610,7620,7650,7660,7730,7760,7770,...,1043480,1043500,1043510,1043560,1043580,1043610,1043680,1043730,1043740,1043890,1044170,1044200,1044240,1044340,1044350,1044450,1044530,1044630,1044640,1044770,1044830,1044840,1044920,1044950,1045020,1045080,1045220,1045300,1045530,1045740,1045850,1045930,1046030,1046070,1046110,1046230,1046240,1046330,1046370,1046430,1046490,1046530,1046560,1046670,1046750,1046820,1047120,1047140,1047160,1047190,1047240,1047670,1047720,1047780,1047910,1047960,1048000,1048040,1048080,1048100,1048170,1048260,1048410,1048470,1048570,1048640,1048830,1048850,1048920,1048960,1049070,1049080,1049090,1049140,1049230,1049270,1049370,1049420,1049660,1049680,1049730,1049800,1049840,1049910,1049930,1049950,1050010,1050150,1050190,1050210,1050230,1050240,1050430,1050470,1050690,1050730,1050760,1050870,1051130,1051160,1051170,1051250,1051280,1051310,1051500,1051530,1051810,1051830,1051840,1051890,1052010,1052070,1052220,1052480,1052760,1052850,1052870,1052900,1053040,1053060,1053090,1053160,1053190,1053250,1053300,1053650,1053660,1053680,1053730,1053740,1053780,1053960,1054240,1054250,1054560,1054650,1054750,1054790,1054900,1054930,1054980,1055000,1055090,1055140,1055430,1055620,1055690,1055770,1055890,1055970,1055990,1056260,1056470,1056660,1057180,1057390,1057420,1057430,1057460,1057500,1057660,1057690,1057710,1058150,1058350,1058430,1058590,1058660,1058910,1058930,1058940,1059090,1059190,1059280,1059500,1059710,1060030,1060110,1060170,1060300,1060440,1060770,1060870,1061230,1061470,1062120,1062670,1062880,1063060,1063230,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
cat_score,1.0,1.0,0.333333,1.0,0.25,0.666667,0.5,0.25,0.0,0.0,0.333333,0.0,0.333333,0.333333,0.0,0.333333,0.0,0.0,0.0,0.333333,0.2,0.2,0.25,0.0,0.2,0.333333,0.25,0.333333,0.2,0.0,0.0,0.0,0.25,0.25,0.25,0.25,0.2,0.2,0.25,0.25,0.0,0.0,0.2,0.4,0.25,0.25,0.25,0.0,0.0,0.0,0.25,0.2,0.2,0.25,0.25,0.25,0.0,0.0,0.25,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.25,0.25,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.2,0.0,0.25,0.25,0.25,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.25,0.0,0.0,0.2,0.0,0.25,0.25,0.0,0.0,0.0,0.25,0.25,0.2,0.2,0.25,0.25,0.25,0.25,0.0,0.25,0.25,0.25,0.25,0.25,0.0,0.0,0.75,0.0,0.666667,0.5,0.0,0.5,0.5,0.0,0.5,0.142857,0.0,0.0,0.0,0.0,0.25,0.25,0.25,0.0,0.0,0.2,0.166667,0.25,0.25,0.0,0.2,0.2,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.2,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.333333,0.0,0.25,0.25,0.0,0.0,0.0,0.25,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.4,0.0,0.0,0.2,0.0,0.0,0.25,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.25,0.285714,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0


In [45]:
# smoting needs to be between 0 and 1 in order to transform the linear value in a curve with slope to top left
def date_score(df, index, smooting):
    target = df.loc[index].values[0]
    df = (1-abs(df-target))**smooting
    df.columns = ['date_score']
    return df
    
date_score(date_df, 10, 0.8).T

Unnamed: 0,10,20,30,40,50,60,70,80,130,220,240,280,300,320,340,360,380,400,420,440,500,550,570,620,630,730,1002,1200,1250,1300,1500,1510,1520,1530,1600,1610,1630,1640,1670,1690,1700,1840,1900,1930,2100,2200,2210,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2390,2400,2450,2500,2520,2590,2600,2610,2620,2630,2640,2710,2720,2780,2800,2810,2820,2840,2850,2870,2900,2910,2920,2990,3010,3020,3050,3130,3170,3230,3260,3270,3330,3410,3480,3483,3490,3500,3510,3520,3530,3540,3560,3570,3580,3590,3600,3610,3620,3700,3710,3720,3730,3800,3810,3820,3830,3900,3910,3920,3960,3980,3990,4000,4100,4230,4290,4300,4420,4460,4470,4500,4520,4530,4560,4570,4580,4700,4720,4760,4770,4780,4800,4850,4870,4880,4890,4900,4920,6000,6010,6020,6030,6040,6060,6080,6090,6120,6200,6210,6220,6250,6270,6300,6310,6370,6400,6420,6510,6550,6800,6810,6830,6840,6850,6860,6870,6880,6900,6910,6920,6980,7000,7010,7020,7110,7200,7210,7220,7260,7340,7510,7520,7530,7600,7610,7620,7650,7660,7730,7760,7770,...,1043480,1043500,1043510,1043560,1043580,1043610,1043680,1043730,1043740,1043890,1044170,1044200,1044240,1044340,1044350,1044450,1044530,1044630,1044640,1044770,1044830,1044840,1044920,1044950,1045020,1045080,1045220,1045300,1045530,1045740,1045850,1045930,1046030,1046070,1046110,1046230,1046240,1046330,1046370,1046430,1046490,1046530,1046560,1046670,1046750,1046820,1047120,1047140,1047160,1047190,1047240,1047670,1047720,1047780,1047910,1047960,1048000,1048040,1048080,1048100,1048170,1048260,1048410,1048470,1048570,1048640,1048830,1048850,1048920,1048960,1049070,1049080,1049090,1049140,1049230,1049270,1049370,1049420,1049660,1049680,1049730,1049800,1049840,1049910,1049930,1049950,1050010,1050150,1050190,1050210,1050230,1050240,1050430,1050470,1050690,1050730,1050760,1050870,1051130,1051160,1051170,1051250,1051280,1051310,1051500,1051530,1051810,1051830,1051840,1051890,1052010,1052070,1052220,1052480,1052760,1052850,1052870,1052900,1053040,1053060,1053090,1053160,1053190,1053250,1053300,1053650,1053660,1053680,1053730,1053740,1053780,1053960,1054240,1054250,1054560,1054650,1054750,1054790,1054900,1054930,1054980,1055000,1055090,1055140,1055430,1055620,1055690,1055770,1055890,1055970,1055990,1056260,1056470,1056660,1057180,1057390,1057420,1057430,1057460,1057500,1057660,1057690,1057710,1058150,1058350,1058430,1058590,1058660,1058910,1058930,1058940,1059090,1059190,1059280,1059500,1059710,1060030,1060110,1060170,1060300,1060440,1060770,1060870,1061230,1061470,1062120,1062670,1062880,1063060,1063230,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
date_score,1.0,0.941382,0.90752,0.978676,0.963114,1.0,0.926688,0.876035,0.978676,0.848971,0.850538,0.866483,0.62536,0.850538,0.812721,0.79301,0.789713,0.736476,0.736476,0.736476,0.692394,0.651732,0.498678,0.59357,0.624571,0.53676,0.814305,0.798108,0.672749,0.792053,0.823795,0.780868,0.776917,0.698874,0.801821,0.801821,0.785986,0.785986,0.783855,0.7751,0.756979,0.541676,0.796197,0.674299,0.774138,0.74383,0.581195,0.74383,0.74383,0.74383,0.74383,0.74383,0.74383,0.74383,0.74383,0.74383,0.74383,0.74383,0.74383,0.785453,0.613034,0.792265,0.76825,0.733442,0.758269,0.775421,0.775421,0.775421,0.775421,0.719752,0.746205,0.719752,0.784388,0.784388,0.69591,0.615414,0.615414,0.483219,0.774031,0.774031,0.774031,0.768036,0.780761,0.733334,0.696789,0.653741,0.651732,0.743506,0.74102,0.734309,0.780121,0.780121,0.759989,0.739614,0.74804,0.735826,0.729537,0.71048,0.675959,0.69602,0.692284,0.684464,0.682147,0.673746,0.675959,0.66454,0.658979,0.665762,0.678171,0.668426,0.64536,0.778519,0.778519,0.778519,0.775635,0.774138,0.774138,0.824111,0.774138,0.756656,0.756656,0.770393,0.776276,0.770928,0.748363,0.772747,0.758484,0.697777,0.705778,0.758484,0.745666,0.745666,0.745666,0.743398,0.743398,0.771891,0.528658,0.741128,0.741128,0.741128,0.771142,0.619488,0.578666,0.578666,0.578666,0.768786,0.52854,0.66665,0.66665,0.658867,0.658867,0.66665,0.66665,0.66665,0.66665,0.613488,0.768036,0.768036,0.763425,0.759237,0.738532,0.765678,0.753966,0.571409,0.757732,0.747177,0.747608,0.749442,0.759022,0.759022,0.759022,0.759022,0.759022,0.759022,0.759022,0.759022,0.759022,0.757517,0.757517,0.757517,0.757517,0.757517,0.757517,0.760741,0.715938,0.759129,0.759129,0.737017,0.634593,0.674299,0.602013,0.587047,0.753643,0.753643,0.753643,0.753643,0.700519,0.700519,0.700519,0.700519,...,0.228949,0.228078,0.228804,0.22648,0.224879,0.228659,0.228078,0.228368,0.228804,0.225607,0.227933,0.227207,0.228659,0.226916,0.22648,0.228078,0.228659,0.228659,0.228804,0.228078,0.223713,0.22648,0.228804,0.228078,0.224733,0.226625,0.226625,0.227497,0.225753,0.226625,0.228078,0.227788,0.226625,0.227933,0.227788,0.227788,0.225753,0.228223,0.223713,0.227207,0.225462,0.223859,0.226044,0.227933,0.227497,0.226771,0.226044,0.226335,0.227061,0.225898,0.224588,0.227933,0.225753,0.227061,0.224588,0.227643,0.223859,0.226335,0.227788,0.227497,0.227061,0.225753,0.227207,0.227643,0.223859,0.227788,0.225462,0.225607,0.228223,0.225607,0.226044,0.22648,0.226335,0.226044,0.227643,0.224879,0.225607,0.225025,0.223859,0.226771,0.225607,0.227061,0.227788,0.225462,0.227352,0.225607,0.226044,0.224296,0.224296,0.227643,0.227643,0.227497,0.225753,0.227497,0.224733,0.227497,0.226625,0.226771,0.22648,0.22648,0.226044,0.226771,0.224588,0.224442,0.225025,0.224442,0.224442,0.226916,0.224733,0.226044,0.225607,0.224442,0.224879,0.224296,0.226916,0.225607,0.225753,0.226044,0.224733,0.225607,0.226044,0.224442,0.224442,0.225607,0.225607,0.225607,0.22648,0.224733,0.224879,0.224588,0.226771,0.22648,0.22648,0.226625,0.22648,0.224588,0.226044,0.224588,0.225462,0.225607,0.224296,0.224733,0.224588,0.225025,0.224588,0.225462,0.22648,0.225607,0.225607,0.224733,0.225025,0.225753,0.225462,0.224296,0.224879,0.224442,0.224005,0.224879,0.224588,0.224588,0.225607,0.225607,0.223713,0.22517,0.225316,0.224442,0.225025,0.225462,0.224296,0.223713,0.226916,0.224005,0.224879,0.224733,0.225316,0.226771,0.224588,0.224442,0.224005,0.223713,0.224442,0.223713,0.224442,0.224879,0.224442,0.224588,0.224442,0.224588,0.224005,0.224442,0.223713,0.223713,0.223859,0.223859,0.223859,0.223713,0.223859,0.223713,0.224733,0.223713


# Game Recommendation System Functions

In this section, I created five functions to get similar games based on features and scores.

## Function 1: score_tab

This function takes the target game index and the stored dataframes as input, and calculates the scores for each feature, such as developers, publishers, genres, categories, tags, ratings, release dates, and descriptions. It then joins these scores into a single DataFrame and returns it.

## Function 2: get_score

This function combines all the scores from the first function into a single score by assigning different weights to each feature based on their importance. These weights were determined through knowledge of video games, discussions with gamers, and testing. The function then sorts the games by their total score and returns the top 30 similar games.

The weights in this function are designed to give more importance to games released around the same time as the target game. This is based on the assumption that users looking for a specific game might be more interested in other games from the same time period.

## Function 3: get_score_new_games

This function is similar to the previous one but adjusts the weights to give more importance to the release date, specifically favoring newer games. This function was created to ensure that the recommendation system also includes new games for users to discover, regardless of the target game's release date. The other weights in this function were fine-tuned through extensive testing with known games.

## Function 4: get_results

This function combines the results from the `get_score` and `get_score_new_games` functions by selecting the top 4 similar games and 3 random games from the top 30, and then removing any common games from the similar recent games list. It then adds the top 3 new games and 2 random games from the new games list, resulting in a total of 12 recommended games, which is the same number of games Steam displays on their website. The function then shuffles the results randomly and returns them.

## Function 5: get_recommendations

This function takes a game index as input and calls all the previous functions to provide the 12 recommended games in a single line of code. The final recommendations include a mix of similar games, random selections from the top matches, and a focus on both new and older games to provide a diverse set of recommendations for the user.


In [46]:
def score_tab(target,
              scores_df = stored_tables['scores_df'], 
              dev_df = stored_tables['dev_df'],
              pub_df = stored_tables['pub_df'],
              tag_df = stored_tables['tag_df'],
              gen_df = stored_tables['gen_df'],
              cat_df = stored_tables['cat_df'],
              rating_df = stored_tables['rating_df'], 
              date_df = stored_tables['date_df'],
              des_vectors_df = stored_tables['des_vectors_df']):
    
    scores= scores_df.join([
                            get_dev_scores(dev_df, target),
                            get_pub_scores(pub_df, target),
                            single_row_jaccard_similarity(gen_df, target, 'gen_score'),
                            single_row_jaccard_similarity(cat_df, target, 'cat_score'),
                            single_row_cosine_similarity(tag_df,target, 'tag_score'),
                            rating_df,
                            date_score(date_df, target, 0.7),
                            single_row_cosine_similarity(des_vectors_df,target, 'des_score'),
                            ]).drop(target, axis=0)
    return scores


def get_score(scores, top_val=30):
    multipliers = {'dev_score': 5,
                   'pub_score': 6,
                   'gen_score': 7,
                   'cat_score': 8,
                   'tag_score': 10,
                   'weighted_rating': 35,
                   'date_score': 9,
                   'des_score': 20}
    scores = scores.mul(pd.Series(multipliers), axis=1) 

    score = pd.DataFrame(scores.sum(axis=1), columns=['SCORE'])
    score = score.sort_values(by='SCORE', ascending=False)
    
    return score.head(top_val)/100

# elevating by a number bigger then 1 the days_norm will became a ripid curve,
# decreasing the scores of values more distant to 1 
def get_score_new_games(scores, date_df = stored_tables['date_df'], top_val=30):
    scores['date_score'] = (1-date_df['days_norm'])**5
    multipliers = {'dev_score': 6,
                   'pub_score': 7,
                   'gen_score': 5,
                   'cat_score': 6,
                   'tag_score': 10,
                   'weighted_rating': 10,
                   'date_score': 50,
                   'des_score': 1}
    scores = scores.mul(pd.Series(multipliers), axis=1) 
    
    score = pd.DataFrame(scores.sum(axis=1), columns=['SCORE'])
    score = score.sort_values(by='SCORE', ascending=False)
    
    return score.head(top_val)/100


def get_results(score, score_new_games, drop_score=True, add_titles=True, title=stored_tables['title']):
    result = pd.concat([score[:4], 
                        score[4:].sample(3)], 
                        axis=0)

    score_new_games = score_new_games.drop(index=result.index, errors='ignore')

    new_games_results = pd.concat([score_new_games[:3], 
                                   score_new_games[3:].sample(2)],
                                   axis=0)
    
    result = pd.concat([result, new_games_results], axis=0).sample(frac=1)
    if add_titles:
        result = result.join(title)
    
    if drop_score:
        result = result.drop(columns=['SCORE'])
        
    return result

def get_recommendations(target, print_target_game=False, drop_score=True,  add_titles=True):
    if print_target_game:
        print('Target game:', name_df.loc[target][0])
        
    scores = score_tab(target)
    score = get_score(scores)
    score_new_games = get_score_new_games(scores)
    
    return get_results(score, score_new_games, drop_score, add_titles)

In [47]:
# Known games' indexes for test: 10, 45700, 6550, 566050, 289070
# Remove comment to the second target assigment to test on a random game 

target = 289070
# target = df.sample().index[0]

print('Target:', name_df.loc[target][0])
print('index:', target)


Target: Sid Meier’s Civilization® VI
index: 289070


In [48]:
get_recommendations(target)


Unnamed: 0,name
8930,Sid Meier's Civilization® V
1005930,Timeflow – Time and Money Simulator
200510,XCOM: Enemy Unknown
8800,Civilization IV: Beyond the Sword
4700,Total War: MEDIEVAL II – Definitive Edition
965240,Akabeth Tactics
1058590,Franchise Wars
977690,Skyworld: Kingdom Brawl
603850,Age of Civilizations II
48950,Greed Corp


### Example of each step

In [49]:
scores = score_tab(target)
scores.sample(5)

Unnamed: 0,dev_score,pub_score,gen_score,cat_score,tag_score,weighted_rating,date_score,des_score
264560,0.0,0.0,0.0,0.5,0.0,0.679699,0.925683,0.943874
850690,0.0,0.0,0.0,0.5,0.0,0.25,0.946092,0.955254
317940,0.0,0.0,0.0,0.5,0.026965,0.431868,0.952733,0.970279
385350,0.0,0.0,0.333333,0.5,0.320498,0.442105,0.984058,0.969294
316370,0.0,0.0,0.0,0.5,0.0,0.334091,0.969073,0.974595


In [50]:
s = get_score(scores)
s.head().join(name_df)


Unnamed: 0,SCORE,name
8930,0.933661,Sid Meier's Civilization® V
3900,0.872145,Sid Meier's Civilization® IV
200510,0.871319,XCOM: Enemy Unknown
8800,0.848674,Civilization IV: Beyond the Sword
3910,0.843349,Sid Meier's Civilization® III Complete


In [51]:
s_new_games = get_score_new_games(scores)
s_new_games.join(name_df).join(stored_tables['rel_date']).head()

Unnamed: 0,SCORE,name,release_date
1058590,0.700207,Franchise Wars,2019-04-15
965240,0.699047,Akabeth Tactics,2019-04-11
603850,0.6953,Age of Civilizations II,2018-11-21
798510,0.685977,SUPER DRAGON BALL HEROES WORLD MISSION,2019-04-04
1046030,0.685899,ISLANDERS,2019-04-04


In [52]:
get_results(s, s_new_games, drop_score=False, add_titles=True)

Unnamed: 0,SCORE,name
603850,0.6953,Age of Civilizations II
1058590,0.700207,Franchise Wars
3900,0.872145,Sid Meier's Civilization® IV
607050,0.656442,Wargroove
8930,0.933661,Sid Meier's Civilization® V
65980,0.782468,Sid Meier's Civilization®: Beyond Earth™
221380,0.809435,Age of Empires II HD
40970,0.787063,Stronghold Crusader HD
200510,0.871319,XCOM: Enemy Unknown
8800,0.848674,Civilization IV: Beyond the Sword


# Conclusions

In this game recommendation system, a variety of techniques ad been used to find and recommend similar games for users. 

Some of the key techniques and their advantages are:
- Jaccard similarity for comparing categorical features like genres and categories, which is effective in measuring similarity between sets of items.
- Cosine similarity for comparing continuous features like tags and description vectors, which can capture the similarity between items even if they are not identical.
- Weighted scores for combining various features based on their importance, which provides a comprehensive similarity score that takes multiple aspects of the games into account.

This recommendations system acheived comparable results to the ones suggested by Steam on their website and some suggested games are shared. The system's running time is approximately 1.45 seconds when the precomputed tables are provided.

However, there are some limitations, such as time constraints, limited resources, and access to more data that could potentially improve the recommendations.

Possible expansions for this recommendation system include:

- Evaluating user comments on games, although this may be biased due to the tendency of users to write short negative reviews and the potential for sentiment analysis to be misleading for certain genres like horror or gore games.
- Recommending games that are similar to the ones in a registered user's library or most-played list, which would provide a more personalized set of recommendations.
- Recommending games that are similar to those in the libraries of other users who have the target game in their libraries, based on playtime and other data that we did not have access to.
- ncorporating Steam users' libraries analysis to identify clusters of users with similar preferences and recommend games that are popular within these clusters.

By addressing these limitations and implementing the suggested expansions, it would be possible to further improve the accuracy and effectiveness of the game recommendation system, providing users with an even better experience when discovering new games to play.
