# <center>BoardGameGeek Content-based Recommender Model</center>

## Introduction

With a collection of more than 150,000 board games, we can build a recommendation system based on the attributes of those games. Recommendation systems come in two primary types: content-based and collaborative filtering. This notebook builds a content-based recommender model that uses **Inverse Document Frequency (IDF)** to find games whose mechanics and categories resemble those of a specified board game. The goal is to identify board games that share a similar *essence* and *structure*, as expressed by their distinctive combination of mechanics and categories.

### Methodology - Inverse Document Frequency (IDF)

TF-IDF, widely employed in Natural Language Processing (NLP), gauges the significance of a term in a document relative to a corpus. Term Frequency (TF) quantifies how often a term appears in a document relative to the total number of words in that document. In contrast, Inverse Document Frequency (IDF) evaluates the significance of a term across multiple documents by considering how frequently it occurs in the corpus. A term with high TF in a single document suggests it may be a distinctive keyword for that document. Conversely, a term with low occurrence across other documents indicates its uniqueness in the corpus, resulting in a higher IDF value.
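
To make the two quantities concrete, here is a minimal sketch on a toy corpus. The corpus, the helper names `term_frequency` and `inverse_document_frequency`, and the use of the natural logarithm are illustrative assumptions, not part of the notebook's pipeline.

```python
import math
from collections import Counter

# Toy corpus (illustrative only): each "document" is a list of terms.
docs = [
    ["dice", "card", "dice"],
    ["card", "worker"],
    ["card", "dice"],
]

def term_frequency(term, doc):
    """Share of the document's terms that are `term`."""
    return Counter(doc)[term] / len(doc)

def inverse_document_frequency(term, corpus):
    """log(N / df), where df is the number of documents containing `term`."""
    df = sum(term in doc for doc in corpus)
    return math.log(len(corpus) / df)

# "dice" makes up two of the three terms in the first document.
tf_dice = term_frequency("dice", docs[0])

# "card" appears in every document, so its IDF is zero;
# "worker" appears in only one, so its IDF is the highest.
idf = {t: inverse_document_frequency(t, docs) for t in ("card", "dice", "worker")}

tfidf_dice = tf_dice * idf["dice"]
```

A term that appears everywhere carries no weight, while a term confined to one document carries the most.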


In the context of board games, each game plays the role of a document and each mechanic or category plays the role of a term. A game either has a given mechanic or category or it does not, so TF is always one for every mechanic and category a game has. The IDF value therefore becomes the deciding metric: it measures how rare a mechanic or category is across the collection, and that rarity signals how distinctive a game's structure is relative to the entire collection.

To determine the IDF value of each mechanic and category, the following formula is used, where $\log$ is the natural logarithm:

$$IDF = \log\frac{N}{df}$$


#### Legend
* N - number of board games in the collection
* df - number of board games in which a mechanic or category occurs

#### Notes
* IDF value is high for rare mechanics/categories, low for frequently used mechanics/categories (e.g. Dice Rolling, Card Game)
* All distinct categories and mechanics have a corresponding IDF value (i.e. IDF Vector)
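
As a worked example of the formula, the sketch below plugs in two mechanic counts and the collection size of 5,000 games that appear in this notebook's outputs. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Collection size and mechanic counts taken from this notebook's outputs.
n_games = 5000
counts = pd.Series({"Acting": 38, "Action Points": 491})

# IDF = log(N / df), using the natural logarithm.
idf = np.log(n_games / counts)

print(idf.round(6))
```

"Acting" is used by far fewer games than "Action Points", so it receives the higher IDF value (about 4.88 versus about 2.32).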

#### For single game recommendations:

1. Calculate the IDF values of all mechanics and categories in the collection.
2. Choose a board game of interest.
3. Get all existing mechanics and categories of that board game as a binary vector.
4. Perform an `AND` operation between that binary vector and the binary mechanics and categories vectors of all other games to find games that share at least one mechanic or category with the board game of interest.
5. Multiply the resulting binary vectors by their corresponding IDF values.
6. Sum the IDF values for each game.
    * Score for each game reflects how similar each board game is to the board game of interest.
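
The steps above can be sketched on toy data as follows. The five games, the four feature columns, and the function name `score_single` are hypothetical, and the sketch is one possible reading of the steps rather than the notebook's final implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are games, columns are mechanics/categories.
features = pd.DataFrame(
    [[1, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]],
    index=["game_a", "game_b", "game_c", "game_d", "game_e"],
    columns=["Dice Rolling", "Hand Management", "Rondel", "Trains"],
)

def score_single(target_id, features):
    """Score every other game by the summed IDF of the features it shares with the target."""
    # Step 1: IDF of every feature in the collection.
    idf = np.log(len(features) / features.sum(axis=0))
    # Steps 2-3: binary vector of the game of interest.
    target = features.loc[target_id].astype(bool)
    others = features.drop(index=target_id).astype(bool)
    # Step 4: AND with every other game, keeping games that share at least one feature.
    shared = others & target
    shared = shared[shared.any(axis=1)]
    # Steps 5-6: weight the shared features by IDF and sum per game.
    scores = shared.astype(int).mul(idf, axis=1).sum(axis=1)
    return scores.sort_values(ascending=False)

scores = score_single("game_a", features)
print(scores)
```

Here `game_c` ranks first because it shares two features with `game_a`, while `game_e` shares none and is excluded.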
    
#### For multiple games recommendations:

1. Compute recommendation sets for each of the selected multiple games using the method for a single game.
2. Arrange the overall recommendations according to the similarity scores of the games.
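
A minimal sketch of the multi-game case is shown below. The toy data and the function names are hypothetical. Two choices in it are assumptions that the steps above do not specify: a candidate that appears in several recommendation sets keeps its highest score, and the selected games themselves are removed from the final list.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are games, columns are mechanics/categories.
features = pd.DataFrame(
    [[1, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]],
    index=["game_a", "game_b", "game_c", "game_d", "game_e"],
    columns=["Dice Rolling", "Hand Management", "Rondel", "Trains"],
)

def score_single(target_id, features):
    """Single-game method: summed IDF of the features shared with the target."""
    idf = np.log(len(features) / features.sum(axis=0))
    target = features.loc[target_id].astype(bool)
    others = features.drop(index=target_id).astype(bool)
    shared = others & target
    shared = shared[shared.any(axis=1)]
    return shared.astype(int).mul(idf, axis=1).sum(axis=1)

def score_multiple(target_ids, features):
    """Combine the per-game recommendation sets and rank them by score."""
    # Step 1: one recommendation set per selected game.
    combined = pd.concat([score_single(t, features) for t in target_ids])
    # Assumption: do not recommend the selected games themselves.
    combined = combined[~combined.index.isin(target_ids)]
    # Assumption: a candidate recommended more than once keeps its best score.
    combined = combined.groupby(level=0).max()
    # Step 2: rank by similarity score.
    return combined.sort_values(ascending=False)

multi_scores = score_multiple(["game_a", "game_e"], features)
print(multi_scores)
```

Other aggregation rules, such as summing or averaging the scores of repeated candidates, would fit the same structure by swapping the `max()` call.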

## Declarations

### Libraries

In [24]:
import os
import time
import warnings
from datetime import timedelta
from io import BytesIO

import boto3
import numpy as np
import pandas as pd
from dotenv import load_dotenv

pd.set_option('display.max_columns', None)
warnings.filterwarnings('ignore')

### Environment Variables

In [4]:
load_dotenv()
access_key = os.getenv("AWS_ACCESS_KEY")
secret_key = os.getenv("AWS_SECRET_KEY")
aws_region = os.getenv("AWS_REGION")
role_arn = os.getenv("AWS_ROLE_ARN")

athena_output_directory = os.getenv("ATHENA_OUTPUT_DIRECTORY")
athena_database = os.getenv("ATHENA_DATABASE")

### Function Definition

In [5]:
def initialize_aws_role(access_key, secret_key, aws_region, role_arn):
    """Initializes AWS user and role.
    
    Parameters:
        access_key {string} -- an ID of unique identifier to authenticate requests to AWS.
        secret_key {string} -- a digital signature for API requests made to AWS. 
        aws_region {string} -- location of AWS resources.
        role_arn {string} -- unique identifier for an IAM role to grant permission to AWS resources.
        
    Returns:
        {dict} -- response to a successful IAM role request granting temporary permissions to AWS services, \
        including the AWS user granted that permission and the expiration time of the issued token.
    """
    try:
        sts_client = boto3.client(
            'sts',
            aws_access_key_id = access_key,
            aws_secret_access_key= secret_key,
            region_name = aws_region,
        )

        assumed_role = sts_client.assume_role(
            RoleArn=role_arn,
            RoleSessionName='BGGRoleSession',
            DurationSeconds=7200,
        )

        aws_user = sts_client.get_caller_identity()['Arn'].split("/")[-1]
        aws_role = assumed_role['AssumedRoleUser']['Arn'].split("/")[-2]
        token_expiration = assumed_role['Credentials']['Expiration'] + timedelta(hours=8)

        print(f"AWS user '{aws_user}' assigned with role '{aws_role}'.")
        print(f"Token expiration time: {token_expiration.strftime('%I:%M %p %d-%b-%Y')}")
        
        return assumed_role
    
    except Exception as e:
        print(f"Error initializing AWS: {e}")
        raise

In [6]:
def initilize_aws_service(aws_service, assumed_role):
    """Initializes an AWS service client.
    
    Parameters:
        aws_service {string} -- AWS service that needs client session initialization.
        assumed_role {dict} -- contains the temporary security credentials granted to a specific AWS user.
    
    Returns:
        {client_instance} -- can be used to call API requests to the corresponding AWS service client.
    """
    
    try:
        client = boto3.client(
            aws_service,
            aws_access_key_id=assumed_role['Credentials']['AccessKeyId'],
            aws_secret_access_key=assumed_role['Credentials']['SecretAccessKey'],
            aws_session_token=assumed_role['Credentials']['SessionToken']
        )
        print(f"{aws_service} successfully initilized.")

        return client

    except Exception as e:

        print(f"Error initializing {aws_service}: {e}")
        raise    

In [7]:
def athena_query_df(athena_query, athena_client, s3_client):
    """Send a query to AWS Athena and return the results in a pandas dataframe format.
    
    Parameters:
        athena_query {string} -- the query to run in AWS Athena using an SQL like syntax.
        athena_client {client_instance} -- initialized client of AWS Athena that permits API requests.
        s3_client {client_instance} -- initialized client of AWS S3 that permits API requests.
    
    Returns:
        {dataframe} -- table containing the results of the Athena query.
    """
    
    query_response = athena_client.start_query_execution(
        QueryString=athena_query,
        QueryExecutionContext = {"Database": athena_database},
        ResultConfiguration={
            "OutputLocation":athena_output_directory,
        }
    )
    
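    # Wait for the Athena query to finish execution
    while True:
        execution_response = athena_client.get_query_execution(QueryExecutionId=query_response['QueryExecutionId'])
        athena_state = execution_response['QueryExecution']['Status']['State']
        if athena_state in ['RUNNING', 'QUEUED']:
            time.sleep(5)
        else:
            break

    # Stop early if the query did not succeed, since no result file exists in that case
    if athena_state != 'SUCCEEDED':
        raise RuntimeError(f"Athena query ended in state '{athena_state}'.")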

    athena_output_location = execution_response['QueryExecution']['ResultConfiguration']['OutputLocation']
    athena_bucket, athena_key = athena_output_location.replace('s3://', '').split('/', 1)

    s3_response = s3_client.get_object(
        Bucket=athena_bucket, 
        Key=athena_key
    )
    
    athena_query_results = BytesIO(s3_response['Body'].read())

    return pd.read_csv(athena_query_results)

In [8]:
# Get the data type, unique values, cardinality, and share of NaNs for each feature
def analyse_features(df, cat_cols):
    """Explores the various features of the dataframe in terms of data type, values, and cardinality.
    
    Parameters:
        df {dataframe} -- the dataframe where feature analysis will be conducted. 
        cat_cols {dataframe columns} -- columns of the dataframe that will be included in the feature analysis.
    
    Returns:
        {dataframe} -- returns the data type, unique values, cardinality, minimum value, \
        maximum value, and null values for each feature.
    """
    
    feature_details = []

    for column in cat_cols:
        
        feature_details.append({
            "Feature":column,
            "Data Type":df[column].dtype,
            "Uniques":df[column].unique(),
            "Cardinality":df[column].nunique(),
            "Minimum": df[column].min() if df[column].dtype != "object" else "-",
            "Maximum":df[column].max() if df[column].dtype != "object" else "-",
            "Nans":df[column].isnull().sum(),
            "% Share of Nans":round((df[column].isnull().sum()/len(df))*100,2)
        })

    return pd.DataFrame(feature_details)

## Data Import

In [9]:
assumed_role = initialize_aws_role(access_key, secret_key, aws_region, role_arn)
s3_client = initilize_aws_service('s3', assumed_role)
athena_client = initilize_aws_service('athena', assumed_role)

AWS user 'bgg_user' assigned with role 'bgg_user_role'.
Token expiration time: 01:57 AM 28-Jan-2024
s3 successfully initilized.
athena successfully initilized.


In [10]:
athena_query = '''
    SELECT *
    FROM bgg_analytics;
'''

In [11]:
bgg_ranked = athena_query_df(athena_query, athena_client, s3_client)

In [13]:
athena_query = '''
    SELECT *
    FROM bgg_analytics_classification
    WHERE classification='boardgamemechanic';
'''

In [14]:
bgg_mechanics = athena_query_df(athena_query, athena_client, s3_client)

In [16]:
athena_query = '''
    SELECT *
    FROM bgg_analytics_classification
    WHERE classification='boardgamecategory';
'''

In [17]:
bgg_categories = athena_query_df(athena_query, athena_client, s3_client)

## Data Preprocessing

### BGG Ranked

In [61]:
analyse_features(bgg_ranked, bgg_ranked.columns)

Unnamed: 0,Feature,Data Type,Uniques,Cardinality,Minimum,Maximum,Nans,% Share of Nans
0,users_rated,int64,"[5552, 3846, 1507, 1607, 2057, 2570, 1580, 663...",3125,191,123891,0,0.0
1,num_owners,int64,"[8587, 8721, 3135, 3247, 3560, 3923, 2379, 124...",3754,188,199586,0,0.0
2,max_players,int64,"[4, 6, 8, 20, 5, 7, 2, 99, 68, 1, 10, 3, 9, 10...",26,0,100,0,0.0
3,bgg_rank,int64,"[1203, 2448, 2559, 2435, 2501, 1254, 1238, 239...",5000,1,5000,0,0.0
4,average_weight,float64,"[2.2528, 2.3643, 3.0175, 2.6391, 1.1463, 2.827...",3387,1.0,4.8097,0,0.0
5,average,float64,"[6.86597, 6.55408, 6.82227, 6.92658, 6.67878, ...",4937,5.92768,9.22328,0,0.0
6,min_players,int64,"[2, 1, 4, 3, 5, 6, 0, 8]",8,0,8,0,0.0
7,year_published,int64,"[2014, 2003, 2019, 2011, 2015, 2020, 1992, 201...",100,-3000,2023,0,0.0
8,name,object,"[La Isla, Risk: The Lord of the Rings Trilogy ...",4973,-,-,0,0.0
9,max_playtime,int64,"[60, 180, 240, 15, 75, 20, 30, 90, 10, 70, 40,...",75,0,12000,0,0.0


* Since `bgg_id` serves as a unique identifier for each board game in the collection, it is advisable to convert it into a string type format and utilize it as an index.
* `date` (the date of collection) and `type` hold no value in the analysis and are dropped.
* The dataframe is arranged based on its rank, prioritizing highly rated games over those with unknown ratings.

In [4]:
bgg_ranked['bgg_id'] = bgg_ranked['bgg_id'].astype(str)
bgg_ranked = bgg_ranked.drop(['date', 'type'], axis=1)
bgg_ranked = bgg_ranked[['bgg_id', 'bgg_rank', 'name', 'average', 'average_weight', 'num_owners', 'users_rated', 
                         'min_players', 'max_players', 'min_playtime', 'max_playtime', 'year_published']]
bgg_ranked = bgg_ranked.sort_values('bgg_rank').reset_index(drop=True)
bgg_ranked = bgg_ranked.set_index('bgg_id')

In [63]:
analyse_features(bgg_ranked, bgg_ranked.columns)

Unnamed: 0,Feature,Data Type,Uniques,Cardinality,Minimum,Maximum,Nans,% Share of Nans
0,bgg_rank,int64,"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...",5000,1,5000,0,0.0
1,name,object,"[Brass: Birmingham, Pandemic Legacy: Season 1,...",4973,-,-,0,0.0
2,average,float64,"[8.60549, 8.53167, 8.60523, 8.53375, 8.60721, ...",4937,5.92768,9.22328,0,0.0
3,average_weight,float64,"[3.8916, 2.8322, 3.9008, 3.7398, 4.3169, 3.259...",3387,1.0,4.8097,0,0.0
4,num_owners,int64,"[58962, 81047, 94566, 55960, 27551, 131519, 56...",3754,188,199586,0,0.0
5,users_rated,int64,"[42488, 52171, 60552, 37952, 22367, 94781, 404...",3125,191,123891,0,0.0
6,min_players,int64,"[2, 1, 3, 5, 4, 6, 8, 0]",8,0,8,0,0.0
7,max_players,int64,"[4, 6, 5, 2, 7, 8, 100, 1, 10, 3, 12, 21, 16, ...",26,0,100,0,0.0
8,min_playtime,int64,"[60, 90, 240, 120, 30, 150, 180, 75, 100, 40, ...",52,0,5400,0,0.0
9,max_playtime,int64,"[120, 60, 150, 480, 180, 240, 90, 115, 30, 200...",75,0,12000,0,0.0


In [5]:
bgg_ranked.head()

bgg_id,bgg_rank,name,average,average_weight,num_owners,users_rated,min_players,max_players,min_playtime,max_playtime,year_published
224517,1,Brass: Birmingham,8.60549,3.8916,58962,42488,2,4,60,120,2018
161936,2,Pandemic Legacy: Season 1,8.53167,2.8322,81047,52171,2,4,60,60,2015
174430,3,Gloomhaven,8.60523,3.9008,94566,60552,1,4,60,120,2017
342942,4,Ark Nova,8.53375,3.7398,55960,37952,1,4,90,150,2021
233078,5,Twilight Imperium: Fourth Edition,8.60721,4.3169,27551,22367,3,6,240,480,2017


### BGG Mechanics

In [66]:
analyse_features(bgg_mechanics, bgg_mechanics.columns)

Unnamed: 0,Feature,Data Type,Uniques,Cardinality,Minimum,Maximum,Nans,% Share of Nans
0,value,object,"[Campaign / Battle Card Driven, Hand Managemen...",191,-,-,0,0.0
1,name,object,"[Fields of Fire 2, A Game of Thrones Collectib...",4935,-,-,0,0.0
2,bgg_id,int64,"[139433, 4286, 86156, 242705, 334011, 12166, 2...",4962,1,400314,0,0.0
3,date,object,[1/2/2024],1,-,-,0,0.0
4,type,object,[classification],1,-,-,0,0.0
5,classification,object,[boardgamemechanic],1,-,-,0,0.0


* `classification` is dropped from the dataframe since it holds a single value, `boardgamemechanic`.

In [6]:
bgg_mechanics['bgg_id'] = bgg_mechanics['bgg_id'].astype(str)
bgg_mechanics = bgg_mechanics.drop(['date', 'type', 'classification'], axis=1)
bgg_mechanics = bgg_mechanics.sort_values('bgg_id').rename({'value':'mechanic'}, axis=1)

In [8]:
bgg_mechanics.head()

Unnamed: 0,mechanic,name,bgg_id
10410,Alliances,Die Macher,1
10411,Area Majority / Influence,Die Macher,1
10412,Auction/Bidding,Die Macher,1
10413,Dice Rolling,Die Macher,1
10414,Hand Management,Die Macher,1


#### BGG Mechanics Vector

* To generate the binary vector representing the mechanics of each board game in the entire collection, we will pivot the dataframe. Each row will correspond to a board game, and each column will display all the mechanics present in the collection. A value of 1 indicates that the board game possesses that mechanic.

In [9]:
bgg_mechanics_vector = bgg_mechanics.pivot_table(index='bgg_id', columns='mechanic', aggfunc='count', fill_value=0)
bgg_mechanics_vector.columns = bgg_mechanics_vector.columns.droplevel(0)
bgg_mechanics_vector.columns.name = None

* `bgg_mechanics_vector.columns.droplevel(0)` is used to flatten the column into a single level which makes it easier to isolate specific columns.
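
The pivot and the column flattening can be seen on a small example. The two games and their rows below are hypothetical. Only the column layout (`mechanic`, `name`, `bgg_id`) mirrors the dataframe above.

```python
import pandas as pd

# Hypothetical long-format rows: one row per (game, mechanic) pair.
long_df = pd.DataFrame({
    "mechanic": ["Alliances", "Dice Rolling", "Hand Management", "Hand Management"],
    "name": ["Game A", "Game A", "Game A", "Game B"],
    "bgg_id": ["1", "1", "1", "2"],
})

# No `values` argument is given, so the remaining column (`name`) is counted
# and the result has two column levels: ('name', <mechanic>).
vector = long_df.pivot_table(index="bgg_id", columns="mechanic", aggfunc="count", fill_value=0)
levels_before = vector.columns.nlevels

# Drop the outer 'name' level so that each column is just the mechanic.
vector.columns = vector.columns.droplevel(0)
vector.columns.name = None

print(vector)
```

Because `aggfunc='count'` counts rows, a duplicated (game, mechanic) pair would produce a value above 1. If the source data could contain such duplicates, clipping the result with `vector.clip(upper=1)` would keep the vector strictly binary.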

In [11]:
bgg_mechanics_vector.head()

bgg_id,Acting,Action Drafting,Action Points,Action Queue,Action Retrieval,Action Timer,Action/Event,Advantage Token,Alliances,Area Majority / Influence,Area Movement,Area-Impulse,Auction Compensation,Auction/Bidding,Auction: Dexterity,Auction: Dutch,Auction: Dutch Priority,Auction: English,Auction: Fixed Placement,Auction: Multiple Lot,Auction: Once Around,Auction: Sealed Bid,Auction: Turn Order Until Pass,Automatic Resource Growth,Betting and Bluffing,Bias,Bids As Wagers,Bingo,Bribery,Campaign / Battle Card Driven,Card Play Conflict Resolution,Catch the Leader,Chaining,Chit-Pull System,Closed Drafting,Closed Economy Auction,Command Cards,Commodity Speculation,Communication Limits,Connections,Constrained Bidding,Contracts,Cooperative Game,Crayon Rail System,Critical Hits and Failures,Cube Tower,Deck Construction,"Deck, Bag, and Pool Building",Deduction,Delayed Purchase,Dice Rolling,Die Icon Resolution,Different Dice Movement,Drawing,Elapsed Real Time Ending,Enclosure,End Game Bonuses,Events,Finale Ending,Flicking,Follow,Force Commitment,Grid Coverage,Grid Movement,Hand Management,Hexagon Grid,Hidden Movement,Hidden Roles,Hidden Victory Points,Highest-Lowest Scoring,Hot Potato,"I Cut, You Choose",Impulse Movement,Income,Increase Value of Unchosen Resources,Induction,Interrupts,Investment,Kill Steal,King of the Hill,Ladder Climbing,Layering,Legacy Game,Line Drawing,Line of Sight,Loans,Lose a Turn,Mancala,Map Addition,Map Deformation,Map Reduction,Market,Matching,Measurement Movement,Melding and Splaying,Memory,Minimap Resolution,Modular Board,Move Through Deck,Movement Points,Movement Template,Moving Multiple Units,Multi-Use Cards,Multiple Maps,Narrative Choice / Paragraph,Negotiation,Neighbor Scope,Network and Route Building,Once-Per-Game Abilities,Open Drafting,Order Counters,Ordering,Ownership,Paper-and-Pencil,Passed Action Token,Pattern Building,Pattern Movement,Pattern Recognition,Physical Removal,Pick-up and Deliver,Pieces as Map,Player Elimination,Player Judge,Point to Point Movement,Predictive Bid,Prisoner's Dilemma,Programmed Movement,Push Your Luck,Questions and Answers,Race,Random Production,Ratio / Combat Results Table,Re-rolling and Locking,Real-Time,Relative Movement,Resource Queue,Resource to Move,Rock-Paper-Scissors,Role Playing,Roles with Asymmetric Information,Roll / Spin and Move,Rondel,Scenario / Mission / Campaign Game,Score-and-Reset Game,Secret Unit Deployment,Selection Order Bid,Semi-Cooperative Game,Set Collection,Simulation,Simultaneous Action Selection,Singing,Single Loser Game,Slide/Push,Solo / Solitaire Game,Speed Matching,Square Grid,Stacking and Balancing,Stat Check Resolution,Static Capture,Stock Holding,Storytelling,Sudden Death Ending,Tags,Take That,Targeted Clues,Team-Based Game,Tech Trees / Tech Tracks,Three Dimensional Movement,Tile Placement,Track Movement,Trading,Traitor Game,Trick-taking,Tug of War,Turn Order: Auction,Turn Order: Claim Action,Turn Order: Pass Order,Turn Order: Progressive,Turn Order: Random,Turn Order: Role Order,Turn Order: Stat-Based,Turn Order: Time Track,Variable Phase Order,Variable Player Powers,Variable Set-up,Victory Points as a Resource,Voting,Worker Placement,Worker Placement with Dice Workers,"Worker Placement, Different Worker Types",Zone of Control
1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1002,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
100423,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
100679,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0


### BGG Categories

* The same procedures used for the BGG mechanics are repeated for the board game categories.

In [128]:
analyse_features(bgg_categories, bgg_categories.columns)

Unnamed: 0,Feature,Data Type,Uniques,Cardinality,Minimum,Maximum,Nans,% Share of Nans
0,value,object,"[City Building, Medieval, Dice, Renaissance, A...",84,-,-,0,0.0
1,name,object,"[Comuni, Coimbra, Livingstone, Beowulf: The Mo...",4930,-,-,0,0.0
2,bgg_id,int64,"[37231, 245638, 40444, 29308, 730, 8107, 17971...",4957,1,400314,0,0.0
3,date,object,[1/2/2024],1,-,-,0,0.0
4,type,object,[classification],1,-,-,0,0.0
5,classification,object,[boardgamecategory],1,-,-,0,0.0


In [13]:
bgg_categories['bgg_id'] = bgg_categories['bgg_id'].astype(str)
bgg_categories = bgg_categories.drop(['date', 'type', 'classification'], axis=1)
bgg_categories = bgg_categories.sort_values('bgg_id').rename({'value':'category'}, axis=1)

In [14]:
bgg_categories.head()

Unnamed: 0,category,name,bgg_id
1574,Economic,Die Macher,1
1575,Negotiation,Die Macher,1
1576,Political,Die Macher,1
1982,Travel,Elfenland,10
1981,Fantasy,Elfenland,10


#### BGG Categories Vector

In [15]:
bgg_categories_vector = bgg_categories.pivot_table(index='bgg_id', columns='category', aggfunc='count', fill_value=0)
bgg_categories_vector.columns = bgg_categories_vector.columns.droplevel(0)
bgg_categories_vector.columns.name = None

In [16]:
bgg_categories_vector.head()

bgg_id,Abstract Strategy,Action / Dexterity,Adventure,Age of Reason,American Civil War,American Indian Wars,American Revolutionary War,American West,Ancient,Animals,Arabian,Aviation / Flight,Bluffing,Book,Card Game,Children's Game,City Building,Civil War,Civilization,Collectible Components,Comic Book / Strip,Deduction,Dice,Economic,Educational,Electronic,Environmental,Expansion for Base-game,Exploration,Fan Expansion,Fantasy,Farming,Fighting,Game System,Horror,Humor,Industry / Manufacturing,Korean War,Mafia,Math,Mature / Adult,Maze,Medical,Medieval,Memory,Miniatures,Modern Warfare,Movies / TV / Radio theme,Murder/Mystery,Music,Mythology,Napoleonic,Nautical,Negotiation,Novel-based,Number,Party Game,Pike and Shot,Pirates,Political,Post-Napoleonic,Prehistoric,Print & Play,Puzzle,Racing,Real-time,Religious,Renaissance,Science Fiction,Space Exploration,Spies/Secret Agents,Sports,Territory Building,Trains,Transportation,Travel,Trivia,Video Game Theme,Vietnam War,Wargame,Word Game,World War I,World War II,Zombies
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1002,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
100423,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
100679,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Content-Based Recommender using IDF

After generating the mechanic and category vectors, the IDF values for each will be calculated. The total number of board games in the collection is shown below.

In [17]:
n_games = len(bgg_ranked)

In [18]:
n_games

5000

### Mechanics IDF Values

* To determine the IDF values, we first count the number of board games in the collection that use each mechanic. The IDF formula discussed earlier is then applied to measure how rare each mechanic is.

In [20]:
n_mechanics = bgg_mechanics.value_counts('mechanic').reset_index(name='count')
n_mechanics['idf'] = np.log(n_games/n_mechanics['count'])
n_mechanics = n_mechanics.sort_values('mechanic').reset_index(drop=True)

In [21]:
n_mechanics.head()

Unnamed: 0,mechanic,count,idf
0,Acting,38,4.879607
1,Action Drafting,49,4.625373
2,Action Points,491,2.320749
3,Action Queue,158,3.454598
4,Action Retrieval,67,4.312501


### Categories IDF Values

* The procedures above are repeated for board game categories.

In [22]:
n_categories = bgg_categories.value_counts('category').reset_index(name='count')
n_categories['idf'] = np.log(n_games/n_categories['count'])
n_categories = n_categories.sort_values('category').reset_index(drop=True)

In [23]:
n_categories.head()

Unnamed: 0,category,count,idf
0,Abstract Strategy,270,2.918771
1,Action / Dexterity,130,3.649659
2,Adventure,397,2.533257
3,Age of Reason,50,4.60517
4,American Civil War,24,5.339139


### Find similar games - Single Game

* The methodology for a content-based recommender will now be applied to a game titled **Scythe**, an award-winning game with over 80,000 user ratings and currently holding the 17th position on BoardGameGeek's rankings.

In [24]:
#Sample Game - Scythe

bgg_id = '169786'

In [163]:
bgg_ranked[bgg_ranked.index == bgg_id]

bgg_id,bgg_rank,name,average,average_weight,num_owners,users_rated,min_players,max_players,min_playtime,max_playtime,year_published
169786,17,Scythe,8.15621,3.4441,109362,80329,1,5,90,115,2016


In [164]:
bgg_mechanics_vector[bgg_mechanics_vector.index == bgg_id]

bgg_id,Acting,Action Drafting,Action Points,Action Queue,Action Retrieval,Action Timer,Action/Event,Advantage Token,Alliances,Area Majority / Influence,Area Movement,Area-Impulse,Auction Compensation,Auction/Bidding,Auction: Dexterity,Auction: Dutch,Auction: Dutch Priority,Auction: English,Auction: Fixed Placement,Auction: Multiple Lot,Auction: Once Around,Auction: Sealed Bid,Auction: Turn Order Until Pass,Automatic Resource Growth,Betting and Bluffing,Bias,Bids As Wagers,Bingo,Bribery,Campaign / Battle Card Driven,Card Play Conflict Resolution,Catch the Leader,Chaining,Chit-Pull System,Closed Drafting,Closed Economy Auction,Command Cards,Commodity Speculation,Communication Limits,Connections,Constrained Bidding,Contracts,Cooperative Game,Crayon Rail System,Critical Hits and Failures,Cube Tower,Deck Construction,"Deck, Bag, and Pool Building",Deduction,Delayed Purchase,Dice Rolling,Die Icon Resolution,Different Dice Movement,Drawing,Elapsed Real Time Ending,Enclosure,End Game Bonuses,Events,Finale Ending,Flicking,Follow,Force Commitment,Grid Coverage,Grid Movement,Hand Management,Hexagon Grid,Hidden Movement,Hidden Roles,Hidden Victory Points,Highest-Lowest Scoring,Hot Potato,"I Cut, You Choose",Impulse Movement,Income,Increase Value of Unchosen Resources,Induction,Interrupts,Investment,Kill Steal,King of the Hill,Ladder Climbing,Layering,Legacy Game,Line Drawing,Line of Sight,Loans,Lose a Turn,Mancala,Map Addition,Map Deformation,Map Reduction,Market,Matching,Measurement Movement,Melding and Splaying,Memory,Minimap Resolution,Modular Board,Move Through Deck,Movement Points,Movement Template,Moving Multiple Units,Multi-Use Cards,Multiple Maps,Narrative Choice / Paragraph,Negotiation,Neighbor Scope,Network and Route Building,Once-Per-Game Abilities,Open Drafting,Order Counters,Ordering,Ownership,Paper-and-Pencil,Passed Action Token,Pattern Building,Pattern Movement,Pattern Recognition,Physical Removal,Pick-up and Deliver,Pieces as Map,Player Elimination,Player Judge,Point to Point Movement,Predictive Bid,Prisoner's Dilemma,Programmed Movement,Push Your Luck,Questions and Answers,Race,Random Production,Ratio / Combat Results Table,Re-rolling and Locking,Real-Time,Relative Movement,Resource Queue,Resource to Move,Rock-Paper-Scissors,Role Playing,Roles with Asymmetric Information,Roll / Spin and Move,Rondel,Scenario / Mission / Campaign Game,Score-and-Reset Game,Secret Unit Deployment,Selection Order Bid,Semi-Cooperative Game,Set Collection,Simulation,Simultaneous Action Selection,Singing,Single Loser Game,Slide/Push,Solo / Solitaire Game,Speed Matching,Square Grid,Stacking and Balancing,Stat Check Resolution,Static Capture,Stock Holding,Storytelling,Sudden Death Ending,Tags,Take That,Targeted Clues,Team-Based Game,Tech Trees / Tech Tracks,Three Dimensional Movement,Tile Placement,Track Movement,Trading,Traitor Game,Trick-taking,Tug of War,Turn Order: Auction,Turn Order: Claim Action,Turn Order: Pass Order,Turn Order: Progressive,Turn Order: Random,Turn Order: Role Order,Turn Order: Stat-Based,Turn Order: Time Track,Variable Phase Order,Variable Player Powers,Variable Set-up,Victory Points as a Resource,Voting,Worker Placement,Worker Placement with Dice Workers,"Worker Placement, Different Worker Types",Zone of Control
169786,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,1


In [165]:
bgg_categories_vector[bgg_categories_vector.index == bgg_id]

bgg_id,Abstract Strategy,Action / Dexterity,Adventure,Age of Reason,American Civil War,American Indian Wars,American Revolutionary War,American West,Ancient,Animals,Arabian,Aviation / Flight,Bluffing,Book,Card Game,Children's Game,City Building,Civil War,Civilization,Collectible Components,Comic Book / Strip,Deduction,Dice,Economic,Educational,Electronic,Environmental,Expansion for Base-game,Exploration,Fan Expansion,Fantasy,Farming,Fighting,Game System,Horror,Humor,Industry / Manufacturing,Korean War,Mafia,Math,Mature / Adult,Maze,Medical,Medieval,Memory,Miniatures,Modern Warfare,Movies / TV / Radio theme,Murder/Mystery,Music,Mythology,Napoleonic,Nautical,Negotiation,Novel-based,Number,Party Game,Pike and Shot,Pirates,Political,Post-Napoleonic,Prehistoric,Print & Play,Puzzle,Racing,Real-time,Religious,Renaissance,Science Fiction,Space Exploration,Spies/Secret Agents,Sports,Territory Building,Trains,Transportation,Travel,Trivia,Video Game Theme,Vietnam War,Wargame,Word Game,World War I,World War II,Zombies
169786,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


#### AND operation

* In this step, board games that share at least one mechanic with Scythe are isolated from `bgg_mechanics_vector`.

In [25]:
#Obtain mechanics of selected game
game_mechanics_vector = bgg_mechanics_vector.loc[bgg_id]
game_mechanics = game_mechanics_vector[game_mechanics_vector.values == 1].index.to_list()

In [26]:
game_mechanics

['Area Majority / Influence',
 'Card Play Conflict Resolution',
 'Contracts',
 'End Game Bonuses',
 'Force Commitment',
 'Grid Movement',
 'Hexagon Grid',
 'King of the Hill',
 'Movement Points',
 'Solo / Solitaire Game',
 'Take That',
 'Tech Trees / Tech Tracks',
 'Variable Player Powers',
 'Variable Set-up',
 'Victory Points as a Resource',
 'Zone of Control']

* The list above shows the game mechanics of Scythe.

In [27]:
#isolate columns based on mechanics of selected game
bgg_game_mechanics_vector = bgg_mechanics_vector[game_mechanics]

In [69]:
bgg_game_mechanics_vector.head()

bgg_id,Area Majority / Influence,Card Play Conflict Resolution,Contracts,End Game Bonuses,Force Commitment,Grid Movement,Hexagon Grid,King of the Hill,Movement Points,Solo / Solitaire Game,Take That,Tech Trees / Tech Tracks,Variable Player Powers,Variable Set-up,Victory Points as a Resource,Zone of Control
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1002,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
100423,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0
100679,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0


* The columns of the dataframe above were reduced to include only the mechanics that Scythe uses.

In [70]:
#determine games that have at least one mechanic in common with the selected game
similar_games_mechanics = bgg_game_mechanics_vector[(bgg_game_mechanics_vector & game_mechanics_vector).any(axis=1)]\
.drop(index=bgg_id)

* The `AND` operation is performed with `&` between `bgg_game_mechanics_vector` and `game_mechanics_vector`. The call `.any(axis=1)` then keeps only the rows that share at least one mechanic with the original game. To avoid recommending the game itself, `.drop(index=bgg_id)` removes the board game of interest from the result.
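
The filtering step can be illustrated on a toy matrix. The sketch below uses a hypothetical four-game, three-mechanic one-hot table (the row labels `target`, `game_a`, `game_b`, `game_c` are made up for illustration). Because the columns are first restricted to the target's own mechanics, every remaining column is 1 for the target, so a plain `.any(axis=1)` on the reduced table selects the same rows as the `&` expression above would for binary data.

```python
import pandas as pd

# Hypothetical one-hot matrix: rows are games, columns are mechanics.
vectors = pd.DataFrame(
    {
        "Dice Rolling": [1, 1, 0, 0],
        "Hand Management": [1, 0, 0, 1],
        "Trading": [0, 1, 1, 0],
    },
    index=["target", "game_a", "game_b", "game_c"],
)

# Mechanics used by the target game.
target = vectors.loc["target"]
target_mechanics = target[target == 1].index.to_list()

# Keep only the target's columns; any 1 left in a row is a shared mechanic.
shared = vectors[target_mechanics]
similar = shared[shared.any(axis=1)].drop(index="target")
similar_ids = similar.index.to_list()

print(similar_ids)
```

Here `game_b` is dropped because its only mechanic, Trading, is not one of the target's mechanics.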

In [71]:
similar_games_mechanics.shape

(2971, 16)

In [72]:
similar_games_mechanics.head()

bgg_id,Area Majority / Influence,Card Play Conflict Resolution,Contracts,End Game Bonuses,Force Commitment,Grid Movement,Hexagon Grid,King of the Hill,Movement Points,Solo / Solitaire Game,Take That,Tech Trees / Tech Tracks,Variable Player Powers,Variable Set-up,Victory Points as a Resource,Zone of Control
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1002,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
100423,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0
100679,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0
100901,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0


* There are 2971 games (out of 5000 in the collection) that share at least one mechanic with Scythe. The first five are shown above.

The same procedures are repeated for the categories of Scythe.

In [73]:
# Categories

game_categories_vector = bgg_categories_vector.loc[bgg_id]
game_categories = game_categories_vector[game_categories_vector.values == 1].index.to_list()

bgg_game_categories_vector = bgg_categories_vector[game_categories]

similar_games_categories = bgg_game_categories_vector[(bgg_game_categories_vector & game_categories_vector).any(axis=1)]\
.drop(index=bgg_id)

In [74]:
similar_games_categories.shape

(1694, 4)

In [75]:
similar_games_categories.head()

bgg_id,Economic,Fighting,Science Fiction,Territory Building
1,1,0,0,0
100423,0,1,0,0
100679,0,1,0,0
10093,1,0,0,0
101721,0,1,0,0


* As shown above, 1694 board games share at least one category with Scythe.

### IDF values per game

Now that the IDF values for each mechanic and category have been computed and the board games sharing at least one similar mechanic or category have been identified, the similarity scores of these isolated board games can be calculated.

#### Mechanics

In [76]:
# Isolate IDF values of current game
game_n_mechanics = n_mechanics[n_mechanics['mechanic'].isin(game_mechanics)]['idf']

In [77]:
game_n_mechanics.shape

(16,)

In [78]:
game_n_mechanics

9      1.936554
30     3.942482
41     3.375530
56     3.003764
61     5.744604
63     2.255702
65     2.513306
79     5.626821
99     3.641996
153    1.935168
163    2.609110
166    4.074542
183    1.434645
184    2.843870
185    3.952845
190    4.645992
Name: idf, dtype: float64

Scythe has 16 mechanics listed on BGG, and their IDF values are shown above. As before, each value reflects how rare that mechanic is across the entire collection. For example, Force Commitment (row 61) and King of the Hill (row 79) have the highest IDF values and are therefore Scythe's two rarest mechanics.

In [79]:
np.dot(similar_games_mechanics, game_n_mechanics)

array([1.93655405, 1.43464462, 3.36981267, ..., 7.1592983 , 1.93655405,
       3.69034613])

In [80]:
mechanics_idf = pd.DataFrame(index = similar_games_mechanics.index,
             data = {'idf':np.dot(similar_games_mechanics, game_n_mechanics)}
            )

In [81]:
mechanics_idf

bgg_id,idf
1,1.936554
1002,1.434645
100423,3.369813
100679,3.690346
100901,5.625514
...,...
986,7.255102
987,3.641996
99358,7.159298
99392,1.936554


* The operation above takes the dot product of each isolated game's binary vector with the IDF values of Scythe's mechanics, which sums the IDF values of the mechanics that the game shares with Scythe. The resulting dataframe holds the similarity score of each board game with respect to the mechanics of Scythe.
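
To make the score concrete, the sketch below recomputes three of the scores shown above (games 100423, 100679 and 100901) from the three mechanics they share with Scythe. It assumes the IDF values in `game_n_mechanics` are listed in the same alphabetical order as Scythe's mechanics list, which is consistent with the printed scores. It also swaps in `DataFrame.dot` with weights keyed by mechanic name instead of the notebook's `np.dot`, so that weights are matched to columns by label rather than by position. The hand-built `shared` and `idf` objects are illustrative only.

```python
import pandas as pd

# Three rows of the filtered matrix shown earlier, limited to the columns they use.
shared = pd.DataFrame(
    {
        "Grid Movement": [0, 1, 1],
        "Solo / Solitaire Game": [1, 0, 1],
        "Variable Player Powers": [1, 1, 1],
    },
    index=["100423", "100679", "100901"],
)

# IDF weights keyed by mechanic name, deliberately given in a different order.
idf = pd.Series(
    {
        "Variable Player Powers": 1.434645,
        "Grid Movement": 2.255702,
        "Solo / Solitaire Game": 1.935168,
    }
)

# Reindexing to the column order pairs each weight with its own column.
scores = shared.dot(idf.reindex(shared.columns))
print(scores)
```

Keying the weights by name is a defensive choice: a purely positional product gives the right answer only when the column order and the order of the IDF values agree.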

#### Categories

In [82]:
game_n_categories = n_categories[n_categories['category'].isin(game_categories)]['idf']
categories_idf = pd.DataFrame(index = similar_games_categories.index,
             data = {'idf':np.dot(similar_games_categories, game_n_categories)}
            )

In [83]:
categories_idf

bgg_id,idf
1,2.085862
100423,2.187472
100679,2.187472
10093,2.085862
101721,2.187472
...,...
98347,2.249993
98351,2.249993
99,2.249993
99392,5.109994


* The same calculation is applied to the board game categories, giving a similarity score with respect to the categories of Scythe.

#### Total

In [84]:
similar_games = mechanics_idf.add(categories_idf, fill_value=0).sort_values('idf', ascending=False)

In [86]:
similar_games.head()

bgg_id,idf
167791,33.102766
3870,28.937241
220308,27.118127
344105,27.050883
184267,25.904293


* Lastly, the mechanics and categories scores are added to obtain the overall similarity score of each board game. Passing `fill_value=0` keeps games that appear in only one of the two tables, treating the missing score as zero.
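
The effect of `fill_value=0` is easiest to see on a small example. In the sketch below, the scores for games 1 and 1002 are copied from the outputs above, while `cat_only` is a hypothetical game ID added to show a game that has a category score but no mechanics score.

```python
import pandas as pd

# Mechanics scores for two games (values copied from the output above).
mechanics_idf = pd.DataFrame({"idf": [1.936554, 1.434645]}, index=["1", "1002"])

# Category scores; "cat_only" is a made-up ID with no mechanics score.
categories_idf = pd.DataFrame({"idf": [2.085862, 2.085862]}, index=["1", "cat_only"])

# Games missing from one table contribute 0 from that table instead of NaN.
total = mechanics_idf.add(categories_idf, fill_value=0).sort_values("idf", ascending=False)
print(total)
```

Without `fill_value=0`, the rows for `1002` and `cat_only` would be `NaN`, because a plain `+` only produces a value where both tables have an entry.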

#### Top 10 Similar Games by IDF values

In [88]:
pd.merge(similar_games[:10],
         bgg_ranked[['name', 'bgg_rank', 'average_weight']],
         left_index=True,
         right_index=True,
         how='left')

bgg_id,idf,name,bgg_rank,average_weight
167791,33.102766,Terraforming Mars,6,3.259
3870,28.937241,7 Ages,2773,3.9296
220308,27.118127,Gaia Project,12,4.3973
344105,27.050883,Anunnaki: Dawn of the Gods,4793,3.48
184267,25.904293,On Mars,48,4.6754
120677,24.86952,Terra Mystica,26,3.9728
233078,23.08474,Twilight Imperium: Fourth Edition,5,4.3169
156091,21.54576,Sons of Anarchy: Men of Mayhem,1193,2.5648
276025,20.960732,Maracaibo,58,3.903
286096,20.722725,Tapestry,271,2.9277


* The table above lists the ten most similar games, highest score first. Note that Scythe has a weight score of 3.44 (Medium), whereas the recommended games range from 2.56 (Medium Light) to 4.68 (Medium Heavy).

#### Top 10 Similar Games by IDF values and Similar Weight Scale

In [89]:
#Compute for lower and upper boundary weight of selected game
lower_weight_bound = np.floor(bgg_ranked.loc[bgg_id]['average_weight'])
upper_weight_bound = np.ceil(bgg_ranked.loc[bgg_id]['average_weight'])

similar_games_df = bgg_ranked.loc[similar_games.index.to_list()]

In [90]:
recommended_games = similar_games_df[similar_games_df['average_weight']\
                                         .between(lower_weight_bound, upper_weight_bound)][:10]

In [91]:
recommended_games[['name', 'bgg_rank', 'average_weight']].sort_values('bgg_rank')

bgg_id,name,bgg_rank,average_weight
167791,Terraforming Mars,6,3.259
187645,Star Wars: Rebellion,10,3.7444
120677,Terra Mystica,26,3.9728
276025,Maracaibo,58,3.903
285967,Ankh: Gods of Egypt,216,3.0936
228328,Rurik: Dawn of Kiev,892,3.0734
249277,Brazil: Imperial,939,3.0066
144239,Impulse,2058,3.0385
3870,7 Ages,2773,3.9296
344105,Anunnaki: Dawn of the Gods,4793,3.48


* To keep the recommendations relevant, we restrict the selection to board games within the same weight scale as the target game. This avoids recommending games that are much more complex or much simpler than what the user is used to. The ten highest-scoring games in that weight band are then ordered by BGG rank, on the reasoning that well-received board games are more likely to be enjoyed by other players. The final recommendations span ranks from 6 to 4793, which suggests that the methodology can surface both well-known games and hidden gems from the collection.
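
The weight band is simply the pair of whole numbers that bracket the target game's weight. The sketch below shows this for Scythe, along with one edge case worth keeping in mind; `weight_band` is a hypothetical helper written for this illustration, not a function from the notebook.

```python
import math


def weight_band(weight):
    """Return the (lower, upper) whole-number bounds that bracket a weight score."""
    return math.floor(weight), math.ceil(weight)


scythe_band = weight_band(3.4441)  # Scythe's average weight from the table above
edge_band = weight_band(3.0)       # a whole-number weight collapses the band

print(scythe_band)  # (3, 4)
print(edge_band)    # (3, 3)
```

For a weight that is exactly a whole number, the lower and upper bounds coincide, so an inclusive `between` filter would only accept games with that exact weight. Average weights are rarely whole numbers, so this may not matter in practice, but it is a reason to consider widening the band.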

## Content-based Recommender Function

### Single Game

In [495]:
def bgg_recommend_single(df, game):
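    """Recommend board games similar to a single game.

    Similarity scores are sums of Inverse Document Frequency (IDF) values over shared
    mechanics and categories. Candidates are limited to the same weight scale as the
    selected game, and the final list is ordered by BGG rank.

    Note: mechanic and category counts come from the global `bgg_mechanics` and
    `bgg_categories` tables, so `df` is expected to cover the same games.

    Parameters:
        df {dataframe} -- contains the collection of board games where similar games will be identified.
        game {string} -- BGG ID of the board game of interest.

    Returns:
        {series} -- BGG IDs of up to 10 recommended games, ordered by BGG rank.
    """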
    
    #Calculate IDF values of mechanics and categories
    n_games = len(df)

    n_mechanics = bgg_mechanics.value_counts('mechanic').reset_index(name='count')
    n_mechanics['idf'] = np.log(n_games/n_mechanics['count'])
    n_mechanics = n_mechanics.sort_values('mechanic').reset_index(drop=True)

    n_categories = bgg_categories.value_counts('category').reset_index(name='count')
    n_categories['idf'] = np.log(n_games/n_categories['count'])
    n_categories = n_categories.sort_values('category').reset_index(drop=True)
    
    #Filter Mechanics and Categories of df
    df_mechanics_vector = bgg_mechanics_vector[bgg_mechanics_vector.index.isin(df.index)]
    df_categories_vector = bgg_categories_vector[bgg_categories_vector.index.isin(df.index)]
   
    
    #AND Operation
    if game in df_mechanics_vector.index:
        #obtain mechanics of selected game
        game_mechanics_vector = bgg_mechanics_vector.loc[game]
        game_mechanics = game_mechanics_vector[game_mechanics_vector.values == 1].index.to_list()

        #isolate columns based on mechanics of selected game
        df_game_mechanics_vector = df_mechanics_vector[game_mechanics]

        #determine games that have at least one mechanic in common with the selected game
        similar_games_mechanics = df_game_mechanics_vector[(df_game_mechanics_vector & game_mechanics_vector).any(axis=1)]\
        .drop(index=game)
        
        #Compute for similarity score of similar games
        game_n_mechanics = n_mechanics[n_mechanics['mechanic'].isin(game_mechanics)]['idf']
        mechanics_idf = pd.DataFrame(index = similar_games_mechanics.index,
                     data = {'idf':np.dot(similar_games_mechanics, game_n_mechanics)}
                    )
    else:
        mechanics_idf = pd.DataFrame()
    
    if game in df_categories_vector.index:
        game_categories_vector = bgg_categories_vector.loc[game]
        game_categories = game_categories_vector[game_categories_vector.values == 1].index.to_list()


        df_game_categories_vector = df_categories_vector[game_categories]

        similar_games_categories = df_game_categories_vector[(df_game_categories_vector & game_categories_vector).any(axis=1)]\
        .drop(index=game)
        
        game_n_categories = n_categories[n_categories['category'].isin(game_categories)]['idf']
        categories_idf = pd.DataFrame(index = similar_games_categories.index,
                     data = {'idf':np.dot(similar_games_categories, game_n_categories)}
                    )
    else:
        categories_idf = 0
    
    similar_games = mechanics_idf.add(categories_idf, fill_value=0).sort_values('idf', ascending=False)
    similar_games_df = df.loc[similar_games.index.to_list()]

    #Compute for lower and upper boundary weight of selected game
    lower_weight_bound = np.floor(df.loc[game]['average_weight'])
    upper_weight_bound = np.ceil(df.loc[game]['average_weight'])
    
    #Recommend games that are within the same weight class
    recommended_games = similar_games_df[similar_games_df['average_weight']\
                                         .between(lower_weight_bound, upper_weight_bound)][:10]
    if len(recommended_games) < 10:
        recommended_games = similar_games_df[similar_games_df['average_weight']\
                                         .between(lower_weight_bound-0.25, upper_weight_bound+0.25)][:10]
        
    
    return pd.Series(recommended_games.sort_values('bgg_rank').index)

    
    

#### Function test

In [498]:
#Scythe

bgg_ranked.loc[bgg_recommend_single(bgg_ranked, '169786')][['name', 'bgg_rank', 'users_rated', 'average_weight', 
                                                                   'num_owners']]

bgg_id,name,bgg_rank,users_rated,average_weight,num_owners
167791,Terraforming Mars,6,94781,3.259,131519
187645,Star Wars: Rebellion,10,31265,3.7444,48757
120677,Terra Mystica,26,47046,3.9728,51156
276025,Maracaibo,58,15241,3.903,21764
285967,Ankh: Gods of Egypt,216,8405,3.0936,14769
228328,Rurik: Dawn of Kiev,892,2375,3.0734,3169
249277,Brazil: Imperial,939,2750,3.0066,5319
144239,Impulse,2058,2146,3.0385,3616
3870,7 Ages,2773,1053,3.9296,1460
344105,Anunnaki: Dawn of the Gods,4793,310,3.48,608


* The results above match the earlier step-by-step calculation for Scythe.

In [357]:
#Avalon

bgg_ranked.loc[bgg_recommend_single(bgg_ranked, '128882')][['name', 'bgg_rank', 'users_rated', 'average_weight', 
                                                                   'num_owners']]

bgg_id,name,bgg_rank,users_rated,average_weight,num_owners
188834,Secret Hitler,233,27393,1.7393,39051
41114,The Resistance,376,40267,1.5923,63689
163166,One Night Ultimate Werewolf: Daybreak,674,5851,1.3913,14873
134352,Two Rooms and a Boom,1103,5321,1.438,8626
38159,Ultimate Werewolf: Ultimate Edition,1337,4046,1.4908,7101
925,Werewolf,2234,4653,1.3935,4546
180956,One Night Ultimate Vampire,2482,2118,1.6087,7329
204431,One Night Ultimate Alien,2679,1316,1.8889,4362
316287,Quest,3183,727,1.8333,2115
176361,One Night Revolution,4851,1671,1.8636,4923


* *The Resistance: Avalon* is a short yet intense party game, and the recommended games are ones that fans of *Avalon* would likely enjoy trying themselves!

### Multiple Games

In [505]:
def bgg_recommend_multiple(df, games):
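    """Recommend board games similar to a list of games.

    Similarity scores are sums of Inverse Document Frequency (IDF) values over shared
    mechanics and categories, computed per selected game. Candidates are limited to the
    weight scale of the selected games' average weight, and the final list is ordered
    by BGG rank.

    Note: mechanic and category counts come from the global `bgg_mechanics` and
    `bgg_categories` tables, so `df` is expected to cover the same games.

    Parameters:
        df {dataframe} -- contains the collection of board games where similar games will be identified.
        games {list} -- BGG IDs (strings) of the board games on which to base the recommendations.

    Returns:
        {series} -- BGG IDs of up to 10 recommended games, ordered by BGG rank.
    """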
    
    #Calculate IDF
    n_games = len(df)

    n_mechanics = bgg_mechanics.value_counts('mechanic').reset_index(name='count')
    n_mechanics['idf'] = np.log(n_games/n_mechanics['count'])
    n_mechanics = n_mechanics.sort_values('mechanic').reset_index(drop=True)

    n_categories = bgg_categories.value_counts('category').reset_index(name='count')
    n_categories['idf'] = np.log(n_games/n_categories['count'])
    n_categories = n_categories.sort_values('category').reset_index(drop=True)

    #Filter Mechanics and Categories of df
    df_mechanics_vector = bgg_mechanics_vector[bgg_mechanics_vector.index.isin(df.index)]
    df_categories_vector = bgg_categories_vector[bgg_categories_vector.index.isin(df.index)]

    recommendations_df = pd.DataFrame()

    for game in games:

        #AND Operation
        if game in df_mechanics_vector.index:
            #obtain mechanics of selected game
            game_mechanics_vector = bgg_mechanics_vector.loc[game]
            game_mechanics = game_mechanics_vector[game_mechanics_vector.values == 1].index.to_list()

            #isolate columns based on mechanics of selected game
            df_game_mechanics_vector = df_mechanics_vector[game_mechanics]

            #determine games that have at least one mechanic in common with the selected game
            similar_games_mechanics = df_game_mechanics_vector[(df_game_mechanics_vector & game_mechanics_vector).any(axis=1)]\
            .drop(index=game)

            #Compute for similarity score of similar games
            game_n_mechanics = n_mechanics[n_mechanics['mechanic'].isin(game_mechanics)]['idf']
            mechanics_idf = pd.DataFrame(index = similar_games_mechanics.index,
                         data = {'idf':np.dot(similar_games_mechanics, game_n_mechanics)}
                        )
        else:
            mechanics_idf = pd.DataFrame()
    
        if game in df_categories_vector.index:
            game_categories_vector = bgg_categories_vector.loc[game]
            game_categories = game_categories_vector[game_categories_vector.values == 1].index.to_list()


            df_game_categories_vector = df_categories_vector[game_categories]

            similar_games_categories = df_game_categories_vector[(df_game_categories_vector & game_categories_vector).any(axis=1)]\
            .drop(index=game)

            game_n_categories = n_categories[n_categories['category'].isin(game_categories)]['idf']
            categories_idf = pd.DataFrame(index = similar_games_categories.index,
                         data = {'idf':np.dot(similar_games_categories, game_n_categories)}
                        )
        else:
            categories_idf = 0

        similar_games = mechanics_idf.add(categories_idf, fill_value=0)

        recommendations_df = pd.concat([recommendations_df, similar_games])

    recommendations_df = recommendations_df.sort_values('idf', ascending=False)
    recommendations_df = recommendations_df[~recommendations_df.index.duplicated()]
    recommendations_df = recommendations_df[~recommendations_df.index.isin(games)]
    
    similar_games_df = df.loc[recommendations_df.index.to_list()]
    
    lower_weight_bound = np.floor(df.loc[games]['average_weight'].mean())
    upper_weight_bound = np.ceil(df.loc[games]['average_weight'].mean())
    
    
    recommended_games = similar_games_df[similar_games_df['average_weight']\
                                         .between(lower_weight_bound, upper_weight_bound)][:10]
    if len(recommended_games) < 10:
        recommended_games = similar_games_df[similar_games_df['average_weight']\
                                         .between(lower_weight_bound-0.25, upper_weight_bound+0.25)][:10]
    
    return pd.Series(recommended_games.sort_values('bgg_rank').index)
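
The pooling step near the end of the function (concatenate, sort, then drop duplicated IDs) can be illustrated on toy data. In the sketch below, the IDs `x`, `y`, `z` and their scores are hypothetical. It shows that a candidate recommended by several selected games keeps its single highest score. Scores from different selected games are not summed.

```python
import pandas as pd

# Hypothetical per-game similarity tables, one per selected game.
per_game = [
    pd.DataFrame({"idf": [5.0, 2.0]}, index=["x", "y"]),
    pd.DataFrame({"idf": [3.0, 4.0]}, index=["x", "z"]),
]

# Pool all candidates and sort so that each ID's best score comes first.
pooled = pd.concat(per_game).sort_values("idf", ascending=False)

# duplicated() flags every repeat after the first occurrence,
# so negating the mask keeps only the best score per ID.
best = pooled[~pooled.index.duplicated()]
print(best)
```

One consequence of this design is that a game similar to all of the selected games is not ranked above a game that is very similar to just one of them.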

 
    

#### Function test

In [506]:
#jaipur, avalon, azul, splendor

bgg_ranked.loc[bgg_recommend_multiple(bgg_ranked, ['54043', '128882', '230802', '148228'])]\
[['name', 'bgg_rank', 'users_rated', 'average_weight', 'num_owners']]

bgg_id,name,bgg_rank,users_rated,average_weight,num_owners
188834,Secret Hitler,233,27393,1.7393,39051
41114,The Resistance,376,40267,1.5923,63689
147949,One Night Ultimate Werewolf,573,27101,1.3739,49660
163166,One Night Ultimate Werewolf: Daybreak,674,5851,1.3913,14873
134352,Two Rooms and a Boom,1103,5321,1.438,8626
38159,Ultimate Werewolf: Ultimate Edition,1337,4046,1.4908,7101
925,Werewolf,2234,4653,1.3935,4546
180956,One Night Ultimate Vampire,2482,2118,1.6087,7329
316287,Quest,3183,727,1.8333,2115
176361,One Night Revolution,4851,1671,1.8636,4923


In [361]:
bgg_ranked.loc[['54043', '128882', '230802', '148228']]['average_weight'].mean()

1.686725

* The chosen board games have an average weight of 1.687. Notice that the function returns similar board games within the same weight scale, here between 1 and 2.
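
The weight window behind this behaviour can be reproduced in isolation. The sketch below is a minimal illustration, assuming a hypothetical helper `weight_window` (not part of the notebook) that mirrors the `np.floor`/`np.ceil` bounds and the 0.25 fallback padding used in `bgg_recommend_multiple`.

```python
import math

def weight_window(mean_weight, pad=0.0):
    """Hypothetical helper: return (lower, upper) complexity bounds.

    Mirrors the floor/ceil logic of the recommender; `pad` models the
    0.25 widening applied when fewer than ten games qualify.
    """
    lower = math.floor(mean_weight) - pad
    upper = math.ceil(mean_weight) + pad
    return lower, upper

# Mean weight of Jaipur, Avalon, Azul and Splendor, taken from the cell above
print(weight_window(1.686725))
# Same mean with the fallback padding applied
print(weight_window(1.686725, pad=0.25))
```

One consequence of this design: when the mean weight is a whole number, `floor` and `ceil` coincide, so the window collapses to a single value until the fallback padding is applied.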

In [507]:
#Scythe, Smallworld, Blood Rage, 7 Wonders

bgg_ranked.loc[bgg_recommend_multiple(bgg_ranked, ['169786', '40692', '170216', '68448'])]\
[['name', 'bgg_rank', 'users_rated', 'average_weight', 'num_owners']]

bgg_id,name,bgg_rank,users_rated,average_weight,num_owners
173346,7 Wonders Duel,18,90090,2.2278,148535
271324,It's a Wonderful World,145,17457,2.3187,23332
316377,7 Wonders (Second Edition),248,6588,2.2996,11304
286096,Tapestry,271,19429,2.9277,26418
309630,Small World of Warcraft,1080,3316,2.6143,9793
60,Vinci,1142,4043,2.7823,3840
156091,Sons of Anarchy: Men of Mayhem,1193,2920,2.5648,6014
281946,Aftermath,1427,1729,2.8358,5823
252446,Key Flow,1728,1408,2.8788,2793
315727,Last Light,3160,599,2.7692,1463


In [363]:
bgg_ranked.loc[['169786', '40692', '170216', '68448']]['average_weight'].mean()

2.7471249999999996

* Previous recommendations for Scythe were within the Medium complexity scale. However, since the average weight of the chosen board games (2.747) falls in the Medium Light scale, the top 10 recommendations now differ from the earlier analysis.

## Board game recommendations for top 5000 games based on rank

Now that the final model for finding similar games has been created, the next step is to determine the top 10 recommendations for each of the top 5000 board games on BoardGameGeek. These recommendations will then be visualized in Tableau for our data users.

In [196]:
#Drop since this does not refer to any board games

bgg_ranked.loc[['18291', '23953']]

bgg_id,bgg_rank,name,average,average_weight,num_owners,users_rated,min_players,max_players,min_playtime,max_playtime,year_published
18291,3364,Unpublished Prototype,6.9658,2.5325,1655,987,0,0,0,0,0
23953,4361,Outside the Scope of BGG,6.73695,1.7,3267,640,0,0,0,0,0


* Note that these two BGG IDs are dropped from the dataset because they do not correspond to actual board games and therefore have no associated mechanics or categories.

In [197]:
bgg_ranked = bgg_ranked.drop(index=['18291', '23953'])

In [519]:
ranked_5000 = bgg_ranked

In [508]:
#This calculation may take a while

recommendations_list = []

for bgg_id in ranked_5000.index:
    recommendations = bgg_recommend_single(bgg_ranked, bgg_id)
    #Store the ten recommendations under column keys 1 to 10
    row = {'bgg_id': bgg_id}
    row.update({position: recommendations[position - 1] for position in range(1, 11)})
    recommendations_list.append(row)

In [514]:
recommended_games = pd.DataFrame(recommendations_list).set_index('bgg_id')

In [19]:
recommended_games.head()

bgg_id,1,2,3,4,5,6,7,8,9,10
224517,167791,28720,2651,310873,17133,300322,155873,111341,65901,32424
161936,266507,198928,30549,260428,192153,150658,234671,391163,342848,329939
174430,291457,205637,164153,264220,251661,104162,359609,273330,295785,322524
342942,167791,120677,341169,256960,276025,300322,305096,244711,343905,317511
233078,220308,184267,12493,337627,281655,41066,254127,254,211716,217990


* The dataframe above shows the first rows of the top 10 recommendations computed for each of the 5000 board games in the collection.
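
The loop that builds `recommendations_list` reads positions 0 through 9 of every result, so it assumes each game receives ten recommendations. A defensive variant might pad shorter results instead of failing. The sketch below assumes a hypothetical helper `to_row` and made-up IDs; it is not part of the original notebook.

```python
def to_row(bgg_id, recommendations, n_slots=10):
    """Hypothetical helper: build one output row, padding missing slots with None."""
    recs = list(recommendations)[:n_slots]
    recs += [None] * (n_slots - len(recs))
    row = {'bgg_id': bgg_id}
    # Column keys 1..n_slots match the layout of recommended_games
    row.update({position: rec for position, rec in enumerate(recs, start=1)})
    return row

# A game with only three recommendations (made-up IDs) still yields ten columns
print(to_row('12345', ['111', '222', '333']))
```

Padding with `None` keeps the column layout stable, at the cost of missing values that would need handling before the Tableau export.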

In [520]:
ranked_5000 = pd.merge(ranked_5000,
                       recommended_games,
                       left_index=True,
                       right_index=True,
                       how='left')

In [25]:
ranked_5000.head()

bgg_id,bgg_rank,name,average,average_weight,num_owners,users_rated,min_players,max_players,min_playtime,max_playtime,year_published,1,2,3,4,5,6,7,8,9,10
224517,1,Brass: Birmingham,8.60549,3.8916,58962,42488,2,4,60,120,2018,167791,28720,2651,310873,17133,300322,155873,111341,65901,32424
161936,2,Pandemic Legacy: Season 1,8.53167,2.8322,81047,52171,2,4,60,60,2015,266507,198928,30549,260428,192153,150658,234671,391163,342848,329939
174430,3,Gloomhaven,8.60523,3.9008,94566,60552,1,4,60,120,2017,291457,205637,164153,264220,251661,104162,359609,273330,295785,322524
342942,4,Ark Nova,8.53375,3.7398,55960,37952,1,4,90,150,2021,167791,120677,341169,256960,276025,300322,305096,244711,343905,317511
233078,5,Twilight Imperium: Fourth Edition,8.60721,4.3169,27551,22367,3,6,240,480,2017,220308,184267,12493,337627,281655,41066,254127,254,211716,217990


In [510]:
#Power Grid Recommendations

ranked_5000.loc[ranked_5000.loc['2651'][list(range(1,11))]]

bgg_id,bgg_rank,name,average,average_weight,num_owners,users_rated,min_players,max_players,min_playtime,max_playtime,year_published,1,2,3,4,5,6,7,8,9,10
224517,1,Brass: Birmingham,8.60549,3.8916,58962,42488,2,4,60,120,2018,167791,28720,2651,310873,17133,300322,155873,111341,65901,32424
28720,20,Brass: Lancashire,8.19013,3.8578,31500,24280,2,4,60,120,2007,224517,2651,310873,300322,155873,111341,65901,348554,32424,55952
247763,42,Underwater Cities,8.07203,3.5979,24507,18463,1,4,80,150,2018,224517,167791,28720,35677,159675,310873,300322,343905,319807,330950
245638,218,Coimbra,7.62732,3.2489,15373,11601,2,4,60,90,2018,276025,93,203993,73439,351913,244711,298383,283863,341945,330950
300322,262,Hallertau,7.9093,3.304,9434,5354,1,4,50,140,2020,177736,247763,102794,31260,35677,200680,159675,146886,256730,341945
155873,299,Power Grid Deluxe: Europe/North America,7.95417,3.1942,7237,4373,2,6,120,120,2014,224517,28720,247763,2651,300322,111341,256730,65901,341945,55952
111341,334,The Great Zimbabwe,7.83405,3.6853,5968,4632,2,5,90,150,2012,224517,2651,122515,4098,105551,23540,65901,253608,166571,32424
341945,1389,La Granja: Deluxe Master Set,8.37603,3.5745,3726,957,1,4,90,120,2023,224517,177736,2651,245638,146886,300322,303551,330950,286158,292894
55952,2906,Greed Incorporated,6.96063,3.581,1019,965,3,5,180,240,2009,224517,28720,2651,39683,105551,26566,65901,23094,32666,32424
257435,3971,Brick & Mortar,7.4468,3.2593,684,428,2,4,60,120,2021,2651,105551,26990,23540,253608,23094,341945,303551,32424,292187


* Power Grid is a classic board game renowned for its challenging strategic-economic theme. The top 10 recommendations above reflect this well, as they share a similar essence.

---

## Data Preparation for Tableau

Before we export our recommendations, some preprocessing is needed to prepare the data for visualization in Tableau.

In [48]:
bgg_recommendations = ranked_5000.reset_index()[['bgg_id', 1,2,3,4,5,6,7,8,9,10]]\
.melt(id_vars='bgg_id', var_name='position', value_name='recom_bgg_id').sort_values(['bgg_id', 'position'])\
.reset_index(drop=True)

In [49]:
bgg_recommendations.head()

,bgg_id,position,recom_bgg_id
0,1,1,233078
1,1,2,291572
2,1,3,332686
3,1,4,1513
4,1,5,242722


* To make it easier for Tableau to read the data, we unpivot (melt) the dataframe so that each of the top 10 recommendations becomes its own row, with its rank stored in the `position` column.
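
The wide-to-long reshaping can be seen on a small example. The sketch below uses a toy table with made-up IDs and only three rank columns; the `melt` call follows the same pattern as the cell above.

```python
import pandas as pd

# Toy wide table: one row per game, one column per recommendation rank
wide = pd.DataFrame({
    'bgg_id': ['g1', 'g2'],
    1: ['r11', 'r21'],
    2: ['r12', 'r22'],
    3: ['r13', 'r23'],
})

# Unpivot the rank columns into (position, recom_bgg_id) pairs
long = (
    wide.melt(id_vars='bgg_id', var_name='position', value_name='recom_bgg_id')
        .sort_values(['bgg_id', 'position'])
        .reset_index(drop=True)
)

print(long)
```

Each game now contributes one row per rank, which is the shape a mark-per-recommendation chart typically expects.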

### [Plum pudding chart coordinates for ten points](http://hydra.nat.uni-magdeburg.de/packing/cci/cci10.html)

The top ten recommendations will be visualized using a plum pudding chart, with coordinates obtained from the link above.

In [50]:
coords_dict = {
    1:(-0.262258924190165855095630653709, -0.689552138434555425611558523406),
    2:(0.262258924190165855095630653709,  -0.689552138434555425611558523406),
    3:(-0.654207495490543857031888931623,  -0.340990392505480262378855842125),
    4:(0.654207495490543857031888931623,  -0.340990392505480262378855842125),
    5:(0.000000000000000000000000000000, -0.235306356998833687832605108517),
    6:(-0.715460686241806569843043724712,   0.179938604472344027862805139398),
    7:(0.715460686241806569843043724712,   0.179938604472344027862805139398),
    8:(0.000000000000000000000000000000,   0.289211491381498022358656198900),
    9:(-0.415055617900124834285924684739,   0.609910427019080420778753583334),
    10:(0.415055617900124834285924684739,   0.609910427019080420778753583334),
   }
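
As a quick sanity check on the layout (an addition for illustration, not part of the original notebook), the sketch below measures how far apart the ten slots are and how far they sit from the centre. The coordinates are copied from `coords_dict` and rounded to 12 decimals for brevity.

```python
import math
from itertools import combinations

# Coordinates from coords_dict, rounded to 12 decimals for brevity
coords = {
    1: (-0.262258924190, -0.689552138435),
    2: (0.262258924190, -0.689552138435),
    3: (-0.654207495491, -0.340990392505),
    4: (0.654207495491, -0.340990392505),
    5: (0.0, -0.235306356999),
    6: (-0.715460686242, 0.179938604472),
    7: (0.715460686242, 0.179938604472),
    8: (0.0, 0.289211491381),
    9: (-0.415055617900, 0.609910427019),
    10: (0.415055617900, 0.609910427019),
}

# Smallest centre-to-centre distance between any two slots
min_gap = min(math.dist(coords[a], coords[b]) for a, b in combinations(coords, 2))
# Largest distance of any slot from the origin
max_radius = max(math.hypot(x, y) for x, y in coords.values())

print(f"closest pair: {min_gap:.4f}, farthest from centre: {max_radius:.4f}")
```

Equal-sized bubbles drawn at these slots will not overlap as long as their diameter stays below the closest-pair distance.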

In [52]:
for group in bgg_recommendations.groupby('bgg_id'):
    #assign random coordinates for each of the top ten recommendations for the board games
    pos_index = np.random.choice(np.arange(1, 11), size=10, replace=False)
    bgg_recommendations.loc[bgg_recommendations['bgg_id'] == group[0], 'pos_index'] = pos_index
    
bgg_recommendations['x'] = bgg_recommendations['pos_index'].apply(lambda x: coords_dict[x][0])
bgg_recommendations['y'] = bgg_recommendations['pos_index'].apply(lambda x: coords_dict[x][1])

#The top 3 recommendations will be given emphasis in the visualization
bgg_recommendations['is_top3'] = bgg_recommendations['position'] < 4
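
The slot assignment above draws a fresh random ordering on every run, so the chart layout changes between executions. A minimal sketch of a reproducible variant follows, assuming a seeded `numpy` generator. The seed value and the toy data are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # the seed value is an arbitrary assumption

# Toy long-format table: two games with ten ranked recommendations each
toy = pd.DataFrame({
    'bgg_id': ['g1'] * 10 + ['g2'] * 10,
    'position': list(range(1, 11)) * 2,
})

toy['pos_index'] = 0
for bgg_id, row_labels in toy.groupby('bgg_id').groups.items():
    # Each game gets its own shuffled ordering of the slots 1..n
    toy.loc[row_labels, 'pos_index'] = rng.permutation(np.arange(1, len(row_labels) + 1))

print(toy.head(10))
```

Sizing the permutation from the group length also avoids a shape mismatch if a game ever has fewer than ten rows, although `coords_dict` only defines ten slots.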

In [53]:
bgg_recommendations = bgg_recommendations.drop('pos_index', axis=1)

In [55]:
bgg_recommendations.head()

,bgg_id,position,recom_bgg_id,x,y,is_top3
0,1,1,233078,-0.715461,0.179939,True
1,1,2,291572,0.654207,-0.34099,True
2,1,3,332686,0.715461,0.179939,True
3,1,4,1513,-0.654207,-0.34099,False
4,1,5,242722,0.262259,-0.689552,False


* The dataframe above shows the final format of our recommendations per board game, including the plot coordinates and a flag marking the top 3 recommendations.

In [550]:
bgg_recommendations.to_csv('../data/analytics/bgg_recommendations.csv', index=False)

---