# SI618 Project 1 Team Formation, Data Description and Basic Manipulation

# Project Title: *World of Board Games: What makes them popular and highly rated?*

## Team Members
- Keqing Lu (keqinglu)
- Xiyuan Wang (denniswx)
- Fangqing Yuan (fangqing)

## Overview

Using data on more than 20,000 board games, we evaluate which characteristics make games popular and highly rated, and we fit models for popularity and rating based on statistical and machine learning techniques.

## Motivation

Board games have a long history, yet they are still evolving and remain immensely popular among modern players [**<font color = red>1</font>**](#reference). Many players rate board games on online platforms, which gives us a way to measure how popular a game is and how much its players appreciate it. We want to look more closely at the factors that contribute to a board game's success and popularity. By analyzing data from over 20,000 board games, we aim to learn what makes certain games stand out from the rest. Specifically, we explore the following questions:
- What characteristics does a board game have?
- How can we measure a board game's popularity and how well its players appreciate it?
- What factors, like playing time, minimum age, and number of players, affect the popularity of board games?
- What factors, like category and theme, affect the rating of board games?
- Are popular board games more highly rated, and vice versa?

## Data Source

The data files come from [Kaggle](https://www.kaggle.com/datasets/joebeachcapital/board-games) [**<font color = red>2</font>**](#reference). According to the Kaggle page description, the two datasets, `ratings.csv` and `details.csv`, were acquired from [boardgamegeek.com](https://www.boardgamegeek.com/), an online forum for board gaming hobbyists and a game database that holds reviews, images, and videos for over 125,600 different tabletop games, including European-style board games, wargames, and card games [**<font color = red>3</font>**](#reference).

`ratings.csv` contains the rating information for each board game, and `details.csv` contains detailed information such as publication year, player information, and design information. By combining the two datasets, we can study how the various attributes of board games relate to their ratings and popularity.
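
As a rough sketch of how the two datasets could be combined, the snippet below joins two small stand-in frames on `id`. The values are copied from the `head()` output shown later in this notebook. Treating `id` as the join key is an assumption based on both files listing a "Game ID" column, and the `_demo` names are hypothetical.

```python
import pandas as pd

# Stand-in frames (hypothetical names) with values taken from the head() output in this notebook
ratings_demo = pd.DataFrame({
    "id": [30549, 822, 13],
    "name": ["Pandemic", "Carcassonne", "Catan"],
    "average": [7.59, 7.42, 7.14],
    "users_rated": [108975, 108738, 108024],
})
details_demo = pd.DataFrame({
    "id": [30549, 822, 13],
    "minplayers": [2, 2, 3],
    "maxplayers": [4, 5, 4],
    "playingtime": [45, 45, 120],
})

# Inner join on the shared game ID; validate raises if an ID is duplicated on either side
merged = ratings_demo.merge(details_demo, on="id", how="inner", validate="one_to_one")
print(merged)
```

An inner join keeps only games present in both files, which suits an analysis that needs both ratings and game attributes for every row.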

## Data Description

Based on the description on Kaggle, the two datasets contain the following variables:

`ratings.csv`:

<table>
<thead>
<tr>
<th>variable</th>
<th>class</th>
<th>description</th>
</tr>
</thead>
<tbody>
<tr>
<td>num</td>
<td>double</td>
<td>Game number</td>
</tr>
<tr>
<td>id</td>
<td>double</td>
<td>Game ID</td>
</tr>
<tr>
<td>name</td>
<td>character</td>
<td>Game name</td>
</tr>
<tr>
<td>year</td>
<td>double</td>
<td>Game year</td>
</tr>
<tr>
<td>rank</td>
<td>double</td>
<td>Game rank</td>
</tr>
<tr>
<td>average</td>
<td>double</td>
<td>Average rating</td>
</tr>
<tr>
<td>bayes_average</td>
<td>double</td>
<td>Bayes average rating</td>
</tr>
<tr>
<td>users_rated</td>
<td>double</td>
<td>Users rated</td>
</tr>
<tr>
<td>url</td>
<td>character</td>
<td>Game url</td>
</tr>
<tr>
<td>thumbnail</td>
<td>character</td>
<td>Game thumbnail</td>
</tr>
</tbody>
</table>


`details.csv`:

<table>
<thead>
<tr>
<th>variable</th>
<th>class</th>
<th>description</th>
</tr>
</thead>
<tbody>
<tr>
<td>num</td>
<td>double</td>
<td>Game number</td>
</tr>
<tr>
<td>id</td>
<td>double</td>
<td>Game ID</td>
</tr>
<tr>
<td>primary</td>
<td>character</td>
<td>Primary name</td>
</tr>
<tr>
<td>description</td>
<td>character</td>
<td>Description of game</td>
</tr>
<tr>
<td>yearpublished</td>
<td>double</td>
<td>Year published</td>
</tr>
<tr>
<td>minplayers</td>
<td>double</td>
<td>Min n of players</td>
</tr>
<tr>
<td>maxplayers</td>
<td>double</td>
<td>Max n of players</td>
</tr>
<tr>
<td>playingtime</td>
<td>double</td>
<td>Playing time in minutes</td>
</tr>
<tr>
<td>minplaytime</td>
<td>double</td>
<td>Min play time</td>
</tr>
<tr>
<td>maxplaytime</td>
<td>double</td>
<td>Max play time</td>
</tr>
<tr>
<td>minage</td>
<td>double</td>
<td>Minimum age</td>
</tr>
<tr>
<td>boardgamecategory</td>
<td>character</td>
<td>Category</td>
</tr>
<tr>
<td>boardgamemechanic</td>
<td>character</td>
<td>Mechanic</td>
</tr>
<tr>
<td>boardgamefamily</td>
<td>character</td>
<td>Board game family</td>
</tr>
<tr>
<td>boardgameexpansion</td>
<td>character</td>
<td>Expansion</td>
</tr>
<tr>
<td>boardgameimplementation</td>
<td>character</td>
<td>Implementation</td>
</tr>
<tr>
<td>boardgamedesigner</td>
<td>character</td>
<td>Designer</td>
</tr>
<tr>
<td>boardgameartist</td>
<td>character</td>
<td>Artist</td>
</tr>
<tr>
<td>boardgamepublisher</td>
<td>character</td>
<td>Publisher</td>
</tr>
<tr>
<td>owned</td>
<td>double</td>
<td>Num owned</td>
</tr>
<tr>
<td>trading</td>
<td>double</td>
<td>Num trading</td>
</tr>
<tr>
<td>wanting</td>
<td>double</td>
<td>Num wanting</td>
</tr>
<tr>
<td>wishing</td>
<td>double</td>
<td>Num wishing</td>
</tr>
</tbody>
</table>

## Data Manipulation

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [8]:
ratings = pd.read_csv('archive/ratings.csv')
ratings.head()

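<table>
<thead>
<tr><th></th><th>num</th><th>id</th><th>name</th><th>year</th><th>rank</th><th>average</th><th>bayes_average</th><th>users_rated</th><th>url</th><th>thumbnail</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>105</td><td>30549</td><td>Pandemic</td><td>2008</td><td>106</td><td>7.59</td><td>7.487</td><td>108975</td><td>/boardgame/30549/pandemic</td><td>https://cf.geekdo-images.com/S3ybV1LAp-8SnHIXL...</td></tr>
<tr><td>1</td><td>189</td><td>822</td><td>Carcassonne</td><td>2000</td><td>190</td><td>7.42</td><td>7.309</td><td>108738</td><td>/boardgame/822/carcassonne</td><td>https://cf.geekdo-images.com/okM0dq_bEXnbyQTOv...</td></tr>
<tr><td>2</td><td>428</td><td>13</td><td>Catan</td><td>1995</td><td>429</td><td>7.14</td><td>6.97</td><td>108024</td><td>/boardgame/13/catan</td><td>https://cf.geekdo-images.com/W3Bsga_uLP9kO91gZ...</td></tr>
<tr><td>3</td><td>72</td><td>68448</td><td>7 Wonders</td><td>2010</td><td>73</td><td>7.74</td><td>7.634</td><td>89982</td><td>/boardgame/68448/7-wonders</td><td>https://cf.geekdo-images.com/RvFVTEpnbb4NM7k0I...</td></tr>
<tr><td>4</td><td>103</td><td>36218</td><td>Dominion</td><td>2008</td><td>104</td><td>7.61</td><td>7.499</td><td>81561</td><td>/boardgame/36218/dominion</td><td>https://cf.geekdo-images.com/j6iQpZ4XkemZP07HN...</td></tr>
</tbody>
</table>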


In [7]:
details = pd.read_csv('archive/details.csv')
details.head()

<table>
<thead>
<tr><th></th><th>num</th><th>id</th><th>primary</th><th>description</th><th>yearpublished</th><th>minplayers</th><th>maxplayers</th><th>playingtime</th><th>minplaytime</th><th>maxplaytime</th><th>...</th><th>boardgamefamily</th><th>boardgameexpansion</th><th>boardgameimplementation</th><th>boardgamedesigner</th><th>boardgameartist</th><th>boardgamepublisher</th><th>owned</th><th>trading</th><th>wanting</th><th>wishing</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>0</td><td>30549</td><td>Pandemic</td><td>In Pandemic, several virulent diseases have br...</td><td>2008</td><td>2</td><td>4</td><td>45</td><td>45</td><td>45</td><td>...</td><td>['Components: Map (Global Scale)', 'Components...</td><td>['Pandemic: Gen Con 2016 Promos – Z-Force Team...</td><td>['Pandemic Legacy: Season 0', 'Pandemic Legacy...</td><td>['Matt Leacock']</td><td>['Josh Cappel', 'Christian Hanisch', 'Régis Mo...</td><td>['Z-Man Games', 'Albi', 'Asmodee', 'Asmodee It...</td><td>168364</td><td>2508</td><td>625</td><td>9344</td></tr>
<tr><td>1</td><td>1</td><td>822</td><td>Carcassonne</td><td>Carcassonne is a tile-placement game in which ...</td><td>2000</td><td>2</td><td>5</td><td>45</td><td>30</td><td>45</td><td>...</td><td>['Cities: Carcassonne (France)', 'Components: ...</td><td>['20 Jahre Darmstadt Spielt', 'Apothecaries (f...</td><td>['The Ark of the Covenant', 'Carcassonne für 2...</td><td>['Klaus-Jürgen Wrede']</td><td>['Doris Matthäus', 'Anne Pätzke', 'Chris Quill...</td><td>['Hans im Glück', '999 Games', 'Albi', 'Bard C...</td><td>161299</td><td>1716</td><td>582</td><td>7383</td></tr>
<tr><td>2</td><td>2</td><td>13</td><td>Catan</td><td>In CATAN (formerly The Settlers of Catan), pla...</td><td>1995</td><td>3</td><td>4</td><td>120</td><td>60</td><td>120</td><td>...</td><td>['Animals: Sheep', 'Components: Hexagonal Tile...</td><td>['20 Jahre Darmstadt Spielt', 'Brettspiel Adve...</td><td>['Baden-Württemberg Catan', 'Catan Geographies...</td><td>['Klaus Teuber']</td><td>['Volkan Baga', 'Tanja Donner', 'Pete Fenlon',...</td><td>['KOSMOS', '999 Games', 'Albi', 'Asmodee', 'As...</td><td>167733</td><td>2018</td><td>485</td><td>5890</td></tr>
<tr><td>3</td><td>3</td><td>68448</td><td>7 Wonders</td><td>You are the leader of one of the 7 great citie...</td><td>2010</td><td>2</td><td>7</td><td>30</td><td>30</td><td>30</td><td>...</td><td>['Ancient: Babylon', 'Ancient: Egypt', 'Ancien...</td><td>['7 Wonders: Armada', '7 Wonders: Babel', '7 W...</td><td>['7 Wonders (Second Edition)', '7 Wonders Duel...</td><td>['Antoine Bauza']</td><td>['Dimitri Chappuis', 'Miguel Coimbra', 'Etienn...</td><td>['Repos Production', 'ADC Blackfire Entertainm...</td><td>120466</td><td>1567</td><td>1010</td><td>12105</td></tr>
<tr><td>4</td><td>4</td><td>36218</td><td>Dominion</td><td>&quot;You are a monarch, like your parents bef...</td><td>2008</td><td>2</td><td>4</td><td>30</td><td>30</td><td>30</td><td>...</td><td>['Crowdfunding: Wspieram', 'Game: Dominion', '...</td><td>['Ancient Times (fan expansion for Dominion)',...</td><td>['Dominion (Second Edition)', 'Het Koninkrijk ...</td><td>['Donald X. Vaccarino']</td><td>['Matthias Catrein', 'Julien Delval', 'Tomasz ...</td><td>['Rio Grande Games', '999 Games', 'Albi', 'Bar...</td><td>106956</td><td>2009</td><td>655</td><td>8621</td></tr>
</tbody>
</table>
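
The `head()` output above suggests that columns such as `boardgamedesigner` hold text that looks like a Python list (for example `['Matt Leacock']`) rather than an actual list. Assuming every non-missing cell follows that pattern, a minimal sketch for turning such a column into real lists and then into one row per designer could look like this. The frame, the helper `parse_list`, and the "Unknown game" row are hypothetical and only illustrate the handling of missing values.

```python
import ast
import pandas as pd

# Stand-in frame: two designer strings from the head() output plus one hypothetical missing value
details_demo = pd.DataFrame({
    "primary": ["Pandemic", "Catan", "Unknown game"],
    "boardgamedesigner": ["['Matt Leacock']", "['Klaus Teuber']", None],
})

def parse_list(cell):
    """Convert text like "['A', 'B']" into a Python list; missing values become []."""
    if pd.isna(cell):
        return []
    return ast.literal_eval(cell)

details_demo["designers"] = details_demo["boardgamedesigner"].apply(parse_list)

# One row per (game, designer) pair; games without a designer drop out
exploded = details_demo.explode("designers").dropna(subset=["designers"])
print(exploded[["primary", "designers"]])
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text read from a file. It raises an error on cells that are not valid literals, which would need separate handling if the real data contains any.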


In [10]:
ratings.shape

(21831, 10)

In [9]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21831 entries, 0 to 21830
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   num            21831 non-null  int64  
 1   id             21831 non-null  int64  
 2   name           21831 non-null  object 
 3   year           21831 non-null  int64  
 4   rank           21831 non-null  int64  
 5   average        21831 non-null  float64
 6   bayes_average  21831 non-null  float64
 7   users_rated    21831 non-null  int64  
 8   url            21831 non-null  object 
 9   thumbnail      21825 non-null  object 
dtypes: float64(2), int64(5), object(3)
memory usage: 1.7+ MB
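
The `info()` output above lists 21825 non-null `thumbnail` values out of 21831 rows, so 6 thumbnails are missing, while every other column is complete. A quick way to get the same per-column count directly is sketched below on a small stand-in frame. The frame name and the thumbnail URLs are hypothetical.

```python
import pandas as pd

# Stand-in frame; the thumbnail URLs are placeholders, not real data
ratings_demo = pd.DataFrame({
    "name": ["Pandemic", "Carcassonne", "Catan"],
    "average": [7.59, 7.42, 7.14],
    "thumbnail": ["https://example.com/a.jpg", None, "https://example.com/c.jpg"],
})

# Count missing values per column and show only the columns that have any
missing = ratings_demo.isna().sum()
print(missing[missing > 0])
```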


In [11]:
details.shape

(21631, 23)
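
`details` has 21631 rows while `ratings` has 21831, a difference of 200. One way to find which games appear in only one file is a left join with a merge indicator, sketched below on stand-in frames. It assumes `id` is the shared key. Leaving 7 Wonders out of `details_demo` is purely illustrative, since the real `details.csv` does contain it.

```python
import pandas as pd

# Stand-in frames (hypothetical names); details_demo deliberately lacks one game
ratings_demo = pd.DataFrame({
    "id": [30549, 822, 13, 68448],
    "name": ["Pandemic", "Carcassonne", "Catan", "7 Wonders"],
})
details_demo = pd.DataFrame({
    "id": [30549, 822, 13],
    "primary": ["Pandemic", "Carcassonne", "Catan"],
})

# indicator=True adds a _merge column telling where each row was found
check = ratings_demo.merge(details_demo[["id"]], on="id", how="left", indicator=True)
only_in_ratings = check.loc[check["_merge"] == "left_only", ["id", "name"]]
print(only_in_ratings)
```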

In [12]:
details.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21631 entries, 0 to 21630
Data columns (total 23 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   num                      21631 non-null  int64 
 1   id                       21631 non-null  int64 
 2   primary                  21631 non-null  object
 3   description              21630 non-null  object
 4   yearpublished            21631 non-null  int64 
 5   minplayers               21631 non-null  int64 
 6   maxplayers               21631 non-null  int64 
 7   playingtime              21631 non-null  int64 
 8   minplaytime              21631 non-null  int64 
 9   maxplaytime              21631 non-null  int64 
 10  minage                   21631 non-null  int64 
 11  boardgamecategory        21348 non-null  object
 12  boardgamemechanic        20041 non-null  object
 13  boardgamefamily          17870 non-null  object
 14  boardgameexpansion       5506 non-null

## Data Visualization

<a id='reference'>
<strong>Reference</strong>
</a>

- [1] https://gitnux.org/board-game-popularity-statistics/
- [2] https://www.kaggle.com/datasets/joebeachcapital/board-games/data
- [3] https://en.wikipedia.org/wiki/BoardGameGeek