In this notebook, we combine the previously scraped statistics with data from the Python wrapper for the BoardGameGeek API. The documentation for this wrapper can be found here: https://lcosmin.github.io/boardgamegeek/. We are pulling some information that is easier to obtain through the wrapper than directly through the API: specifically, the description, minimum playtime, maximum playtime, and rank (the wrapper's bgg_rank and boardgame_rank fields) of each game.
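As a rough illustration of the per-game extraction, here is a minimal sketch that reads the five fields from any game-like object and substitutes NaN when an attribute is missing or its getter raises. The attribute names match the ones used in the scraping cells below; the helpers safe_attr and extract_fields, the UnrankedGame class, and the SimpleNamespace stand-in for a wrapper game object are hypothetical and only for demonstration.

```python
import math
from types import SimpleNamespace

# Attribute names as used on the wrapper's game objects in this notebook
FIELDS = ["description", "min_playing_time", "max_playing_time", "bgg_rank", "boardgame_rank"]


def safe_attr(obj, name, default=math.nan):
    """Return obj.<name>, or `default` if the attribute is missing or its getter raises."""
    try:
        return getattr(obj, name)
    except Exception:
        return default


def extract_fields(game):
    """Return [description, min_playtime, max_playtime, bgg_rank, boardgame_rank] for one game."""
    return [safe_attr(game, field) for field in FIELDS]


class UnrankedGame:
    """Hypothetical stand-in whose rank properties raise instead of returning a value."""
    description = "A game with no rank yet."
    min_playing_time = 20
    max_playing_time = 45

    @property
    def bgg_rank(self):
        raise KeyError("no rank")

    @property
    def boardgame_rank(self):
        raise KeyError("no rank")


# Hypothetical values, not scraped data
ranked = SimpleNamespace(description="A ranked example game.", min_playing_time=240,
                         max_playing_time=240, bgg_rank=275, boardgame_rank=275)
print(extract_fields(ranked))
print(extract_fields(UnrankedGame()))
```

Putting the try/except in one helper avoids repeating the same block for every field that might be unavailable.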

In [57]:
import pandas as pd
import numpy as np
import datetime
import boardgamegeek
import pickle

To use the API wrapper, we create a BGGClient and pass it a list of game IDs (the index of our dataframe), which returns a list of game objects. From these objects we pull the relevant information, as shown below. The client throttles its own request rate, so we don't need to manage any timers ourselves. Unfortunately, we could not use the wrapper for the entire scrape: the wrapper needs a list of board game IDs up front, and on the BoardGameGeek website those ID numbers are intermingled with other item types such as video games and RPG items. Passing a non-board-game ID to the wrapper raises an error, whereas requesting it directly through the API simply filters it out.
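The wrapper handles throttling internally, so the notebook does not need any of this. Purely to illustrate the two ideas at work in the scraping loop, the sketch below splits an ID list into fixed-size batches and enforces a minimum delay between requests. The names chunked and Throttle and the interval values are hypothetical; this is not how the boardgamegeek package implements its own rate limiting.

```python
import time
from typing import Iterator, List, Optional, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


class Throttle:
    """Make wait() block until `min_interval` seconds have passed since it last returned."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None  # monotonic time of the previous wait()

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Hypothetical usage with the notebook's client (not run here):
# throttle = Throttle(min_interval=2.0)
# for batch in chunked(to_scrape, 1000):
#     throttle.wait()
#     games = bgg.game_list(game_id_list=batch)
```

Using time.monotonic rather than wall-clock time keeps the delay correct even if the system clock is adjusted mid-scrape.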

In [2]:
bgg = boardgamegeek.BGGClient()

In [54]:
df = pd.read_csv('cleaned_data/cleaned_statistics', index_col=0)
print(len(df))
df.head()

119978


Unnamed: 0,name,type,year,designer,artist,publisher,min_players,max_players,play_time,min_age,num_ratings,avg_rating,bayes_avg,weight,categories,mechanics,families
1,Die Macher,boardgame,1986.0,Karl-Heinz Schmiel,Marcus Gschwendtner,Hans im Glück,3.0,5.0,240.0,14.0,5172,7.63044,7.14476,4.3245,"['Economic', 'Negotiation', 'Political']","['Alliances', 'Area Majority / Influence', 'Au...","['Country: Germany', 'Political: Elections', '..."
2,Dragonmaster,boardgame,1981.0,"G. W. ""Jerry"" D'Arcey",Bob Pepper,E.S. Lowe,3.0,4.0,30.0,12.0,547,6.61736,5.79938,1.963,"['Card Game', 'Fantasy']",['Trick-taking'],['Creatures: Dragons']
3,Samurai,boardgame,1998.0,Reiner Knizia,Franz Vohwinkel,Fantasy Flight Games,2.0,4.0,60.0,10.0,14580,7.44945,7.24741,2.4899,"['Abstract Strategy', 'Medieval']","['Area Majority / Influence', 'Hand Management...","['Country: Japan', 'Series: Euro Classics (Rei..."
4,Tal der Könige,boardgame,1992.0,Christian Beierer,Thomas di Paolo,KOSMOS,2.0,4.0,60.0,12.0,339,6.60773,5.6973,2.6667,['Ancient'],"['Action Points', 'Area Majority / Influence',...","['Containers: Triangular Boxes', 'Country: Egy..."
5,Acquire,boardgame,1964.0,Sid Sackson,Scott Okumura,3M,2.0,6.0,90.0,12.0,18021,7.3413,7.15733,2.505,['Economic'],"['Hand Management', 'Income', 'Investment', 'M...",['Series: 3M Bookshelf Series']


In [4]:
start = datetime.datetime.now()

# Request games in batches of 1,000 IDs and store the extra fields keyed by game id
to_scrape = list(df.index)
game_dict = {}
while len(to_scrape) > 0:
    games = bgg.game_list(game_id_list=to_scrape[:1000])
    for game in games:
        game_id = game.id
        description = game.description
        min_playtime = game.min_playing_time
        max_playtime = game.max_playing_time
        # Fall back to NaN if the rank attribute raises an error
        try:
            bgg_rank = game.bgg_rank
        except Exception:
            bgg_rank = np.nan
        try:
            boardgame_rank = game.boardgame_rank
        except Exception:
            boardgame_rank = np.nan

        game_dict[game_id] = [description, min_playtime, max_playtime, bgg_rank, boardgame_rank]

    # Drop the batch only after it has been processed, so an interrupted run can resume here
    to_scrape = to_scrape[1000:]
    print(len(game_dict), len(to_scrape))

games_df = pd.DataFrame.from_dict(game_dict, orient='index',
                                  columns=['description', 'min_playtime', 'max_playtime',
                                           'bgg_rank', 'boardgame_rank'])
end = datetime.datetime.now()
print(f'Time elapsed: {end-start}')

1000 118978
2000 117978
3000 116978
4000 115978
5000 114978
6000 113978
7000 112978
8000 111978
9000 110978
10000 109978
11000 108978
12000 107978
13000 106978
14000 105978
15000 104978
16000 103978
17000 102978
18000 101978
19000 100978
20000 99978
21000 98978
22000 97978
23000 96978
24000 95978
25000 94978
26000 93978
27000 92978
28000 91978
29000 90978
30000 89978
31000 88978
32000 87978
33000 86978
34000 85978
35000 84978
36000 83978
37000 82978
38000 81978
39000 80978
40000 79978
41000 78978
42000 77978
43000 76978
44000 75978
45000 74978


API returned 503, retrying
API returned 503, retrying


BGGApiError: couldn't fetch data within the configured number of retries

In [None]:
game = bgg.game(game_id=1)

In [8]:
len(to_scrape)

69978

In [None]:
len(game_dict)

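Unfortunately, the BoardGameGeek servers went down for maintenance in the middle of the scrape (the 503 errors above), but we were able to pick it back up later. Because each batch is removed from to_scrape only after it has been processed, the following cell picks up where we left off and finishes scraping, this time in batches of 500 IDs.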
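Rerunning the cell by hand works, but short outages can also be ridden out automatically by wrapping each request in a retry loop with exponential backoff. The sketch below assumes a generic fetch callable (for example, a lambda around bgg.game_list); fetch_with_backoff and its default delays are hypothetical, and in practice you might catch only the BGGApiError shown above rather than every exception.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fetch_with_backoff(fetch: Callable[[Sequence[int]], T], batch: Sequence[int],
                       max_attempts: int = 5, base_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(batch), waiting base_delay, 2*base_delay, 4*base_delay, ... between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch(batch)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: re-raise the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

In the loop above this might look like games = fetch_with_backoff(lambda ids: bgg.game_list(game_id_list=ids), to_scrape[:500]). If every attempt fails, the error still propagates before to_scrape is trimmed, so the manual resume approach keeps working. Passing sleep as a parameter makes the function testable without actually waiting.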

In [28]:
# After an error, rerun this cell to resume where the last run left off
while len(to_scrape) > 0:
    # Smaller batches of 500 IDs per request for the resumed run
    games = bgg.game_list(game_id_list=to_scrape[:500])
    for game in games:
        game_id = game.id
        description = game.description
        min_playtime = game.min_playing_time
        max_playtime = game.max_playing_time
        # Fall back to NaN if the rank attribute raises an error
        try:
            bgg_rank = game.bgg_rank
        except Exception:
            bgg_rank = np.nan
        try:
            boardgame_rank = game.boardgame_rank
        except Exception:
            boardgame_rank = np.nan

        game_dict[game_id] = [description, min_playtime, max_playtime, bgg_rank, boardgame_rank]
    # Drop the batch only after it has been processed
    to_scrape = to_scrape[500:]
    print(len(game_dict), len(to_scrape))

games_df = pd.DataFrame.from_dict(game_dict, orient='index',
                                  columns=['description', 'min_playtime', 'max_playtime',
                                           'bgg_rank', 'boardgame_rank'])

85500 34478
86000 33978
86500 33478
87000 32978
87500 32478
88000 31978
88500 31478
89000 30978
89500 30478
90000 29978
90500 29478
91000 28978
91500 28478
92000 27978
92500 27478
93000 26978
93500 26478
94000 25978
94500 25478
95000 24978
95500 24478
96000 23978
96500 23478
97000 22978
97500 22478
98000 21978
98500 21478
99000 20978
99500 20478
100000 19978
100500 19478
101000 18978
101500 18478
102000 17978
102500 17478
103000 16978
103500 16478
104000 15978
104500 15478
105000 14978
105500 14478
106000 13978
106500 13478
107000 12978
107500 12478
108000 11978
108500 11478
109000 10978
109500 10478
110000 9978
110500 9478
111000 8978
111500 8478
112000 7978
112500 7478
113000 6978
113500 6478
114000 5978
114500 5478
115000 4978
115500 4478
116000 3978
116500 3478
117000 2978
117500 2478
118000 1978
118500 1478
119000 978
119500 478
119978 0


In [35]:
games_df = games_df.sort_index()
len(games_df)

119978

In [49]:
sum(df.index == games_df.index)

119978
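The cell above counts the positions where the two indexes agree, so a result of 119978 confirms that both frames hold the same game IDs in the same order. pd.concat(..., axis=1) actually aligns rows by index label rather than by position, so this is a sanity check rather than a requirement for the concatenation. The sketch below shows both behaviours on two tiny hypothetical frames; Index.equals expresses the same order-sensitive check as a single boolean.

```python
import pandas as pd

# Hypothetical stand-ins: scraped statistics and wrapper fields, both indexed by game id
stats = pd.DataFrame({"name": ["A", "B", "C"]}, index=[1, 2, 3])
extra = pd.DataFrame({"bgg_rank": [30.0, 10.0, 20.0]}, index=[3, 1, 2])

# Index.equals is True only for the same labels in the same order
print(stats.index.equals(extra.index))               # False: same ids, different order
print(stats.index.equals(extra.sort_index().index))  # True after sorting

# concat(axis=1) matches rows by label, so game 1 still gets its own rank without sorting
combined = pd.concat([stats, extra], axis=1)
print(combined.loc[1, "bgg_rank"])

# A label missing from one frame becomes a NaN-filled row instead of raising an error
partial = pd.concat([stats, extra.drop(index=3)], axis=1)
print(partial.loc[3, "bgg_rank"])
```

The last case is why an explicit check is still worthwhile: a game the wrapper failed to return would not break the concat, it would silently show up with empty description and rank columns.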

In [37]:
games_with_descriptions = pd.concat([df, games_df], axis=1)
games_with_descriptions.head()

Unnamed: 0,name,type,year,designer,artist,publisher,min_players,max_players,play_time,min_age,...,bayes_avg,weight,categories,mechanics,families,description,min_playtime,max_playtime,bgg_rank,boardgame_rank
1,Die Macher,boardgame,1986.0,Karl-Heinz Schmiel,Marcus Gschwendtner,Hans im Glück,3.0,5.0,240.0,14.0,...,7.14476,4.3245,"['Economic', 'Negotiation', 'Political']","['Alliances', 'Area Majority / Influence', 'Au...","['Country: Germany', 'Political: Elections', '...",Die Macher is a game about seven sequential po...,240,240,275.0,275.0
2,Dragonmaster,boardgame,1981.0,"G. W. ""Jerry"" D'Arcey",Bob Pepper,E.S. Lowe,3.0,4.0,30.0,12.0,...,5.79938,1.963,"['Card Game', 'Fantasy']",['Trick-taking'],['Creatures: Dragons'],Dragonmaster is a trick-taking card game based...,30,30,3613.0,3613.0
3,Samurai,boardgame,1998.0,Reiner Knizia,Franz Vohwinkel,Fantasy Flight Games,2.0,4.0,60.0,10.0,...,7.24741,2.4899,"['Abstract Strategy', 'Medieval']","['Area Majority / Influence', 'Hand Management...","['Country: Japan', 'Series: Euro Classics (Rei...",Samurai is set in medieval Japan. Players comp...,30,60,206.0,206.0
4,Tal der Könige,boardgame,1992.0,Christian Beierer,Thomas di Paolo,KOSMOS,2.0,4.0,60.0,12.0,...,5.6973,2.6667,['Ancient'],"['Action Points', 'Area Majority / Influence',...","['Containers: Triangular Boxes', 'Country: Egy...",When you see the triangular box and the luxuri...,60,60,4780.0,4780.0
5,Acquire,boardgame,1964.0,Sid Sackson,Scott Okumura,3M,2.0,6.0,90.0,12.0,...,7.15733,2.505,['Economic'],"['Hand Management', 'Income', 'Investment', 'M...",['Series: 3M Bookshelf Series'],"In Acquire, each player strategically invests ...",90,90,264.0,264.0


We now have a complete dataframe of game statistics, including the descriptions of the games. We will save it as a pickle to pick up in our next notebook.

In [60]:
games_with_descriptions.to_pickle('cleaned_data/games_with_descriptions')
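Unlike a CSV round trip, which re-infers column types on reading, a pickle restores the dataframe with its index and dtypes intact, for example the float rank columns and their NaN values. A minimal sketch of the save-and-reload step, assuming the next notebook reads the file back with pd.read_pickle; the helper name save_and_reload, the sample frame, and the temporary directory are only for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_and_reload(frame: pd.DataFrame, path: str) -> pd.DataFrame:
    """Write `frame` with to_pickle and read it back with read_pickle."""
    frame.to_pickle(path)
    return pd.read_pickle(path)


# Hypothetical two-row frame with the same kinds of columns as games_with_descriptions
sample = pd.DataFrame({"description": ["First game", "Second game"],
                       "min_playtime": [30, 60],
                       "bgg_rank": [206.0, np.nan]},
                      index=[3, 4])

with tempfile.TemporaryDirectory() as tmp:
    restored = save_and_reload(sample, os.path.join(tmp, "games_with_descriptions"))

pd.testing.assert_frame_equal(restored, sample)  # same values, dtypes and index
print(restored.dtypes)
```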