Last Modified: September 5, 2019

Description: This program scraps the data from boardgamegeek.com. It grabs usernames, game names, gameids, and ratings.

This code was taken from the following link and slightly modified for my purposes.

https://sdsawtelle.github.io/blog/output/boardgamegeek-data-scraping.html

The code is explained in detail above and the project is well worth the time to read.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from bs4 import BeautifulSoup
import requests
import scipy.io
import pandas as pd
import numpy as np
import pickle
from time import sleep
import timeit
from math import ceil

I wanted a progress bar to tell me how close to finished I was. This was taken from https://stackoverflow.com/questions/3160699/python-progress-bar

In [2]:
def show_progress_bar(bar_length, completed, total):

    '''
    Produces progress bar to let user know how far along the simulation is.

    This function was pulled from https://stackoverflow.com/questions/3160699/python-progress-bar.

    PARAMETERS:
    bar_length: (int) Number of '#' in bar
    competed: (int) Number of iterations finished 
    total: (int) Total number of iterations to finish the simulation
    '''
    
    bar_length_unit_value = (total / bar_length)
    completed_bar_part = ceil(completed / bar_length_unit_value) 
    progress = '*'*completed_bar_part
    remaining = ' '*(bar_length - completed_bar_part)
    percent_done = '%.2f' % ((completed / total) * 100)
    print(f'[{progress}{remaining}] {percent_done}%', end='\r')

This function handles the requests from the server. If the server gives an error, it retries. 

A key addition is the timeout. Typically the request.get() call will try forever to get the information. The timeout specifies that it should cancel the request at a certain point. This triggers the retry and attempts to get the data again.

In [3]:
def request(msg, slp=1):
    status_code=0  
    while status_code!=200:
        sleep(slp)
        try:
            out=requests.get(msg,timeout=60)
            status_code=out.status_code
            if status_code!=200:
                print('Server Error! Response Code %i. Retrying...' % (out.status_code))
        except:
            print('Retrying...\nWaiting one seconds...')
            sleep(1)
    return out

A dataframe is generated to store the gameid, name, and the total number of ratings of that game. 

It gets games with at least 100 ratings.

In [4]:
df_all=pd.DataFrame(columns=['id', 'name', 'nrate'])
min_nrate=10**6
npage=1

while min_nrate>99:
    r=request('https://boardgamegeek.com/browse/boardgame/page/%i?sort=numvoters&sortdir=desc' % (npage,))
    soup=BeautifulSoup(r.text, 'html.parser')    
    
    
    table=soup.find_all('tr', attrs={'id': 'row_'}) 
    df=pd.DataFrame(columns=['id', 'name', 'nrate'], index=range(len(table)))  
    
    for idx, row in enumerate(table):
        links=row.find_all('a')
        if 'name' in links[0].attrs.keys():
            del links[0]
        gamelink=links[1]  # Get the relative URL for the specific game
        gameid=int(gamelink['href'].split('/')[2])  # Get the game ID by parsing the relative URL
        gamename=gamelink.contents[0]  # Get the actual name of the game as the link contents


        ratings_str=row.find_all('td', attrs={'class': 'collection_bggrating'})[2].contents[0]
        nratings=int(''.join(ratings_str.split()))

        df.iloc[idx, :]=[gameid, gamename, nratings]

    min_nrate=df['nrate'].min()  # The smallest number of ratings of any game on the page
    print('Page %i scraped, minimum number of ratings was %i' % (npage, min_nrate))
    df_all=pd.concat([df_all, df], axis=0)
    npage+=1
    sleep(1) 

Page 1 scraped, minimum number of ratings was 19053
Page 2 scraped, minimum number of ratings was 12296
Page 3 scraped, minimum number of ratings was 9043
Page 4 scraped, minimum number of ratings was 7128
Page 5 scraped, minimum number of ratings was 5918
Page 6 scraped, minimum number of ratings was 4860
Page 7 scraped, minimum number of ratings was 4179
Page 8 scraped, minimum number of ratings was 3699
Page 9 scraped, minimum number of ratings was 3264
Page 10 scraped, minimum number of ratings was 2912
Page 11 scraped, minimum number of ratings was 2666
Page 12 scraped, minimum number of ratings was 2428
Page 13 scraped, minimum number of ratings was 2211
Page 14 scraped, minimum number of ratings was 2036
Page 15 scraped, minimum number of ratings was 1879
Page 16 scraped, minimum number of ratings was 1750
Page 17 scraped, minimum number of ratings was 1644
Page 18 scraped, minimum number of ratings was 1560
Page 19 scraped, minimum number of ratings was 1466
Page 20 scraped, mi

The dataframe is then saved to a csv file.

In [5]:
df=df_all.copy()
df.reset_index(inplace=True, drop=True)
df.to_csv('bgg_gamelist.csv', index=False)
df.head()

Unnamed: 0,id,name,nrate
0,13,Catan,87271
1,822,Carcassonne,86952
2,30549,Pandemic,85739
3,68448,7 Wonders,70996
4,36218,Dominion,69531


There are 100 ratings per page, so another column of the number of full pages of ratings is stored.

In [6]:
df['nfullpages'] = (df['nrate']-50).apply(round, ndigits=-2)//100  # Round DOWN to nearest 100
df.head()

Unnamed: 0,id,name,nrate,nfullpages
0,13,Catan,87271,872
1,822,Carcassonne,86952,869
2,30549,Pandemic,85739,857
3,68448,7 Wonders,70996,709
4,36218,Dominion,69531,695


A new dataframe df_ratings is declared to store usernames, gameids, and ratings. The code loops over a group of 20 games and stores the ratings for all of the full pages.

These ratings are periodically saved to a csv.

In [7]:
bar_counter=0
bar_total=len(df)//20

for nm, grp in df.groupby(np.arange(len(df))//20):
    df_ratings=pd.DataFrame(columns=['username', 'gameid', 'rating'], index=range(grp['nrate'].sum()+10**3))

    dfidx_start=0
    dfidx=0
    
    pagenum=1
    while len(grp[grp['nfullpages'] > 0]) > 0: 
        active_games=grp[grp['nfullpages'] > 0]

        id_list=[]
        for game in active_games['id']:
            id_list+=[game]*100
        dfidx_end=dfidx_start+len(active_games)*100
        df_ratings.iloc[dfidx_start:dfidx_end, df_ratings.columns.get_loc('gameid')] = id_list

        id_strs=[str(gid) for gid in active_games['id']]
        gameids=','.join(id_strs)
        sleep(0.5)  
        r=request('http://www.boardgamegeek.com/xmlapi2/thing?id=%s&ratingcomments=1&page=%i' % (gameids, pagenum))

        
        soup=BeautifulSoup(r.text, 'xml')
        comments=soup('comment')
        l1=[0]*len(active_games)*100
        l2=[0]*len(active_games)*100
        j=0
        for comm in comments:
            l1[j]=comm['username']
            l2[j]=float(comm['rating'])
            j+=1
        df_ratings.iloc[dfidx_start:dfidx_end, df_ratings.columns.get_loc('username')] = l1
        df_ratings.iloc[dfidx_start:dfidx_end, df_ratings.columns.get_loc('rating')] = l2

        
        grp['nfullpages']-=1  
        dfidx_start=dfidx_end     
        pagenum+=1  
        print('pagenum updated to', pagenum)
    
    print('\nBatch Complete!\n')
    df_ratings=df_ratings.dropna(how='all')
    df_ratings.to_csv('bgg_ratings.csv', mode='a',index=False)
    del df_ratings
    bar_counter+=1
    show_progress_bar(100,bar_counter,bar_total)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


pagenum updated to 2
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
pagenum updated to 15
pagenum updated to 16
pagenum updated to 17
pagenum updated to 18
pagenum updated to 19
pagenum updated to 20
pagenum updated to 21
Retrying...
Waiting one seconds...
pagenum updated to 22
pagenum updated to 23
pagenum updated to 24
pagenum updated to 25
pagenum updated to 26
pagenum updated to 27
pagenum updated to 28
pagenum updated to 29
pagenum updated to 30
pagenum updated to 31
pagenum updated to 32
pagenum updated to 33
pagenum updated to 34
pagenum updated to 35
pagenum updated to 36
pagenum updated to 37
pagenum updated to 38
pagenum updated to 39
pagenum updated to 40
pagenum updated to 41
pagenum updated to 42
pagenum updated to 43
pagenum updated to 44
pagenum updated to 45
pagen

pagenum updated to 362
pagenum updated to 363
pagenum updated to 364
pagenum updated to 365
pagenum updated to 366
pagenum updated to 367
pagenum updated to 368
pagenum updated to 369
pagenum updated to 370
pagenum updated to 371
pagenum updated to 372
pagenum updated to 373
pagenum updated to 374
pagenum updated to 375
pagenum updated to 376
pagenum updated to 377
pagenum updated to 378
pagenum updated to 379
pagenum updated to 380
pagenum updated to 381
pagenum updated to 382
pagenum updated to 383
pagenum updated to 384
pagenum updated to 385
pagenum updated to 386
pagenum updated to 387
pagenum updated to 388
pagenum updated to 389
pagenum updated to 390
pagenum updated to 391
pagenum updated to 392
pagenum updated to 393
pagenum updated to 394
pagenum updated to 395
pagenum updated to 396
pagenum updated to 397
pagenum updated to 398
pagenum updated to 399
pagenum updated to 400
pagenum updated to 401
pagenum updated to 402
pagenum updated to 403
pagenum updated to 404
pagenum upd

pagenum updated to 719
pagenum updated to 720
pagenum updated to 721
pagenum updated to 722
pagenum updated to 723
pagenum updated to 724
pagenum updated to 725
pagenum updated to 726
pagenum updated to 727
pagenum updated to 728
pagenum updated to 729
pagenum updated to 730
pagenum updated to 731
pagenum updated to 732
pagenum updated to 733
pagenum updated to 734
pagenum updated to 735
pagenum updated to 736
pagenum updated to 737
pagenum updated to 738
pagenum updated to 739
pagenum updated to 740
pagenum updated to 741
pagenum updated to 742
pagenum updated to 743
pagenum updated to 744
pagenum updated to 745
pagenum updated to 746
pagenum updated to 747
pagenum updated to 748
pagenum updated to 749
pagenum updated to 750
pagenum updated to 751
pagenum updated to 752
pagenum updated to 753
pagenum updated to 754
pagenum updated to 755
pagenum updated to 756
pagenum updated to 757
pagenum updated to 758
pagenum updated to 759
pagenum updated to 760
pagenum updated to 761
pagenum upd

pagenum updated to 204
pagenum updated to 205
pagenum updated to 206
pagenum updated to 207
pagenum updated to 208
pagenum updated to 209
pagenum updated to 210
pagenum updated to 211
pagenum updated to 212
pagenum updated to 213
pagenum updated to 214
pagenum updated to 215
pagenum updated to 216
pagenum updated to 217
pagenum updated to 218
pagenum updated to 219
pagenum updated to 220
pagenum updated to 221
pagenum updated to 222
pagenum updated to 223
pagenum updated to 224
pagenum updated to 225
pagenum updated to 226
pagenum updated to 227
pagenum updated to 228
pagenum updated to 229
pagenum updated to 230
pagenum updated to 231
pagenum updated to 232
pagenum updated to 233
pagenum updated to 234
pagenum updated to 235
pagenum updated to 236
pagenum updated to 237
pagenum updated to 238
pagenum updated to 239
pagenum updated to 240
pagenum updated to 241
pagenum updated to 242
pagenum updated to 243
pagenum updated to 244
pagenum updated to 245
pagenum updated to 246
pagenum upd

pagenum updated to 148
pagenum updated to 149
pagenum updated to 150
pagenum updated to 151
pagenum updated to 152
pagenum updated to 153
pagenum updated to 154
pagenum updated to 155
pagenum updated to 156
pagenum updated to 157
pagenum updated to 158
pagenum updated to 159
pagenum updated to 160
pagenum updated to 161
pagenum updated to 162
pagenum updated to 163
pagenum updated to 164
pagenum updated to 165
pagenum updated to 166
pagenum updated to 167
pagenum updated to 168
pagenum updated to 169
pagenum updated to 170
pagenum updated to 171
pagenum updated to 172
pagenum updated to 173
pagenum updated to 174
pagenum updated to 175
pagenum updated to 176
pagenum updated to 177
pagenum updated to 178
pagenum updated to 179
pagenum updated to 180
pagenum updated to 181
pagenum updated to 182
pagenum updated to 183
pagenum updated to 184
pagenum updated to 185
pagenum updated to 186
pagenum updated to 187
pagenum updated to 188
pagenum updated to 189
pagenum updated to 190
pagenum upd

pagenum updated to 210
pagenum updated to 211
pagenum updated to 212
pagenum updated to 213
pagenum updated to 214
pagenum updated to 215
pagenum updated to 216
pagenum updated to 217
pagenum updated to 218
pagenum updated to 219
pagenum updated to 220
pagenum updated to 221
pagenum updated to 222
pagenum updated to 223
pagenum updated to 224
pagenum updated to 225
pagenum updated to 226
pagenum updated to 227
pagenum updated to 228
pagenum updated to 229
pagenum updated to 230
pagenum updated to 231
pagenum updated to 232
pagenum updated to 233
pagenum updated to 234
pagenum updated to 235
pagenum updated to 236
pagenum updated to 237
pagenum updated to 238
pagenum updated to 239
pagenum updated to 240
pagenum updated to 241
pagenum updated to 242
pagenum updated to 243
pagenum updated to 244
pagenum updated to 245
pagenum updated to 246
pagenum updated to 247
Batch Complete!
pagenum updated to 2                                                                                 ] 0.63%
p

pagenum updated to 107
pagenum updated to 108
pagenum updated to 109
pagenum updated to 110
pagenum updated to 111
pagenum updated to 112
pagenum updated to 113
pagenum updated to 114
pagenum updated to 115
pagenum updated to 116
pagenum updated to 117
pagenum updated to 118
pagenum updated to 119
pagenum updated to 120
pagenum updated to 121
pagenum updated to 122
pagenum updated to 123
pagenum updated to 124
pagenum updated to 125
pagenum updated to 126
pagenum updated to 127
pagenum updated to 128
pagenum updated to 129
pagenum updated to 130
pagenum updated to 131
pagenum updated to 132
pagenum updated to 133
pagenum updated to 134
pagenum updated to 135
pagenum updated to 136
pagenum updated to 137
pagenum updated to 138
pagenum updated to 139
pagenum updated to 140
pagenum updated to 141
pagenum updated to 142
pagenum updated to 143
pagenum updated to 144
pagenum updated to 145
pagenum updated to 146
pagenum updated to 147
pagenum updated to 148
pagenum updated to 149
pagenum upd

pagenum updated to 102
pagenum updated to 103
pagenum updated to 104
pagenum updated to 105
pagenum updated to 106
pagenum updated to 107
pagenum updated to 108
pagenum updated to 109
pagenum updated to 110
pagenum updated to 111
pagenum updated to 112
pagenum updated to 113
pagenum updated to 114
pagenum updated to 115
pagenum updated to 116
pagenum updated to 117
pagenum updated to 118
pagenum updated to 119
pagenum updated to 120
pagenum updated to 121
pagenum updated to 122
pagenum updated to 123
pagenum updated to 124
pagenum updated to 125
pagenum updated to 126
pagenum updated to 127
pagenum updated to 128
pagenum updated to 129
pagenum updated to 130
pagenum updated to 131
pagenum updated to 132
pagenum updated to 133
pagenum updated to 134
pagenum updated to 135
pagenum updated to 136
pagenum updated to 137
pagenum updated to 138
pagenum updated to 139
pagenum updated to 140
pagenum updated to 141
pagenum updated to 142
pagenum updated to 143
pagenum updated to 144
pagenum upd

pagenum updated to 22
pagenum updated to 23
pagenum updated to 24
pagenum updated to 25
pagenum updated to 26
pagenum updated to 27
pagenum updated to 28
pagenum updated to 29
pagenum updated to 30
pagenum updated to 31
pagenum updated to 32
pagenum updated to 33
pagenum updated to 34
pagenum updated to 35
pagenum updated to 36
pagenum updated to 37
pagenum updated to 38
pagenum updated to 39
pagenum updated to 40
pagenum updated to 41
pagenum updated to 42
pagenum updated to 43
pagenum updated to 44
pagenum updated to 45
pagenum updated to 46
pagenum updated to 47
pagenum updated to 48
pagenum updated to 49
pagenum updated to 50
pagenum updated to 51
pagenum updated to 52
pagenum updated to 53
pagenum updated to 54
pagenum updated to 55
pagenum updated to 56
pagenum updated to 57
pagenum updated to 58
pagenum updated to 59
pagenum updated to 60
pagenum updated to 61
pagenum updated to 62
pagenum updated to 63
pagenum updated to 64
pagenum updated to 65
pagenum updated to 66
pagenum up

pagenum updated to 34
pagenum updated to 35
pagenum updated to 36
pagenum updated to 37
pagenum updated to 38
pagenum updated to 39
pagenum updated to 40
pagenum updated to 41
pagenum updated to 42
pagenum updated to 43
pagenum updated to 44
pagenum updated to 45
pagenum updated to 46
pagenum updated to 47
pagenum updated to 48
pagenum updated to 49
pagenum updated to 50
pagenum updated to 51
pagenum updated to 52
pagenum updated to 53
pagenum updated to 54
pagenum updated to 55
pagenum updated to 56
pagenum updated to 57
pagenum updated to 58
pagenum updated to 59
pagenum updated to 60
pagenum updated to 61
pagenum updated to 62
pagenum updated to 63
pagenum updated to 64
pagenum updated to 65
pagenum updated to 66
pagenum updated to 67
pagenum updated to 68
pagenum updated to 69
pagenum updated to 70
pagenum updated to 71
pagenum updated to 72
pagenum updated to 73
pagenum updated to 74
pagenum updated to 75
pagenum updated to 76
pagenum updated to 77
pagenum updated to 78
pagenum up

pagenum updated to 18
pagenum updated to 19
pagenum updated to 20
pagenum updated to 21
pagenum updated to 22
pagenum updated to 23
pagenum updated to 24
pagenum updated to 25
pagenum updated to 26
pagenum updated to 27
pagenum updated to 28
pagenum updated to 29
pagenum updated to 30
pagenum updated to 31
pagenum updated to 32
pagenum updated to 33
pagenum updated to 34
pagenum updated to 35
pagenum updated to 36
pagenum updated to 37
pagenum updated to 38
pagenum updated to 39
pagenum updated to 40
pagenum updated to 41
pagenum updated to 42
pagenum updated to 43
pagenum updated to 44
pagenum updated to 45
pagenum updated to 46
pagenum updated to 47
pagenum updated to 48
pagenum updated to 49
pagenum updated to 50
pagenum updated to 51
pagenum updated to 52
pagenum updated to 53
pagenum updated to 54
pagenum updated to 55
pagenum updated to 56
pagenum updated to 57
pagenum updated to 58
pagenum updated to 59
pagenum updated to 60
pagenum updated to 61
pagenum updated to 62
pagenum up

pagenum updated to 45
pagenum updated to 46
pagenum updated to 47
pagenum updated to 48
pagenum updated to 49
pagenum updated to 50
pagenum updated to 51
pagenum updated to 52
pagenum updated to 53
pagenum updated to 54
Batch Complete!
pagenum updated to 2                                                                                 ] 4.44%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
pagenum updated to 15
pagenum updated to 16
pagenum updated to 17
pagenum updated to 18
pagenum updated to 19
pagenum updated to 20
pagenum updated to 21
pagenum updated to 22
pagenum updated to 23
pagenum updated to 24
pagenum updated to 25
pagenum updated to 26
pagenum updated to 27
pagenum updated to 28
pagenum updated to 29
pagenum updated to 30
pagenum updated to 31
pagenum updated to 32
pa

pagenum updated to 34
pagenum updated to 35
Batch Complete!
pagenum updated to 2                                                                                 ] 6.98%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
pagenum updated to 15
pagenum updated to 16
pagenum updated to 17
pagenum updated to 18
pagenum updated to 19
pagenum updated to 20
pagenum updated to 21
pagenum updated to 22
pagenum updated to 23
pagenum updated to 24
pagenum updated to 25
pagenum updated to 26
pagenum updated to 27
pagenum updated to 28
pagenum updated to 29
pagenum updated to 30
pagenum updated to 31
pagenum updated to 32
pagenum updated to 33
pagenum updated to 34
Batch Complete!
pagenum updated to 2                                                                                 ] 7.14%
pagenum u

pagenum updated to 25
pagenum updated to 26
pagenum updated to 27
pagenum updated to 28
Batch Complete!
pagenum updated to 2                                                                                 ] 8.73%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
pagenum updated to 15
pagenum updated to 16
pagenum updated to 17
pagenum updated to 18
pagenum updated to 19
pagenum updated to 20
pagenum updated to 21
pagenum updated to 22
pagenum updated to 23
pagenum updated to 24
pagenum updated to 25
pagenum updated to 26
pagenum updated to 27
Batch Complete!
pagenum updated to 2                                                                                 ] 8.89%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum update

pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
pagenum updated to 15
pagenum updated to 16
pagenum updated to 17
pagenum updated to 18
pagenum updated to 19
pagenum updated to 20
pagenum updated to 21
Batch Complete!
pagenum updated to 2                                                                                 ] 10.95%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
pagenum updated to 15
pagenum updated to 16
pagenum updated to 17
pagenum updated to 18
pagenum updated to 19
pagenum updated to 20
pagenum updated to 21
Batch Complete!
pagenum updated to 2                                                          

pagenum updated to 15
pagenum updated to 16
pagenum updated to 17
Batch Complete!
pagenum updated to 2                                                                                 ] 13.49%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
pagenum updated to 15
pagenum updated to 16
pagenum updated to 17
Batch Complete!
pagenum updated to 2                                                                                 ] 13.65%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
pagenum updated to 15
pagenum updated to 16
pagenum updated to 17
Batch Complete!
pagenum updated to 2

pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
Batch Complete!
pagenum updated to 2                                                                                 ] 16.67%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
Batch Complete!
pagenum updated to 2                                                                                 ] 16.83%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
pagenum updated to 12
pagenum updated to 13
pagenum updated to 14
Batch Complete!
pagenum updated to 2                                                                                 ] 16.98%
pagenum updated to 3

pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
Batch Complete!
pagenum updated to 2**                                                                               ] 20.48%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
Batch Complete!
pagenum updated to 2**                                                                               ] 20.63%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum updated to 10
pagenum updated to 11
Batch Complete!
pagenum updated to 2**                                                                               ] 20.79%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
pagenum updated to 8
pagenum updated to 9
pagenum

pagenum updated to 2************                                                                     ] 30.32%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
Batch Complete!
pagenum updated to 2************                                                                     ] 30.48%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
Batch Complete!
pagenum updated to 2************                                                                     ] 30.63%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
Batch Complete!
pagenum updated to 2************                                                                     ] 30.79%
pagenum updated to 3
pagenum updated to 4
pagenum updated to 5
pagenum updated to 6
pagenum updated to 7
Batch Complete!
pagenum updated to 2************                                            

pagenum updated to 2*************************                                                        ] 43.65%
pagenum updated to 3
pagenum updated to 4
Batch Complete!
pagenum updated to 2*************************                                                        ] 43.81%
pagenum updated to 3
pagenum updated to 4
Batch Complete!
pagenum updated to 2*************************                                                        ] 43.97%
pagenum updated to 3
pagenum updated to 4
Batch Complete!
pagenum updated to 2**************************                                                       ] 44.13%
pagenum updated to 3
pagenum updated to 4
Batch Complete!
pagenum updated to 2**************************                                                       ] 44.29%
pagenum updated to 3
pagenum updated to 4
Batch Complete!
pagenum updated to 2**************************                                                       ] 44.44%
pagenum updated to 3
pagenum updated to 4
Batch Co

pagenum updated to 2******************************************                                       ] 60.32%
pagenum updated to 3
Batch Complete!
pagenum updated to 2******************************************                                       ] 60.48%
pagenum updated to 3
Batch Complete!
pagenum updated to 2******************************************                                       ] 60.63%
pagenum updated to 3
Batch Complete!
pagenum updated to 2******************************************                                       ] 60.79%
pagenum updated to 3
Batch Complete!
pagenum updated to 2******************************************                                       ] 60.95%
pagenum updated to 3
Batch Complete!
pagenum updated to 2*******************************************                                      ] 61.11%
pagenum updated to 3
Batch Complete!
pagenum updated to 2*******************************************                                      ] 61.27%
pagenum 

pagenum updated to 2*************************************************************                    ] 80.00%
Batch Complete!
pagenum updated to 2**************************************************************                   ] 80.16%
Batch Complete!
pagenum updated to 2**************************************************************                   ] 80.32%
Batch Complete!
pagenum updated to 2**************************************************************                   ] 80.48%
Batch Complete!
pagenum updated to 2**************************************************************                   ] 80.63%
Batch Complete!
pagenum updated to 2**************************************************************                   ] 80.79%
Batch Complete!
pagenum updated to 2**************************************************************                   ] 80.95%
Batch Complete!
pagenum updated to 2***************************************************************                  ] 81.11%
Batch Co

Lastly, the final ratings from the partial pages are gathered and saved.

In [8]:
df['nfullpages']=(df['nrate']-50).apply(round, ndigits=-2)/100  

df_ratings=pd.DataFrame(columns=['username', 'gameid', 'rating'], index=range(5*10**5))

dfidx_start=0
dfidx=0

bar_counter=0
bar_total=len(df)
for idx, row in df.iterrows():
    pagenum=row['nfullpages']+1
    gameid=row['id']
    
    sleep(0.5)  
    r=request('http://www.boardgamegeek.com/xmlapi2/thing?id=%s&ratingcomments=1&page=%i' % (gameids, pagenum))
    
    soup=BeautifulSoup(r.text, 'xml')
    comments=soup('comment')

    id_list=[gameid]*len(comments)
    dfidx_end=dfidx_start+len(comments)
    df_ratings.iloc[dfidx_start:dfidx_end, df_ratings.columns.get_loc('gameid')] = id_list

    l1=[0]*len(comments)
    l2=[0]*len(comments)
    j=0
    for comm in comments:
        l1[j]=comm['username']
        l2[j]=float(comm['rating'])
        j+=1
    df_ratings.iloc[dfidx_start:dfidx_end, df_ratings.columns.get_loc('username')] = l1
    df_ratings.iloc[dfidx_start:dfidx_end, df_ratings.columns.get_loc('rating')] = l2

    dfidx_start=dfidx_end    

    if idx%100==0:
        
        df_ratings=df_ratings.dropna(how='all')
        df_ratings.to_csv('bgg_ratings.csv', mode='a',index=False)
        del df_ratings
        df_ratings=pd.DataFrame(columns=['username', 'gameid', 'rating'], index=range(5*10**5))
        
    bar_counter+=1
    show_progress_bar(100,bar_counter,bar_total)
    
df_ratings=df_ratings.dropna(how='all')

df_ratings.to_csv('bgg_ratings.csv', mode='a',index=False)

[****************************************************************************************************] 100.00%