In [1]:
from IPython.core.display import HTML
HTML("<style>.container { width:90% !important; }</style>")

Py-Goldsberry - Player-Level Box Score Data
===

This tutorial walks through using the py-goldsberry package to collect box score data at the player level.

To get started, we import py-goldsberry. We also import pandas so we can explore the data quickly once it is collected.

In [2]:
from __future__ import division

import goldsberry
import pandas as pd
pd.options.display.max_columns = 100
pd.options.display.max_rows = 100
goldsberry.__version__

'0.8.0.1'

## Getting List of All Games

In [3]:
game_list = goldsberry.GameIDs()

In [4]:
df_games = pd.DataFrame(game_list.game_list())

In [5]:
df_games.head()

Unnamed: 0,AST,BLK,DREB,FG3A,FG3M,FG3_PCT,FGA,FGM,FG_PCT,FTA,FTM,FT_PCT,GAME_DATE,GAME_ID,MATCHUP,MIN,OREB,PF,PLUS_MINUS,PTS,REB,SEASON_ID,STL,TEAM_ABBREVIATION,TEAM_ID,TEAM_NAME,TOV,VIDEO_AVAILABLE,WL
0,28,4,45,29,8,0.276,124,56,0.452,46,27,0.587,2015-12-18,21500391,DET @ CHI,340,19,35,3,147,64,22015,7,DET,1610612765,Detroit Pistons,11,1,W
1,35,4,34,25,13,0.52,88,53,0.602,16,11,0.688,2016-02-25,21500855,GSW @ ORL,240,8,21,16,130,42,22015,9,GSW,1610612744,Golden State Warriors,21,1,W
2,30,2,36,23,9,0.391,87,53,0.609,34,27,0.794,2016-01-02,21500496,SAC vs. PHX,240,10,23,23,142,46,22015,9,SAC,1610612758,Sacramento Kings,15,1,W
3,36,8,26,18,10,0.556,76,52,0.684,22,18,0.818,2016-03-05,21500926,MIN vs. BKN,240,4,26,14,132,30,22015,8,MIN,1610612750,Minnesota Timberwolves,12,1,W
4,28,11,34,12,6,0.5,77,52,0.675,25,19,0.76,2016-03-01,21500892,MIA vs. CHI,240,4,23,18,129,38,22015,5,MIA,1610612748,Miami Heat,15,1,W


In [7]:
team_cols = ['TEAM_ID','TEAM_NAME','TEAM_ABBREVIATION','SEASON_ID',
             'GAME_DATE','GAME_ID','MATCHUP','WL',
             'PTS','REB','STL','TOV','AST','BLK','DREB','FG3A','FG3M',
             'FG3_PCT','FGA','FGM','FG_PCT','FTA','FTM','FT_PCT','MIN',
             'OREB','PF','PLUS_MINUS','VIDEO_AVAILABLE']

df_games = df_games[team_cols]

## Getting Player Level Data

In [8]:
player_list = goldsberry.PlayerList()

In [9]:
df_players = pd.DataFrame(player_list.players())

In [10]:
df_players.head()

Unnamed: 0,DISPLAY_FIRST_LAST,DISPLAY_LAST_COMMA_FIRST,FROM_YEAR,GAMES_PLAYED_FLAG,PERSON_ID,PLAYERCODE,ROSTERSTATUS,TEAM_ABBREVIATION,TEAM_CITY,TEAM_CODE,TEAM_ID,TEAM_NAME,TO_YEAR
0,Quincy Acy,"Acy, Quincy",2012,Y,203112,quincy_acy,1,SAC,Sacramento,kings,1610612758,Kings,2015
1,Jordan Adams,"Adams, Jordan",2014,Y,203919,jordan_adams,1,MEM,Memphis,grizzlies,1610612763,Grizzlies,2015
2,Steven Adams,"Adams, Steven",2013,Y,203500,steven_adams,1,OKC,Oklahoma City,thunder,1610612760,Thunder,2015
3,Arron Afflalo,"Afflalo, Arron",2007,Y,201167,arron_afflalo,1,NYK,New York,knicks,1610612752,Knicks,2015
4,Alexis Ajinca,"Ajinca, Alexis",2008,Y,201582,alexis_ajinca,1,NOP,New Orleans,pelicans,1610612740,Pelicans,2015


## Getting game logs for the entire league

Now that we know how to get the game log data for a single player, we can combine that knowledge with the player IDs in `df_players` to loop through the entire league and build a dataset of player-level game logs.

To do this, we iterate over the `PERSON_ID` column of `df_players` and add the results of each iteration to an ever-expanding list, `league_logs`. Once the loop finishes, we convert the list to a DataFrame.

In [11]:
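league_logs = []
# One request per player. Iterating the Series directly yields the PERSON_ID
# values and avoids Series.iteritems(), which was removed in pandas 2.0.
for pid in df_players.PERSON_ID:
    player_log = goldsberry.player.game_logs(pid)
    league_logs[0:0] = player_log.logs()  # prepend this player's rows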

df_gamelogs = pd.DataFrame(league_logs)
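
The loop above makes one request per player, so a single failed request would stop the whole run. As a minimal sketch of a more defensive version of the same accumulate-then-convert pattern, the function below records the IDs that fail instead of raising. Note the assumptions: `collect_logs`, `fetch_logs`, and `toy_fetch` are hypothetical names introduced here, `fetch_logs` stands in for the `goldsberry.player.game_logs(pid).logs()` call, and the toy rows are invented for illustration. None of this is part of py-goldsberry.

```python
def collect_logs(player_ids, fetch_logs):
    """Gather rows for every player, recording IDs whose fetch fails.

    fetch_logs is a hypothetical callable that takes a player ID and
    returns a list of dict rows (a stand-in for the real API call).
    """
    rows, failed = [], []
    for pid in player_ids:
        try:
            rows.extend(fetch_logs(pid))
        except Exception:  # broad on purpose: the real error types are not known here
            failed.append(pid)
    return rows, failed


# Toy stand-in data, invented for illustration only.
_TOY_LOGS = {
    1: [{"Player_ID": 1, "PTS": 10}, {"Player_ID": 1, "PTS": 14}],
    2: [{"Player_ID": 2, "PTS": 7}],
}


def toy_fetch(pid):
    # Raises KeyError for unknown IDs, simulating a failed request.
    return list(_TOY_LOGS[pid])


rows, failed = collect_logs([1, 2, 3], toy_fetch)
print(len(rows), failed)
```

The `failed` list can be passed back into `collect_logs` later to retry only the players that were missed. This sketch appends rows with `extend`, so rows come out in player order rather than in the reversed order produced by prepending.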

Because we don't remember every player's name by ID, we use `df_players` to attach the matching display name to each game log row. Finally, we rearrange the columns: alphabetical order is not helpful here, and the table is easier to read with the identifying columns first, followed by the box score statistics.

In [12]:
df_gamelogs = pd.merge(df_gamelogs,
                       df_players.loc[:, ['DISPLAY_FIRST_LAST', 'PERSON_ID']],
                       left_on='Player_ID', right_on='PERSON_ID')
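
To see what this merge does without calling the API, here is a small sketch on toy frames. The column names mirror the ones used above, but the point totals are invented for illustration. The `how="left"` and `validate="many_to_one"` arguments are additions not used in the cell above (which relies on the default inner join); they are shown as one possible way to keep every game log row and to fail loudly if a `PERSON_ID` is duplicated in the player table.

```python
import pandas as pd

# Toy frames, invented for illustration; only the column names follow the tutorial.
logs = pd.DataFrame({
    "Player_ID": [203112, 203112, 203500],
    "PTS": [4, 9, 12],
})
players = pd.DataFrame({
    "PERSON_ID": [203112, 203500],
    "DISPLAY_FIRST_LAST": ["Quincy Acy", "Steven Adams"],
})

merged = pd.merge(
    logs,
    players[["DISPLAY_FIRST_LAST", "PERSON_ID"]],
    left_on="Player_ID",
    right_on="PERSON_ID",
    how="left",              # keep every game log row, even without a match
    validate="many_to_one",  # raise if a PERSON_ID appears more than once in players
)
print(merged[["Player_ID", "DISPLAY_FIRST_LAST", "PTS"]])
```

Each game log row picks up the name of the player whose `PERSON_ID` equals its `Player_ID`, so a player with many games has the same name repeated on every row.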

In [13]:
col_order = ['Player_ID','DISPLAY_FIRST_LAST',
             'SEASON_ID','GAME_DATE','Game_ID','MATCHUP','WL',
             'PTS','REB','STL','TOV','AST','BLK','DREB','FG3A','FG3M',
             'FG3_PCT','FGA','FGM','FG_PCT','FTA','FTM','FT_PCT','MIN',
             'OREB','PF','PLUS_MINUS','VIDEO_AVAILABLE']
df_gamelogs = df_gamelogs[col_order]
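
Listing every column by hand works, but it raises a `KeyError` if any name is misspelled, and it silently drops columns left out of the list (here, `PERSON_ID`). As an alternative sketch, the hypothetical helper `move_to_front` below (a name introduced here, not part of pandas or py-goldsberry) takes only the leading columns and keeps the rest in their existing order. The toy values are invented for illustration.

```python
import pandas as pd


def move_to_front(df, lead):
    """Return df with the `lead` columns first and all others after them."""
    missing = [c for c in lead if c not in df.columns]
    if missing:
        raise KeyError("columns not found: {}".format(missing))
    rest = [c for c in df.columns if c not in lead]
    return df[lead + rest]


# Toy single-row frame with alphabetically ordered columns.
toy = pd.DataFrame({
    "AST": [5],
    "Game_ID": ["0021500391"],
    "PTS": [20],
    "Player_ID": [203112],
})
ordered = move_to_front(toy, ["Player_ID", "Game_ID"])
print(list(ordered.columns))
```

Unlike the explicit `col_order` list, this keeps every column, so any column that should be dropped would still need to be removed separately.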