##Collecting Player Game Logs
One of the primary modules in `py-goldsberry` is the `player` module. It provides access to a multitude of player-level statistics. 

Each class in the `player` module requires a specific **playerID**. If you have looked through the first tutorial, you can see that `py-goldsberry` has a built-in function that makes it easy to find the **playerID**s for a given season.

In [1]:
import goldsberry
import pandas as pd
goldsberry.__version__

'0.6.0'

One of the many things you can do with `py-goldsberry` is generate a list of game logs for a single player or for the entire league. This takes just two built-in methods and a simple custom function.

First, we generate a list of players from the 2014 season using the built-in `PlayerList()` function, and convert it to a Pandas DataFrame.

In [2]:
players_2014 = goldsberry.PlayerList(2014)

In [3]:
players_2014 = pd.DataFrame(players_2014)

In [4]:
players_2014.sample(5)

Unnamed: 0,DISPLAY_LAST_COMMA_FIRST,FROM_YEAR,PERSON_ID,PLAYERCODE,ROSTERSTATUS,TEAM_ABBREVIATION,TEAM_CITY,TEAM_CODE,TEAM_ID,TEAM_NAME,TO_YEAR
386,"Roberts, Brian",2012,203148,brian_roberts,1,CHA,Charlotte,hornets,1610612766,Hornets,2015
464,"Watson, CJ",2007,201228,cj_watson,1,IND,Indiana,pacers,1610612754,Pacers,2015
457,"Wade, Dwyane",2003,2548,dwyane_wade,1,MIA,Miami,heat,1610612748,Heat,2015
110,"Curry, Seth",2013,203552,seth_curry,0,,,,0,,2015
292,"Mahinmi, Ian",2007,101133,ian_mahinmi,1,IND,Indiana,pacers,1610612754,Pacers,2015


Once the data is in a DataFrame, you can take advantage of the Pandas functionality to search for specific players, teams, rookie cohorts, etc.

Let's start by looking for just James Harden.

In [5]:
players_2014.loc[players_2014['DISPLAY_LAST_COMMA_FIRST'].str.contains("Harden")]

Unnamed: 0,DISPLAY_LAST_COMMA_FIRST,FROM_YEAR,PERSON_ID,PLAYERCODE,ROSTERSTATUS,TEAM_ABBREVIATION,TEAM_CITY,TEAM_CODE,TEAM_ID,TEAM_NAME,TO_YEAR
196,"Harden, James",2009,201935,james_harden,1,HOU,Houston,rockets,1610612745,Rockets,2015


Fortunately, there is only one player with `Harden` somewhere in his name. If we had searched for `James`, it would have been a bit of a different story.
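
To see why a search for `James` is a different story, here is a minimal sketch on a hand-built DataFrame. Only the Harden and Wade rows come from the output above; the two placeholder rows and their IDs are made up for illustration. It compares a substring search with two stricter alternatives.

```python
import pandas as pd

# Miniature player list. "James, Placeholder" and "Placeholder, James"
# (and their IDs) are invented rows, not real NBA data.
players = pd.DataFrame({
    "DISPLAY_LAST_COMMA_FIRST": [
        "Harden, James",
        "James, Placeholder",
        "Placeholder, James",
        "Wade, Dwyane",
    ],
    "PERSON_ID": [201935, 900001, 900002, 2548],
})

names = players["DISPLAY_LAST_COMMA_FIRST"]

# Substring search: matches "James" anywhere, as a first or last name.
loose = players.loc[names.str.contains("James")]

# Names are stored as "Last, First", so a leading "James," only
# matches players whose last name is James.
last_name = players.loc[names.str.startswith("James,")]

# Exact match on the full display name.
exact = players.loc[names == "Harden, James"]

print(len(loose), len(last_name), len(exact))
```

When a substring search returns several rows, matching on the full display name (or going straight to the **PERSON_ID**) is usually the safer way to pick out one player.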

Because we want to get information on James Harden, we need to make note of the value in his **PERSON_ID** column. This is the unique ID number associated with Harden in the NBA database. Any time we want to look up information related to James Harden, this is the value to use.

To make it easy to remember, I'm going to save it as a variable in our environment that we can call anytime we want. It's a bit easier for me to remember `harden_id` than `201935`.

In [6]:
harden_id = '201935'

###Game Logs

One of many pieces of available data for a player is their game logs. You can access these by using the `goldsberry.player.game_logs()` class and passing in the playerID. 

`game_logs` accepts a few arguments that adjust the data that gets returned. The most important is the `season` argument. You can pass in whatever season you are interested in using the first four digits of the season. For example, if you want all of the game logs from the 2014-15 season, you pass `season=2014` when you call the class. If you wanted game logs from the 2009-10 season, you would pass `season=2009`.

In [7]:
harden_2014_logs = goldsberry.player.game_logs(harden_id, season = 2014)
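
The season convention described above (pass the year in which the season starts) can be sketched with a small helper. `season_label` is a hypothetical function written for this tutorial, not part of `py-goldsberry`; it only shows which season a given `season=` value refers to.

```python
def season_label(start_year):
    """Return the 'YYYY-YY' label for the season that starts in start_year."""
    end_two_digits = (start_year + 1) % 100  # 2015 -> 15, 2000 -> 0
    return "{}-{:02d}".format(start_year, end_two_digits)


for year in (2014, 2009, 1999):
    print(year, "->", season_label(year))
```

So `season=2014` corresponds to the 2014-15 season and `season=2009` to 2009-10.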

Now that we've collected the data from the NBA website, we want to create a Pandas DataFrame to view and analyze it.

In [8]:
harden_2014_logs = pd.DataFrame(harden_2014_logs.logs())

Notice that we passed `harden_2014_logs.logs()` and not `harden_2014_logs` to the DataFrame constructor. This is because, with many of the calls, there is more than one set of data that is returned. Instead of making multiple calls to the NBA's server, a single call is made and then the class methods return different types of data. 

(Until the documentation is complete, take advantage of the [TAB] completion feature in Jupyter.)

In [9]:
harden_2014_logs.head()

Unnamed: 0,AST,BLK,DREB,FG3A,FG3M,FG3_PCT,FGA,FGM,FG_PCT,FTA,...,PF,PLUS_MINUS,PTS,Player_ID,REB,SEASON_ID,STL,TOV,VIDEO_AVAILABLE,WL
0,10,1,11,5,2,0.4,8,3,0.375,8,...,2,28,16,201935,11,22014,1,6,1,W
1,6,2,4,8,1,0.125,20,7,0.35,16,...,1,0,29,201935,6,22014,1,2,1,W
2,7,1,1,6,1,0.167,18,10,0.556,9,...,2,1,30,201935,1,22014,1,4,1,W
3,10,0,2,5,2,0.4,19,5,0.263,6,...,2,-2,16,201935,2,22014,3,4,1,L
4,4,0,2,6,2,0.333,15,6,0.4,8,...,1,-26,22,201935,4,22014,1,1,1,L


Now that we know how to get a single player's game logs, we can easily get game logs for a list of players by creating a simple function.

In [10]:
def get_all_logs(playerids, season=2014):
    logs = []  # empty list to store results
    for i in playerids:
        try:
            i_log = goldsberry.player.game_logs(i, season=season)
            logs.extend(i_log.logs())  # py-goldsberry returns a list, not a DataFrame
        except Exception:
            # report the failed ID and move on to the next player
            print("no record for " + str(i))
    return logs

In the function above, notice we do not construct any DataFrames. `py-goldsberry` returns lists, not DataFrames. I did this so that loops like the one above would run faster. There is no need to convert each player's data into a DataFrame and merge when all observations can be placed in a single list and converted to a DataFrame upon completion.

There is a bit of error handling just in case a playerID is fed to the function that is not valid.
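
If you want to know afterwards which IDs failed, one option is to collect them alongside the logs. The sketch below is a variation on the idea, not the tutorial's function: `collect_logs` and `fake_fetch` are hypothetical names, and `fetch` stands in for the network call so the example runs offline.

```python
def collect_logs(playerids, fetch):
    """Gather logs for many players and remember which IDs failed.

    `fetch` is an assumed callable: it takes a player ID and returns a
    list of game-log records (one dict per game).
    """
    logs, failed = [], []
    for pid in playerids:
        try:
            logs.extend(fetch(pid))
        except Exception:
            failed.append(pid)  # keep going, but remember the bad ID
    return logs, failed


def fake_fetch(pid):
    """Offline stand-in for the real request; ID 0 is treated as invalid."""
    if pid == 0:
        raise ValueError("no record for " + str(pid))
    return [{"Player_ID": pid, "PTS": 10}, {"Player_ID": pid, "PTS": 12}]


logs, failed = collect_logs([201935, 0, 2548], fake_fetch)
print(len(logs), failed)
```

With the real library, `fetch` could plausibly be something like `lambda pid: goldsberry.player.game_logs(pid, season=2014).logs()`, mirroring the call used earlier in this tutorial.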

Before we execute our function, we need to create a list of playerids for which we want game logs. 

Once this list is created, we pass it, along with the season, into the function and save the results as `all_logs`.

In [11]:
playerids = players_2014['PERSON_ID'].tolist()
all_logs = get_all_logs(playerids, season=2014)

This will take a bit of time to execute (depending on the speed of your internet connection). Once it has completed, you can construct a DataFrame by passing the results to the DataFrame constructor. Once you are done, you can sample your results.

In [12]:
all_logs_df = pd.DataFrame(all_logs)
all_logs_df.sample(10)

Unnamed: 0,AST,BLK,DREB,FG3A,FG3M,FG3_PCT,FGA,FGM,FG_PCT,FTA,...,PF,PLUS_MINUS,PTS,Player_ID,REB,SEASON_ID,STL,TOV,VIDEO_AVAILABLE,WL
9377,1,1,8,0,0,0.0,4,2,0.5,5,...,3,8,7,101162,9,22014,1,4,1,W
17935,0,0,0,2,1,0.5,5,2,0.4,0,...,3,-2,5,203094,0,22014,0,0,1,L
12164,0,0,9,4,1,0.25,10,5,0.5,8,...,4,-11,17,101141,12,22014,1,1,1,L
12418,4,0,2,3,0,0.0,7,2,0.286,0,...,2,-7,4,101127,2,22014,0,1,1,W
19687,5,0,3,1,0,0.0,1,0,0.0,4,...,3,2,2,101179,3,22014,1,4,1,W
20906,0,0,1,0,0,0.0,7,2,0.286,2,...,4,-12,6,202336,2,22014,3,0,1,L
12521,0,1,0,0,0,0.0,0,0,0.0,2,...,1,2,2,203108,0,22014,0,0,1,L
6031,0,0,0,1,0,0.0,1,0,0.0,0,...,0,-7,0,203584,0,22014,0,0,1,L
14315,0,0,0,0,0,0.0,0,0,0.0,0,...,0,-4,0,203087,0,22014,0,1,1,W
4834,3,0,3,1,0,0.0,7,3,0.429,2,...,2,-21,8,201596,4,22014,1,1,1,L


Once you have your list of game logs, you might want to add some additional player information to enrich the dataset. Take advantage of the Pandas merge feature to merge the `players_2014` and `all_logs_df` DataFrames together.

In [13]:
player_game_logs_2014_15 = pd.merge(players_2014, all_logs_df, left_on="PERSON_ID", right_on="Player_ID", how="left")
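
Because the merge uses `how="left"`, every player in the left table is kept, including players with no game logs. A minimal sketch of that behavior on two hand-built tables follows; the column names mirror the ones above, and the logs table deliberately contains rows for only one of the two players. The `indicator=True` option is an addition for illustration and is not used in the cell above.

```python
import pandas as pd

# Tiny stand-ins for players_2014 and all_logs_df.
players = pd.DataFrame({
    "PERSON_ID": [201935, 203552],
    "DISPLAY_LAST_COMMA_FIRST": ["Harden, James", "Curry, Seth"],
})
logs = pd.DataFrame({
    "Player_ID": [201935, 201935],  # two games, both for the first player
    "PTS": [16, 29],
})

merged = pd.merge(players, logs, left_on="PERSON_ID", right_on="Player_ID",
                  how="left", indicator=True)

# Rows tagged "left_only" are players that had no match in `logs`;
# their log columns are filled with NaN.
no_logs = merged.loc[merged["_merge"] == "left_only", "DISPLAY_LAST_COMMA_FIRST"]

print(merged.shape[0], no_logs.tolist())
```

If you only want rows that correspond to games actually played, `how="inner"` would drop the unmatched players instead.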

In [14]:
player_game_logs_2014_15.sample(10)

Unnamed: 0,DISPLAY_LAST_COMMA_FIRST,FROM_YEAR,PERSON_ID,PLAYERCODE,ROSTERSTATUS,TEAM_ABBREVIATION,TEAM_CITY,TEAM_CODE,TEAM_ID,TEAM_NAME,...,PF,PLUS_MINUS,PTS,Player_ID,REB,SEASON_ID,STL,TOV,VIDEO_AVAILABLE,WL
21783,"Smith, Ish",2010,202397,ish_smith,1,PHI,Philadelphia,sixers,1610612755,76ers,...,2,9,10,202397,5,22014,1,3,1,W
6704,"Dieng, Gorgui",2013,203476,gorgui_dieng,1,MIN,Minnesota,timberwolves,1610612750,Timberwolves,...,3,7,19,203476,11,22014,2,2,1,W
23672,"Turner, Evan",2010,202323,evan_turner,1,BOS,Boston,celtics,1610612738,Celtics,...,2,5,12,202323,6,22014,2,4,1,L
6422,"Dedmon, Dewayne",2013,203473,dewayne_dedmon,1,ORL,Orlando,magic,1610612753,Magic,...,3,-9,7,203473,10,22014,1,2,1,L
46,"Acy, Quincy",2012,203112,quincy_acy,1,NYK,New York,knicks,1610612752,Knicks,...,1,16,0,203112,4,22014,2,1,1,W
3243,"Booker, Trevor",2010,202344,trevor_booker,1,UTA,Utah,jazz,1610612762,Jazz,...,3,-9,6,202344,8,22014,0,1,1,L
22589,"Sullinger, Jared",2012,203096,jared_sullinger,1,BOS,Boston,celtics,1610612738,Celtics,...,4,0,26,203096,9,22014,1,2,1,L
2666,"Beverley, Patrick",2012,201976,patrick_beverley,1,HOU,Houston,rockets,1610612745,Rockets,...,3,9,10,201976,4,22014,2,1,1,W
1269,"Ariza, Trevor",2004,2772,trevor_ariza,1,HOU,Houston,rockets,1610612745,Rockets,...,4,20,10,2772,3,22014,3,0,1,W
415,"Aldrich, Cole",2010,202332,cole_aldrich,1,NYK,New York,knicks,1610612752,Knicks,...,2,2,11,202332,7,22014,0,1,1,L


If you've found this helpful and/or have any other requests, shoot me an email at [bradley@cardinaladvising.com](mailto:bradley@cardinaladvising.com) or post an issue on [GitHub](http://github.com/bradleyfay/py-goldsberry).