![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Getting More Data

So far we have been using data that is already downloaded into this repository. But we might want newer or older data, or data about other teams or players. This notebook uses the code library [nba_api](https://github.com/swar/nba_api) to get data from [NBA.com](https://nba.com).

The code cell below will install that code library.

In [None]:
!pip install nba_api

Now we can get career data about a particular player.

In [None]:
player_name = 'Pascal Siakam'

import pandas as pd
from nba_api.stats.endpoints import playercareerstats
from nba_api.stats.static import players
player = players.find_players_by_full_name(player_name)[0]
career = playercareerstats.PlayerCareerStats(player_id = player['id'])
for i, df in enumerate(career.get_data_frames()):
    print('dataframe', i)
    display(df)
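
The lookup above takes the first match with `[0]`, which raises an `IndexError` if the name is misspelled and nothing matches. Here is a minimal sketch of a safer lookup, assuming the search result is a list of dictionaries with `'id'` and `'full_name'` keys, as the code above uses. `first_match` is a hypothetical helper, not part of nba_api, and the sample entry uses a made-up id.

```python
def first_match(matches, name):
    """Return the first search result, or raise a clear error if there is none."""
    if not matches:
        raise ValueError(f'No player found with the name {name!r}')
    return matches[0]

# Made-up search result for illustration only (not a real NBA id)
sample_matches = [{'id': 1, 'full_name': 'Example Player'}]
player = first_match(sample_matches, 'Example Player')
print(player['id'], player['full_name'])
```

With the real library this could be called as `first_match(players.find_players_by_full_name(player_name), player_name)`.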

We can also get stats for more than one player. To do this, we make a list of the players we are interested in and repeat the steps above for each one. This time we'll display only the first dataframe, in which each row holds the player's total stats for one season.

In [None]:
player_names = ['Pascal Siakam','LeBron James','Kobe Bryant','Stephen Curry']

for p in player_names:
    # look up the player by name, then request their career stats using their id
    player = players.find_players_by_full_name(p)[0]
    careers = playercareerstats.PlayerCareerStats(player_id=player['id'])
    print(player['full_name'])
    # the first dataframe (index 0) is the one we want to see
    display(careers.get_data_frames()[0])
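
If we want to compare players side by side rather than view separate tables, one option is to stack the tables into a single dataframe. The sketch below uses made-up tables (`'Player A'`, `'Player B'`, and their numbers are placeholders, not real stats) standing in for `careers.get_data_frames()[0]`, and combines them with `pd.concat`.

```python
import pandas as pd

# Made-up season tables standing in for careers.get_data_frames()[0]
career_tables = {
    'Player A': pd.DataFrame({'SEASON_ID': ['2020-21', '2021-22'], 'PTS': [100, 150]}),
    'Player B': pd.DataFrame({'SEASON_ID': ['2021-22'], 'PTS': [200]}),
}

# Stack the tables; the dictionary keys become an extra index level named PLAYER
all_careers = pd.concat(career_tables, names=['PLAYER', 'ROW'])

# Turn the PLAYER level into an ordinary column and renumber the rows
all_careers = all_careers.reset_index(level='PLAYER').reset_index(drop=True)
print(all_careers)
```

In the loop above, the same idea would mean storing each player's first dataframe in a dictionary keyed by `player['full_name']` before calling `pd.concat`.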

---

### Exercise

Get the career stats for `'Michael Jordan'` and display the *first* dataframe. *Hint: the numbering starts at 0.*

What do you think this dataframe is communicating?

---

There are plenty of other 'endpoints' that we can access, and depending on what we want to do with the data, we can use different dataframes. For example, if instead of individual players we are more interested in how different teams are performing, we can use the `'League Standings'` endpoint.

In [None]:
from nba_api.stats.endpoints import leaguestandingsv3
standings = leaguestandingsv3.LeagueStandingsV3().get_data_frames()[0]
standings

There are a lot of columns (88 of them!). Often we don't need that many, and it's worthwhile to keep only the columns we are interested in. For example, if we only care about which teams have the best records and the longest win/loss streaks, we should keep only the columns that help us understand that.

In [None]:
simplified_standings = standings[[
    'TeamCity','TeamName','Conference','ConferenceRecord','PlayoffRank',
    'WINS','LOSSES','Record','WinPCT','HOME','ROAD',
    'LongWinStreak','LongLossStreak',
]]
simplified_standings
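
To see how this kind of column selection works on something small, here is a sketch with a made-up table. The team names and numbers are placeholders, and only a few of the column names mirror the ones used above.

```python
import pandas as pd

# Made-up standings table for illustration only
toy_standings = pd.DataFrame({
    'TeamName': ['Team X', 'Team Y'],
    'WINS': [50, 30],
    'LOSSES': [32, 52],
    'Extra': ['a', 'b'],
})

# List the available column names before deciding which ones to keep
print(toy_standings.columns.tolist())

# Passing a list of names inside the brackets selects just those columns
wanted = ['TeamName', 'WINS', 'LOSSES']
toy_simplified = toy_standings[wanted]
print(toy_simplified)
```

Printing `standings.columns.tolist()` in the same way is a quick method for checking the exact spelling of a column name before selecting it.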

Another endpoint that's available is the "All Time Leaders" for each category. If we are interested in who has the most assists or points, we can use this endpoint to get the top 10 from each category.

Each dataframe is for a particular statistic. For example, the first dataframe is for number of games played. We'll look at "All Time Steals", which is the fourth dataframe (index = `3`).

In [None]:
from nba_api.stats.endpoints import alltimeleadersgrids as atl
all_time = atl.AllTimeLeadersGrids()

for df in all_time.get_data_frames():
    print('ALL TIME', df.columns.values.tolist()[2])
    display(df)

all_time_steals = all_time.get_data_frames()[3]
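
Counting positions in a list of dataframes is easy to get wrong, so another option is to pick the dataframe by the statistic's column name. The sketch below assumes each dataframe has a column named after its statistic (the loop above prints that column name). `find_by_stat` is a hypothetical helper, and the toy tables use placeholder values.

```python
import pandas as pd

# Made-up leader tables standing in for all_time.get_data_frames()
toy_frames = [
    pd.DataFrame({'PLAYER_ID': [1], 'PLAYER_NAME': ['Player A'], 'GP': [1000]}),
    pd.DataFrame({'PLAYER_ID': [2], 'PLAYER_NAME': ['Player B'], 'STL': [300]}),
]

def find_by_stat(frames, stat):
    """Return the first dataframe that has a column named `stat`."""
    for frame in frames:
        if stat in frame.columns:
            return frame
    raise KeyError(f'No dataframe has a column named {stat!r}')

toy_steals = find_by_stat(toy_frames, 'STL')
print(toy_steals)
```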

---

### Exercise

Display the "All-Time Assists" dataframe.

---

## Combining Multiple Data Sources

Sometimes we want to combine information that comes from two different sources. For example, if we want to know who each team's top scorer is based on total points, we can combine the data from two different endpoints.

In [None]:
from nba_api.stats.endpoints import leagueleaders

leaders = leagueleaders.LeagueLeaders().get_data_frames()[0]

# We only want the highest scorer on each team, so we keep only the first row for each team
team_leaders = leaders.drop_duplicates(subset=['TEAM_ID'],keep='first')

# The standings dataframe was already created earlier in this notebook
display(standings)
display(team_leaders)
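
Here is a small sketch of what `drop_duplicates` does in the cell above, using made-up rows (the players, team ids, and points are placeholders). It relies on the rows already being ordered from most to fewest points.

```python
import pandas as pd

# Made-up rows, already sorted from most to fewest points
toy_leaders = pd.DataFrame({
    'PLAYER': ['Player A', 'Player B', 'Player C', 'Player D'],
    'TEAM_ID': [10, 20, 10, 20],
    'PTS': [900, 800, 700, 600],
})

# For each TEAM_ID keep only the first row encountered, i.e. that team's top scorer
toy_team_leaders = toy_leaders.drop_duplicates(subset=['TEAM_ID'], keep='first')
print(toy_team_leaders)
```

If the rows were in a different order, the first row for a team would not necessarily be its top scorer, so the table would need to be sorted first.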

Let's trim down the number of columns in each dataframe to keep only the ones we want.

In [None]:
trimmed_standings = standings[['TeamID','TeamCity','TeamName','Conference','ConferenceRecord','Record']]
trimmed_leaders = team_leaders[['PLAYER','TEAM_ID','PTS']]

display(trimmed_standings)
display(trimmed_leaders)

Let's also rename some columns so there's less confusion when we combine the dataframes together.

In [None]:
trimmed_leaders = trimmed_leaders.rename(columns={'PLAYER':'Highest Scorer','TEAM_ID':'TeamID','PTS':'Player_Pts'})
trimmed_leaders

Finally, let's combine the two dataframes into one big dataframe with all the information we need. To do this we need a column that appears in both of them, so that rows can be matched up. In our case this is the `'TeamID'` column.

In [None]:
# match rows from the two dataframes using the column they share
combined_dataframe = trimmed_standings.merge(trimmed_leaders, on='TeamID')
combined_dataframe
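
The merge above matches rows on the shared `'TeamID'` column. The sketch below shows the same idea on made-up tables (team and player names are placeholders) and compares the default inner merge with a left merge, which keeps teams that have no match.

```python
import pandas as pd

# Made-up tables for illustration only
toy_teams = pd.DataFrame({'TeamID': [10, 20, 30],
                          'TeamName': ['Team X', 'Team Y', 'Team Z']})
toy_scorers = pd.DataFrame({'TeamID': [10, 20],
                            'Highest Scorer': ['Player A', 'Player B']})

# Inner merge (the default): only teams present in both tables are kept
inner = toy_teams.merge(toy_scorers, on='TeamID')

# Left merge: keep every team from the first table, with NaN where there is no match
left = toy_teams.merge(toy_scorers, on='TeamID', how='left')

print(inner)
print(left)
```

Comparing the number of rows before and after a merge is a quick way to notice when some teams did not find a match.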

---

### Exercise

Combine the `'League Leaders'` and `'Team Standings'` dataframes based on the highest assist getter from each team.

1. Sort the `leaders` dataframe from highest to lowest assist numbers, e.g. `assist_leaders = leaders.sort_values(by='AST', ascending=False)`
1. Keep only the top assist leader from each team, e.g. `team_assist_leaders = assist_leaders.drop_duplicates(subset=['TEAM_ID'],keep='first')`
1. Create a `trimmed_assist_leaders` dataframe that only includes the columns `['PLAYER','TEAM_ID','AST']`
1. Rename the columns with `trimmed_assist_leaders = trimmed_assist_leaders.rename(columns={'PLAYER':'Highest Assist Getter','TEAM_ID':'TeamID','AST':'Player_Ast'})`
1. Merge `trimmed_standings` with `trimmed_assist_leaders`, as we did above for the top scorers

---

---

### Bonus Exercise

Sort the dataframe you created above by the players' assist numbers, so that the team with the highest assist getter is at the top of the dataframe.

---

Here are some other interesting endpoints:

- `'drafthistory'` = information on all drafts since the beginning of the NBA
- `'franchisehistory'` = information on the history of a specific team
- `'commonteamroster'` = information on the roster for teams (including coaches)
- `'playoffpicture'` = information on how the playoffs look for a specific season
- `'gamerotation'` = information on the rotation of a specific game

There are plenty more endpoints to explore. You can find information on them here: [NBA_API Endpoints](https://github.com/swar/nba_api/tree/master/docs/nba_api/stats/endpoints).

The [next notebook](07-design-thinking.ipynb) will introduce design thinking.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)