**banner**

# Getting More Data

So far we have been using data that was already downloaded into this repository. But we might want newer or older data, or more data about other teams or players. This notebook uses the code library [nba_api](https://github.com/swar/nba_api) to get data from [NBA.com](https://nba.com).

Unfortunately, the nba_api library doesn't work in JupyterLite, where you are currently running these notebooks, so you will need to open this notebook in [Google Colab](https://colab.research.google.com/github/callysto/basketball-and-data-science/blob/main/content/06-getting-more-data.ipynb) or the [Callysto Hub](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fbasketball-and-data-science&branch=main&subPath=content/06-getting-more-data.ipynb&depth=1).
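
If you are not sure which environment a notebook is running in, the sketch below shows one way to check. It assumes that JupyterLite runs Python in the browser through Pyodide, which reports `sys.platform` as `'emscripten'`. The helper name `running_in_jupyterlite` is hypothetical.

```python
import sys


def running_in_jupyterlite():
    """Return True if Python appears to be running in the browser (e.g. JupyterLite).

    Assumption: browser-based Python such as Pyodide reports sys.platform as 'emscripten'.
    """
    return sys.platform == 'emscripten'


if running_in_jupyterlite():
    print('nba_api will not work here; open this notebook in Google Colab or the Callysto Hub.')
else:
    print('Not running in the browser; nba_api should work if it is installed and NBA.com is reachable.')
```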

The code cells below install nba_api and then get career data for a particular player.

In [None]:
!pip install nba_api

In [None]:
from IPython.display import display
from nba_api.stats.endpoints import playercareerstats
from nba_api.stats.static import players

player_name = 'Pascal Siakam'

# Look up the player's NBA.com ID (the first match is used)
player = players.find_players_by_full_name(player_name)[0]

# Request career stats; the endpoint returns several dataframes
career = playercareerstats.PlayerCareerStats(player_id=player['id'])
for i, df in enumerate(career.get_data_frames()):
    print('dataframe', i)
    display(df)
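
The cell above takes the first result from `find_players_by_full_name`, which raises an `IndexError` if the name is misspelled and silently picks one player if several match. As far as we know, that function searches names as a case-insensitive regular expression, so a partial name such as `'James'` can match many players. The sketch below wraps the lookup so both cases give a clearer message. `pick_player` is a hypothetical helper, and its `find` argument can be `players.find_players_by_full_name` or any function that returns a list of player dictionaries with a `'full_name'` key.

```python
def pick_player(name, find):
    """Return the single player dictionary matching `name`.

    `find` is a lookup function such as players.find_players_by_full_name,
    which returns a list of dictionaries with keys like 'id' and 'full_name'.
    """
    matches = find(name)
    if not matches:
        raise ValueError(f'No player found for {name!r}; check the spelling.')
    # Prefer an exact (case-insensitive) full-name match if there is exactly one
    exact = [p for p in matches if p['full_name'].lower() == name.lower()]
    if len(exact) == 1:
        return exact[0]
    if len(matches) > 1:
        found = ', '.join(p['full_name'] for p in matches[:5])
        raise ValueError(f'{name!r} matches several players (e.g. {found}); be more specific.')
    return matches[0]
```

For example, `pick_player('Pascal Siakam', players.find_players_by_full_name)` could replace the `[0]` lookup in the cell above.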

We can also get stats for more than one player. To do this, we make a list of the players we are interested in and repeat the steps from above for each one. This time we'll only display the first dataframe, which shows each player's regular-season totals, one row per season.

In [None]:
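player_names = ['Pascal Siakam', 'LeBron James', 'Kobe Bryant', 'Stephen Curry']
player_careers = []  # keeps each player's season-totals dataframe for later use

for p in player_names:
    player = players.find_players_by_full_name(p)[0]
    career = playercareerstats.PlayerCareerStats(player_id=player['id'])
    season_totals = career.get_data_frames()[0]  # first dataframe: season totals
    player_careers.append(season_totals)
    print(player['full_name'])
    display(season_totals)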
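
If each player's season-totals dataframe is kept in a list (for example by appending it to `player_careers` inside the loop), one option is to stack them into a single dataframe so players can be compared side by side. The sketch below uses `pandas.concat` with its `keys` argument to label each row with the player's name. The function name `combine_careers` is hypothetical, and it assumes the dataframes are in the same order as the names.

```python
import pandas as pd


def combine_careers(frames, names):
    """Stack one dataframe per player into a single dataframe.

    Adds a 'PLAYER_NAME' column taken from `names` (same order as `frames`).
    """
    if len(frames) != len(names):
        raise ValueError('frames and names must have the same length')
    combined = pd.concat(frames, keys=names, names=['PLAYER_NAME', None])
    # Move the player-name index level into an ordinary column, then renumber the rows
    return combined.reset_index(level='PLAYER_NAME').reset_index(drop=True)
```

For example, `combine_careers(player_careers, player_names)` would give one table. Note that the `PLAYER_NAME` values come from the list you typed, not from NBA.com.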

---

### Exercise

Get the career stats for `'Michael Jordan'` and display the 1st dataframe. What do you think this dataframe is communicating?\
*Hint: Indexing starts at 0!*

---

There are plenty of other endpoints we can access, and depending on what we want to do with the data, we can use different ones. For example, if we were more interested in how teams are performing than in individual players, we can use the league standings endpoint, `LeagueStandingsV3`.

In [None]:
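from nba_api.stats.endpoints import leaguestandingsv3

# With no arguments the endpoint uses its default season (normally the current one);
# the first dataframe has one row per team
standings = leaguestandingsv3.LeagueStandingsV3().get_data_frames()[0]

display(standings)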
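
With so many columns, it can help to search the column names before deciding which ones to keep. The hypothetical helper below returns every column whose name contains a keyword, ignoring case. For example, `columns_matching(standings, 'streak')` should include the `LongWinStreak` and `LongLossStreak` columns, and `len(standings.columns)` counts the columns.

```python
def columns_matching(df, keyword):
    """Return the names of the columns in `df` that contain `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [col for col in df.columns if keyword in str(col).lower()]
```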

There are a lot of columns (88 of them)! Often we don't need that many, and it's worthwhile to keep only the columns we are interested in. For example, if we only cared about which teams had the best records and the longest winning and losing streaks, we could keep just the columns that help us understand that.

In [None]:
# Keep only the columns about records and streaks
simplified_standings = standings[['TeamCity', 'TeamName', 'Conference', 'ConferenceRecord',
                                  'PlayoffRank', 'WINS', 'LOSSES', 'Record', 'WinPCT',
                                  'HOME', 'ROAD', 'LongWinStreak', 'LongLossStreak']]

simplified_standings
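
To see which teams have the best records, one option is to sort the simplified table by `WinPCT`. The sketch below is a hypothetical helper built on pandas' `sort_values` and `head`, and it assumes the column names shown above. Because some columns (such as `Record`) hold text, it converts the chosen column with `pd.to_numeric` first so that it sorts as numbers.

```python
import pandas as pd


def top_teams(df, by='WinPCT', n=5):
    """Return the `n` rows of `df` with the largest values in column `by`.

    The column is converted with pd.to_numeric so numbers stored as text
    still sort correctly; values that cannot be converted are placed last.
    """
    key = pd.to_numeric(df[by], errors='coerce')
    ranked = df.assign(_sort_key=key).sort_values('_sort_key', ascending=False)
    return ranked.drop(columns='_sort_key').head(n)
```

For example, `top_teams(simplified_standings)` would list the five best winning percentages, and `top_teams(simplified_standings, by='LongWinStreak')` would rank teams by their longest winning streak.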

Another available endpoint gives the all-time leaders in each statistical category. If we are interested in who has the most assists or points, we can use this endpoint to get the top 10 players in each category. Each dataframe covers one statistic; for example, the first dataframe is for games played. You can get a specific statistic by specifying the index of its dataframe.

In [None]:
from nba_api.stats.endpoints import alltimeleadersgrids as atl

# All-time leaders, one dataframe per statistic
all_time = atl.AllTimeLeadersGrids()

for df in all_time.get_data_frames():
    # The third column (index 2) holds the statistic, e.g. 'GP' for games played
    print('ALL TIME', df.columns.values.tolist()[2])
    display(df)

# The steals dataframe is at index 3
all_time_steals = all_time.get_data_frames()[3]
all_time_steals
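
Remembering that steals are at index 3 works, but the order of the dataframes is decided by NBA.com and could change. Since the third column of each dataframe holds the statistic's name, a sketch like the one below builds a dictionary keyed by that name instead. The helper `leaders_by_stat` is hypothetical, and `'STL'` as the name of the steals column is an assumption based on NBA.com's usual abbreviations.

```python
def leaders_by_stat(frames):
    """Map each statistic's name to its leaders dataframe.

    Assumes, as in the loop above, that column index 2 of each dataframe
    is the statistic (e.g. 'GP', 'PTS', 'STL').
    """
    return {df.columns[2]: df for df in frames}
```

For example, `leaders_by_stat(all_time.get_data_frames())['STL']` would select the steals leaders by name rather than by position.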
