![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&subPath=Mathematics/StatisticsProject/AccessingData/nba.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# National Basketball Association

We can get data from [ESPN NBA Stats](https://www.espn.com/nba/stats). For example, the team statistics pages report the following variables.

|Variable|Meaning|
|-|-|
|GP|Games Played|
|GS|Games Started|
|MIN|Minutes Per Game|
|PTS|Points Per Game|
|OR|Offensive Rebounds Per Game|
|DR|Defensive Rebounds Per Game|
|REB|Rebounds Per Game|
|AST|Assists Per Game|
|STL|Steals Per Game|
|BLK|Blocks Per Game|
|TO|Turnovers Per Game|
|PF|Fouls Per Game|
|AST/TO|Assist To Turnover Ratio|
|FGM|Average Field Goals Made|
|FGA|Average Field Goals Attempted|
|FG%|Field Goal Percentage|
|3PM|Average 3-Point Field Goals Made|
|3PA|Average 3-Point Field Goals Attempted|
|3P%|3-Point Field Goal Percentage|
|FTM|Average Free Throws Made|
|FTA|Average Free Throws Attempted|
|FT%|Free Throw Percentage|
|2PM|2-Point Field Goals Made Per Game|
|2PA|2-Point Field Goals Attempted Per Game|
|2P%|2-Point Field Goal Percentage|
|SC-EFF|Scoring Efficiency|
|SH-EFF|Shooting Efficiency|

Let's get stats for the [Toronto Raptors](https://www.espn.com/nba/team/stats/_/name/tor/toronto-raptors).

In [None]:
import pandas as pd
team = 'toronto-raptors'
url = 'https://www.espn.com/nba/team/stats/_/name/tor/toronto-raptors'
page = pd.read_html(url)
df = page[0].join(page[1]).join(page[3])
df['Team'] = team
df
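
To see what the chained `join` calls do without downloading anything, here is a minimal sketch using small made-up tables. The names `names`, `per_game`, `shooting`, and `combined`, and all of the values, are hypothetical stand-ins; the real tables returned by `pd.read_html` have more rows and columns, and their exact layout is an assumption here.

```python
import pandas as pd

# Hypothetical stand-ins for the tables returned by pd.read_html (made-up values)
names = pd.DataFrame({'Name': ['Player A G', 'Player B F', 'Total']})
per_game = pd.DataFrame({'GP': [60, 55, 82], 'PTS': [21.4, 15.2, 112.9]})
shooting = pd.DataFrame({'FGM': [7.8, 5.9, 41.5], 'FG%': [47.1, 44.3, 46.0]})

# join() aligns rows on the index, so row 0 of each table lands in the same row
combined = names.join(per_game).join(shooting)
combined['Team'] = 'toronto-raptors'  # the same label is applied to every row
print(combined)
```

Because `join` matches rows by index, this approach only works when the tables list the players in the same order.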

Now that we can get stats for one team, let's get stats for all of the teams. First we will get a list of links to the team stats pages.

In [None]:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.espn.com/nba/teams')
links = BeautifulSoup(page.content, 'html.parser').find_all('a', class_='AnchorLink')
teams = [link.get('href').split('/name/')[1] for link in links if 'team/stats' in (link.get('href') or '')]  # skip links that have no href
teams
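
The filtering and splitting in the list comprehension can be tried on plain strings. This is a sketch with hypothetical `href` values shaped like the links we expect on the teams page; `hrefs` and `team_paths` are illustrative names, not part of the notebook.

```python
# Hypothetical href values (assumed shape of the links on the teams page)
hrefs = [
    '/nba/team/_/name/tor/toronto-raptors',
    '/nba/team/stats/_/name/tor/toronto-raptors',
    '/nba/team/stats/_/name/bos/boston-celtics',
    None,  # an anchor tag without an href attribute
]

# Keep only the stats links, then keep everything after '/name/'
team_paths = [href.split('/name/')[1] for href in hrefs if href and 'team/stats' in href]
print(team_paths)  # ['tor/toronto-raptors', 'bos/boston-celtics']
```

Each resulting string can be appended to `https://www.espn.com/nba/team/stats/_/name/` to rebuild the URL of a team's stats page.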

Now we can get stats for each of those teams, and store it all in a dataframe.

In [None]:
def getStats(team):
    page = pd.read_html('https://www.espn.com/nba/team/stats/_/name/'+team)
    df = page[0].join(page[1]).join(page[3])
    df['Team'] = team.split('/')[1]
    return df

# collect one dataframe per team, then combine them in a single concat call
df = pd.concat([getStats(team) for team in teams], ignore_index=True)
df
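
The effect of `ignore_index=True` and of the `Team` label is easier to see offline. The sketch below swaps in a hypothetical `fake_stats` function that returns made-up rows instead of downloading a page; it is a stand-in for illustration, not the notebook's `getStats`.

```python
import pandas as pd

def fake_stats(team):
    # Hypothetical stand-in for getStats(): made-up rows, no download
    df = pd.DataFrame({'Name': ['Player A G', 'Total'], 'PTS': [20.0, 110.0]})
    df['Team'] = team.split('/')[1]  # 'tor/toronto-raptors' -> 'toronto-raptors'
    return df

team_paths = ['tor/toronto-raptors', 'bos/boston-celtics']

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1 for each team
all_stats = pd.concat([fake_stats(t) for t in team_paths], ignore_index=True)
print(all_stats)
```

Without `ignore_index=True`, each team's rows would keep their own index starting at 0, so index labels would repeat across teams.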

We may also want to clean up the player names.

A `*` in a player's name means that the player was traded midseason. We will record that in a new column, remove the `*`, and then move the last "word" in the name column to a column named `Position`.

In [None]:
df['MidseasonTrade'] = df['Name'].str.contains('*', regex=False)  # if there is a literal * in the Name column, then the player was traded midseason
df['Name'] = df['Name'].str.replace('*', '', regex=False)  # remove the * from the Name column (regex=False because * alone is not a valid regular expression)

df['Position'] = df['Name'].str.strip().str.split(' ').str[-1]  # create a Position column as the last word from the Name column
df['Name'] = df['Name'].str.strip().str.split(' ').str[:-1].str.join(' ')  # remove the Position from the Name column
df.loc[df['Position']=='Total', 'Name'] = 'Total'  # replace the Name column with 'Total' if the Position column is 'Total'

df
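
To check the cleanup steps on a small scale, here is a sketch that applies them to three made-up names. The `sample` dataframe and its values are hypothetical, and the position of the `*` within the name is an assumption about how the site marks traded players.

```python
import pandas as pd

# Made-up names in the assumed "First Last Position" layout
sample = pd.DataFrame({'Name': ['Player A G', 'Player B* F', 'Total']})

# Flag and remove the literal * (regex=False treats * as an ordinary character)
sample['MidseasonTrade'] = sample['Name'].str.contains('*', regex=False)
sample['Name'] = sample['Name'].str.replace('*', '', regex=False).str.strip()

# Split once, then use the last word as Position and the rest as Name
parts = sample['Name'].str.split(' ')
sample['Position'] = parts.str[-1]
sample['Name'] = parts.str[:-1].str.join(' ')

# The totals row has no position, so restore its name
sample.loc[sample['Position'] == 'Total', 'Name'] = 'Total'
print(sample)
```

The totals row ends up with `Total` in both the `Name` and `Position` columns, matching the behaviour of the cell above.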

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)