![Dell Logo](https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Dell_Technologies_logo.svg/1280px-Dell_Technologies_logo.svg.png)

![Digital Moment Logo](https://digitalmoment.org/img/logo-DM-dark.png)

![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fbasketball-and-data-science&branch=main&subPath=notebooks/03-cleaning-and-visualizing.ipynb&depth=1"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Basketball and Data Science

## Cleaning and Visualizing Data

Let's take a look at the [NBA 2023 Standings from basketball-reference.com](https://www.basketball-reference.com/leagues/NBA_2023_standings.html).

There are a few tables on that page. Let's read them using `pandas`, which returns a list with one dataframe per table.

In [None]:
import pandas as pd
page = pd.read_html('https://www.basketball-reference.com/leagues/NBA_2023_standings.html')
for table in page:
    display(table)

We can then visualize those first two data tables using `Plotly`.

In [None]:
import plotly.express as px
px.bar(page[0], x='Eastern Conference', y='W/L%', title='Win/Loss % by Team (Eastern)').show()
px.bar(page[1], x='Western Conference', y='W/L%', title='Win/Loss % by Team (Western)').show()

Let's combine the two data tables using `concat`.

We'll first need to make sure the columns in each dataframe that contain the team names are both titled `Team` instead of indicating the conference. Then we can add another column to indicate the conference.

In [None]:
# rename the columns to be the same so we can concatenate the dataframes
east = page[0].rename(columns={'Eastern Conference': 'Team'})
west = page[1].rename(columns={'Western Conference': 'Team'})

# add a column to each dataframe to indicate the conference
east['Conference'] = 'Eastern'
west['Conference'] = 'Western'

standings = pd.concat([east, west])
px.bar(standings.sort_values('W/L%', ascending=False), x='Team', y='W/L%', title='Win/Loss % by Team')
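
To see the rename-then-concatenate step in isolation, here is a minimal sketch on two tiny tables. The team names and percentages are made up for illustration, and `ignore_index=True` is an optional extra that the cell above does not use.

```python
import pandas as pd

# Two tiny made-up tables whose first column is named after the conference
east = pd.DataFrame({'Eastern Conference': ['Team A', 'Team B'], 'W/L%': [0.700, 0.450]})
west = pd.DataFrame({'Western Conference': ['Team C', 'Team D'], 'W/L%': [0.650, 0.500]})

# Give both tables the same column name, then label where each row came from
east = east.rename(columns={'Eastern Conference': 'Team'}).assign(Conference='Eastern')
west = west.rename(columns={'Western Conference': 'Team'}).assign(Conference='Western')

# ignore_index=True renumbers the rows 0..3 instead of repeating 0, 1, 0, 1
combined = pd.concat([east, west], ignore_index=True)
print(combined)
```

Without the rename, `concat` would keep `Eastern Conference` and `Western Conference` as separate columns and fill the gaps with missing values.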

We also notice that the team name column has some extra information: an asterisk and a number in parentheses. Let's move that to new columns.

In [None]:
# create a column to indicate if the team made the playoffs
# (regex=False matches a literal asterisk, so no escaping is needed)
standings['Playoffs'] = standings['Team'].str.contains('*', regex=False)
# remove the asterisk from the team name
standings['Team'] = standings['Team'].str.replace('*', '', regex=False)
# create a column to indicate the conference rank
standings['Conference Rank'] = standings['Team'].str.split('(').str[1].str[:-1].astype(int)
# remove the (n) from the team name
standings['Team'] = standings['Team'].str.split('(').str[0].str.strip()

standings = standings.reset_index(drop=True)
standings
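
As an alternative sketch, the same three pieces of information can be pulled out in one pass with `str.extract` and a regular expression with named groups. The rows below are made up, assuming the same `Name* (rank)` format as the standings table; this is not the approach the cell above uses.

```python
import pandas as pd

# Made-up rows in the same "Name* (rank)" format as the standings table
teams = pd.DataFrame({'Team': ['Team A* (1)', 'Team B (2)', 'Team C* (3)']})

# Named groups become column names:
# the name, an optional asterisk, then the rank in parentheses
pattern = r'^(?P<Name>.*?)\s*(?P<Star>\*?)\s*\((?P<Rank>\d+)\)\s*$'
parts = teams['Team'].str.extract(pattern)

teams['Playoffs'] = parts['Star'] == '*'
teams['Conference Rank'] = parts['Rank'].astype(int)
teams['Team'] = parts['Name']
print(teams)
```

A single pattern keeps the parsing rules in one place, at the cost of being harder to read than the step-by-step version.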

Now, if we want to make a bar graph with our favorite team in a different color, we'll need to use Plotly Graph Objects instead of the Plotly Express that we've been using.

In [None]:
import plotly.graph_objects as go

# sort the dataframe by win/loss percentage
# (named sorted_standings so we don't overwrite Python's built-in sorted function)
sorted_standings = standings.sort_values('W/L%', ascending=False)
# make a list of colours, red for the Raptors, grey for everyone else
colors = ['#CE1141' if x else 'grey' for x in sorted_standings['Team'].str.contains('Raptors')]

go.Figure(data=[go.Bar(x=sorted_standings['Team'], y=sorted_standings['W/L%'], marker_color=colors)]).update_layout(title_text='Win/Loss % by Team')
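
The colour list is built with a list comprehension: one colour per row, in the same order as the rows being plotted. A minimal sketch with made-up team names (only the Raptors red `#CE1141` comes from the cell above):

```python
import pandas as pd

# Made-up team list, already in the order we want to plot
teams = pd.Series(['Example Team', 'Toronto Raptors', 'Another Team'])

# str.contains gives True/False per row; pick a colour for each
colors = ['#CE1141' if is_raptors else 'grey' for is_raptors in teams.str.contains('Raptors')]
print(colors)
```

Because the list is matched to the bars by position, it has to be built from the sorted dataframe, not the unsorted one.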

## Team Color Codes

We can read the team colors from https://teamcolorcodes.com/nba-team-color-codes/ using `pandas` as well.

https://teamcolorcodes.com/contact-us/

In [None]:
color_codes = pd.read_html('https://teamcolorcodes.com/nba-team-color-codes/')[3]
color_codes

The table contains the color names, which we don't want, so we will remove everything that comes before the `#` sign.

In [None]:
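def remove_color_name(cell):
    try:
        return '#'+cell.split('#')[1]
    except (AttributeError, IndexError):
        # the cell is not text, or it has no '#' in it, so leave it unchanged
        return cell

# note: applymap is deprecated in pandas 2.1 and later, where DataFrame.map does the same thing
color_codes = color_codes.applymap(remove_color_name)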
color_codes
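
Here is a small sketch of what that function does to different kinds of cells. The sample rows are made up (only the Raptors red `#CE1141` appears elsewhere in this notebook), and applying `Series.map` column by column is one option that does not depend on the pandas version.

```python
import pandas as pd

def remove_color_name(cell):
    """Keep only the hex code, e.g. 'Red #CE1141' -> '#CE1141'."""
    try:
        return '#' + cell.split('#')[1]
    except (AttributeError, IndexError):
        # not a string (such as a missing value) or no '#' in the text
        return cell

# Made-up rows for illustration
sample = pd.DataFrame({
    'NBA Team Name': ['Toronto Raptors', 'Example Team'],
    'Color 1': ['Red #CE1141', 'Blue #0000FF'],
    'Color 2': ['Black #000000', None],
})

# Apply the function to every cell, one column at a time
cleaned = sample.apply(lambda column: column.map(remove_color_name))
print(cleaned)
```

Team names have no `#`, so they pass through unchanged, and missing values stay missing.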

Now let's merge this with our standings dataframe. Again, we'll need to rename a column in our second dataframe to `Team`.

In [None]:
color_codes = color_codes.rename(columns={'NBA Team Name': 'Team'})
standings = pd.merge(standings, color_codes, on='Team', how='left')
standings
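
With `how='left'`, every row of the first dataframe is kept, and teams with no match in the second one get missing values in the colour columns. One way to check for those is the `indicator=True` option of `pd.merge`, sketched here on made-up data:

```python
import pandas as pd

# Made-up data: the second table is missing one of the teams
standings_sample = pd.DataFrame({'Team': ['Toronto Raptors', 'Example Team'], 'W/L%': [0.500, 0.400]})
colors_sample = pd.DataFrame({'Team': ['Toronto Raptors'], 'Color 1': ['#CE1141']})

# indicator=True adds a _merge column saying where each row was found
merged = pd.merge(standings_sample, colors_sample, on='Team', how='left', indicator=True)

# rows marked left_only had no match in the colour table
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Team'].tolist()
print(unmatched)
```

If a team shows up in `unmatched`, its name is probably spelled differently in the two tables.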

Now we can make the same bar graph with the colors from the `Color 1` column.

In [None]:
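# named sorted_standings so we don't overwrite Python's built-in sorted function
sorted_standings = standings.sort_values('W/L%', ascending=False)
go.Figure(data=[go.Bar(x=sorted_standings['Team'], y=sorted_standings['W/L%'], marker_color=sorted_standings['Color 1'])]).update_layout(title_text='Win/Loss % by Team')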

## Using an API

We can get more advanced stats using the [nba_api](https://pypi.org/project/nba-api/) package.

For more examples, see https://github.com/swar/nba_api/blob/master/docs/examples/Basics.ipynb

In [None]:
#!pip install nba_api
from nba_api.stats.endpoints import playercareerstats
from nba_api.stats.static import players
player = players.find_players_by_full_name('Pascal Siakam')[0]
career = playercareerstats.PlayerCareerStats(player_id=player['id'])
for i, df in enumerate(career.get_data_frames()):
    print('dataframe', i)
    display(df)

In [None]:
player_data = career.get_data_frames()[0]
player_data

In [None]:
px.line(player_data, x='SEASON_ID', y='PTS', title='Points per Season for ' + player['full_name'])

### Shot Charts

In [None]:
from nba_api.stats.endpoints import shotchartdetail
from nba_api.stats.static import teams
team_name = 'Toronto Raptors'
team = teams.find_teams_by_full_name(team_name)[0]
team_id = team['id']
season = '2022-23'
season_type = 'Regular Season'
all_raptors_shots = shotchartdetail.ShotChartDetail(
    team_id=team_id,
    player_id=0,
    season_nullable=season,
    season_type_all_star=season_type,
    context_measure_simple='FGA',  # FGA = Field Goal Attempts, PTS = Points
)
raptors_shots = all_raptors_shots.get_data_frames()[0]
raptors_shots

In [None]:
title='Shot Chart by Type for the '+team_name+' '+season+' '+season_type
px.scatter(raptors_shots[raptors_shots['SHOT_MADE_FLAG']==1], x='LOC_X', y='LOC_Y', color='ACTION_TYPE', height=800, title=title)

In [None]:
px.scatter(raptors_shots, x='LOC_X', y='LOC_Y', color='SHOT_MADE_FLAG', height=1200, title='Shot Chart for the '+team_name+' '+season+' '+season_type)

In [None]:
raptors_shots['SHOT_MADE_SIZE'] = raptors_shots['SHOT_MADE_FLAG']+0.1
title='Shot Chart by Player for the '+team_name+' '+season+' '+season_type
px.scatter(raptors_shots, x='LOC_X', y='LOC_Y', color='PLAYER_NAME', size='SHOT_MADE_SIZE', height=1200, title=title)
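
The `SHOT_MADE_SIZE` column is there because `SHOT_MADE_FLAG` is 0 for a missed shot, and a marker of size 0 would not be visible. Adding a small amount, presumably for that reason, gives missed shots a small marker and made shots a larger one. A minimal sketch with made-up shots:

```python
import pandas as pd

# Made-up shots: 1 = made, 0 = missed
shots = pd.DataFrame({
    'PLAYER_NAME': ['Player A', 'Player A', 'Player B'],
    'SHOT_MADE_FLAG': [1, 0, 1],
})

# Every shot gets a size above zero, and made shots get a bigger marker
shots['SHOT_MADE_SIZE'] = shots['SHOT_MADE_FLAG'] + 0.1
print(shots)
```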

https://medium.com/@namnguyen93/a-quick-look-into-visualizing-nba-shot-data-24756665565b