In this project, we will analyze the 2018-2019 NBA season average statistics for every player. We will use these stats to draw conclusions about each player's characteristics and performance that season.

In [0]:
import requests
import json
import pandas as pd
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated
from bs4 import BeautifulSoup
import time

First, we scrape the list of players active in the 2018-2019 NBA season from nbastuffer.com:

In [0]:
players = []

resp = requests.get("https://www.nbastuffer.com/2018-2019-nba-player-stats/")
soup = BeautifulSoup(resp.content, "html.parser")
# we use the first table on the page
tables = soup.find_all("table")
table = tables[0]
# player names are in the cells with class "column-2"
names = table.find_all("td", "column-2")

In [0]:
for name in names:
  players.append(name.text)
players

['Marc Gasol',
 'Danny Green',
 'Serge Ibaka',
 'Kawhi Leonard',
 'Kyle Lowry',
 'Pascal Siakam',
 'Fred VanVleet',
 'Norman Powell',
 'Stephen Curry',
 'Draymond Green',
 'Shaun Livingston',
 'Kevon Looney',
 'Alfonzo McKinnie',
 'Klay Thompson',
 'Al-Farouq Aminu',
 'Zach Collins',
 'Seth Curry',
 'Maurice Harkless',
 'Rodney Hood',
 'Enes Kanter',
 'Damian Lillard',
 'CJ McCollum',
 'Evan Turner',
 'Andre Iguodala',
 'Giannis Antetokounmpo',
 'Eric Bledsoe',
 'Pat Connaughton',
 'George Hill',
 'Ersan Ilyasova',
 'Brook Lopez',
 'Khris Middleton',
 'Will Barton',
 'Malik Beasley',
 'Torrey Craig',
 'Gary Harris',
 'Nikola Jokic',
 'Paul Millsap',
 'Monte Morris',
 'Jamal Murray',
 'Mason Plumlee',
 'Andrew Bogut',
 'Nikola Mirotic',
 'Jodie Meeks',
 'Jonas Jerebko',
 'Jimmy Butler',
 'Tobias Harris',
 'JJ Redick',
 'Ben Simmons',
 'Jordan Bell',
 'Quinn Cook',
 'Kevin Durant',
 'Clint Capela',
 'Eric Gordon',
 'Gerald Green',
 'James Harden',
 'Chris Paul',
 'PJ Tucker',
 'Joel Embi
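
When `find_all` is given a string as its second argument, Beautiful Soup matches it against each tag's class attribute, so the call above keeps only the `td` cells whose class list includes `column-2`. As a rough illustration of that filter, the sketch below does the same thing with the standard library's `html.parser` instead of BeautifulSoup. The HTML snippet and the `ColumnTwoParser` name are made up for the example, and the real page's markup may differ.

```python
from html.parser import HTMLParser

# Made-up markup for illustration; the real page may be structured differently.
SAMPLE_HTML = """
<table>
  <tr><td class="column-1">1</td><td class="column-2">Marc Gasol</td></tr>
  <tr><td class="column-1">2</td><td class="column-2 highlight">Danny Green</td></tr>
</table>
"""

class ColumnTwoParser(HTMLParser):
    """Collect the text of every <td> whose class list contains 'column-2'."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_target = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "column-2" in classes:
            self._in_target = True
            self.names.append("")  # start a new name for this cell

    def handle_data(self, data):
        if self._in_target:
            self.names[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_target = False

parser = ColumnTwoParser()
parser.feed(SAMPLE_HTML)
sample_names = [n.strip() for n in parser.names]
print(sample_names)
```

The second sample row carries an extra class to show that a cell matches as long as `column-2` is one of its classes.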

Next, we call the balldontlie API once per player to look up each player's ID.

In [0]:
df_players = pd.DataFrame()

for player in players:
  # handle players listed under a single name, like Nene
  if " " not in player:
    resp_player = requests.get("https://www.balldontlie.io/api/v1/players?per_page=500&search=%s" % player)
    data_player = resp_player.json()
    # assumes the first search result is the intended player
    df_players.loc[player, "ID"] = data_player["data"][0]["id"]
    time.sleep(1.0)
    continue

  name = player.split(" ")
  # Note that some players will have a name array with more than 2 entries.
  # However, the balldontlie API works even if we only pass in part of their
  # last name to search
  first_name = name[0]
  last_name = name[1]
  resp_player = requests.get("https://www.balldontlie.io/api/v1/players?per_page=500&search=%s" % last_name)
  data_player = resp_player.json()
  # you can only search by one name (we chose last name), so you also have to
  # iterate through to find the player with the appropriate first name
  player_id = None
  for data in data_player["data"]:
    if data["first_name"] == first_name:
      player_id = data["id"]
      break

  time.sleep(1.0)
  # skip players with no first-name match instead of reusing a stale ID
  if player_id is None:
    continue
  df_players.loc[player, "ID"] = player_id

df_players

Unnamed: 0,ID
Marc Gasol,169.0
Danny Green,184.0
Serge Ibaka,223.0
Kawhi Leonard,274.0
Kyle Lowry,286.0
...,...
Chris Boucher,58.0
TJ Leaf,270.0
Jon Leuer,276.0
Kyle O'Quinn,352.0
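
A lookup of this shape has two edge cases worth isolating: names that do not split into exactly two parts, and searches where no result has the expected first name. The sketch below pulls both into small helpers that can be checked without calling the API. The helper names and the sample results are hypothetical, and only the Marc Gasol ID comes from the table above.

```python
def split_name(player):
    """Return (first token, second token or None) of a player's display name."""
    parts = player.split(" ")
    first = parts[0]
    last = parts[1] if len(parts) > 1 else None
    return first, last

def find_player_id(results, first_name):
    """Return the id of the first result whose first_name matches, else None."""
    for entry in results:
        if entry.get("first_name") == first_name:
            return entry.get("id")
    return None

# Hypothetical search results shaped like the "data" list used above;
# the id 9001 is invented for the example.
sample_results = [
    {"id": 9001, "first_name": "Pau", "last_name": "Gasol"},
    {"id": 169, "first_name": "Marc", "last_name": "Gasol"},
]

print(split_name("Danuel House Jr."))
print(split_name("Nene"))
print(find_player_id(sample_results, "Marc"))
print(find_player_id(sample_results, "Nikola"))
```

Returning None on a failed match makes the miss explicit, so the caller can decide whether to skip the player or retry with a different search term.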


Now that we have the player IDs, we can pass them to the balldontlie API to request each player's season average statistics.

In [0]:
# url for season averages request
url_avgs = "https://www.balldontlie.io/api/v1/season_averages?season=2018&"

# format ids into a query string for the get request
params_avgs = "&".join("player_ids[]=%d" % int(pid) for pid in df_players["ID"])
params_avgs

'player_ids[]=169&player_ids[]=184&player_ids[]=223&player_ids[]=274&player_ids[]=286&player_ids[]=416&player_ids[]=458&player_ids[]=380&player_ids[]=115&player_ids[]=185&player_ids[]=280&player_ids[]=282&player_ids[]=308&player_ids[]=443&player_ids[]=10&player_ids[]=102&player_ids[]=114&player_ids[]=193&player_ids[]=218&player_ids[]=253&player_ids[]=278&player_ids[]=303&player_ids[]=451&player_ids[]=224&player_ids[]=15&player_ids[]=51&player_ids[]=105&player_ids[]=211&player_ids[]=225&player_ids[]=283&player_ids[]=315&player_ids[]=31&player_ids[]=38&player_ids[]=110&player_ids[]=196&player_ids[]=246&player_ids[]=318&player_ids[]=330&player_ids[]=335&player_ids[]=371&player_ids[]=1593&player_ids[]=321&player_ids[]=311&player_ids[]=239&player_ids[]=79&player_ids[]=200&player_ids[]=389&player_ids[]=417&player_ids[]=41&player_ids[]=106&player_ids[]=140&player_ids[]=83&player_ids[]=178&player_ids[]=186&player_ids[]=192&player_ids[]=367&player_ids[]=450&player_ids[]=145&player_ids[]=146&pla
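
The same repeated-key query string can also be produced with the standard library's `urllib.parse.urlencode`, which avoids manual separator handling. This is only a sketch of an alternative: the `build_avgs_query` name is hypothetical, and the three IDs are taken from the `df_players` table above.

```python
from urllib.parse import urlencode

def build_avgs_query(season, player_ids):
    """Encode season and repeated player_ids[] parameters into one query string."""
    params = {"season": season, "player_ids[]": [int(pid) for pid in player_ids]}
    # doseq=True repeats the key once per list element; safe="[]" keeps the
    # brackets unescaped so the keys read player_ids[]=... as in the string above.
    return urlencode(params, doseq=True, safe="[]")

query = build_avgs_query(2018, [169.0, 184.0, 223.0])
print(query)
```

Casting with `int` mirrors the original loop, since the IDs are stored as floats in the data frame.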

In [0]:
# Can now make single call for season averages of each player

resp_avgs = requests.get(url_avgs + params_avgs)
data_avgs = resp_avgs.json()

In [0]:
df_avgs = json_normalize(data_avgs.get("data"))
df_avgs

Unnamed: 0,games_played,player_id,season,min,fgm,fga,fg3m,fg3a,ftm,fta,oreb,dreb,reb,ast,stl,blk,turnover,pf,pts,fg_pct,fg3_pct,ft_pct
0,80,3,2018,33:19,6.01,10.09,0.00,0.03,1.83,3.65,4.89,4.61,9.50,1.55,1.48,0.95,1.71,2.55,13.85,0.596,0.000,0.500
1,81,6,2018,33:10,8.44,16.28,0.12,0.52,4.31,5.09,3.11,6.09,9.20,2.40,0.53,1.32,1.78,2.21,21.32,0.519,0.238,0.847
2,48,8,2018,8:40,1.40,3.71,0.67,2.06,0.94,1.25,0.06,0.44,0.50,0.52,0.10,0.13,0.69,0.98,4.40,0.376,0.323,0.750
3,80,9,2018,26:12,4.19,7.10,0.08,0.56,2.46,3.48,2.40,6.01,8.41,1.38,0.55,1.50,1.30,2.30,10.91,0.590,0.133,0.709
4,82,10,2018,27:56,3.13,7.23,1.17,3.41,1.83,2.11,1.37,6.07,7.44,1.27,0.83,0.40,0.88,1.74,9.27,0.433,0.343,0.867
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
205,3,1988,2018,4:15,1.00,2.00,0.00,0.00,0.00,0.67,0.67,0.33,1.00,0.33,0.00,0.33,1.00,2.00,2.00,0.500,0.000,0.000
206,7,2106,2018,6:10,0.43,1.00,0.14,0.14,0.00,0.00,0.57,2.29,2.86,0.57,0.14,0.14,0.71,1.29,1.00,0.429,1.000,0.000
207,30,2158,2018,13:14,0.87,2.10,0.30,0.93,0.43,0.50,0.23,1.37,1.60,0.97,0.77,0.10,0.57,1.27,2.47,0.413,0.321,0.867
208,40,2175,2018,24:28,2.95,6.30,1.85,4.45,1.40,1.78,0.63,2.88,3.50,1.00,0.53,0.28,0.88,2.00,9.15,0.468,0.416,0.789


We now have a data frame of each player's season averages for the 2018-2019 season. For clarity, we will also add each player's name, taken from the df_players data frame.

In [0]:
# keep each name as a column, then index the frame by integer player id
df_players["name"] = df_players.index
df_players["IDno"] = df_players["ID"].astype(int)
df_players = df_players.set_index("IDno").drop(columns="ID")

In [0]:
# make a function to map player ids to names
def pid_to_name(pid):
  return df_players["name"][pid]

# call function to create player_name column
df_avgs["player_name"] = df_avgs["player_id"].map(pid_to_name)
df_avgs

Unnamed: 0,games_played,player_id,season,min,fgm,fga,fg3m,fg3a,ftm,fta,oreb,dreb,reb,ast,stl,blk,turnover,pf,pts,fg_pct,fg3_pct,ft_pct,player_name
0,80,3,2018,33:19,6.01,10.09,0.00,0.03,1.83,3.65,4.89,4.61,9.50,1.55,1.48,0.95,1.71,2.55,13.85,0.596,0.000,0.500,Steven Adams
1,81,6,2018,33:10,8.44,16.28,0.12,0.52,4.31,5.09,3.11,6.09,9.20,2.40,0.53,1.32,1.78,2.21,21.32,0.519,0.238,0.847,LaMarcus Aldridge
2,48,8,2018,8:40,1.40,3.71,0.67,2.06,0.94,1.25,0.06,0.44,0.50,0.52,0.10,0.13,0.69,0.98,4.40,0.376,0.323,0.750,Grayson Allen
3,80,9,2018,26:12,4.19,7.10,0.08,0.56,2.46,3.48,2.40,6.01,8.41,1.38,0.55,1.50,1.30,2.30,10.91,0.590,0.133,0.709,Jarrett Allen
4,82,10,2018,27:56,3.13,7.23,1.17,3.41,1.83,2.11,1.37,6.07,7.44,1.27,0.83,0.40,0.88,1.74,9.27,0.433,0.343,0.867,Al-Farouq Aminu
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
205,3,1988,2018,4:15,1.00,2.00,0.00,0.00,0.00,0.67,0.67,0.33,1.00,0.33,0.00,0.33,1.00,2.00,2.00,0.500,0.000,0.000,Donatas Motiejunas
206,7,2106,2018,6:10,0.43,1.00,0.14,0.14,0.00,0.00,0.57,2.29,2.86,0.57,0.14,0.14,0.71,1.29,1.00,0.429,1.000,0.000,Eric Moreland
207,30,2158,2018,13:14,0.87,2.10,0.30,0.93,0.43,0.50,0.23,1.37,1.60,0.97,0.77,0.10,0.57,1.27,2.47,0.413,0.321,0.867,Patrick McCaw
208,40,2175,2018,24:28,2.95,6.30,1.85,4.45,1.40,1.78,0.63,2.88,3.50,1.00,0.53,0.28,0.88,2.00,9.15,0.468,0.416,0.789,Danuel House Jr.

The balldontlie players endpoint also returns height, weight, and position for a single player. For example, requesting the player with ID 2158 gives:


In [0]:
resp_ex = requests.get("https://www.balldontlie.io/api/v1/players/2158")
data_ex = resp_ex.json()
data_ex

{'first_name': 'Patrick',
 'height_feet': 6,
 'height_inches': 7,
 'id': 2158,
 'last_name': 'McCaw',
 'position': '',
 'team': {'abbreviation': 'TOR',
  'city': 'Toronto',
  'conference': 'East',
  'division': 'Atlantic',
  'full_name': 'Toronto Raptors',
  'id': 28,
  'name': 'Raptors'},
 'weight_pounds': 185}

In [0]:
# fill info array with json data for each player. stagger requests.
info = []
for pid in df_avgs["player_id"]:
  resp_info = requests.get("https://www.balldontlie.io/api/v1/players/" + str(pid))
  data_info = resp_info.json()
  info.append(data_info)
  time.sleep(1.0)
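
A loop of a few hundred sequential requests can fail partway through on a transient network error. One possible safeguard is a small retry wrapper, sketched below under stated assumptions: `fetch_with_retry` and `flaky_fetch` are hypothetical names, the fetch function is passed in so the example runs without network access, and the retry count and delays are arbitrary.

```python
import time

def fetch_with_retry(fetch, pid, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(pid), pausing and retrying if it raises an exception."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(pid)
        except Exception as err:  # broad on purpose for this sketch
            last_error = err
            sleep(delay * (attempt + 1))  # wait a little longer each time
    raise last_error

# Stand-in for a real request: fails once, then succeeds.
calls = {"n": 0}

def flaky_fetch(pid):
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated network failure")
    return {"id": pid}

# sleep is replaced with a no-op so the example finishes immediately
result = fetch_with_retry(flaky_fetch, 2158, sleep=lambda seconds: None)
print(result)
```

In the loop above, `fetch` could be a function that wraps `requests.get(...).json()` for one player ID.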

In [0]:
# info is in the same order as the df_avgs rows, so each list lines up with
# the frame; .get returns None when a field is missing from the response
df_avgs["player_height_ft"] = [entry.get("height_feet") for entry in info]
df_avgs["player_height_in"] = [entry.get("height_inches") for entry in info]
df_avgs["player_weight"] = [entry.get("weight_pounds") for entry in info]
df_avgs["player_position"] = [entry.get("position") for entry in info]

In [0]:
#df_avgs["player_position"].fillna("n/a")
df_avgs

Unnamed: 0,games_played,player_id,season,min,fgm,fga,fg3m,fg3a,ftm,fta,oreb,dreb,reb,ast,stl,blk,turnover,pf,pts,fg_pct,fg3_pct,ft_pct,player_name,player_height_ft,player_height_in,player_weight,player_position
0,80,3,2018,33:19,6.01,10.09,0.00,0.03,1.83,3.65,4.89,4.61,9.50,1.55,1.48,0.95,1.71,2.55,13.85,0.596,0.000,0.500,Steven Adams,7.0,0.0,265.0,C
1,81,6,2018,33:10,8.44,16.28,0.12,0.52,4.31,5.09,3.11,6.09,9.20,2.40,0.53,1.32,1.78,2.21,21.32,0.519,0.238,0.847,LaMarcus Aldridge,6.0,11.0,260.0,F
2,48,8,2018,8:40,1.40,3.71,0.67,2.06,0.94,1.25,0.06,0.44,0.50,0.52,0.10,0.13,0.69,0.98,4.40,0.376,0.323,0.750,Grayson Allen,6.0,5.0,198.0,G
3,80,9,2018,26:12,4.19,7.10,0.08,0.56,2.46,3.48,2.40,6.01,8.41,1.38,0.55,1.50,1.30,2.30,10.91,0.590,0.133,0.709,Jarrett Allen,6.0,11.0,237.0,C
4,82,10,2018,27:56,3.13,7.23,1.17,3.41,1.83,2.11,1.37,6.07,7.44,1.27,0.83,0.40,0.88,1.74,9.27,0.433,0.343,0.867,Al-Farouq Aminu,6.0,9.0,220.0,F
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
205,3,1988,2018,4:15,1.00,2.00,0.00,0.00,0.00,0.67,0.67,0.33,1.00,0.33,0.00,0.33,1.00,2.00,2.00,0.500,0.000,0.000,Donatas Motiejunas,,,,
206,7,2106,2018,6:10,0.43,1.00,0.14,0.14,0.00,0.00,0.57,2.29,2.86,0.57,0.14,0.14,0.71,1.29,1.00,0.429,1.000,0.000,Eric Moreland,,,,
207,30,2158,2018,13:14,0.87,2.10,0.30,0.93,0.43,0.50,0.23,1.37,1.60,0.97,0.77,0.10,0.57,1.27,2.47,0.413,0.321,0.867,Patrick McCaw,6.0,7.0,185.0,
208,40,2175,2018,24:28,2.95,6.30,1.85,4.45,1.40,1.78,0.63,2.88,3.50,1.00,0.53,0.28,0.88,2.00,9.15,0.468,0.416,0.789,Danuel House Jr.,6.0,7.0,220.0,


In [0]:
# create new column "height (cm)" for easier calculation
df_avgs["height (cm)"] = (df_avgs["player_height_ft"] * 12 + df_avgs["player_height_in"]) * 2.54
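
As a quick check of the conversion, the same arithmetic can be written as a plain function and compared with rows of the table: 6 ft 7 in is 79 inches, and 79 × 2.54 = 200.66 cm. The `height_cm` name is hypothetical, and returning None for missing values is a choice made for this sketch (the pandas column yields NaN instead).

```python
def height_cm(feet, inches):
    """Convert a feet-and-inches height to centimetres; None if either part is missing."""
    if feet is None or inches is None:
        return None
    return (feet * 12 + inches) * 2.54

print(round(height_cm(6, 7), 2))  # 79 in * 2.54 = 200.66
print(round(height_cm(7, 0), 2))  # 84 in * 2.54 = 213.36
print(height_cm(None, None))      # None
```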

In [0]:
# create new column "minutes/gm" as a float for easier calculation
def min_float(m):
  sep = ":"
  m = m.split(sep)
  mins = float(m[0])
  secs = float(m[1])
  return mins + secs / 60
df_avgs["minutes/gm"] = df_avgs["min"].map(min_float)
df_avgs

Unnamed: 0,games_played,player_id,season,min,fgm,fga,fg3m,fg3a,ftm,fta,oreb,dreb,reb,ast,stl,blk,turnover,pf,pts,fg_pct,fg3_pct,ft_pct,player_name,player_height_ft,player_height_in,player_weight,player_position,height (cm),minutes/gm
0,80,3,2018,33:19,6.01,10.09,0.00,0.03,1.83,3.65,4.89,4.61,9.50,1.55,1.48,0.95,1.71,2.55,13.85,0.596,0.000,0.500,Steven Adams,7.0,0.0,265.0,C,213.36,33.316667
1,81,6,2018,33:10,8.44,16.28,0.12,0.52,4.31,5.09,3.11,6.09,9.20,2.40,0.53,1.32,1.78,2.21,21.32,0.519,0.238,0.847,LaMarcus Aldridge,6.0,11.0,260.0,F,210.82,33.166667
2,48,8,2018,8:40,1.40,3.71,0.67,2.06,0.94,1.25,0.06,0.44,0.50,0.52,0.10,0.13,0.69,0.98,4.40,0.376,0.323,0.750,Grayson Allen,6.0,5.0,198.0,G,195.58,8.666667
3,80,9,2018,26:12,4.19,7.10,0.08,0.56,2.46,3.48,2.40,6.01,8.41,1.38,0.55,1.50,1.30,2.30,10.91,0.590,0.133,0.709,Jarrett Allen,6.0,11.0,237.0,C,210.82,26.200000
4,82,10,2018,27:56,3.13,7.23,1.17,3.41,1.83,2.11,1.37,6.07,7.44,1.27,0.83,0.40,0.88,1.74,9.27,0.433,0.343,0.867,Al-Farouq Aminu,6.0,9.0,220.0,F,205.74,27.933333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
205,3,1988,2018,4:15,1.00,2.00,0.00,0.00,0.00,0.67,0.67,0.33,1.00,0.33,0.00,0.33,1.00,2.00,2.00,0.500,0.000,0.000,Donatas Motiejunas,,,,,,4.250000
206,7,2106,2018,6:10,0.43,1.00,0.14,0.14,0.00,0.00,0.57,2.29,2.86,0.57,0.14,0.14,0.71,1.29,1.00,0.429,1.000,0.000,Eric Moreland,,,,,,6.166667
207,30,2158,2018,13:14,0.87,2.10,0.30,0.93,0.43,0.50,0.23,1.37,1.60,0.97,0.77,0.10,0.57,1.27,2.47,0.413,0.321,0.867,Patrick McCaw,6.0,7.0,185.0,,200.66,13.233333
208,40,2175,2018,24:28,2.95,6.30,1.85,4.45,1.40,1.78,0.63,2.88,3.50,1.00,0.53,0.28,0.88,2.00,9.15,0.468,0.416,0.789,Danuel House Jr.,6.0,7.0,220.0,,200.66,24.466667
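
The minutes conversion assumes every value is a well-formed "MM:SS" string. A more defensive variant, sketched below, returns None for missing or malformed input instead of raising. The `minutes_to_float` name is hypothetical, and the sample values are taken from the table above.

```python
def minutes_to_float(value):
    """Convert 'MM:SS' to fractional minutes; None for missing or malformed input."""
    if not isinstance(value, str) or ":" not in value:
        return None
    mins, secs = value.split(":", 1)
    try:
        return int(mins) + int(secs) / 60
    except ValueError:
        return None

print(round(minutes_to_float("33:19"), 6))  # 33.316667
print(round(minutes_to_float("8:40"), 6))   # 8.666667
print(minutes_to_float(None))               # None
```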


Each player's name, height, weight, and position are now included alongside their season average stats. Note that not all players will have height_feet, height_inches, or weight_pounds (see "Considerations" under "Get a Specific Player" in the balldontlie documentation), and some have an empty position.
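
Before analysis, it can help to see which fields are missing for a given player. The sketch below is one way to check a single response. The `missing_fields` helper is hypothetical, the first entry copies values from the Patrick McCaw response shown earlier, and the second is a made-up entry with none of the fields present.

```python
FIELDS = ("height_feet", "height_inches", "weight_pounds", "position")

def missing_fields(entry, fields=FIELDS):
    """List the fields that are absent, None, or an empty string."""
    return [f for f in fields if entry.get(f) in (None, "")]

# Values copied from the Patrick McCaw response shown earlier.
mccaw = {"first_name": "Patrick", "last_name": "McCaw", "height_feet": 6,
         "height_inches": 7, "weight_pounds": 185, "position": ""}

# Made-up entry with none of the physical fields present.
sparse_entry = {"id": 1}

print(missing_fields(mccaw))
print(missing_fields(sparse_entry))
```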

We are now ready to save the data frame as a .csv file and explore the data in the next part.

In [0]:
# data collected and cleaned; save to drive as csv
#from google.colab import drive
#drive.mount('drive')

Drive already mounted at drive; to attempt to forcibly remount, call drive.mount("drive", force_remount=True).


In [0]:
#df_avgs.to_csv("nbaseasonavgs.csv")
#!cp nbaseasonavgs.csv "drive/My Drive/"