In [None]:
import pandas as pd 
import numpy as np 

## What are Player Level Team Ratings?

Team ratings are a way to determine the strength of a team based on player level ratings, often statistical plusminus models (SPM). For a primer on those, check out [this link](https://github.com/anpatton/basic-nba-tutorials/blob/main/spm/how_to_make_spm_R.md). Doing team ratings on the player level means that you are not just adding up points scored per 100 and subtracting points against per 100, as that is team performance data. The player level team ratings are often used for projection, gambling, etc. You might have seen versions of these from PIPM (RIP), DARKO, 538, etc. Like most basketball analytics approaches, there are a variety of ways to accomplish the task, and this is but one simplified version. Let's get into it. 

## What are Win Projections?

Simply, win projections are estimates of how many games a team will win over ___ period of time. Can be season long projections, daily projections, etc. Our projections will be based on our player level team ratings that we will calculate first.

## Data

1) player_ratings_minutes.csv" - This is DARKO player ratings from day one of the 2018-19 season with projected minutes as well.  
2) "schedule.csv" - This is the league schedule from 2018-19.


In [None]:
ratings = pd.read_csv("data/player_ratings_minutes.csv")
ratings["dpm"] = ratings["o_dpm"] + ratings["d_dpm"]
ratings["team_name"] = np.where(ratings["team_name"] == "Los Angeles Clippers", "LA Clippers", ratings["team_name"])
ratings = ratings.sort_values(by = ["dpm"], ascending=False)
ratings.head()

You'll note that Russ has an extremely high DPM. It starts to decrease somewhat quickly following this point due to three years of strong on-off data.

## Step 1: Aggregate player ratings to the team level and calculate team ratings

This is a deceptively simple step that does literally does everything! We multiply a player's minutes projection by their O-DPM and D-DPM, and then by team, add up the offensive and defensive values as well as the total minutes projected. Then, we divide the team offensive and defensive values by the teams' total projected minutes and multiply by five (for five players on court). That's it! Add up the offensive and defensive ratings and you have your net team ratings.

In [None]:
team_ratings = (ratings.fillna(0)
                .assign(o_value = ratings["minutes"] * ratings["o_dpm"])
                .assign(d_value = ratings["minutes"] * ratings["d_dpm"])
                .groupby("team_name")[["o_value", "d_value", "minutes"]]
                .sum()
                .reset_index())

team_ratings = (team_ratings 
                .assign(ortg = (team_ratings["o_value"]/team_ratings["minutes"]) * 5)
                .assign(drtg = (team_ratings["d_value"]/team_ratings["minutes"]) * 5))

team_ratings = team_ratings.assign(nrtg = team_ratings["ortg"] + team_ratings["drtg"])
team_ratings = team_ratings[["team_name", "ortg", "drtg", "nrtg"]].sort_values(by = ['nrtg'], ascending=False)
team_ratings

## Step 2: Get What we Need to Project Wins

In order to figure out the number of wins a team should have, we first need the schedule. I have scraped that already for the 2018-19 season in the "schedule.csv" and conveniently formatted it for our purposes. 

In [None]:
schedule = pd.read_csv("data/schedule.csv") 

schedule.head()

You'll note that home and away teams are there, as home teams are more likely to win. The raw team ratings assume a neutral court scenario. Let's join in the team ratings to the schedule.

In [None]:
schedule_with_ratings = schedule.merge(team_ratings, how="left", left_on="home_team", right_on="team_name")
schedule_with_ratings = schedule_with_ratings.drop("team_name", axis=1)
schedule_with_ratings = schedule_with_ratings.merge(team_ratings, how="left", left_on="away_team", right_on="team_name", suffixes=["_home", "_away"])
schedule_with_ratings = schedule_with_ratings.drop("team_name", axis=1)

schedule_with_ratings.head()

Ok, with that done, our setup is ready to go. All we need to do is actually figure out how to translate ratings to win probability. We're going to cheat a bit here and use some previously conducted research on the topic by some of my NBA Twitter colleagues/friends/collaborators.  

The rating difference and win/loss was plugged into a logistic regression across many many games and home/away splits. The coefficients that resulted from that regression are used below as coef1 and coef2. The get_win_prob function takes the coefficients and manually conducts a prediction.

In [None]:
from numpy.random import normal

def get_win_prob(home_rating: float, away_rating: float, sd: float):
    hca = 2.7 ## THIS IS DIFFERENT FOR COVID SEASONS
    home_rating_adj = normal(loc = home_rating, scale = sd, size = 1)
    away_rating_adj = normal(loc = away_rating, scale = sd, size = 1)
    delta = home_rating_adj - away_rating_adj - hca
    coef1 = 0.1483737284
    coef2 = 0.4257284470
    home_win_prob = 1/(1 + np.exp(-(coef2 + coef1 * delta)))
    return home_win_prob

You'll note that there is an "sd" parameter that we have not discussed. This is approximately the standard deviation of the team ratings based on the uncertainty in the player ratings based again on prior research. Essentially, when we say that Steph is a +7.0, we also imply that he is +7.0 +/- some value. No player is *exactly* what they are predicted at. The default here of 2.2 is also a value taken from previous research on the topic by not-me individuals.

Next, we need to build a function that simulates a season of games. This one is kind of hacky but does the trick. Like always, I'm sure there's a better/faster/simpler way to do some of this, but this is my first pass and it seems to work well. It takes in the schedule dataframe and the value for the sd and outputs a list of 1s and 0s where the home team is predicted to win across all the games. Important note - if we assume that the team ratings are normally distributed we can actually skip the simulation. However, I wanted to show the simulation code since that's probably more of use to people than various forms of normal samples.


In [None]:
from numpy.random import uniform

def simulate_season(schedule: pd.DataFrame, sd: float=2.2):
  home_rating_col = schedule.columns.get_loc("nrtg_home")
  away_rating_col = schedule.columns.get_loc("nrtg_away")
  home_win_probs = schedule.apply(lambda x: get_win_prob(x[home_rating_col], x[away_rating_col], sd=2.2), axis=1)
  home_win_checks = uniform(size = len(home_win_probs))
  home_win = (home_win_probs >= home_win_checks).astype(int)
  return home_win.to_numpy()


Now, we're going to do a Monte Carlo (simulation) where we just run the simulation a few hundred times to see what shakes out. If we assumed the team ratings were deterministic, point predictions with no variace, this wouldn't be necessary. This is even more important when you aren't just doing Baby's First Win Projections.

Unlike the R file, we will not be doing this in parallel because notebooks don't always play nice with that and my desire to get that to work perfectly is somewhat low. The good news is that this will not take a ton of time because it's a lightweight function. Next, we tally up wins and losses by each team and do some summing


In [66]:
iterations = 10
season_list = [None] * iterations
for i in np.arange(0, iterations):
    season_list[i] = simulate_season(schedule = schedule_with_ratings)

season_frame = pd.DataFrame(season_list).T


In [80]:
season_list[0]


array([-2, -1, -2, ..., -1, -1, -2])

In [76]:
season_frame_home = season_frame.copy()
season_frame_home['team_name'] = schedule_with_ratings['home_team'] ## this is for the home team
season_frame_home = season_frame_home.groupby('team_name').sum()
season_frame_home.reset_index()




Unnamed: 0,team_name,0,1,2,3,4,5,6,7,8,9
0,Atlanta Hawks,11,8,12,15,11,8,9,8,10,6
1,Boston Celtics,26,25,26,25,21,26,24,23,28,25
2,Brooklyn Nets,14,12,12,18,24,22,17,10,19,16
3,Charlotte Hornets,20,25,19,19,17,16,16,20,17,17
4,Chicago Bulls,14,15,10,11,7,12,13,8,11,10
5,Cleveland Cavaliers,17,13,17,16,8,13,17,17,15,15
6,Dallas Mavericks,14,13,20,10,17,16,16,16,19,18
7,Denver Nuggets,21,23,26,22,24,25,27,25,23,21
8,Detroit Pistons,17,19,16,26,21,16,21,14,22,20
9,Golden State Warriors,35,35,33,33,36,31,37,37,34,34


In [84]:
season_frame_away = 1 * (season_frame == 0) ## flipping 1s and 0s for the away team
season_frame_away['team_name'] = schedule_with_ratings['away_team'] ## this is for the away team
season_frame_away = season_frame_away.groupby('team_name').sum()
season_frame_away.reset_index()

Unnamed: 0,team_name,0,1,2,3,4,5,6,7,8,9
0,Atlanta Hawks,8,15,20,7,11,11,11,11,9,9
1,Boston Celtics,28,27,22,25,23,24,22,22,23,26
2,Brooklyn Nets,15,8,15,10,17,19,15,19,21,15
3,Charlotte Hornets,14,19,17,17,15,22,14,17,16,17
4,Chicago Bulls,9,8,12,12,12,11,9,13,11,11
5,Cleveland Cavaliers,13,10,18,14,15,20,15,18,20,15
6,Dallas Mavericks,15,18,15,15,24,18,10,13,16,15
7,Denver Nuggets,24,25,22,30,27,27,20,25,26,24
8,Detroit Pistons,15,23,18,22,28,14,17,20,17,17
9,Golden State Warriors,35,35,35,32,36,33,36,36,34,34


In [126]:
from scipy.stats import norm

season_frame_total = season_frame_home.append(season_frame_away)
season_frame_total = season_frame_total.groupby("team_name").sum()
season_frame_total = season_frame_total.reset_index()
season_frame_total['mean'] = season_frame_total.mean(axis=1).values
season_frame_total['sd'] = season_frame_total.std(axis=1).values
season_frame_total['best_case'] = norm.ppf(q=0.95, loc=season_frame_total['mean'], scale=season_frame_total['sd'])
season_frame_total['worst_case'] = norm.ppf(q=0.05, loc=season_frame_total['mean'], scale=season_frame_total['sd'])
season_frame_total = season_frame_total[['team_name', 'mean', 'sd', 'best_case', 'worst_case']]
season_frame_total = season_frame_total.sort_values(['mean'], ascending=False)
season_frame_total = season_frame_total.round(1)
season_frame_total

Unnamed: 0,team_name,mean,sd,best_case,worst_case
9,Golden State Warriors,69.1,2.9,73.9,64.3
27,Toronto Raptors,58.7,3.8,65.0,52.4
20,Oklahoma City Thunder,56.8,3.4,62.4,51.2
16,Milwaukee Bucks,56.0,3.5,61.8,50.2
18,New Orleans Pelicans,52.5,4.7,60.2,44.8
28,Utah Jazz,50.0,3.8,56.3,43.7
22,Philadelphia 76ers,50.0,3.9,56.4,43.6
10,Houston Rockets,49.7,4.5,57.1,42.3
17,Minnesota Timberwolves,49.3,2.4,53.3,45.3
1,Boston Celtics,49.1,3.1,54.2,44.0


And there you have it, win projections done as simply as possible!