# Lab On Your Own: Player Efficiency Rating (PER)

## What is PER?  What does it do?


---

_Hat tip: Derek Topper_


So far we have studied offensive metrics for baseball.  One thing we have seen is that a lot of metrics are built as linear sums of positive and negative contributions.  We will dissect PER (Player Efficiency Rating) in this lab and observe how it works as a metric for player performance.

PER is a comprehensive metric that includes defensive statistics as well as offensive statistics.  While we have so far tried to segregate the two parts of play in baseball, we'll ignore that for now.

Basketball has a lot of moving parts, so the challenge is to figure out which positive and negative contributions a player can make and how to value them.

This notebook focuses on calculating Player Efficiency Rating given raw NBA player data. PER looks like a complex, nasty equation, but this notebook will break it down and show that it's not so complicated and that it makes an elegant attempt at performance measurement.

For another in-depth look at PER, check out Justin Jacobs' blog about it:
https://squared2020.com/2017/09/01/breaking-down-player-efficiency-rating/


### TO DO

In the rest of the notebook, you will find ellipses in places where you need to fill in the required code (usually a formula).  At the end of the notebook, you'll find a set of questions to answer.

## 1. Setup and Data (Do not change)

In [None]:
%run ../../utils/notebook_setup.py

In [None]:
from datascience import Table, are
import numpy as np

### The Data

We'll be working with the total stats of all players from the 2016-17 NBA season. The table contains each player's season totals for statistics like Points, Assists, Rebounds, Blocks and Steals.

Here are the columns in the table below:
* Rk -- Rank
* Pos -- Position
* Age -- Age of Player at the start of February 1st of that season.
* Tm -- Team
* G -- Games
* GS -- Games Started
* MP -- Minutes Played
* FG -- Field Goals
* FGA -- Field Goal Attempts
* FG% -- Field Goal Percentage
* 3P -- 3-Point Field Goals
* 3PA -- 3-Point Field Goal Attempts
* 3P% -- 3-Point Field Goal Percentage
* 2P -- 2-Point Field Goals
* 2PA -- 2-point Field Goal Attempts
* 2P% -- 2-Point Field Goal Percentage
* eFG% -- Effective Field Goal Percentage
* FT -- Free Throws
* FTA -- Free Throw Attempts
* FT% -- Free Throw Percentage
* ORB -- Offensive Rebounds
* DRB -- Defensive Rebounds
* TRB -- Total Rebounds
* AST -- Assists
* STL -- Steals
* BLK -- Blocks
* TOV -- Turnovers
* PF -- Personal Fouls
* PTS -- Points


Our data has some players appearing more than once. That is because that player was traded or moved teams in the middle of the season. 

*For example:* Quincy Acy played 38 games total (TOT). Of those, 32 games were played for the Brooklyn Nets (BRK) and 6 were played for the Dallas Mavericks (DAL).

In [None]:
player_stats = Table().read_table('NBAPlayerStats2017.csv')
player_stats.show(5)

#### Remove TOT entries


In [None]:
player_stats = player_stats.where('Tm', are.not_equal_to('TOT'))

## 2. PER

John Hollinger was an NBA columnist for ESPN.com for eight years and went on to become the Vice President of Basketball Operations for the Memphis Grizzlies. While at ESPN, he developed many advanced metrics to quantify player and team performance, such as Player Efficiency Rating (PER), Offensive Efficiency, Defensive Efficiency and Pace Factor.

PER is a rating of a player’s per-minute statistical performance that Hollinger developed to make player comparisons easier, and it has become a widely used standard over the past decade. Hollinger has described PER as a rating that sums up <a href="http://www.espn.com/nba/columns/story?columnist=hollinger_john&id=2850240">"all a player's positive accomplishments, subtracts the negative accomplishments, and returns a per-minute rating of a player's performance."</a>

PER attempts to be an all-encompassing number that weighs positive accomplishments, such as field goals, free throws, 3-pointers, assists, rebounds, blocks and steals, against negative results, such as missed shots, turnovers and fouls.

The formula adds positive stats and subtracts negative ones through a statistical point value system. The rating for each player is then adjusted to a per-minute basis so that no player is negatively impacted by lack of playing time. It is also adjusted for pace of play. In the end, PER serves as one number that attempts to create an overall player score.


### PER Formula
$$
    aPER = uPER \times \frac{lgPace}{tmPace}, \quad PER = aPER \times \frac{15}{lgaPER} 
$$

Where:
* $uPER$ stands for unadjusted PER;
* $aPER$ stands for pace-adjusted PER;
* the prefix $tm$ indicates a team quantity rather than a player quantity;
* the prefix $lg$ indicates a league quantity rather than a player quantity (so $lgaPER$ is the league average of $aPER$);
* $Pace$ is related to the style of play of a team. We'll get to it later.
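
To make the two-step scaling concrete, here is a minimal sketch with made-up numbers. Every value below is hypothetical and is not taken from the 2016-17 data; the `_toy` names exist only for this illustration.

```python
# Toy numbers only; none of these come from the actual season data.
uPER_toy = 0.50      # hypothetical unadjusted, per-minute rating
lgPace_toy = 96.0    # hypothetical league pace
tmPace_toy = 100.0   # hypothetical team pace (faster than the league)
lgaPER_toy = 0.30    # hypothetical league-average pace-adjusted rating

# Step 1: pace adjustment
aPER_toy = uPER_toy * lgPace_toy / tmPace_toy

# Step 2: rescale so that a league-average player lands on 15
PER_toy = aPER_toy * 15 / lgaPER_toy

print(aPER_toy, PER_toy)
```

Note that a player whose $aPER$ equals $lgaPER$ ends up with a PER of exactly 15, which is how the scale pins the league average at 15.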

The basic idea behind $uPER$ is the following:
\begin{align*}
uPER & = \frac{1}{min} \times \Bigg(\Bigg.\\
     & \quad\quad \text{Three Pointers Made} \\
     & \quad\quad + \text{Contributions from Assists} \\
     & \quad\quad + \text{Contributions from FGs} \\
     & \quad\quad + \text{Contributions from FTs} \\
     & \quad\quad - \text{Contributions from TOs} \\
     & \quad\quad - \text{Contributions from Missed FGs} \\
     & \quad\quad - \text{Contributions from Missed FTs} \\
     & \quad\quad + \text{Contributions from Def Rebounds} \\
     & \quad\quad + \text{Contributions from Off Rebounds} \\
     & \quad\quad + \text{Contributions from Steals} \\
     & \quad\quad + \text{Contributions from Blocks} \\
     & \quad\quad - \text{Contributions from Fouls} \\
     & \quad \Bigg.\Bigg)
\end{align*}
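
Structurally, $uPER$ is just a signed sum divided by minutes. The sketch below shows that shape with a handful of made-up contribution totals; the dictionary keys and all numbers are hypothetical, and only a few of the terms are included to keep it short.

```python
# Hypothetical contribution totals, already expressed in "points".
positives = {"three_pointers": 100.0, "assists": 200.0, "field_goals": 300.0}
negatives = {"turnovers": 50.0, "missed_field_goals": 70.0}
minutes_played = 2400.0  # hypothetical minutes

# Add the positives, subtract the negatives, then put it on a per-minute basis.
uper_sketch = (sum(positives.values()) - sum(negatives.values())) / minutes_played
print(uper_sketch)
```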

## 3. The Components of $uPER$
Let's do the computation for a specific player: Steph Curry.  We're not going to go in the order of the formula but rather start with some foundational quantities and then go from easiest to hardest computations.


In [None]:
curry = player_stats.row(109).asdict()
curry

### A. League Quantities

#### Value of Possession
$VOP$ is the value of a possession and is equal to
$$
    VOP = \frac{lgPTS}{lgFGA - lgORB + lgTO + 0.44 \times lgFTA}
$$
The denominator is an approximation to the number of possessions.
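
As a rough illustration of the formula, the sketch below computes $VOP$ from made-up league totals. The function name `value_of_possession` and the totals are assumptions for this example, not part of the lab's data.

```python
def value_of_possession(pts, fga, orb, tov, fta):
    """Points per estimated possession, following the VOP formula above."""
    # Estimated possessions: shots taken, minus possessions extended by
    # offensive rebounds, plus turnovers, plus a share of free throw attempts.
    possessions = fga - orb + tov + 0.44 * fta
    return pts / possessions

# Hypothetical totals, chosen only to keep the arithmetic easy to follow.
toy_vop = value_of_possession(pts=1000, fga=800, orb=100, tov=120, fta=250)
print(round(toy_vop, 4))
```

With these toy totals the estimated possession count is about 930, so each possession is worth a little more than one point.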

In [None]:
lgPTS = sum(player_stats.column('PTS'))
lgFGA = sum(player_stats.column('FGA'))
lgORB = sum(player_stats.column('ORB'))
lgTOV = sum(player_stats.column('TOV'))
lgFTA = sum(player_stats.column('FTA'))

vop = ...
vop  # 1.0685413540268014

#### Defensive Rebound Percentage
Percentage of defensive rebounds grabbed is given by 
$$
    DRBP = \frac{lgTRB - lgORB}{lgTRB}
$$

In [None]:
lgTRB = sum(player_stats.column('TRB'))
lgORB = sum(player_stats.column('ORB'))

drbp = ...
drbp  # 0.7670440745100238

### B. Contributions from Points

#### Three Pointers
Three-point shots are worth an extra point that is not accounted for elsewhere, so we add it in here.
$$
    \text{Three Pointers Made} = \mathit{3P}
$$

In [None]:
three_pt_contr = ...
three_pt_contr  # 324

#### Field Goals

PER values field goals in three ways:
1. You get 2 points for making any field goal
2. Some field goals are assisted.  We credit 2/3 of a point to the assister so we must deduct that from the 2 points in 1.
3. A further league correction is applied (no one seems to know what this does so we'll just have to accept it).

$$
\text{Contributions from FGs} = \left ( 2 - \frac23 \times \frac{tmAST}{tmFG} + K \times \frac{tmAST}{tmFG} \right ) \times FG
$$
where 
$$
    K =  \frac14 \times \frac{lgAST}{lgFG} \times \frac{lgFT}{lgFG}
$$


The usual way this calculation is presented is as,
$$
\text{Contributions from FGs} = \left ( 2 - \text{factor} \times \frac{tmAST}{tmFG} \right ) \times FG
$$
where we discount the value of the FG from 2 to account for assists by using $\text{factor}$ multiplied by the team's assist rate.  The term $\text{factor} \times \frac{tmAST}{tmFG}$ is meant to capture the expected number of FGs which were assisted, with $\text{factor}$ driving that expected value.  The largest component of $\text{factor}$ will be that $\frac23$ quantity.
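
The two presentations agree when $\text{factor} = \frac23 - K$. Here is a small numerical check of that, using made-up team and league totals (all `_toy` values are hypothetical).

```python
import math

# Hypothetical team and league totals
tmAST_toy, tmFG_toy = 2000, 3200
lgAST_toy, lgFG_toy, lgFT_toy = 55000, 96000, 44000
FG_toy = 500  # hypothetical player field goals

assist_rate = tmAST_toy / tmFG_toy
K_toy = 0.25 * (lgAST_toy / lgFG_toy) * (lgFT_toy / lgFG_toy)

# First presentation: 2/3 discount plus the league correction K
expanded = (2 - (2 / 3) * assist_rate + K_toy * assist_rate) * FG_toy

# Usual presentation: a single factor multiplying the team assist rate
factor_toy = 2 / 3 - K_toy
compact = (2 - factor_toy * assist_rate) * FG_toy

print(expanded, compact, math.isclose(expanded, compact))
```

In this toy case $K$ is small compared to $\frac23$, which matches the remark that $\frac23$ is the largest component of $\text{factor}$.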

In [None]:
# Team values
team = player_stats.where('Tm', "GSW")
tmAST = sum(team['AST'])
tmFG = sum(team['FG'])

# League values
lgAST = sum(player_stats['AST'])
lgFG = sum(player_stats['FG'])
lgFT = sum(player_stats['FT'])

# Factor
factor = ...

# FGs
FG = curry['FG']

fg_contr = ...
fg_contr  # 1064.1325273986868

#### Free Throws

As with FGs, we need to discount FTs by the expected number of times they were assisted.  For a field goal, we discounted by 2/3. For free throws, we discount them by a lower amount: 1/6.

$$
    \text{Contributions from FTs} = \left ( 1 -  \frac{1}{6} \times \frac{tmAST}{tmFG} \right ) \times FT
$$

In [None]:
FT = curry['FT']
ft_contr = ...
ft_contr  # 286.79808418271045

### C. Rebounding and Assists

#### Assists

As we've seen, assists were determined to have value of $2/3$: an assist directly leads to a bucket but you shouldn't get full credit for the bucket. You get two-thirds of a point from the field goal.

$$ \text{Contributions from Assists} = \frac23 \times AST$$

In [None]:
asts_contr = ...
asts_contr  # 349.3333333333333

#### Defensive Rebounds

Since you are gaining a possession for your team, you should be rewarded for your defensive rebounds, but only at the rate at which teams grab offensive rebounds.

$$
    \text{Contributions from Def Rebounds} = VOP \times \left(1 - DRBP \right) \times \left(TRB - ORB \right)
$$

$VOP \times \left(1 - DRBP \right)$ represents the expected value of a possession that the opposing team would gain through offensive rebounding.  By securing a defensive rebound, you prevent the opposing team from getting that value, which is why it's positively credited to you.  If teams secure all defensive rebounds, then $DRBP = 1$ and you get no credit because you did what everyone else does: secured a defensive rebound.  As $DRBP$ drops, the value of an individual defensive rebound goes up and you get credited more for each defensive rebound.



In [None]:
TRB = curry['TRB']
ORB = curry['ORB']

drb_contr = ...
drb_contr  # 72.68552769507475

#### Offensive Rebounds

Similar to defensive rebounding, you are extending possession for your team and preventing the other team from gaining a possession.  You should be rewarded for the value of a possession at the rate at which teams defensive rebound.
$$
    \text{Contributions from Off Rebounds} = VOP \times DRBP \times ORB 
$$

$VOP \times DRBP$ represents the expected value of a possession that the opposing team would gain through defensive rebounding.  By securing an offensive rebound, you prevent the opposing team from getting that possession value.  If teams secure nearly all defensive rebounds, then $DRBP \sim 1$ and you get a lot of credit for an offensive rebound because you prevented a possession.  As $DRBP$ drops, the value of an individual offensive rebound drops and you get credited less for each offensive rebound.
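
To see how the two rebound weights move with $DRBP$, here is a short sketch. The helper `rebound_weights` and the value `toy_vop` are assumptions made for this illustration only.

```python
def rebound_weights(vop, drbp):
    """Return (credit per defensive rebound, credit per offensive rebound)."""
    return vop * (1 - drbp), vop * drbp

toy_vop = 1.07  # hypothetical value of a possession
for toy_drbp in (0.70, 0.77, 0.90, 1.00):
    drb_w, orb_w = rebound_weights(toy_vop, toy_drbp)
    print(f"DRBP={toy_drbp:.2f}  per-DRB credit={drb_w:.3f}  per-ORB credit={orb_w:.3f}")
```

The two weights always add up to $VOP$: whatever share of a possession's value is not credited to a defensive rebound is credited to an offensive one.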



In [None]:
orb_contr = ...
orb_contr  # 49.99671715248571

### D. Defense

#### Steals
Steals gain a possession for the team, so we reward each one with $VOP$.
$$
\text{Contributions from Steals} = VOP \times STL
$$

In [None]:
STL = curry['STL']
stl_contr = ...
stl_contr  # 151.7328722718058

#### Blocks
Blocks are rewarded for gaining a possession at the rate at which they are rebounded (you shouldn't get rewarded for a block that is recovered by the other team).
$$
    \text{Contributions from Blocks} = VOP \times DRBP \times BLK 
$$

In [None]:
BLK = curry['BLK']
blk_contr = ...
blk_contr  # 13.933511337577983

### E. The Negatives

#### Turnovers
Turnovers prevent a chance at scoring so we need to dock the value of a possession from the player's rating.
$$
    \text{Contributions from TOs} = VOP \times TO
$$

In [None]:
TO = curry['TOV']
to_contr = ...
to_contr  # 255.38138361240553

#### Missed FGs
We need to dock the player for missed FGs that got rebounded by the defense.  A missed shot and no offensive rebound means a loss in the value of a possession.
$$
    \text{Contributions from Missed FGs} = VOP \times DRBP \times \left(FGA - FG \right) 
$$

In [None]:
FGA = curry['FGA']
missedfg_contr = ...
missedfg_contr  # 629.4668651329348

#### Missed FTs

A missed free throw provides a value to the opposing team if it's not rebounded by the offense.  

We need to account for how missed FTs that didn't get rebounded by the offense led to a diminished value of the possession (not a full loss like a missed FG).  The arithmetic to account for this is given by,
$$
    \text{Contributions from Missed FTs} = VOP \times 0.44 \times \left(0.44 + 0.56 \times DRBP \right)
         \times \left(FTA - FT \right) 
$$

Like other aspects of PER, it's not immediately clear how this is supposed to work.  I'll let Justin Jacobs, who works for the Orlando Magic, try to explain it:


> Here, we calculate the number of missed free throws, $FTA - FT$. Next we have a deceitful term of $0.44 + 0.56 \times DRBP$. Recalling that $DRBP$ is the defensive rebound percentage, we can rewrite this as $0.44 \cdot ( 1 - DRBP) + DRBP$. The second term is the expected percentage of defensive rebounds on missed free throws that terminate possessions. We multiply by the extra 0.44 to ensure the expected terminated possession. The first term is the expected percentage of free throws that are offensively rebounded. There is an extra 0.44 term. The reason for this is due to the possession continuing for the same offensive team. In this case, if a field goal is attempted, the associated value is absorbed in another term. Hence, the free-throw only contributions are multiplied by a second 0.44 factor. Multiply this term by the league average points per possession and we obtain the expected number of points lost due to missed free throws.



In [None]:
FTA = curry['FTA']
missedft_contr = ...
missedft_contr  # 15.126471672013666
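
The quoted explanation rewrites $0.44 + 0.56 \times DRBP$ as $0.44 \cdot (1 - DRBP) + DRBP$. The sketch below checks numerically that the two forms give the same per-miss penalty; the function names and the sample values of $VOP$ and $DRBP$ are hypothetical.

```python
import math

def missed_ft_weight(vop, drbp):
    """Penalty per missed free throw, as written in the formula above."""
    return vop * 0.44 * (0.44 + 0.56 * drbp)

def missed_ft_weight_rewritten(vop, drbp):
    """Same penalty, using the rearranged term from the quoted explanation."""
    return vop * 0.44 * (0.44 * (1 - drbp) + drbp)

for toy_drbp in (0.0, 0.5, 0.77, 1.0):
    a = missed_ft_weight(1.07, toy_drbp)
    b = missed_ft_weight_rewritten(1.07, toy_drbp)
    print(toy_drbp, round(a, 4), math.isclose(a, b))
```

The penalty per miss grows with $DRBP$, since a miss that the defense rebounds ends the possession.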

#### Fouls
Fouls lead to opposing points so you should be docked for giving up points.  You should only be docked for giving points above the expected value for those possessions.

\begin{align*}
    \text{Contributions from Fouls} & = \text{Total points from committed fouls} - \text{Points expected on those possessions} \\
    & = 
    PF \times \frac{lgFT}{lgPF} - PF \times 0.44 \times \frac{lgFTA}{lgPF} \times VOP
\end{align*}
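
As a sketch of how the two terms offset each other, the function below evaluates the foul formula on made-up league totals. The name `foul_cost` and every number passed to it are assumptions for illustration.

```python
def foul_cost(pf, lg_ft, lg_fta, lg_pf, vop):
    """Net points charged to a player for committing `pf` personal fouls."""
    # Free throw points the league makes per foul, scaled by the player's fouls
    points_given_up = pf * lg_ft / lg_pf
    # Points those possessions were expected to produce anyway
    points_expected_anyway = pf * 0.44 * lg_fta / lg_pf * vop
    return points_given_up - points_expected_anyway

# Hypothetical league totals and player fouls
toy_cost = foul_cost(pf=150, lg_ft=44000, lg_fta=57000, lg_pf=49000, vop=1.07)
print(round(toy_cost, 2), round(toy_cost / 150, 3))
```

The second printed number is the charge per foul, which depends only on league quantities, so every player pays the same price for each foul.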

In [None]:
lgPF = sum(player_stats['PF'])

PF = curry['PF']
foul_contr = ...
foul_contr  # 64.12348330192515

#### Curry's $\mathit{uPER}$

We put together all the contributions and we get $\mathit{uPER}$.

In [None]:
MP = curry['MP']
curry_uper = (
    three_pt_contr + asts_contr + fg_contr + ft_contr
    - to_contr - missedfg_contr - missedft_contr
    + drb_contr + orb_contr + stl_contr + blk_contr
    - foul_contr
) / MP
curry_uper  # 0.5111881613542061

## 4. Computing PER

Recall the formula
$$
    \mathit{aPER} = \mathit{uPER} \times \frac{\mathit{lgPace}}{\mathit{tmPace}}, \quad \mathit{PER} = \mathit{aPER} \times \frac{15}{\mathit{lgaPER}} 
$$
where we now have $\mathit{uPER}$ for Steph Curry.

#### $\mathit{lguPER}$
We need to compute $\mathit{uPER}$ for every player.  We take the previous code and put it into a function so that we can compute it for every player.

In [None]:
def uPER(player, player_stats):
    # Team values
    team = player_stats.where('Tm', player['Tm'])
    tmAST = sum(team['AST'])
    tmFG = sum(team['FG'])

    # League values
    lgPTS = sum(player_stats['PTS'])
    lgFG = sum(player_stats['FG'])
    lgFGA = sum(player_stats['FGA'])
    lgAST = sum(player_stats['AST'])
    lgFT = sum(player_stats['FT'])
    lgFTA = sum(player_stats['FTA'])
    lgTRB = sum(player_stats['TRB'])
    lgORB = sum(player_stats['ORB'])
    lgTOV = sum(player_stats['TOV'])
    lgPF = sum(player_stats['PF'])

    # Values
    factor = (2 / 3) - (0.5 * (lgAST / lgFG)) / (2 * (lgFG / lgFT))
    vop = lgPTS / (lgFGA - lgORB + lgTOV + (.44 * lgFTA))
    drbp = (lgTRB - lgORB) / lgTRB

    # Stats
    MP = player['MP']
    FG3 = player['3P']
    FG = player['FG']
    FGA = player['FGA']
    AST = player['AST']
    FT = player['FT']
    FTA = player['FTA']
    TRB = player['TRB']
    ORB = player['ORB']
    STL = player['STL']
    TO = player['TOV']
    BLK = player['BLK']
    PF = player['PF']

    # Contributions
    three_pt_contr = FG3
    asts_contr = (2/3) * AST
    fg_contr = (2 - factor * tmAST / tmFG) * FG
    ft_contr = .5 * FT * (2 - tmAST / (3 * tmFG))
    to_contr = TO * vop 
    missedfg_contr = vop * drbp * (FGA - FG)
    missedft_contr = vop * .44 * (.44 + .56 * drbp) * (FTA - FT)
    drb_contr = vop * (1 - drbp) * (TRB - ORB)
    orb_contr = vop * drbp * ORB
    stl_contr = vop * STL
    blk_contr = vop * drbp * BLK
    foul_contr = PF * lgFT / lgPF - PF * .44 * lgFTA / lgPF * vop

    player_uper = (
        three_pt_contr + asts_contr + fg_contr + ft_contr
        - to_contr - missedfg_contr - missedft_contr
        + drb_contr + orb_contr + stl_contr + blk_contr
        - foul_contr
    ) / MP
    return player_uper

In [None]:
# verify (compare with a tolerance, since floating-point results can differ in the last digits)
uPER(curry, player_stats), np.isclose(uPER(curry, player_stats), curry_uper)

### A. Compute $\mathit{uPER}$ for each player

In [None]:
player_upers = []
for player in player_stats.rows:
    player = player.asdict()
    # the function uPER takes two arguments, a dict like player and the full table player_stats
    player_uper = ...
    player_upers.append(player_uper)
player_stats['uPER'] = player_upers

In [None]:
player_stats.show(20)

### B. Pace and Pace Factor

#### Team and League Pace
Team pace is stored in a separate file.

In [None]:
pace = Table().read_table('Pace.csv')
pace.sort('Pace', descending=True).show()

In [None]:
def get_team_pace(pace, tm):
    return pace.where('Team', tm)['Pace'].item()

In [None]:
lgPace = get_team_pace(pace, 'League Average')
lgPace

#### Pace and Pace Factors for Each Player

We extract the player's team pace from the table `pace` and we compute the pace factor as
$$
    \text{Pace Factor} = \frac{lgPace}{tmPace}
$$

In [None]:
player_paces = []
player_pace_factors = []
for player in player_stats.rows:
    player = player.asdict()
    
    player_pace = get_team_pace(pace, player['Tm'])
    player_paces.append(player_pace)
    
    # compute the pace factor
    player_pace_factor = ...
    player_pace_factors.append(player_pace_factor)
    
player_stats['Pace'] = player_paces
player_stats['Pace Factor'] = player_pace_factors

In [None]:
player_stats.show(5)

### C. $\mathit{aPER}$

We adjust $uPER$ by the pace factor to get $aPER$.
$$
    \mathit{aPER} = \mathit{uPER} \times \frac{\mathit{lgPace}}{\mathit{tmPace}}
$$

In [None]:
player_stats['aPER'] = ...

### D. Weighted Average to get $\mathit{lgaPER}$

We need to use a weighted average to get the $\mathit{lgaPER}$.  Why?  Suppose a player in one minute of action earned an extremely high $\mathit{aPER}$ while over a season, Russell Westbrook earned a lower (but still high) $\mathit{aPER}$.  Without weighting by minutes played, we'd naively treat these two players as equally important to the league average.

$$
    \mathit{lgaPER} = \sum_{\text{Players}} \frac{\mathit{MP}_{\text{Player $i$}}}{\text{Total MP by all players}} \times \mathit{aPER}_{\text{Player $i$}}
$$
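
Here is a toy comparison of the plain and minutes-weighted averages. The three players and their numbers are invented for illustration.

```python
import numpy as np

# Hypothetical players: one plays a single hot minute, two play full seasons.
toy_mp = np.array([1.0, 2800.0, 2500.0])
toy_aper = np.array([2.00, 0.45, 0.30])

unweighted = np.mean(toy_aper)
weighted = np.sum(toy_aper * toy_mp / np.sum(toy_mp))

print(round(unweighted, 3), round(weighted, 3))
```

The one-minute player drags the plain mean far above what either full-season player produced, while the weighted average barely notices him.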

In [None]:
weights = player_stats['MP'] / np.sum(player_stats['MP'])

# weighted average by minutes played
lg_aper = np.sum(player_stats['aPER'] * weights)
lg_aper

### E. Compute PER

In [None]:
player_stats['PER'] = ...

Display the top 20 in PER, restricting to players with at least 1500 MP.

In [None]:
player_stats.where('MP', are.above_or_equal_to(1500)).\
    sort('PER', descending=True).\
    show(20)

## 5. Why we need to pace adjust

Let's consider a non-pace adjusted version of PER using just $\mathit{uPER}$.

In [None]:
# compute weighted average lguPER
lg_uper = np.sum(player_stats['uPER'] * weights)

# compute non-pace adjusted PER
player_stats['PER_nopace'] =  player_stats['uPER'] * 15 / lg_uper

# compare to PER
player_stats['PER_diff'] = player_stats['PER_nopace'] - player_stats['PER']
player_stats['PER_ratio'] = player_stats['PER_nopace'] / player_stats['PER']

It shouldn't surprise you that the players with the largest positive `PER_diff` (non-pace-adjusted PER minus PER) play on teams with high pace (i.e., pace factors < 1).  Conversely, the players with the largest negative differences play on teams with low pace (i.e., pace factors > 1).

Overall differences between the two versions are about 5%.  While not huge, this can definitely rearrange our perspective on a player once their pace adjustment is properly considered.
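
The direction of the difference can be checked on a two-player toy league. Both players below have identical made-up per-minute production and differ only in their hypothetical team pace.

```python
import numpy as np

toy_uper = np.array([0.30, 0.30])       # identical per-minute production
toy_mp = np.array([2000.0, 2000.0])     # identical minutes
toy_tm_pace = np.array([104.0, 96.0])   # fast team, slow team (hypothetical)
toy_lg_pace = 100.0                     # hypothetical league pace

toy_factor = toy_lg_pace / toy_tm_pace
toy_aper = toy_uper * toy_factor
toy_weights = toy_mp / toy_mp.sum()

# Pace-adjusted PER and the version that skips the pace adjustment
per = toy_aper * 15 / np.sum(toy_aper * toy_weights)
per_nopace = toy_uper * 15 / np.sum(toy_uper * toy_weights)
per_diff = per_nopace - per

print(toy_factor.round(3), per.round(2), per_diff.round(2))
```

The fast-team player has a pace factor below 1 and a positive difference, while the slow-team player has a pace factor above 1 and a negative one.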

In [None]:
player_stats.where('MP', are.above_or_equal_to(1500)).\
    sort('PER_diff', descending=True).\
    show(10)

In [None]:
player_stats.where('MP', are.above_or_equal_to(1500)).\
    sort('PER_diff', descending=False).\
    show(10)

## 6. Questions

1. In your own words, what do you see as the overall goal of PER?
2. Where does PER use a style of expected value modeling in its calculations?
3. PER was developed in an era where basic box score statistics were the only thing really available.  Where does that lead to issues with how PER values a player's contributions to the team?
4. Lay out any criticisms of PER you might have.  Feel free to research further via Google, but present the criticism in your own words.  Here's a prompt to get you started: One can argue that the NBA is a league of do-everything superstars surrounded by specialists/role players who need to fit in as cogs, especially on defense.  How does PER fail to evaluate players outside of the box-score-stuffing stars?
5. What does PER do well?  Don't say "nothing".