# Shooting Percentage and Efficiency

Every sport is a competition in which teams have limited resources with which to achieve an objective. In baseball, the limited resource is outs (27 per team in a nine-inning game). Plate appearances are not the limited resource, because a team can have an unlimited number of them as long as it doesn't make outs.

In basketball, the limited resource is the possession. A typical 48-minute NBA game features about 100 possessions for each team. A team can increase its possessions by playing faster, but that also gives the other team more possessions. In fact, setting aside offensive rebounds, turnovers, and a few other rare events that can change the count, the two teams should get the same number of possessions.

Thus, a team must treat possessions as a limited resource and use them efficiently, because it is very difficult to obtain more of them without giving more to its opponent.

This demo will look at shooting percentage metrics as measures of efficiency.

## Setup

In [None]:
%run ../../utils/notebook_setup.py

## 1. Load Data

We load NBA (and ABA, too, it seems) team season data dating back to 1973.

In [None]:
from datascience import Table, are
import seaborn as sns

from datascience_utils import fill_null, boxplots
from datascience_stats import correlation

nba = Table.read_table('nba_team_season_data.csv')
nba = fill_null(nba, fill_value=0)

### PPG and Eras

We compute Points per Game and bucket the team seasons into eras. We use three eras: Pre-3pt, Pre-Steph, and Steph. The Pre-3pt era contains team seasons before 1979 with no 3pt attempts. The ABA adopted the 3pt line before the NBA, so we do not consider ABA seasons with 3pt attempts to be Pre-3pt. Any season that is not Pre-3pt but comes before 2012 is Pre-Steph, and any season from 2012 on is in the Steph era. We use Steph Curry to define the era because 2012 is the season in which he nearly doubled his 3pt attempts, starting the league's current trajectory toward far more 3pt shooting.
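The bucketing rule can also be sketched as a standalone function, which makes edge cases (such as early ABA seasons with 3pt attempts) easy to check. `assign_era` is a hypothetical helper, not part of the notebook's utilities, and it assumes `year` is the integer season label used in the data.

```python
def assign_era(year, fg3a):
    """Hypothetical helper mirroring the notebook's era rule.

    year: season label from the data (assumed to be an int)
    fg3a: total 3pt field goal attempts for the team season
    """
    # Pre-3pt: seasons before 1979 with no 3pt attempts at all
    if fg3a == 0 and year < 1979:
        return 'Pre-3pt'
    # Pre-Steph: every other season before 2012 (includes ABA seasons with 3pt attempts)
    if year < 2012:
        return 'Pre-Steph'
    # Steph: 2012 and later
    return 'Steph'


# Illustrative inputs, not real team seasons
print(assign_era(1975, 0))     # Pre-3pt
print(assign_era(1975, 400))   # Pre-Steph (an ABA-style season with 3pt attempts)
print(assign_era(1995, 900))   # Pre-Steph
print(assign_era(2016, 2500))  # Steph
```

With the `datascience` library, `Table.apply` passes one argument per named column, so something like `nba.apply(assign_era, 'year', 'fg3a')` should produce the same labels as the loop below.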

In [None]:
# compute ppg
nba['ppg'] = nba['pts'] / nba['g']

# Define some simple eras
# These eras will show a bit of how the teams have evolved
era = []
for row in nba.rows:
    # Pre-3pt: Before 1979 NBA with no 3pt line
    if row.item('fg3a') == 0 and row.item('year') < 1979:
        era.append('Pre-3pt')
    else:
        # Pre-Steph: Before 2012 NBA season
        if row.item('year') < 2012:
            era.append('Pre-Steph')
        # Steph: 2012 NBA season and later
        else:
            era.append('Steph')
            
nba['Era'] = era

In [None]:
nba.show(15)

### 3pt Field Goal Attempts by Era

In [None]:
nba['fg3apg'] = nba['fg3a'] / nba['g']

boxplots(nba, column='fg3apg', by='Era')

## 2. Offensive Rating and Pace

Offensive Rating is defined as,
$$
    \text{Off. Rating} = \text{Points per 100 Possessions}
$$
while Pace* is the number of possessions per 48 minutes.

Offensive Rating informs us about the overall efficiency of a team's offense because it recognizes possessions are limited and we should use them well.  As we'll see, PPG is not a reliable indicator of efficiency because it can be manipulated by increasing the pace.
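A minimal sketch of why PPG can mislead, using two hypothetical teams (made-up per-game averages, not from the dataset): the faster team scores more points per game, yet the slower team gets more out of each possession.

```python
def offensive_rating(points, possessions):
    """Off. Rating: points scored per 100 possessions."""
    return 100 * points / possessions


# Hypothetical per-game averages (illustrative numbers only)
fast_ppg, fast_poss = 110.0, 110.0  # plays fast
slow_ppg, slow_poss = 104.5, 95.0   # plays slow

fast_ortg = offensive_rating(fast_ppg, fast_poss)
slow_ortg = offensive_rating(slow_ppg, slow_poss)

print(f"Fast team: {fast_ppg:.1f} PPG, Off. Rating {fast_ortg:.1f}")  # 110.0 PPG, 100.0
print(f"Slow team: {slow_ppg:.1f} PPG, Off. Rating {slow_ortg:.1f}")  # 104.5 PPG, 110.0
```

Ranking by PPG puts the fast team first; ranking by Off. Rating puts the slow team first.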

**A caveat on Pace**: Before play-by-play data made it possible to count possessions precisely, the number of possessions had to be estimated from box score totals alone. The formula below comes from Basketball Reference. Its core components are simple: possessions are estimated from field goal attempts, turnovers, free throw attempts, and an offensive rebounding adjustment. The team's possessions and the opponent's possessions are each estimated and then averaged to give a better estimate.
$$
    \text{Possessions} = 
    \frac12 \times \left\{
        \text{Tm FGA} + 
        0.4 \times \text{Tm FTA} - 
        1.07 \times \frac{\text{Tm ORB}}{\text{Tm ORB} + \text{Opp DRB}} \times (\text{Tm FGA} - \text{Tm FG}) + 
        \text{Tm TOV}
    \right\} + 
    \frac12 \times \left\{
        \text{Opp FGA} + 
        0.4 \times \text{Opp FTA} - 
        1.07 \times \frac{\text{Opp ORB}}{\text{Opp ORB} + \text{Tm DRB}} \times (\text{Opp FGA} - \text{Opp FG}) + 
        \text{Opp TOV}
    \right\}
$$
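The possession estimate translates directly into code. This is a sketch of the formula as written above, with hypothetical box score totals; the `pace` helper assumes Pace scales possessions to 48 minutes using team minutes played divided by 5 (240 team minutes in regulation), which is how we understand Basketball Reference's definition.

```python
def side_estimate(fga, fta, orb, opp_drb, fg, tov):
    """One bracketed term: FGA + 0.4*FTA - 1.07*ORB share*missed FG + TOV."""
    orb_share = orb / (orb + opp_drb)
    return fga + 0.4 * fta - 1.07 * orb_share * (fga - fg) + tov


def estimate_possessions(tm, opp):
    """Average the team-side and opponent-side estimates."""
    tm_side = side_estimate(tm['fga'], tm['fta'], tm['orb'], opp['drb'], tm['fg'], tm['tov'])
    opp_side = side_estimate(opp['fga'], opp['fta'], opp['orb'], tm['drb'], opp['fg'], opp['tov'])
    return 0.5 * (tm_side + opp_side)


def pace(possessions, team_minutes=240):
    """Possessions per 48 minutes; team_minutes counts all five players."""
    return 48 * possessions / (team_minutes / 5)


# Hypothetical single-game box score totals
tm = {'fga': 85, 'fta': 20, 'orb': 10, 'drb': 30, 'fg': 40, 'tov': 12}
opp = {'fga': 80, 'fta': 25, 'orb': 8, 'drb': 35, 'fg': 38, 'tov': 14}

poss = estimate_possessions(tm, opp)
print(f"Estimated possessions: {poss:.1f}")
print(f"Pace (regulation game): {pace(poss):.1f}")
print(f"Pace (one overtime, 265 team minutes): {pace(poss, 265):.1f}")
```

In a regulation game the pace equals the possession estimate; an overtime game gets scaled back down to a 48-minute rate.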

### A. PPG vs Off. Rating

These two should be closely related. That is true, but the story is nuanced: the Pre-Steph era shows the weakest relationship, probably because we bucketed the 80s, 90s, and 2000s together.

In [None]:
sns.lmplot(
    x='ppg', y='off_rtg', hue='Era', fit_reg=False,
    data=nba.to_df(), hue_order=['Pre-Steph', 'Steph', 'Pre-3pt']
);

### B. Pace vs PPG

It's quite clear that Pace directly impacts PPG.
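One way to see why: over a 48-minute game, possessions per game are roughly the pace, so PPG is approximately Pace × Off. Rating / 100. Holding efficiency fixed, more possessions mean more points. A minimal sketch with hypothetical values (overtime ignored):

```python
def approx_ppg(pace, off_rtg):
    """Approximate PPG, assuming possessions per game roughly equal pace (48-minute games)."""
    return pace * off_rtg / 100


# Same hypothetical efficiency (110 points per 100 possessions), two different paces
for p in (90, 100):
    print(p, approx_ppg(p, 110))  # 90 -> 99.0, 100 -> 110.0
```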

In [None]:
sns.lmplot(
    x='pace', y='ppg', hue='Era', fit_reg=False,
    data=nba.to_df(), hue_order=['Pre-Steph', 'Steph', 'Pre-3pt']
);

### C. Pace vs Off. Rating

We should favor Off. Rating because, while it is related to PPG, it appears largely unaffected by Pace, at least within eras.

In [None]:
sns.lmplot(
    x='pace', y='off_rtg', hue='Era', fit_reg=False,
    data=nba.to_df(), hue_order=['Pre-Steph', 'Steph', 'Pre-3pt']
);

In [None]:
nba.where('Era', 'Steph').scatter('pace', 'off_rtg')

_Questions_

1. What could go wrong in our analyses if we failed to recognize things like eras in sports? Why should we use our knowledge that the NBA evolves, rather than treating a team from 1980 the same as one from 2010?

## 3. Shooting Metrics

### A. Field Goal Percentage

Let's start with the most widely used measure of field goal efficiency: Field Goal Percentage
$$
    \text{FG\%} = \frac{\text{Field Goals Made}}{\text{Field Goals Attempted}}
$$

As we'll see, FG% plays a role much like batting average does in baseball.
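To make the batting-average analogy concrete, here is a minimal sketch with two hypothetical shooters (made-up numbers, not from the dataset) who each take 10 shots: one makes five 2-pointers, the other makes four 3-pointers. FG% ranks the first shooter higher even though the second scored more points on the same number of attempts.

```python
def field_goal_pct(fg, fga):
    """FG%: field goals made divided by field goals attempted."""
    return fg / fga


# Hypothetical shooters, 10 attempts each
fg_pct_twos = field_goal_pct(fg=5, fga=10)    # five made 2-pointers
fg_pct_threes = field_goal_pct(fg=4, fga=10)  # four made 3-pointers

points_twos = 5 * 2    # 10 points
points_threes = 4 * 3  # 12 points

print(f"2pt shooter: FG% = {fg_pct_twos:.2f}, points = {points_twos}")      # 0.50, 10
print(f"3pt shooter: FG% = {fg_pct_threes:.2f}, points = {points_threes}")  # 0.40, 12
```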

In [None]:
nba['FG%'] = nba['fg'] / nba['fga']

#### FG% and Off. Rating

It should come as no surprise that teams with a higher FG% have more efficient offenses. The plot also shows that, with the 3pt shot available, teams have been able to score more efficiently at the same FG%.

In [None]:
sns.lmplot(
    x='FG%', y='off_rtg', hue='Era', fit_reg=False,
    data=nba.to_df(), hue_order=['Pre-Steph', 'Steph', 'Pre-3pt']
);

### B. Effective Field Goal Percentage

One of the first "advanced" metrics introduced for basketball is Effective Field Goal Percentage.
Unfortunately, it is not actually a percentage.
$$
    \text{eFG\%} = \frac{\text{Field Goals Made} + \frac12\times\text{3pt Field Goals Made}}{\text{Field Goals Attempted}}
$$

The key difference is that eFG% accounts for the fact that some shots are worth more than others. FG% treats every made shot the same, which is why it resembles batting average; eFG% is more like slugging percentage.
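Using the same kind of hypothetical shooters (10 attempts each; five made 2-pointers versus four made 3-pointers), eFG% reverses the FG% ranking. Multiplying eFG% by 2 gives field goal points per field goal attempt, which is one way to see why it is not really a percentage. A minimal sketch:

```python
def effective_fg_pct(fg, fg3, fga):
    """eFG%: (FG + 0.5 * 3PM) / FGA, crediting each made 3 with an extra half make."""
    return (fg + 0.5 * fg3) / fga


efg_twos = effective_fg_pct(fg=5, fg3=0, fga=10)    # 0.5
efg_threes = effective_fg_pct(fg=4, fg3=4, fga=10)  # 0.6

# 2 * eFG% equals field goal points per field goal attempt (free throws excluded)
print(f"2pt shooter: eFG% = {efg_twos:.2f}, points per FGA = {2 * efg_twos:.2f}")      # 0.50, 1.00
print(f"3pt shooter: eFG% = {efg_threes:.2f}, points per FGA = {2 * efg_threes:.2f}")  # 0.60, 1.20
```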

In [None]:
nba['eFG%'] = (nba['fg'] + .5 * nba['fg3']) / nba['fga']

#### eFG% and Off. Rating

Now that eFG% accounts for 3pt shooting, the separate era clusters come together, and we get a more uniform model of offensive efficiency based on shooting performance.

In [None]:
sns.lmplot(
    x='eFG%', y='off_rtg', hue='Era', fit_reg=False,
    data=nba.to_df(), hue_order=['Pre-Steph', 'Steph', 'Pre-3pt']
);

### C. True Shooting Percentage

True Shooting Percentage (TS%) comes from the APBRmetrics community and is an even more advanced version of eFG%. It is unclear who first devised TS%. Like eFG%, it is not actually a percentage.
$$
    \text{TS\%} = \frac{\text{Total Points Scored}}{2 \times (\text{Field Goals Attempted} + 0.44 \times \text{Free Throw Attempts})}
$$

TS% factors in free throw shooting. It is something like a blend of OBP and SLG: like SLG, it values a 3pt shot more than a 2pt shot, and like OBP, which counts walks, it does not neglect free throws.
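A minimal sketch with hypothetical numbers: suppose the 2pt shooter from the earlier examples (5-of-10, eFG% of 0.50) also goes 4-for-4 at the line. eFG% ignores those free throws; TS% credits them. The 0.44 factor is usually explained as an estimate of how many shooting possessions one free throw attempt represents, since not every trip to the line is a pair of shots (and-ones, technicals, three-shot fouls).

```python
def true_shooting_pct(pts, fga, fta):
    """TS%: points / (2 * (FGA + 0.44 * FTA))."""
    return pts / (2 * (fga + 0.44 * fta))


# Hypothetical 2pt shooter: 5-of-10 from the field (10 pts) plus 4-of-4 free throws (4 pts)
ts_twos = true_shooting_pct(pts=14, fga=10, fta=4)
# Hypothetical 3pt shooter: 4-of-10 from three (12 pts), no free throws
ts_threes = true_shooting_pct(pts=12, fga=10, fta=0)

print(f"2pt shooter: TS% = {ts_twos:.3f}")    # 0.595
print(f"3pt shooter: TS% = {ts_threes:.3f}")  # 0.600
```

Once free throws count, the gap between the two shooters nearly disappears.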

In [None]:
nba['TS%'] = nba['pts'] / (2 * (nba['fga'] + .44 * nba['fta']))

#### TS% and Off. Rating

It may be hard to tell from the plot alone that TS% has a stronger relationship with Off. Rating than eFG% does, so we quantify it in the next section. By not ignoring free throws, a huge part of the game, TS% better captures efficient scoring and use of possessions, and thus relates more closely to team Off. Rating.

In [None]:
sns.lmplot(
    x='TS%', y='off_rtg', hue='Era', fit_reg=False,
    data=nba.to_df(), hue_order=['Pre-Steph', 'Steph', 'Pre-3pt']
);

## 4. Efficiency across Eras

### The Steph Era

In [None]:
nba_steph = nba.where('Era', 'Steph')
corr_fg_ortg = correlation(nba_steph['FG%'], nba_steph['off_rtg'])
corr_efg_ortg = correlation(nba_steph['eFG%'], nba_steph['off_rtg'])
corr_ts_ortg = correlation(nba_steph['TS%'], nba_steph['off_rtg'])

print("Metric Correlations in the 'Steph' Era")
print("======================================")
print(f"Corr. FG vs Ortg:  {corr_fg_ortg:.3f}")
print(f"Corr. eFG vs Ortg: {corr_efg_ortg:.3f}")
print(f"Corr. TS vs Ortg:  {corr_ts_ortg:.3f}")

### The Pre-Steph Era

In [None]:
nba_presteph = nba.where('Era', 'Pre-Steph')
corr_fg_ortg = correlation(nba_presteph['FG%'], nba_presteph['off_rtg'])
corr_efg_ortg = correlation(nba_presteph['eFG%'], nba_presteph['off_rtg'])
corr_ts_ortg = correlation(nba_presteph['TS%'], nba_presteph['off_rtg'])

print("Metric Correlations in the 'Pre-Steph' Era")
print("==========================================")
print(f"Corr. FG vs Ortg:  {corr_fg_ortg:.3f}")
print(f"Corr. eFG vs Ortg: {corr_efg_ortg:.3f}")
print(f"Corr. TS vs Ortg:  {corr_ts_ortg:.3f}")

### The Pre-3pt Era

In [None]:
nba_pre3 = nba.where('Era', 'Pre-3pt')
corr_fg_ortg = correlation(nba_pre3['FG%'], nba_pre3['off_rtg'])
corr_efg_ortg = correlation(nba_pre3['eFG%'], nba_pre3['off_rtg'])
corr_ts_ortg = correlation(nba_pre3['TS%'], nba_pre3['off_rtg'])

print("Metric Correlations in the 'Pre-3pt' Era")
print("========================================")
print(f"Corr. FG vs Ortg:  {corr_fg_ortg:.3f}")
print(f"Corr. eFG vs Ortg: {corr_efg_ortg:.3f}")
print(f"Corr. TS vs Ortg:  {corr_ts_ortg:.3f}")

_Questions_

1. Why might these correlations have changed?
2. What could be responsible for an increase in correlation?  What else affects Off. Rating if not just shooting performance?
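The `correlation` helper comes from the course's `datascience_stats` module, whose implementation is not shown here. Assuming it computes the Pearson correlation coefficient, a standalone sketch of that calculation is below (`pearson_r` is a hypothetical name).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all with the same 1/n normalization
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Toy values (not from the dataset): a perfectly linear relationship gives r of about 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

If the assumption holds, `pearson_r(list(nba_steph['TS%']), list(nba_steph['off_rtg']))` should match the TS% value printed for the Steph era.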

### Turnovers across Eras

Turnovers have a huge impact on Off. Rating because a turnover ends a possession without a shot. If a team never committed turnovers, every possession would end in a shot or free throws, and its Off. Rating would be determined almost entirely by its shooting.

So if shooting efficiency has become more correlated with offensive efficiency in recent eras, we might expect teams to be turning the ball over less. That turns out to be the case, and it is a plausible reason why the correlation between shooting efficiency and offensive efficiency has increased.
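One caution: turnovers per game rise and fall with pace, the same issue that makes PPG a poor efficiency measure. A per-play rate such as turnover percentage, which Basketball Reference defines as 100 × TOV / (FGA + 0.44 × FTA + TOV), may be a fairer comparison across eras. A minimal sketch (as a fraction rather than × 100), using two hypothetical team-games:

```python
def turnover_pct(tov, fga, fta):
    """Turnovers per estimated play: TOV / (FGA + 0.44 * FTA + TOV)."""
    return tov / (fga + 0.44 * fta + tov)


# Hypothetical team-games: the fast game has 25% more of everything
slow = turnover_pct(tov=12, fga=80, fta=20)
fast = turnover_pct(tov=15, fga=100, fta=25)

print(f"Slow game: 12 turnovers, rate {slow:.3f}")
print(f"Fast game: 15 turnovers, rate {fast:.3f}")

# Applied to the notebook's table, this might look like (not run here):
# nba['tov_pct'] = nba['tov'] / (nba['fga'] + .44 * nba['fta'] + nba['tov'])
```

The fast game has more turnovers, but the same turnover rate per play.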

In [None]:
nba['tovpg'] = nba['tov'] / nba['g']
boxplots(nba, column='tovpg', by='Era')