
# NBA 3-Point Statistics

## Table of Contents
1. [Introduction](#Introduction)
2. [Data Wrangling](#Data-Wrangling)
3. [New NBA Stats](#New-NBA-Stats)
4. [3NG Rankings](#3NG-Rankings)
5. [Conclusion](#Conclusion)


## Introduction

There has never been a single statistic to determine the best NBA 3-point shooters. Most fans know who the best shooters are from watching the games. When they look for objective evidence, they usually consider two numbers: 3-point percentage and 3-pointers made. Can these statistics be combined into one? Will the new statistic verify that Steph Curry is the greatest 3-point shooter of all-time? What about the best 3-point shooting team of all-time?

To answer these questions, I present 3NG, short for 3-point Net Gain. 3NG calculates the number of points a team gains per possession when the player makes a 3-pointer, minus the number of points the team loses per possession when the player misses a 3-pointer. The gain is based on the expected value of points per possession in the league.

This Jupyter Notebook contains Exploratory Data Analysis introducing 3NG. It also introduces EM3A (Expected Minutes before a 3-point Attempt) and EM3 (Expected Minutes before a 3-pointer). I use these statistics to rank 3-point shooters throughout NBA History. Data Wrangling steps have been included for those with an interest in pandas. Others may skip directly to [New NBA Stats](#New-NBA-Stats).


#### References

https://www.kaggle.com/drgilermo/nba-players-stats <br>
https://www.basketball-reference.com/

#### Copyright

Corey J Wade<br>
July 9, 2018

This Jupyter Notebook and the statistics within may be redistributed provided that credit is given to the author, Corey J Wade.


## Data Wrangling

The following csv file is taken from https://www.kaggle.com/drgilermo/nba-players-stats. When I downloaded the file, it contained NBA statistics through 2017. Dr. Guillermo scraped it from https://www.basketball-reference.com/. 

#### NBA Stats Through 2017

In [1]:
# Import pandas
import pandas as pd

# Silence warnings due to chained assignments
pd.options.mode.chained_assignment = None  # default='warn'

# Open file as DataFrame
df_2017 = pd.read_csv('Seasons_Stats.csv')

# Display first five rows
df_2017.head()

Unnamed: 0.1,Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,PER,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,0,1950.0,Curly Armstrong,G-F,31.0,FTW,63.0,,,,...,0.705,,,,176.0,,,,217.0,458.0
1,1,1950.0,Cliff Barker,SG,29.0,INO,49.0,,,,...,0.708,,,,109.0,,,,99.0,279.0
2,2,1950.0,Leo Barnhorst,SF,25.0,CHS,67.0,,,,...,0.698,,,,140.0,,,,192.0,438.0
3,3,1950.0,Ed Bartels,F,24.0,TOT,15.0,,,,...,0.559,,,,20.0,,,,29.0,63.0
4,4,1950.0,Ed Bartels,F,24.0,DNN,13.0,,,,...,0.548,,,,20.0,,,,27.0,59.0


Basketball statistics were not widely computed before the modern era, hence the null values. The NBA did not adopt the 3-point shot until the 1979-80 season, listed here as 1980, so we can start there.

In [2]:
# Delete unnecessary column
del df_2017['Unnamed: 0']

# Only select years after 1979
df_2017 = df_2017[df_2017['Year']>=1980]

# Display last five rows
df_2017.tail()

Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,PER,TS%,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
24686,2017.0,Cody Zeller,PF,24.0,CHO,62.0,58.0,1725.0,16.7,0.604,...,0.679,135.0,270.0,405.0,99.0,62.0,58.0,65.0,189.0,639.0
24687,2017.0,Tyler Zeller,C,27.0,BOS,51.0,5.0,525.0,13.0,0.508,...,0.564,43.0,81.0,124.0,42.0,7.0,21.0,20.0,61.0,178.0
24688,2017.0,Stephen Zimmerman,C,20.0,ORL,19.0,0.0,108.0,7.3,0.346,...,0.6,11.0,24.0,35.0,4.0,2.0,5.0,3.0,17.0,23.0
24689,2017.0,Paul Zipser,SF,22.0,CHI,44.0,18.0,843.0,6.9,0.503,...,0.775,15.0,110.0,125.0,36.0,15.0,16.0,40.0,78.0,240.0
24690,2017.0,Ivica Zubac,C,19.0,LAL,38.0,11.0,609.0,17.0,0.547,...,0.653,41.0,118.0,159.0,30.0,14.0,33.0,30.0,66.0,284.0


In [3]:
# Display info
df_2017.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18927 entries, 5727 to 24690
Data columns (total 52 columns):
Year      18927 non-null float64
Player    18927 non-null object
Pos       18927 non-null object
Age       18927 non-null float64
Tm        18927 non-null object
G         18927 non-null float64
GS        18233 non-null float64
MP        18927 non-null float64
PER       18922 non-null float64
TS%       18851 non-null float64
3PAr      18839 non-null float64
FTr       18839 non-null float64
ORB%      18922 non-null float64
DRB%      18922 non-null float64
TRB%      18922 non-null float64
AST%      18922 non-null float64
STL%      18922 non-null float64
BLK%      18922 non-null float64
TOV%      18866 non-null float64
USG%      18922 non-null float64
blanl     0 non-null float64
OWS       18927 non-null float64
DWS       18927 non-null float64
WS        18927 non-null float64
WS/48     18922 non-null float64
blank2    0 non-null float64
OBPM      18927 non-null float64
DBPM    

#### 2018 NBA Stats

The 2018 NBA season recently finished. I used the same link as Dr. Guillermo, https://www.basketball-reference.com/, to scrape the 2018 statistics.

In [4]:
# Read html file
df_2018, = pd.read_html("https://www.basketball-reference.com/leagues/NBA_2018_totals.html", header=0)

# Convert to csv file
df_2018.to_csv("df_2018.csv", index=False)

# Display first five rows
df_2018.head()

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Alex Abrines,SG,24,OKC,75,8,1134,115,291,...,0.848,26,88,114,28,38,8,25,124,353
1,2,Quincy Acy,PF,27,BRK,70,8,1359,130,365,...,0.817,40,216,256,57,33,29,60,149,411
2,3,Steven Adams,C,24,OKC,76,76,2487,448,712,...,0.557,384,301,685,88,92,78,128,215,1056
3,4,Bam Adebayo,C,20,MIA,69,19,1368,174,340,...,0.721,118,263,381,101,32,41,66,138,477
4,5,Arron Afflalo,SG,32,ORL,53,3,682,65,162,...,0.846,4,62,66,30,4,9,21,56,179


Since there is no column for 'Year', I will add one to match the first dataframe.

In [5]:
# Delete unnecessary column
del df_2018['Rk']

# Add column for year, place at index 0
df_2018.insert(0, 'Year', 2018.0)

# Display last five rows
df_2018.tail()

Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
685,2018.0,Tyler Zeller,C,28,BRK,42,33,703,125,229,...,0.667,63,131,194,28,8,21,35,78,300
686,2018.0,Tyler Zeller,C,28,MIL,24,1,406,62,105,...,0.895,47,64,111,19,7,14,12,48,141
687,2018.0,Paul Zipser,SF,23,CHI,54,12,824,81,234,...,0.731,13,118,131,46,20,15,43,86,218
688,2018.0,Ante Zizic,C,21,CLE,32,2,214,49,67,...,0.724,24,36,60,5,2,13,11,30,119
689,2018.0,Ivica Zubac,C,20,LAL,43,0,410,61,122,...,0.765,45,78,123,25,8,15,26,47,161


In [6]:
# Display info
df_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 690 entries, 0 to 689
Data columns (total 30 columns):
Year      690 non-null float64
Player    690 non-null object
Pos       690 non-null object
Age       690 non-null object
Tm        690 non-null object
G         690 non-null object
GS        690 non-null object
MP        690 non-null object
FG        690 non-null object
FGA       690 non-null object
FG%       686 non-null object
3P        690 non-null object
3PA       690 non-null object
3P%       625 non-null object
2P        690 non-null object
2PA       690 non-null object
2P%       672 non-null object
eFG%      686 non-null object
FT        690 non-null object
FTA       690 non-null object
FT%       632 non-null object
ORB       690 non-null object
DRB       690 non-null object
TRB       690 non-null object
AST       690 non-null object
STL       690 non-null object
BLK       690 non-null object
TOV       690 non-null object
PF        690 non-null object
PTS       690 non-null o

#### Concatenating DataFrames

Note that the dataframes have a different number of columns. Before concatenating, I select only relevant columns for computing 3-point statistics.

In [7]:
# Select relevant columns
tp_2017 = df_2017[['Year', 'Tm', 'Player', 'G','MP', 'PTS', '3P', '3PA', '3P%']]
tp_2018 = df_2018[['Year', 'Tm', 'Player', 'G','MP', 'PTS', '3P', '3PA', '3P%']]

# Concatenate dataframes
tp = pd.concat([tp_2017, tp_2018], ignore_index=True)

# Show last five rows
tp.tail()

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%
19612,2018.0,BRK,Tyler Zeller,42,703,300,10,26,0.385
19613,2018.0,MIL,Tyler Zeller,24,406,141,0,2,0.0
19614,2018.0,CHI,Paul Zipser,54,824,218,37,110,0.336
19615,2018.0,CLE,Ante Zizic,32,214,119,0,0,
19616,2018.0,LAL,Ivica Zubac,43,410,161,0,1,0.0


#### Column Consistency

In [8]:
# Display info
tp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19617 entries, 0 to 19616
Data columns (total 9 columns):
Year      19617 non-null float64
Tm        19617 non-null object
Player    19617 non-null object
G         19617 non-null object
MP        19617 non-null object
PTS       19617 non-null object
3P        19617 non-null object
3PA       19617 non-null object
3P%       16041 non-null object
dtypes: float64(1), object(8)
memory usage: 1.3+ MB


'Year' is the only column stored as a number. The statistical columns are stored as objects (strings) and must be converted to floats before any mathematical operations.

In [9]:
# Convert numeric columns to floats; values that cannot be parsed become NaN
for col in ['G', 'MP', 'PTS', '3P', '3PA', '3P%']:
    tp[col] = pd.to_numeric(tp[col], errors='coerce')

# Check columns
tp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19617 entries, 0 to 19616
Data columns (total 9 columns):
Year      19617 non-null float64
Tm        19617 non-null object
Player    19617 non-null object
G         19591 non-null float64
MP        19591 non-null float64
PTS       19591 non-null float64
3P        19591 non-null float64
3PA       19591 non-null float64
3P%       16015 non-null float64
dtypes: float64(7), object(2)
memory usage: 1.3+ MB
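
The drop in non-null counts after conversion comes from `errors='coerce'`. Here is a minimal sketch on a toy column. The sample values, including the stray label `'G'`, are made up for illustration and are only an assumption about what an unparseable entry in the scraped table might look like.

```python
import pandas as pd

# Toy column (hypothetical values): numbers stored as strings, plus one stray label
games = pd.Series(['75', '70', 'G', '76'])

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising
games_numeric = pd.to_numeric(games, errors='coerce')

print(games_numeric.dtype)         # float64
print(games_numeric.isna().sum())  # 1
```

Rows that become NaN in this way are later removed by the minimum requirements, since comparisons against NaN are always False.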


#### Points Per Possession

Another piece of Data Wrangling is points per possession. It will be used to compute the expected value of points each time a team has the ball. I obtained the team ratings at https://www.basketball-reference.com/leagues/NBA_stats.html.

In [10]:
# Read html file
df_teams, = pd.read_html("https://www.basketball-reference.com/leagues/NBA_stats.html", header=0)

# Display first five rows
df_teams.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Per Game,Shooting,Advanced,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31
0,Rk,Season,Lg,Age,Ht,Wt,G,MP,FG,FGA,...,PTS,FG%,3P%,FT%,Pace,eFG%,TOV%,ORB%,FT/FGA,ORtg
1,1,2017-18,NBA,26.4,6-7,218,1230,241.4,39.6,86.1,...,106.3,.460,.362,.767,97.3,.521,13.0,22.3,.193,108.6
2,2,2016-17,NBA,26.6,6-7,220,1230,241.6,39.0,85.4,...,105.6,.457,.358,.772,96.4,.514,12.7,23.3,.209,108.8
3,3,2015-16,NBA,26.7,6-7,221,1230,241.8,38.2,84.6,...,102.7,.452,.354,.757,95.8,.502,13.2,23.8,.209,106.4
4,4,2014-15,NBA,26.7,6-7,222,1230,242.0,37.5,83.6,...,100.0,.449,.350,.750,93.9,.496,13.3,25.1,.205,105.6


In [11]:
# Drop first row
df_teams.drop(df_teams.index[0], inplace=True)

# Choose relevant columns (copy so later edits do not touch a view of df_teams)
df_PPP = df_teams[['Unnamed: 1','Unnamed: 31']].copy()

# Rename columns
df_PPP.columns = ['Year', 'PPP']

# Show first five rows
df_PPP.head()

Unnamed: 0,Year,PPP
1,2017-18,108.6
2,2016-17,108.8
3,2015-16,106.4
4,2014-15,105.6
5,2013-14,106.6


In [12]:
# Show column info
df_PPP.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 78 entries, 1 to 78
Data columns (total 2 columns):
Year    75 non-null object
PPP     48 non-null object
dtypes: object(2)
memory usage: 1.8+ KB


In [13]:
# Convert 'Year' to year listed before hyphen
df_PPP['Year'] = df_PPP.loc[:,'Year'].str.split('-').str[0]

# Convert columns to numbers
df_PPP['Year'] = pd.to_numeric(df_PPP['Year'], errors='coerce')
df_PPP['PPP'] = pd.to_numeric(df_PPP['PPP'], errors='coerce')

# Drop NaN values
df_PPP = df_PPP.dropna()

# Add 1 to each year, since NBA seasons are marked by the second year, not the first
df_PPP['Year'] = df_PPP['Year'] + 1

# Offensive rating is defined as points per 100 possessions
# Divide by 100 to convert to points per possession
df_PPP['PPP'] = df_PPP['PPP']/100

# Only choose years with 3-pointers
df_PPP = df_PPP[df_PPP.Year>=1980]

# View DataFrame
df_PPP

Unnamed: 0,Year,PPP
1,2018.0,1.086
2,2017.0,1.088
3,2016.0,1.064
4,2015.0,1.056
5,2014.0,1.066
6,2013.0,1.058
7,2012.0,1.046
8,2011.0,1.073
9,2010.0,1.076
10,2009.0,1.083
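
The two conversions in the cell above can be sketched in plain Python. The helper names `season_to_year` and `rating_to_ppp` are hypothetical and exist only for this illustration. Taking the year before the hyphen and adding 1 also handles a century rollover such as '1999-00', which a two-digit suffix alone would not.

```python
def season_to_year(season):
    """Map a season label such as '2017-18' to the year it ends in (2018)."""
    start_year = int(season.split('-')[0])
    return start_year + 1


def rating_to_ppp(offensive_rating):
    """Convert points per 100 possessions to points per possession."""
    return offensive_rating / 100


print(season_to_year('2017-18'))  # 2018
print(season_to_year('1999-00'))  # 2000
print(rating_to_ppp(108.6))
```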


In [14]:
# Convert years to ints
tp['Year'] = tp['Year'].astype(int)

#### Duplicate Entries

What if a player gets traded? Basketball Reference lists both their separate team statistics and their individual total statistics per year. This makes sense. When looking at team statistics, the stats only matter for that team, but when looking at individual statistics, only the total stats should count. We need two separate dataframes to handle the two separate cases.

In [15]:
# Copy the dataframe
tp_team = tp.copy()

# Drop rows with TOT as 'Tm'
tp_team = tp_team.drop(tp_team[tp_team['Tm']=='TOT'].index)

# Show last rows
tp_team.tail()

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%
19612,2018,BRK,Tyler Zeller,42.0,703.0,300.0,10.0,26.0,0.385
19613,2018,MIL,Tyler Zeller,24.0,406.0,141.0,0.0,2.0,0.0
19614,2018,CHI,Paul Zipser,54.0,824.0,218.0,37.0,110.0,0.336
19615,2018,CLE,Ante Zizic,32.0,214.0,119.0,0.0,0.0,
19616,2018,LAL,Ivica Zubac,43.0,410.0,161.0,0.0,1.0,0.0


Notice that Tyler Zeller is listed for both teams.

In [16]:
# Copy original dataframe for individuals
tp_ind = tp.copy()

# Drop all rows that list players more than once per year
tp_no_duplicates = tp_ind.drop_duplicates(['Year','Player'], keep=False)

# Create dataframe that only includes 'TOT' as team
tp_Tot = tp_ind[tp_ind['Tm']=='TOT']

# Combine dataframe with no duplicates with dataframe that has TOT as team
tp_ind = pd.concat([tp_no_duplicates, tp_Tot])

# Sort index
tp_ind = tp_ind.sort_index()

# Show last 5 entries
tp_ind.tail()

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%
19610,2018,CHO,Cody Zeller,33.0,627.0,233.0,2.0,3.0,0.667
19611,2018,TOT,Tyler Zeller,66.0,1109.0,441.0,10.0,28.0,0.357
19614,2018,CHI,Paul Zipser,54.0,824.0,218.0,37.0,110.0,0.336
19615,2018,CLE,Ante Zizic,32.0,214.0,119.0,0.0,0.0,
19616,2018,LAL,Ivica Zubac,43.0,410.0,161.0,0.0,1.0,0.0


Notice that Tyler Zeller is only listed for his total.
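
To make the logic easier to follow, here is the same technique on a four-row toy dataframe built from the 2018 rows shown above. It is a sketch of the steps in the cell, not a replacement for it, and the variable names are my own.

```python
import pandas as pd

# Toy rows copied from the 2018 output above
toy = pd.DataFrame({
    'Year':   [2018, 2018, 2018, 2018],
    'Tm':     ['TOT', 'BRK', 'MIL', 'CHI'],
    'Player': ['Tyler Zeller', 'Tyler Zeller', 'Tyler Zeller', 'Paul Zipser'],
    '3P':     [10, 10, 0, 37],
})

# keep=False drops every row of a player who is listed more than once in a year
single_team = toy.drop_duplicates(['Year', 'Player'], keep=False)

# Season totals of traded players are the rows labelled 'TOT'
totals = toy[toy['Tm'] == 'TOT']

# Combine: exactly one row per player-season
one_row_each = pd.concat([single_team, totals]).sort_index()
print(one_row_each[['Tm', 'Player']])
```

One design consequence: `keep=False` removes all rows for a traded player, so his statistics survive only through the 'TOT' row.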

#### Minimum Requirements

It's not necessary to examine data from all players. If a player was never recorded as taking a 3-pointer, he can be excluded. Players who only took a few 3's may also be excluded so as not to skew the data. My minimum requirements for 3NG are less stringent than other "qualified" statistics. See https://stats.nba.com/help/statminimums/.

In [17]:
# Define a function that establishes minimum requirements for qualified players
def min_requirements(data, threes, mins, games):

    # Select players who have made more than a certain number of 3s
    data = data[(data['3P'] > threes)]

    # Select players with a certain number of minutes
    data = data[(data['MP'] > mins)]

    # Select players who have played a certain number of games
    data = data[(data['G'] > games)]

    # return dataframe
    return data

In [18]:
# Create dataframe for qualified individuals
tp_ind_qual = min_requirements(tp_ind, 20, 320, 42)

# Create team dataframe with qualified individuals
tp_team_qual = min_requirements(tp_team, 20, 320, 42)

In [19]:
# Check info for qualified individuals
tp_ind_qual.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4752 entries, 10 to 19614
Data columns (total 9 columns):
Year      4752 non-null int64
Tm        4752 non-null object
Player    4752 non-null object
G         4752 non-null float64
MP        4752 non-null float64
PTS       4752 non-null float64
3P        4752 non-null float64
3PA       4752 non-null float64
3P%       4752 non-null float64
dtypes: float64(6), int64(1), object(2)
memory usage: 371.2+ KB


In [20]:
# Check info for qualified teams
tp_team_qual.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4537 entries, 10 to 19614
Data columns (total 9 columns):
Year      4537 non-null int64
Tm        4537 non-null object
Player    4537 non-null object
G         4537 non-null float64
MP        4537 non-null float64
PTS       4537 non-null float64
3P        4537 non-null float64
3PA       4537 non-null float64
3P%       4537 non-null float64
dtypes: float64(6), int64(1), object(2)
memory usage: 354.5+ KB


Now every column has the same number of non-null entries, so no missing values remain in the qualified dataframes.

## New NBA Stats

### Expected Minutes Before 3's

This first group of statistics computes the number of minutes players are on the court before attempting and making 3's.

#### AM3A : Average Minutes per 3-point Attempt

A player's Average Minutes per 3-Point Attempt is total minutes played divided by total 3-pointers attempted.

In [21]:
# Define AM3A, Average Minutes per 3-point Attempt 
tp_ind_qual['AM3A'] = tp_ind_qual['MP'] / tp_ind_qual['3PA']

# Round to 2 decimal places
tp_ind_qual['AM3A']=round(tp_ind_qual['AM3A'], 2)

# Show last five entrants
tp_ind_qual.tail()

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A
19604,2018,TOR,Delon Wright,69.0,1433.0,555.0,56.0,153.0,0.366,9.37
19607,2018,IND,Joe Young,53.0,558.0,207.0,25.0,66.0,0.379,8.45
19608,2018,GSW,Nick Young,80.0,1393.0,581.0,123.0,326.0,0.377,4.27
19609,2018,IND,Thaddeus Young,81.0,2607.0,955.0,58.0,181.0,0.32,14.4
19614,2018,CHI,Paul Zipser,54.0,824.0,218.0,37.0,110.0,0.336,7.49


#### EM3A : Expected Minutes before 3-point Attempt

If a player's next attempt is equally likely to come at any moment within his average interval between attempts, the expected wait is the halfway mark of that interval. Will Nick Young (listed above) take a 3 as soon as he checks in, or only after 4.27 minutes? Under this assumption, his expected wait is halfway between, at 2.135 minutes. This is his expected minutes played before attempting a 3.
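
The halving step can be checked with a quick simulation. This sketch assumes the wait before the next attempt is uniformly distributed over an interval of length AM3A. That is my reading of the reasoning above, not something the data verifies. The seed and sample size are arbitrary.

```python
import random

am3a = 4.27             # Nick Young's average minutes per 3-point attempt, from the table above
rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: the wait before the next attempt is uniform on [0, am3a]
waits = [rng.uniform(0, am3a) for _ in range(100_000)]
simulated_mean = sum(waits) / len(waits)

# The simulated mean should land close to the halfway mark, am3a / 2
print(round(simulated_mean, 2), am3a / 2)
```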

In [22]:
# Define EM3A, Expected Minutes before 3-point Attempt
tp_ind_qual['EM3A'] = tp_ind_qual['AM3A'] / 2

# Round to 2 decimal places
tp_ind_qual['EM3A']=round(tp_ind_qual['EM3A'], 2)

# Sort DataFrame by new category
tp_EM3A = tp_ind_qual.sort_values('EM3A', ascending=True)

# Reset index
tp_EM3A = tp_EM3A.reset_index(drop=True)

# Start index at 1 instead of 0
tp_EM3A.index = tp_EM3A.index + 1

#### EM3A Top Twenty

In [23]:
# View players who attempt 3s faster than anyone in NBA history
tp_EM3A.head(20)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A,EM3A
1,2018,ORL,Marreese Speights,52.0,675.0,402.0,86.0,233.0,0.369,2.9,1.45
2,2018,TOR,C.J. Miles,70.0,1337.0,699.0,164.0,454.0,0.361,2.94,1.47
3,2016,GSW,Stephen Curry,79.0,2700.0,2375.0,402.0,886.0,0.454,3.05,1.52
4,2015,DAL,Charlie Villanueva,64.0,678.0,403.0,83.0,221.0,0.376,3.07,1.54
5,2018,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,3.26,1.63
6,2017,MEM,Troy Daniels,67.0,1183.0,551.0,138.0,355.0,0.389,3.33,1.66
7,2017,GSW,Stephen Curry,79.0,2638.0,1999.0,324.0,789.0,0.411,3.34,1.67
8,2015,TOT,Troy Daniels,47.0,397.0,176.0,43.0,118.0,0.364,3.36,1.68
9,2017,HOU,Eric Gordon,75.0,2323.0,1217.0,246.0,661.0,0.372,3.51,1.76
10,2018,MIA,Wayne Ellington,77.0,2041.0,864.0,227.0,579.0,0.392,3.53,1.76


EM3A Notes<ul>
    <li> Many players on the list come off the bench. EM3A does not distinguish between starters and reserves.</li>
    <li> Most top performers are from the last few years, due to the meteoric rise of NBA 3-pointers. </li>
     <li> Joe Hassett from 1982 is a shocker! </li>
    </ul>

#### AM3 : Average Minutes per 3-Pointer

AM3 computes the average minutes played per 3-pointer Made.

In [24]:
# Define AM3, Average Minutes per 3-pointer made
tp_ind_qual['AM3'] = tp_ind_qual['MP']/tp_ind_qual['3P']

# Round to 2 decimal places
tp_ind_qual['AM3']=round(tp_ind_qual['AM3'], 2)

# Show last five rows
tp_ind_qual.tail()

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A,EM3A,AM3
19604,2018,TOR,Delon Wright,69.0,1433.0,555.0,56.0,153.0,0.366,9.37,4.68,25.59
19607,2018,IND,Joe Young,53.0,558.0,207.0,25.0,66.0,0.379,8.45,4.22,22.32
19608,2018,GSW,Nick Young,80.0,1393.0,581.0,123.0,326.0,0.377,4.27,2.13,11.33
19609,2018,IND,Thaddeus Young,81.0,2607.0,955.0,58.0,181.0,0.32,14.4,7.2,44.95
19614,2018,CHI,Paul Zipser,54.0,824.0,218.0,37.0,110.0,0.336,7.49,3.74,22.27


#### EM3 : Expected Minutes Before a 3

This is my favorite statistic of the group. It's how long a player is expected to be on the court before making a 3. EM3 is AM3 divided by two.

In [25]:
# Define EM3, Expected Minutes before 3-pointer
tp_ind_qual['EM3'] = tp_ind_qual['AM3'] / 2

# Round to 2 decimal places
tp_ind_qual['EM3']=round(tp_ind_qual['EM3'], 2)

# Sort DataFrame by new category
tp_EM3 = tp_ind_qual.sort_values('EM3', ascending=True)

# Reset index
tp_EM3 = tp_EM3.reset_index(drop=True)

# Start index at 1 instead of 0
tp_EM3.index = tp_EM3.index + 1

#### EM3 Top Twenty

In [26]:
# Display top twenty seasons of all-time
tp_EM3.head(20)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A,EM3A,AM3,EM3
1,2016,GSW,Stephen Curry,79.0,2700.0,2375.0,402.0,886.0,0.454,3.05,1.52,6.72,3.36
2,2018,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,3.26,1.63,7.69,3.84
3,2012,NYK,Steve Novak,54.0,1020.0,477.0,133.0,282.0,0.472,3.62,1.81,7.67,3.84
4,2018,ORL,Marreese Speights,52.0,675.0,402.0,86.0,233.0,0.369,2.9,1.45,7.85,3.92
5,2016,CHO,Troy Daniels,43.0,476.0,242.0,59.0,122.0,0.484,3.9,1.95,8.07,4.04
6,2017,GSW,Stephen Curry,79.0,2638.0,1999.0,324.0,789.0,0.411,3.34,1.67,8.14,4.07
7,2015,DAL,Charlie Villanueva,64.0,678.0,403.0,83.0,221.0,0.376,3.07,1.54,8.17,4.08
8,2018,TOR,C.J. Miles,70.0,1337.0,699.0,164.0,454.0,0.361,2.94,1.47,8.15,4.08
9,2017,MEM,Troy Daniels,67.0,1183.0,551.0,138.0,355.0,0.389,3.33,1.66,8.57,4.28
10,2018,PHO,Troy Daniels,79.0,1622.0,703.0,183.0,458.0,0.4,3.54,1.77,8.86,4.43


EM3 Statistical Notes:<ul>
    <li> EM3 measures how quickly shooters make 3-pointers upon taking the court.</li>
    <li> More restrictive minimum requirements could eliminate reserves. I prefer leaving them in. </li>
     <li> Steph Curry's legendary 2016 MVP season is a clear #1. </li>
    </ul>

I prefer EM3A and EM3 to AM3A and AM3. They are shorter, more informative, and have a better ring. Since AM3A and AM3 are just doubles of EM3A and EM3, they can be eliminated without losing any valuable information.

In [27]:
# Delete extraneous columns
del tp_ind_qual['AM3A'] 
del tp_ind_qual['AM3']

I have reindexed twice, and expect to do so again. It's always better to write a function instead of copying and pasting.

In [28]:
# Define reindex function that starts at 1
def reindex_start_1(data):
    
    # Reset index
    data = data.reset_index(drop=True)

    # Start index at 1 instead of 0
    data.index = data.index + 1
    
    # Return new index
    return data.index

### 3NG

The 3-point statistics above are compelling, but they do not provide a single statistic to rank all 3-point shooters. This is where 3NG, or 3-point Net Gain, comes in. 3NG adds what the team gains beyond the expected value, and subtracts what the team loses beyond the expected value, for each 3-pointer attempted.

#### Points Per Possession

3NG depends on the expected value. Should the expected value be points per possession? Or points per field goal attempt? I have chosen points per possession since this is what a team is expected to earn each time it has the ball. I will use the mean points per possession over the 3-point era, 1980 through 2018, which are the seasons kept in df_PPP. The statistic itself was first computed in 1974.

In [29]:
# Define ev, expected value, as points per possession
ev = df_PPP['PPP'].mean()

# Display ev
print('Avg. Points Per Possession:', ev)

Avg. Points Per Possession: 1.0644871794871795


This is very close to what current teams average, roughly 1.09 (1.086 in 2018 and 1.088 in 2017).

#### 3NG Formula

When a player makes a 3-pointer, the team gains an extra 3 points minus the expected value. When a player misses a 3-pointer, the team loses the expected value.
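
As a worked example of this arithmetic, the sketch below applies the formula to Stephen Curry's 2016 line from the EM3A table above (402 makes and 886 attempts in 79 games). The helper name `three_ng` is hypothetical. The per-game figures are rounded to two decimals first, which mirrors how the notebook's function handles them.

```python
def three_ng(threes_per_game, attempts_per_game, ev):
    """3NG per game: gain on makes minus loss on misses."""
    misses_per_game = attempts_per_game - threes_per_game
    return threes_per_game * (3 - ev) - misses_per_game * ev


ev = 1.0644871794871795    # mean points per possession printed above

tpg = round(402 / 79, 2)   # 5.09 three-pointers made per game
tpag = round(886 / 79, 2)  # 11.22 three-pointers attempted per game

print(round(three_ng(tpg, tpag, ev), 2))  # 3.33
```

Algebraically the formula reduces to `3 * makes - ev * attempts`, so every attempt costs the expected value and every make returns 3 points.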

In [30]:
# Function that returns a dataframe ordered by a new column, 3NG
def threeNG(data, threesMade, threesAttempted, totalGames, expectedValue, teams=False):

    # Compute 3PG, 3-pointers per Game, rounded to 2 decimal places
    data['3PG'] = round(threesMade / totalGames, 2)

    # Compute 3PAG, 3-point Attempts per Game, rounded to 2 decimal places
    data['3PAG'] = round(threesAttempted / totalGames, 2)

    # Compute 3-point Misses per Game
    tp_misses = data['3PAG'] - data['3PG']

    # Shorten notation for expectedValue
    ev = expectedValue

    # Compute 3NG, 3-point Net Gain, rounded to 2 decimal places
    # (3 - ev) is what the team gains per 3-pointer made
    # -ev is what the team loses per 3-pointer missed
    data['3NG'] = round(data['3PG'] * (3 - ev) - tp_misses * ev, 2)

    # Sort dataframe by 3NG
    data = data.sort_values('3NG', ascending=False)

    # Reset index for individuals only (not teams)
    if not teams:
        data.index = reindex_start_1(data)

    return data

#### Apply 3NG to Individuals

In [31]:
# Define expected value as the mean points per possession during 3-point era
expected_value = df_PPP['PPP'].mean()

# Apply 3NG to qualified individual stats
tp_ind_qual = threeNG(tp_ind_qual, tp_ind_qual['3P'], tp_ind_qual['3PA'], tp_ind_qual['G'], expected_value)

# Apply 3NG to individual stats
tp_ind = threeNG(tp_ind, tp_ind['3P'], tp_ind['3PA'], tp_ind['G'], expected_value)

#### Apply 3NG to Teams

In [32]:
# Group by Team and Year, and sum numeric columns
tp_teams = tp_team.groupby(['Tm','Year']).sum(numeric_only=True)

# Give correct 3-point percentage, not sum
tp_teams['3P%'] = tp_teams['3P']/tp_teams['3PA']

# Apply 3NG to team statistics
# Note: dividing by 82 assumes a full season; lockout-shortened seasons (1999, 2012) had fewer games
tp_teams = threeNG(tp_teams, tp_teams['3P'], tp_teams['3PA'], 82, expected_value, teams=True)

## 3NG Rankings

#### The Top 20

In [33]:
# Display top 20 3-point shooting seasons of all-time
tp_ind_qual.head(20)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
1,2016,GSW,Stephen Curry,79.0,2700.0,2375.0,402.0,886.0,0.454,1.52,3.36,5.09,11.22,3.33
2,2015,ATL,Kyle Korver,75.0,2418.0,911.0,221.0,449.0,0.492,2.7,5.47,2.95,5.99,2.47
3,2013,GSW,Stephen Curry,78.0,2983.0,1786.0,272.0,600.0,0.453,2.48,5.48,3.49,7.69,2.28
4,2015,GSW,Stephen Curry,80.0,2613.0,1900.0,286.0,646.0,0.443,2.02,4.57,3.58,8.07,2.15
5,2016,LAC,J.J. Redick,75.0,2097.0,1226.0,200.0,421.0,0.475,2.49,5.24,2.67,5.61,2.04
6,2018,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,1.63,3.84,4.16,9.82,2.03
7,2014,ATL,Kyle Korver,71.0,2408.0,850.0,185.0,392.0,0.472,3.07,6.51,2.61,5.52,1.95
8,1997,CHH,Glen Rice,79.0,3362.0,2115.0,207.0,440.0,0.47,3.82,8.12,2.62,5.57,1.93
9,2018,GSW,Klay Thompson,73.0,2506.0,1461.0,229.0,520.0,0.44,2.41,5.47,3.14,7.12,1.84
10,2012,NYK,Steve Novak,54.0,1020.0,477.0,133.0,282.0,0.472,1.81,3.84,2.46,5.22,1.82


3NG Statistical Notes:<ul>
    <li> Steph Curry's legendary MVP season is head and shoulders above the rest, and he dominates the list as a player.</li> 
    <li> 3NG does a nice job of comparing 3-point shooters over the years. </li>
    <li> 3NG has real meaning. It conveys the actual points a team gains beyond the average by the player shooting 3-pointers. </li>
    </ul>

#### Weighted

It's telling to use the same measure, mean points per possession, across all years. But is it justifiable? Teams score more points per possession these days, so it could be argued that 3-pointers were more valuable in years past. The expected value can be weighted, by taking the mean points per possession for each given year. 
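
Before the full function, here is a small sketch of the idea on three player-seasons. The per-game figures come from the top-20 table above and the points-per-possession values from df_PPP. The toy dataframe names are my own, and `how='left'` is a choice made for this illustration so that a season without a PPP value would show up as NaN instead of silently disappearing.

```python
import pandas as pd

# Three player-seasons with per-game makes and attempts (from the top-20 table above)
players = pd.DataFrame({
    'Year':   [2016, 2015, 2013],
    'Player': ['Stephen Curry', 'Kyle Korver', 'Stephen Curry'],
    '3PG':    [5.09, 2.95, 3.49],
    '3PAG':   [11.22, 5.99, 7.69],
})

# League points per possession for those seasons (from df_PPP above)
ppp = pd.DataFrame({'Year': [2016, 2015, 2013], 'PPP': [1.064, 1.056, 1.058]})

# Attach each season's own expected value, then apply the 3NG formula per row
merged = players.merge(ppp, on='Year', how='left')
misses = merged['3PAG'] - merged['3PG']
merged['3NG/w'] = (merged['3PG'] * (3 - merged['PPP']) - misses * merged['PPP']).round(2)

print(merged[['Year', 'Player', '3NG/w']])
```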

In [34]:
# Function to compute weighted 3NG from dataframe that already contains 3NG
def threeNG_weighted(data, teams=False):

    # Merge df_PPP, dataframe with 'Year' and 'PPP', with the current dataframe on 'Year'
    data = data.merge(df_PPP, on='Year')

    # Compute 3-point Misses per Game
    tp_misses =data['3PAG'] - data['3PG']
                          
    # Compute 3NG using weighted expected value
    data['3NG/w'] = data['3PG'] * (3 - data['PPP']) - tp_misses * data['PPP']

    # Round to 2 decimal places
    data['3NG/w']=round(data['3NG/w'], 2)
    
    # Sort dataframe by 3NG/w
    data = data.sort_values('3NG/w', ascending=False)
    
    # Reset index for individuals only (not teams)
    if not teams:
        data.index = reindex_start_1(data)
    
    # Keep dataframe tight by eliminating unnecessary columns
    data.drop(['MP','PPP'], axis=1, inplace=True)
    
    # Return dataframe with 3NG/w
    return data

#### The Top 20, Weighted

In [35]:
# Create dataframe that includes weighted 3NG
tp_ind_qual_w = threeNG_weighted(tp_ind_qual)

# Show top twenty weighted 3NG
tp_ind_qual_w.head(20)

Unnamed: 0,Year,Tm,Player,G,PTS,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG,3NG/w
1,2016,GSW,Stephen Curry,79.0,2375.0,402.0,886.0,0.454,1.52,3.36,5.09,11.22,3.33,3.33
2,2015,ATL,Kyle Korver,75.0,911.0,221.0,449.0,0.492,2.7,5.47,2.95,5.99,2.47,2.52
3,2013,GSW,Stephen Curry,78.0,1786.0,272.0,600.0,0.453,2.48,5.48,3.49,7.69,2.28,2.33
4,2015,GSW,Stephen Curry,80.0,1900.0,286.0,646.0,0.443,2.02,4.57,3.58,8.07,2.15,2.22
5,2016,LAC,J.J. Redick,75.0,1226.0,200.0,421.0,0.475,2.49,5.24,2.67,5.61,2.04,2.04
6,2002,MIL,Ray Allen,69.0,1503.0,229.0,528.0,0.434,2.39,5.52,3.32,7.65,1.82,1.97
7,2014,ATL,Kyle Korver,71.0,850.0,185.0,392.0,0.472,3.07,6.51,2.61,5.52,1.95,1.95
8,2012,NYK,Steve Novak,54.0,477.0,133.0,282.0,0.472,1.81,3.84,2.46,5.22,1.82,1.92
9,1997,CHH,Glen Rice,79.0,2115.0,207.0,440.0,0.47,3.82,8.12,2.62,5.57,1.93,1.92
10,2004,SAC,Peja Stojakovic,81.0,1964.0,240.0,554.0,0.433,2.94,6.8,2.96,6.84,1.6,1.84


The values are very close. Some players from earlier eras, like Ray Allen, move up the list, but others, like Glen Rice, actually move down. It depends on how many points per possession the league averaged that year. Consider Klay Thompson's 2018 drop from 9 to 16. Was his 3-point season less valuable because the league as a whole scored more points per possession that year?

I prefer unweighted as the default statistic because it provides one basis of comparison. I do not think that shooters lose value because others have improved. For now, I will return to unweighted as the default ranking.

#### 2018 League Leaders

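We can check the league leaders for any given year. Within a single year, weighted and unweighted 3NG each apply one expected value to every player, so the two orders will be very similar. They are not guaranteed to be identical, because a higher expected value penalizes high-volume shooters more.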

In [36]:
# Create 2018 dataframe
tp_2018 = tp_ind_qual[tp_ind_qual['Year']==2018.0]

# Reset index
tp_2018.index = reindex_start_1(tp_2018)

# Show top 10 3NG
tp_2018.head(10)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
1,2018,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,1.63,3.84,4.16,9.82,2.03
2,2018,GSW,Klay Thompson,73.0,2506.0,1461.0,229.0,520.0,0.44,2.41,5.47,3.14,7.12,1.84
3,2018,UTA,Joe Ingles,82.0,2578.0,940.0,204.0,464.0,0.44,2.78,6.32,2.49,5.66,1.45
4,2018,PHI,J.J. Redick,70.0,2116.0,1198.0,193.0,460.0,0.42,2.3,5.48,2.76,6.57,1.29
5,2018,CLE,Kyle Korver,73.0,1574.0,672.0,164.0,376.0,0.436,2.1,4.8,2.25,5.15,1.27
6,2018,DET,Reggie Bullock,62.0,1732.0,698.0,125.0,281.0,0.445,3.08,6.93,2.02,4.53,1.24
7,2018,SAC,Buddy Hield,80.0,2024.0,1079.0,176.0,408.0,0.431,2.48,5.75,2.2,5.1,1.17
8,2018,GSW,Kevin Durant,68.0,2325.0,1792.0,173.0,413.0,0.419,2.82,6.72,2.54,6.07,1.16
9,2018,DET,Anthony Tolliver,79.0,1757.0,703.0,159.0,365.0,0.436,2.4,5.52,2.01,4.62,1.11
10,2018,BOS,Kyrie Irving,60.0,1931.0,1466.0,166.0,407.0,0.408,2.37,5.82,2.77,6.78,1.09


The Golden State Warriors dominate the list. What about the Houston Rockets? They made more 3-pointers in 2018 than any team in NBA history.

#### 2018 Warriors v Rockets

In [37]:
# Create 2018 dataframe for GSW and HOU with qualified individuals only
tp_2018_GSW_HOU_qual = tp_2018[(tp_2018['Tm']=='GSW') | (tp_2018['Tm']=='HOU')]

# Display DataFrame with 2018 rankings 
tp_2018_GSW_HOU_qual

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
1,2018,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,1.63,3.84,4.16,9.82,2.03
2,2018,GSW,Klay Thompson,73.0,2506.0,1461.0,229.0,520.0,0.44,2.41,5.47,3.14,7.12,1.84
8,2018,GSW,Kevin Durant,68.0,2325.0,1792.0,173.0,413.0,0.419,2.82,6.72,2.54,6.07,1.16
47,2018,HOU,Chris Paul,58.0,1847.0,1081.0,144.0,379.0,0.38,2.44,6.42,2.48,6.53,0.49
52,2018,HOU,Ryan Anderson,66.0,1725.0,617.0,131.0,339.0,0.386,2.54,6.58,1.98,5.14,0.47
67,2018,HOU,James Harden,72.0,2551.0,2191.0,265.0,722.0,0.367,1.76,4.82,3.68,10.03,0.36
77,2018,GSW,Nick Young,80.0,1393.0,581.0,123.0,326.0,0.377,2.13,5.66,1.54,4.08,0.28
78,2018,HOU,Trevor Ariza,67.0,2269.0,782.0,170.0,462.0,0.368,2.46,6.68,2.54,6.9,0.28
97,2018,HOU,P.J. Tucker,82.0,2281.0,502.0,115.0,310.0,0.371,3.68,9.91,1.4,3.78,0.18
120,2018,HOU,Eric Gordon,69.0,2154.0,1243.0,218.0,608.0,0.359,1.77,4.94,3.16,8.81,0.1


Golden State is at the top and bottom, while Houston dominates the middle. MVP James Harden is 67th overall. To find out which team is the best at shooting 3's at full strength, we can sum 3NG over each team's qualified shooters.

In [38]:
# Sum 3NG for GSW and HOU
tp_2018_GSW_HOU_qual.groupby('Tm')['3NG'].sum()

Tm
GSW    4.32
HOU    1.95
Name: 3NG, dtype: float64

Trades, injuries, and unqualified players complicate matters. If we instead use each team's 3-point totals over the entire season, we get different results.

In [39]:
# Limit team statistics to 2018
tp_teams_2018 = tp_teams.iloc[tp_teams.index.get_level_values('Year') == 2018]

# Only choose GSW or HOU
tp_teams_2018_GSW_HOU = tp_teams_2018.iloc[(tp_teams_2018.index.get_level_values('Tm') == 'GSW') | (tp_teams_2018.index.get_level_values('Tm') == 'HOU')]

# Display 3NG only
tp_teams_2018_GSW_HOU['3NG']

Tm   Year
GSW  2018    3.12
HOU  2018    0.91
Name: 3NG, dtype: float64

Golden State is the clear winner, more than doubling Houston on both counts. How do the Warriors rank historically?

#### Best 3-Point Shooting Teams of All-Time

In [40]:
# Convert teams dataframe to top 20 showing 3NG only
pd.DataFrame(tp_teams['3NG'].head(20))

Tm,Year,3NG
GSW,2016,5.74
PHO,2010,3.72
CHH,1997,3.69
GSW,2015,3.53
PHO,2006,3.41
PHO,2007,3.2
GSW,2018,3.12
CLE,2017,2.95
GSW,2013,2.88
PHO,2005,2.83


The Warriors and the 7-seconds-or-less Suns dominate the list. The Charlotte Hornets from '97 are a surprise until one recalls that they had Dell Curry and Glen Rice.

The top twenty has the following distribution:

NBA Champions: 6/20 - 30%

NBA Finals: 8/20 - 40%

Conference Finalists: 12/20 - 60%

Conference Semi-finalists: 16/20 - 80%

Made Playoffs: 20/20 - 100%

It's not a stretch to say that 3NG can be a predictor of playoff success. 

See https://basketball.realgm.com/nba/playoffs/history.
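
The percentages in the distribution above are simple shares of the top 20. A minimal sketch of the arithmetic, with the counts copied from the list above (the variable names are illustrative only):

```python
# Counts out of the top 20 team seasons, copied from the list above
top_n = 20
counts = {
    'NBA Champions': 6,
    'NBA Finals': 8,
    'Conference Finalists': 12,
    'Conference Semi-finalists': 16,
    'Made Playoffs': 20,
}

# Convert each count to a share of the top 20
shares = {stage: count / top_n for stage, count in counts.items()}

for stage, share in shares.items():
    print(f'{stage}: {counts[stage]}/{top_n} - {share:.0%}')
```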

#### Best of the 90s

In [41]:
# Create 90s DataFrame
tp_90s = tp_ind_qual[(tp_ind_qual['Year']<2000) & (tp_ind_qual['Year']>1989)]

# Reset index
tp_90s.index = reindex_start_1(tp_90s)

# Display top 20 3-point shooting seasons of the 90s
tp_90s.head(20)

Rank,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
1,1997,CHH,Glen Rice,79.0,3362.0,2115.0,207.0,440.0,0.47,3.82,8.12,2.62,5.57,1.93
2,1995,PHI,Dana Barros,82.0,3318.0,1686.0,197.0,425.0,0.464,3.9,8.42,2.4,5.18,1.69
3,1996,ORL,Dennis Scott,82.0,3041.0,1431.0,267.0,628.0,0.425,2.42,5.7,3.26,7.66,1.63
4,1996,WSB,Tim Legler,77.0,1775.0,726.0,128.0,245.0,0.522,3.62,6.94,1.66,3.18,1.59
5,1996,SAC,Mitch Richmond*,81.0,2946.0,1872.0,225.0,515.0,0.437,2.86,6.54,2.78,6.36,1.57
6,1997,IND,Reggie Miller*,81.0,2966.0,1751.0,229.0,536.0,0.427,2.76,6.48,2.83,6.62,1.44
7,1996,CHI,Steve Kerr,82.0,1919.0,688.0,122.0,237.0,0.515,4.05,7.86,1.49,2.89,1.39
8,1996,NYK,Hubert Davis,74.0,1773.0,789.0,127.0,267.0,0.476,3.32,6.98,1.72,3.61,1.32
9,1997,SAC,Mitch Richmond*,81.0,3125.0,2095.0,204.0,477.0,0.428,3.28,7.66,2.52,5.89,1.29
10,1998,CLE,Wesley Person,82.0,3198.0,1204.0,192.0,447.0,0.43,3.58,8.33,2.34,5.45,1.22


The big 90s shooters are all here: Glen Rice, Reggie Miller, Dell Curry, Dennis Scott, Mitch Richmond, Dale Ellis.
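
The decade filter in the cell above can also be written with `Series.between`, which is inclusive on both ends by default. The sketch below uses a made-up `toy` frame with only a `Year` column to show that the two masks agree for whole-number years.

```python
import pandas as pd

# Toy stand-in for tp_ind_qual with only a Year column (values are made up)
toy = pd.DataFrame({'Year': [1989.0, 1990.0, 1995.0, 1999.0, 2000.0]})

# Two comparisons joined with &, as in the 90s cell above
mask_two_sided = (toy['Year'] < 2000) & (toy['Year'] > 1989)

# Equivalent inclusive range check
mask_between = toy['Year'].between(1990, 1999)

print(mask_two_sided.equals(mask_between))
print(toy[mask_between]['Year'].tolist())
```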

#### Best of the 80s

In [42]:
# Create 80s dataframe; no lower bound is needed because
# the 3-point shot first appears in the 1980 season (1979-80)
tp_80s = tp_ind_qual[tp_ind_qual['Year']<1990]

# Reset index
tp_80s.index = reindex_start_1(tp_80s)

# Display top 20
tp_80s.head(20)

Rank,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
1,1989,SEA,Dale Ellis,82.0,3190.0,2253.0,162.0,339.0,0.478,4.7,9.85,1.98,4.13,1.54
2,1988,TOT,Craig Hodges,66.0,1445.0,629.0,86.0,175.0,0.491,4.13,8.4,1.3,2.65,1.08
3,1988,BOS,Danny Ainge,81.0,3018.0,1270.0,148.0,357.0,0.415,4.22,10.2,1.83,4.41,0.8
4,1988,CLE,Mark Price,80.0,2626.0,1279.0,72.0,148.0,0.486,8.87,18.24,0.9,1.85,0.73
5,1987,BOS,Danny Ainge,71.0,2499.0,1053.0,85.0,192.0,0.443,6.51,14.7,1.2,2.7,0.73
6,1989,CLE,Mark Price,75.0,2728.0,1414.0,93.0,211.0,0.441,6.46,14.66,1.24,2.81,0.73
7,1986,MIL,Craig Hodges,66.0,1739.0,716.0,73.0,162.0,0.451,5.36,11.91,1.11,2.45,0.72
8,1989,MIA,Jon Sundvold,68.0,1338.0,709.0,48.0,92.0,0.522,7.27,13.94,0.71,1.35,0.69
9,1988,SEA,Dale Ellis,75.0,2790.0,1938.0,107.0,259.0,0.413,5.38,13.04,1.43,3.45,0.62
10,1989,TOT,Craig Hodges,59.0,1204.0,529.0,75.0,180.0,0.417,3.34,8.02,1.27,3.05,0.56


Craig Hodges. Mark Price. Larry Bird. Danny Ainge. More Dale Ellis.

#### Career Totals

Which players have gained the most points by shooting 3's over their entire careers?

In [43]:
# Create 3NG/c using same formula as 3NG, but use totals instead of per game
tp_ind['3NG/c'] = tp_ind['3P'] * (3 - ev) - (tp_ind['3PA'] - tp_ind['3P']) * ev

# Round to 2 decimal places
tp_ind['3NG/c'] = round(tp_ind['3NG/c'], 2)

# Group by player, and sum over their career
tp_player = tp_ind.groupby('Player', as_index=False)['3NG/c'].sum()

# Order from the top
tp_player = tp_player.sort_values('3NG/c', ascending=False)

# Reindex
tp_player.index = reindex_start_1(tp_player)

# Display top 25
tp_player.head(25)

Rank,Player,3NG/c
1,Stephen Curry,1192.3
2,Kyle Korver,1178.17
3,Ray Allen,1010.91
4,Steve Nash,861.98
5,Reggie Miller*,775.75
6,Klay Thompson,741.98
7,J.J. Redick,640.76
8,Dale Ellis,615.9
9,Mike Miller,607.86
10,Peja Stojakovic,604.77


The only players on this list who are neither retired nor at the end of their careers are Steph Curry and Klay Thompson. It's astonishing that Steph Curry is already number one.

This list is a much better ranking than career 3-pointers Made, unless you think Jason Kidd was a better 3-point shooter than Peja Stojakovic, and Vince Carter was better than Dale Ellis.

See https://www.basketball-reference.com/leaders/fg3_career.html for the comparison.
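
The career formula used in this section expands to a simpler form: 3P * (3 - ev) - (3PA - 3P) * ev = 3 * 3P - ev * 3PA. In other words, a shooter earns 3 points per make and is charged `ev` for every attempt. The sketch below checks the identity numerically on Stephen Curry's 2018 totals from the league-leader table (212 makes on 501 attempts). The value `ev_assumed` is a placeholder chosen for illustration, not the notebook's actual `ev`, and the function names are hypothetical.

```python
import math

def net_gain(made, attempted, ev):
    """Points gained on makes minus points lost on misses, relative to ev."""
    return made * (3 - ev) - (attempted - made) * ev

def net_gain_simplified(made, attempted, ev):
    """Algebraically equivalent form: 3 points per make minus ev per attempt."""
    return 3 * made - ev * attempted

ev_assumed = 1.06  # hypothetical expected points per possession
made, attempted = 212, 501  # Stephen Curry, 2018 (3P and 3PA from the table)

a = net_gain(made, attempted, ev_assumed)
b = net_gain_simplified(made, attempted, ev_assumed)

# The two forms agree up to floating-point rounding
print(math.isclose(a, b))
```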

#### Career Averages

We can also examine career averages by summing a player's 3NG over every qualified season and dividing by the number of qualified seasons.

In [44]:
# Aggregate 3NG sum and total years for each player
tp_3Py = tp_ind_qual.groupby('Player', as_index=False).agg({'3NG':'sum','Year':'count'})

# Only select players with at least 5 qualified seasons
tp_3Py = tp_3Py[tp_3Py['Year']>=5]

# To obtain career averages, divide 3NG sum by total years
tp_3Py['3NG/y'] = tp_3Py['3NG']/tp_3Py['Year']

# Round to 2 decimal places
tp_3Py['3NG/y'] = round(tp_3Py['3NG/y'], 2)

# Sort values in descending order
tp_3Py = tp_3Py[['Player','3NG/y']].sort_values('3NG/y', ascending=False) 

# Reset index
tp_3Py.index = reindex_start_1(tp_3Py)

# Show top 25 career 3NG averages
tp_3Py.head(25)

Rank,Player,3NG/y
1,Stephen Curry,1.94
2,Klay Thompson,1.38
3,Kyle Korver,1.08
4,J.J. Redick,0.91
5,Hubert Davis,0.83
6,Ray Allen,0.78
7,Steve Novak,0.76
8,Steve Nash,0.71
9,Peja Stojakovic,0.69
10,Dennis Scott,0.68


Steph Curry doubles everyone on the list except for teammate Klay Thompson and Kyle Korver. By all counts, he's the greatest 3-point shooter of all-time.

## Conclusion

Three new NBA statistics have been presented: EM3A, EM3, and 3NG. EM3A, Expected Minutes before a 3-point Attempt, could be of value to coaches preparing for opponents and working with their own players. EM3, Expected Minutes before a 3, is a fun statistic that could be used for similar reasons. 3NG, 3-point Net Gain, is a powerful statistic that provides a single number to rank 3-point shooters across all seasons.

3NG rewards players for making 3-point shots and penalizes them for missing. Players who make a lot of 3s but shoot a low percentage are exposed as making only slight contributions to their teams. Players who shoot a high percentage need to make a high volume to be competitive. 3NG rankings are statistically verifiable while simultaneously communicating valuable information.

3NG reveals the net gain in points beyond the league average that a player adds to his team by shooting 3-pointers. It can be weighted, summed, or displayed as per game averages. It can be used as a barometer to determine whether a player shooting 3-pointers results in a net gain or net loss for the team. It can also be used to predict playoff success.
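
The net gain versus net loss question reduces to a break-even percentage. Because the net gain per attempt is 3 * (3P%) - ev, 3NG is positive exactly when a player's 3P% exceeds ev / 3. A minimal sketch follows, where `ev_assumed` is an illustrative placeholder rather than the notebook's `ev`, and the function names are hypothetical.

```python
def break_even_pct(ev):
    """3-point percentage at which net gain is zero: 3 * pct - ev = 0."""
    return ev / 3

def net_gain_per_attempt(pct, ev):
    """Average points gained per 3-point attempt relative to ev."""
    return 3 * pct - ev

ev_assumed = 1.06  # hypothetical expected points per possession
threshold = break_even_pct(ev_assumed)
print(f'break-even 3P%: {threshold:.3f}')

# 0.423 is Stephen Curry's 2018 3P% from the table; 0.330 is a made-up shooter
for pct in (0.423, 0.330):
    gain = net_gain_per_attempt(pct, ev_assumed)
    print(pct, 'net gain' if gain > 0 else 'net loss')
```

Volume then scales the result: a shooter above the threshold adds more the more he shoots, while a shooter below it subtracts more.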

3NG can be further used to analyze playoff performers and clutch 3-point shooters. It can be applied to basketball seasons past and future. It can be used in any league (WNBA, college, high school, etc.), provided that an appropriate expected value, like points per possession, is used as a baseline of comparison.