
# NBA 3-Point Statistics

## Table of Contents
1. [Introduction](#Introduction)
2. [Data Wrangling](#Data-Wrangling)
3. [New NBA Stats](#New-NBA-Stats)
4. [Rankings](#Rankings)
5. [Conclusion](#Conclusion)


## Introduction

There has never been a single metric to determine the best NBA 3-Point shooters. Most fans know the reality from watching the games. Since Steph Curry makes the most, and he shoots a very high percentage, he's widely considered the greatest 3-point shooter of all-time.

Objectively, most fans consider two numbers, 3-Point Percentage, and 3 Pointers Made. Antoine Walker made a ton of threes for Boston in the 90s, but he shot a very low percentage. Steve has the highest shooting percentage of all-time, but he took a very low volume of shots by today's standards.

Is there an empirical way to combine 3-point Percentage and 3-pointers Made into one statistic? Will the statistic verify that Steph Curry is the greatest 3-point shooter of all-time? What about the best 3-point shooting team of all-time?

To answer these questions, I develped 3-point Net Gain, 3PNG. This metric computes the number of points a team gains per possession when the player makes a 3-pointer, minus the number of points a team loses when the player misses a 3-pointer. This metric is evaluated based on the expected value of points on an average possession.

This Jupyter Notebook contains Exploratory Data Analysis of 3-point shooters throughout NBA History. Data Wrangling steps are included for those with an interest in learning pandas, Python's framework for data analysis. Those not interested in pandas can skip directly to [New NBA Stats](#New-NBA-Stats).


## Outline
    


#### References

https://www.kaggle.com/drgilermo/nba-players-stats <br>
https://www.basketball-reference.com/leagues/NBA_2018_totals.html

#### Copyright

Corey J Wade<br>
May 29, 2018

This Jupyter Notebook and the statistics within may be redistributed with credit given to the author, Corey J Wade.


## Data Wrangling

The following csv file is taken from https://www.kaggle.com/drgilermo/nba-players-stats. When I downloaded the file, it contained standard statistics through 2017. Dr. Guillermo scraped it from https://www.basketball-reference.com/. 

#### NBA Stats Through 2017

In [207]:
# import pandas
import pandas as pd

# open file as dataframe, courtesy of Dr. Guillermo via Kaggle
df_2017 = pd.read_csv('Seasons_Stats.csv')

# display first five rows
df_2017.head()

Unnamed: 0.1,Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,PER,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,0,1950.0,Curly Armstrong,G-F,31.0,FTW,63.0,,,,...,0.705,,,,176.0,,,,217.0,458.0
1,1,1950.0,Cliff Barker,SG,29.0,INO,49.0,,,,...,0.708,,,,109.0,,,,99.0,279.0
2,2,1950.0,Leo Barnhorst,SF,25.0,CHS,67.0,,,,...,0.698,,,,140.0,,,,192.0,438.0
3,3,1950.0,Ed Bartels,F,24.0,TOT,15.0,,,,...,0.559,,,,20.0,,,,29.0,63.0
4,4,1950.0,Ed Bartels,F,24.0,DNN,13.0,,,,...,0.548,,,,20.0,,,,27.0,59.0


Statistics were not widely computed before the modern era, hence the null values. Also, the 3-point shot did not exist before 1979, so we can start there.

In [208]:
# delete unnecessary column
del df_2017['Unnamed: 0']

# only select years after 1979
df_2017 = df[df['Year']>=1979]

# display last five rows
df_2017.tail()

Unnamed: 0.1,Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,PER,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
24686,24686,2017.0,Cody Zeller,PF,24.0,CHO,62.0,58.0,1725.0,16.7,...,0.679,135.0,270.0,405.0,99.0,62.0,58.0,65.0,189.0,639.0
24687,24687,2017.0,Tyler Zeller,C,27.0,BOS,51.0,5.0,525.0,13.0,...,0.564,43.0,81.0,124.0,42.0,7.0,21.0,20.0,61.0,178.0
24688,24688,2017.0,Stephen Zimmerman,C,20.0,ORL,19.0,0.0,108.0,7.3,...,0.6,11.0,24.0,35.0,4.0,2.0,5.0,3.0,17.0,23.0
24689,24689,2017.0,Paul Zipser,SF,22.0,CHI,44.0,18.0,843.0,6.9,...,0.775,15.0,110.0,125.0,36.0,15.0,16.0,40.0,78.0,240.0
24690,24690,2017.0,Ivica Zubac,C,19.0,LAL,38.0,11.0,609.0,17.0,...,0.653,41.0,118.0,159.0,30.0,14.0,33.0,30.0,66.0,284.0


As expected, all visible columns are full of data.

#### 2018 NBA Stats

The 2018 NBA season recently finished. I used the same link, https://www.basketball-reference.com/, to scrape the 2018 statistics.

In [209]:
# read html file
df_2018, = pd.read_html("https://www.basketball-reference.com/leagues/NBA_2018_totals.html", header=0)

# convert to csv file
df_2018.to_csv("tp_2016.csv", index=False)

# display first five rows
df_2018.head()

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Alex Abrines,SG,24,OKC,75,8,1134,115,291,...,0.848,26,88,114,28,38,8,25,124,353
1,2,Quincy Acy,PF,27,BRK,70,8,1359,130,365,...,0.817,40,216,256,57,33,29,60,149,411
2,3,Steven Adams,C,24,OKC,76,76,2487,448,712,...,0.557,384,301,685,88,92,78,128,215,1056
3,4,Bam Adebayo,C,20,MIA,69,19,1368,174,340,...,0.721,118,263,381,101,32,41,66,138,477
4,5,Arron Afflalo,SG,32,ORL,53,3,682,65,162,...,0.846,4,62,66,30,4,9,21,56,179


Since there is no column for year, I will add one.

In [210]:
# delete unnecessary column
del df_2018['Rk']

# add column for year, place at index 0
df_2018.insert(0, 'Year', 2018.0)

# display last five rows
df_2018.tail()

Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
685,2018.0,Tyler Zeller,C,28,BRK,42,33,703,125,229,...,0.667,63,131,194,28,8,21,35,78,300
686,2018.0,Tyler Zeller,C,28,MIL,24,1,406,62,105,...,0.895,47,64,111,19,7,14,12,48,141
687,2018.0,Paul Zipser,SF,23,CHI,54,12,824,81,234,...,0.731,13,118,131,46,20,15,43,86,218
688,2018.0,Ante Zizic,C,21,CLE,32,2,214,49,67,...,0.724,24,36,60,5,2,13,11,30,119
689,2018.0,Ivica Zubac,C,20,LAL,43,0,410,61,122,...,0.765,45,78,123,25,8,15,26,47,161


#### Concatenating Dataframes

Since the dataframes have a different number of columns, and I clearly do not need all of the columns since my focus is on 3-pointers, I will select the relevant columns before concatenating.

In [211]:
# select relevant columns
tp_2017 = df_2017[['Year', 'Tm', 'Player', 'G','MP', 'PTS', '3P', '3PA', '3P%']]
tp_2018 = df_2018[['Year', 'Tm', 'Player', 'G','MP', 'PTS', '3P', '3PA', '3P%']]

# concatenate dataframes
tp = pd.concat([tp_2017, tp_2018], ignore_index=True, )

# show last five rows
tp.tail()

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%
19956,2018.0,BRK,Tyler Zeller,42,703,300,10,26,0.385
19957,2018.0,MIL,Tyler Zeller,24,406,141,0,2,0.0
19958,2018.0,CHI,Paul Zipser,54,824,218,37,110,0.336
19959,2018.0,CLE,Ante Zizic,32,214,119,0,0,
19960,2018.0,LAL,Ivica Zubac,43,410,161,0,1,0.0


#### Column Consistency

In [212]:
# display column info
tp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19961 entries, 0 to 19960
Data columns (total 9 columns):
Year      19961 non-null float64
Tm        19961 non-null object
Player    19961 non-null object
G         19961 non-null object
MP        19961 non-null object
PTS       19961 non-null object
3P        19617 non-null object
3PA       19617 non-null object
3P%       16041 non-null object
dtypes: float64(1), object(8)
memory usage: 1.4+ MB


With the exception of 'Year', the data has not been rendered as numbers. They must be converted to floats for mathematical operations.

In [213]:
# convert numeric columns to decimals
tp.G = pd.to_numeric(tp.G, errors='coerce')
tp.MP = pd.to_numeric(tp.MP, errors='coerce')
tp.PTS = pd.to_numeric(tp.PTS, errors='coerce')
tp['3P'] = pd.to_numeric(tp['3P'], errors='coerce')
tp['3PA'] = pd.to_numeric(tp['3PA'], errors='coerce')
tp['3P%'] = pd.to_numeric(tp['3P%'], errors='coerce')

# check columns
tp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19961 entries, 0 to 19960
Data columns (total 9 columns):
Year      19961 non-null float64
Tm        19961 non-null object
Player    19961 non-null object
G         19935 non-null float64
MP        19935 non-null float64
PTS       19935 non-null float64
3P        19591 non-null float64
3PA       19591 non-null float64
3P%       16015 non-null float64
dtypes: float64(7), object(2)
memory usage: 1.4+ MB


#### Minimum Requirements

It's not necessary to examine data from all players. For instance, if a player was never recorded as taking a 3-pointer, he should be excluded from the dataframe. The same holds for a player who only took one 3-pointer and made it. The purpose of the minimum requirements is to eliminate non-3-point shooters and very low outliers. Note that my minimum requirements are less stringent than many other NBA "qualified" statistics online.

In [214]:
# choose players with more than 20 3's
tp = tp[(tp['3P'] > 20)]

# chosee players with more than 320 mintes played
tp = tp[(tp['MP'] > 320)]

# choose players with at least 41 games
tp = tp[(tp['G'] > 41)]

# display last 5 rows
tp.tail()

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%
19948,2018.0,TOR,Delon Wright,69.0,1433.0,555.0,56.0,153.0,0.366
19951,2018.0,IND,Joe Young,53.0,558.0,207.0,25.0,66.0,0.379
19952,2018.0,GSW,Nick Young,80.0,1393.0,581.0,123.0,326.0,0.377
19953,2018.0,IND,Thaddeus Young,81.0,2607.0,955.0,58.0,181.0,0.32
19958,2018.0,CHI,Paul Zipser,54.0,824.0,218.0,37.0,110.0,0.336


In [215]:
tp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5036 entries, 354 to 19958
Data columns (total 9 columns):
Year      5036 non-null float64
Tm        5036 non-null object
Player    5036 non-null object
G         5036 non-null float64
MP        5036 non-null float64
PTS       5036 non-null float64
3P        5036 non-null float64
3PA       5036 non-null float64
3P%       5036 non-null float64
dtypes: float64(7), object(2)
memory usage: 393.4+ KB


Now all the columns have the same number of rows.

#### Points Per Possession

The last piece of Data Wrangling is points per possession. It will be used to compute the expected value of points each time a team has the ball. I obtained the team ratings through NBA history at https://www.basketball-reference.com/leagues/NBA_stats.html.

In [216]:
# read html file
df_teams, = pd.read_html("https://www.basketball-reference.com/leagues/NBA_stats.html", header=0)

# display first five rows
df_teams.drop(df_teams.index[0], inplace=True)

df_teams.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Per Game,Shooting,Advanced,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31
1,1,2017-18,NBA,26.4,6-7,218,1230,241.4,39.6,86.1,...,106.3,0.46,0.362,0.767,97.3,0.521,13.0,22.3,0.193,108.6
2,2,2016-17,NBA,26.6,6-7,220,1230,241.6,39.0,85.4,...,105.6,0.457,0.358,0.772,96.4,0.514,12.7,23.3,0.209,108.8
3,3,2015-16,NBA,26.7,6-7,221,1230,241.8,38.2,84.6,...,102.7,0.452,0.354,0.757,95.8,0.502,13.2,23.8,0.209,106.4
4,4,2014-15,NBA,26.7,6-7,222,1230,242.0,37.5,83.6,...,100.0,0.449,0.35,0.75,93.9,0.496,13.3,25.1,0.205,105.6
5,5,2013-14,NBA,26.5,6-7,223,1230,242.0,37.7,83.0,...,101.0,0.454,0.36,0.756,93.9,0.501,13.6,25.5,0.215,106.6


In [217]:
# choose relevant columns
df_PPP = df_teams[['Unnamed: 1','Unnamed: 31']]

# rename columns
df_PPP.columns = ['Year', 'PPP']

# show first give rows
df_PPP.head()

Unnamed: 0,Year,PPP
1,2017-18,108.6
2,2016-17,108.8
3,2015-16,106.4
4,2014-15,105.6
5,2013-14,106.6


In [218]:
df_PPP.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 78 entries, 1 to 78
Data columns (total 2 columns):
Year    75 non-null object
PPP     48 non-null object
dtypes: object(2)
memory usage: 1.8+ KB


In [219]:
# convert year column to year listed before the hyphen
df_PPP['Year'] = df_PPP['Year'].str.split('-').str[0]

# convert columns to numbers
df_PPP['Year'] = pd.to_numeric(df_PPP['Year'], errors='coerce')
df_PPP['PPP'] = pd.to_numeric(df_PPP['PPP'], errors='coerce')

# drop NaN values
df_PPP = df_PPP.dropna()

df_PPP

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  


Unnamed: 0,Year,PPP
1,2017.0,108.6
2,2016.0,108.8
3,2015.0,106.4
4,2014.0,105.6
5,2013.0,106.6
6,2012.0,105.8
7,2011.0,104.6
8,2010.0,107.3
9,2009.0,107.6
10,2008.0,108.3


This is exactly what I want except for one minor issues. NBA year are usually evaluated at the end of the year, not the beginning. I need to add one to each row in the 'Year' column.

In [220]:
# add 1 to each year to match tp dataframe
df_PPP['Year'] = df_PPP['Year'] + 1

# show first five entries
df_PPP.head()

Unnamed: 0,Year,PPP
1,2018.0,108.6
2,2017.0,108.8
3,2016.0,106.4
4,2015.0,105.6
5,2014.0,106.6


## New NBA Stats

### Expected Minutes

This first group of statistics computes the number of minutes players are on the court before attemping and making 3's.

#### 1) AM3A : Average Minutes per 3-point Attempt

A player's Average Minutes per 3-Point Attempt is total minutes played divided by total 3-pointers attempted.

In [221]:
# define new column, AM3A: Average Minutes per 3-point Attempt 
tp['AM3A'] = tp['MP'] / tp['3PA']

# show last five entrants
tp.tail()

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A
19948,2018.0,TOR,Delon Wright,69.0,1433.0,555.0,56.0,153.0,0.366,9.366013
19951,2018.0,IND,Joe Young,53.0,558.0,207.0,25.0,66.0,0.379,8.454545
19952,2018.0,GSW,Nick Young,80.0,1393.0,581.0,123.0,326.0,0.377,4.273006
19953,2018.0,IND,Thaddeus Young,81.0,2607.0,955.0,58.0,181.0,0.32,14.403315
19958,2018.0,CHI,Paul Zipser,54.0,824.0,218.0,37.0,110.0,0.336,7.490909


#### 2) EM3A : Expected Minutes before 3-point Attempt

The expected value of a continuous interval of time is typically at the halfway mark. Will Nick Young (listed above) take a 3 once he checks in, or after 4.27 minutes? His most likely value is halfway between, at 2.135 minutes. This is his expected minutes played before attempting a 3. 

In [222]:
# define new column, EM3A: Expected Minutes before 3-point Attempt
tp['EM3A'] = tp['AM3A'] / 2

# sort dataframe by new category
tp_EM3A = tp.sort_values('EM3A', ascending=True)

# view players who attempt 3s faster than anyone in NBA history
tp_EM3A.head(20)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A,EM3A
19859,2018.0,ORL,Marreese Speights,52.0,675.0,402.0,86.0,233.0,0.369,2.896996,1.448498
19699,2018.0,TOR,C.J. Miles,70.0,1337.0,699.0,164.0,454.0,0.361,2.944934,1.472467
18214,2016.0,GSW,Stephen Curry,79.0,2700.0,2375.0,402.0,886.0,0.454,3.047404,1.523702
18045,2015.0,DAL,Charlie Villanueva,64.0,678.0,403.0,83.0,221.0,0.376,3.067873,1.533937
19422,2018.0,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,3.255489,1.627745
18797,2017.0,MEM,Troy Daniels,67.0,1183.0,551.0,138.0,355.0,0.389,3.332394,1.666197
18796,2017.0,GSW,Stephen Curry,79.0,2638.0,1999.0,324.0,789.0,0.411,3.343473,1.671736
17584,2015.0,TOT,Troy Daniels,47.0,397.0,176.0,43.0,118.0,0.364,3.364407,1.682203
18870,2017.0,HOU,Eric Gordon,75.0,2323.0,1217.0,246.0,661.0,0.372,3.514372,1.757186
19710,2018.0,CHO,Malik Monk,63.0,854.0,421.0,83.0,243.0,0.342,3.514403,1.757202


Statistical Notes<ul>
    <li> Many players on the list come off the bench. EM3A does not distinguish between starters and reserves.</li>
    <li> Most top performers are from the last few years, due to the meteoric rise of NBA 3-pointers. </li>
     <li> Joe Hassett from 1982 is a shocker. </li>
    </ul>

#### 3) AM3P : Average Minutes per 3-Pointer

AMP3 is the player's total minutes divided by the number of 3-pointers made. This statistic is more interesting because we are now looking at makes instead of just attempts.

In [223]:
# define new column, AM3P: Average Minutes per 3-pointer
tp['AM3P'] = tp['MP']/tp['3P']

# show last five rows
tp.tail()

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A,EM3A,AM3P
19948,2018.0,TOR,Delon Wright,69.0,1433.0,555.0,56.0,153.0,0.366,9.366013,4.683007,25.589286
19951,2018.0,IND,Joe Young,53.0,558.0,207.0,25.0,66.0,0.379,8.454545,4.227273,22.32
19952,2018.0,GSW,Nick Young,80.0,1393.0,581.0,123.0,326.0,0.377,4.273006,2.136503,11.325203
19953,2018.0,IND,Thaddeus Young,81.0,2607.0,955.0,58.0,181.0,0.32,14.403315,7.201657,44.948276
19958,2018.0,CHI,Paul Zipser,54.0,824.0,218.0,37.0,110.0,0.336,7.490909,3.745455,22.27027


#### 4) EMB3 : Expected Minutes Before a 3

This is the best statistic of the group. It's how long a player is expected to be on the court before making a 3. As before, Average Minutes per 3-pointer is simply divided by two.

In [224]:
# define new column, EM3P: Expected Minutes before 3-pointer
tp['EM3P'] = tp['AM3P'] / 2

# sort dataframe by new category
tp_EM3P = tp.sort_values('EM3P', ascending=True)

# display top twenty seasons of all-time
tp_EM3P.head(20)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A,EM3A,AM3P,EM3P
18214,2016.0,GSW,Stephen Curry,79.0,2700.0,2375.0,402.0,886.0,0.454,3.047404,1.523702,6.716418,3.358209
16089,2012.0,NYK,Steve Novak,54.0,1020.0,477.0,133.0,282.0,0.472,3.617021,1.808511,7.669173,3.834586
19422,2018.0,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,3.255489,1.627745,7.693396,3.846698
19859,2018.0,ORL,Marreese Speights,52.0,675.0,402.0,86.0,233.0,0.369,2.896996,1.448498,7.848837,3.924419
18215,2016.0,CHO,Troy Daniels,43.0,476.0,242.0,59.0,122.0,0.484,3.901639,1.95082,8.067797,4.033898
18796,2017.0,GSW,Stephen Curry,79.0,2638.0,1999.0,324.0,789.0,0.411,3.343473,1.671736,8.141975,4.070988
19699,2018.0,TOR,C.J. Miles,70.0,1337.0,699.0,164.0,454.0,0.361,2.944934,1.472467,8.152439,4.07622
18045,2015.0,DAL,Charlie Villanueva,64.0,678.0,403.0,83.0,221.0,0.376,3.067873,1.533937,8.168675,4.084337
18797,2017.0,MEM,Troy Daniels,67.0,1183.0,551.0,138.0,355.0,0.389,3.332394,1.666197,8.572464,4.286232
19424,2018.0,PHO,Troy Daniels,79.0,1622.0,703.0,183.0,458.0,0.4,3.541485,1.770742,8.863388,4.431694


EM3P Statistical Notes:<ul>
    <li> EMB3, like 3-Point Percentage, levels the playing field between starters and reserves.</li> 
    <li> More restrictive minimum requirements would eliminate many players from this list. </li>
     <li> Steph Curry's legendary 2016 MVP season is a clear # 1. </li>
    </ul>

### 3-Point Average

Although the 3-point statistics above are compelling, they do not a provide a single metric to rank 3-point shooters. This is where 3-point Advantage comes in. Simply put, 3-Point Advantage adds the net gain per 3-pointer made, and subtracts the net loss per 3-pointer missed. 

#### Points Per Possession

Computing net gain and net loss depends on the expected value. Should the expected value be points per possession? Or points per field goal attempt? There is no clear answer. I have decided to utilize the strategy of points per possession. To level the rankings, I will use the mean average of points per possession throughout NBA history. Note that this statistic was first computed in 1974.

In [225]:
ev = df_PPP['PPP'].mean()/100
print(ev)

1.055422222222222


#### Computation

The computation is determined by expected value. When a player makes a 3-pointer, the team gains more than the expected value. How much more? An additional 3 points minus the expected value. When a player misses a 3-pointer, the team does worse than the expected value. How much worse? They lose the expected value.

In [226]:
# 3PAd, 3-Point Advantage

# Compute 3PG, 3-pointers per Game
tp['3PG']=tp['3P']/tp['G']

# Compute 3PAG, 3-point Attempts per Game
tp['3PAG']= tp['3PA']/tp['G']

# Compute 3PMi, 3-point Misses per Game
tp['3PMiG']=tp['3PAG']-tp['3PG']

# Declare expected value
ev = df_PPP['PPP'].mean()/100
                          
# Compute 3PAd, 3-point Advantage
tp['3PAd']=tp['3PG'] * (3 - ev) - tp['3PMiG'] * ev

# (3 - ev) is how much the team gains per 3-pointer made
# -ev is how much the team loses per 3-pointer missed

#### Weighted

It's nice to use the same measure, the mean points per possession, across all years. But is it really justifiable? Teams score more points per possession these days, so it could be argued that a 3-point shot is slightly less valuable than in years past. The expected value can be weighted, by taking the mean points per possession for each year. By changing the expected value, we have a different basis of comparison. 

In [227]:
# merge df_PPP, dataframe including 'Year' and 'PPP', with tp
tpw = tp.merge(df_PPP)

# Declare weighed expected value
evw = tpw['PPP']/100
                          
# Compute 3PAd, 3-point Advantage
tpw['3PAd/w']=tpw['3PG'] * (3 - evw) - tpw['3PMiG'] * evw

In [228]:
# merge df_PPP, dataframe including 'Year' and 'PPP', with tp
tp = tp.merge(df_PPP)

# Declare weighed expected value
evw = tp['PPP']/100
                          
# Compute 3PAd, 3-point Advantage
tp['3PAd/w']=tp['3PG'] * (3 - evw) - tp['3PMiG'] * evw

## Rankings

#### Top 40 Seasons

First, let's consider unweighted.

In [229]:
# sort dataframe by 3PAd
tp=tp.sort_values('3PAd', ascending=False)

# reset index
tp = tp.reset_index(drop=True)

# start index at 1 instead of 0
tp.index = tp.index + 1

In [230]:
# Display top 40 3-point shooting seasons of all-time
tp.head(40)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A,EM3A,AM3P,EM3P,3PG,3PAG,3PMiG,3PAd,PPP,3PAd/w
1,2016.0,GSW,Stephen Curry,79.0,2700.0,2375.0,402.0,886.0,0.454,3.047404,1.523702,6.716418,3.358209,5.088608,11.21519,6.126582,3.429062,106.4,3.332861
2,2015.0,ATL,Kyle Korver,75.0,2418.0,911.0,221.0,449.0,0.492,5.385301,2.69265,10.941176,5.470588,2.946667,5.986667,3.04,2.521539,105.6,2.51808
3,2013.0,GSW,Stephen Curry,78.0,2983.0,1786.0,272.0,600.0,0.453,4.971667,2.485833,10.966912,5.483456,3.487179,7.692308,4.205128,2.342906,105.8,2.323077
4,2015.0,GSW,Stephen Curry,80.0,2613.0,1900.0,286.0,646.0,0.443,4.044892,2.022446,9.136364,4.568182,3.575,8.075,4.5,2.202466,105.6,2.1978
5,2018.0,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,3.255489,1.627745,7.693396,3.846698,4.156863,9.823529,5.666667,2.102617,108.6,1.802235
6,2016.0,LAC,J.J. Redick,75.0,2097.0,1226.0,200.0,421.0,0.475,4.980998,2.490499,10.485,5.2425,2.666667,5.613333,2.946667,2.075563,106.4,2.027413
7,2014.0,ATL,Kyle Korver,71.0,2408.0,850.0,185.0,392.0,0.472,6.142857,3.071429,13.016216,6.508108,2.605634,5.521127,2.915493,1.989782,106.6,1.93138
8,1997.0,CHH,Glen Rice,79.0,3362.0,2115.0,207.0,440.0,0.47,7.640909,3.820455,16.241546,8.120773,2.620253,5.56962,2.949367,1.982459,106.7,1.917975
9,2018.0,GSW,Klay Thompson,73.0,2506.0,1461.0,229.0,520.0,0.44,4.819231,2.409615,10.943231,5.471616,3.136986,7.123288,3.986301,1.892883,108.6,1.675068
10,2002.0,MIL,Ray Allen,69.0,2525.0,1503.0,229.0,528.0,0.434,4.782197,2.391098,11.026201,5.5131,3.318841,7.652174,4.333333,1.880247,104.5,1.96


3PAd Statistical Notes:<ul>
    <li> Steph Curry's legendary MVP season is heads and shoulders above the rest, and he dominates the list.</li> 
    <li> The metric does an adequate job of comparing 3-point shooters over the years. </li>
     <li> Different expected values will produce different lists. </li>
    <li> The metric has real meaning. It conveys the points a team gains per game by the player shooting 3-pointers. </li>
    </ul>

#### Top 40 Seasons, Weighted

In [232]:
# sort dataframe by 3PAd
tpw=tp.sort_values('3PAd/w', ascending=False)

# Display top 40 3-point shooting weighted seasons of all-time
tpw.head(40)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,AM3A,EM3A,AM3P,EM3P,3PG,3PAG,3PMiG,3PAd,PPP,3PAd/w
1,2016.0,GSW,Stephen Curry,79.0,2700.0,2375.0,402.0,886.0,0.454,3.047404,1.523702,6.716418,3.358209,5.088608,11.21519,6.126582,3.429062,106.4,3.332861
2,2015.0,ATL,Kyle Korver,75.0,2418.0,911.0,221.0,449.0,0.492,5.385301,2.69265,10.941176,5.470588,2.946667,5.986667,3.04,2.521539,105.6,2.51808
3,2013.0,GSW,Stephen Curry,78.0,2983.0,1786.0,272.0,600.0,0.453,4.971667,2.485833,10.966912,5.483456,3.487179,7.692308,4.205128,2.342906,105.8,2.323077
4,2015.0,GSW,Stephen Curry,80.0,2613.0,1900.0,286.0,646.0,0.443,4.044892,2.022446,9.136364,4.568182,3.575,8.075,4.5,2.202466,105.6,2.1978
6,2016.0,LAC,J.J. Redick,75.0,2097.0,1226.0,200.0,421.0,0.475,4.980998,2.490499,10.485,5.2425,2.666667,5.613333,2.946667,2.075563,106.4,2.027413
10,2002.0,MIL,Ray Allen,69.0,2525.0,1503.0,229.0,528.0,0.434,4.782197,2.391098,11.026201,5.5131,3.318841,7.652174,4.333333,1.880247,104.5,1.96
7,2014.0,ATL,Kyle Korver,71.0,2408.0,850.0,185.0,392.0,0.472,6.142857,3.071429,13.016216,6.508108,2.605634,5.521127,2.915493,1.989782,106.6,1.93138
11,2012.0,NYK,Steve Novak,54.0,1020.0,477.0,133.0,282.0,0.472,3.617021,1.808511,7.669173,3.834586,2.462963,5.222222,2.759259,1.87724,104.6,1.926444
8,1997.0,CHH,Glen Rice,79.0,3362.0,2115.0,207.0,440.0,0.47,7.640909,3.820455,16.241546,8.120773,2.620253,5.56962,2.949367,1.982459,106.7,1.917975
21,2004.0,SAC,Peja Stojakovic,81.0,3264.0,1964.0,240.0,554.0,0.433,5.891697,2.945848,13.6,6.8,2.962963,6.839506,3.876543,1.670322,102.9,1.851037


It's an interesting comparison. As expected, players from earlier eras, before the 3-point shoot took over, move up the list. Players from higher scoring eras, like the last couple of years, however, move down the list. Consider Steph Curry's drop from 15 to 34. Was his 3-point season not as good because the whole league was better at shooting 3s? 

#### 2018 League Leaders

We can check the league leaders for any given year. Note that for a particular year, weighted and unweighted will provide the same order.

In [146]:
# Create 2018 dataframe
tp_2018 = tp[tp['Year']==2018.0]

# Show top 20 3PAd
tp_2018.head(20)

# Note that weighted v unweighted won't matter as it's the same year

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,3PG,3PAG,3PMiG,3PAd,3PNG/C,seasons
5,2018.0,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,4.156863,9.823529,5.666667,2.102617,-52240.653333,8.0
9,2018.0,GSW,Klay Thompson,73.0,2506.0,1461.0,229.0,520.0,0.44,3.136986,7.123288,3.986301,1.892883,-54194.955556,7.0
36,2018.0,UTA,Joe Ingles,82.0,2578.0,940.0,204.0,464.0,0.44,2.487805,5.658537,3.170732,1.491269,-48359.591111,4.0
50,2018.0,PHI,J.J. Redick,70.0,2116.0,1198.0,193.0,460.0,0.42,2.757143,6.571429,3.814286,1.335797,-47970.422222,11.0
56,2018.0,CLE,Kyle Korver,73.0,1574.0,672.0,164.0,376.0,0.436,2.246575,5.150685,2.90411,1.303579,-39191.875556,16.0
66,2018.0,DET,Reggie Bullock,62.0,1732.0,698.0,125.0,281.0,0.445,2.016129,4.532258,2.516129,1.264941,-29282.364444,2.0
76,2018.0,GSW,Kevin Durant,68.0,2325.0,1792.0,173.0,413.0,0.419,2.544118,6.073529,3.529412,1.222215,-43069.937778,10.0
77,2018.0,SAC,Buddy Hield,80.0,2024.0,1079.0,176.0,408.0,0.431,2.2,5.1,2.9,1.217347,-42533.226667,3.0
91,2018.0,DET,Anthony Tolliver,79.0,1757.0,703.0,159.0,365.0,0.436,2.012658,4.620253,2.607595,1.161657,-38045.911111,11.0
97,2018.0,OKC,Paul George,79.0,2891.0,1734.0,244.0,608.0,0.401,3.088608,7.696203,4.607595,1.14308,-63437.671111,7.0


The Golden State Warriors dominate the list. What about the Houston Rockets? They have the reputation of being a great 3-point shooting team. Let's make a comparison.

#### Warriors v Rockets

In [147]:
# Create 2018 dataframe for GSW and HOU
tp_2018_GSW_HOU = tp_2018[(tp_2018['Tm']=='GSW') | (tp_2018['Tm']=='HOU')]

# Display dataframe
tp_2018_GSW_HOU

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,3PG,3PAG,3PMiG,3PAd,3PNG/C,seasons
5,2018.0,GSW,Stephen Curry,51.0,1631.0,1346.0,212.0,501.0,0.423,4.156863,9.823529,5.666667,2.102617,-52240.653333,8.0
9,2018.0,GSW,Klay Thompson,73.0,2506.0,1461.0,229.0,520.0,0.44,3.136986,7.123288,3.986301,1.892883,-54194.955556,7.0
76,2018.0,GSW,Kevin Durant,68.0,2325.0,1792.0,173.0,413.0,0.419,2.544118,6.073529,3.529412,1.222215,-43069.937778,10.0
639,2018.0,HOU,Chris Paul,58.0,1847.0,1081.0,144.0,379.0,0.38,2.482759,6.534483,4.051724,0.551638,-39568.502222,13.0
675,2018.0,HOU,Ryan Anderson,66.0,1725.0,617.0,131.0,339.0,0.386,1.984848,5.136364,3.151515,0.533513,-35385.813333,9.0
856,2018.0,HOU,James Harden,72.0,2551.0,2191.0,265.0,722.0,0.367,3.680556,10.027778,6.347222,0.458127,-75406.484444,9.0
1200,2018.0,HOU,Trevor Ariza,67.0,2269.0,782.0,170.0,462.0,0.368,2.537313,6.895522,4.358209,0.334253,-48250.506667,9.0
1277,2018.0,GSW,Nick Young,80.0,1393.0,581.0,123.0,326.0,0.377,1.5375,4.075,2.5375,0.311654,-34037.764444,11.0
1658,2018.0,HOU,P.J. Tucker,82.0,2281.0,502.0,115.0,310.0,0.371,1.402439,3.780488,2.378049,0.217306,-32373.088889,7.0
1831,2018.0,HOU,Eric Gordon,69.0,2154.0,1243.0,218.0,608.0,0.359,3.15942,8.811594,5.652174,0.178309,-63515.671111,9.0


Golden State is at the top and bottom, with Houston dominating the middle. To clarify which team is the best at shooting 3's, we can sum 3Pave.

In [148]:
tp_2018_GSW_HOU.groupby('Tm')['3PAd'].sum()

Tm
GSW    4.586971
HOU    2.378259
Name: 3PAd, dtype: float64

Golden State is the clear winner. But how does this Warriors team rank historically?

#### Best 3-Point Shooting Teams of All-Time

In [235]:
tp_teams = tp[tp['Tm'] != 'TOT']

tp_teams = tp_teams.groupby(['Tm','Year'])['3PAd'].sum().sort_values(ascending=False)

tp_teams.head(25)

Tm   Year  
GSW  2016.0    6.374410
     2018.0    4.586971
     2015.0    4.250061
PHO  2006.0    4.203874
     2010.0    3.986143
CHH  1997.0    3.874711
PHO  2007.0    3.837297
MIA  2013.0    3.747401
GSW  2013.0    3.730750
SAS  2014.0    3.441617
GSW  2017.0    3.325918
SEA  2001.0    3.262487
ORL  2009.0    3.211291
SAS  2017.0    3.171001
WSB  1996.0    3.165538
SAC  2004.0    3.157775
CHH  1995.0    3.153557
ORL  2008.0    3.075983
SAS  2011.0    3.046533
BOS  2018.0    3.010927
CHI  1996.0    2.898952
PHO  2008.0    2.894175
IND  2009.0    2.869539
DET  1996.0    2.868601
GSW  2011.0    2.853393
Name: 3PAd, dtype: float64

It's no surprise that the Warriors take the top 3 spots, and that the Suns are close behind. Both teams are responsible for the meteoric rise of the 3-pointer. The Charlotte Hornets from 1997 are a surprise #7, although they had Dell Curry and Glen Rice, two of the all-time greats. Also, 4 of the last 5 NBA champions made the top 10.

#### Best 3-Point Shooting Teams of All-Time, Weighted

In [203]:
tp_teams = tpw[tpw['Tm'] != 'TOT']

tp_teams = tp_teams.groupby(['Tm','Year'])['3PAd/w'].sum().sort_values(ascending=False)

tp_teams.head(25)

Tm   Year  
GSW  2016.0    6.082173
     2015.0    4.233746
PHO  2006.0    4.041437
CHH  1997.0    3.696312
MIA  2013.0    3.687041
GSW  2013.0    3.683147
SAC  2004.0    3.671448
SEA  2001.0    3.630873
PHO  2007.0    3.620564
GSW  2018.0    3.590728
PHO  2010.0    3.510403
SAS  2014.0    3.200167
     2001.0    3.156882
SEA  2004.0    3.061071
MIL  2003.0    2.907489
WSB  1996.0    2.904475
ORL  2012.0    2.877611
IND  2000.0    2.839553
SEA  1998.0    2.818695
SAS  2011.0    2.691125
CHH  1995.0    2.678011
NOP  2015.0    2.654854
SAS  2012.0    2.639373
DAL  2003.0    2.584842
CHI  1996.0    2.560414
Name: 3PAd/w, dtype: float64

Notice the inclusion of more Spurs teams, a Reggie Miller Pacer team that made the finals, and a Ray Allen Bucks team. Also, the Warriors team that won an NBA record 73 games is still way ahead of the rest.

#### Best of the 90s

In [150]:
tp_before2000 = tp[(tp['Year']<2000) & (tp['Year']>1989)]

tp_before2000.head(25)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,3PG,3PAG,3PMiG,3PAd,3PNG/C,seasons
8,1997.0,CHH,Glen Rice,79.0,3362.0,2115.0,207.0,440.0,0.47,2.620253,5.56962,2.949367,1.982459,-45817.577778,11.0
17,1995.0,PHI,Dana Barros,82.0,3318.0,1686.0,197.0,425.0,0.464,2.402439,5.182927,2.780488,1.737141,-44264.444444,11.0
20,1996.0,ORL,Dennis Scott,82.0,3041.0,1431.0,267.0,628.0,0.425,3.256098,7.658537,4.402439,1.685303,-65479.515556,9.0
23,1996.0,WSB,Tim Legler,77.0,1775.0,726.0,128.0,245.0,0.522,1.662338,3.181818,1.519481,1.628851,-25473.844444,2.0
24,1996.0,SAC,Mitch Richmond*,81.0,2946.0,1872.0,225.0,515.0,0.437,2.777778,6.358025,3.580247,1.622933,-53679.244444,12.0
34,1997.0,IND,Reggie Miller*,81.0,2966.0,1751.0,229.0,536.0,0.427,2.82716,6.617284,3.790123,1.497453,-55883.631111,18.0
40,1996.0,CHI,Steve Kerr,82.0,1919.0,688.0,122.0,237.0,0.515,1.487805,2.890244,1.402439,1.412987,-24647.506667,12.0
48,1996.0,NYK,Hubert Davis,74.0,1773.0,789.0,127.0,267.0,0.476,1.716216,3.608108,1.891892,1.340571,-27798.773333,9.0
49,1997.0,SAC,Mitch Richmond*,81.0,3125.0,2095.0,204.0,477.0,0.428,2.518519,5.888889,3.37037,1.340291,-49731.64,12.0
60,1999.0,MIL,Dell Curry,42.0,864.0,423.0,69.0,145.0,0.476,1.642857,3.452381,1.809524,1.284852,-15096.622222,14.0


The big 90s shooters. Glen Rice, Reggie Miller, Dell Curry, Dennis Scott, Mitch Ritchmond, Dale Ellis. What about the 80s?

#### Best of the 80s

In [151]:
tp_before90 = tp[tp['Year']<1990]

tp_before90.head(25)

Unnamed: 0,Year,Tm,Player,G,MP,PTS,3P,3PA,3P%,3PG,3PAG,3PMiG,3PAd,3PNG/C,seasons
26,1989.0,SEA,Dale Ellis,82.0,3190.0,2253.0,162.0,339.0,0.478,1.97561,4.134146,2.158537,1.563559,-35292.813333,16.0
110,1988.0,TOT,Craig Hodges,66.0,1445.0,629.0,86.0,175.0,0.491,1.30303,2.651515,1.348485,1.110623,-18211.888889,10.0
192,1988.0,MIL,Craig Hodges,43.0,983.0,397.0,55.0,118.0,0.466,1.27907,2.744186,1.465116,0.940934,-12288.982222,10.0
271,1988.0,BOS,Danny Ainge,81.0,3018.0,1270.0,148.0,357.0,0.415,1.82716,4.407407,2.580247,0.829806,-37234.573333,11.0
350,1989.0,CLE,Mark Price,75.0,2728.0,1414.0,93.0,211.0,0.441,1.24,2.813333,1.573333,0.750745,-21990.408889,10.0
358,1988.0,CLE,Mark Price,80.0,2626.0,1279.0,72.0,148.0,0.486,0.9,1.85,0.95,0.747469,-15404.248889,10.0
370,1987.0,BOS,Danny Ainge,71.0,2499.0,1053.0,85.0,192.0,0.443,1.197183,2.704225,1.507042,0.73745,-20009.106667,11.0
384,1989.0,CHI,Craig Hodges,49.0,1112.0,490.0,71.0,168.0,0.423,1.44898,3.428571,1.979592,0.728348,-17518.093333,10.0
386,1986.0,MIL,Craig Hodges,66.0,1739.0,716.0,73.0,162.0,0.451,1.106061,2.454545,1.348485,0.7276,-16878.84,10.0
436,1989.0,MIA,Jon Sundvold,68.0,1338.0,709.0,48.0,92.0,0.522,0.705882,1.352941,0.647059,0.689723,-9565.884444,5.0


Craig Hodges. Mark Price. Larry Bird. More Dale Ellis. The first 3-point shot was made in 1979, so we can't go back much further.

#### Career Totals

How about the most points gained by shooting 3's over their entire career?

In [152]:
# Create new category, use totals instead of per game
tp['3PNG/C'] = tp['3P'] * (3 - ev) - (tp['3PA'] - tp['3P']) * ev

# group by player, and sum over their career
tp_player = tp.groupby('Player')['3PNG/C'].sum()

# order from the top
tp_player = tp_player.sort_values(ascending=False)

# display top 25
tp_player.head(25)

Player
Kyle Korver         1245.264622
Stephen Curry       1199.245644
Ray Allen           1119.032000
Steve Nash           890.282311
Reggie Miller*       834.531467
Klay Thompson        775.436578
J.J. Redick          677.701689
Dale Ellis           649.176044
Chauncey Billups     642.698400
Peja Stojakovic      603.778489
Glen Rice            598.480000
Mike Miller          589.125644
Brent Barry          551.976622
Wesley Person        514.299778
Hubert Davis         506.109956
Rashard Lewis        505.718044
Dirk Nowitzki        501.380489
Jason Terry          494.620933
Allan Houston        479.477822
Dell Curry           478.675511
Steve Kerr           478.481111
Jose Calderon        467.637333
Dana Barros          455.774578
Mike Bibby           440.990578
Ben Gordon           436.718933
Name: 3PNG/C, dtype: float64

An outstanding list. The only players on this list who are not retired, or at the end of their careers are Steph Curry and Klay Thompson. It's amazing to think that Steph Curry has already surpassed Ray Allen. How about comparing this list to the traditional 3 pointers made?

In [153]:
tp.groupby('Player')['3P'].sum().sort_values(ascending=False).head(25)

Player
Ray Allen           3096.0
Reggie Miller*      2560.0
Tim Hardaway        2289.0
Kyle Korver         2286.0
Vince Carter        2284.0
Jason Terry         2243.0
Jamal Crawford      2234.0
Paul Pierce         2128.0
Joe Johnson         2087.0
Stephen Curry       2074.0
Jason Kidd          2072.0
Chauncey Billups    2057.0
Dirk Nowitzki       1904.0
J.R. Smith          1886.0
Rashard Lewis       1770.0
Kobe Bryant         1770.0
Jason Richardson    1749.0
Dale Ellis          1707.0
Peja Stojakovic     1669.0
James Harden        1647.0
Steve Nash          1628.0
LeBron James        1616.0
Mike Bibby          1588.0
Nick Van            1567.0
Klay Thompson       1557.0
Name: 3P, dtype: float64

Most fans would agree that Kobe Bryant, Lebron James and Nick Van Exel are not as good at 3-pointers as Glen Rice, Del Curry and Steve Kerr. Finally, we have a statistic to prove it.

#### Career Averages

In [160]:
#tp['3PAv'] = tp.groupby('Player')['3PAd'].sum()/tp.groupby('Player')['3PAd'].count()

# count number of seasons
tp['seasons'] = tp.groupby('Player')['3PAd'].transform('count')

# require at least 4 seasons
tp_seasons = tp[tp['seasons']>=4]

# compute average, divide sum of player's 3Pave by the number of seasons
tp_av = tp_seasons.groupby('Player')['3PAd'].sum()/tp_seasons.groupby('Player')['3PAd'].count()

# sort and display the top 25
tp_av.sort_values(ascending=False).head(25)

Player
Stephen Curry      2.001557
Klay Thompson      1.437085
Kyle Korver        1.079106
J.J. Redick        0.866753
C.J. McCollum      0.846514
Ray Allen          0.831735
Hubert Davis       0.831196
Steve Novak        0.787593
Steve Nash         0.739509
Peja Stojakovic    0.739333
Wesley Person      0.696190
Glen Rice          0.692731
Joe Ingles         0.674871
Anthony Morrow     0.671677
Bradley Beal       0.659325
Troy Daniels       0.658564
Brent Barry        0.640704
Dennis Scott       0.626870
Danny Green        0.622657
Raja Bell          0.610438
Mike Miller        0.595527
Daniel Gibson      0.593131
Allan Houston      0.593099
Jose Calderon      0.588219
Reggie Miller*     0.587470
Name: 3PAd, dtype: float64

Absolutely incredible. Steph Curry doubles everyone on the list except for his teammate Klay Thompson, and Kyle Korver. I think we can safely answer the question posed at the beginning of this notebook. There is ample statistical evidence that Stephen Curry is the greatest 3-point shooter of all-time. 

## Conclusion

3PNG is a fun, informative and comparative statistic that uses objective criteria to rank 3-point shooters. It is a single metric that combines the number of 3-pointers made with a players 3-point percentage. It stands up well to general expectations about who the best 3-point shooters are in reality. 