# NBA Shot Expectation Value Analysis

Not all shots are created equal.

If you've ever yelled at your TV, "Don't shoot, don't shoot... nice shot," then you have an intuitive sense of the difference between the quality of a shot decision and its outcome. With access to NBA shot log data, we can quantify the difference between process and outcome for each player in aggregate and answer two questions: who takes good shots, and who makes tough shots?

To collect the shot log data, you can run these programs from the command line:

`python get_player_ids.py`
`python get_shot_data.py`

The first program collects the player ID for each player in the 2018-19 NBA season. Those IDs are the input the second program needs to collect the full shot log for each player.
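
The loading code in the Analysis section reads each `shot_data*` file as JSON with a top-level `data` list of players, each carrying a `shots` list. As a minimal sketch of that layout, the snippet below writes a tiny file in that shape and flattens it back into shot records. The player names and outcomes are made up, and the exact set of fields written by `get_shot_data.py` is an assumption inferred from the columns used later.

```python
import json
import os
import tempfile

# Hypothetical sample in the shape the loader expects:
# {"data": [{"shots": [{...}, {...}]}, ...]}
sample = {
    "data": [
        {
            "shots": [
                {"PLAYER_NAME": "Player A", "ACTION_TYPE": "Pullup Jump shot",
                 "SHOT_TYPE": "3PT Field Goal", "SHOT_ZONE_BASIC": "Above the Break 3",
                 "SHOT_MADE_FLAG": 1},
                {"PLAYER_NAME": "Player A", "ACTION_TYPE": "Driving Layup Shot",
                 "SHOT_TYPE": "2PT Field Goal", "SHOT_ZONE_BASIC": "Restricted Area",
                 "SHOT_MADE_FLAG": 0},
            ]
        },
        {
            "shots": [
                {"PLAYER_NAME": "Player B", "ACTION_TYPE": "Dunk Shot",
                 "SHOT_TYPE": "2PT Field Goal", "SHOT_ZONE_BASIC": "Restricted Area",
                 "SHOT_MADE_FLAG": 1},
            ]
        },
    ]
}

# Round-trip through a temporary file so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "shot_data_sample.json")
    with open(path, "w") as f:
        json.dump(sample, f)
    with open(path) as f:
        players = json.load(f)["data"]

# Flatten players -> shots into one list of shot records.
shots = [shot for player in players for shot in player["shots"]]
print(len(shots))
```

Each element of `shots` is one row of the eventual DataFrame.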

## Analysis

Steph Curry, James Harden, Kevin Durant, and Klay Thompson are really good. So is Buddy Hield.

### Load and Munge the Data

In [1]:
import glob
import json
import pandas as pd

def clean_action_type(x):
    # Keep the three most distinctive action types as-is; bucket everything else.
    if x['ACTION_TYPE'] in ('Pullup Jump shot', 'Step Back Jump shot', 'Driving Layup Shot'):
        return x['ACTION_TYPE']
    if 'Layup' in x['ACTION_TYPE']:
        return 'Layup'
    if 'Jump' in x['ACTION_TYPE']:
        return 'Jump Shot - Other'
    if 'Dunk' in x['ACTION_TYPE']:
        return 'Dunk'
    return 'Other Shot'
    
def clean_shot_type(x):
    if x['SHOT_TYPE'] == '3PT Field Goal':
        return 3
    else:
        return 2
    
def clean_shot_zone(x):
    if 'Corner' in x['SHOT_ZONE_BASIC']:
        return 'Corner 3'
    elif '3' in x['SHOT_ZONE_BASIC']:
        return 'Above The Break 3'
    else:
        return x['SHOT_ZONE_BASIC']
    
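def identify_errors(x):
    # Flag rows whose point value contradicts the zone: a 2-pointer recorded in a
    # 3-point zone or the backcourt, or a 3-pointer recorded in the mid-range.
    if '2' in x['SHOT_TYPE'] and ('3' in x['SHOT_ZONE_BASIC'] or x['SHOT_ZONE_BASIC'] == 'Backcourt'):
        return 'Error'
    elif '3' in x['SHOT_TYPE'] and (x['SHOT_ZONE_BASIC'] == 'Mid-Range'):
        return 'Error'
    else:
        return 'Clean'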

data_files = glob.glob('../data/shot_data*')
raw_data = []

for data_file in data_files:
    with open(data_file) as json_file:
        data = json.load(json_file)['data']
        for player in data:
            for shot in player['shots']:
                raw_data.append(shot)
            
df_raw = pd.DataFrame(raw_data)
    
df = df_raw\
    .assign(
        shot_type=df_raw.apply(clean_action_type, axis=1),
        shot_value=df_raw.apply(clean_shot_type, axis=1),
        shot_zone=df_raw.apply(clean_shot_zone, axis=1),
        has_error=df_raw.apply(identify_errors, axis=1)
    )\
    .query('has_error != "Error"')
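
Row-wise `apply(..., axis=1)` is easy to read but can be slow on a full season of shots. As a hedged alternative, the sketch below reproduces three of the cleaning rules with vectorized operations on a small made-up frame. `np.where` and `Series.mask` stand in for the row functions here; the sample rows are illustrative only and this is not how the original notebook does it.

```python
import numpy as np
import pandas as pd

# Made-up rows covering the clean and error cases.
toy = pd.DataFrame({
    "SHOT_TYPE": ["3PT Field Goal", "2PT Field Goal", "2PT Field Goal", "3PT Field Goal"],
    "SHOT_ZONE_BASIC": ["Left Corner 3", "Mid-Range", "Above the Break 3", "Mid-Range"],
})

zone = toy["SHOT_ZONE_BASIC"]
zone_has_3 = zone.str.contains("3", regex=False)
zone_is_corner = zone.str.contains("Corner", regex=False)
is_two = toy["SHOT_TYPE"].str.contains("2", regex=False)
is_three = toy["SHOT_TYPE"].str.contains("3", regex=False)

toy = toy.assign(
    # Same rule as clean_shot_type.
    shot_value=np.where(toy["SHOT_TYPE"] == "3PT Field Goal", 3, 2),
    # Same rule as clean_shot_zone: the corner mask is applied last so it wins.
    shot_zone=zone.mask(zone_has_3, "Above The Break 3").mask(zone_is_corner, "Corner 3"),
    # Same rule as identify_errors.
    has_error=np.where(
        (is_two & (zone_has_3 | (zone == "Backcourt"))) | (is_three & (zone == "Mid-Range")),
        "Error",
        "Clean",
    ),
)

print(toy)
```

The last two rows are flagged as errors because their point value contradicts the zone they were recorded in.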

### Which Players are the Best and Worst Shooters?

In [2]:
results = df\
    .groupby(['shot_type', 'shot_zone', 'shot_value'])\
    .agg({'SHOT_MADE_FLAG': ['mean', 'count']})\
    .reset_index()

results.columns = ['shot_type', 'shot_zone', 'shot_value', 'league_average', 'frequency']

results = results\
    .assign(expected_value=lambda x: x['league_average'] * x['shot_value'])\
    .sort_values(by=['expected_value'], ascending=False)

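# Join each shot to the league average for its (shot_type, shot_zone, shot_value) bucket,
# so every shot matches exactly one bucket and keeps its own point value.
player_aggregates = df[['PLAYER_NAME', 'shot_value', 'shot_zone', 'shot_type', 'SHOT_MADE_FLAG']]\
    .merge(results, how='left', on=['shot_type', 'shot_zone', 'shot_value'])\
    .assign(
        points = lambda x: x['SHOT_MADE_FLAG'] * x['shot_value'],
        points_above_expectation = lambda x: x['points'] - x['expected_value']
    )\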
    .groupby(['PLAYER_NAME'])\
    .agg({
        'expected_value': ['mean','sum', 'count'],
        'points': ['mean','sum', 'count'],
        'points_above_expectation': ['mean','sum', 'count'],    
    })\
    .reset_index()

player_aggregates.columns = ['player_name', 
                            'expected_value_mean', 'expected_value_sum', 'expected_value_count',
                            'points_mean', 'points_sum', 'points_count',
                            'points_above_expectation_mean', 'points_above_expectation_sum', 'points_above_expectation_count']

player_aggregates = player_aggregates\
    .sort_values(by = ['points_above_expectation_sum'], ascending=False)\
    .drop(['expected_value_count', 'points_above_expectation_count', 'points_count'], axis = 1)\
    .round({'expected_value_mean':3, 'expected_value_sum':1, 'points_mean':2, 'points_above_expectation_mean':3, 'points_above_expectation_sum':1})
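
To make the arithmetic concrete: a shot's expected value is the league-average make rate for its category times its point value, and points above expectation is what the shot actually scored minus that. The sketch below works through this on six made-up shots. The player labels and outcomes are invented, and `groupby(...).transform('mean')` is used as a shortcut in place of the aggregate-then-merge approach in the cell above.

```python
import pandas as pd

# Six made-up shots by two hypothetical players.
shots = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B", "B"],
    "shot_zone": ["Corner 3", "Corner 3", "Mid-Range", "Corner 3", "Mid-Range", "Mid-Range"],
    "shot_value": [3, 3, 2, 3, 2, 2],
    "made": [1, 1, 0, 0, 1, 0],
})

# League-average make rate for each shot's category, aligned row by row.
league_average = shots.groupby(["shot_zone", "shot_value"])["made"].transform("mean")

shots = shots.assign(
    expected_value=league_average * shots["shot_value"],
    points=shots["made"] * shots["shot_value"],
)
shots["points_above_expectation"] = shots["points"] - shots["expected_value"]

# Per-player totals, analogous to the *_sum columns in player_aggregates.
summary = shots.groupby("player")[["expected_value", "points", "points_above_expectation"]].sum()
print(summary.round(2))
```

Because the league average is computed from the same shots being scored, points above expectation sums to zero across all players: one player's surplus is another's deficit.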

#### Best Shooters

In [4]:
player_aggregates.head(10)

| | player_name | expected_value_mean | expected_value_sum | points_mean | points_sum | points_above_expectation_mean | points_above_expectation_sum |
|---|---|---|---|---|---|---|---|
| 451 | Stephen Curry | 1.018 | 1364.1 | 1.21 | 1618 | 0.189 | 253.9 |
| 302 | Kevin Durant | 1.004 | 1388.9 | 1.14 | 1579 | 0.137 | 190.1 |
| 319 | Kyrie Irving | 0.982 | 1217.4 | 1.11 | 1382 | 0.133 | 164.6 |
| 310 | Klay Thompson | 0.989 | 1386.8 | 1.11 | 1551 | 0.117 | 164.2 |
| 255 | Joe Harris | 1.029 | 770.0 | 1.24 | 931 | 0.215 | 161.0 |
| 54 | Buddy Hield | 1.012 | 1375.5 | 1.12 | 1521 | 0.107 | 145.5 |
| 387 | Nikola Vucevic | 0.999 | 1353.3 | 1.1 | 1486 | 0.098 | 132.7 |
| 292 | Karl-Anthony Towns | 1.043 | 1369.5 | 1.14 | 1501 | 0.1 | 131.5 |
| 93 | Danny Green | 1.043 | 656.9 | 1.24 | 784 | 0.202 | 127.1 |
| 53 | Bryn Forbes | 0.978 | 773.9 | 1.14 | 898 | 0.157 | 124.1 |


#### Worst Shooters

In [6]:
player_aggregates.tail(10).sort_values(by=['points_above_expectation_sum'])

| | player_name | expected_value_mean | expected_value_sum | points_mean | points_sum | points_above_expectation_mean | points_above_expectation_sum |
|---|---|---|---|---|---|---|---|
| 17 | Andre Drummond | 1.189 | 1250.5 | 1.07 | 1127 | -0.117 | -123.5 |
| 430 | Russell Westbrook | 1.019 | 1499.8 | 0.94 | 1379 | -0.082 | -120.8 |
| 304 | Kevin Knox | 0.994 | 906.7 | 0.88 | 798 | -0.119 | -108.7 |
| 277 | Josh Jackson | 1.039 | 873.4 | 0.91 | 767 | -0.127 | -106.4 |
| 426 | Rondae Hollis-Jefferson | 1.046 | 509.3 | 0.84 | 409 | -0.206 | -100.3 |
| 278 | Josh Okogie | 1.082 | 549.4 | 0.89 | 452 | -0.192 | -97.4 |
| 497 | Tyreke Evans | 1.041 | 687.2 | 0.9 | 591 | -0.146 | -96.2 |
| 22 | Andrew Wiggins | 0.998 | 1203.3 | 0.92 | 1111 | -0.077 | -92.3 |
| 514 | Willie Cauley-Stein | 1.224 | 907.3 | 1.11 | 825 | -0.111 | -82.3 |
| 9 | Alex Len | 1.227 | 795.3 | 1.1 | 714 | -0.125 | -81.3 |


### What Makes These Players Good or Bad?

In [7]:
def player_details(player_name):
    player = df[['PLAYER_NAME', 'shot_value', 'shot_zone', 'shot_type', 'SHOT_MADE_FLAG']]\
        .query(f'PLAYER_NAME == "{player_name}"')\
        .assign(
            points = lambda x: x['SHOT_MADE_FLAG'] * x['shot_value'],
        )\
        .groupby(['shot_type', 'shot_zone', 'shot_value'])\
        .agg({'SHOT_MADE_FLAG': ['mean', 'count']})\
        .reset_index()

    player.columns = ['shot_type', 'shot_zone', 'shot_value', 'shooting_percentage', 'frequency']

    player = player\
        .assign(
            expected_value=lambda x: x['shooting_percentage'] * x['shot_value']
        )\
        .merge(results, how='left', on = ['shot_type', 'shot_zone'])\
        [['shot_type', 'shot_zone', 'shot_value_x', 'shooting_percentage', 'league_average', 'frequency_x']]\
        .assign(
            excess_value=lambda x: x['frequency_x'] * (x['shooting_percentage'] - x['league_average'])
        )\
        .sort_values(by=['excess_value'], ascending=False)

    return player
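
Note that `excess_value` as computed above is measured in made shots: attempts times the gap between the player's make rate and the league's. To express it in points, it can be multiplied by the shot's value. A minimal sketch on made-up rows follows; the `excess_points` column name is an assumption for illustration and is not part of the original function.

```python
import pandas as pd

# Made-up rows in the shape returned by player_details.
detail = pd.DataFrame({
    "shot_zone": ["Above The Break 3", "Mid-Range"],
    "shot_value_x": [3, 2],
    "shooting_percentage": [0.40, 0.50],
    "league_average": [0.35, 0.40],
    "frequency_x": [100, 50],
})

detail = detail.assign(
    # Extra makes relative to a league-average shooter on the same attempts.
    excess_value=lambda x: x["frequency_x"] * (x["shooting_percentage"] - x["league_average"]),
    # Hypothetical extra column: the same surplus converted to points.
    excess_points=lambda x: x["excess_value"] * x["shot_value_x"],
)

print(detail[["shot_zone", "excess_value", "excess_points"]].round(2))
```

Both rows here are worth about five extra makes, but the three-point row is worth more points, which is why the two measures can rank shot types differently.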

In [9]:
player_details('Stephen Curry').query('frequency_x > 20')

| | shot_type | shot_zone | shot_value_x | shooting_percentage | league_average | frequency_x | excess_value |
|---|---|---|---|---|---|---|---|
| 17 | Pullup Jump shot | Above The Break 3 | 3 | 0.611111 | 0.363399 | 126 | 31.211726 |
| 22 | Step Back Jump shot | Above The Break 3 | 3 | 0.639344 | 0.388802 | 61 | 15.283048 |
| 6 | Jump Shot - Other | Corner 3 | 3 | 0.482143 | 0.380583 | 112 | 11.374667 |
| 21 | Pullup Jump shot | Mid-Range | 2 | 0.604167 | 0.430426 | 48 | 8.339555 |
| 10 | Layup | In The Paint (Non-RA) | 2 | 0.566667 | 0.317551 | 30 | 7.473485 |
| 12 | Layup | Restricted Area | 2 | 0.649425 | 0.606739 | 174 | 7.427385 |
| 4 | Jump Shot - Other | Above The Break 3 | 3 | 0.354906 | 0.344504 | 479 | 4.982534 |
| 2 | Driving Layup Shot | Restricted Area | 2 | 0.607843 | 0.518521 | 51 | 4.555447 |
| 7 | Jump Shot - Other | In The Paint (Non-RA) | 2 | 0.490566 | 0.412443 | 53 | 4.140519 |
| 25 | Step Back Jump shot | Mid-Range | 2 | 0.425 | 0.431525 | 40 | -0.260982 |


In [10]:
player_details('Andre Drummond').query('frequency_x > 20')

| | shot_type | shot_zone | shot_value_x | shooting_percentage | league_average | frequency_x | excess_value |
|---|---|---|---|---|---|---|---|
| 1 | Driving Layup Shot | Restricted Area | 2 | 0.575 | 0.518521 | 40 | 2.259174 |
| 14 | Other Shot | Restricted Area | 2 | 0.567568 | 0.532106 | 37 | 1.312079 |
| 7 | Jump Shot - Other | In The Paint (Non-RA) | 2 | 0.431818 | 0.412443 | 44 | 0.852507 |
| 8 | Jump Shot - Other | Mid-Range | 2 | 0.333333 | 0.376479 | 24 | -1.035506 |
| 12 | Other Shot | In The Paint (Non-RA) | 2 | 0.45045 | 0.462296 | 111 | -1.31491 |
| 4 | Jump Shot - Other | Above The Break 3 | 3 | 0.190476 | 0.344504 | 21 | -3.234586 |
| 10 | Layup | In The Paint (Non-RA) | 2 | 0.265625 | 0.317551 | 64 | -3.323232 |
| 3 | Dunk | Restricted Area | 2 | 0.868421 | 0.898484 | 190 | -5.711961 |
| 11 | Layup | Restricted Area | 2 | 0.523504 | 0.606739 | 468 | -38.953929 |


In [11]:
player_details('Kevin Durant').query('frequency_x > 20')

| | shot_type | shot_zone | shot_value_x | shooting_percentage | league_average | frequency_x | excess_value |
|---|---|---|---|---|---|---|---|
| 8 | Jump Shot - Other | Mid-Range | 2 | 0.530864 | 0.376479 | 243 | 37.515499 |
| 20 | Pullup Jump shot | Mid-Range | 2 | 0.585185 | 0.430426 | 135 | 20.8925 |
| 4 | Dunk | Restricted Area | 2 | 0.94958 | 0.898484 | 119 | 6.080403 |
| 24 | Step Back Jump shot | Mid-Range | 2 | 0.642857 | 0.431525 | 28 | 5.917313 |
| 2 | Driving Layup Shot | Restricted Area | 2 | 0.614035 | 0.518521 | 57 | 5.444323 |
| 7 | Jump Shot - Other | In The Paint (Non-RA) | 2 | 0.449275 | 0.412443 | 138 | 5.082862 |
| 15 | Other Shot | Mid-Range | 2 | 0.490566 | 0.414327 | 53 | 4.040688 |
| 5 | Jump Shot - Other | Above The Break 3 | 3 | 0.358209 | 0.344504 | 268 | 3.6729 |
| 14 | Other Shot | In The Paint (Non-RA) | 2 | 0.552632 | 0.462296 | 38 | 3.432734 |
| 19 | Pullup Jump shot | In The Paint (Non-RA) | 2 | 0.512821 | 0.449476 | 39 | 2.470425 |


In [12]:
player_details('Russell Westbrook').query('frequency_x > 20')

| | shot_type | shot_zone | shot_value_x | shooting_percentage | league_average | frequency_x | excess_value |
|---|---|---|---|---|---|---|---|
| 2 | Driving Layup Shot | Restricted Area | 2 | 0.579151 | 0.518521 | 259 | 15.703154 |
| 10 | Layup | Restricted Area | 2 | 0.661224 | 0.606739 | 245 | 13.348904 |
| 5 | Jump Shot - Other | Corner 3 | 3 | 0.372093 | 0.380583 | 43 | -0.365083 |
| 6 | Jump Shot - Other | In The Paint (Non-RA) | 2 | 0.391892 | 0.412443 | 74 | -1.520784 |
| 15 | Pullup Jump shot | In The Paint (Non-RA) | 2 | 0.394737 | 0.449476 | 38 | -2.080099 |
| 0 | Driving Layup Shot | In The Paint (Non-RA) | 2 | 0.047619 | 0.231177 | 21 | -3.854719 |
| 3 | Dunk | Restricted Area | 2 | 0.785714 | 0.898484 | 42 | -4.736328 |
| 7 | Jump Shot - Other | Mid-Range | 2 | 0.324786 | 0.376479 | 117 | -6.048093 |
| 13 | Pullup Jump shot | Above The Break 3 | 3 | 0.285714 | 0.363399 | 119 | -9.244481 |
| 4 | Jump Shot - Other | Above The Break 3 | 3 | 0.276596 | 0.344504 | 235 | -15.958464 |
