# NBA Shot Expectation Value Analysis

Not all shots are created equal.

Anyone who has ever yelled, "Don't Shoot, don't shoot... nice shot." at their TV has an intuitive sense that there is a difference between the outcome and decision quality for each shot. With access to NBA shot log data we are able to quantify the difference between process and outcomes for player in the aggregate to answer the questions: who takes good shots, who makes tough shots?

To collect the shot log data, you can run these programs from the command line:

`python get_player_ids.py`
`python get_shot_data.py`

The first program collects the player ID for each player in the 2018-19 NBA season which is an input for collecting total shot logs for each player.

## Analysis

Steph Curry, James Harden, Kevin Durant, and Klay Thompson are really good. So is Buddy Hield.

In [8]:
import itertools as it
import json
import numpy as np
import pandas as pd

with open('../data/all_shot_data.json') as json_file:  
    data = json.load(json_file)
    
df_raw = pd.DataFrame(list(it.chain.from_iterable([player['shots'] for player in data['data']])))

conditions_action = [
    (df_raw['ACTION_TYPE'] == 'Pullup Jump shot'),
    (df_raw['ACTION_TYPE'] == 'Step Back Jump shot'),
    (df_raw['ACTION_TYPE'] == 'Driving Layup Shot'),
    (df_raw['ACTION_TYPE'].str.contains('Layup')),
    (df_raw['ACTION_TYPE'].str.contains('Jump')),
    (df_raw['ACTION_TYPE'].str.contains('Dunk')),
]

choices_action = ['Pullup Jump Shot', 'Step Back Jump Shot', 'Driving Layup Shot', 'Layup', 'Other Jump Shot', 'Dunk']

conditions_value = [
    (df_raw['SHOT_TYPE'] == '3PT Field Goal')
]

choices_value = [3]

def clean_threes(x):
    if 'Corner' in x:
        return 'Corner Three'
    else:
        return x

df = df_raw\
    .assign(
        action_type_clean= np.select(conditions_action, choices_action, default='Other'),
        shot_value = np.select(conditions_value, choices_value, default = 2),
        shot_zone = lambda x: x['SHOT_ZONE_BASIC'].apply(clean_threes))

results = df\
    .groupby(['action_type_clean', 'shot_zone', 'shot_value'])\
    .agg({'SHOT_MADE_FLAG': ['mean', 'count']})\
    .reset_index()

results.columns = ['action_type_clean', 'shot_zone', 'shot_value', 'league_average', 'frequency']

results = results\
    .assign(expected_value=lambda x: x['league_average'] * x['shot_value'])\
    .sort_values(by=['expected_value'], ascending=False)

player_aggregates = df[['PLAYER_NAME', 'shot_value', 'shot_zone', 'action_type_clean', 'SHOT_MADE_FLAG']]\
    .merge(results, how='left', on=['shot_zone', 'action_type_clean'])\
    .assign(
        points = lambda x: x['SHOT_MADE_FLAG'] * x['shot_value_y'],
        points_above_expectation = lambda x: x['points'] - x['expected_value']
    )\
    .groupby(['PLAYER_NAME'])\
    .agg({
        'expected_value': ['mean','sum', 'count'],
        'points': ['mean','sum', 'count'],
        'points_above_expectation': ['mean','sum', 'count'],    
    })\
    .reset_index()

player_aggregates.columns = ['player_name', 
                            'expected_value_mean', 'expected_value_sum', 'expected_value_count',
                            'points_mean', 'points_sum', 'points_count',
                            'points_above_expectation_mean', 'points_above_expectation_sum', 'points_above_expectation_count']

player_aggregates\
    .sort_values(by = ['points_above_expectation_sum'], ascending=False)\
    .drop(['expected_value_count', 'points_above_expectation_count', 'points_count'], axis = 1)\
    .head(10)

Unnamed: 0,player_name,expected_value_mean,expected_value_sum,points_mean,points_sum,points_above_expectation_mean,points_above_expectation_sum
451,Stephen Curry,0.762223,1668.506704,1.108725,2427,0.346502,758.493296
224,James Harden,0.738551,2160.260926,0.974359,2850,0.235808,689.739074
54,Buddy Hield,0.819023,1762.537309,1.071097,2305,0.252074,542.462691
302,Kevin Durant,0.928549,1990.810094,1.173974,2517,0.245424,526.189906
310,Klay Thompson,0.853897,1952.007515,1.082677,2475,0.228781,522.992485
255,Joe Harris,0.762451,839.458042,1.155313,1272,0.392863,432.541958
319,Kyrie Irving,0.874951,1636.15905,1.104278,2065,0.229327,428.84095
297,Kemba Walker,0.819412,2158.33001,0.981777,2586,0.162365,427.66999
404,Paul George,0.84408,2157.46747,1.007042,2574,0.162963,416.53253
208,JJ Redick,0.844166,1541.44642,1.068456,1951,0.22429,409.55358


In [10]:
%%html
<style>
div.input {
    display:none;
}
</style>

player_aggregates\
    .sort_values(by = ['points_above_expectation_sum'], ascending=False)\
    .drop(['expected_value_count', 'points_above_expectation_count', 'points_count'], axis = 1)\
    .tail(10)



Don't shoot Russel