# Determining a Point System for Fantasy Basketball


### Background
Player Efficiency Rating (PER) is an advanced basketball statistic created by John Hollinger. It attempts to condense a player's box-score contributions into a single number that represents how good the player is. The goal of this project was to determine a fair scoring method for the upcoming 2017-2018 fantasy basketball season. Four different scoring models (point systems) were created, and each was scored against PER. The four point systems were:

    1. Last year's point system - what we used last year for fantasy basketball 
        'FGA': -0.9,
        'FG': 2.0,
        'FTA': -1.50,
        'FT': 2.0,
        '3P': 6.0,
        'PTS': 1.0,
        'TRB': 3.0,
        'AST': 4.0,
        'STL': 6.0,
        'BLK': 6.0,
        'TOV': 4.0
        
    2. Bleacher Report's version of PER - a Bleacher Report article attempted to linearize PER
        'FG': 1.591,
        'STL': 0.998,
        '3P': 0.958,
        'FT': 0.868,
        'BLK': 0.726,
        'ORB': 0.726,
        'AST': 0.642,
        'DRB': 0.272,
        'PF': 0.318,
        'FTM': 0.372,
        'FGM': 0.726,
        'TOV': 0.998
        
    3. John Hollinger's Game Score method - a simplified version of PER
        'PTS': 1,
        'FG': 0.4,
        'FGA': -0.7,
        'FTM': -0.4,
        'ORB': 0.7,
        'DRB': 0.3,
        'STL': 1,
        'AST': 0.7,
        'BLK': 0.7,
        'PF': -0.4,
        'TOV': -1
        
    4. A linear regression against PER - a linear regression with the stats listed below against PER 
        'TOV': -0.0216,
        'Threes': 0.0097,
        'AST': 0.0110,
        'FG': 0.0419,
        'FGA': -0.0131,
        'FT': 0.0307,
        'FTA': -0.0195,
        'TRB': 0.0069,
        'ORB': 0.0097,
        'DRB': -0.0029,
        'BLK': 0.0004,
        'PF': 0.0468,
        'STL': 0.0087



PER Wikipedia Page: https://en.wikipedia.org/wiki/Player_efficiency_rating

PER Rankings from 2016-17: http://insider.espn.com/nba/hollinger/statistics/_/year/2017

Bleacher Report Method:  http://bleacherreport.com/articles/113144-cracking-the-code-how-to-calculate-hollingers-per-without-all-the-mess

Game Score Method: https://www.nbastuffer.com/analytics101/game-score/
    
    
    


In [331]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import warnings
from IPython.display import display, HTML
warnings.filterwarnings('ignore')

def first2(s):
    # Keep the first two characters of a record string (used below as the win total)
    return s[:2]

player_df = pd.read_csv('./data/Players.csv')
stats_df = pd.read_csv('./data/Seasons_Stats.csv')
team_df = pd.read_csv('./data/team_totals.csv').rename(index = str, columns={'3P': 'Threes'})
record_df = pd.read_csv('./data/records.csv', header=1)
record_df['Overall'] = record_df['Overall'].apply(first2)
record_dict = dict(zip(record_df.Team, record_df.Overall))
team_df['Wins'] = team_df['Team'].map(record_dict)
player_df = player_df.drop(['Unnamed: 0'], axis = 1)
stats_df = stats_df.drop(['Unnamed: 0'], axis = 1)
# FTM and FGM here mean *missed* free throws and field goals (attempts minus makes)
stats_df['FTM'] = stats_df['FTA'] - stats_df['FT']
stats_df['FGM'] = stats_df['FGA'] - stats_df['FG']
# Keep only the 2016-17 season; copy so the new column is not written to a slice
stats_df = stats_df[stats_df['Year'] == 2017].copy()
# Minutes per team game (total minutes over an 82-game season)
stats_df['MPG'] = stats_df['MP']/82
# Keep players above 6.9 MPG, one row per player (the last row listed for each name)
qualified_per = stats_df[stats_df['MPG'] > 6.9].drop_duplicates('Player', keep='last').rename(index = str, columns={'3P': 'Threes'})
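
The `drop_duplicates` step above matters for players who appear on more than one row in a season table, for example after a mid-season trade. A minimal sketch on made-up rows (the names, teams, and minutes below are hypothetical, not taken from the dataset) shows which row `keep='last'` retains:

```python
import pandas as pd

# Hypothetical rows: "Player A" appears three times, "Player B" once
toy = pd.DataFrame({
    'Player': ['Player A', 'Player A', 'Player A', 'Player B'],
    'Tm': ['TOT', 'AAA', 'BBB', 'CCC'],
    'MP': [2000, 1200, 800, 600],
})
toy['MPG'] = toy['MP'] / 82

# Filter first, then keep the last remaining row for each player
kept = toy[toy['MPG'] > 6.9].drop_duplicates('Player', keep='last')
print(kept[['Player', 'Tm']])
```

Because the filter runs before deduplication, the row that survives is the last one that clears the minutes threshold, which may be a single-team row rather than a season total. Whether that is the desired row depends on how the source file orders a player's rows.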


In [332]:
qualified_per.columns

Index(['Year', 'Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP', 'PER', 'TS%',
       '3PAr', 'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'STL%', 'BLK%', 'TOV%',
       'USG%', 'blanl', 'OWS', 'DWS', 'WS', 'WS/48', 'blank2', 'OBPM', 'DBPM',
       'BPM', 'VORP', 'FG', 'FGA', 'FG%', 'Threes', '3PA', '3P%', '2P', '2PA',
       '2P%', 'eFG%', 'FT', 'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL',
       'BLK', 'TOV', 'PF', 'PTS', 'FTM', 'FGM', 'MPG'],
      dtype='object')

### Last Year Scoring
Using last year's weights, ignoring double-doubles (DD) and triple-doubles (TD).

In [333]:
lastyear_weights = {
    'FGA': -0.9,
    'FG': 2.0,
    'FTA': -1.50,
    'FT': 2.0,
    '3P': 6.0,
    'PTS': 1.0,
    'TRB': 3.0,
    'AST': 4.0,
    'STL': 6.0,
    'BLK': 6.0,
    'TOV': 4.0
}
qualified_per['lastyear'] = (qualified_per['FGA'] * lastyear_weights['FGA'] + qualified_per['FTA'] * lastyear_weights['FTA'] + 
    qualified_per['FT'] * lastyear_weights['FT'] + qualified_per['FG'] * lastyear_weights['FG'] +
    qualified_per['Threes'] * lastyear_weights['3P'] + qualified_per['PTS'] * lastyear_weights['PTS'] +
    qualified_per['TRB'] * lastyear_weights['TRB'] + qualified_per['AST'] * lastyear_weights['AST'] +
    qualified_per['BLK'] * lastyear_weights['BLK'] +  qualified_per['TOV'] * lastyear_weights['TOV'] +
    qualified_per['STL'] * lastyear_weights['STL'])
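
The weighted sum above can also be written once and reused for every point system. A minimal sketch, assuming a hypothetical helper named `apply_weights` and a tiny made-up stat table (neither is part of the original notebook); the `rename` argument handles weight keys such as `'3P'` whose column was renamed to `'Threes'`:

```python
import pandas as pd

def apply_weights(df, weights, rename=None):
    """Return the weighted sum of stat columns for each row (hypothetical helper)."""
    rename = rename or {}
    total = pd.Series(0.0, index=df.index)
    for stat, weight in weights.items():
        column = rename.get(stat, stat)  # map a weight key to its column name
        total = total + df[column] * weight
    return total

# Made-up season totals for two players, for illustration only
toy = pd.DataFrame({'PTS': [100, 50], 'Threes': [10, 0], 'TRB': [20, 40]})
toy_weights = {'PTS': 1.0, '3P': 6.0, 'TRB': 3.0}

toy['score'] = apply_weights(toy, toy_weights, rename={'3P': 'Threes'})
print(toy)
```

With a helper like this, each model reduces to one call with its own weights dictionary, so a missing or mistyped term fails with a `KeyError` instead of silently dropping out of a long expression.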


### Bleacher Report Version of PER

In [334]:
br_weight = {
    'FG': 1.591,
    'STL': 0.998,
    '3P': 0.958,
    'FT': 0.868,
    'BLK': 0.726,
    'ORB': 0.726,
    'AST': 0.642,
    'DRB': 0.272,
    'PF': 0.318,
    'FTM': 0.372,
    'FGM': 0.726,
    'TOV': 0.998
}

qualified_per['br_per'] = (qualified_per['FG'] * br_weight['FG'] + qualified_per['STL'] * br_weight['STL'] + 
    qualified_per['Threes'] * br_weight['3P'] + qualified_per['FT'] * br_weight['FT'] +
    qualified_per['BLK'] * br_weight['BLK'] + qualified_per['ORB'] * br_weight['ORB'] +
    qualified_per['AST'] * br_weight['AST'] + qualified_per['DRB'] * br_weight['DRB'] +
    qualified_per['PF'] * br_weight['PF'] +  qualified_per['TOV'] * br_weight['TOV'] +
    qualified_per['FGM'] * br_weight['FGM'] +  qualified_per['FTM'] * br_weight['FTM'])

### Game Score Version of PER

In [335]:
qualified_per['game-score'] = (qualified_per['PTS'] + 0.4 * qualified_per['FG'] - 0.7 * 
    qualified_per['FGA'] - 0.4 * (qualified_per['FTA'] - qualified_per['FT']) + 0.7 * qualified_per['ORB'] + 
    0.3 * qualified_per['DRB'] + qualified_per['STL'] + 0.7 * qualified_per['AST'] + 0.7 * qualified_per['BLK'] - 
    0.4 * qualified_per['PF'] - qualified_per['TOV'])
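
To make the formula concrete, here is a small sketch that evaluates the same Game Score expression for a single stat line. The function name `game_score_line` and the numbers are made up for illustration:

```python
def game_score_line(s):
    """Game Score for one stat line given as a dict (same terms as the cell above)."""
    return (s['PTS'] + 0.4 * s['FG'] - 0.7 * s['FGA']
            - 0.4 * (s['FTA'] - s['FT'])      # missed free throws
            + 0.7 * s['ORB'] + 0.3 * s['DRB']
            + s['STL'] + 0.7 * s['AST'] + 0.7 * s['BLK']
            - 0.4 * s['PF'] - s['TOV'])

# Hypothetical stat line
line = {'PTS': 20, 'FG': 8, 'FGA': 15, 'FT': 4, 'FTA': 5, 'ORB': 2,
        'DRB': 5, 'STL': 1, 'AST': 6, 'BLK': 1, 'PF': 3, 'TOV': 2}
print(round(game_score_line(line), 1))
```

Missed shots, fouls, and turnovers all enter with negative signs, so a high-volume but inefficient line scores lower than its point total alone would suggest.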

### Linear Regression against PER

In [336]:
X = qualified_per[['TOV', 'AST', 'FG', 'TRB', 'PF', 'FGA']]
y = qualified_per['PER']
model = sm.OLS(y, X).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                    PER   R-squared:                       0.906
Model:                            OLS   Adj. R-squared:                  0.904
Method:                 Least Squares   F-statistic:                     536.0
Date:                Fri, 13 Oct 2017   Prob (F-statistic):          9.05e-168
Time:                        22:24:22   Log-Likelihood:                -1001.3
No. Observations:                 339   AIC:                             2015.
Df Residuals:                     333   BIC:                             2038.
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
TOV           -0.0204      0.012     -1.637      0.1

In [337]:
linreg_weights = {
    'TOV': -0.0216,
    'Threes': 0.0097,
    'AST': 0.0110,
    'FG': 0.0419,
    'FGA': -0.0131,
    'FT': 0.0307,
    'FTA': -0.0195,
    'TRB': 0.0069,
    'ORB': 0.0097,
    'DRB': -0.0029,
    'BLK': 0.0004,
    'PF': 0.0468,
    'STL': 0.0087
}
qualified_per['linreg_per'] = (qualified_per['FG'] * linreg_weights['FG'] + 
                                qualified_per['STL'] * linreg_weights['STL'] + 
                                qualified_per['Threes'] * linreg_weights['Threes'] + 
                                qualified_per['FT'] * linreg_weights['FT'] +
                                qualified_per['BLK'] * linreg_weights['BLK'] + 
                                qualified_per['ORB'] * linreg_weights['ORB'] +
                                qualified_per['AST'] * linreg_weights['AST'] + 
                                qualified_per['DRB'] * linreg_weights['DRB'] +
                                qualified_per['PF'] * linreg_weights['PF'] +  
                                qualified_per['TOV'] * linreg_weights['TOV'] +
                                qualified_per['FTA'] * linreg_weights['FTA'] +
                                qualified_per['FGA'] * linreg_weights['FGA'] +
                                qualified_per['TRB'] * linreg_weights['TRB']) 
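
The values in `linreg_weights` appear to be entered by hand. One way to avoid transcription slips is to build the dictionary directly from the fitted coefficients. A minimal sketch using NumPy's least-squares solver on synthetic data as a stand-in for the `sm.OLS` fit (the data, the coefficients, and the variable names are all made up); with statsmodels and a DataFrame of predictors, the analogous dictionary would be `model.params.to_dict()`:

```python
import numpy as np

rng = np.random.default_rng(0)
stats = ['FG', 'FGA', 'AST']                      # illustrative subset of columns
X = rng.uniform(0, 500, size=(200, len(stats)))   # synthetic season totals
true_coefs = np.array([0.04, -0.01, 0.01])        # made-up coefficients
y = X @ true_coefs                                # noise-free synthetic target

# Least-squares fit without an intercept, like sm.OLS(y, X) with no constant added
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
weights = {stat: round(float(c), 4) for stat, c in zip(stats, coefs)}
print(weights)
```

Keeping the predictor list and the weights dictionary tied to the same fit also makes it clear which stats the regression actually used.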

## Version Rankings
Below is how each model would have ranked the players during the 2016-17 season; the first 10 of the top 150 are shown.

In [338]:
# DataFrame.sort was removed from pandas; sort_values is the replacement
game_score = qualified_per.sort_values(['game-score'], ascending=False).head(150)['Player'].tolist()
true_per = qualified_per.sort_values(['PER'], ascending=False).head(150)['Player'].tolist()
br_per = qualified_per.sort_values(['br_per'], ascending=False).head(150)['Player'].tolist()
linreg_per = qualified_per.sort_values(['linreg_per'], ascending=False).head(150)['Player'].tolist()
lastyear = qualified_per.sort_values(['lastyear'], ascending=False).head(150)['Player'].tolist()
version_rankings = pd.DataFrame(
    {'true_per': true_per,
     'br_per': br_per,
     'game_score': game_score,
     'linreg_per': linreg_per,
     'lastyear': lastyear
     })
display(version_rankings)

|   | br_per | game_score | lastyear | linreg_per | true_per |
|---|--------|------------|----------|------------|----------|
| 0 | Russell Westbrook | Russell Westbrook | Russell Westbrook | Karl-Anthony Towns | Russell Westbrook |
| 1 | James Harden | James Harden | James Harden | Russell Westbrook | Kevin Durant |
| 2 | John Wall | Karl-Anthony Towns | LeBron James | James Harden | Anthony Davis |
| 3 | LeBron James | Anthony Davis | John Wall | Giannis Antetokounmpo | Kawhi Leonard |
| 4 | Anthony Davis | LeBron James | Stephen Curry | Anthony Davis | James Harden |
| 5 | Karl-Anthony Towns | Giannis Antetokounmpo | Giannis Antetokounmpo | Stephen Curry | LeBron James |
| 6 | Stephen Curry | Isaiah Thomas | Karl-Anthony Towns | Nikola Jokic | Isaiah Thomas |
| 7 | Isaiah Thomas | Stephen Curry | Anthony Davis | Rudy Gobert | Nikola Jokic |
| 8 | Giannis Antetokounmpo | Jimmy Butler | Isaiah Thomas | Isaiah Thomas | Chris Paul |
| 9 | Damian Lillard | John Wall | Damian Lillard | LeBron James | Giannis Antetokounmpo |


## Ranking
To rank how well each model predicted PER, I compared its ordering of the top 150 players with the PER ordering. A player counts as a match if PER ranks him within 5 places above or below the model's rank. For example, if the model ranked Rajon Rondo as the 64th best player in the league, I deemed it accurate if PER had Rondo somewhere in the 59 to 69 range.

In [339]:
# Compare each model's list with the PER list shifted by -5..5 rows,
# so a player matches if PER ranks him within 5 places of the model's rank
def count_matches(model_col, window=5):
    matched = pd.Series(False, index=version_rankings.index)
    for k in range(-window, window + 1):
        matched |= version_rankings[model_col] == version_rankings['true_per'].shift(k)
    return matched.sum()

print('BR Per Matches:', count_matches('br_per'))
print('Linreg Matches:', count_matches('linreg_per'))
print('Game Score Matches:', count_matches('game_score'))
print('Last year Matches:', count_matches('lastyear'))

BR Per Matches: 16
Linreg Matches: 15
Game Score Matches: 22
Last year Matches: 19
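
The shift-based comparison can also be expressed in terms of rank positions, which may be easier to check by hand. A small sketch on two made-up rankings (the function name `count_window_matches` and the lists are hypothetical), assuming each player name appears once per list and using a window of 1 instead of 5 to keep the example short:

```python
def count_window_matches(model_rank, reference_rank, window=5):
    """Count players whose reference rank is within `window` places of the model rank."""
    ref_pos = {player: i for i, player in enumerate(reference_rank)}
    matches = 0
    for i, player in enumerate(model_rank):
        j = ref_pos.get(player)           # None if the player is not in the reference list
        if j is not None and abs(i - j) <= window:
            matches += 1
    return matches

reference = ['A', 'B', 'C', 'D', 'E']
model = ['B', 'A', 'E', 'C', 'F']
print(count_window_matches(model, reference, window=1))
```

Here `B`, `A`, and `C` land within one place of their reference rank, `E` is two places off, and `F` is missing from the reference list, so neither of those two counts as a match.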
