# Project 01 - Write a Data Science Blog Post

## Part 2: Data Analysis

### Data
 - NBA 2018-2019 Player Box Scores 
 - NBA 2018-2019 Daily Fantasy Scores (DFS)
 
### Business Questions
1. What are the key drivers for top fantasy scores?
2. What effect, if any, does seasonality have during the NBA season?
3. Which positions are the most valuable from a fantasy score perspective? 

### Import Packages

In [1]:
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
os.chdir('../lib')

In [3]:
from helpers import min_games_filter
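
The source of `min_games_filter` lives in `../lib/helpers.py` and is not shown in this notebook. As a rough illustration only, a filter of this kind might keep the rows of players who appear in at least `games_played` distinct games. The function name `min_games_filter_sketch`, its column arguments, and the toy data below are assumptions for this sketch, not the actual helper.

```python
import pandas as pd


def min_games_filter_sketch(df, games_played=40,
                            player_col="PLAYER-ID", game_col="GAME-ID"):
    """Hypothetical stand-in for helpers.min_games_filter.

    Keeps rows belonging to players with at least `games_played`
    distinct games in the data.
    """
    # Number of distinct games per player, broadcast back onto every row
    games_per_player = df.groupby(player_col)[game_col].transform("nunique")
    return df[games_per_player >= games_played].copy()


# Toy box scores: player 1 has 3 games, player 2 has 2, player 3 has 1
toy = pd.DataFrame({
    "PLAYER-ID": [1, 1, 1, 2, 2, 3],
    "GAME-ID": [10, 11, 12, 10, 11, 12],
    "YAHOO_FANTASYPOINTS": [30.0, 25.5, 41.0, 12.0, 8.5, 19.0],
})

filtered = min_games_filter_sketch(toy, games_played=2)
print(filtered)
```

Returning a copy avoids `SettingWithCopyWarning` if new columns are later added to the filtered frame.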

### Import Data

In [4]:
os.chdir('../data')

In [5]:
df = pd.read_csv("Cleaned_NBA1819_PlayerStats-DFS.csv")

# Drop unnecessary columns
del_cols = ['Unnamed: 0', 'MIN']

df.drop(del_cols, axis=1, inplace=True)
df.head()

Unnamed: 0,DATASET,GAME-ID,DATE,PLAYER-ID,PLAYER,OWNTEAM,OPPONENTTEAM,STARTER (Y/N),VENUE (R/H),MINUTES,...,A,PF,ST,TO,BL,PTS,POSITION,DRAFTKINGS_CLASSIC_SALARY,FANDUEL_FULLROSTER_SALARY,YAHOO_FULLSLATE_SALARY
0,NBA 2018-2019 Regular Season,21800001,2018-10-16,203967,Dario Saric,Philadelphia,Boston,Y,R,22.9,...,1,5,0,3,0,6,PF,5500.0,6400.0,25.0
1,NBA 2018-2019 Regular Season,21800001,2018-10-16,203496,Robert Covington,Philadelphia,Boston,Y,R,34.22,...,0,1,2,2,1,8,SF,4700.0,6500.0,23.0
2,NBA 2018-2019 Regular Season,21800001,2018-10-16,203954,Joel Embiid,Philadelphia,Boston,Y,R,36.82,...,2,3,1,5,2,23,C,8800.0,10400.0,41.0
3,NBA 2018-2019 Regular Season,21800001,2018-10-16,1628365,Markelle Fultz,Philadelphia,Boston,Y,R,24.33,...,2,1,1,3,0,5,PG,5000.0,5700.0,16.0
4,NBA 2018-2019 Regular Season,21800001,2018-10-16,1627732,Ben Simmons,Philadelphia,Boston,Y,R,42.73,...,8,5,4,3,2,19,PG,8400.0,10000.0,46.0
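
An `Unnamed: 0` column usually appears when a CSV was written with its index and then read back without declaring it. Assuming that is the case for this file, passing `index_col=0` to `pd.read_csv` would avoid having to drop the column afterwards. The snippet below demonstrates the behavior on a small in-memory CSV (toy data, not the project file).

```python
import io

import pandas as pd

# A CSV whose first column is an unlabeled, saved index
csv_text = ",PLAYER,PTS\n0,Dario Saric,6\n1,Joel Embiid,23\n"

# Default read: the unlabeled column is named 'Unnamed: 0'
default_read = pd.read_csv(io.StringIO(csv_text))

# Declaring the first column as the index avoids the extra column
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_read.columns.tolist())
print(indexed_read.columns.tolist())
```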


In [6]:
df.shape

(27855, 34)

In [7]:
# Add additional date columns for analysis
df['DATE'] = pd.to_datetime(df['DATE'])
df['DATE_Month'] = df['DATE'].dt.month_name()

# Re-order the Months based on the NBA season
df['DATE_Month'] = df['DATE_Month'].replace({'October':'NBA-01_October',
                                             'November':'NBA-02_November',
                                             'December':'NBA-03_December',
                                             'January':'NBA-04_January',
                                             'February':'NBA-05_February',
                                             'March':'NBA-06_March',
                                             'April':'NBA-07_April',
                                             'May':'NBA-08_May',
                                             'June':'NBA-09_June'})
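
Prefixing the month names forces the right sort order, but it also changes the labels. One possible alternative, shown here on toy data, is an ordered `pd.Categorical` that sorts in season order while keeping the plain month names. The `season_order` list and the toy frame are assumptions for this sketch.

```python
import pandas as pd

# NBA season order, October through June
season_order = ["October", "November", "December", "January", "February",
                "March", "April", "May", "June"]

toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2019-01-05", "2018-10-16",
                            "2019-04-02", "2018-12-25"]),
    "YAHOO_FANTASYPOINTS": [31.0, 28.5, 40.0, 22.0],
})

# Ordered categorical: sorting and grouping follow season_order,
# not alphabetical order
toy["DATE_Month"] = pd.Categorical(toy["DATE"].dt.month_name(),
                                   categories=season_order, ordered=True)

monthly_mean = (toy.groupby("DATE_Month", observed=True)
                   ["YAHOO_FANTASYPOINTS"].mean())
print(monthly_mean)
```

With `observed=True`, only months present in the data appear in the result, still in season order.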

In [8]:
df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27855 entries, 0 to 27854
Data columns (total 35 columns):
DATASET                      27855 non-null object
GAME-ID                      27855 non-null int64
DATE                         27855 non-null datetime64[ns]
PLAYER-ID                    27855 non-null int64
PLAYER                       27855 non-null object
OWNTEAM                      27855 non-null object
OPPONENTTEAM                 27855 non-null object
STARTER (Y/N)                27855 non-null object
VENUE (R/H)                  27855 non-null object
MINUTES                      27855 non-null float64
USAGE RATE                   27855 non-null float64
DAYSREST                     27855 non-null int64
DRAFTKINGS_FANTASYPOINTS     27855 non-null float64
FANDUEL_FANTASYPOINTS        27855 non-null float64
YAHOO_FANTASYPOINTS          27855 non-null float64
FG                           27855 non-null int64
FGA                          27855 non-null int64
3P              

### Exploratory Data Analysis

In [9]:
test_df = min_games_filter(df, games_played=40)
#test_df.head()

In [15]:
starter_mask = (test_df['STARTER (Y/N)'] == 'Y')

test_df[starter_mask].head()

Unnamed: 0,DATASET,GAME-ID,DATE,PLAYER-ID,PLAYER,OWNTEAM,OPPONENTTEAM,STARTER (Y/N),VENUE (R/H),MINUTES,...,PF,ST,TO,BL,PTS,POSITION,DRAFTKINGS_CLASSIC_SALARY,FANDUEL_FULLROSTER_SALARY,YAHOO_FULLSLATE_SALARY,DATE_Month
0,NBA 2018-2019 Regular Season,21800001,2018-10-16,203967,Dario Saric,Philadelphia,Boston,Y,R,22.9,...,5,0,3,0,6,PF,5500.0,6400.0,25.0,NBA-01_October
2,NBA 2018-2019 Regular Season,21800001,2018-10-16,203954,Joel Embiid,Philadelphia,Boston,Y,R,36.82,...,3,1,5,2,23,C,8800.0,10400.0,41.0,NBA-01_October
4,NBA 2018-2019 Regular Season,21800001,2018-10-16,1627732,Ben Simmons,Philadelphia,Boston,Y,R,42.73,...,5,4,3,2,19,PG,8400.0,10000.0,46.0,NBA-01_October
11,NBA 2018-2019 Regular Season,21800001,2018-10-16,1628369,Jayson Tatum,Boston,Philadelphia,Y,H,28.93,...,2,1,1,0,23,PF,5600.0,6200.0,23.0,NBA-01_October
12,NBA 2018-2019 Regular Season,21800001,2018-10-16,202330,Gordon Hayward,Boston,Philadelphia,Y,H,24.62,...,1,4,0,0,10,SF,6500.0,7400.0,21.0,NBA-01_October


In [10]:
test_df.shape

(24758, 35)

In [17]:
pd.pivot_table(test_df[starter_mask],
               index=['POSITION'],
               values=['YAHOO_FANTASYPOINTS'],
               columns=['DATE_Month'],
               aggfunc='mean')  # .plot(kind='line')

Mean YAHOO_FANTASYPOINTS by POSITION (rows) and DATE_Month (columns):

| POSITION | NBA-01_October | NBA-02_November | NBA-03_December | NBA-04_January | NBA-05_February | NBA-06_March | NBA-07_April | NBA-08_May | NBA-09_June |
|---|---|---|---|---|---|---|---|---|---|
| C | 34.238756 | 34.012264 | 33.974091 | 34.200237 | 33.405128 | 33.258004 | 31.646494 | 32.356 | 23.58 |
| C-F | NaN | NaN | NaN | NaN | NaN | NaN | 48.5 | NaN | 20.6 |
| F | NaN | NaN | NaN | NaN | NaN | NaN | 37.35 | 39.725 | 42.21875 |
| F-G | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 24.7 | 26.94 |
| G | NaN | NaN | NaN | NaN | NaN | NaN | 27.727273 | 31.133333 | 39.466667 |
| G-F | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12.666667 | 15.56 |
| PF | 28.9375 | 27.72209 | 28.858392 | 28.444522 | 28.956954 | 27.569249 | 29.036019 | 34.286441 | NaN |
| PG | 31.073545 | 31.085789 | 30.602247 | 31.541593 | 33.703896 | 33.273964 | 33.195437 | 36.071154 | NaN |
| SF | 26.539024 | 25.775592 | 27.706965 | 26.736253 | 28.940468 | 25.979665 | 27.315517 | 33.006818 | NaN |
| SG | 27.283258 | 27.549543 | 27.278935 | 27.746756 | 28.318447 | 28.745773 | 25.343256 | 30.551786 | NaN |
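
Some cells in this pivot, such as the hybrid positions (C-F, F, G, F-G, G-F) that only appear from April onward, may rest on very few box scores, which would make their means unreliable. One way to check is to compute counts alongside the means and mask cells below a minimum sample size. The sketch below uses toy data, and the `min_obs` threshold is an arbitrary assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    "POSITION": ["C", "C", "C", "C-F", "PG", "PG"],
    "DATE_Month": ["NBA-01_October", "NBA-01_October", "NBA-07_April",
                   "NBA-07_April", "NBA-01_October", "NBA-07_April"],
    "YAHOO_FANTASYPOINTS": [30.0, 40.0, 32.0, 48.5, 31.0, 33.0],
})

# Compute the mean and the number of observations for every cell
summary = pd.pivot_table(toy,
                         index="POSITION",
                         columns="DATE_Month",
                         values="YAHOO_FANTASYPOINTS",
                         aggfunc=["mean", "count"])

means = summary["mean"]
counts = summary["count"]

# Hide means backed by fewer than min_obs rows (threshold is an assumption)
min_obs = 2
reliable_means = means.where(counts >= min_obs)
print(reliable_means)
```

Applied to the real data, the same pattern would use `test_df[starter_mask]` in place of `toy`.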


In [24]:
df.groupby(['PLAYER', 'OWNTEAM'])['YAHOO_FANTASYPOINTS'].mean().sort_values(ascending=False).head(50)

PLAYER                 OWNTEAM      
James Harden           Houston          58.180899
Anthony Davis          New Orleans      56.114286
Giannis Antetokounmpo  Milwaukee        55.766667
Russell Westbrook      Oklahoma City    54.716667
Joel Embiid            Philadelphia     52.129333
LeBron James           LA Lakers        52.036364
Paul George            Oklahoma City    49.039024
Karl-Anthony Towns     Minnesota        48.692208
Nikola Jokic           Denver           48.419149
Kawhi Leonard          Toronto          46.689286
Andre Drummond         Detroit          46.003614
Kevin Durant           Golden State     45.341111
Nikola Vucevic         Orlando          44.270588
Stephen Curry          Golden State     44.062637
Damian Lillard         Portland         43.842708
Bradley Beal           Washington       43.624390
Kyrie Irving           Boston           43.280263
Jrue Holiday           New Orleans      42.937313
Jimmy Butler           Minnesota        42.790000
Kemba Walker 