# Project 01 - Write a Data Science Blog Post

## Part 2: Data Analysis

### Data
 - NBA 2018-2019 Player Box Scores 
 - NBA 2018-2019 Daily Fantasy Scores (DFS)
 
### Business Questions
1. What are the key drivers for top fantasy scores?
2. What effect, if any, does seasonality have on fantasy scores during the NBA season?
3. Which positions are the most valuable from a fantasy score perspective? 

### Import Packages

In [1]:
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Use 2 decimal places in output display
pd.set_option("display.precision", 2)

In [2]:
# Move into ../lib so the local helpers module can be imported
os.chdir('../lib')

In [3]:
from helpers import min_games_filter

### Import Data

In [4]:
# Move into ../data so the CSV can be read by file name
os.chdir('../data')

In [5]:
df = pd.read_csv("Cleaned_NBA1819_PlayerStats-DFS.csv")

# Drop unnecessary columns
del_cols = ['Unnamed: 0', 'MIN']

df.drop(del_cols, axis=1, inplace=True)
df.head()

,DATASET,GAME-ID,DATE,PLAYER-ID,PLAYER,OWNTEAM,OPPONENTTEAM,STARTER (Y/N),VENUE (R/H),MINUTES,...,A,PF,ST,TO,BL,PTS,POSITION,DRAFTKINGS_CLASSIC_SALARY,FANDUEL_FULLROSTER_SALARY,YAHOO_FULLSLATE_SALARY
0,NBA 2018-2019 Regular Season,21800001,2018-10-16,203967,Dario Saric,Philadelphia,Boston,Y,R,22.9,...,1,5,0,3,0,6,PF,5500.0,6400.0,25.0
1,NBA 2018-2019 Regular Season,21800001,2018-10-16,203496,Robert Covington,Philadelphia,Boston,Y,R,34.22,...,0,1,2,2,1,8,SF,4700.0,6500.0,23.0
2,NBA 2018-2019 Regular Season,21800001,2018-10-16,203954,Joel Embiid,Philadelphia,Boston,Y,R,36.82,...,2,3,1,5,2,23,C,8800.0,10400.0,41.0
3,NBA 2018-2019 Regular Season,21800001,2018-10-16,1628365,Markelle Fultz,Philadelphia,Boston,Y,R,24.33,...,2,1,1,3,0,5,PG,5000.0,5700.0,16.0
4,NBA 2018-2019 Regular Season,21800001,2018-10-16,1627732,Ben Simmons,Philadelphia,Boston,Y,R,42.73,...,8,5,4,3,2,19,PG,8400.0,10000.0,46.0
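
A column named `Unnamed: 0` is usually the old row index that `DataFrame.to_csv` wrote out. As a minimal sketch, assuming that is the case for this file, `index_col=0` reads it back as the index so that only `MIN` needs dropping. The inline CSV is made-up toy data, not the project file.

```python
import io

import pandas as pd

# Toy CSV standing in for the real file (values are illustrative only).
csv_text = """,PLAYER,MIN,MINUTES,PTS
0,Player A,22,22.9,6
1,Player B,34,34.22,8
"""

# index_col=0 treats the unnamed first column as the index instead of data.
toy_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# errors="ignore" skips any listed column that is not present.
toy_df = toy_df.drop(columns=["MIN", "Unnamed: 0"], errors="ignore")

print(toy_df.columns.tolist())
```

Returning a new frame from `drop` instead of using `inplace=True` also makes the cell safe to re-run.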


In [6]:
df.shape

(27855, 34)

In [7]:
df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27855 entries, 0 to 27854
Data columns (total 34 columns):
DATASET                      27855 non-null object
GAME-ID                      27855 non-null int64
DATE                         27855 non-null object
PLAYER-ID                    27855 non-null int64
PLAYER                       27855 non-null object
OWNTEAM                      27855 non-null object
OPPONENTTEAM                 27855 non-null object
STARTER (Y/N)                27855 non-null object
VENUE (R/H)                  27855 non-null object
MINUTES                      27855 non-null float64
USAGE RATE                   27855 non-null float64
DAYSREST                     27855 non-null int64
DRAFTKINGS_FANTASYPOINTS     27855 non-null float64
FANDUEL_FANTASYPOINTS        27855 non-null float64
YAHOO_FANTASYPOINTS          27855 non-null float64
FG                           27855 non-null int64
FGA                          27855 non-null int64
3P                      

### Data Cleaning

In [8]:
# Add additional date columns for analysis
df['DATE'] = pd.to_datetime(df['DATE'])
df['DATE_Month'] = df['DATE'].dt.month_name()

# Re-order the Months based on the NBA season
df['DATE_Month'] = df['DATE_Month'].replace({'October':'NBA-01_October',
                                             'November':'NBA-02_November',
                                             'December':'NBA-03_December',
                                             'January':'NBA-04_January',
                                             'February':'NBA-05_February',
                                             'March':'NBA-06_March',
                                             'April':'NBA-07_April',
                                             'May':'NBA-08_May',
                                             'June':'NBA-09_June'})
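
The prefixed labels above force season order through alphabetical sorting. An alternative, sketched here on toy dates rather than the project data, is an ordered categorical that keeps the plain month names. The `SEASON_MONTHS` list and the `toy` frame are assumptions made for illustration.

```python
import pandas as pd

# Season order assumed for illustration: October through June.
SEASON_MONTHS = ["October", "November", "December", "January",
                 "February", "March", "April", "May", "June"]

# Toy dates, not the project data.
toy = pd.DataFrame({"DATE": pd.to_datetime(["2019-01-05", "2018-10-16", "2019-06-02"])})

# An ordered categorical sorts by season position rather than alphabetically.
toy["DATE_Month"] = pd.Categorical(toy["DATE"].dt.month_name(),
                                   categories=SEASON_MONTHS, ordered=True)

print(toy.sort_values("DATE_Month")["DATE_Month"].tolist())
```

When pivoting on a categorical column, passing `observed=True` to `pd.pivot_table` keeps only the months that actually occur in the data.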

In [9]:
# Require a minimum number of games played to reduce noise
test_df = min_games_filter(df, games_played=40)

# Create a filter to look at just starters
starter_mask = (test_df['STARTER (Y/N)'] == 'Y')

In [10]:
test_df.shape

(24758, 35)
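
The source of `min_games_filter` is not shown in this notebook. The function below is a hypothetical stand-in, not the project's helper: it assumes the filter keeps every row for players who appear in at least `games_played` distinct games, and the column-name parameters are likewise assumptions.

```python
import pandas as pd


def min_games_filter_sketch(df, games_played=40,
                            player_col="PLAYER-ID", game_col="GAME-ID"):
    """Keep rows for players who appear in at least `games_played` distinct games."""
    # transform("nunique") broadcasts each player's game count back to every row.
    games_per_player = df.groupby(player_col)[game_col].transform("nunique")
    return df[games_per_player >= games_played].copy()


# Toy data: player 1 has three games, player 2 has one.
toy = pd.DataFrame({
    "PLAYER-ID": [1, 1, 1, 2],
    "GAME-ID": [10, 11, 12, 10],
})

filtered = min_games_filter_sketch(toy, games_played=2)
print(filtered["PLAYER-ID"].unique().tolist())
```

A filter of this kind only removes rows, which is consistent with the row count falling from 27855 to 24758 while the column count rises by one for `DATE_Month`.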

### Exploratory Data Analysis

In [11]:
# Seasonality for positions and their respective average fantasy score values
pd.pivot_table(test_df[starter_mask], index=['POSITION'],
                                      values=['YAHOO_FANTASYPOINTS'], 
                                      columns=['DATE_Month'], 
                                      aggfunc='mean') #.plot(kind='line')

Mean YAHOO_FANTASYPOINTS by POSITION (rows) and DATE_Month (columns)
POSITION,NBA-01_October,NBA-02_November,NBA-03_December,NBA-04_January,NBA-05_February,NBA-06_March,NBA-07_April,NBA-08_May,NBA-09_June
C,34.24,34.01,33.97,34.2,33.41,33.26,31.65,32.36,23.58
C-F,,,,,,,48.5,,20.6
F,,,,,,,37.35,39.73,42.22
F-G,,,,,,,,24.7,26.94
G,,,,,,,27.73,31.13,39.47
G-F,,,,,,,,12.67,15.56
PF,28.94,27.72,28.86,28.44,28.96,27.57,29.04,34.29,
PG,31.07,31.09,30.6,31.54,33.7,33.27,33.2,36.07,
SF,26.54,25.78,27.71,26.74,28.94,25.98,27.32,33.01,
SG,27.28,27.55,27.28,27.75,28.32,28.75,25.34,30.55,
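
Passing `values` as a list, as in the cell above, produces a two-level column index. Passing it as a string yields flat month columns, which are easier to summarize. The sketch below uses made-up toy numbers and computes the gap between each position's best and worst month as one rough seasonality measure. That measure is an illustrative choice, not part of the original analysis.

```python
import pandas as pd

# Toy box scores (illustrative values only).
toy = pd.DataFrame({
    "POSITION": ["C", "C", "C", "PG", "PG", "PG"],
    "DATE_Month": ["NBA-01_October", "NBA-01_October", "NBA-02_November",
                   "NBA-01_October", "NBA-02_November", "NBA-02_November"],
    "YAHOO_FANTASYPOINTS": [30.0, 34.0, 36.0, 28.0, 31.0, 35.0],
})

# A string for `values` (not a list) gives single-level month columns.
monthly = pd.pivot_table(toy, index="POSITION", columns="DATE_Month",
                         values="YAHOO_FANTASYPOINTS", aggfunc="mean")

# Spread between each position's best and worst monthly average.
spread = monthly.max(axis=1) - monthly.min(axis=1)
print(spread.to_dict())
```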


In [12]:
pd.pivot_table(test_df[starter_mask], index=['OWNTEAM', 'PLAYER'],
                                      values=['YAHOO_FANTASYPOINTS'],
                                      columns=['DATE_Month'],
                                      aggfunc='mean') #.sort_values(by='YAHOO_FANTASYPOINTS', ascending=False)

Mean YAHOO_FANTASYPOINTS by OWNTEAM and PLAYER (rows) and DATE_Month (columns)
OWNTEAM,PLAYER,NBA-01_October,NBA-02_November,NBA-03_December,NBA-04_January,NBA-05_February,NBA-06_March,NBA-07_April,NBA-08_May,NBA-09_June
Atlanta,Alex Len,24.17,19.53,33.20,,,26.97,34.60,,
Atlanta,DeAndre' Bembry,,17.95,,22.78,19.60,,,,
Atlanta,Dewayne Dedmon,,20.80,31.21,26.75,27.88,31.65,,,
Atlanta,Jeremy Lin,,,,34.30,,,,,
Atlanta,John Collins,,28.13,39.42,34.41,32.43,38.24,43.27,,
...,...,...,...,...,...,...,...,...,...,...
Washington,Otto Porter Jr.,24.91,27.05,28.84,16.80,37.00,,,,
Washington,Thomas Bryant,,15.75,22.77,24.89,24.04,34.75,32.97,,
Washington,Tomas Satoransky,,,22.06,29.75,28.77,29.36,20.98,,
Washington,Trevor Ariza,,,32.79,30.59,33.60,21.75,,,
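
A month-by-player table like this has no single column to sort on. One option, sketched here on toy values, is to add a row-wise mean across months and sort by it. The `SEASON_MEAN` column name is an assumption for illustration. It is also an average of monthly averages, so it will differ from a per-game mean when a player's game counts vary by month.

```python
import numpy as np
import pandas as pd

# Toy pivot: rows are players, columns are months, NaN means no games started.
toy_pivot = pd.DataFrame(
    {"NBA-01_October": [24.0, np.nan, 30.0],
     "NBA-02_November": [20.0, 28.0, np.nan]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="PLAYER"),
)

# mean(axis=1) skips NaN months by default, so sparse rows still get a value.
ranked = (toy_pivot
          .assign(SEASON_MEAN=toy_pivot.mean(axis=1))
          .sort_values("SEASON_MEAN", ascending=False))

print(ranked.index.tolist())
```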


In [13]:
df.groupby(['PLAYER', 'OWNTEAM'])['YAHOO_FANTASYPOINTS'].mean().sort_values(ascending=False).head(50)

PLAYER                 OWNTEAM      
James Harden           Houston          58.18
Anthony Davis          New Orleans      56.11
Giannis Antetokounmpo  Milwaukee        55.77
Russell Westbrook      Oklahoma City    54.72
Joel Embiid            Philadelphia     52.13
LeBron James           LA Lakers        52.04
Paul George            Oklahoma City    49.04
Karl-Anthony Towns     Minnesota        48.69
Nikola Jokic           Denver           48.42
Kawhi Leonard          Toronto          46.69
Andre Drummond         Detroit          46.00
Kevin Durant           Golden State     45.34
Nikola Vucevic         Orlando          44.27
Stephen Curry          Golden State     44.06
Damian Lillard         Portland         43.84
Bradley Beal           Washington       43.62
Kyrie Irving           Boston           43.28
Jrue Holiday           New Orleans      42.94
Jimmy Butler           Minnesota        42.79
Kemba Walker           Charlotte        42.17
John Wall              Washington       41.