# "Learn to Code with Fantasy Football"
## Hands On Exercises from the Text

Text found here: https://fantasycoding.com/
Exercises completed by Calvin Miller

In [85]:
# import packages
import pandas as pd
from os import path

# set data directory
DATA_DIR = 'C:\\Users\\calvi\\Projects\\LTCWFF\\ltcwff-files\\ltcwff-files-main\\data'

### 1. Introduction

Goal of text is to demonstrate how to use data analysis to consistently generate insights about fantasy football

#### What is Data?

Data as collection of structured information

Tabular data as having rows (observations) and columns (variables)

In [90]:
# Load Play-by-Play Data
pbp_df = pd.read_csv(path.join(DATA_DIR,'play_data_sample.csv'))

# Load Player/Game Data
pg_df = pd.read_csv(path.join(DATA_DIR,'player_game_2017_sample.csv'))

# Load Average Draft Position (ADP) Data
adp_df = pd.read_csv(path.join(DATA_DIR,'adp_2017.csv'))

#### What is Analysis?

Data analysis as process of transforming raw data into insights (think of a funnel)

Types of data analysis:
1) estimation of summary statistics (measures of central tendency, measures of spread, etc.)

2) creation of models to understand how variables relate to each other

Models have use in both making sense of historical data and making predictions about future data.

Model training/fitting: use of observed data to estimate relationship between predictive variables and response variable

Model testing: evaluation of model output (e.g. predictions) against actual outcomes
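
A minimal sketch of both types of analysis, using only the standard library. The per-game numbers are made up for illustration and are not from the text's data files. The model is a simple least-squares line, chosen only to keep the example short:

```python
import statistics

# Hypothetical (made-up) per-game numbers, for illustration only
targets = [4, 6, 8, 10, 12, 5, 9, 11]
rec_yards = [30, 52, 61, 85, 98, 41, 70, 90]

# 1) Summary statistics: central tendency and spread
mean_yards = statistics.mean(rec_yards)
stdev_yards = statistics.stdev(rec_yards)
print(f'mean={mean_yards}, stdev={stdev_yards:.1f}')

# 2) Model training/fitting: estimate yards = intercept + slope * targets
#    from the first six games only
train_x, test_x = targets[:6], targets[6:]
train_y, test_y = rec_yards[:6], rec_yards[6:]

x_bar = statistics.mean(train_x)
y_bar = statistics.mean(train_y)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(train_x, train_y))
         / sum((x - x_bar) ** 2 for x in train_x))
intercept = y_bar - slope * x_bar

# Model testing: compare predictions with the two held-out games
predictions = [intercept + slope * x for x in test_x]
mean_abs_error = statistics.mean(
    abs(p - y) for p, y in zip(predictions, test_y))
print(f'slope={slope:.2f}, mean abs error={mean_abs_error:.2f}')
```

Holding out the last two games keeps the test honest: the model never saw those outcomes while it was being fit.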

#### Data Analysis Process

1) Data Collection: web-scraping, public datasets, manual entry, remote sensors, etc.

2) Data Storage: in CSV, database, etc.

3) Load Data: load relevant files (or rows/columns of tables) into analysis tool of choice

4) Data Manipulation: cleaning data, efforts to make sense of outliers or missing data, etc.

5) Analyzing Data: use raw data to create model, find relationships, make predictions or recommendations, etc.
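
The five steps can be sketched end to end in a few lines of pandas. This is only an illustration: the CSV contents, player names, and column names below are made up, and an in-memory string stands in for a collected and stored file:

```python
import io
import pandas as pd

# 1-2) Collect/store: a tiny made-up CSV held in memory instead of on disk
raw_csv = io.StringIO(
    "player,week,rec_yards\n"
    "Player A,1,80\n"
    "Player A,2,\n"
    "Player B,1,45\n"
    "Player B,2,95\n"
)

# 3) Load data into the analysis tool
games = pd.read_csv(raw_csv)

# 4) Data manipulation: drop the row with missing yardage
clean = games.dropna(subset=['rec_yards'])

# 5) Analyze: average receiving yards per player
avg_yards = clean.groupby('player')['rec_yards'].mean()
print(avg_yards)
```

Dropping the missing row is just one option; whether to drop, fill, or investigate missing values depends on the data.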

In [10]:
# Exercises

## 1.1. Granularity of data sets
    # a) game/quarter/team
    # b) game/player
    # c) player/season
    # d) player/week
    # e) position

## 1.2. Summary Stats
    # "Explosive" players may be identified by reviewing the 
    # distribution of performance on plays - specifically looking 
    # for longer/heavier tails in positive direction

## 1.3. Predict game total score with weather data alone
    # a) Inputs: wind speed, temp
    # b) Output: total score of game
    # c) Granularity: single game
    # d) Limitations: does not include info about the games/teams
    
## 1.4. Where in pipeline?
    # a) get data to right granularity -> data manipulation
    # b) experiment w/models -> data analysis
    # c) dealing with missing data -> data manipulation
    # d) SQL -> store/load data
    # e) scrape website -> collect data
    # f) plot data -> data analysis
    # g) get data from API -> data collection
    # h) pandas -> load data / manipulate data
    # i) take mean of variables -> data analysis
    # j) combine data sources -> load data / data manipulation

### 2. Python

Python programming commonly uses third-party code from libraries/packages

Some common libraries: Pandas - tabular data manipulation; BeautifulSoup - scrape data from websites; scikit-learn - machine learning; statsmodels - statistical models

In [47]:
# Quick tour of standard Python library

# comments behind the "#"
# comments good for explaining complex functions, loops, conditionals

print(1+1) # no comment really needed for simple commands

variable_in_snake_case = 4 # variable assignment

print(variable_in_snake_case*2)

# variable update
variable_in_snake_case = variable_in_snake_case * 100
print(variable_in_snake_case)

# common data types
var_int = 48
var_float = 22.4
var_string = 'Words'
var_string_alt = "More Words"
var_bool = True
var_bool_alt = 2 < 3

print(type(var_string))

var_fstring = f'{var_string}, {var_string_alt}, together'
print(var_fstring)

# string methods
print(var_string.upper())

# comparison operators
test_true = 1 < 2
test_true = 5 > 4
test_true = 3 <= 4
test_true = 4 >= 3
test_true = 1 == 1
test_true = 1 != 10

# conditionals
if var_bool:
    message = "Test 1 was True"
elif var_bool_alt:
    message = "Test 1 was False and Test 2 was True"
else:
    message = "Neither test was True"
    
print(message)

# containers
var_list = ['a1','b2','c3']
print(var_list[0]) # Python uses 0-indexing

var_dict = {'qb': 'mac jones', # note these use keys: values
           'rb': 'najee harris',
           'wr': 'devonta smith'}
print(var_dict['wr']) # you can access values by referring to keys
var_dict['cb'] = 'patrick surtain ii' # add to dictionary

# unpacking
a, b, c = var_list # require equal length on both sides
print(a)

# loops
for entry in var_list:
    print(entry) # loop variable already holds each item; no counter needed

for x, y in var_dict.items():
    print(f'pos: {x}')
    print(f'name: {y}')

# list comprehensions
var_list_alt = ['bill smith', 'tom green', 'john johnson']

var_list_alt_proper = [x.title() for x in var_list_alt]
    # list comprehension has form [a for b in c]
    # where c is the list you're iterating over
    # where a is the function/method being applied
    # where b is the variable to refer to elements of c
print(var_list_alt_proper)

var_list_alt_upper = [x.upper() for x in var_list_alt]
print(var_list_alt_upper)

var_list_alt_first = [x.split(' ')[0] for x in var_list_alt]
    # comprehension is "mapping" a method/function to an item
print(var_list_alt_first)

var_list_alt_filter = [
    x for x in var_list_alt if 's' in x]
    # can use comprehension to conditionally act
    # [a for b in c if d]
print(var_list_alt_filter)

# dict comprehensions
var_dict_comp = {
    name.split(' ')[0].upper(): pos for pos, name in var_dict.items()
}
print(var_dict_comp)

# common functions
print(len(var_list)) # number of items in a list

def fantasy_pts_rec(yds=0, rec=0, tds=0, ppr=1):
    """
    three quotes used to allow multi-line comments/strings
    
    generally try to avoid function side effects:
        only use variables internal to function
        limit output to return value when possible
        
    function takes yards, receptions, and touchdowns and
        returns fantasy points scored
    
    note yds, rec, and tds default to zero; ppr defaults to 1
    """
    return yds*0.1 + rec*ppr + tds*6

print(fantasy_pts_rec(127, 5, 2))
# positional arguments are matched in the order of the function def
# for this reason, convention is to place important parameters first

print(fantasy_pts_rec(127, 5, ppr=2))

# libraries are sets of functions and types/classes

2
8
400
<class 'str'>
Words, More Words, together
WORDS
Test 1 was True
a1
devonta smith
a1
a1
b2
c3
pos: qb
name: mac jones
pos: rb
name: najee harris
pos: wr
name: devonta smith
pos: cb
name: patrick surtain ii
['Bill Smith', 'Tom Green', 'John Johnson']
['BILL SMITH', 'TOM GREEN', 'JOHN JOHNSON']
['bill', 'tom', 'john']
['bill smith', 'john johnson']
{'MAC': 'qb', 'NAJEE': 'rb', 'DEVONTA': 'wr', 'PATRICK': 'cb'}
3
29.700000000000003
22.700000000000003


#### How to Figure Things Out in Python

Official Python documentation: 
https://docs.python.org/3/index.html

Python quick reference: 
https://www.cs.put.poznan.pl/csobaniec/software/python/py-qrc.html

Google "Python {issue / target command / etc.}"

StackOverflow
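
Python's built-in `dir()` and `help()` can also answer many questions without leaving the interpreter. A small sketch, using the `str` type purely as an example:

```python
# list the public methods available on strings
str_methods = [m for m in dir(str) if not m.startswith('_')]
print(str_methods[:5])

# read the documentation string for a specific method
print(str.islower.__doc__)

# help() prints the same documentation in a formatted view
help(str.islower)
```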

In [81]:
# Exercises

## 2.1. valid Python variable names
    # do not start with number
    # not strings
    # only allowed non-alphanumeric character is "_"
    # "_" can be at beginning
    # convention is snake_case not camelCase

## 2.2. arithmetic
weekly_points = 100
weekly_points += 28
weekly_points += 5

print(f'weekly_points={weekly_points}')

## 2.3. function def
def for_the_td(player1, player2):
    """
    proclaims that player1 went to player2 for the TD
    """
    return f'{player1} to {player2} for the TD!'

print(for_the_td('Dak','Zeke'))

## 2.4. method
    # method .islower() evaluates whether string all lower case
    # returns Boolean

test_str = 'test'
print(test_str.islower())
test_str = 'Test'
print(test_str.islower())

## 2.5. function def
def is_leveon(player):
    """
    evaluates whether player name is "Le'Veon Bell"
    with or without the '
    """
    return player.replace("'","").upper() == "LEVEON BELL"

lbell_without = "LeVeon Bell"
lbell_with = "Le'Veon Bell"
lbell_not = "Leveone Bell"
print(is_leveon(lbell_without))
print(is_leveon(lbell_with))
print(is_leveon(lbell_not))

## 2.6. function with conditional
def commentary(score):
    """
    provides comment on whether score good/bad with limit 100
    """
    if score >= 100:
        comment = f'{score} is a good score'
    else:
        comment = f"{score}'s not that good"
    return comment
        
print(commentary(77))
print(commentary(127))

## 2.7. three ways to print list without last entry
example_list = ['dave', 'steve', 'rick', 'dan']
print(example_list[:3])
print([name for name in example_list if name != 'dan'])
print(example_list[:-1])

## 2.8. dictionary element changes 
simple_dict = {'n_teams': 12, 'ppr': True}

simple_dict['n_teams'] = 10 # update single position
print(simple_dict)

def toggle_ppr(league):
    """
    function to switch boolean key values for 'ppr' in league
    """
    league['ppr'] = not league['ppr']

toggle_ppr(simple_dict)
print(simple_dict)
toggle_ppr(simple_dict)
print(simple_dict)

## 2.9 dict question
    # dict needs key's value specified when adding key
    # string keys must be quoted; a bare name is read as a variable
    # dict can't output values for keys that don't exist
    
## 2.10 list question
roster_list = ['tom brady', 'adrian peterson', 'antonio brown']

for dude in roster_list: # print last names
    print(dude.split(' ')[1])

name_dict = {name: len(name) for name in roster_list} 
    # create dictionary with comprehension
print(name_dict)

## 2.11. comprehensions
my_roster_dict= {'qb':'tom brady',
                 'rb1':'adrian peterson',
                 'wr1':'davante adams',
                 'wr2':'john brown'}

my_roster_positions = [pos for pos, name in my_roster_dict.items()]
print(my_roster_positions)

my_roster_names_ab = [name
                       for pos, name in my_roster_dict.items()
                       # startswith accepts a tuple of prefixes
                       if name.split(' ')[1].startswith(('a', 'b'))]
print(my_roster_names_ab)

## 2.12. functions
def mapper(list_name, function_name):
    """
    applies function function_name to each element in
    list list_name
    """
    return [function_name(i) for i in list_name]

def rushing_pts_calc(yards):
    return 0.1*yards

rush_yds = [1, 10, 20, 100, 123, 3, 0, 0]
rush_pts = mapper(rush_yds, rushing_pts_calc)
print(rush_pts)



weekly_points=133
Dak to Zeke for the TD!
True
False
True
True
False
77's not that good
127 is a good score
['dave', 'steve', 'rick']
['dave', 'steve', 'rick']
['dave', 'steve', 'rick']
{'n_teams': 10, 'ppr': True}
{'n_teams': 10, 'ppr': False}
{'n_teams': 10, 'ppr': True}
brady
peterson
brown
{'tom brady': 9, 'adrian peterson': 15, 'antonio brown': 13}
['qb', 'rb1', 'wr1', 'wr2']
['tom brady', 'davante adams', 'john brown']
[0.1, 1.0, 2.0, 10.0, 12.3, 0.30000000000000004, 0.0, 0.0]


### 3. Pandas

#### Part 1: Intro to Pandas

Pandas is an external library for working with data, particularly for data manipulation (joining, filtering, modifying, etc.)

##### Types and Functions

DataFrame: holds a single data table

Series: single column of a DataFrame

Functions include: reading from CSV, writing to CSV, printing table header, selecting subsets of columns, selecting subsets of rows (filtering), modifying/adding columns, changing level of granularity, merging/joining tables

In [137]:
# Basic Functionality of Pandas

# Load Play-by-Play Data
pbp_df = pd.read_csv(path.join(DATA_DIR,
                               'play_data_sample.csv'))
pg_df = pd.read_csv(path.join(DATA_DIR,
                              'player_game_2017_sample.csv'))
adp_df = pd.read_csv(path.join(DATA_DIR,
                               'adp_2017.csv'))

print(type(adp_df))


# DataFrame Methods and Attributes
adp_df.head(3) # method takes argument; head default is 5 rows

print(adp_df.columns) # attribute doesn't need extra input

print(adp_df.shape) # attribute shape = (rows, columns)


# Working with Subsets of columns
adp_df['name'].head()
print(type(adp_df['name'])) # single column is Series by default
print(type(adp_df['name'].to_frame())) # method changes to DF

adp_df[['name','position','adp']].head() # supplying list of columns
print(type(adp_df[['name','position','adp']])) # >1 columns is DF


# Indexing = assign ID to rows, default as 0:n-1
adp_df.set_index('player_id').head() # can reindex to a column
adp_df.set_index('player_id', inplace=True) # 1 way to overwrite
adp_df = adp_df.reset_index()
adp_df = adp_df.set_index('player_id') # other way to overwrite


# Use of Indices
adp_df_rbs = adp_df.loc[adp_df['position'] == 'RB',['name','team']]
adp_df_rbs.head() # used .loc to get columns/rows meeting condition

adp_df_rbs['times_drafted'] = adp_df['times_drafted']
adp_df_rbs.head() # add col to new df using index match in old df


# Outputting Data
adp_df_rbs.to_csv(path.join(DATA_DIR, 'adp_rb.csv'), # write CSV
                 index=False) # exclude index from file


# Exercises
## 3.0.1 load adp data
adp_df = pd.read_csv(path.join(DATA_DIR,
                               'adp_2017.csv'))

## 3.0.2 report top 50 players by adp
adp_df_50 = adp_df.nsmallest(50, columns=['adp']) # lower adp = drafted earlier
print(adp_df_50.shape)

## 3.0.3 sort adp data by name desc
adp_df.sort_values('name', ascending = False)

## 3.0.4 what is type of adp_df.sort_values('adp')
print(type(adp_df.sort_values('adp'))) # sorted df is still df

## 3.0.5 create new df
adp_df_simple = adp_df[['name','position','adp']]
print(adp_df_simple.head(2))
adp_df_simple = adp_df_simple[['position','name','adp']]
    # alternative:
    # adp_df_simple.reindex(columns=['position','name','adp'])
print(adp_df_simple.head(2))
adp_df_simple['team'] = adp_df['team']
print(adp_df_simple.head(2))
adp_df_simple.to_csv(path.join(DATA_DIR, 'adp_simple.txt'),
                     sep='|') # to_csv can write .txt

<class 'pandas.core.frame.DataFrame'>
Index(['adp', 'adp_formatted', 'bye', 'high', 'low', 'name', 'player_id',
       'position', 'stdev', 'team', 'times_drafted'],
      dtype='object')
(184, 11)
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
(50, 11)
<class 'pandas.core.frame.DataFrame'>
            name position  adp
0  David Johnson       RB  1.3
1    LeVeon Bell       RB  2.3
  position           name  adp
0       RB  David Johnson  1.3
1       RB    LeVeon Bell  2.3
  position           name  adp team
0       RB  David Johnson  1.3  ARI
1       RB    LeVeon Bell  2.3  PIT


#### Part 2: Things You Can Do With DataFrames

Broadly, capabilities fall into categories:

1) Change or create columns

2) Calculate statistics of DataFrames/Series

3) Filter observations (subset rows)

4) Change granularity of data

5) Merge (or Join) and Concatenate DataFrames with pd.merge() and pd.concat()
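
A compact sketch of all five categories on two small, made-up tables. The player names, column names, and scoring weights are assumptions for illustration only:

```python
import pandas as pd

# Made-up player/week data, for illustration only
games = pd.DataFrame({
    'player_id': [1, 1, 2, 2],
    'week': [1, 2, 1, 2],
    'rush_yards': [100, 50, 20, 30],
    'rush_tds': [1, 0, 0, 2],
})
players = pd.DataFrame({
    'player_id': [1, 2],
    'name': ['Player A', 'Player B'],
})

# 1) Create a column from existing columns
games['rush_pts'] = games['rush_yards'] * 0.1 + games['rush_tds'] * 6

# 2) Calculate a statistic of a Series
mean_pts = games['rush_pts'].mean()

# 3) Filter observations (subset rows)
big_games = games.loc[games['rush_pts'] >= 10]

# 4) Change granularity: player/week -> player
season = games.groupby('player_id', as_index=False)['rush_pts'].sum()

# 5) Merge on a shared key, and concatenate (stack) DataFrames
season_named = pd.merge(season, players, on='player_id')
stacked = pd.concat([games, games], ignore_index=True)

print(season_named)
```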