# Sportsball 2018 - Visualizing a Fantasy Football Season
## by John Larson

## Preliminary Wrangling

> My fantasy football league is managed through ESPN. The league page offers only a limited number of features for gaining insight into player performance, which can make it difficult for league members to manage their teams effectively. The goal of this project is to create visualizations that help league members understand their strengths and weaknesses and where they rank in the league.
>
> Two dataframes were created through ESPN's accessible fantasy football API. [Steven Morse](https://stmorse.github.io/), an instructor in the Department of Mathematics at the U.S. Military Academy, posted a couple of articles containing instructions and code that were instrumental in helping me efficiently create [seasonscores](https://stmorse.github.io/journal/espn-fantasy-python.html) and [boxscores](https://stmorse.github.io/journal/espn-fantasy-2-python.html) csv files using ESPN's API. The script that creates these files is included in this project submittal as `espn_api_to_csv.py`. Steven's articles also inspired me to make boxplots and radial charts to visualize data for my league.
>
> After creating the csvs, team names were manually changed in Excel to generic "Team 1", "Team 2", etc.
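
The renaming step above was done by hand in Excel. As a rough sketch of how it could be scripted instead, the helper below replaces each distinct team name with a generic label. The function name `anonymize_team_names`, the toy team names, and the choice to number teams in order of first appearance are all assumptions made for illustration; the actual numbering used in this project may have followed a different rule (for example, the ESPN team `Id`).

```python
import pandas as pd

def anonymize_team_names(df, column):
    """Return a copy of df with each distinct name in `column` replaced by 'Team N'.

    Hypothetical helper: names are numbered in order of first appearance,
    which is an arbitrary choice for this sketch.
    """
    names = pd.unique(df[column])
    mapping = {name: 'Team {}'.format(i) for i, name in enumerate(names, start=1)}
    out = df.copy()
    out[column] = out[column].map(mapping)
    return out

# Toy data standing in for the real csv; these team names are made up.
demo = pd.DataFrame({'Week': [1, 1, 2, 2],
                     'Team': ['Gridiron Gang', 'Hail Marys', 'Gridiron Gang', 'Hail Marys'],
                     'Score': [101.2, 98.4, 120.0, 87.6]})
anonymized = anonymize_team_names(demo, 'Team')
print(anonymized)
```

To keep labels consistent across both csv files, the same mapping would need to be applied to `Team` in `seasonscores` and `teamName` in `boxscores`.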

In [1]:
# import all packages and set plots to be embedded inline:
import numpy as np
import pandas as pd
from plotly.offline import plot, iplot, init_notebook_mode
import plotly.graph_objs as go
# Make plotly work with Jupyter notebook
init_notebook_mode(connected = True)

In [2]:
# Load in datasets:
boxscores = pd.read_csv('espn-api-to-csv/boxscores.csv')
seasonscores = pd.read_csv('espn-api-to-csv/seasonscores.csv')

### Dataset Structure

In [3]:
seasonscores.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 156 entries, 0 to 155
Data columns (total 5 columns):
Week     156 non-null int64
Team     156 non-null object
Id       156 non-null int64
Score    156 non-null float64
Type     156 non-null object
dtypes: float64(1), int64(2), object(2)
memory usage: 6.2+ KB


In [4]:
seasonscores.sample()

|   | Week | Team | Id | Score | Type |
|---|------|------|----|-------|------|
| 7 | 2 | Team 10 | 10 | 106.3 | Regular |


In [5]:
boxscores.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2496 entries, 0 to 2495
Data columns (total 9 columns):
playerName          2496 non-null object
matchupPeriodId     2496 non-null int64
slotId              2496 non-null int64
position            2496 non-null object
bye                 2496 non-null bool
appliedStatTotal    2496 non-null float64
teamName            2496 non-null object
wonMatchup          2496 non-null bool
W/L                 2496 non-null object
dtypes: bool(2), float64(1), int64(2), object(4)
memory usage: 141.5+ KB


In [6]:
boxscores.sample()

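|      | playerName | matchupPeriodId | slotId | position | bye | appliedStatTotal | teamName | wonMatchup | W/L |
|------|------------|-----------------|--------|----------|-----|------------------|----------|------------|-----|
| 1415 | T.J. Yeldon | 8 | 23 | RB | False | 12.5 | Team 6 | True | W |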


### Dataset Features

> `seasonscores` is 156 rows x 5 columns. The columns contain the following information:
> 
>  - `Week` = Ranges from 1 to 13, describing the week of a given matchup.
>  - `Team` = Team name.
>  - `Id` = Unique identifier for each team. Managers can change team name throughout the season, but `Id` stays constant.
>  - `Score` = Fantasy team score for a given week.
>  - `Type` = Describing the type of matchup. These are all "Regular" for weeks 1-13. "Playoff" would be the other type of matchup that could be explored in a different project.
>
> `boxscores` is 2496 rows x 9 columns. The columns contain the following information:
> 
>  - `playerName` = Football player's name.
>  - `matchupPeriodId` = Equivalent to `Week` in `seasonscores`.
>  - `slotId` = Defines the position in a team's lineup.
>  - `position` = Football player's position.
>  - `bye` = Identifies if the football player has a "bye" on given `matchupPeriodId`.
>  - `appliedStatTotal` = Football player's fantasy score for given `matchupPeriodId`.
>  - `teamName` = Equivalent to `Team` in `seasonscores`.
>  - `wonMatchup` = Describes whether or not `teamName` won their matchup for given `matchupPeriodId`.
>  - `W/L` = "W" or "L", derived from the boolean value in `wonMatchup`.
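
The script that builds `boxscores` is not reproduced in this notebook, so the following is only a guess at how a column like `W/L` could be derived from `wonMatchup`. It uses toy data and `np.where`, which is one of several equivalent options.

```python
import numpy as np
import pandas as pd

# Toy rows with the two relevant columns; values are made up.
demo = pd.DataFrame({'teamName': ['Team 1', 'Team 2'],
                     'wonMatchup': [True, False]})

# Map the boolean flag to a 'W' or 'L' label.
demo['W/L'] = np.where(demo['wonMatchup'], 'W', 'L')
print(demo)
```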

### Features of Interest

> Some interesting variables and relationships to consider include the following:
> 
>  - Who were the top performers on any given team?
>  - How did each team do in terms of scores over the course of the regular season?
>  - Were good/bad scores at the different positions indicative of wins/losses for each team?

## Univariate Exploration
Let's see who my top performers for the season were. This could be shown with a bar chart. I prefer plotly for its interactive capabilities.

In [7]:
selected_team = 'Team 4'

In [8]:
# Isolate one team:
relevant_team = boxscores[boxscores['teamName'] == selected_team]
# Eliminate bench players:
relevant_players = relevant_team[relevant_team['slotId'] != 20]
# Identify the top eight players for the season
top_performers_list = relevant_players.groupby('playerName')['appliedStatTotal']\
    .sum().sort_values(ascending = False)[0:8].index.tolist()
# Restrict relevant_players to the top performers:
relevant_player_scores = relevant_players[relevant_players['playerName'].isin(top_performers_list)]
# Top performers series (select the score column before summing so that
# non-numeric columns are not aggregated):
top_performers = relevant_player_scores.groupby('playerName')['appliedStatTotal']\
    .sum().sort_values(ascending = False)

data = [go.Bar(x = top_performers.keys(), y = top_performers.tolist())]

layout = go.Layout(
        title = 'Top Players for {}'.format(selected_team),
        yaxis = {'title':'Total Season Score'})

fig = {'data':data,'layout':layout}
iplot(fig)

James Conner led my team to a championship this year, with 210 fantasy points over 13 weeks. Other notable contributors were my quarterback committee of Rodgers and Ryan, my top wide receiver Stefon Diggs, and a surprisingly successful George Kittle.
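
Since other league members may want the same view for their own teams, the filtering and aggregation could be wrapped in a small function. This is a sketch under assumptions: `top_scorers` and `BENCH_SLOT_ID` are hypothetical names, and the toy data below is made up. It relies on the same column names as `boxscores` and on bench players having `slotId` 20.

```python
import pandas as pd

BENCH_SLOT_ID = 20  # bench slot, matching the filter used in this notebook

def top_scorers(boxscores, team, n=8):
    """Return the n highest season totals among a team's non-bench players."""
    starters = boxscores[(boxscores['teamName'] == team)
                         & (boxscores['slotId'] != BENCH_SLOT_ID)]
    return (starters.groupby('playerName')['appliedStatTotal']
            .sum()
            .nlargest(n))

# Toy data with made-up players; Player C only ever sits on the bench.
demo = pd.DataFrame({
    'playerName': ['Player A', 'Player B', 'Player A', 'Player C', 'Player B'],
    'slotId': [2, 4, 2, 20, 4],
    'appliedStatTotal': [10.0, 5.5, 12.0, 30.0, 4.5],
    'teamName': ['Team 1'] * 5})
result = top_scorers(demo, 'Team 1', n=2)
print(result)
```

The returned Series is already sorted from highest to lowest, so its index and values can be passed straight to a bar chart.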

With a clear visualization of my top performers, I'm curious as to how my team as a whole stacked up against other teams in the league.

## Bivariate Exploration

Let's look at the distribution of regular season scores for all teams. This is a bivariate exploration of a categorical variable (`Team`) and a quantitative variable (`Score`). A boxplot will show this nicely.

In [9]:
# Reorder teams by mean so that they're plotted nicely
teamorder = np.array(seasonscores.groupby('Team')['Score'].mean().sort_values().index)

# Plot season scores
data = []
for team in teamorder:
    # 'z' is a team specific dataframe
    z = seasonscores[seasonscores['Team'] == team]
    trace = go.Box(
        y = z['Score'],
        name = team,
        text = 'Week ' + z['Week'].astype(str),
        boxpoints = 'all',
        jitter = 0.4,
        pointpos = 0)
    data.append(trace)
                   
layout = go.Layout(title = '2018 Regular Season Fantasy Scoring',
                   showlegend = False,
                   yaxis = dict(title = 'Score', range = [50,170], nticks = 10))
            
fig = {'data':data,'layout':layout}
iplot(fig)

This chart was created with plotly, where each trace was created by looping through the league's teams. Each trace was populated with scores taken from the `seasonscores` dataframe for each week.

Scores for the season were all over the map, ranging from Team 2 putting up a measly 60 points in Week 8 to Team 5 posting 169 points in Week 4. There doesn't seem to be a relationship between the variance of a team's scores and their magnitude. In other words, a team's scoring consistency is not correlated with its success in terms of average points scored.
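
The observation about consistency versus scoring was made by eye from the boxplot. One way it could be checked numerically is to compute each team's mean and standard deviation and then correlate the two. The sketch below assumes the `Team` and `Score` columns of `seasonscores`; the function name `consistency_vs_scoring` and the toy scores are hypothetical.

```python
import pandas as pd

def consistency_vs_scoring(scores):
    """Return per-team mean/std of Score and the Pearson correlation between them."""
    per_team = scores.groupby('Team')['Score'].agg(['mean', 'std'])
    return per_team, per_team['mean'].corr(per_team['std'])

# Toy scores for three made-up teams, three weeks each.
demo = pd.DataFrame({
    'Team': ['Team 1'] * 3 + ['Team 2'] * 3 + ['Team 3'] * 3,
    'Score': [100.0, 110.0, 120.0, 90.0, 95.0, 100.0, 80.0, 120.0, 130.0]})
per_team, r = consistency_vs_scoring(demo)
print(per_team)
print('correlation between mean and std:', round(r, 2))
```

With only a dozen or so teams, a correlation coefficient like this is a rough indicator at best, so it would complement the boxplot rather than replace it.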

My team (Team 4) ranks 4th in average score. One thing I remember about my fantasy season that's easily seen in this boxplot is my three weeks in a row of scoring 121 points. Sharing this chart with leaguemates would be a helpful way for them to understand their team's scoring consistency and how they stack up amongst the competition.

Another tool that could be helpful for league managers is a comparison of scoring across positions. This would allow managers to see positional strengths and weaknesses, as well as differences in average positional scores between wins and losses.

## Multivariate Exploration

For comparing positional scoring, the data is easier to plot if the `boxscores` dataframe is reshaped so that each position is a column holding the average score for each matchup.
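
To make the target shape concrete, here is a toy version of that long-to-wide reshape. It uses `pivot_table`, which is a swapped-in alternative to the `groupby`/`unstack` chain used in this notebook, and the data is made up.

```python
import pandas as pd

# Long format: one row per player per matchup (toy values).
long_df = pd.DataFrame({
    'teamName': ['Team 1'] * 4,
    'matchupPeriodId': [1, 1, 1, 2],
    'position': ['RB', 'RB', 'QB', 'QB'],
    'appliedStatTotal': [10.0, 20.0, 18.0, 22.0]})

# Wide format: one row per team and matchup, one column per position,
# each cell holding the mean score at that position.
wide_df = (long_df
           .pivot_table(index=['teamName', 'matchupPeriodId'],
                        columns='position',
                        values='appliedStatTotal',
                        aggfunc='mean')
           .reset_index())
wide_df.columns.name = None  # drop the leftover 'position' axis label
print(wide_df)
```

In this toy data the two RB scores in matchup 1 collapse to their mean, and the missing RB entry in matchup 2 shows up as `NaN`.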

In [10]:
# Filter out bench players ('slotId' = 20):
positional_stats = (boxscores[(boxscores['slotId'] != 20)]
 .filter(items=['teamName', 'matchupPeriodId', 'position', 'appliedStatTotal', 'wonMatchup'], axis=1)
 # Group by team, matchup, and position and take the mean positional score using .agg:
 .groupby(['teamName', 'matchupPeriodId', 'position'])
 .agg({'appliedStatTotal': 'mean'})
 # Pivot table on 'position' to create new columns:
 .unstack('position')
 .reset_index())
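# Create 'Won' column by taking the min of 'wonMatchup' for each team and matchup
# (selecting the column first yields a Series whose rows line up with positional_stats):
positional_stats['Won'] = (boxscores.groupby(['teamName', 'matchupPeriodId'])['wonMatchup']
                           .min().reset_index(drop=True))
# Rename columns (positions are in alphabetical order after unstacking):
positional_stats.columns = ['Team', 'Matchup', 'D/ST', 'QB', 'RB', 'TE', 'WR', 'Won']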
# Round floats:
positional_stats = positional_stats.round(2)

In [11]:
positional_stats.sample()

|     | Team | Matchup | D/ST | QB | RB | TE | WR | Won |
|-----|------|---------|------|----|----|----|----|-----|
| 101 | Team 5 | 11 | 10.0 | 24.48 | 10.9 | 7.9 | 11.92 | True |


This will be much easier to work with. A radar chart would be a cool way to visualize the relationship between average positional scoring. I will create one for my team in weeks where I won my matchup.

In [12]:
selected_team = 'Team 4'

In [13]:
# Box scores for selected team:
df_team = positional_stats[positional_stats['Team'] == selected_team]
# Box scores for selected team in wins:
df_team_win = df_team[df_team['Won'] == True]
# List of positions. 'D/ST' is listed twice to connect the radial trace:
positions = ['D/ST', 'QB', 'RB', 'WR', 'TE', 'D/ST']
# Loop through matchups:
data_win = []
for m in df_team_win['Matchup']:
    trace1 = go.Scatterpolar(
        # Loop through positions:
        r = [df_team_win[df_team_win['Matchup'] == m][p].item() for p in positions],
        theta = positions,
        fill = 'toself',
        opacity = 0.4,
        mode = 'lines+markers',
        line = dict(width = 0.5, color = 'rgb(131, 90, 241)'),
        marker = dict(size = 1),
        name = 'Week {}'.format(m))
    data_win.append(trace1)

# Create trace for average positional scores:
trace2 = go.Scatterpolar(
    r = [df_team_win[p].mean() for p in positions],
    theta = positions,
    name = 'Season avg.',
    opacity = 1,
    line = dict(width = 2,color = 'black'))
data_win.append(trace2)

layout = go.Layout(title = 'Average Positional Scoring in Wins for {}'.format(selected_team),
                   hovermode = 'closest',
                   polar = dict(radialaxis = dict(visible = True, range = [0,df_team[positions].max().max()])),
                   showlegend = False)
    
fig = {'data':data_win,'layout':layout}
iplot(fig)

Each purple trace shows an individual week. The black line shows the average positional score in wins for my team. On average, in matchups that I won, my QB scored 20.1, RBs averaged 16.7, WRs averaged 12.8, TE scored 10.2, and D/ST scored 7.8.
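
The per-week values for a radar trace have to repeat the first position at the end so the outline closes. A possible way to factor that out, so the same logic can serve both wins and losses, is sketched below. The helper names `closed_polar_values` and `weekly_polar_kwargs` are hypothetical, the data is made up, and the returned dicts are only meant to be unpacked into `go.Scatterpolar(**kwargs)` together with the styling arguments used in this notebook.

```python
import pandas as pd

def closed_polar_values(row, positions):
    """Return (r, theta) lists with the first position repeated to close the outline."""
    theta = list(positions) + [positions[0]]
    r = [float(row[p]) for p in theta]
    return r, theta

def weekly_polar_kwargs(df, positions):
    """Build one dict of r/theta/name arguments per matchup row in df."""
    traces = []
    for _, row in df.iterrows():
        r, theta = closed_polar_values(row, positions)
        traces.append({'r': r, 'theta': theta,
                       'name': 'Week {}'.format(int(row['Matchup']))})
    return traces

# Toy rows shaped like positional_stats (values are made up).
demo = pd.DataFrame({'Team': ['Team 1', 'Team 1'], 'Matchup': [1, 2],
                     'D/ST': [5.0, 9.0], 'QB': [20.0, 18.0], 'RB': [15.0, 12.0],
                     'WR': [11.0, 13.0], 'TE': [8.0, 6.0]})
positions = ['D/ST', 'QB', 'RB', 'WR', 'TE']
traces = weekly_polar_kwargs(demo, positions)
print(traces[0])
```

Here the position list names each position once and the helper appends the closing point, which avoids carrying a duplicated `'D/ST'` entry around in the list itself.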

Let's see how this compares to a similar radial chart for losses.

In [14]:
# Box scores for selected team in losses:
df_team_loss = df_team[df_team['Won'] == False]
# List of positions. 'D/ST' is listed twice to connect the radial trace:
positions = ['D/ST', 'QB', 'RB', 'WR', 'TE', 'D/ST']
# Loop through matchups:
data_loss = []
for m in df_team_loss['Matchup']:
    trace1 = go.Scatterpolar(
        # Loop through positions:
        r = [df_team_loss[df_team_loss['Matchup'] == m][p].item() for p in positions],
        theta = positions,
        fill = 'toself',
        opacity = 0.4,
        mode = 'lines+markers',
        line = dict(width = 0.5, color = 'rgb(131, 90, 241)'),
        marker = dict(size = 1),
        name = 'Week {}'.format(m))
    data_loss.append(trace1)

# Create trace for average positional scores:
trace2 = go.Scatterpolar(
    r = [df_team_loss[p].mean() for p in positions],
    theta = positions,
    name = 'Season avg.',
    opacity = 1,
    line = dict(width = 2,color = 'black'))
data_loss.append(trace2)

layout = go.Layout(title = 'Average Positional Scoring in Losses for {}'.format(selected_team),
                   hovermode = 'closest',
                   polar = dict(radialaxis = dict(visible = True, range = [0,df_team[positions].max().max()])),
                   showlegend = False)
    
fig = {'data':data_loss,'layout':layout}
iplot(fig)

There's a smaller sample size than in the wins radial chart, which makes sense considering I only lost three matchups this season. On average, in matchups that I lost, my QB scored 20.8, RBs averaged 10.5, WRs averaged 9.7, TE scored 14.2, and D/ST scored 3.3. The biggest difference in positional scoring between wins and losses is at the RB position (6.2 points). This suggests I lost matchups mostly due to lackluster RB performance. Even though QBs generally score the most on fantasy teams, these visualizations suggest that solid performances from other positions are more vital to winning matchups.
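
The win/loss gaps can be worked out directly from the averages quoted in the two chart summaries. The snippet below simply re-enters those rounded figures by hand (it does not read from `positional_stats`) and subtracts them.

```python
import pandas as pd

positions = ['QB', 'RB', 'WR', 'TE', 'D/ST']
# Average positional scores for Team 4, copied from the text of this notebook.
avg_in_wins = pd.Series([20.1, 16.7, 12.8, 10.2, 7.8], index=positions)
avg_in_losses = pd.Series([20.8, 10.5, 9.7, 14.2, 3.3], index=positions)

# Positive values mean the position scored more in wins than in losses.
gap = (avg_in_wins - avg_in_losses).round(1)
largest = gap.abs().idxmax()
print(gap.sort_values(ascending=False))
print('largest absolute gap:', largest)
```

By these figures the TE and QB averages were actually higher in losses than in wins, and with only three losses every gap here is suggestive rather than conclusive.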