# Butt Fumbles 2019 Yearly Analysis

The purpose of this notebook is to provide some simple analyses of this year's fantasy football league. The input data is the weekly point total for each team over the regular season (13 weeks). At this point, I don't consider finer-grained data like position points per week, etc.

In [4]:
import pandas as pd
import numpy as np

# Import weekly scores
data_file = 'data_2019.csv'
data = pd.read_csv(data_file)
data = data.set_index('Team')
NUM_TEAMS = data.shape[0]
actual_wins = data['Actual Wins']
del data['Actual Wins']

# Import the schedule
schedule_file = 'schedule_2019.csv'
schedule = pd.read_csv(schedule_file)
schedule = schedule.set_index('Team')

# Constants used throughout
NUM_WEEKS = 13
NUM_TEAMS = 12

In [5]:
# Quality check the schedule
teams = set(schedule.index)
for week in range(1,NUM_WEEKS+1):
    assert NUM_TEAMS == len(set(schedule['Week {}'.format(week)]).intersection(teams))
    
print('Schedule seems valid')    

Schedule seems valid
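The check above only confirms that every team shows up as an opponent each week. A slightly stricter check, sketched here on a made-up 4-team schedule (the team names and the `schedule_is_symmetric` helper are hypothetical, not part of the notebook), would also confirm that matchups are mutual: if A plays B, then B plays A.

```python
import pandas as pd

# Toy 4-team, 2-week schedule (hypothetical names, not the league's data)
toy_schedule = pd.DataFrame(
    {"Week 1": ["B", "A", "D", "C"], "Week 2": ["C", "D", "A", "B"]},
    index=pd.Index(["A", "B", "C", "D"], name="Team"),
)


def schedule_is_symmetric(sched):
    """Return True if every matchup is mutual and no team plays itself."""
    for week in sched.columns:
        opponents = sched[week]
        for team, opp in opponents.items():
            # opponents.get(opp) is None when opp is not a known team
            if opp == team or opponents.get(opp) != team:
                return False
    return True


symmetric_ok = schedule_is_symmetric(toy_schedule)
print(symmetric_ok)
```

The same helper could presumably be pointed at the real `schedule` DataFrame, assuming its columns are all `Week N` opponent columns.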


In [7]:
def calc_team_wins(team, scores, sched):
    # type: (str, pd.DataFrame, pd.DataFrame) -> int
    
    '''
    This function will determine the number of wins for a team given 
    all the scores for the league, the schedule, and a team.
    
    Args: team   - a string of the team to examine
          scores - a dataframe whose rows are teams and cols are weekly scores
          sched  - a dataframe whose rows are teams and cols are weekly opponents
    '''
    
    num_wins = 0
    
    for week in range(1, NUM_WEEKS+1):
    
        # Your score
        your_score = scores['Week {}'.format(week)][team]
    
        # Opponent's score (looked up in the schedule passed in as `sched`)
        opp_score = scores['Week {}'.format(week)][sched['Week {}'.format(week)][team]]
        
        if your_score - opp_score > 0:
            num_wins += 1
            
    return num_wins 

# Quality check actual wins from schedule
for team in data.index:
    assert calc_team_wins(team, data, schedule) == actual_wins[team]
print('Scores match win totals')

Scores match win totals


### Basic Analysis
Here, we calculate weekly averages and standard deviations for each team.

In [8]:
data_mean = data.mean(axis=1)
data_std = data.std(axis=1)
basic_stats = pd.DataFrame({'Weekly Average':data_mean, 'Weekly Standard Dev': data_std})

##### Sorted by Highest Weekly Average


Sorting by average, we see a couple things:

  * Lambeau Leapers scored a TON of points. Holy cow. Matt averaged 10 points more per week than any other team. That's about the same as the gap between the 2nd most points (Zach Attack) and the 7th most points (THROW IT TO SANDERS). Insane regular season.
  * Zach Attack, who finished with just 5 wins, scored the 2nd most points in the league and still missed the playoffs. More on that later! I actually feel bad for Dinesh.
  * Based purely on points scored, 3 of our 4 playoff teams (Lambeau Leapers, Aah Fuckin’ Shitdick, and Chubby Winners) were in the top 4 in points. Our only playoff team not in the top 4 in points was Eat 4 Dicks Asendorf, who finished 6th.

In [11]:
basic_stats.sort_values(by=['Weekly Average'], ascending=False)

Team,Weekly Average,Weekly Standard Dev
Lambeau Leapers,110.093846,18.544572
Zach Attack,100.101538,22.78379
Aah Fuckin’ Shitdick,99.287692,25.213765
Chubby Winners,98.126154,23.643363
Rudy Was Offsides,94.318462,20.865547
Eat 4 Dicks Asendorf,91.623077,16.794868
THROW IT TO SANDERS,90.992308,17.612305
Frank The Tank,86.903077,30.42321
Peter’s Team,86.892308,23.552698
DJ Purple,84.549231,14.487028


##### Sorted by Highest Weekly Standard Deviation


A team's weekly standard deviation gives some notion of how consistent they were week to week. A high standard deviation indicates that some weeks your team went off and other weeks your team was terrible. Here are some interesting highlights:

  * Frank The Tank wins highest standard deviation by a ton! And this might be the highest standard deviation ever. He has had either the highest or the lowest variance in at least 3 of the 5 Butt Fumbles seasons.
  * Yours truly, DJ Purple, had the lowest standard deviation. This surprised me, but looking back I never had any really good weeks. Never a good sign when you're #10 in points scored and last in weekly standard deviation.
  * Lambeau Leapers finished with the 4th lowest standard deviation. Couple this with the crushing weekly average and you see just how dominant he was. That big weekly average isn't because of a few monster weeks but because of consistent dominance.
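One detail worth keeping in mind when comparing these numbers to anything computed elsewhere: pandas' `std` defaults to the sample standard deviation (`ddof=1`), while NumPy's `np.std` defaults to the population version (`ddof=0`). A tiny sketch on made-up scores (not league data) shows the gap, which matters more with only a handful of weeks:

```python
import numpy as np
import pandas as pd

# Three made-up weekly scores for one team
toy_scores = pd.Series([80.0, 100.0, 120.0])

# pandas default: divide by (n - 1)
sample_std = toy_scores.std()

# NumPy default: divide by n
population_std = np.std(toy_scores.to_numpy())

print(sample_std, population_std)
```

The tables in this notebook come from `data.std(axis=1)`, so they use the `ddof=1` convention.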

In [13]:
basic_stats.sort_values(by=['Weekly Standard Dev'], ascending=False)

Team,Weekly Average,Weekly Standard Dev
Frank The Tank,86.903077,30.42321
Aah Fuckin’ Shitdick,99.287692,25.213765
Chubby Winners,98.126154,23.643363
Peter’s Team,86.892308,23.552698
Jbone!,84.049231,22.912075
Zach Attack,100.101538,22.78379
Pubic Faith,82.056923,21.747858
Rudy Was Offsides,94.318462,20.865547
Lambeau Leapers,110.093846,18.544572
THROW IT TO SANDERS,90.992308,17.612305


### Schedule Luck Analysis


Here, I calculate how lucky each team was based on the arbitrary schedule & rivalry week matchups. This is computed on a weekly basis and asks the question "What if my opponent was different this week?" The way that our league works, you are matched up with one opponent and play head-to-head against them for that week. If you score a lot of points, but end up playing the team that scored the most, that's pretty unlucky. If you score hardly any points, but end up playing the team that scored the least points, that's pretty lucky.


To compute a luck metric, I compute the expected value of your wins for each week by comparing your weekly points to every other team's points that week. 

  * If you scored the most points, you would beat every team that you could play and your expected win value is 1. 
  * If you scored the least points, you would lose to every team that you could play and your expected win value is 0.
  * If your score would beat 4 teams, but lose to 7 teams, then your expected win value is 4/11.
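To make the three cases above concrete, here is a minimal sketch for a single week in a 12-team league. The scores, the team labels, and the `expected_win_value` helper are all made up for illustration. Like the notebook's function, it counts only strictly higher opponent scores as losses, so a tie counts as a win.

```python
import pandas as pd

# One made-up week: "Me" outscores 4 opponents and is outscored by 7
week_scores = pd.Series(
    [90.0, 70, 75, 80, 85, 95, 100, 105, 110, 115, 120, 125],
    index=["Me"] + ["Opp {}".format(i) for i in range(1, 12)],
)


def expected_win_value(team, scores):
    """Fraction of possible opponents this team would not lose to this week."""
    others = scores.drop(team)
    losses = (others > scores[team]).mean()
    return 1 - losses


me_value = expected_win_value("Me", week_scores)
print(me_value)  # 4/11, roughly 0.364
```

Summing this value over all 13 weeks gives a team's expected wins for the season.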

In [14]:
def calc_team_expected_wins(team, df):
    # type: (str, pd.DataFrame) -> float
    """
    This function will calculate the expected record of a given team
    assuming they played every other team that week.
    
    Args: team - string of the team name to analyze
          df   - a DataFrame whose rows are team names and columns are weekly scores
    """
    
    # We subtract the team's scores from the entire matrix
    # positive values represent an opponent scored more (loss)
    # negative values represent that team scored more (win)
    temp = df - df.loc[team].values.squeeze()

    # True values now represent a Loss
    temp_bool = temp > 0

    # Summing represents the number of losses we'd expect that week
    # 1 minus represents expected win, and we'll divide by number of team - 1 to normalize to 1
    exp_weekly_win = 1 - temp_bool.sum(axis=0)/(NUM_TEAMS-1)

    return sum(exp_weekly_win)

In [17]:
exp_wins = []

for team in data.index:
    exp_wins.append(calc_team_expected_wins(team, data))
    
exp_wins = pd.Series(exp_wins, index=data.index)
weekly_luck = pd.DataFrame({'Actual Wins':actual_wins, 'Expected Wins': exp_wins, 'Difference': actual_wins-exp_wins})

##### Luckiest Teams


I sort this by highest difference. A positive difference represents you were lucky and won more games than expected. A negative difference represents you were unlucky and won fewer games than expected. Some notes:

  * Our luckiest team this year was Peter's Team, who won about 2 more games than expected. All that shade was warranted, but he still just missed out on the playoffs.
  * Zach Attack was our least lucky team by a whopping 3 games. I only did a cursory search, and I'll pull the first 5 years together to calculate luck across seasons, but I'm pretty sure this is the most unlucky anyone has been.

In [18]:
weekly_luck.sort_values(by=['Difference'], ascending=False)

Team,Actual Wins,Expected Wins,Difference
Peter’s Team,8,5.818182,2.181818
Eat 4 Dicks Asendorf,8,6.363636,1.636364
Chubby Winners,9,7.818182,1.181818
Jbone!,6,4.909091,1.090909
Aah Fuckin’ Shitdick,9,8.0,1.0
Rudy Was Offsides,7,6.545455,0.454545
Lambeau Leapers,10,9.545455,0.454545
THROW IT TO SANDERS,6,6.090909,-0.090909
Frank The Tank,4,5.363636,-1.363636
Pubic Faith,3,4.454545,-1.454545


##### Sorted Luck by Expected Wins


We want to see what the standings would look like without luck. Some takeaways:

  * Interestingly, the top 4 are the same as if you sort by average points and in the same order. Again, sorry Dinesh!
  * Also interesting (and not terribly surprising): this order looks similar to the points scored order.
  * Based on expected wins, our top 4 teams really stand out and then everyone else is pretty meh, with Pubic Faith rounding out the bottom about half a game behind everyone else.

In [19]:
weekly_luck.sort_values(by=['Expected Wins'], ascending=False)

Team,Actual Wins,Expected Wins,Difference
Lambeau Leapers,10,9.545455,0.454545
Zach Attack,5,8.090909,-3.090909
Aah Fuckin’ Shitdick,9,8.0,1.0
Chubby Winners,9,7.818182,1.181818
Rudy Was Offsides,7,6.545455,0.454545
Eat 4 Dicks Asendorf,8,6.363636,1.636364
THROW IT TO SANDERS,6,6.090909,-0.090909
Peter’s Team,8,5.818182,2.181818
Frank The Tank,4,5.363636,-1.363636
DJ Purple,3,5.0,-2.0


### Team Excitement Rankings


Here I want to evaluate how exciting your season was. This will look at your average point differential, win or lose. Essentially, if you won and lost games by 5 points every week, that would be way more intense & exciting than if every week was a blowout and you won or lost by 30 points. 


We'll compute 4 things:

  * Mean absolute difference
  * Median absolute difference (might be more robust in this case)
  * Number of Close Games: I arbitrarily define this as a game decided by fewer than 10 points
  * Number of Blowout Games: I arbitrarily define this as a game decided by more than 25 points
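As a rough sketch of how these four numbers could be computed without explicit loops, here is a vectorized version on a made-up 4-team, 3-week league. All team names and scores below are hypothetical; the column names and the 10 and 25 point thresholds mirror the definitions above.

```python
import pandas as pd

teams = pd.Index(["A", "B", "C", "D"], name="Team")

# Made-up weekly scores and opponents (not the league's data)
toy_scores = pd.DataFrame(
    {"Week 1": [100.0, 95.0, 120.0, 80.0],
     "Week 2": [90.0, 130.0, 85.0, 88.0],
     "Week 3": [110.0, 70.0, 105.0, 99.0]},
    index=teams,
)
toy_schedule = pd.DataFrame(
    {"Week 1": ["B", "A", "D", "C"],
     "Week 2": ["C", "D", "A", "B"],
     "Week 3": ["D", "C", "B", "A"]},
    index=teams,
)

# For each week, look up the score of each team's opponent
opp_scores = pd.DataFrame(
    {week: toy_scores[week].loc[toy_schedule[week]].to_numpy()
     for week in toy_scores.columns},
    index=teams,
)

# Absolute margin of every game, win or lose
abs_diff = (toy_scores - opp_scores).abs()

toy_exc = pd.DataFrame({
    "Mean Score Diff": abs_diff.mean(axis=1),
    "Median Score Diff": abs_diff.median(axis=1),
    "Number Close Games": (abs_diff < 10).sum(axis=1),
    "Number Blowouts": (abs_diff > 25).sum(axis=1),
})
print(toy_exc)
```

In this toy league, team A's margins are 5, 5, and 11 points, so it ends up with 2 close games and no blowouts.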

In [23]:
# There is probably a slick pandas way to do this - for now, brute force

exc_index = []
med_exc = []
mean_exc = []
num_exc = []
num_blow = []

for team in data.index:
    score_diff = []
    for week in range(1, NUM_WEEKS+1):
    
        # Your score
        your_score = data['Week {}'.format(week)][team]
    
        # Opponents score
        opp_score = data['Week {}'.format(week)][schedule['Week {}'.format(week)][team]]
        
        score_diff.append(abs(your_score - opp_score))
    
    # Logging
    exc_index.append(team)
    mean_exc.append(np.mean(score_diff))
    med_exc.append(np.median(score_diff))
    num_exc.append(sum(np.array(score_diff) < 10))
    num_blow.append(sum(np.array(score_diff) > 25))
    
exc_df = pd.DataFrame({'Mean Score Diff':mean_exc, 'Median Score Diff':med_exc, 'Number Close Games':num_exc, 'Number Blowouts':num_blow}, index = exc_index)

##### Sort by Median Score Differential


A lower differential means your weekly games were relatively close each week. Higher means your games were blowouts and not very interesting. Some notes:

  * The medians and means more or less match up in terms of rankings. Median is more robust to outliers, so I'm sticking with that. Essentially you can look at this and say 6 of my games were closer than my median and 6 of my games had a larger point differential than my median.
  * Chubby Winners had the most exciting year, followed pretty closely by Eat 4 Dicks Asendorf, DJ Purple, and Rudy Was Offsides.
  * Jbone!'s median point differential was 35!!!!! That is insane! That means that 6 of his games were decided by more than 35 points. Crazy!
  * Lambeau Leapers, as expected, also had a huge number here as he crushed most of his opponents (more in a second).
  * I don't know what to make of Peter's Team having a high number here. Considering he was the luckiest team, I expected him to have closer matches, but this also matches his 4th highest weekly standard deviation. Either he was good or bad.

In [25]:
exc_df.sort_values(by=['Median Score Diff'], ascending=True)

Team,Mean Score Diff,Median Score Diff,Number Close Games,Number Blowouts
Chubby Winners,20.253846,16.92,5,5
Eat 4 Dicks Asendorf,19.315385,17.94,5,5
DJ Purple,19.636923,19.56,5,3
Rudy Was Offsides,22.826154,19.82,5,6
THROW IT TO SANDERS,23.692308,22.28,3,6
Frank The Tank,31.776923,23.0,3,6
Aah Fuckin’ Shitdick,27.353846,23.1,0,4
Zach Attack,28.055385,23.9,1,5
Pubic Faith,28.703077,24.48,1,6
Peter’s Team,26.263077,29.6,3,7


##### Sort by Number of Close Games


I defined these as games where the final score was within 10 points. To me, that's a margin that could have flipped by swapping in a bench player who had a better week. Some notes:

  * Lambeau Leapers and Aah Fuckin’ Shitdick had no games within 10 points. Crazy!
  * With as unlucky as Zach Attack was, he only had 1 (!!!) game within 10 points.
  * We had 5 teams with 5 games within 10 points.


In [26]:
exc_df.sort_values(by=['Number Close Games'], ascending=False)

Team,Mean Score Diff,Median Score Diff,Number Close Games,Number Blowouts
DJ Purple,19.636923,19.56,5,3
Eat 4 Dicks Asendorf,19.315385,17.94,5,5
Jbone!,29.812308,35.78,5,7
Chubby Winners,20.253846,16.92,5,5
Rudy Was Offsides,22.826154,19.82,5,6
Peter’s Team,26.263077,29.6,3,7
THROW IT TO SANDERS,23.692308,22.28,3,6
Frank The Tank,31.776923,23.0,3,6
Pubic Faith,28.703077,24.48,1,6
Zach Attack,28.055385,23.9,1,5


##### Sort by Number of Blowouts


I defined these as games where the final score was more than 25 points apart. To me, this probably meant that bench/substitution mistakes didn't matter and that the outcome was a foregone conclusion pretty quickly. Some notes:

  * Lambeau Leapers had 8 blowouts! Likely all his doing :) Just domination. That's crazy to think about. Not only was he winning, but just dominating.
  * Peter's Team was second with 7, which surprised me a little. Either his team was great and crushed someone or his team laid an egg.
  * Fewest blowouts goes to me, DJ Purple - also surprising. I had the lowest variance, so really I think this means that I was somewhat competitive but still generally bad.

In [38]:
exc_df.sort_values(by=['Number Blowouts'], ascending=False)

Team,Mean Score Diff,Median Score Diff,Number Close Games,Number Blowouts
Lambeau Leapers,32.283333,31.29,0,8
Peter’s Team,26.508333,29.79,3,7
Jbone!,27.93,29.58,4,6
THROW IT TO SANDERS,25.051667,24.18,2,6
Pubic Faith,26.855,23.67,0,5
Frank The Tank,29.685,22.64,3,5
Chubby Winners,21.326667,20.7,4,5
Rudy Was Offsides,20.115,16.86,5,5
Eat 4 Dicks Asendorf,18.055,15.15,5,4
Aah Fuckin’ Shitdick,28.333333,23.24,0,4


### Team Luck Analysis


Here I want to evaluate a team's week-to-week variance. This is a slightly different luck analysis than the schedule analysis above. Instead of assuming that your score for the week is fixed and looking at who you played, this analysis keeps your opponent the same each week, but looks at your scores from other weeks to see if you would have won. This asks the question "What if my players' schedule was different?" Again we'll compare the expected wins to the actual wins, but sort by expected wins. Some notes:

  * Generally speaking, the expected wins don't deviate from the actual wins as strongly as in the schedule luck analysis.
  * After digging into things, I think this analysis correlates with points scored against. Because this analysis keeps your opponent the same every week, if your opponents got really lucky week after week, even your good scores wouldn't have won much.
  * The top 4 stay the same with an interesting exception - Peter's Team jumps into the top 4. So while Peter's matchup on a weekly basis was lucky (previous analysis showed +2 wins over expected), this analysis reveals that Peter's Team had the fewest points scored against him.
  * Also interestingly, Zach Attack only finishes 7th in this analysis even though he scored the 2nd most points and would have finished 2nd in the schedule luck analysis. This is likely due to the fact that he had the 2nd most points scored against him, so while his week-to-week scores were pretty good, he likely couldn't do much to overcome the brutal schedule.
  * Jbone! also does really poorly here, and this reveals an interesting difference between him and Zach Attack. Jbone! had the most points scored against him and Zach Attack had the 2nd most - both really close. Zach Attack gains about 1.6 wins over his actual total, while Jbone! loses about 2.5. Jbone! scored about 15 fewer points per week, so his expected wins drop pretty significantly.
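For a single team, the idea described above (replay each of your weekly scores against the full slate of opponents you actually faced, then average) can be sketched with a small comparison matrix. The scores and the `score_shuffle_expected_wins` helper below are made up for illustration; like the notebook's win check, it only counts strictly higher scores as wins.

```python
import numpy as np


def score_shuffle_expected_wins(own_scores, opp_scores):
    """Average season win total if each weekly score were repeated all season."""
    own = np.asarray(own_scores, dtype=float)
    opp = np.asarray(opp_scores, dtype=float)
    # wins[i, j] is True if the score from week i beats the opponent faced in week j
    wins = own[:, None] > opp[None, :]
    # Each row is one hypothetical season; average the season win totals
    return wins.sum(axis=1).mean()


# Made-up 3-week season: your scores and the scores of the opponents you faced
own = [100.0, 80.0, 120.0]
opp = [90.0, 110.0, 85.0]

expected = score_shuffle_expected_wins(own, opp)
print(expected)  # (2 + 0 + 3) / 3
```

Here the 100-point week would win 2 of the 3 matchups, the 80-point week would win none, and the 120-point week would win all 3, so the expected total is 5/3 wins against an actual total of 2.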

In [30]:
# There is probably a slick pandas way to do this - for now, brute force

tl_index = []
tl_total_wins = []

for team in data.index:   
    team_cumulative_wins = 0
    #print(team)
    for week in range(1, NUM_WEEKS+1):
    
        # Assume you scored this week every week 
        temp = data.copy()
        temp.loc[team] = NUM_WEEKS*[data['Week {}'.format(week)][team]]
    
        # Calculate wins
        #print(calc_team_wins(team, temp, schedule))
        team_cumulative_wins += calc_team_wins(team, temp, schedule)
        
    # Logging
    tl_index.append(team)
    tl_total_wins.append(team_cumulative_wins/NUM_WEEKS)
            
tl_df = pd.DataFrame({'Expected Wins':tl_total_wins}, index = tl_index)
tl_df['Actual Wins'] = actual_wins
tl_df.sort_values(by=['Expected Wins'], ascending=False)

Team,Expected Wins,Actual Wins
Lambeau Leapers,10.153846,10
Chubby Winners,8.538462,9
Peter’s Team,7.692308,8
Aah Fuckin’ Shitdick,7.538462,9
Rudy Was Offsides,7.461538,7
Eat 4 Dicks Asendorf,6.846154,8
Zach Attack,6.615385,5
THROW IT TO SANDERS,5.769231,6
Frank The Tank,5.538462,4
Pubic Faith,4.0,3
