In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from math import sqrt
from statistics import mean

import requests

In [5]:
WR_data = pd.read_csv('WR_datafinal.csv')

On Madden ratings:

One thing that makes evaluating players solely on Madden ratings difficult is that ratings are updated based on new information while also taking the existing rating into account. As such, a player's Madden rating is a combination of their current production and the reputation they've built during their time in the league. An example of this is Odell Beckham.

In [7]:
WR_data[WR_data['name'] == 'Odell Beckham']

Unnamed: 0,id,name,years_played,team,season,season_type,receptions,targets,receiving_yards,receiving_tds,...,ypr,rec_td_percentage,rec_ypg,round,overall,ht,wt,forty,vertical,Overall Rating
1813,1342,Odell Beckham,,NYG,2014,REG,91,130,1305,12,...,14.34,0.132,108.75,1.0,12.0,,,,,94.0
1814,1342,Odell Beckham,,NYG,2015,REG,96,158,1450,13,...,15.1,0.135,96.666667,1.0,12.0,,,,,93.0
1815,1342,Odell Beckham,,NYG,2016,REG,101,169,1367,10,...,13.53,0.099,85.4375,1.0,12.0,,,,,93.0
1816,1342,Odell Beckham,,NYG,2017,REG,25,41,302,3,...,12.08,0.12,75.5,1.0,12.0,,,,,95.0
1817,1342,Odell Beckham,,NYG,2018,REG,77,124,1052,6,...,13.66,0.078,87.666667,1.0,12.0,,,,,96.0
1818,1342,Odell Beckham,,CLE,2019,REG,74,133,1035,4,...,13.99,0.054,64.6875,1.0,12.0,,,,,91.0
1819,1342,Odell Beckham,,CLE,2020,REG,23,43,319,3,...,13.87,0.13,45.571429,1.0,12.0,,,,,89.0
1820,1342,Odell Beckham,,CLE,2021,REG,17,34,232,0,...,13.65,0.0,38.666667,1.0,12.0,,,,,80.0
1821,1342,Odell Beckham,,LA,2021,REG,27,48,305,5,...,11.3,0.185,38.125,1.0,12.0,,,,,80.0


In 2020, Beckham played roughly half the season, and his stats from the games he did play were far below what he had historically recorded. As a result, his rating took a 2-point dip from 91 to 89. In 2021, Beckham recorded just 537 yards and 5 touchdowns (he changed teams mid-season, so there are two 2021 entries), which saw his rating fall to 80.

The issue is that Beckham's reputation as an all-star talent 'shielded' his rating from falling further. A first-year player (no previous history) with a statline similar to Beckham's 2021 season would earn a rating far below 80. Thus, a particularly strong or weak season can skew subsequent years' ratings.
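To make the 'shielding' effect concrete, the sketch below rebuilds a few columns of Beckham's rows from the output above, combines his two 2021 stints, and prints the year-over-year change in yards next to the change in rating. The `beckham` frame is typed in by hand for illustration; it is not read from `WR_datafinal.csv`.

```python
import pandas as pd

# Values copied by hand from the Odell Beckham rows shown above.
beckham = pd.DataFrame({
    "season": [2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2021],
    "receiving_yards": [1305, 1450, 1367, 302, 1052, 1035, 319, 232, 305],
    "receiving_tds": [12, 13, 10, 3, 6, 4, 3, 0, 5],
    "Overall Rating": [94, 93, 93, 95, 96, 91, 89, 80, 80],
})

# Collapse the two 2021 stints into one season: sum the counting stats
# and keep the (identical) rating.
per_season = beckham.groupby("season").agg(
    receiving_yards=("receiving_yards", "sum"),
    receiving_tds=("receiving_tds", "sum"),
    rating=("Overall Rating", "first"),
)

# Year-over-year change in production and in rating.
per_season["yards_change"] = per_season["receiving_yards"].diff()
per_season["rating_change"] = per_season["rating"].diff()
print(per_season)
```

The 2017 row is the clearest case: yards fall by more than 1,000 while the rating rises by 2.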

Another issue is that rookies are often rated more harshly than non-rookies. Take a look at 2022 Offensive Rookie of the Year Garrett Wilson:

In [12]:
WR_data[WR_data['name'] == 'Garrett Wilson']

Unnamed: 0,id,name,years_played,team,season,season_type,receptions,targets,receiving_yards,receiving_tds,...,ypr,rec_td_percentage,rec_ypg,round,overall,ht,wt,forty,vertical,Overall Rating
909,645,Garrett Wilson,1.0,NYJ,2022,REG,82,146,1116,4,...,13.61,0.049,65.647059,1.0,10.0,Jun-00,183.0,4.38,36.0,84.0


Wilson recorded 1116 yards and 4 touchdowns in his rookie year and was awarded an 84 overall rating. Compare those numbers to Beckham's 2019 season, in which he played all 16 games, recorded 1035 yards and 4 touchdowns, and was awarded a 91 overall rating.

Wilson produced on par with Beckham, even out-producing him in certain categories (receptions, targets, and receiving yards), yet he was given a significantly lower rating.
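The comparison can be laid out side by side. This is a small sketch with the values copied by hand from the two outputs above; the `comparison` frame and its row labels are illustrative names, not part of the dataset.

```python
import pandas as pd

# Values copied by hand from the Garrett Wilson and Odell Beckham outputs above.
comparison = pd.DataFrame(
    {
        "receptions": [82, 74],
        "targets": [146, 133],
        "receiving_yards": [1116, 1035],
        "receiving_tds": [4, 4],
        "Overall Rating": [84, 91],
    },
    index=["Garrett Wilson (2022)", "Odell Beckham (2019)"],
)

# Wilson minus Beckham for every column.
diff = comparison.loc["Garrett Wilson (2022)"] - comparison.loc["Odell Beckham (2019)"]

# Transpose so each stat is a row, then append the difference column.
print(comparison.T.assign(difference=diff))
```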

This issue is compounded by the fact that rookies do not all begin with the same overall rating. Madden has its own system for rating rookies before they have played their first game. This system is some combination of college production, draft position, and the personal assessment of whoever is in charge of Madden ratings.

However, I was able to remove this from the dataset by excluding the first Overall Rating entry for each player. For example, Odell Beckham had a rating of 83 at the start of his rookie year, before he had recorded any NFL statistics, so I did not include this rating. Unfortunately, this starting rating still has an impact on subsequent ratings.
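That filtering step happened before `WR_datafinal.csv` was built, so it does not appear in this notebook. A minimal sketch of how it could be done, assuming a hypothetical long-format table `raw_ratings` with one row per player per Madden edition in chronological order (the ratings are values quoted elsewhere in this notebook):

```python
import pandas as pd

# Hypothetical input: each player's ratings in chronological order,
# starting with the launch rating assigned before their rookie season.
raw_ratings = pd.DataFrame({
    "name": ["Odell Beckham"] * 3 + ["Garrett Wilson"] * 2 + ["Michael Thomas"] * 3,
    "rating": [83, 94, 93, 78, 84, 75, 84, 91],
})

# cumcount() numbers each player's rows 0, 1, 2, ...; row 0 is the
# pre-rookie launch rating, so keep everything after it.
is_launch_rating = raw_ratings.groupby("name").cumcount() == 0
earned_ratings = raw_ratings[~is_launch_rating].reset_index(drop=True)
print(earned_ratings)
```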

Take, for example, Garrett Wilson's and Michael Thomas' rookie seasons:

In [19]:
WR_data[(WR_data['name'] == 'Garrett Wilson') | ((WR_data['name'] == 'Michael Thomas')&(WR_data['season'] == 2016))]

Unnamed: 0,id,name,years_played,team,season,season_type,receptions,targets,receiving_yards,receiving_tds,...,ypr,rec_td_percentage,rec_ypg,round,overall,ht,wt,forty,vertical,Overall Rating
909,645,Garrett Wilson,1.0,NYJ,2022,REG,82,146,1116,4,...,13.61,0.049,65.647059,1.0,10.0,Jun-00,183.0,4.38,36.0,84.0
1702,1266,Michael Thomas,1.0,NO,2016,REG,92,121,1137,9,...,12.36,0.098,75.8,2.0,47.0,3-Jun,212.0,4.57,35.0,84.0


Wilson and Thomas were awarded the same rating for their rookie seasons despite Thomas outproducing Wilson in most significant categories (receptions, touchdowns, receiving yards per game).

This is because Wilson began his rookie season with an overall of 78, whereas Thomas began his NFL campaign with a rating of 75. As you can see in the data, this is likely because Wilson was the 10th overall pick, and Thomas the 47th.

Thankfully, these rookie rating issues are generally corrected by the end of the second year (i.e., the second Madden rating), as the starting rating becomes an increasingly distant memory.

In [29]:
WR_data[(WR_data['name'] == 'Michael Thomas') & (WR_data['season'] == 2017)]

Unnamed: 0,id,name,years_played,team,season,season_type,receptions,targets,receiving_yards,receiving_tds,...,ypr,rec_td_percentage,rec_ypg,round,overall,ht,wt,forty,vertical,Overall Rating
1703,1266,Michael Thomas,2.0,NO,2017,REG,104,149,1245,5,...,11.97,0.048,77.8125,2.0,47.0,3-Jun,212.0,4.57,35.0,91.0


The dataset only goes through 2022, but in 2023 Garrett Wilson had 95 receptions for 1032 yards and 3 touchdowns and was given an 86 rating. As such, the mismatch between production and rating from the rookie season has been 'fixed' by the second year, as the original rating on entering the league carries less weight in the equation.

All this to say that when calculating the mean Madden rating for a player's first 4 seasons (the rookie contract), I should take a weighted average in which the first year's rating is weighted less than the subsequent three.

**Madden Grade: a weighted average of a given player's first 4 Madden ratings (rookie contract)**


$$
MG = \frac{0.5R_1 + R_2 + R_3 + R_4}{3.5}
$$

where $R_i$ represents the player's Overall Rating for their $i$-th season
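
As a worked example, plugging in Beckham's first four ratings from the table above (94, 93, 93, 95):

$$
MG = \frac{0.5(94) + 93 + 93 + 95}{3.5} = \frac{328}{3.5} \approx 93.71
$$

The unweighted mean of the same four ratings is 93.75, so the down-weighting shifts the grade only slightly when a player's ratings are stable.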

In [65]:
WR_data_sorted = WR_data.sort_values(by=['name', 'season']) #it was already sorted but I couldn't get it to work without this for some reason

def weighted_mean(ratings):
    '''
    calculate the Madden Grade for a player (weighted avg of first 4 overall ratings)
    '''
    if len(ratings) > 0:
        weights = [0.5] + [1] * 3  # Define the weights
        weighted_ratings = [a * b for a, b in zip(ratings[:4], weights[:len(ratings)])]  # Apply the weights
        return sum(weighted_ratings) / sum(weights[:len(ratings)])  # Compute the weighted mean
    else:
        return None  # Return None if there are no ratings

Madden_Grade = WR_data_sorted.groupby('name')['Overall Rating'].apply(weighted_mean).reset_index(name = 'Madden Grade')
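
For players with fewer than four ratings, `weighted_mean` truncates the weight list, so the denominator becomes 0.5, 1.5, or 2.5 instead of 3.5. A minimal check of that behaviour, using a standalone copy of the function under the hypothetical name `madden_grade` and ratings taken from the outputs above:

```python
import pandas as pd

def madden_grade(ratings):
    """Standalone copy of weighted_mean for illustration (hypothetical name)."""
    ratings = list(ratings)[:4]                 # only the first four rows count
    if not ratings:
        return None                             # no ratings, no grade
    weights = ([0.5] + [1] * 3)[:len(ratings)]  # first season at half weight
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)

four_seasons = madden_grade(pd.Series([94, 93, 93, 95]))  # Beckham 2014-2017
two_seasons = madden_grade(pd.Series([84, 91]))           # Thomas 2016-2017
one_season = madden_grade(pd.Series([84]))                # Wilson 2022
print(four_seasons, two_seasons, one_season)
```

A one-season player's grade equals that single rating, so the first-year rating the weighting was meant to discount carries full weight for them. Note also that changing teams mid-season produces two rows for one season (as with Beckham in 2021), and each row would count as a separate rating if it fell within a player's first four rows.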

For comparison's sake, here is the average Madden Grade by draft position:

In [82]:
WR_data_top10 = WR_data[WR_data['overall'] <= 10]

WR_data_top10_sorted = WR_data_top10.sort_values(by=['name', 'season'])

Madden_Grade_top10 = WR_data_top10_sorted.groupby('name')['Overall Rating'].apply(weighted_mean).reset_index(name = 'Madden Grade')

Madden_Grade_top10.loc[Madden_Grade_top10['name'] == 'John Ross', 'Madden Grade'] = 74

Madden_Grade_top10.drop(index= [2,3,5,15,18], inplace=True)

Madden_Grade_top10['Madden Grade'].mean()

83.94940476190476

In [98]:
WR_data_1st = WR_data[(WR_data['overall'] > 10) & (WR_data['overall']<= 32)]

WR_data_1st_sorted = WR_data_1st.sort_values(by=['name', 'season'])

Madden_Grade_1st = WR_data_1st_sorted.groupby('name')['Overall Rating'].apply(weighted_mean).reset_index(name = 'Madden Grade')

#manual overrides
Madden_Grade_1st.loc[Madden_Grade_1st['name'] == 'Henry Ruggs', 'Madden Grade'] = 78
Madden_Grade_1st.loc[Madden_Grade_1st['name'] == "N'Keal Harry", 'Madden Grade'] = 71
Madden_Grade_1st.loc[Madden_Grade_1st['name'] == 'Will Fuller', 'Madden Grade'] = 78
Madden_Grade_1st.drop(index = [14, 19, 11, 12, 13], inplace=True)

Madden_Grade_1st['Madden Grade'].mean()

80.22291666666666

In [103]:
WR_data_2nd = WR_data[(WR_data['overall'] > 32) & (WR_data['overall']<= 64)]

WR_data_2nd_sorted = WR_data_2nd.sort_values(by=['name', 'season'])

Madden_Grade_2nd = WR_data_2nd_sorted.groupby('name')['Overall Rating'].apply(weighted_mean).reset_index(name = 'Madden Grade')

Madden_Grade_2nd.drop(index = [19, 22], inplace=True)
Madden_Grade_2nd['Madden Grade'].mean()

78.8345471521942

In [116]:
WR_data_3rd = WR_data[(WR_data['overall'] > 64) & (WR_data['overall']<=100)]

WR_data_3rd_sorted = WR_data_3rd.sort_values(by=['name', 'season'])

Madden_Grade_3rd = WR_data_3rd_sorted.groupby('name')['Overall Rating'].apply(weighted_mean).reset_index(name = 'Madden Grade')

Madden_Grade_3rd['Madden Grade'].mean()

76.83931623931623

In [114]:
WR_data_4th = WR_data[(WR_data['overall'] > 100) & (WR_data['overall'] <= 133)]

WR_data_4th_sorted = WR_data_4th.sort_values(by=['name', 'season'])

Madden_Grade_4th = WR_data_4th_sorted.groupby('name')['Overall Rating'].apply(weighted_mean).reset_index(name = 'Madden Grade')

Madden_Grade_4th['Madden Grade'].mean()

70.64255952380952

In [122]:
WR_data_late = WR_data[(WR_data['overall'] > 133) & (WR_data['overall']<=200)]

WR_data_late_sorted = WR_data_late.sort_values(by=['name', 'season'])

Madden_Grade_late = WR_data_late_sorted.groupby('name')['Overall Rating'].apply(weighted_mean).reset_index(name = 'Madden Grade')

Madden_Grade_late['Madden Grade'].mean()

72.54687830687831

Interesting how late-round picks (rounds 5, 6, and 7, taken here as overall picks 134 to 200) average a higher Madden Grade than 4th-round picks (72.55 vs. 70.64) over the last 11 years!
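
The tier cells above repeat the same filter, sort, and group steps. One way to compute every tier in a single pass is `pd.cut`, sketched below on a small made-up frame. The `toy` data, the `tier` labels, and the `grades` and `tier_means` names are illustrative, and the manual overrides and index drops from the cells above are not reproduced.

```python
import pandas as pd

def weighted_mean(ratings):
    """Weighted average of up to four ratings, first season at half weight."""
    ratings = list(ratings)[:4]
    if not ratings:
        return None
    weights = ([0.5] + [1] * 3)[:len(ratings)]
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)

# Made-up rows with the same column names as WR_data.
toy = pd.DataFrame({
    "name": ["A", "A", "B", "B", "C", "D"],
    "season": [2020, 2021, 2020, 2021, 2021, 2021],
    "overall": [5, 5, 40, 40, 120, 180],
    "Overall Rating": [80, 86, 74, 78, 70, 72],
})

# Same pick boundaries as the cells above. Intervals are right-inclusive,
# so (0, 10] matches `overall <= 10`, (10, 32] matches `> 10 and <= 32`, etc.
# Picks above 200 get no tier and drop out of the groupby, as in the filters above.
bins = [0, 10, 32, 64, 100, 133, 200]
labels = ["top10", "1st", "2nd", "3rd", "4th", "late"]
toy["tier"] = pd.cut(toy["overall"], bins=bins, labels=labels)

# One Madden Grade per player, then the mean grade per tier.
grades = (
    toy.sort_values(["name", "season"])
       .groupby(["tier", "name"], observed=True)["Overall Rating"]
       .apply(weighted_mean)
       .reset_index(name="Madden Grade")
)
tier_means = grades.groupby("tier", observed=True)["Madden Grade"].mean()
print(tier_means)
```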