Boris Chen or Boris Degen?

In [1]:
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn import metrics
from sklearn.metrics import calinski_harabasz_score
from sklearn.cluster import AgglomerativeClustering
%matplotlib inline
plt.style.use('seaborn')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8'

In [3]:
offense_df = pd.read_csv('data/nfl_pass_rush_receive_raw_data.csv')
dst_df = pd.read_csv('data/nfl_dst_raw_data.csv')
kicking_df = pd.read_csv('data/nfl_kicking_raw_data.csv')

Load data for offenses, D/STs, and kicking. <br>
First, focus on the offense data, specifically passing and QB play. <br>
Starting QBs will be ranked and potentially clustered with k-means. <br>
Players who attempted passes only on trick plays or other special play designs will be excluded.
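One way to exclude trick-play passers is a per-game attempt threshold on top of the position filter. This is a minimal sketch on toy rows, not the notebook's own method: the `MIN_PASS_ATT` cutoff, the `starting_qb_rows` helper, and the player names are assumptions made for illustration; only the `pos` and `pass_att` column names come from the data.

```python
import pandas as pd

# Toy rows standing in for offense_df; the values are illustrative only.
toy_df = pd.DataFrame({
    "player": ["QB One", "WR Trick", "QB Two", "QB Gadget"],
    "pos": ["QB", "WR", "QB", "QB"],
    "pass_att": [31, 1, 28, 2],
})

MIN_PASS_ATT = 10  # hypothetical cutoff, not taken from the notebook


def starting_qb_rows(df, min_att=MIN_PASS_ATT):
    """Keep rows listed at QB with at least `min_att` pass attempts in that game."""
    mask = (df["pos"] == "QB") & (df["pass_att"] >= min_att)
    return df.loc[mask].copy()


starters = starting_qb_rows(toy_df)
print(starters["player"].tolist())
```

A threshold like this also drops gadget players who are listed at QB but rarely throw, which a position filter alone would keep.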

In [5]:
offense_df[0:100]

Unnamed: 0,game_id,player_id,pos,player,team,pass_cmp,pass_att,pass_yds,pass_td,pass_int,...,OT,Roof,Surface,Temperature,Humidity,Wind_Speed,Vegas_Line,Vegas_Favorite,Over_Under,game_date
0,202209080ram,AlleJo02,QB,Josh Allen,BUF,26,31,297,3,2,...,False,dome,matrixturf,72,45,0,-2.5,BUF,52.0,2022-09-08
1,202209080ram,SingDe00,RB,Devin Singletary,BUF,0,0,0,0,0,...,False,dome,matrixturf,72,45,0,-2.5,BUF,52.0,2022-09-08
2,202209080ram,MossZa00,RB,Zack Moss,BUF,0,0,0,0,0,...,False,dome,matrixturf,72,45,0,-2.5,BUF,52.0,2022-09-08
3,202209080ram,CookJa01,RB,James Cook,BUF,0,0,0,0,0,...,False,dome,matrixturf,72,45,0,-2.5,BUF,52.0,2022-09-08
4,202209080ram,DiggSt00,WR,Stefon Diggs,BUF,0,0,0,0,0,...,False,dome,matrixturf,72,45,0,-2.5,BUF,52.0,2022-09-08
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,202209110cin,HeywCo00,TE,Connor Heyward,PIT,0,0,0,0,0,...,True,outdoors,fieldturf,72,45,0,-7.0,CIN,44.0,2022-09-11
96,202209110cin,TaylTr02,WR,Trent Taylor,CIN,0,0,0,0,0,...,True,outdoors,fieldturf,72,45,0,-7.0,CIN,44.0,2022-09-11
97,202209110cin,MorgSt02,WR,Stanley Morgan Jr.,CIN,0,0,0,0,0,...,True,outdoors,fieldturf,72,45,0,-7.0,CIN,44.0,2022-09-11
98,202209110cin,WilcMi01,TE,Mitchell Wilcox,CIN,0,0,0,0,0,...,True,outdoors,fieldturf,72,45,0,-7.0,CIN,44.0,2022-09-11


In [11]:
# qbs_df = []
# for entry in offense_df:
#     if offense_df['pos']=='QB':
#         qbs_df.append(entry)


In [38]:
qbs_df = offense_df[offense_df['pos']=='QB']
# qbs_df

Apparently there are 277 QB entries so far; this data reflects weeks 1-8 of the 2022 NFL regular season. <br>
32 teams * 8 games = 256 is the most team-games possible, and teams that have had a bye have played fewer, so the extra entries mean there have certainly been backups, injuries, and QB changes.
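The gap between QB entries and team-games can be checked directly by counting QB rows per team per game. The sketch below uses toy rows with made-up game ids, teams, and players; only the `game_id`, `team`, and `player` column names come from the data.

```python
import pandas as pd

# Toy stand-in for qbs_df: one row per QB appearance in a game.
toy_qbs = pd.DataFrame({
    "game_id": ["g1", "g1", "g1", "g2", "g2"],
    "team": ["AAA", "AAA", "BBB", "AAA", "CCC"],
    "player": ["Starter A", "Backup A", "Starter B", "Starter A", "Starter C"],
})

# Number of QB rows for each team in each game.
qbs_per_team_game = toy_qbs.groupby(["game_id", "team"]).size()

team_games = len(qbs_per_team_game)          # distinct team-games with a QB row
extra_entries = len(toy_qbs) - team_games    # rows beyond one QB per team-game
multi_qb_games = qbs_per_team_game[qbs_per_team_game > 1]

print(team_games, extra_entries)
print(multi_qb_games)
```

On the real data, the rows in `multi_qb_games` would point to the games where a backup or second QB appeared.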

In [19]:
print(qbs_df['player'].unique())
print(len(qbs_df['player'].unique()))

['Josh Allen' 'Matthew Stafford' 'Jameis Winston' 'Taysom Hill'
 'Marcus Mariota' 'Baker Mayfield' 'Jacoby Brissett' 'Trey Lance'
 'Justin Fields' 'Joe Burrow' 'Mitchell Trubisky' 'Patrick Mahomes'
 'Kyler Murray' 'Trace McSorley' 'Dak Prescott' 'Cooper Rush' 'Tom Brady'
 'Jared Goff' 'Jalen Hurts' 'Matt Ryan' 'Davis Mills' 'Jeff Driskel'
 'Mac Jones' 'Tua Tagovailoa' 'Aaron Rodgers' 'Jordan Love' 'Kirk Cousins'
 'Lamar Jackson' 'Joe Flacco' 'Daniel Jones' 'Ryan Tannehill' 'Derek Carr'
 'Justin Herbert' 'Trevor Lawrence' 'Carson Wentz' 'Geno Smith'
 'Russell Wilson' 'Jimmy Garoppolo' 'Case Keenum' 'Malik Willis'
 'Teddy Bridgewater' 'C.J. Beathard' 'Bailey Zappe' 'Brian Hoyer'
 'Andy Dalton' 'Tyrod Taylor' 'Zach Wilson' 'Kenny Pickett' 'Brock Purdy'
 'P.J. Walker' 'Skylar Thompson' 'Jacob Eason' 'Brett Rypien' 'Chad Henne'
 'Taylor Heinicke' 'Sam Ehlinger' 'Trevor Siemian' 'Jarrett Stidham'
 'Gardner Minshew II']
59


59 QBs have made appearances this season. <br>
We will take a look at each of these players.

In [21]:
qbs_df['player'].value_counts()

Joe Burrow            8
Daniel Jones          8
Kyler Murray          8
Marcus Mariota        8
Justin Fields         8
Lamar Jackson         8
Aaron Rodgers         8
Geno Smith            8
Tom Brady             8
Jacoby Brissett       8
Trevor Lawrence       8
Jared Goff            7
Davis Mills           7
Matt Ryan             7
Matthew Stafford      7
Derek Carr            7
Kirk Cousins          7
Russell Wilson        7
Jalen Hurts           7
Josh Allen            7
Justin Herbert        7
Jimmy Garoppolo       7
Patrick Mahomes       7
Cooper Rush           6
Tua Tagovailoa        6
Carson Wentz          6
Ryan Tannehill        6
Kenny Pickett         5
Baker Mayfield        5
Mac Jones             5
Andy Dalton           5
Mitchell Trubisky     5
Zach Wilson           5
Teddy Bridgewater     4
Bailey Zappe          4
P.J. Walker           4
Taysom Hill           3
Joe Flacco            3
Dak Prescott          3
Jameis Winston        3
Skylar Thompson       2
Jordan Love     

Next, a look at passing yards, rushing yards, passer rating, and TD:INT ratio for the top QBs this season, as context for interpreting successful teams and widely regarded QBs.
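The TD:INT ratio is passing touchdowns divided by interceptions. A small helper makes the zero-interception case explicit; `td_int_ratio` is a hypothetical name, and returning `None` for that case is a design choice made here, not something the notebook specifies.

```python
def td_int_ratio(pass_td, pass_int):
    """Return passing TDs per interception, or None when there are no interceptions."""
    if pass_int == 0:
        return None  # ratio is undefined without at least one interception
    return pass_td / pass_int


ratio = td_int_ratio(19, 6)
print(round(ratio, 2))
```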

In [30]:
qbs_df.columns

Index(['game_id', 'player_id', 'pos', 'player', 'team', 'pass_cmp', 'pass_att',
       'pass_yds', 'pass_td', 'pass_int', 'pass_sacked', 'pass_sacked_yds',
       'pass_long', 'pass_rating', 'rush_att', 'rush_yds', 'rush_td',
       'rush_long', 'targets', 'rec', 'rec_yds', 'rec_td', 'rec_long',
       'fumbles_lost', 'rush_scrambles', 'designed_rush_att',
       'comb_pass_rush_play', 'comb_pass_play', 'comb_rush_play',
       'Team_abbrev', 'Opponent_abbrev', 'two_point_conv', 'total_ret_td',
       'offensive_fumble_recovery_td', 'pass_yds_bonus', 'rush_yds_bonus',
       'rec_yds_bonus', 'Total_DKP', 'Off_DKP', 'Total_FDP', 'Off_FDP',
       'Total_SDP', 'Off_SDP', 'pass_target_yds', 'pass_poor_throws',
       'pass_blitzed', 'pass_hurried', 'rush_yds_before_contact', 'rush_yac',
       'rush_broken_tackles', 'rec_air_yds', 'rec_yac', 'rec_drops', 'offense',
       'off_pct', 'vis_team', 'home_team', 'vis_score', 'home_score', 'OT',
       'Roof', 'Surface', 'Temperature', 'Humidit

In [37]:
joshallen_df = qbs_df[qbs_df['player']=='Josh Allen']
# joshallen_df

## Players will be displayed with their stats from here on down:

In [35]:
print('joshallen pass_yds: ', joshallen_df['pass_yds'].sum())
print('joshallen rush_yds: ', joshallen_df['rush_yds'].sum())
print('joshallen pass_td: ', joshallen_df['pass_td'].sum())
print('joshallen pass_int: ', joshallen_df['pass_int'].sum())
print('joshallen rush_td: ', joshallen_df['rush_td'].sum())
print('joshallen fumbles_lost: ', joshallen_df['fumbles_lost'].sum())
print('joshallen pass_rating: ', joshallen_df['pass_rating'].mean())

joshallen pass_yds:  2198
joshallen rush_yds:  306
joshallen pass_td:  19
joshallen pass_int:  6
joshallen rush_td:  2
joshallen fumbles_lost:  2
joshallen pass_rating:  104.4
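Note that the `pass_rating` figure printed above is the mean of per-game ratings, which generally differs from a rating computed on season totals. The sketch below applies the standard NFL passer rating formula to two made-up games to show the difference; the `passer_rating` helper and the stat lines are assumptions for illustration, and the function needs completions (`pass_cmp`) and attempts (`pass_att`) in addition to the columns summed above.

```python
def passer_rating(cmp, att, yds, td, ints):
    """NFL passer rating from raw totals; each component is clamped to [0, 2.375]."""
    if att == 0:
        raise ValueError("passer rating is undefined for zero attempts")

    def clamp(x):
        return max(0.0, min(2.375, x))

    a = clamp((cmp / att - 0.3) * 5)        # completion percentage component
    b = clamp((yds / att - 3) * 0.25)       # yards per attempt component
    c = clamp((td / att) * 20)              # touchdown rate component
    d = clamp(2.375 - (ints / att) * 25)    # interception rate component
    return (a + b + c + d) / 6 * 100


# Two hypothetical games as (cmp, att, yds, td, int).
games = [(20, 20, 300, 4, 0), (10, 30, 90, 0, 2)]

per_game = [passer_rating(*g) for g in games]
mean_of_games = sum(per_game) / len(per_game)

totals = [sum(col) for col in zip(*games)]
rating_on_totals = passer_rating(*totals)

print(round(mean_of_games, 1), round(rating_on_totals, 1))
```

Averaging per-game ratings weights a 20-attempt game the same as a 30-attempt game, so the two numbers diverge whenever attempt counts vary.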


In [39]:
joeburrow_df = qbs_df[qbs_df['player']=='Joe Burrow']
print('joeburrow pass_yds: ', joeburrow_df['pass_yds'].sum())
print('joeburrow rush_yds: ', joeburrow_df['rush_yds'].sum())
print('joeburrow pass_td: ', joeburrow_df['pass_td'].sum())
print('joeburrow pass_int: ', joeburrow_df['pass_int'].sum())
print('joeburrow rush_td: ', joeburrow_df['rush_td'].sum())
print('joeburrow fumbles_lost: ', joeburrow_df['fumbles_lost'].sum())
print('joeburrow pass_rating: ', joeburrow_df['pass_rating'].mean())

joeburrow pass_yds:  2329
joeburrow rush_yds:  132
joeburrow pass_td:  17
joeburrow pass_int:  6
joeburrow rush_td:  3
joeburrow fumbles_lost:  2
joeburrow pass_rating:  103.21249999999999


In [40]:
danieljones_df = qbs_df[qbs_df['player']=='Daniel Jones']
print('danieljones pass_yds: ', danieljones_df['pass_yds'].sum())
print('danieljones rush_yds: ', danieljones_df['rush_yds'].sum())
print('danieljones pass_td: ', danieljones_df['pass_td'].sum())
print('danieljones pass_int: ', danieljones_df['pass_int'].sum())
print('danieljones rush_td: ', danieljones_df['rush_td'].sum())
print('danieljones fumbles_lost: ', danieljones_df['fumbles_lost'].sum())
print('danieljones pass_rating: ', danieljones_df['pass_rating'].mean())

danieljones pass_yds:  1399
danieljones rush_yds:  363
danieljones pass_td:  6
danieljones pass_int:  2
danieljones rush_td:  3
danieljones fumbles_lost:  2
danieljones pass_rating:  89.37499999999999


In [41]:
kylermurray_df = qbs_df[qbs_df['player']=='Kyler Murray']
print('kylermurray pass_yds: ', kylermurray_df['pass_yds'].sum())
print('kylermurray rush_yds: ', kylermurray_df['rush_yds'].sum())
print('kylermurray pass_td: ', kylermurray_df['pass_td'].sum())
print('kylermurray pass_int: ', kylermurray_df['pass_int'].sum())
print('kylermurray rush_td: ', kylermurray_df['rush_td'].sum())
print('kylermurray fumbles_lost: ', kylermurray_df['fumbles_lost'].sum())
print('kylermurray pass_rating: ', kylermurray_df['pass_rating'].mean())

kylermurray pass_yds:  1993
kylermurray rush_yds:  299
kylermurray pass_td:  10
kylermurray pass_int:  6
kylermurray rush_td:  2
kylermurray fumbles_lost:  1
kylermurray pass_rating:  86.8125


In [43]:
marcusmariota_df = qbs_df[qbs_df['player']=='Marcus Mariota']
print('marcusmariota pass_yds: ', marcusmariota_df['pass_yds'].sum())
print('marcusmariota rush_yds: ', marcusmariota_df['rush_yds'].sum())
print('marcusmariota pass_td: ', marcusmariota_df['pass_td'].sum())
print('marcusmariota pass_int: ', marcusmariota_df['pass_int'].sum())
print('marcusmariota rush_td: ', marcusmariota_df['rush_td'].sum())
print('marcusmariota fumbles_lost: ', marcusmariota_df['fumbles_lost'].sum())
print('marcusmariota pass_rating: ', marcusmariota_df['pass_rating'].mean())

marcusmariota pass_yds:  1432
marcusmariota rush_yds:  280
marcusmariota pass_td:  10
marcusmariota pass_int:  6
marcusmariota rush_td:  3
marcusmariota fumbles_lost:  3
marcusmariota pass_rating:  94.68750000000001


In [44]:
justinfields_df = qbs_df[qbs_df['player']=='Justin Fields']
print('justinfields pass_yds: ', justinfields_df['pass_yds'].sum())
print('justinfields rush_yds: ', justinfields_df['rush_yds'].sum())
print('justinfields pass_td: ', justinfields_df['pass_td'].sum())
print('justinfields pass_int: ', justinfields_df['pass_int'].sum())
print('justinfields rush_td: ', justinfields_df['rush_td'].sum())
print('justinfields fumbles_lost: ', justinfields_df['fumbles_lost'].sum())
print('justinfields pass_rating: ', justinfields_df['pass_rating'].mean())

justinfields pass_yds:  1199
justinfields rush_yds:  424
justinfields pass_td:  7
justinfields pass_int:  6
justinfields rush_td:  3
justinfields fumbles_lost:  1
justinfields pass_rating:  78.65


In [48]:
#Lamar
lamarjackson_df = qbs_df[qbs_df['player']=='Lamar Jackson']
print('lamarjackson pass_yds: ', lamarjackson_df['pass_yds'].sum())
print('lamarjackson rush_yds: ', lamarjackson_df['rush_yds'].sum())
print('lamarjackson pass_td: ', lamarjackson_df['pass_td'].sum())
print('lamarjackson pass_int: ', lamarjackson_df['pass_int'].sum())
print('lamarjackson rush_td: ', lamarjackson_df['rush_td'].sum())
print('lamarjackson fumbles_lost: ', lamarjackson_df['fumbles_lost'].sum())
print('lamarjackson pass_rating: ', lamarjackson_df['pass_rating'].mean())

lamarjackson pass_yds:  1635
lamarjackson rush_yds:  553
lamarjackson pass_td:  15
lamarjackson pass_int:  6
lamarjackson rush_td:  2
lamarjackson fumbles_lost:  1
lamarjackson pass_rating:  92.75


In [49]:
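#Aaron
# Filter corrected to 'Aaron Rodgers'; it previously selected 'Justin Fields',
# so the recorded output below repeats Fields' totals until the cell is re-run.
aaronrodgers_df = qbs_df[qbs_df['player']=='Aaron Rodgers']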
print('aaronrodgers pass_yds: ', aaronrodgers_df['pass_yds'].sum())
print('aaronrodgers rush_yds: ', aaronrodgers_df['rush_yds'].sum())
print('aaronrodgers pass_td: ', aaronrodgers_df['pass_td'].sum())
print('aaronrodgers pass_int: ', aaronrodgers_df['pass_int'].sum())
print('aaronrodgers rush_td: ', aaronrodgers_df['rush_td'].sum())
print('aaronrodgers fumbles_lost: ', aaronrodgers_df['fumbles_lost'].sum())
print('aaronrodgers pass_rating: ', aaronrodgers_df['pass_rating'].mean())

aaronrodgers pass_yds:  1199
aaronrodgers rush_yds:  424
aaronrodgers pass_td:  7
aaronrodgers pass_int:  6
aaronrodgers rush_td:  3
aaronrodgers fumbles_lost:  1
aaronrodgers pass_rating:  78.65


In [50]:
#Geno
genosmith_df = qbs_df[qbs_df['player']=='Geno Smith']
print('genosmith pass_yds: ', genosmith_df['pass_yds'].sum())
print('genosmith rush_yds: ', genosmith_df['rush_yds'].sum())
print('genosmith pass_td: ', genosmith_df['pass_td'].sum())
print('genosmith pass_int: ', genosmith_df['pass_int'].sum())
print('genosmith rush_td: ', genosmith_df['rush_td'].sum())
print('genosmith fumbles_lost: ', genosmith_df['fumbles_lost'].sum())
print('genosmith pass_rating: ', genosmith_df['pass_rating'].mean())

genosmith pass_yds:  1924
genosmith rush_yds:  158
genosmith pass_td:  13
genosmith pass_int:  3
genosmith rush_td:  1
genosmith fumbles_lost:  1
genosmith pass_rating:  107.85


In [52]:
#Tom
tombrady_df = qbs_df[qbs_df['player']=='Tom Brady']
print('tombrady pass_yds: ', tombrady_df['pass_yds'].sum())
print('tombrady rush_yds: ', tombrady_df['rush_yds'].sum())
print('tombrady pass_td: ', tombrady_df['pass_td'].sum())
print('tombrady pass_int: ', tombrady_df['pass_int'].sum())
print('tombrady rush_td: ', tombrady_df['rush_td'].sum())
print('tombrady fumbles_lost: ', tombrady_df['fumbles_lost'].sum())
print('tombrady pass_rating: ', tombrady_df['pass_rating'].mean())

tombrady pass_yds:  2267
tombrady rush_yds:  -5
tombrady pass_td:  9
tombrady pass_int:  1
tombrady rush_td:  0
tombrady fumbles_lost:  2
tombrady pass_rating:  91.3875


In [51]:
#Brissett
jacobybrissett_df = qbs_df[qbs_df['player']=='Jacoby Brissett']
print('jacobybrissett pass_yds: ', jacobybrissett_df['pass_yds'].sum())
print('jacobybrissett rush_yds: ', jacobybrissett_df['rush_yds'].sum())
print('jacobybrissett pass_td: ', jacobybrissett_df['pass_td'].sum())
print('jacobybrissett pass_int: ', jacobybrissett_df['pass_int'].sum())
print('jacobybrissett rush_td: ', jacobybrissett_df['rush_td'].sum())
print('jacobybrissett fumbles_lost: ', jacobybrissett_df['fumbles_lost'].sum())
print('jacobybrissett pass_rating: ', jacobybrissett_df['pass_rating'].mean())

jacobybrissett pass_yds:  1862
jacobybrissett rush_yds:  142
jacobybrissett pass_td:  7
jacobybrissett pass_int:  5
jacobybrissett rush_td:  2
jacobybrissett fumbles_lost:  3
jacobybrissett pass_rating:  90.5625


In [53]:
#Lawrence
trevorlawrence_df = qbs_df[qbs_df['player']=='Trevor Lawrence']
print('trevorlawrence pass_yds: ', trevorlawrence_df['pass_yds'].sum())
print('trevorlawrence rush_yds: ', trevorlawrence_df['rush_yds'].sum())
print('trevorlawrence pass_td: ', trevorlawrence_df['pass_td'].sum())
print('trevorlawrence pass_int: ', trevorlawrence_df['pass_int'].sum())
print('trevorlawrence rush_td: ', trevorlawrence_df['rush_td'].sum())
print('trevorlawrence fumbles_lost: ', trevorlawrence_df['fumbles_lost'].sum())
print('trevorlawrence pass_rating: ', trevorlawrence_df['pass_rating'].mean())

trevorlawrence pass_yds:  1840
trevorlawrence rush_yds:  99
trevorlawrence pass_td:  10
trevorlawrence pass_int:  6
trevorlawrence rush_td:  3
trevorlawrence fumbles_lost:  4
trevorlawrence pass_rating:  86.3


All QBs with 8 games played (no bye yet) were analyzed above. Josh Allen served as test subject 1, although he has only 7 games played.
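The per-player cells above repeat the same seven lines, and copy-paste makes it easy to leave the wrong name in a filter. A single `groupby` with named aggregation could produce the same totals for every QB at once. This is a sketch on toy rows with made-up values; the column names match the data, while the output names `games` and `pass_rating_mean` are chosen here.

```python
import pandas as pd

# Toy stand-in for qbs_df with two games per player.
toy_qbs = pd.DataFrame({
    "player": ["QB One", "QB One", "QB Two", "QB Two"],
    "pass_yds": [300, 250, 180, 220],
    "rush_yds": [40, 10, 5, -2],
    "pass_td": [3, 2, 1, 0],
    "pass_int": [1, 0, 2, 1],
    "rush_td": [0, 1, 0, 0],
    "fumbles_lost": [0, 1, 0, 0],
    "pass_rating": [110.0, 100.0, 70.0, 60.0],
})

# Sum the counting stats and average the per-game rating, one row per player.
summary = toy_qbs.groupby("player").agg(
    games=("pass_yds", "count"),
    pass_yds=("pass_yds", "sum"),
    rush_yds=("rush_yds", "sum"),
    pass_td=("pass_td", "sum"),
    pass_int=("pass_int", "sum"),
    rush_td=("rush_td", "sum"),
    fumbles_lost=("fumbles_lost", "sum"),
    pass_rating_mean=("pass_rating", "mean"),
)

print(summary)
```

Filtering `summary` on `games == 8` would then select the QBs who have not yet had a bye.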