# NFL Coaching Tree 
#### This project traces the lineage of NFL head coaching trees, mapping how current and former coaches are connected through past coaching relationships. Analyzing these trees shows how coaching philosophies, strategies, and networks have evolved and spread across the league over time. The goal is to visualize these connections and highlight the influential coaches who have shaped the modern NFL. In addition, by looking at each staff's win total and any awards received, we can get better insight into how much a successful team contributed to the later individual success of its staff members.

# First, we will load in all the necessary packages
#### pandas and NumPy handle the data, and pandasql lets us query DataFrames with SQL, which helps us understand the dataset and the variables at play.

In [1]:
# Imports 
import pandas as pd
import numpy as np
import pandasql as ps

### We can load each dataset into a pandas DataFrame and view what's inside
#### First is the coaches CSV file. It lists each head coach and every member of their staff along with that member's role. It also contains info on how long each member was part of the staff under the head coach, alongside how the team's 

In [3]:
hc = pd.read_csv('NFL_Coaches.csv')
ps.sqldf("select * from hc")

Unnamed: 0,Head_Coach,Coach,weight,group,group_count,lrole,tW,tPyW,tg_hc,ftm,...,fy_a,ly_a,TYr,nw,pyw500,above500,odo,cord,lodo,lcord
0,Adam Gase,Vance Joseph,1,DC,1,DC,32,26.28,80,MIA,...,2016,2016,1,1.000000,0,0,2,1,2,1
1,Al Groh,Ken Whisenhunt,1,aSTC,1,aSTC,9,8.00,16,NYJ,...,2000,2000,1,1.000000,1,1,3,0,3,0
2,Al Groh,Mike Nolan,1,DC,1,DC,9,8.00,16,NYJ,...,2000,2000,1,1.000000,1,1,2,1,2,1
3,Al Groh,Todd Bowles,1,aDC,1,aDC,9,8.00,16,NYJ,...,2000,2000,1,1.000000,1,1,2,0,2,0
4,Al Groh,Todd Haley,1,aOC,1,aOC,9,8.00,16,NYJ,...,2000,2000,1,1.000000,1,1,1,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
627,Weeb Ewbank,Chuck Knox,1,aOC,1,aOC,56,57.96,112,NYJ,...,1966,1966,1,1.000000,1,1,1,0,1,0
628,Weeb Ewbank,Clive Rush,3,OC,3,OC,56,57.96,112,NYJ,...,1966,1968,3,1.000000,1,1,1,1,1,1
629,Weeb Ewbank,Ed Biles,3,aDC,2,aDC,56,57.96,112,NYJ,...,1971,1973,3,1.000000,1,1,2,0,2,0
630,Weeb Ewbank,Ken Meyer,4,aOC,4,aOC,56,57.96,112,NYJ,...,1969,1972,4,1.000000,1,1,1,0,1,0


#### Next, we can load in the wins CSV file. It contains stats about a head coach's season with a team, such as wins, losses, and total points scored for and against. It also records whether the coach won or appeared in the Super Bowl that year, and whether they received a Coach of the Year award.

In [4]:
wins = pd.read_csv('wins.csv')
ps.sqldf("select * from wins")

Unnamed: 0,Season,Coach,Tm,Win,Loss,Tie,PF,PA,Total,fwk,...,yrs_hc,ftm,ltm,mpwpct,sb,sba,coy,name,pywins,awins
0,1966,Allie Sherman,NYG,1,12,1,263,501,14,1,...,3,NYG,NYG,0.487475,0,0,0,Allie Sherman's,3.032486,1.821429
1,1966,Bill Austin,PIT,5,8,1,316,347,14,1,...,4,PIT,WAS,0.487475,0,0,0,Bill Austin's,7.561235,6.678571
2,1966,Blanton Collier,CLE,9,5,0,403,259,14,1,...,5,CLE,CLE,0.487475,0,0,0,Blanton Collier's,12.585978,10.928571
3,1966,Charley Winner,ARI,8,5,1,264,265,14,1,...,7,ARI,NYJ,0.487475,0,0,0,Charley Winner's,8.461919,10.321429
4,1966,Don Shula,IND,9,5,0,314,226,14,1,...,30,IND,MIA,0.487475,0,0,0,Don Shula's,11.654339,10.928571
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1755,2022,Sean McDermott,BUF,13,3,0,455,286,16,1,...,6,BUF,BUF,0.486228,0,0,0,Sean McDermott's,12.755709,13.812500
1756,2022,Sean McVay,LA,5,12,0,307,384,17,1,...,6,LA,LA,0.486228,0,0,0,Sean McVay's,6.297226,5.000000
1757,2022,Steve Wilks,CAR,6,6,0,254,252,12,6,...,2,ARI,CAR,0.486228,0,0,0,Steve Wilks',8.579623,8.500000
1758,2022,Todd Bowles,TB,8,9,0,313,358,17,1,...,6,MIA,TB,0.486228,0,0,0,Todd Bowles',7.158276,8.000000


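#### We can join our coaches and wins tables on the head coach's name to pair each season's record with the members of that coach's staff. Because the join key is the coach rather than the season, every staff member listed for a head coach in `hc` is matched to every season that head coach has in `wins`.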

In [5]:
coach = ps.sqldf("select wins.Season, wins.Coach as Head_Coach, hc.Coach as Coordinator, hc.lrole as role, wins.tm as team, wins.Win, wins.Loss, wins.Tie from wins join hc on wins.Coach=hc.Head_Coach")
coach

Unnamed: 0,Season,Head_Coach,Coordinator,role,team,Win,Loss,Tie
0,1966,Allie Sherman,Alex Webster,aOC,NYG,1,12,1
1,1966,Allie Sherman,Harland Svare,DC,NYG,1,12,1
2,1966,Allie Sherman,Jack Patera,aDC,NYG,1,12,1
3,1966,Bill Austin,Harland Svare,DC,PIT,5,8,1
4,1966,Bill Austin,Mike McCormack,aDC,PIT,5,8,1
...,...,...,...,...,...,...,...,...
6598,2022,Ron Rivera,Steve Wilks,DC,WAS,9,8,1
6599,2022,Sean McDermott,David Culley,aOC,BUF,13,3,0
6600,2022,Sean McVay,Brandon Staley,DC,LA,5,12,0
6601,2022,Sean McVay,Matt LaFleur,OC,LA,5,12,0
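If the goal is a staff list for one specific season, the join above can be narrowed to the seasons in which each assistant was actually on the staff. The sketch below shows one way to do that with a plain pandas merge in place of pandasql. It assumes that `fy_a` and `ly_a` hold the first and last year of the assistant's stint under that head coach, which the column names suggest but the data does not confirm. The function name `staff_by_season` and the two sample frames are hypothetical; the `hc_sample` rows mirror the output shown earlier, while the `wins_sample` records are placeholder values, not figures taken from the dataset.

```python
import pandas as pd

# Small stand-ins for the real tables, with only the columns needed here.
hc_sample = pd.DataFrame({
    "Head_Coach": ["Weeb Ewbank", "Weeb Ewbank", "Weeb Ewbank"],
    "Coach": ["Chuck Knox", "Clive Rush", "Ed Biles"],
    "lrole": ["aOC", "OC", "aDC"],
    "fy_a": [1966, 1966, 1971],  # assumed: first year on the staff
    "ly_a": [1966, 1968, 1973],  # assumed: last year on the staff
})
wins_sample = pd.DataFrame({
    "Season": [1966, 1967, 1971],
    "Coach": ["Weeb Ewbank", "Weeb Ewbank", "Weeb Ewbank"],
    "Tm": ["NYJ", "NYJ", "NYJ"],
    "Win": [6, 8, 6],   # placeholder records for illustration
    "Loss": [6, 5, 8],
    "Tie": [2, 1, 0],
})


def staff_by_season(wins_df, hc_df):
    """Pair each season with only the assistants whose stint covers it."""
    seasons = wins_df.rename(columns={"Coach": "Head_Coach", "Tm": "team"})
    staff = hc_df.rename(columns={"Coach": "Coordinator", "lrole": "role"})

    # Same join key as the SQL query: the head coach's name.
    merged = seasons.merge(staff, on="Head_Coach")

    # Keep a row only when the season falls inside the assistant's stint.
    in_stint = (merged["Season"] >= merged["fy_a"]) & (merged["Season"] <= merged["ly_a"])

    columns = ["Season", "Head_Coach", "Coordinator", "role", "team", "Win", "Loss", "Tie"]
    return merged.loc[in_stint, columns].reset_index(drop=True)


season_staff = staff_by_season(wins_sample, hc_sample)
print(season_staff)
```

This filter still has limits. A head coach who led more than one team, or an assistant with two separate stints stored as a single row, would need extra handling, such as also matching on team.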


#### Now we have each head coach's staff alongside the record for each of that coach's seasons, and we can save this table to use when building our eventual coaching tree

In [6]:
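# Save with an explicit .csv extension; index=False avoids writing the row index as an extra "Unnamed: 0" column
coach.to_csv('coaching_tree.csv', index=False)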
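As a possible next step toward the tree itself, the joined table can be collapsed into a mapping from each head coach to the assistants who worked under them. This is a minimal sketch, not the project's actual tree-building code: the function name `build_tree` is hypothetical, and the sample frame copies the first five rows of the joined output shown above. In the full table the same head coach and assistant pair repeats across seasons, so the sketch removes duplicates first.

```python
import pandas as pd

# First five rows of the joined table shown above, trimmed to the needed columns.
coach_sample = pd.DataFrame({
    "Season": [1966, 1966, 1966, 1966, 1966],
    "Head_Coach": ["Allie Sherman", "Allie Sherman", "Allie Sherman",
                   "Bill Austin", "Bill Austin"],
    "Coordinator": ["Alex Webster", "Harland Svare", "Jack Patera",
                    "Harland Svare", "Mike McCormack"],
    "role": ["aOC", "DC", "aDC", "DC", "aDC"],
})


def build_tree(df):
    """Return {head coach: sorted list of assistants who served under them}."""
    # One row per unique (head coach, assistant) pair, regardless of season.
    edges = df[["Head_Coach", "Coordinator"]].drop_duplicates()
    return edges.groupby("Head_Coach")["Coordinator"].apply(sorted).to_dict()


tree = build_tree(coach_sample)
for head_coach, assistants in tree.items():
    print(head_coach, "->", assistants)
```

An assistant who appears under more than one head coach, as Harland Svare does here, is one of the links that joins two branches of the tree.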