Today we will plot rushing percentages in college football for a given conference. In particular, this notebook looks at the yearly trend, team by team within the conference, in rushing on 1st & 10 when trailing by fewer than 9 points (i.e., within one score).

First, we need to import all the stuff we'll use.

In [212]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import cufflinks as cf
import requests
import warnings
warnings.filterwarnings('ignore')
cf.go_offline()

Collegefootballdata.com has an excellent API for getting all the data we could ever want. Let's do that now. (This will take a minute. We're asking for a lot of data!)

In [240]:
data1 = pd.DataFrame()

#Conference abbreviations: SEC, MAC, B12, PAC, MWC, B1G, CUSA, Ind, SBC, AAC, ACC

conf = 'B1G'

for year in range(2010,2011):
    response1 = requests.get("https://api.collegefootballdata.com/plays?seasonType=both&year={0}&offenseConference={1}&defenseConference!=null".format(year,conf))
    df1 = pd.json_normalize(response1.json())
    df1 = df1[(df1['offense_conference'].isnull()) | (df1['defense_conference'].isnull())] #keep only games where a team has no conference listed (e.g. FCS opponents)
    df1['year']=year
    #data1 = pd.concat([data1,df1])


df1.head()

,id,offense,offense_conference,defense,defense_conference,home,away,offense_score,defense_score,drive_id,...,yards_to_goal,down,distance,yards_gained,play_type,play_text,ppa,clock.minutes,clock.seconds,year
0,302450084003,Indiana,Big Ten,Towson,,Indiana,Towson,0,0,30245008401,...,59,2,6,-4,Pass Completion,Ben Chappell pass complete to Damarlo Belcher ...,-1.5634899867942813,14,0,2010
1,302450084005,Indiana,Big Ten,Towson,,Indiana,Towson,0,0,30245008401,...,63,3,10,5,Pass Completion,Ben Chappell pass complete to Duwyce Wilson fo...,0.0312251135526961,13,50,2010
2,302450084006,Indiana,Big Ten,Towson,,Indiana,Towson,0,0,30245008401,...,58,4,5,6,Punt,"Chris Hagerup punt for 41 yards, returned by T...",,12,54,2010
3,302450084002,Indiana,Big Ten,Towson,,Indiana,Towson,0,0,30245008401,...,63,1,10,4,Pass Completion,Ben Chappell pass complete to Terrance Turner ...,-0.1644248613169578,14,30,2010
4,302450084019,Indiana,Big Ten,Towson,,Indiana,Towson,0,0,30245008403,...,36,2,9,11,Pass Completion,Ben Chappell pass complete to Damarlo Belcher ...,0.9239597600034896,11,0,2010


In [234]:
data = pd.DataFrame()

#Conference abbreviations: SEC, MAC, B12, PAC, MWC, B1G, CUSA, Ind, SBC, AAC, ACC

conf = 'B1G'

for year in range(2010,2020):
    response = requests.get("https://api.collegefootballdata.com/plays?seasonType=both&year={0}&offenseConference={1}".format(year,conf))
    df = pd.json_normalize(response.json())
    df = df[(~df['offense_conference'].isnull()) & (~df['defense_conference'].isnull())] #no fcs games
    df['year']=year
    data = pd.concat([data,df])


data.head()


,id,offense,offense_conference,defense,defense_conference,home,away,offense_score,defense_score,drive_id,...,yards_to_goal,down,distance,yards_gained,play_type,play_text,ppa,clock.minutes,clock.seconds,year
83,302450194002,Ohio State,Big Ten,Marshall,Conference USA,Ohio State,Marshall,0,0,30245019402,...,22,1,10,11,Pass Completion,Terrelle Pryor pass complete to Jake Stoneburn...,0.125883353280535,14,50,2010
84,302450194006,Ohio State,Big Ten,Marshall,Conference USA,Ohio State,Marshall,7,0,30245019402,...,70,-1,-1,0,Kickoff,Drew Basil kickoff for 70 yards for a touchback.,,13,42,2010
85,302450194003,Ohio State,Big Ten,Marshall,Conference USA,Ohio State,Marshall,0,0,30245019402,...,11,1,10,0,Pass Incompletion,Terrelle Pryor pass incomplete to Brandon Saine.,-0.291297448934314,14,22,2010
86,302450194004,Ohio State,Big Ten,Marshall,Conference USA,Ohio State,Marshall,0,0,30245019402,...,11,2,10,5,Rush,Brandon Saine rush for 5 yards to the Marsh 6.,0.037899501555332,14,5,2010
87,302450194005,Ohio State,Big Ten,Marshall,Conference USA,Ohio State,Marshall,7,0,30245019402,...,6,3,5,6,Pass Completion,Terrelle Pryor pass complete to DeVier Posey f...,,13,42,2010
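
The request loop above builds each URL with string formatting. As a minimal sketch, the same query string can be assembled from a dictionary of parameters, which keeps the per-year URLs easy to inspect before any request is sent. The helper name `build_plays_url` is hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

# Endpoint taken from the request in the cell above.
BASE_URL = "https://api.collegefootballdata.com/plays"

def build_plays_url(year, conf, season_type="both"):
    """Return the plays URL for one season and one offensive conference.

    Hypothetical helper for illustration; it only builds the string and
    does not contact the API.
    """
    params = {"seasonType": season_type, "year": year, "offenseConference": conf}
    return BASE_URL + "?" + urlencode(params)

# One URL per season, matching range(2010, 2020) in the loop above.
urls = [build_plays_url(year, "B1G") for year in range(2010, 2020)]
print(urls[0])
```

`requests.get` also accepts such a dictionary directly through its `params` argument, so the URL does not have to be formatted by hand.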


Let's use only the data we need from the comprehensive results.

In [241]:
plays = data[['year','offense', 'offense_score','offense_conference' ,'defense_score', 'yards_to_goal', 'down', 'distance', 'play_type']].copy() #copy so later column assignments don't act on a slice of data
plays.head()

,year,offense,offense_score,offense_conference,defense_score,yards_to_goal,down,distance,play_type
83,2010,Ohio State,0,Big Ten,0,22,1,10,Pass Completion
84,2010,Ohio State,7,Big Ten,0,70,-1,-1,Kickoff
85,2010,Ohio State,0,Big Ten,0,11,1,10,Pass Incompletion
86,2010,Ohio State,0,Big Ten,0,11,2,10,Rush
87,2010,Ohio State,7,Big Ten,0,6,3,5,Pass Completion


Let's identify play types so we can group them together and drop the plays we don't need.

In [242]:
pass_types = ['Pass Reception', 'Pass Interception Return', 'Pass Incompletion', 'Sack', 'Passing Touchdown', 'Interception Return Touchdown']
rush_types = ['Rush', 'Rushing Touchdown']
punt_types = ['Punt', 'Punt Return Touchdown', 'Blocked Punt', 'Blocked Punt Touchdown']
fg_types = ['Field Goal Good', 'Field Goal Missed', 'Blocked Field Goal']

def getPlayCall(x):
    if x in pass_types:
        return 'pass'
    elif x in rush_types:
        return 'rush'
    elif x in punt_types:
        return 'punt'
    elif x in fg_types:
        return 'fg'
    else:
        return None
        
plays['play_call'] = plays['play_type'].apply(getPlayCall)
plays.head()

,year,offense,offense_score,offense_conference,defense_score,yards_to_goal,down,distance,play_type,play_call
83,2010,Ohio State,0,Big Ten,0,22,1,10,Pass Completion,
84,2010,Ohio State,7,Big Ten,0,70,-1,-1,Kickoff,
85,2010,Ohio State,0,Big Ten,0,11,1,10,Pass Incompletion,pass
86,2010,Ohio State,0,Big Ten,0,11,2,10,Rush,rush
87,2010,Ohio State,7,Big Ten,0,6,3,5,Pass Completion,


Simplify things a little bit and get the score margin.

In [243]:
plays.dropna(subset=['play_call'], inplace=True)
plays['score_margin'] = plays['offense_score']-plays['defense_score']
plays.drop(columns = ['play_type','offense_score','defense_score'], inplace = True)

plays.head()

,year,offense,offense_conference,yards_to_goal,down,distance,play_call,score_margin
85,2010,Ohio State,Big Ten,11,1,10,pass,0
86,2010,Ohio State,Big Ten,11,2,10,rush,0
89,2010,Ohio State,Big Ten,4,1,4,rush,14
90,2010,Ohio State,Big Ten,44,2,5,rush,7
92,2010,Ohio State,Big Ten,45,2,10,rush,14


Let's get the first-and-ten plays where the offense trails by one to eight points, i.e. is behind but within one score of its opponent.

In [244]:
fnt = plays.loc[(plays['down']==1)&(plays['distance']==10)]
fnt = fnt.loc[(fnt['score_margin']<0) & (fnt['score_margin']>-9)]
fnt.head()

,year,offense,offense_conference,yards_to_goal,down,distance,play_call,score_margin
224,2010,Minnesota,Big Ten,37,1,10,rush,-3
226,2010,Minnesota,Big Ten,72,1,10,rush,-3
232,2010,Minnesota,Big Ten,60,1,10,pass,-3
233,2010,Minnesota,Big Ten,48,1,10,rush,-3
241,2010,Minnesota,Big Ten,10,1,10,rush,-3
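
The two filters above can also be combined into a single boolean mask. The sketch below uses `Series.between`, which is inclusive on both ends by default, so `between(-8, -1)` matches the condition `score_margin < 0` and `score_margin > -9` for integer margins. The frame `toy` is made up for illustration.

```python
import pandas as pd

# Made-up plays: only rows 0 and 5 are 1st & 10 while trailing by 1 to 8 points.
toy = pd.DataFrame({
    'down':         [1, 1, 2, 1, 1, 1],
    'distance':     [10, 10, 10, 10, 5, 10],
    'score_margin': [-3, 0, -3, -9, -7, -8],
})

# One mask: first down, ten yards to go, trailing by between 1 and 8 points.
mask = (
    (toy['down'] == 1)
    & (toy['distance'] == 10)
    & toy['score_margin'].between(-8, -1)
)

fnt_toy = toy.loc[mask]
print(fnt_toy)
```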


We would like to see the percentage of these plays that are rushes, by team and by year.

In [245]:
rushes = fnt.loc[fnt['play_call']=='rush']
passes = fnt.loc[fnt['play_call']=='pass']

pass_by_team = passes[['offense','year','play_call']].groupby(['offense','year']).count()
rush_by_team = rushes[['offense','year','play_call']].groupby(['offense','year']).count()


rush_perc = rush_by_team/(rush_by_team+pass_by_team)

rush_perc_plot = rush_perc.unstack().droplevel(axis = 1, level = 0).transpose()
rush_perc_plot

year,Illinois,Indiana,Iowa,Maryland,Michigan,Michigan State,Minnesota,Nebraska,Northwestern,Ohio State,Penn State,Purdue,Rutgers,Wisconsin
2010,0.820896,0.72043,0.72549,,0.829268,0.809524,0.823529,,0.829268,0.7,0.795455,0.775,,0.830189
2011,0.73913,0.818182,0.702703,,0.765625,0.805556,0.791667,0.883721,0.842105,0.785714,0.648148,0.796117,,0.734694
2012,0.783784,0.702128,0.752809,,0.762712,0.705357,0.855072,0.855263,0.607843,0.741379,0.75,0.684211,,0.876923
2013,0.727273,0.705882,0.711111,,0.803571,0.770833,0.824561,0.813559,0.927273,0.810345,0.793651,0.828571,,0.787234
2014,0.434783,0.644444,0.418919,0.512195,0.450704,0.571429,0.75,0.566667,0.417582,0.515152,0.418605,0.57971,0.631579,0.57971
2015,0.394366,0.588235,0.771429,0.630137,0.516129,0.666667,0.650485,0.592105,0.653846,0.575758,0.45614,0.518987,0.684211,0.55814
2016,0.608696,0.638095,0.686567,0.651163,0.675676,0.606061,0.538462,0.746667,0.463415,0.538462,0.554054,0.461538,0.69697,0.719298
2017,0.539474,0.380952,0.568421,0.511111,0.5,0.515625,0.622951,0.607143,0.422222,0.630435,0.565217,0.366071,0.606557,0.622222
2018,0.611111,0.541284,0.596774,0.8,0.642857,0.45614,0.515152,0.527473,0.434783,0.520408,0.365079,0.345679,0.6,0.8
2019,0.755556,0.397436,0.452055,0.57971,0.507692,0.492063,0.612245,0.587302,0.666667,0.466667,0.543478,0.364706,0.71875,0.578947
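
An alternative way to compute the same share, sketched here on made-up data, is to flag each play as a rush and take the group mean. One difference from the cell above: dividing two separate count tables yields NaN for a team and year that has only rushes or only passes, because the missing group does not align, whereas the mean gives 1.0 or 0.0 in that case. Which behavior is preferable depends on how such seasons should be shown in the plot.

```python
import pandas as pd

# Made-up first-and-ten plays, for illustration only.
toy = pd.DataFrame({
    'offense':   ['Iowa', 'Iowa', 'Iowa', 'Iowa', 'Purdue', 'Purdue'],
    'year':      [2018, 2018, 2018, 2019, 2019, 2019],
    'play_call': ['rush', 'rush', 'pass', 'pass', 'rush', 'rush'],
})

# The mean of a boolean column is the fraction of True values,
# so this is the rush share per team and year.
rush_share = (
    toy.assign(is_rush=toy['play_call'].eq('rush'))
       .groupby(['offense', 'year'])['is_rush']
       .mean()
       .unstack('offense')   # rows: year, columns: team
)
print(rush_share)
```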


We would like to get the team colors for prettier plotting. Let's do that now.

In [268]:
response = requests.get("https://api.collegefootballdata.com/teams")
df = pd.json_normalize(response.json())
df = df[~df['conference'].isnull()]

teams = df[['school','conference','color']]

response = requests.get("https://api.collegefootballdata.com/conferences")
df = pd.json_normalize(response.json())
df = df[df['abbreviation']==conf]

conference = df['name'].values[0]
conf_teams = teams.loc[teams['conference']==conference]

conf_dict = dict(zip(conf_teams.school,conf_teams.color))
conf_dict

{'Illinois': '#f77329',
 'Indiana': '#7D110C',
 'Iowa': '#000000',
 'Maryland': '#D5002B',
 'Michigan': '#00274c',
 'Michigan State': '#18453B',
 'Minnesota': '#7F011B',
 'Nebraska': '#F20017',
 'Northwestern': '#372286',
 'Ohio State': '#DE3121',
 'Penn State': '#00265D',
 'Purdue': '#B89D29',
 'Rutgers': '#d21034',
 'Wisconsin': '#A00001'}
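
Every team in the dictionary above has a color. If a conference included a team with no color listed (an assumption, not something seen in this output), the lookup could be given a fallback so the plot still has a value for every line. A minimal sketch on a made-up frame, where `DEFAULT_COLOR` and the team `Example U` are hypothetical:

```python
import pandas as pd

# Made-up team table; 'Example U' has no color, to illustrate the fallback.
teams_toy = pd.DataFrame({
    'school':     ['Iowa', 'Purdue', 'Example U'],
    'conference': ['Big Ten', 'Big Ten', 'Big Ten'],
    'color':      ['#000000', '#B89D29', None],
})

DEFAULT_COLOR = '#888888'  # hypothetical neutral gray for teams without a color

# School -> color lookup, with missing colors replaced by the fallback.
color_by_school = (
    teams_toy.set_index('school')['color']
             .fillna(DEFAULT_COLOR)
             .to_dict()
)
print(color_by_school)
```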

Now we can put the dataframe of rushing percentages into an interactive plot, color coded by team!

In [270]:
rush_perc_plot.iplot(kind='scatter', mode='lines+markers', symbol='triangle-up', color=conf_dict, yaxis_tickformat=".0%",
                     title='Rushing on 1st & 10 When Down by <9', xaxis_title='Year', yaxis_title='Percentage')