---
title: "Examination of Referee Tendencies and Biases in the National Football League"
author: "Thompson Bliss, Connor Daly, Jacob Klein, Patrick Lewis"
date: "December 2018"
output:
  html_document:
    toc: true
  pdf_document:
    toc: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(message = FALSE,
                      warning = FALSE,
                      echo = FALSE)
```

<style>
p.caption {
  font-size: 1em;
  font-family: 'Big Caslon';
}
</style>

<style>
  body {
  text-align: justify}
</style>

## Section 1: Introduction

The National Football League (NFL) is arguably the most popular professional sports league in the United States. The NFL consists of 32 teams, and each team plays 16 games per regular season (this report does not include playoff games, for which only a subset of the 32 teams qualify). Every game is divided into four 15-minute quarters (and sometimes an overtime period of variable length), and each team's objective is to score more points than its opponent by bringing the ball into the opposing team's endzone or kicking the ball through the goal posts. Six points are awarded for a touchdown (bringing the ball into the opposing team's endzone), which is usually followed by a one-point extra kick, and three points are awarded for a field goal. Detailed knowledge of football scoring is not crucial for following the analysis below, but it may prove helpful in specific discussions.

Every NFL game is overseen by a crew of seven officials: one referee, one umpire, and five other officials (the down judge, line judge, field judge, side judge, and back judge). These officials are responsible for watching and arbitrating plays and for flagging any illegal actions as penalties. When an official calls a penalty, it is enforced by moving the penalized team in its adverse direction by a pre-specified number of yards; common penalties are enforced with 5-, 10-, and 15-yard setbacks.

The purpose of this project is to examine tendencies, biases, and patterns in penalties called by NFL officials. It is important to note that we consider calls made by entire referee crews in this analysis and not just the single head official. However, for simplicity, we label the various referee crews by the names of their head referees.

Our interest in sports, and our interest in the NFL specifically, prompted us to research this topic. From an NFL fan and spectator perspective, an increasing amount of attention has been placed on the calling of penalties over the course of the past five or so years. There is often contention regarding how one should actually define a specific penalty, how to determine if one occurred, and why some teams appear to be penalized significantly more or less often than others. Penalties can dramatically alter the flow and outcomes of games, so ensuring that penalties are called fairly and consistently is crucial to maintaining integrity of the game.

Our group consists of four data science graduate students at Columbia University's Data Science Institute. The list below introduces each of us and briefly describes our specific contributions to this project.

Thompson Bliss: Thompson was responsible for aggregating and cleaning all of our various data sources while ensuring proper data quality and structure. Thompson also explored the distribution of penalty types and conducted an analysis of home team penalties versus away team penalties.

Connor Daly: Connor was responsible for the distribution of penalties by quarter and by location on field. Connor also aided in the data preprocessing step by merging our two disparate data sources together to create the structured dataset that we used for analysis.

Jacob Klein: Jacob took full ownership of our interactive visualization component, which intends to highlight the various patterns and tendencies discussed below. Jacob created the graph in D3 and hosted it online, which we were then able to incorporate into this report.

Patrick Lewis: Patrick was responsible for examining penalty distributions both by team and by referee. Patrick identified outliers and anomalies in the dataset, and further investigated by analyzing any oddities that were found, including looking at the distributions by team and by referee for penalties called at home and away.

## Section 2: Description of Data

Data for this analysis was collected from the 2013-2014 NFL season through Week 9 of the 2018-2019 NFL season. It should be noted that this analysis was conducted during the 2018-2019 NFL season. Thus we have included all data available for the current season, but this season was excluded in any analyses in which penalties were compared over entire seasons, since all data for the entire season will not be available until February 2019.

We downloaded play-by-play data from [NFL Savant](http://www.nflsavant.com/about.php?fbclid=IwAR3c8IZngQd0eE_0jp3Z401TdZXSBa8E-qEAX8J7J4uC3--CisVr7ZQfjEQ). From this site, we downloaded CSV files from seasons beginning in 2013 through 2018. These six CSV files begin with "pbp" and are located within the "/data" folder of [this repository](https://github.com/JacobK-/referee-visualizations). We then downloaded referee game data from [Pro-Football-Reference](https://www.pro-football-reference.com/officials/index.htm?fbclid=IwAR387n92do_jrcbwKAjKNR2Y_GSGpcnWMGbCbJ7Pwmx7AdpZQRSbmglfSfU).

As noted in the Introduction, because the NFL employs hundreds of referees, linesmen, side judges, and line judges, we grouped officials by head referee for the purposes of this analysis. We also excluded any referee crews that have not been active in the NFL long enough to yield a sufficient number of data points; all referee crews in our dataset officiated more than 50 games from the 2013 NFL season through Week 9 of the 2018 NFL season, inclusive.

A CSV file is available in the "/data" subfolder of the repository for each of the 17 referees we included in the analysis. Also included in this subfolder is a file called "abbreviations.csv" which was used when joining the play-by-play data to the referee data, and includes a mapping of team names to team abbreviations. For example, this file maps team name "Arizona Cardinals" to team abbreviation "ARI".
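
To make the role of "abbreviations.csv" concrete, the following is a minimal sketch of how a name-to-abbreviation mapping might be applied before a join. The column names `TeamName` and `Abbrev`, the referee labels, and all data values are hypothetical stand-ins, since the actual file layout is not reproduced in this report.

```r
# Toy stand-in for abbreviations.csv; column names are assumed, not taken from the file.
abbreviations <- data.frame(
  TeamName = c("Arizona Cardinals", "Seattle Seahawks"),
  Abbrev   = c("ARI", "SEA"),
  stringsAsFactors = FALSE
)

# Hypothetical referee game log that identifies teams by full name.
ref_games <- data.frame(
  GameDate = c("2018-09-09", "2018-09-16"),
  Referee  = c("Referee A", "Referee B"),
  Home     = c("Arizona Cardinals", "Seattle Seahawks"),
  Away     = c("Seattle Seahawks", "Arizona Cardinals"),
  stringsAsFactors = FALSE
)

# Named lookup vector: full team name -> abbreviation.
lookup <- setNames(abbreviations$Abbrev, abbreviations$TeamName)

# Replace full names with abbreviations so the keys match the play-by-play data.
ref_games$Home <- unname(lookup[ref_games$Home])
ref_games$Away <- unname(lookup[ref_games$Away])

print(ref_games)
```

Using a named vector as a lookup keeps the mapping in one place, so an unmapped team name surfaces as `NA` rather than silently passing through.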


## Section 3: Analysis of Data Quality

As stated above, we have two sources for the data, [NFL Savant](http://www.nflsavant.com/about.php?fbclid=IwAR3c8IZngQd0eE_0jp3Z401TdZXSBa8E-qEAX8J7J4uC3--CisVr7ZQfjEQ) and [Pro-Football-Reference](https://www.pro-football-reference.com/officials/index.htm?fbclid=IwAR387n92do_jrcbwKAjKNR2Y_GSGpcnWMGbCbJ7Pwmx7AdpZQRSbmglfSfU). While Pro-Football-Reference had neither missing values nor any trends suggesting that part of the data could be incorrect, the data from NFL Savant is somewhat problematic, as some values are nonsensical. For example, some entries in the `PenaltyType` column of the dataset hold the following values:

* "YAC 14.  A FLAG WAS THROWN"
* "BUT OFFICIALS HUDDLED AND DID NOT CALL A PENALTY."
* "CENTER-41-J.WINCHESTER"

None of these values can be considered a valid penalty type. After manually checking the [ESPN](http://www.espn.com/nfl/scoreboard) play-by-play logs for the games that contained these values, it appears that the data on NFL Savant was scraped from ESPN, as all values match, including the nonsensical ones. It seems that text containing certain words such as "Flag" or "Penalty" (as is the case in the three examples above) throws off the scraping algorithm. Despite these small hiccups, the dataset as a whole is sound, and we ignore any nonsensical values in our analyses.
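
One simple way to ignore such values is to keep only rows whose `PenaltyType` appears in a list of recognized penalty names. The sketch below illustrates this idea in base R; the `valid_types` vector is an assumed, abbreviated whitelist, and the upper-case spelling of the values is likewise an assumption rather than a description of our exact cleaning code.

```r
# Hypothetical sample of PenaltyType values, including two of the malformed ones quoted above.
pbp <- data.frame(
  PenaltyType = c("FALSE START",
                  "YAC 14.  A FLAG WAS THROWN",
                  "OFFENSIVE HOLDING",
                  "CENTER-41-J.WINCHESTER"),
  stringsAsFactors = FALSE
)

# Assumed whitelist of valid penalty names (only a small subset is shown here).
valid_types <- c("FALSE START", "OFFENSIVE HOLDING", "DEFENSIVE OFFSIDE")

# Keep only the rows whose PenaltyType is a recognized penalty name.
pbp_clean <- pbp[pbp$PenaltyType %in% valid_types, , drop = FALSE]

print(pbp_clean$PenaltyType)
```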

Moreover, after the data from Pro-Football-Reference and NFL Savant are merged, a noticeable percentage of values are missing in the `Referee`, `Home`, and `Away` columns. This is because some games in the NFL Savant play-by-play dataset were officiated by referee crews who did not work 50 or more games from the 2013 NFL season through Week 9 of the 2018 NFL season, which is the range of data in which we are interested. Since the NFL Savant data does not include any information on referees, home teams, or away teams, this information must be merged in from Pro-Football-Reference. Although these games consequently have undefined values in a few columns, we kept such observations so that we could conduct a more complete analysis of other columns such as `PenaltyType` and `PenaltyYards`.
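
The effect of such a merge can be illustrated with a small left join. In this sketch the key column `GameId` and all values are hypothetical (the real data would be keyed on game date and team); the point is only that `all.x = TRUE` keeps every play and fills unmatched games with `NA`.

```r
# Hypothetical play-by-play rows; game 3 has no match in the referee data.
pbp <- data.frame(
  GameId      = c(1, 1, 2, 3),
  PenaltyType = c("FALSE START", "OFFENSIVE HOLDING", "FALSE START", "TAUNTING"),
  stringsAsFactors = FALSE
)

# Hypothetical referee table covering only crews above the game threshold.
refs <- data.frame(
  GameId  = c(1, 2),
  Referee = c("Referee A", "Referee B"),
  stringsAsFactors = FALSE
)

# all.x = TRUE makes this a left join: every play is kept, unmatched games get NA.
merged <- merge(pbp, refs, by = "GameId", all.x = TRUE)

# Share of rows with a missing referee after the merge.
missing_share <- mean(is.na(merged$Referee))
print(missing_share)
```

Rows with a missing `Referee` can still contribute to analyses of `PenaltyType` or `PenaltyYards`, and they are dropped only where the referee is needed.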

## Section 4: Main Analysis
### Section 4A:  Data Processing

Many of the penalty types in our dataset are quite similar in nature (e.g. `Offensive Offside` and `Defensive Offside`), so we grouped such instances under common grouping terms. Moreover, certain groups accounted for less than 1% of all observations even after grouping, and these were combined into the `Other` category. The table below lists all groupings used in our analysis.

| Grouping Term    |  Penalty Types Included |
|------------------|----------------------------------------------------------------------------------------|
| <strong> Delay of Game </strong> | Defensive Delay of Game, Delay of Game, Delay of Kickoff |
| <strong> Illegal Block </strong>| Chop Block, Clipping, Illegal Blindside Block, Illegal Block Above the Waist, Illegal Crackback, Illegal Peelback, Illegal Wedge, Low Block, Offensive Holding |
|<strong> Illegal Formation </strong>| Illegal Formation, Illegal Motion, Illegal Shift |
|<strong> Illegal Tackle </strong>| Face Mask (15 Yards), Horse Collar Tackle, Lowering the Head to Initiate Contact |
|<strong> Illegal Use of Hands </strong>| Illegal Use of Hands |
|<strong> Offside </strong>| Defensive Offside, Encroachment, False Start, Neutral Zone Infraction, Offensive Offside, Offside on Free Kick |
|<strong> Pass Interference </strong>| Defensive Holding, Defensive Pass Interference, Illegal Contact, Offensive Pass Interference |
|<strong> Roughing a Protected Player </strong>| Roughing the Kicker, Roughing the Passer, Running Into the Kicker |
|<strong> Too Many Men on the Field </strong>| Defensive 12 On-Field, Defensive Too Many Men on the Field, Illegal Substitution, Offensive 12 On-Field, Offensive Too Many Men on Field |
|<strong> Unsportsmanlike Conduct </strong>| Disqualification, Personal Foul, Taunting, Unnecessary Roughness, Unsportsmanlike Conduct |
| <strong> Other </strong> | <strong>Fair Catch Interference:</strong> Fair Catch Interference, Interference with Opportunity to Catch, Kick Catch Interference<br><br><strong>Illegal Action to Block a Field Goal:</strong> Leaping, Leverage<br><br><strong>Illegal Bat:</strong> Illegal Bat<br><br><strong>Illegal Forward Pass:</strong> Illegal Forward Pass<br><br><strong>Illegal Kickoff:</strong> Kickoff Out of Bounds, Short Free Kick<br><br><strong>Illegal Player Out of Bounds:</strong> Illegal Touch Kick, Illegal Touch Pass, Player Out of Bounds on Kick, Player Out of Bounds on Punt <br><br><strong>Ineligible Player Downfield:</strong> Ineligible Downfield Kick, Ineligible Downfield Pass<br><br><strong>Intentional Grounding:</strong> Intentional Grounding<br><br><strong>Invalid Fair Catch Signal:</strong> Invalid Fair Catch Signal<br><br><strong>Tripping:</strong> Tripping |
<br>
Additionally, due to some teams moving cities as well as general inconsistency of abbreviations across the data, the following abbreviations were updated to ensure consistency in the dataset:

| Team Name |  Old Abbreviation   |  New Abbreviation |
|-----------|----------------------|-------------------|
| Chargers | SD | LAC | 
| Jaguars| JAC | JAX |
| Rams | LA  | LAR |
| Rams | STL | LAR |
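
The two processing steps described in this section could be sketched as follows. This is an illustrative base R example on made-up observations, not our exact preprocessing code: `group_lookup` contains only a small excerpt of the grouping table, and the counts are invented so that one group falls below the 1% threshold.

```r
# Lookup from raw penalty type to grouping term (a small excerpt of the table above).
group_lookup <- c(
  "Defensive Offside" = "Offside",
  "False Start"       = "Offside",
  "Offensive Holding" = "Illegal Block",
  "Chop Block"        = "Illegal Block",
  "Tripping"          = "Tripping"
)

# Made-up observations: 120 offside-type, 79 block-type, and 1 tripping penalty.
raw <- c(rep("False Start", 100), rep("Defensive Offside", 20),
         rep("Offensive Holding", 75), rep("Chop Block", 4),
         "Tripping")

grouped <- unname(group_lookup[raw])

# Collapse any group that accounts for less than 1% of observations into "Other".
shares <- table(grouped) / length(grouped)
rare   <- names(shares)[as.vector(shares) < 0.01]
grouped[grouped %in% rare] <- "Other"

# Update old team abbreviations to the current ones (second table above).
abbrev_fix <- c(SD = "LAC", JAC = "JAX", LA = "LAR", STL = "LAR")
teams      <- c("SD", "NE", "STL", "JAC")
needs_fix  <- teams %in% names(abbrev_fix)
teams[needs_fix] <- unname(abbrev_fix[teams[needs_fix]])

print(table(grouped))
print(teams)
```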

### Section 4B: Analysis by Penalty Type

Weekly viewers of NFL football are sometimes confused about why certain types of calls (i.e. penalties) occur more frequently in one week versus another. This inconsistency certainly could be a function of luck combined with sloppiness by the players, but it also could have to do with biases by referee crews towards calling or not calling certain penalty types. Perhaps some referee crews will throw the flag (i.e. call a penalty) for any action nearing an illegal tackle while others may be compelled to let questionable actions slide and go uncalled. After averaging all of the penalty types called for all of the data, the following bar graph can be constructed:

```{r}

#Defining Global Libraries and Actions

library(dplyr)
library(ggplot2)
library(scales)
library(gdata)
library(tidyr)
library(kfigr)
library(ggrepel)

refData = read.csv('./data/df_ref.csv')

theme_ref = theme(axis.title.x = element_text(size = 16),axis.title.y = element_text(size = 16), plot.title = element_text(hjust = 0.5, size = 18), panel.background = element_rect(fill = "white", colour = "white", size = 0.5, linetype = "solid"),  panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "gray90"), panel.grid.minor = element_line(size = 0.25, linetype = 'solid', colour = "gray90"))


```

```{r anchor='Figure', fig.height = 6, fig.cap="Figure 1: Average Number of Each Penalty Type Called Per Game", fig.align= 'center'}

#Putting Together Data


df_ref <- refData

#Calculating Average ref data
totalNumberOfGames = nlevels(as.factor(paste(df_ref$GameDate, df_ref$Home)))

#Calculating number of games each ref worked
totalNumberOfGamesByRef <- unique(df_ref[c("Referee", "GameDate")]) %>% group_by(Referee) %>% summarize(numReffed = n())


#Adding the total game numbers to the df_ref
df_ref = merge(df_ref, totalNumberOfGamesByRef, by = "Referee")


#Getting average penalty types called for all refs
averagePenaltyTypeCalled = table(df_ref$PenaltyType) / totalNumberOfGames
df_averagePenaltyTypeCalled = as.data.frame(averagePenaltyTypeCalled)
colnames(df_averagePenaltyTypeCalled) = c("PenaltyType", "FreqTotal")


#Getting Average called by refs
PenTypeCounts <- df_ref %>% group_by(Referee, PenaltyType) %>% 
    summarize(Freq = n() / numReffed[1])

#Removing NA/Other Categories
PenTypeCounts <- PenTypeCounts[!is.na(PenTypeCounts$Referee),]
PenTypeCounts <- PenTypeCounts[PenTypeCounts$PenaltyType != "OTHER",]


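#Changing Data sorted by ref to divergence from average of all refs
#names() of the table is used rather than levels() so that the loop also runs when PenaltyType is read in as character (the read.csv default since R 4.0)
for (penType in names(averagePenaltyTypeCalled)){
  PenTypeCounts[PenTypeCounts$PenaltyType == penType,]$Freq = 
 1  - PenTypeCounts[PenTypeCounts$PenaltyType == penType,]$Freq/as.double(averagePenaltyTypeCalled[penType])
  
}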

PenTypeCounts <- merge(PenTypeCounts, df_averagePenaltyTypeCalled, by = "PenaltyType")

ggplot(df_averagePenaltyTypeCalled, aes(x = reorder(PenaltyType, - FreqTotal), y = FreqTotal)) + geom_bar(stat = "identity", color = "black", fill = "red", alpha = 0.6) + theme(plot.title = element_text(hjust = 0.5), axis.text.x  = element_text(angle=45, hjust = 1)) + xlab("Penalty Type") + ylab("Average Number Called Per Game") + ggtitle("Average Number of Each Penalty Type Called Per Game") + theme_ref


```

Figure 1 shows how often the average referee crew calls each penalty type in a game. `Offside` and `Illegal Block` are the most common penalties. This is not surprising, as crossing the line of scrimmage and blocking are two fundamental aspects of the game and thus are bound to be done incorrectly at times. It is also not surprising that particularly sloppy penalties such as `Too Many Men on the Field`, `Illegal Formation`, and `Delay of Game` are rare. However, not every referee crew calls penalties at exactly the rates shown in Figure 1. Figure 2 below shows each crew's percentage increase or decrease relative to these averages.


```{r penaltyTypeIncrease, anchor='Figure', fig.width = 8, fig.height = 17, fig.cap="Figure 2: Percentage of Increase/Decrease from Average Penalty Type", fig.align= 'center'}
#Setting Color Scheme
PenTypeCounts$FreqType = PenTypeCounts$Freq < 0

#Plotting Data Faceted by Ref
ggplot(PenTypeCounts, aes(x=reorder(PenaltyType, FreqTotal), y=Freq, fill=FreqType))+
  geom_bar(stat="identity",position="identity")+ ggtitle("Percentage of Increase/Decrease from Average Penalty Type Called") + scale_fill_manual(values = c("green","red"), guide = FALSE)+ facet_wrap(~Referee, nrow = 9, ncol = 2, scales = 'free_x') + coord_flip() + geom_hline(yintercept=0) + theme(panel.spacing = unit(1, "lines")) +xlab('') + ylab('Percentage of Divergence from Average')  + scale_y_continuous(labels = percent, limits=c(-0.6,0.6)) + theme_ref

keep(refData, theme_ref, sure = TRUE)


```

Figure 2 gives insight into the tendencies of NFL referee crews toward calling or not calling certain types of penalties. For example, the crew of Bill Vinovich seems to call more penalties than average regardless of type, while Walt Anderson's crew seems to call fewer penalties than average regardless of type. Moreover, if Jerome Boger and his crew are officiating a game, it may be reasonable to expect an offense-heavy game, as this crew calls fewer `Pass Interference`, `Illegal Formation`, and `Illegal Block` penalties than average and more `Roughing a Protected Player` and `Illegal Tackle` penalties than average.


### Section 4C: Analysis by Away / Home Team

It is common to think that teams are often called for more penalties when they are playing on the road as opposed to playing in their home city. This difference can be a function of player familiarity with the field, miscommunication between members of the away team due to loudness of the home crowd, and/or referee crew reaction to crowd noise encouraging or discouraging certain calls. The percentages of calls against home teams and away teams by penalty type are shown in Figure 3 below:

```{r, anchor='Figure', fig.cap="Figure 3: Percentages of Penalties Called Against the Home/Away Teams by Penalty Type", fig.align= 'center', fig.width = 8}

#Reading in data
df_ref <- refData

#Calculating penalties against away team
df_ref$AgainstAway = df_ref$PenaltyTeam == df_ref$Away

#Removing rows with NA values for against away
df_ref = df_ref[!is.na(df_ref$AgainstAway),]

#making tables by penalty type
tableAgainstAway = table(df_ref[df_ref$AgainstAway,]$PenaltyType)
tableAgainstHome = table(df_ref[!df_ref$AgainstAway,]$PenaltyType)

#getting df of percentage of away vs home
df_percentage = as.data.frame(tableAgainstAway / (tableAgainstAway + tableAgainstHome))
df_percentage$against = "Away"
df_percentage2 = as.data.frame(tableAgainstHome / (tableAgainstAway + tableAgainstHome))
df_percentage2$against = "Home"
df_percentage = rbind(df_percentage, df_percentage2)

df_percentage$plotOrder = df_percentage[df_percentage$against == "Away",]$Freq

#plot
ggplot(df_percentage, aes(x = reorder(Var1, - plotOrder), y = Freq, color = against)) + geom_hline(yintercept=0.5) + geom_point(stat = "identity", size = 3) + xlab("Penalty Type") + ggtitle("% of Penalties Called Against the Home/Away Teams by Penalty Type") + ylab("Percentage of Penalties Against") + scale_y_continuous(labels = percent, limits=c(0.4,0.6)) + labs(color = "Percentage Called Against") +coord_flip() + theme_ref

```

It is not at all surprising that Figure 3 shows that a timing penalty (i.e. `Delay of Game`) is the most unfavorable toward the away team, since home crowds often increase their noise levels in an attempt to cause such penalties on the away team. However, the higher degree of `Illegal Tackle` penalties that we observe is somewhat surprising. An illegal tackle penalty does not depend on familiarity with the field or on pre-snap timing, which suggests that it may be a function of referee crew responses to crowd noise. The percentages of penalties called against home teams and away teams by referee crew are shown in Figure 4 below:

```{r ref_home_away_1, anchor='Figure', fig.cap="Figure 4: Percentages of Penalties Called Against Home and Away Teams by Referee", fig.align= 'center', fig.height= 7, fig.width = 7.75}

rm(df_percentage)
rm(df_percentage2)

#making tables by referee
tableAgainstAway = table(df_ref[df_ref$AgainstAway,]$Referee)
tableAgainstHome = table(df_ref[!df_ref$AgainstAway,]$Referee)

#getting df of percentage of away vs home
df_percentage = as.data.frame(tableAgainstAway / (tableAgainstAway + tableAgainstHome))
df_percentage$against = "Away"
df_percentage2 = as.data.frame(tableAgainstHome / (tableAgainstAway + tableAgainstHome))
df_percentage2$against = "Home"
df_percentage = rbind(df_percentage, df_percentage2)

df_percentage$Var1 = as.character(df_percentage$Var1)

#Adding average data to chart
df_percentage = rbind(df_percentage, c("AVERAGE", sum(df_ref$AgainstAway) / length(df_ref[,1]), "Away" ))

df_percentage = rbind(df_percentage, c("AVERAGE", 1 - sum(df_ref$AgainstAway) / length(df_ref[,1]), "Home"))


#Resetting column types
df_percentage$Var1 = as.factor(df_percentage$Var1)
df_percentage$Freq = as.double(df_percentage$Freq)
df_percentage$against = as.factor(df_percentage$against)


#creating order for plot
df_percentage = df_percentage[order(df_percentage$against),]
df_percentage$plotOrder = df_percentage[df_percentage$against == "Away",]$Freq

df_percentage$avg = FALSE
df_percentage[df_percentage$Var1 == "AVERAGE",]$avg = TRUE

#plot
ggplot(df_percentage, aes(x = reorder(Var1, -plotOrder), y = Freq, color = against)) + geom_hline(yintercept=0.5)  + geom_point(stat = "identity", size = 3, aes(shape = avg)) + xlab("Referee Crew") + ylab("Percentage of Penalties Against") + ggtitle("% of Penalties Called Against Home/Away Teams by Referee") + scale_y_continuous(labels = percent, limits=c(0.4,0.6)) + labs(color = "Percentage Called Against") +coord_flip() + guides(shape = FALSE) + theme_ref

keep(refData, theme_ref, sure = TRUE)
```

Figure 4 shows that nearly every referee crew calls more penalties against the away team as compared to the home team. The crew led by John Hussey is the only exception and is just below the 50% mark. Pete Morelli's crew has the highest percentage of penalty calls against the away team at nearly 55%. However, as a whole all of the referee crews call home and away penalties at rates around 50%, which suggests that penalty calling is quite fair even though we would expect a slight trend toward more penalties called against away teams because of the aforementioned points.
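
One way to check whether a crew's split differs meaningfully from 50% is an exact binomial test. The sketch below uses `binom.test` from base R on hypothetical counts; the numbers are invented for illustration and are not taken from our dataset, and this test was not part of the analysis above.

```r
# Hypothetical counts for one crew: penalties against away and home teams.
against_away <- 540
against_home <- 460

# Two-sided exact binomial test of H0: P(penalty is against the away team) = 0.5.
result <- binom.test(against_away, against_away + against_home, p = 0.5)

print(result$estimate)   # observed share of penalties against the away team
print(result$p.value)    # p-value under the 50/50 null hypothesis
print(result$conf.int)   # 95% confidence interval for the away share
```

A test of this kind treats each penalty as an independent draw, which is a simplification, so its result should be read as a rough indication rather than a definitive verdict on fairness.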

### Section 4D: Analysis by Quarter

We continue our exploratory data analysis by examining the distribution of penalties per game by quarter. Figure 5 shows this distribution at an aggregate level:
```{r penalties_by_q, anchor='Figure', fig.cap="Figure 5: Penalties Per Game by Quarter", fig.align= 'center'}
df2 <- refData

#Selecting the needed columns by name rather than by position
df_by_q <- df2[c("GameDate", "Home", "Away", "Quarter")]
df_by_q <- df_by_q %>%
  mutate(gameID = paste(GameDate, Home, Away))
reg_len <- n_distinct(df_by_q$gameID)
ot <- df_by_q %>%
  filter(Quarter == 5)
ot_len <- n_distinct(ot$gameID)
df_by_q <- df_by_q %>%
  group_by(Quarter) %>%
  summarise(count = n()) %>%
  mutate(adj_count = ifelse(Quarter < 5, count/reg_len, count/ot_len))
df_by_q$Quarter <- as.factor(df_by_q$Quarter)
levels(df_by_q$Quarter) <- c("Q1", "Q2", "Q3", "Q4", "Overtime")


ggplot(df_by_q, aes(x = Quarter, y = adj_count, width = 0.8)) +
  geom_bar(stat = "identity", color = "black", fill = "red", alpha = 0.6) +
  ggtitle("Penalties Per Game by Quarter") +
  xlab("Quarter") +
  ylab("Count Per Game") +
  theme_ref

```

As seen in Figure 5, most penalties are called in quarters two and four. A high frequency of penalties in quarter four is not too surprising, because the game is nearing its conclusion at this point and we expect teams to have higher risk tolerances as they attempt to secure a victory. On the other hand, the fact that quarter two has the highest penalty frequency is somewhat surprising, because one would intuitively expect the game to progress fluidly at this point: a full quarter has already been played, so teams should have gotten over any jitters or hesitancy. It is also surprising that quarter one yields the lowest penalty frequency, but we can pose a couple of hypotheses for this pattern: perhaps the pace of play is slower in the first quarter than in the other three, or perhaps referees are inherently reluctant to begin games by immediately calling penalties.

Also included in this graph is the penalty frequency for overtime periods. Although it appears that penalties are called less frequently in overtime than in normal quarters, it is important to note that overtime generally does not last the full length of a regular quarter, because it concludes once there is a leading team after each team has had a chance to possess the ball. The game ends in a tie if there is no leader after a full period of overtime (which is equal to the length of a regular quarter), but this occurs quite infrequently. Thus, the frequency of penalties called in overtime seems proportional to that of quarters one through four when we account for duration of play.
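
The duration adjustment described above amounts to dividing each period's per-game count by the minutes actually played. The sketch below shows the arithmetic on made-up numbers; both the per-game counts and the assumed average overtime length of 7 minutes are hypothetical and are not estimates from our data.

```r
# Made-up per-game penalty counts by period (not taken from the report's data).
per_game <- c(Q1 = 3.0, Q2 = 4.0, Q3 = 3.2, Q4 = 3.8, Overtime = 1.6)

# Assumed average minutes played in each period; the overtime value is a guess.
minutes <- c(Q1 = 15, Q2 = 15, Q3 = 15, Q4 = 15, Overtime = 7)

# Penalties per minute of play puts overtime on the same scale as a full quarter.
per_minute <- per_game / minutes
print(round(per_minute, 3))
```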

### Section 4E: Analysis by Referee by Team

A common question among fans of the NFL, as in any sport, is whether a specific referee crew is unfair to particular teams. To analyze this question, we needed to combine two categorical variables, namely the referee crew and the team being officiated, in order to visualize the differences in officiating. We used the following procedure:

1. Calculate the average number of penalties that each referee crew calls on each team
2. Group the averages calculated in step 1 by team
3. Create a distribution for each team
4. Look for "outlier referees" within distribution
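
The four steps above can be sketched on a toy example. The base R code below uses a made-up penalty log for a single team and five hypothetical crews labeled A through E; it mirrors the logic of our analysis (penalties per game, then the 1.5 × IQR rule within each team) but is not the code used to produce the figures.

```r
# Made-up penalty log: one row per penalty, for one team and five crews,
# each crew working a single game. Crews A-D call 5-7 penalties; crew E calls 15.
counts <- c(A = 5, B = 6, C = 6, D = 7, E = 15)
penalties <- data.frame(
  Referee     = rep(names(counts), times = counts),
  PenaltyTeam = "NE",
  GameDate    = rep(paste0("game-", names(counts)), times = counts),
  stringsAsFactors = FALSE
)

# Step 1: count penalties and distinct games for each (crew, team) pair.
pen_counts <- as.data.frame(
  table(Referee = penalties$Referee, PenaltyTeam = penalties$PenaltyTeam),
  responseName = "NPenalties", stringsAsFactors = FALSE
)
games <- unique(penalties[c("Referee", "PenaltyTeam", "GameDate")])
game_counts <- as.data.frame(
  table(Referee = games$Referee, PenaltyTeam = games$PenaltyTeam),
  responseName = "NGames", stringsAsFactors = FALSE
)
rates <- merge(pen_counts, game_counts)
rates <- rates[rates$NGames > 0, ]   # drop crew/team pairs that never met
rates$PenPerGame <- rates$NPenalties / rates$NGames

# Steps 2-4: within each team's distribution, flag crews outside the 1.5 * IQR fences.
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}
rates$Outlier <- as.logical(ave(rates$PenPerGame, rates$PenaltyTeam, FUN = is_outlier))

print(rates[rates$Outlier, c("Referee", "PenaltyTeam", "PenPerGame")])
```

In this toy data only crew E falls outside the fences, which is the kind of point that appears as a dot beyond the whiskers of a boxplot.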

Figure 6 uses boxplots to show the distribution of the average number of penalties that each referee crew calls against particular teams.

```{r ref_by_team, anchor='Figure', fig.cap="Figure 6: Average Number of Penalties Called by Referee by Team", fig.align= 'center', fig.height = 7}
#looking at the penalties by team by ref
#group by team
penRef<-refData %>%
  group_by(Referee,PenaltyTeam) %>%
  summarize(n=n())
penRef<-penRef[complete.cases(penRef),]

is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

colnames(penRef)<-c("Referee","PenaltyTeam","NPenalties")

#So we have the number of penalties by ref...now we just need the number of games each ref did for each team
#The best way that we can do this is by using two dplyr statements, one to get unique games by team, then one to summarize by ref by team

refDataTemp<-refData
NoGamesRef<- refDataTemp %>%
  filter(!is.na(PenaltyTeam)) %>% 
  group_by(Referee,PenaltyTeam,GameDate) %>%
  summarize(Unique_Elements = n_distinct(GameDate))

NoGamesRef <- NoGamesRef%>%
  group_by(Referee,PenaltyTeam) %>%
  summarize(n=n())

colnames(NoGamesRef)<-c("Referee","PenaltyTeam","NGames")

penRef<-merge(x=penRef,y=NoGamesRef)
penRef$PenPerGame <- penRef$NPenalties/penRef$NGames

#It is easiest to pre-calculate the median and append this to the dataframe for each ref, although this is definitely not following the best practices for data
temP<-penRef[,c(2,5)]

temP<-temP%>%
  group_by(PenaltyTeam)%>%
  dplyr::summarize(Median = median(PenPerGame))

penRef<-merge(x=penRef,y=temP)

#Now the chart
fill <- "#4271AE"
line <- "#1F3552"

ggplot(penRef, aes(x=reorder(PenaltyTeam,Median),y=PenPerGame))+geom_boxplot(fill=fill,colour=line,outlier.colour = "red4", outlier.alpha=1,outlier.shape = 18,alpha=0.5,outlier.size = 2)+coord_flip()+ggtitle("Avg Number of Penalties Called by Referee by Team")+xlab('Team')+ylab('Average Number of Penalties Called by Referee')+theme_ref

```

In Figure 6 we observe the distributions of how teams are officiated on average. We can see that Seattle (`SEA`) has the highest median number of penalties called against it, while New England (`NE`) has the lowest. Remember that the outliers (i.e. dots in the above graph) represent particular referee crews that call an abnormal number of penalties against a team. This will become more apparent in Figure 7 below. Note also that the spread of these averages differs greatly from team to team. For example, Buffalo (`BUF`) shows large variation in how referee crews treat it, while most crews appear to treat Carolina (`CAR`) relatively similarly.

The outliers in Figure 6 are particularly interesting. This is because finding referee crews that act normally is not something notable, but if we are able to identify abnormal patterns in officiating then this may be of significant value. As such, in Figure 7 we narrow down the above boxplots to only particular teams that have outlier referee crews.


```{r ref_tendencies, anchor='Figure', fig.cap="Figure 7: Average Number of Penalties Given by Referee by Team: Outliers", fig.align= 'center'}
#We only want the teams that have outliers from above, as we are most interested in these referees.

#Teams whose boxplots in Figure 6 contain outlier referee crews
outlierTeams <- c("KC", "OAK", "NYG", "DET", "NE", "ARI", "SEA", "ATL", "MIA", "PHI", "LAR", "NO")

#Keep only the rows of penRef that belong to those teams
subsetTeams <- penRef[penRef$PenaltyTeam %in% outlierTeams,]

fill <- "#4271AE"
line <- "#1F3552"

subsetTeams %>%
  group_by(PenaltyTeam) %>%
  mutate(outlier = ifelse(is_outlier(PenPerGame), as.character(Referee),NA)) %>%
  ggplot(., aes(x = factor(PenaltyTeam), y = PenPerGame)) +
    geom_boxplot(fill=fill,colour=line,alpha = 0.5,outlier.colour = "red4", outlier.alpha=1,outlier.shape = 18,outlier.size = 1.5) +
    geom_text_repel(aes(label = outlier), na.rm = TRUE, vjust = 1.4, size=2.5)+xlab('Team')+ylab('Average Penalties Per Game by Referee')+ggtitle('Avg Number of Penalties by Referee by Team: Outliers')+theme_ref

```

Recall that this graph of boxplots is the same as in `r figr('ref_by_team', prefix=TRUE)`, only with the axes switched and the data subsetted to include only those teams with outliers. All of the outlier referees are labeled. One potential explanation for these deviations is that particular referee crews are more likely to call specific penalties (for example, pass interference), while some teams are more likely to commit certain penalties. Such cases would indeed create outliers. Another possible explanation is that there may be a limited number of data points for particular referee/team combinations, which will be remedied as more data is added over time.

We see that a few referee crews appear on the graph multiple times. For example, John Hussey's crew shows up three times, appearing to call more penalties on the New York Giants (`NYG`) and Atlanta (`ATL`), while calling fewer penalties on New England (`NE`). This could mean that the crew is slightly less consistent in its penalty calling than other referee crews.


### Section 4F: Analysis by Yard Line

Next we examine the distribution of penalties that are called by location on the field. Figure 8 below shows this distribution, where a value of zero means the penalty was called as close as possible to the offensive team's own endzone, and a value of 100 means the penalty was called at the goal line of the attacking endzone.
```{r yardlines_1, anchor='Figure', fig.cap="Figure 8: Penalties by Yard Line", fig.width= 8, fig.align= 'center'}
df <- refData

#Get mean and standard deviation for YardLine
m <- mean(df$YardLine)
std_dev <- sd(df$YardLine)

#Histogram of penalty yard lines
ggplot(df, aes(x=YardLine)) +
  geom_histogram(aes(y = ..density..),
                 binwidth = 5,
                 color = "blue", fill = "lightgreen") +
  geom_density(aes(color = "Density Curve - Yard Line Data"), lwd=1.0) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std_dev),
                aes(color = "Density Curve - Normal Distribution"), lwd = 1.0) + theme_ref +
  ggtitle("Penalties by Yard Line") +
  xlab("Yard Line") +
  ylab("Density")+ scale_color_manual(values=c("red", "black")) +
  labs(color = "Color")
```

The black line in Figure 8 above represents the density curve of the data, while the red line is an overlaid normal distribution curve. The penalty data is skewed right: penalties occur more frequently closer to the offensive team's own endzone than closer to the defensive team's endzone. This is consistent with the argument that legitimate penalties may go uncalled when an offensive team gets closer to scoring, although it does not establish that argument on its own. To examine this more closely, we can create a Cleveland dot plot to see whether all referees follow a similar pattern: that is, do all referees tend to call more penalties in the offense's own half of the field (yard lines 0-50) than in the opponent's half (yard lines 50-100)? Figure 9 below yields insight into this question.
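
The visual impression of right skew can also be checked numerically. The sketch below computes a moment-based sample skewness and the share of penalties in the offense's own half. The function name `sample_skewness` and the toy vector are hypothetical stand-ins; in the report the input would be `refData$YardLine`.

```r
# Moment-based sample skewness: positive values indicate a longer right tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Made-up yard-line values standing in for refData$YardLine.
yard_line <- c(10, 20, 25, 30, 35, 40, 45, 60, 80, 95)

sample_skewness(yard_line)   # positive for this right-skewed toy sample
mean(yard_line < 50)         # share of penalties in the offense's own half
```

A skewness near zero would suggest that the normal curve overlaid in the histogram is a reasonable description, while a clearly positive value supports the right-skew reading.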

```{r ref_yardlines, anchor='Figure', fig.width = 7, fig.cap="Figure 9: Penalty Yard Line Averages by Referee", fig.align= 'center'}
library(tidyverse)
df_subset <- df %>%
  group_by(Referee) %>%
  summarise(mean_yard_line = mean(YardLine)) %>%
  filter(!is.na(Referee))

ggplot(df_subset, aes(x=mean_yard_line,
                      y=fct_reorder(Referee, mean_yard_line))) +
  geom_point(color = "orange",
             size = 4) +
  ggtitle("Penalty Yard Line Averages by Referee") +
  scale_y_discrete(name = 'Referee') +
  scale_x_continuous(name = "Mean Yard Line") +
  theme_ref +
  geom_vline(aes(xintercept = mean(mean_yard_line),
                 color = "Average Yard Line of \nPenalty Calls Across \nAll Referees")) +
  labs(color = "Vertical Line") +
  scale_color_manual(values = c("black"))
```

Figure 9 shows the distribution of average yard lines at which penalties are called by each referee crew. The graph shows that every crew's average penalty yard line falls between 46 and 49. This finding is consistent with the results from the histogram of `r figr('yardlines_1', prefix=TRUE)`, where we see that penalties are more frequently called at yard lines closer to the offensive team's own endzone than to the defensive team's endzone.

Although all referees show the same general pattern in the average area where they call penalties (that being between yard lines 46 and 49), there are three outliers, relatively speaking. In general, the referee crews of Bill Vinovich and Craig Wrolstad call penalties at greater yard lines than does the average crew, while Gene Steratore's referee crew calls penalties at lower yard lines than most.
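
One way to quantify how far each crew sits from the vertical reference line is to subtract the average of the crew means from each crew's mean. The following is a minimal sketch using made-up data; the crew labels and yard lines are hypothetical, and only the column names `Referee` and `YardLine` come from the report.

```r
# Made-up data: one row per penalty, with crew label and yard line.
penalties <- data.frame(
  Referee  = c("A", "A", "B", "B", "C", "C", "D", "D"),
  YardLine = c(40, 50, 46, 50, 47, 49, 55, 60)
)

# Mean yard line per crew.
crew_means <- tapply(penalties$YardLine, penalties$Referee, mean)

# Deviation of each crew's mean from the average of crew means,
# which is the quantity drawn as the vertical line in the dot plot.
deviation <- crew_means - mean(crew_means)
sort(deviation)
```

Crews at either end of the sorted output are the relative outliers in the sense used above.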

## Section 5: Interactive Component - Offensive/Defensive Penalties by Area of Field

This component is an interactive application that lets users explore how penalties in the NFL occur at different frequencies depending on field position.

To get started, hover your mouse over a 10-yard zone of the field. You can also use the toggle below to filter by offensive or defensive penalties.

The entire interactive component was coded using native JavaScript and D3.js v4. It can also be viewed as a standalone page by navigating to bit.ly/edavnfl.

```{r}
htmltools::includeHTML("./docs/index_report.html")
```


## Section 6: Executive Summary

The frequency, consistency, and possible biases of penalty calls in the NFL are areas of football analytics that have garnered only minimal attention individually, let alone all together. By combining various publicly available data sources, we created a comprehensive dataset that includes NFL play-by-play game data joined with detailed referee assignments and granular penalty data.

Once we combined the various data sources into a single dataset, we began exploring the data from multiple angles. An examination of the distribution of penalties by quarter produced `r figr('penalties_by_q', prefix = TRUE)`. In this figure, we see slight spikes in the frequency of penalty calls in the second and fourth quarters of games. A high frequency of penalties in the fourth quarter makes intuitive sense because the game is nearing its conclusion and teams take more risks in an attempt to either keep or overtake the lead. However, it is surprising that the second quarter yields the highest frequency of penalty calls. By the second quarter, an entire quarter has already been completed, which would suggest that the flow of the game is relatively steady and that both teams and referees have settled in. Additionally, overtime (whose duration on average is somewhere between half and three-quarters of that of a normal quarter) yields a penalty frequency that appears roughly proportional to that of a normal quarter, given the duration of overtime play.

After some other high-level exploration of penalty distributions, we began comparing penalties called by referee. One of the most important findings of our analysis is conveyed in `r figr('ref_home_away_1', prefix=TRUE)`. This graph shows the percentage of penalties that each referee crew calls against the home team (i.e. the team playing at its home stadium) versus the away team (i.e. the team playing on the road). We observe that all referee crews, except for that of John Hussey, call more penalties against the away team than they do against the home team. Although it cannot be stated with certainty whether referee crews are biased in favor of home teams or whether teams legitimately commit penalties at lower rates when they are playing at home, this is a notable insight that prompts discussion and further analysis.
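
The home versus away split for each crew can be computed as a row-wise percentage table. The following is a minimal sketch on made-up data; the `Side` column and the crew labels are hypothetical and are not taken from the report's dataset.

```r
# Made-up data: one row per penalty; `Side` marks whether the
# penalized team was playing at home or away.
penalties <- data.frame(
  Referee = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Side    = c("Home", "Away", "Away", "Away",
              "Home", "Home", "Away", "Away", "Away")
)

counts <- table(penalties$Referee, penalties$Side)

# Row-wise percentages: each crew's penalties split between home and away teams.
pct <- round(100 * prop.table(counts, margin = 1), 1)
pct
```

A crew whose `Home` percentage exceeds 50 would be an exception to the general pattern, as John Hussey's crew is in the figure.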

To further explore referee crew tendencies, we examined the number of penalties called per game by each referee crew against each team. In `r figr('ref_tendencies', prefix=TRUE)` we display notable findings from this exploration. This visualization of boxplots only includes teams that have one or more outliers in their distributions. It is interesting to see which referee crews are identified as outliers in the graph, and it is particularly noteworthy when a referee crew is an outlier in the left tail of the distribution for some teams and in the right tail for others. For example, on average we see that John Hussey's referee crew calls significantly fewer penalties per game against the New England Patriots (`NE`), while the crew calls significantly more penalties per game against the Atlanta Falcons (`ATL`) and the New York Giants (`NYG`).

Although technological advances are constantly being made in just about all sports (such as the introduction of instant replay), referees are an inherently human part of football. Thus, at the core of the exploratory data analysis conducted in this project lies human behavior. No two humans see things the exact same way, which helps explain the inconsistencies we see in referee penalty calls, despite much effort by the NFL to define what constitutes each penalty in objective terms. Therefore, we must err on the side of caution when identifying anomalies or inconsistencies in our dataset. For example, while there is no denying from `r figr('penaltyTypeIncrease', prefix=TRUE)` that Bill Vinovich's referee crew calls more penalties per game for all penalty types than does the average referee crew, this does not necessarily imply that Bill Vinovich's crew is biased. In fact, we should be more concerned with inconsistencies within a particular referee crew's penalty calls than with inconsistencies between distinct referee crews.

We encourage readers of this report, as well as anyone interested in football analytics or penalties in the NFL, to explore our interactive graph as shown in <a href="#section-5-interactive-component---offensivedefensive-penalties-by-area-of-field">Section 5</a> of this report. This interactive component allows users to get a feel for the types of penalties, and the frequencies at which such penalties occur, at various locations on the football field. This tool offers an engaging way for users to learn about penalty distributions in the NFL, while requiring minimal analytical prowess. We hope this tool, coupled with the analyses and visualizations included in this report, offers insight into NFL penalties for individuals of all backgrounds.


## Section 7: Conclusion & Future Directions

While many unique insights were uncovered through this analysis, there remain a few interesting directions in which we can proceed to perform further exploratory data analysis.

As noted in detail within this report, home teams are penalized significantly less often than are away teams. This is a trend we found consistent across all teams, just about all referees, and all quarters of games. In each NFL season, there are a few games (generally fewer than three or four) that are played at neutral fields, either in another country or at a pre-determined destination in the United States (as is the case for the annual Super Bowl championship game). It would be quite interesting to expand our home versus away penalty analysis to games played at neutral locations. However, a limitation of this potential exploration is that the sample would be significantly smaller than the one used in this analysis. Moreover, we would need to be careful to filter out games that are technically at a neutral location but where one team still holds a locational advantage (for example, when the Oakland Raiders played a "home game" against the Houston Texans at the Estadio Azteca in Mexico City, which is significantly closer to Houston). We suspect that such an analysis would yield a much more balanced distribution of penalties called against opposing teams when neither team is playing at its home stadium.

One challenge that we faced in this project was visualizing data for so many categorical variables. This was a particularly notable challenge in <a href="#section-4e-analysis-by-referee-by-team">Section 4E</a>, where we analyzed penalty calls by referee crew by team. Having 32 teams and 16 referee crews means that visualizing anything beyond a boxplot becomes very difficult (although one thing that we learned through completing our project is just how useful a boxplot can be). One future path that we would like to take is to create additional dynamic visualizations for our data. Allowing users to slice the data on different variables would let us show more without overloading the report with graphs. Alternatively, instead of looking at all teams and all referee crews, in future work we could narrow our analysis to specific teams or specific referee crews. This would allow us to focus on the areas in which we are particularly interested.

This project also taught us how long preprocessing data can take, particularly when merging multiple data sources together. Merging all of our data took considerable effort, not to mention the data cleaning that needed to be done on each of the datasets after merging. One trick that we used in merging the data was to create new variables that served as unique identifiers, on which we joined the datasets. The creation of a 'merge-id' may lead to slightly more work in going through the data and figuring out which fields one needs to keep or delete, but it saves an incredible amount of time in the actual merge. With this knowledge in hand, we feel much more prepared to complete similar preprocessing tasks in the future.
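
The 'merge-id' idea can be sketched as follows. The tables, column names, and the choice of game date plus home team as the key are all hypothetical illustrations; the report does not state which fields were actually combined.

```r
# Made-up tables: per-game penalty counts and referee assignments.
plays <- data.frame(
  Date      = c("2018-09-09", "2018-09-09", "2018-09-16"),
  Home      = c("NE", "ATL", "NYG"),
  Away      = c("HOU", "PHI", "DAL"),
  Penalties = c(12, 9, 14)
)
assignments <- data.frame(
  GameDate = c("2018-09-09", "2018-09-09", "2018-09-16"),
  HomeTeam = c("ATL", "NE", "NYG"),
  Referee  = c("Crew X", "Crew Y", "Crew Z")
)

# Build the same key in both tables from fields that jointly identify a game
# (assumed here to be date plus home team).
plays$merge_id       <- paste(plays$Date, plays$Home, sep = "_")
assignments$merge_id <- paste(assignments$GameDate, assignments$HomeTeam, sep = "_")

# Confirm the key is unique before joining, to avoid duplicated rows.
stopifnot(!anyDuplicated(assignments$merge_id))

# Left join: keep every game and attach its referee crew.
merged <- merge(plays, assignments[, c("merge_id", "Referee")],
                by = "merge_id", all.x = TRUE)
merged
```

Checking that the key is unique before the join and checking for unmatched rows afterwards (missing `Referee` values here) are inexpensive ways to catch key-construction mistakes early.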

A final challenge that we came across is the lack of long-term data. As mentioned above, having more years of data would allow us to complete a more in-depth analysis, particularly of the NFL playoffs and the Super Bowl, which we did not include in our dataset for this report.

One further analysis that we would like to conduct is of the number of penalties called at different times in NFL games based on the current score. Essentially, we hope to be able to answer whether more penalties are called when the scores of the two teams are close. Unfortunately, our dataset did not contain scores at various points in the games, so we would have to look elsewhere for granular data in order to complete such an analysis.