In [12]:
import pandas as pd
import numpy as np
import nfl_data_py as nfl

In [2]:
%load_ext rpy2.ipython

In [13]:
%%R
library(tidyverse)
library(nflfastR)
library(ggthemes)

### Load the 2021 play-by-play (pbp) data

In [4]:
%%R
pbp_r <- load_pbp(2021)

Note: nflreadr caches (i.e., stores a saved version) data by default.
If you expect different output try one of the following:
ℹ Restart your R Session or
ℹ Run `nflreadr::.clear_cache()`.
This message is displayed once every 8 hours.


In [8]:
# Python
pbp_py = nfl.import_pbp_data([2021])

2021 done.
Downcasting floats.


### Filter the dataset to include only passing plays with recorded air yards (i.e., non-missing values).

In [6]:
%%R
pbp_r_p <- pbp_r |> 
  filter(play_type == "pass" & !is.na(air_yards))

In [10]:
# Python
filter_crit = 'play_type == "pass" & air_yards.notnull()'
pbp_py_p = (
    pbp_py.query(filter_crit)
    .groupby(["passer_id", "passer"])
    .agg({"air_yards": ["count", "mean"]})
)
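
To make the filter concrete, here is a minimal sketch on a toy table. The rows are invented for illustration and are not real NFL data; `toy_pbp`, `by_query`, and `by_mask` are hypothetical names. It compares the query-string form used in this notebook with an equivalent boolean mask. Passing `engine="python"` is an assumption made here so the sketch does not depend on whether numexpr is installed.

```python
import numpy as np
import pandas as pd

# Toy play-by-play rows (invented values, not real NFL data).
toy_pbp = pd.DataFrame(
    {
        "play_type": ["pass", "pass", "run", "pass", "no_play"],
        "air_yards": [12.0, np.nan, np.nan, -3.0, np.nan],
    }
)

# Query-string form, mirroring the filter used in this notebook.
by_query = toy_pbp.query(
    'play_type == "pass" & air_yards.notnull()', engine="python"
)

# Equivalent boolean-mask form: both conditions must hold for a row to stay.
mask = (toy_pbp["play_type"] == "pass") & toy_pbp["air_yards"].notna()
by_mask = toy_pbp[mask]

print(by_query)
```

Only the pass plays with a recorded depth survive. A negative `air_yards` value (a throw behind the line of scrimmage) is still a recorded value, so that row is kept.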

### Analyzing Quarterback Performance
We'll now calculate the average depth of target (aDOT), that is, the mean air yards per pass, for every quarterback who attempted at least 100 passes with a recorded depth. Because some players share the same name, we group by both passer_id and passer before summarizing the data:

In [7]:
%%R
pbp_r_p |> 
  group_by(passer_id, passer) |> 
  summarize(n = n(), adot = mean(air_yards)) |> 
  filter(n >= 100 & !is.na(passer)) |> 
  arrange(-adot) |> 
  print(n = Inf)

`summarise()` has grouped output by 'passer_id'. You can override using the
`.groups` argument.
# A tibble: 42 × 4
# Groups:   passer_id [42]
   passer_id  passer               n  adot
   <chr>      <chr>            <int> <dbl>
 1 00-0035704 D.Lock             110 10.1 
 2 00-0029263 R.Wilson           400  9.89
 3 00-0036945 J.Fields           268  9.87
 4 00-0034796 L.Jackson          378  9.35
 5 00-0036389 J.Hurts            473  9.19
 6 00-0034855 B.Mayfield         416  8.80
 7 00-0026498 M.Stafford         740  8.51
 8 00-0031503 J.Winston          161  8.32
 9 00-0029604 K.Cousins          556  8.22
10 00-0034857 J.Allen            708  8.22
11 00-0031280 D.Carr             676  8.13
12 00-0031237 T.Bridgewater      426  8.04
13 00-0035228 K.Murray           515  7.97
14 00-0019596 T.Brady            808  7.92
15 00-0036971 T.Lawrence         599  7.91
16 00-0036972 M.Jones            557  7.90
17 00-0033077 D.Prescott         638  7.84
18 00-0036442 J.Burrow           659  7.7

In [11]:
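# Python
# Flatten the two-level column labels into air_yards_count and air_yards_mean.
pbp_py_p.columns = list(map("_".join, pbp_py_p.columns.values))
# Keep passers with at least 100 attempts, matching n >= 100 in the R cell.
filter_n = "air_yards_count >= 100"

print(
    pbp_py_p.query(filter_n)
    .sort_values(by="air_yards_mean", ascending=False)
    .to_string()
)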


                             air_yards_count  air_yards_mean
passer_id  passer                                           
00-0035704 D.Lock                        110       10.063637
00-0029263 R.Wilson                      400        9.887500
00-0036945 J.Fields                      268        9.869403
00-0034796 L.Jackson                     378        9.349206
00-0036389 J.Hurts                       473        9.190275
00-0034855 B.Mayfield                    416        8.795673
00-0026498 M.Stafford                    740        8.508108
00-0031503 J.Winston                     161        8.322982
00-0029604 K.Cousins                     556        8.224820
00-0034857 J.Allen                       708        8.224576
00-0031280 D.Carr                        676        8.128698
00-0031237 T.Bridgewater                 426        8.037559
00-0035228 K.Murray                      515        7.965048
00-0019596 T.Brady                       808        7.920792
00-0036971 T.Lawrence   

The resulting aDOT values provide insight into quarterback aggressiveness, measuring how deep they tend to throw the ball. As you review the results, consider whether they align with your expectations. Are certain quarterbacks more aggressive than you thought? Could other metrics complement this analysis?
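
As a compact recap of the aDOT steps, the sketch below runs the same group, count, average, filter, and sort sequence on invented data. It uses pandas named aggregation, a swapped-in alternative to the dictionary-style `agg` call used in this notebook, so the columns come out already named `n` and `adot` and need no flattening. The table `toy_passes`, the threshold `min_attempts`, and the player names are hypothetical; the threshold is set to 2 only because the toy table is tiny.

```python
import pandas as pd

# Invented pass attempts; two different passers share the name "J.Smith".
toy_passes = pd.DataFrame(
    {
        "passer_id": ["00-A", "00-A", "00-A", "00-B", "00-B", "00-C"],
        "passer": ["J.Smith", "J.Smith", "J.Smith", "J.Smith", "J.Smith", "A.Jones"],
        "air_yards": [10.0, 2.0, 6.0, 20.0, 4.0, 8.0],
    }
)

min_attempts = 2  # hypothetical threshold; the notebook uses 100

adot_table = (
    toy_passes.groupby(["passer_id", "passer"])
    # Named aggregation: output column = (input column, function).
    .agg(n=("air_yards", "count"), adot=("air_yards", "mean"))
    # Drop passers with too few attempts.
    .loc[lambda d: d["n"] >= min_attempts]
    .sort_values(by="adot", ascending=False)
)

print(adot_table)
```

Grouping by `passer_id` as well as `passer` keeps the two "J.Smith" rows separate; grouping by name alone would pool their attempts into one average.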

## Obtaining and Filtering Data

In [None]:
## Python 
seasons = range(2016, 2022 + 1) 
pbp_py = nfl.import_pbp_data(seasons)

2016 done.
2017 done.
2018 done.
2019 done.


In [None]:
%%R
pbp_r <- load_pbp(2016:2022)

To get the subset of data you need for this analysis, filter down to just the passing plays, which can be done with the following code:

In [None]:
## Python 
pbp_py_p = (
    pbp_py
    .query("play_type == 'pass' & air_yards.notnull()")
    .reset_index()
)
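
The call to `reset_index()` matters because filtering keeps the original row labels, which leaves gaps in the index. A minimal sketch on invented rows (`toy_pbp`, `filtered`, and `renumbered` are hypothetical names) shows the effect:

```python
import numpy as np
import pandas as pd

# Invented rows, not real NFL data.
toy_pbp = pd.DataFrame(
    {
        "play_type": ["run", "pass", "pass", "run", "pass"],
        "air_yards": [np.nan, 7.0, np.nan, np.nan, 15.0],
    }
)

keep = (toy_pbp["play_type"] == "pass") & toy_pbp["air_yards"].notna()

# After filtering, the surviving rows still carry their old labels (1 and 4).
filtered = toy_pbp[keep]

# reset_index() renumbers the rows from 0 and stores the old labels
# in a new column named "index".
renumbered = filtered.reset_index()

print(filtered.index.tolist())
print(renumbered)
```

If the old labels are not needed, `reset_index(drop=True)` discards them instead of adding the extra column.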

In [None]:
%%R
## R
pbp_r_p <-
    pbp_r |>
    filter(play_type == "pass" & !is.na(air_yards))

Here, play_type being equal to p