/
puntr.Rmd
128 lines (112 loc) · 5.31 KB
/
puntr.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
---
title: "Getting started"
output:
html_document: default
pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
your_local_path <- '/Users/dennisbrookner/github'
```
## Installation
Install via:
```{r, echo=FALSE, message=FALSE, results='hide'}
install.packages(c("devtools", "tidyverse"), repos = "http://cran.us.r-project.org")
```
```{r, message=FALSE, results='hide'}
devtools::install_github("Puntalytics/puntr")
library(puntr)
library(tidyverse) # always a good idea to do this too
```
## Using `puntr` with NFL data
### Completed seasons
For speed, we've already scraped (using [`nflfastR`](https://www.nflfastr.com/)) and saved punting data for the 1999-2020 seasons. The easiest thing to do is download the `puntr-data` repo [here](https://github.com/Puntalytics/puntr-data/tree/master/data), and then point `puntr::import_punts()` to your local copy of the data. You can also download the data directly each time; this takes around 15 minutes.
Import, clean, and calculate as follows:
```{r imports, message=FALSE}
#punts_raw <- import_punts(1999:2020, local=TRUE, path=your_local_path) # recommended
punts_raw <- import_punts(2018:2020) # This takes ~15 minutes
punts_cleaned <- trust_the_process(punts_raw) # clean
punts <- calculate_all(punts_cleaned) # calculate custom Puntalytics metrics
```
You now have a dataframe `punts` where each row is a punt, and each column is a stat relevant to punting (including our custom metrics).
#### Note about minimum dataframe size
`puntr` calculates stats using 3-year rolling averages, to avoid any artifacts of anomalous seasons. For this reason, `puntr::calculate_all()` requires a dataframe containing at least **3 seasons** and **1000 punts**. Note in the above example that three seasons are used. Note in the below example that three seasons are used for the calculation, after which all but the most recent seasons are filtered out.
### In-progress seasons
The kind folks at [`nflfastR`](https://www.nflfastr.com/) have set up a convenient `SQL`-y way to scrape data for in-progress seasons. Rather than redo all of that work, we'll just share here the code we use for this purpose:
```{r eval=FALSE}
#install.packages("DBI")
#install.packages("RSQLite")
library(DBI)
library(RSQLite)
library(nflfastR)
library(puntr)
update_db()
connection <- dbConnect(SQLite(), "./pbp_db")
pbp <- tbl(connection, "nflfastR_pbp")
punts <- pbp %>% filter(punt_attempt==1) %>%
filter(season %in% 2019:2021)
collect() %>%
trust_the_process() %>%
calculate_all() %>%
filter(season == 2021)
dbDisconnect(connection)
```
### Comparing punters
***Note***: New in `puntr 1.3`, the functions
* `puntr::create_mini()`
* `puntr::create_miniY()`
* `puntr::create_miniG()`
are now deprecated, in favor of
* `puntr::by_punters()`
* `puntr::by_punter_seasons()`
* `puntr::by_punter_games()`
To compare punters, use
```{r}
punters <- by_punters(punts)
```
to get a dataframe where each row is a punter, and each column is an average stat for that punter. The most common standard and Punt Runts stats are included by default, but you can add whatever you like by passing additional arguments to `dplyr::summarize()`. For example:
```{r}
punters_custom <- by_punters(punts, longest_punt = max(GrossYards))
```
Let's take a look at some of the columns in this data frame:
```{r}
punters %>%
arrange(desc(pEPA)) %>%
select(punter_player_name, Gross, Net, pEPA) %>%
rmarkdown::paged_table()
```
To compare punter **seasons**, instead use
```{r}
punter_seasons <- by_punter_seasons(punts)
```
which gives every unique punter season a row.
```{r, echo=FALSE}
punter_seasons %>%
arrange(desc(pEPA)) %>%
select(punter_player_name, season, Gross, Net, pEPA) %>%
rmarkdown::paged_table()
```
And finally, to compare punter **games**, use
```{r}
punter_games <- by_punter_games(punts)
```
which gives every unique punter game a row.
**Note**: If a career, season, or game you're looking for is missing from your dataframe, try changing the `threshold = ` parameter to require fewer punts.
These dataframes - `punts`, `punters`, `punter_seasons` and `punter_games` - should serve as a good starting point for any custom analysis you'd like to do, be that using built-in `puntr` metrics, or your own.
## Using `puntr` with college data
***NOTE: `puntr` was successfully migrated from `cfbscrapR` to `cfbfastR` in version 1.2.2***
***NOTE: The `by_` family of summary functions have not yet been tested for `cfbfastR` data, but might work.***
`puntr` can also handle punting data for college football, piggybacking off of the scraping abilities of the [`cfbfastR`](https://saiemgilani.github.io/cfbfastR/) package. You need at least 3 seasons worth of data to run `calculate_all()`. Import and clean as follows:
```{r message=FALSE, warning=FALSE, eval=FALSE}
college_punts <- import_college_punts(2019:2021) %>% # import (calls cfbfastR behind the scenes)
college_to_pro() %>% # rename columns to those used by nflfastR
calculate_all() # calculate as with NFL data
```
Now we can use the same `create_miniY` as above to compare college punter seasons (`create_mini` and `create_miniG` would also work here, of course.)
```{r, eval=FALSE}
miniY_college <- create_miniY(college_punts)
```
```{r, echo=FALSE, eval=FALSE}
miniY_college <- miniY_college %>% arrange(desc(SHARP_RERUN))
rmarkdown::paged_table(miniY_college)
```