adding milb functionality to baseballr
BillPetti committed Jan 6, 2020
1 parent e25715f commit d15cf39
Showing 13 changed files with 254 additions and 22 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: baseballr
Title: Functions for acquiring and analyzing baseball data
Version: 0.5.0
Version: 0.6.0
Author: Bill Petti <billpetti@gmail.com>
Authors@R: c(
person("Bill", "Petti", email = "billpetti@gmail.com",
1 change: 1 addition & 0 deletions NAMESPACE
@@ -25,6 +25,7 @@ export(get_ncaa_baseball_pbp)
export(get_ncaa_baseball_roster)
export(get_ncaa_game_logs)
export(get_ncaa_schedule_info)
export(get_pbp_mlb)
export(get_probables_mlb)
export(get_retrosheet_data)
export(ggspraychart)
2 changes: 1 addition & 1 deletion R/get_game_info_mlb.R
@@ -1,4 +1,4 @@
#' Retrieve additional game information via the MLB api \url{http://statsapi.mlb.com/api/}
#' Retrieve additional game information for major and minor league games via the MLB api \url{http://statsapi.mlb.com/api/}
#'
#' @param game_pk The unique game_pk identifier for the game
#' @importFrom jsonlite fromJSON
23 changes: 20 additions & 3 deletions R/get_game_pks_mlb.R
@@ -1,18 +1,35 @@
#' Find game_pk values for MLB games via the MLB api \url{http://statsapi.mlb.com/api/}
#' Find game_pk values for professional baseball games (major and minor leagues)
#' via the MLB api \url{http://statsapi.mlb.com/api/}
#'
#' @param date The date for which you want to find game_pk values for MLB games
#' @param level_ids A numeric vector with ids for each level where game_pks are
#' desired. See below for a reference of level ids.
#' @importFrom jsonlite fromJSON
#' @return Returns a data frame that includes game_pk values and additional
#' information for games scheduled or played on the requested date
#' @section Level IDs:
#'
#' The following IDs can be passed to the level_ids argument:
#'
#' 1 = MLB
#' 11 = Triple-A
#' 12 = Double-A
#' 13 = Class A Advanced
#' 14 = Class A
#' 15 = Class A Short Season
#' 5442 = Rookie Advanced
#' 16 = Rookie
#' 17 = Winter League
#' @keywords MLB, sabermetrics
#' @export
#'
#' @examples get_game_pks_mlb("2019-04-29")

get_game_pks_mlb <- function(date) {
get_game_pks_mlb <- function(date,
level_ids = c(1)) {

api_call <- paste0("http://statsapi.mlb.com/api/v1/schedule?sportId=1&date=", date)
api_call <- paste0("http://statsapi.mlb.com/api/v1/schedule?sportId=", paste(level_ids, collapse = ','), "&date=", date)

payload <- jsonlite::fromJSON(api_call, flatten = TRUE)
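
The new `level_ids` argument is collapsed into a single comma-separated `sportId` query parameter. A minimal sketch of that URL construction in base R follows; `build_schedule_url` is a hypothetical helper name used only for illustration and is not a function in baseballr. No request is sent, so the sketch runs offline.

```r
# Hypothetical helper (not part of baseballr): builds the schedule URL the
# same way the updated function body does, without calling the API.
build_schedule_url <- function(date, level_ids = c(1)) {
  paste0("http://statsapi.mlb.com/api/v1/schedule?sportId=",
         paste(level_ids, collapse = ","),
         "&date=", date)
}

# Default: MLB only (level id 1)
mlb_url <- build_schedule_url("2019-04-29")
# "http://statsapi.mlb.com/api/v1/schedule?sportId=1&date=2019-04-29"

# Triple-A and Double-A together (level ids 11 and 12)
milb_url <- build_schedule_url("2019-04-29", level_ids = c(11, 12))
# "http://statsapi.mlb.com/api/v1/schedule?sportId=11,12&date=2019-04-29"
```

Because the default is `c(1)`, existing calls such as `get_game_pks_mlb("2019-04-29")` should keep requesting MLB games only.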

139 changes: 139 additions & 0 deletions R/get_pbp_mlb.R
@@ -0,0 +1,139 @@
#' Acquire pitch-by-pitch data for Major and Minor League games via the MLB api \url{http://statsapi.mlb.com/api/}
#'
#' @param game_pk The unique game_pk identifier for the game
#' @importFrom jsonlite fromJSON
#' @return Returns a data frame that includes over 100 columns of data provided
#' by the MLB Stats API at a pitch level.
#'
#' Some data will vary depending on the
#' park and the league level, as most sensor data is not available in
#' minor league parks via this API. Note that the column names have mostly
#' been left as-is and there are likely duplicate columns in terms of the
#' information they provide. I plan to clean the output up down the road, but
#' for now I am leaving the majority as-is.
#'
#' Both major and minor league pitch-by-pitch data can be pulled with this
#' function.
#' @keywords MLB, sabermetrics
#' @export
#'
#' @examples get_pbp_mlb(575156)

get_pbp_mlb <- function(game_pk) {

api_call <- paste0("http://statsapi.mlb.com/api/v1.1/game/", game_pk, "/feed/live")
payload <- jsonlite::fromJSON(api_call, flatten = TRUE)

plays <- payload$liveData$plays$allPlays$playEvents %>% dplyr::bind_rows()

at_bats <- payload$liveData$plays$allPlays

current <- payload$liveData$plays$currentPlay

game_status <- payload$gameData$status$abstractGameState

home_team <- payload$gameData$teams$home$name

home_level <- payload$gameData$teams$home$sport

home_league <- payload$gameData$teams$home$league

away_team <- payload$gameData$teams$away$name

away_level <- payload$gameData$teams$away$sport

away_league <- payload$gameData$teams$away$league

list_columns <- lapply(at_bats, function(x) class(x)) %>%
dplyr::bind_rows(.id = "variable") %>%
tidyr::gather(key, value, -variable) %>%
dplyr::filter(value == "list") %>%
dplyr::pull(key)

at_bats <- at_bats %>%
dplyr::select(-c(dplyr::one_of(list_columns)))

pbp <- plays %>%
dplyr::left_join(at_bats, by = c("endTime" = "playEndTime"))

pbp <- pbp %>%
tidyr::fill(atBatIndex:matchup.splits.menOnBase, .direction = "up") %>%
dplyr::mutate(game_pk = game_pk,
game_date = substr(payload$gameData$datetime$dateTime, 1, 10)) %>%
dplyr::select(game_pk, game_date, dplyr::everything())

pbp <- pbp %>%
dplyr::mutate(matchup.batter.fullName =
factor(matchup.batter.fullName),
matchup.pitcher.fullName =
factor(matchup.pitcher.fullName),
atBatIndex = factor(atBatIndex)
# batted.ball.result = case_when(!result.event %in% c(
# "Single", "Double", "Triple", "Home Run") ~ "Out/Other",
# TRUE ~ result.event),
# batted.ball.result = factor(batted.ball.result,
# levels = c("Single", "Double", "Triple", "Home Run", "Out/Other"))
) %>%
dplyr::mutate(home_team = home_team,
home_level_id = home_level$id,
home_level_name = home_level$name,
home_parentOrg_id = payload$gameData$teams$home$parentOrgId,
home_parentOrg_name = payload$gameData$teams$home$parentOrgName,
home_league_id = home_league$id,
home_league_name = home_league$name,
away_team = away_team,
away_level_id = away_level$id,
away_level_name = away_level$name,
away_parentOrg_id = payload$gameData$teams$away$parentOrgId,
away_parentOrg_name = payload$gameData$teams$away$parentOrgName,
away_league_id = away_league$id,
away_league_name = away_league$name,
batting_team = factor(ifelse(about.halfInning == "bottom",
home_team,
away_team)),
fielding_team = factor(ifelse(about.halfInning == "bottom",
away_team,
home_team)))
pbp <- pbp %>%
dplyr::arrange(dplyr::desc(atBatIndex), dplyr::desc(pitchNumber))

pbp <- pbp %>%
dplyr::group_by(atBatIndex) %>%
dplyr::mutate(last.pitch.of.ab =
ifelse(pitchNumber == max(pitchNumber), "true", "false"),
last.pitch.of.ab = factor(last.pitch.of.ab)) %>%
dplyr::ungroup()

pbp <- dplyr::bind_rows(stats_api_live_empty_df, pbp)

check_home_level <- pbp %>%
dplyr::distinct(home_level_id) %>%
dplyr::pull()

# this will need to be updated in the future to properly estimate X,Z coordinates at the minor league level

# if(check_home_level != 1) {
#
# pbp <- pbp %>%
# dplyr::mutate(pitchData.coordinates.x = -pitchData.coordinates.x,
# pitchData.coordinates.y = -pitchData.coordinates.y)
#
# pbp <- pbp %>%
# dplyr::mutate(pitchData.coordinates.pX_est = predict(x_model, pbp),
# pitchData.coordinates.pZ_est = predict(y_model, pbp))
#
# pbp <- pbp %>%
# dplyr::mutate(pitchData.coordinates.x = -pitchData.coordinates.x,
# pitchData.coordinates.y = -pitchData.coordinates.y)
# }

pbp <- pbp %>%
dplyr::rename(count.balls.start = count.balls.x,
count.strikes.start = count.strikes.x,
count.outs.start = count.outs.x,
count.balls.end = count.balls.y,
count.strikes.end = count.strikes.y,
count.outs.end = count.outs.y)

return(pbp)
}
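
The core of `get_pbp_mlb()` is a join of pitch-level events to at-bat-level rows on the end time, an upward fill of the at-bat columns, and a flag for the last pitch of each at-bat. The sketch below reproduces that pattern on invented toy data in base R, so it runs without dplyr or tidyr. It assumes that only the final event of an at-bat shares its `endTime` with the at-bat's `playEndTime`, which the commit does not state explicitly; `fill_up` is a hypothetical helper standing in for `tidyr::fill(.direction = "up")`.

```r
# Toy stand-ins for the pitch-level and at-bat-level tables. Column names
# mirror the function above; the values are invented for illustration.
plays <- data.frame(
  endTime     = c("t1", "t2", "t3", "t4", "t5"),
  pitchNumber = c(1, 2, 3, 1, 2),
  stringsAsFactors = FALSE
)
at_bats <- data.frame(
  playEndTime = c("t3", "t5"),
  atBatIndex  = c(0, 1),
  stringsAsFactors = FALSE
)

# Left join on endTime == playEndTime: under the stated assumption, only
# the final pitch of each at-bat finds a match, so the rest are NA.
idx <- match(plays$endTime, at_bats$playEndTime)
pbp <- cbind(plays, atBatIndex = at_bats$atBatIndex[idx])

# Hypothetical helper: each NA takes the next non-NA value below it,
# walking from the bottom of the vector to the top.
fill_up <- function(x) {
  for (i in rev(seq_len(max(length(x) - 1, 0)))) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}
pbp$atBatIndex <- fill_up(pbp$atBatIndex)

# Flag the last pitch of each at-bat by comparing against the group maximum.
max_pitch <- ave(pbp$pitchNumber, pbp$atBatIndex, FUN = max)
pbp$last.pitch.of.ab <- ifelse(pbp$pitchNumber == max_pitch, "true", "false")
```

In this toy data the first three pitches end up in at-bat 0 and the last two in at-bat 1, with the third and fifth rows flagged `"true"`. Rows with a missing `pitchNumber` would need extra care with `max()`, which this sketch does not handle.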
Binary file added R/sysdata.rda
Binary file not shown.
31 changes: 23 additions & 8 deletions README.Rmd
@@ -20,23 +20,41 @@ knitr::opts_chunk$set(

# baseballr

# `baseballr` 0.3.4
**(latest version released 2018-05-29)**
# `baseballr` 0.6.0

`baseballr` is a package written for R focused on baseball analysis. It includes functions for scraping various data from websites, such as FanGraphs.com, Baseball-Reference.com, and baseballsavant.com. It also includes functions for calculating metrics, such as wOBA, FIP, and team-level consistency over custom time frames.
**(latest version released 2020-01-07)**

You can read more about some of the functions and how to use them at its [official site](http://billpetti.github.io/baseballr/) as well as this [Hardball Times article](http://www.hardballtimes.com/developing-the-baseballr-package-for-r/).
`baseballr` is a package written for R focused on baseball analysis. It
includes functions for scraping various data from websites, such as
FanGraphs.com, Baseball-Reference.com, and baseballsavant.com. It also
includes functions for calculating metrics, such as wOBA, FIP, and
team-level consistency over custom time frames.

You can read more about some of the functions and how to use them at its
[official site](http://billpetti.github.io/baseballr/) as well as this
[Hardball Times
article](http://www.hardballtimes.com/developing-the-baseballr-package-for-r/).

## Installation

You can install `baseballr` from github with:

```{r gh-installation, eval = FALSE}
``` r
# install.packages("devtools")
devtools::install_github("BillPetti/baseballr")
```

For experimental functions in development, you can install the [development branch](https://github.com/BillPetti/baseballr/tree/development_branch):

``` r
# install.packages("devtools")
devtools::install_github("BillPetti/baseballr", ref = "development_branch")
```

## Pull Requests

Pull requests are welcome, but I cannot guarantee that they will be accepted, or accepted quickly. Please make all pull requests to the [development branch](https://github.com/BillPetti/baseballr/tree/development_branch) for review.

## Functionality

The package consists of two main sets of functions: data acquisition and metric calculation.
@@ -77,11 +95,8 @@ data %>%
head()
```


You can also generate these wOBA-based stats, as well as FIP, for pitchers using the `fip_plus()` function:



```{r}
daily_pitcher_bref("2015-04-05", "2015-04-30") %>%
fip_plus() %>%
14 changes: 10 additions & 4 deletions README.md
@@ -9,9 +9,9 @@ status](https://travis-ci.org/BillPetti/baseballr.svg?branch=master)](https://tr

# baseballr

# `baseballr` 0.5.0
# `baseballr` 0.6.0

**(latest version released 2019-06-25)**
**(latest version released 2020-01-07)**

`baseballr` is a package written for R focused on baseball analysis. It
includes functions for scraping various data from websites, such as
@@ -33,7 +33,9 @@ You can install `baseballr` from github with:
devtools::install_github("BillPetti/baseballr")
```

For experimental functions in development, you can install the [development branch](https://github.com/BillPetti/baseballr/tree/development_branch):
For experimental functions in development, you can install the
[development
branch](https://github.com/BillPetti/baseballr/tree/development_branch):

``` r
# install.packages("devtools")
@@ -42,7 +44,11 @@ devtools::install_github("BillPetti/baseballr", ref = "development_branch")

## Pull Requests

Pull request are welcome, but I cannot guarantee that they will be accepted or accepted quickly. Please make all pull requests to the [development branch](https://github.com/BillPetti/baseballr/tree/development_branch) for review.
Pull requests are welcome, but I cannot guarantee that they will be
accepted, or accepted quickly. Please make all pull requests to the
[development
branch](https://github.com/BillPetti/baseballr/tree/development_branch)
for review.

## Functionality

Binary file added data/stats_api_live_empty_df.rda
Binary file not shown.
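
`get_pbp_mlb()` binds its result onto `stats_api_live_empty_df`, whose contents are not shown in this commit. Assuming it is a zero-row data frame listing the expected columns, the effect would be that every result carries the same columns even when a minor league feed omits some of them. A base R sketch of that idea follows, with an invented three-column template; the real template and its column set may differ.

```r
# Assumed stand-in for stats_api_live_empty_df: zero rows, fixed columns.
# The column set here is invented and far smaller than the real output.
template <- data.frame(
  game_pk             = integer(0),
  pitchNumber         = integer(0),
  hitData.launchSpeed = numeric(0)
)

# A toy result for a game whose feed lacks the sensor column.
pbp <- data.frame(game_pk = c(1L, 1L), pitchNumber = c(1L, 2L))

# Rough base R equivalent of dplyr::bind_rows(template, pbp):
# add template columns missing from pbp as NA, then put template columns first.
missing_cols <- setdiff(names(template), names(pbp))
for (col in missing_cols) {
  pbp[[col]] <- NA
}
pbp <- pbp[, c(names(template), setdiff(names(pbp), names(template))),
           drop = FALSE]
```

After this step `pbp` has all three template columns, with `hitData.launchSpeed` filled with `NA`, so downstream code can refer to the column without checking whether the feed supplied it.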
Binary file added data/teams_lu_table.rda
Binary file not shown.
4 changes: 2 additions & 2 deletions man/get_game_info_mlb.Rd

27 changes: 24 additions & 3 deletions man/get_game_pks_mlb.Rd

33 changes: 33 additions & 0 deletions man/get_pbp_mlb.Rd
