home_team_runs and away_team_runs returned NA #17
That is to be expected -- these values are missing (in the source files) unless runs are scored during the atbat. I admit this is not the best data format. You probably want the running totals (without NAs).
Ahh, thanks for the clarification -- I was making a bad assumption about what those fields meant. The commands you provide work for the most part -- looking at a subset of data (WAS/SDN), it appears it gets the home values correct (here, it's 0 for the entire game), but then it starts adding 1's after a point. I am looking to see if there's a particular reason why it changes over, but haven't found a trend yet. Thanks again for your help!
Here is a method to convert home_team_runs/away_team_runs to the equivalent numeric representation.
library(pitchRx)
library(dplyr)
june8 <- scrape(start = "2014-06-08", end = "2014-06-08")
atbats <- june8$atbat
# make sure records are ordered by num (within game)
atbats <- split(atbats, atbats$gameday_link) %>%
  lapply(function(x) x[order(as.numeric(x$num)), ]) %>%
  bind_rows()
# replace each missing value with the next non-missing value in the same game;
# at-bats after the last recorded value stay NA
f <- function(runs) {
  runs <- as.numeric(runs)
  idx <- which(!is.na(runs))
  filled <- rep(runs[idx], diff(c(0, idx)))
  # pad so the result has one value per at-bat
  c(filled, rep(NA_real_, length(runs) - length(filled)))
}
atbats$home_team_runs <- unlist(with(atbats, tapply(home_team_runs, INDEX = gameday_link, f)))
atbats$away_team_runs <- unlist(with(atbats, tapply(away_team_runs, INDEX = gameday_link, f)))
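To see what this kind of fill does without scraping anything, here is a small sketch on a made-up vector. The input `runs` and the helper names `fill_backward` and `fill_forward` are hypothetical, not part of pitchRx. The forward-fill variant rests on an assumption worth checking against the source XML: that each recorded value is the team's total after a scoring at-bat, and that the score is 0 before the first recorded value.

```r
# Toy input: one game, six at-bats, runs recorded only at at-bats 3 and 5.
runs <- c(NA, NA, "1", NA, "3", NA)

# Backward fill: each recorded value fills the missing entries before it.
fill_backward <- function(runs) {
  runs <- as.numeric(runs)
  idx <- which(!is.na(runs))
  filled <- rep(runs[idx], diff(c(0, idx)))
  # pad so the result has one value per at-bat
  c(filled, rep(NA_real_, length(runs) - length(filled)))
}

# Forward fill (assumption): carry the last recorded total forward, starting at 0.
fill_forward <- function(runs) {
  runs <- as.numeric(runs)
  recorded <- !is.na(runs)
  c(0, runs[recorded])[cumsum(recorded) + 1]
}

fill_backward(runs)  # 1 1 1 3 3 NA
fill_forward(runs)   # 0 0 1 1 3 3
```

With the backward fill, at-bats 1 and 2 already show a 1 even though the run is only recorded at at-bat 3, which may be why values seem to change earlier than expected. If the assumption above holds, the forward fill is closer to a running score.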
I am using pitchRx and scrape to look at which pitches a pitcher uses, given the score of the game. To do this, I am looking at the home_team_runs and away_team_runs columns in the GameDay data that pitchRx/scrape provides. However, I am getting a lot of NAs in my data even though the values are present when I search for 'home_team_runs' in the relevant XML file on gd2.mlb.com.
Here are my commands:
library(dplyr)
library(pitchRx)
june8 <- scrape(start = "2014-06-08", end = "2014-06-08")
I was mostly interested in the WAS/SDN game, which returned all NA for Jordan Zimmermann; looking at different games on June 8 and also games on different days gives me for the most part the same results -- there are some sporadic entries (see screenshot attached, which are the results of doing a View(june8$atbat))
I am on OS X 10.9.3, using pitchRx version 1.5 in RStudio version 0.98.501.
Happy to pass along any other info you need if I have forgotten anything -- many thanks!
Stuart