Calculations for delta.pitch and delta.field differ from previous calculations #108

Open
davidbmitchell opened this issue Jul 9, 2017 · 4 comments

Comments

@davidbmitchell
Contributor

I ran makeWAR() on the May data set and compared the result with the MayProcessed data set. The delta.field and delta.pitch columns in the new output differ from those in the original MayProcessed data set; they look transposed, as you can see below. I did this using dplyr 0.5.0, but I first noticed it when testing makeWAR() after refactoring for dplyr 0.7.0.

> NewMayProcessed <- makeWAR(May)
> head(NewMayProcessed$openWARPlays[,c(1:5,16, 19:23)])
  batterId start1B start2B start3B pitcherId                         gameId      delta delta.field delta.pitch    delta.br  delta.bat
1   476704    <NA>    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.3789624          NA  0.37896244          NA  0.3789624
2   519083  476704    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.2055008 -0.04671768 -0.15878313  0.03238909 -0.2378899
3   452234    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.3296470          NA -0.32964703  0.04026076 -0.3699078
4   493316    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.2032371  0.11692098  0.08631608 -0.53123407  0.7344711
5   518626  493316    <NA>  476704    450351 gid_2013_05_01_anamlb_oakmlb_1  0.1956572          NA  0.19565721 -0.01790497  0.2135622
6   474384  518626  493316  476704    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.7097701 -0.36090191 -0.34886821  0.01234560 -0.7221157

> head(MayProcessed$openWARPlays[,c(1:5,16, 19:23)])
  batterId start1B start2B start3B pitcherId                         gameId      delta delta.field delta.pitch    delta.br  delta.bat
1   476704    <NA>    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.3789624          NA   0.3789624          NA  0.3789624
2   519083  476704    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.2055008  -0.1588469  -0.0466539  0.03238909 -0.2378899
3   452234    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.3296470          NA  -0.3296470  0.04026076 -0.3699078
4   493316    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.2032371   0.1169279   0.0863092 -0.53123407  0.7344711
5   518626  493316    <NA>  476704    450351 gid_2013_05_01_anamlb_oakmlb_1  0.1956572          NA   0.1956572 -0.01790497  0.2135622
6   474384  518626  493316  476704    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.7097701  -0.3487953  -0.3609748  0.01234560 -0.7221157

The original MayProcessed data set was added over 2 years ago, and there have been quite a few changes to makeWAR() since then. I imagine this happened when openWAR was dplyrized. I'm pretty sure it has to do with [Line 140](https://github.com/beanumber/openWAR/blob/master/R/makeWAR.R#L140):

x$data <- mutate_(x$data, delta.pitch = ~ifelse(is.na(delta.field), delta, delta - delta.field))
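One way to sanity-check that line is to apply the same rule in base R to a few rows copied from the two head() outputs above and see whether delta.pitch is reproduced from delta and delta.field. This is only a sketch: `pitch_from_field` is a hypothetical helper name, not a function in openWAR, and the rows are typed in by hand from the printed output (rows 1, 2, 4, and 6).

```r
# Hypothetical helper mirroring the rule on Line 140:
# delta.pitch is delta when delta.field is NA, otherwise delta - delta.field.
pitch_from_field <- function(delta, delta.field) {
  ifelse(is.na(delta.field), delta, delta - delta.field)
}

# Rows 1, 2, 4, and 6 copied from head(NewMayProcessed$openWARPlays[...]).
new <- data.frame(
  delta       = c(0.3789624, -0.2055008, 0.2032371, -0.7097701),
  delta.field = c(NA, -0.04671768, 0.11692098, -0.36090191),
  delta.pitch = c(0.37896244, -0.15878313, 0.08631608, -0.34886821)
)

# The same rows copied from head(MayProcessed$openWARPlays[...]).
old <- data.frame(
  delta       = c(0.3789624, -0.2055008, 0.2032371, -0.7097701),
  delta.field = c(NA, -0.1588469, 0.1169279, -0.3487953),
  delta.pitch = c(0.3789624, -0.0466539, 0.0863092, -0.3609748)
)

# Largest gap between the stored delta.pitch and the value the rule gives.
new_gap <- max(abs(pitch_from_field(new$delta, new$delta.field) - new$delta.pitch))
old_gap <- max(abs(pitch_from_field(old$delta, old$delta.field) - old$delta.pitch))
print(c(new = new_gap, old = old_gap))
```

For these printed rows both gaps are below 1e-6, so the rule holds in both data sets to within print rounding. That would suggest the difference enters through delta.field rather than through this line, although four rows are not proof.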

So I guess it boils down to which data set is correct? Is it the original MayProcessed data set?
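To make "transposed" concrete, here is a small base R sketch that classifies each printed row with a non-NA delta.field as either matching or swapped between the two data sets. The values are copied by hand from the head() outputs above, and the tolerance `tol` is an assumption chosen to absorb the small numeric drift between the two runs.

```r
# Rows 2, 4, and 6 (the ones where delta.field is not NA), copied from the output above.
new <- data.frame(
  delta.field = c(-0.04671768, 0.11692098, -0.36090191),
  delta.pitch = c(-0.15878313, 0.08631608, -0.34886821)
)
old <- data.frame(
  delta.field = c(-0.1588469, 0.1169279, -0.3487953),
  delta.pitch = c(-0.0466539, 0.0863092, -0.3609748)
)

tol <- 1e-3  # assumed tolerance, not a value from the package

# "same": each column agrees with the same column in the old data.
same <- abs(new$delta.field - old$delta.field) < tol &
  abs(new$delta.pitch - old$delta.pitch) < tol

# "swapped": each column agrees with the opposite column in the old data.
swapped <- abs(new$delta.field - old$delta.pitch) < tol &
  abs(new$delta.pitch - old$delta.field) < tol

comparison <- data.frame(row = c(2, 4, 6), same = same, swapped = swapped)
print(comparison)
```

On these three rows, rows 2 and 6 come out as swapped while row 4 matches, so the columns may be transposed only for some plays rather than across the board.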

@davidbmitchell
Contributor Author

I actually think this is occurring in makeWARFielding, specifically Lines 365-366.

delta.field <- with(data, ifelse(endOuts == startOuts, 
                                  delta * p.hat, delta * (1 - p.hat)))
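The following toy sketch illustrates why this rule is a plausible suspect. It compares the rule quoted above with a purely hypothetical alternative that weights by p.hat regardless of outs. The alternative is an assumption made up for illustration and is not taken from the repository history; the play values are invented as well.

```r
# Rule quoted above from makeWARFielding.
field_current <- function(delta, p.hat, startOuts, endOuts) {
  ifelse(endOuts == startOuts, delta * p.hat, delta * (1 - p.hat))
}

# HYPOTHETICAL alternative (not from the repository): same weight on every play.
field_alternative <- function(delta, p.hat, startOuts, endOuts) {
  delta * p.hat
}

# Two invented plays: row 1 records an out, row 2 does not.
plays <- data.frame(
  delta     = c(-0.2, 0.2),
  p.hat     = c(0.75, 0.75),
  startOuts = c(0, 0),
  endOuts   = c(1, 0)
)

cur <- with(plays, field_current(delta, p.hat, startOuts, endOuts))
alt <- with(plays, field_alternative(delta, p.hat, startOuts, endOuts))

# delta.pitch is the remainder of delta, as on Line 140 of makeWAR.R.
result <- data.frame(
  plays,
  field.current = cur, pitch.current = plays$delta - cur,
  field.alt     = alt, pitch.alt     = plays$delta - alt
)
print(result)
```

In this toy data the play with an out has its field and pitch shares exchanged between the two rules, while the play without an out is unchanged. A change to which branch uses p.hat versus 1 - p.hat would therefore look like a transposition on only some rows.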

@beanumber
Owner

OK, thanks, I will take a look. The dplyr update broke all of my other packages too!

@davidbmitchell
Contributor Author

I bet it's breaking a lot of packages in the R universe. Honestly, I don't think tidyeval is all that tidy. It makes things a lot more convoluted, but I'm also just not used to it yet.

@davidbmitchell
Contributor Author

I think I found where the change could have occurred: the commit that closed issue #92 might have done it. That makes me believe that the NewMayProcessed output above is actually correct, but I'm still not 100% sure.
