Methods that do not pass strict tests #450

alexpghayes · 2018-07-25T20:54:21Z

thisisnic · 2018-07-26T06:58:22Z

I'd love to have a go at fixing one or two of these as my first contribution!

ellessenne · 2018-07-27T16:09:01Z

Hi @alexpghayes, I can deal with the methods for the mjoint class 😉

alexpghayes · 2018-07-27T16:14:13Z

@thisisnic I'm in the process of adding a couple more tests, and will follow up with you once I've got them written!

@ellessenne Awesome, much appreciated. There isn't much doc for the new test system but if you run devtools::install_github("tidymodels/broom") I believe you should get everything you need. Let me know if you run into any problems!

Thanks to both of you!

thisisnic · 2018-07-31T08:02:38Z

@alexpghayes - I have submitted a PR for an update I made to tidy.survdiff, but I'll hold off any more for the moment until you confirm I've done everything I need to do properly, given this is my first contribution! :)

ellessenne · 2018-07-31T13:16:40Z

@alexpghayes see PR #462

I am writing here as well as it may interesting to a larger audience: any idea on how to deal with column names generated dynamically from model calls (and therefore not included in column_glossary from modeltests, causing the strict = TRUE requirements to fail)?

alexpghayes · 2018-07-31T17:16:06Z

Git

It's better to work on a new branch. This will avoid duplicated commits when I merge. For this PR I'll just squash merge, but in the future using a new branch is always a good idea.

Column names not in glossary

Preferences:

Sometimes the data isn't tidy and the tidier just needs a gather or similar. This is the case for augment.mjoint() (although I suspect we'll need a gather-separate-spread like here to simultaneously tidy the residuals and the fitted values).
Use strict = FALSE with a comment explaining what test fails and why that is okay. This is probably the best for glance.mjoint() at the moment.

library(tidyverse)
library(joineRML)

data(heart.valve)
hvd <- heart.valve[!is.na(heart.valve$log.grad) & !is.na(heart.valve$log.lvmi), ]

set.seed(1)
fit1 <- mjoint(formLongFixed = log.lvmi ~ time + age,
               formLongRandom = ~ time | num,
               formSurv = Surv(fuyrs, status) ~ age,
               data = hvd,
               timeVar = "time",
               control = list(nMCscale = 2, burnin = 5)) # controls for illustration only
au <- augment(fit1)
gather(au, outcome, resid, contains("resid"))

Aside: it may be better to report sigma rather than sigma2 since broom tidiers typically report standard errors/deviations rather than variances.

Documentation

For breaking changes like this it's important to explicitly document all changes in NEWS. It'll probably break some code, and we want to make it clear what people need to do to fix that code.

I'm going to try and get the build working and then I'll take another look at this later today/tomorrow.

alexpghayes · 2018-07-31T17:16:41Z

@thisisnic Will take a look once I get the build going again!

ellessenne · 2018-08-01T09:18:26Z

@alexpghayes thanks for the feedback, I really appreciate it!

Apologies again for that bad pull request - I don't know what I was thinking 😅 I'll close #462 and open a new one properly so you don't have to squash merge.

Regarding the columns names not in the glossary: the input dataset has longitudinal data, which is not in tidy format if you have multiple longitudinal outcomes (generally).
A standard (I think, this is just my anecdotal experience) way of storing longitudinal data is as follows:

      id  time      y1    y2
   <int> <dbl>   <dbl> <int>
 1     1 0.900 -0.741      0
 2     1 1.33   0.115      2
 3     1 2.03   0.0974     5
 4     1 4.76   0.489      3
 5     1 5.42  -0.467      8
 6     1 7.94   0.387      2
 7     2 0.697 -1.50       3
 8     2 1.33  -0.498      3
 9     2 2.30  -0.237      4
10     2 3.66  -0.294      7
11     2 6.41   1.69       3
12     2 6.43   0.0995     2
13     3 1.04  -0.325      1
14     3 2.84   1.08       2
15     3 5.15  -0.521      1

In this example, y1 and y2 are two separate longitudinal outcomes recorded at each time. joineRML works with this kind of input data, hence returning a "strictly" tidy data frame would not return the same input data the user expects. I like the gather-separate-spread approach and I can implement it directly, do you think it would not be confusing for the user to have a "strictly" tidy data frame out of augment.mjoint?

Finally, glance.mjoint: I agree with all that you mentioned - I will make the changes you suggested!

alexpghayes · 2018-08-01T16:52:34Z

Not a problem in the slightest! I'm really grateful for all the effort you're putting into contributing!

For augment.mjoint() let's go with strictly tidy data. This may be slightly surprising to users, but if we document it carefully it'll be more consistent with the rest of broom. I think it's better for users to spread tidier output rather than gather it, and there are a number of places across broom where I want to make similar changes.

michaelpawlus · 2018-08-30T16:43:58Z

Hi Alex. Thanks for making some of these beginner friendly issues and being willing to help those of us who want to get started contributing to open source projects. I'd be happy to try to work on tidy.map as my first contribution.

jmuhlenkamp · 2018-11-04T01:34:14Z

@alexpghayes I'm interested in helping. Dug around in some of the code and found the first one I'd like to try (possibly more after this). Can you confirm my strategy here makes sense before I open a PR? And anything else I should consider beforehand?

tidy.ridgelm

This strict test fails because of the xm column on the output (only when a single lamba is used). Furthermore, the columns are inconsistent between when a user feeds in a sequence of lambas or not (GCV is not returned in the single lambda case).

library(broom)
names(longley)[1] <- "y"
fit1 <- MASS::lm.ridge(y ~ ., longley)
colnames(tidy(fit1))
## [1] "lambda"   "term"     "estimate" "scale"    "xm" 

fit2 <- MASS::lm.ridge(y ~ ., longley, lambda = seq(0.001, .05, .001))
colnames(tidy(fit2))
## [1] "lambda"   "GCV"      "term"     "estimate" "scale"

My proposed strategy:

For the first case above, remove xm column from output and add GCV column. This will make the output columns match the second case and also match the documentation.
Update the test to remove strict = FALSE
Update the NEWS.md

Should I also open a separate issue as a paper trail that I can reference within the NEWS? A la the "- Bug fix for tidy.polr for incorrectly using colnames. (#498)" currently in the dev NEWS?

Let me know what you think and I'd be happy to open issue and/or PR asap. And I'm likely interested in tackling more of these as well in the coming days/weeks.

jmuhlenkamp · 2018-11-20T02:33:49Z

Can ignore my previous comment, I went ahead and opened a PR #533 and an issue to explain it (which is linked into the NEWS) #532.

ghost · 2018-11-28T23:11:04Z

@alexpghayes I'm also interested in helping
I thought about tidy.density but I don't know where to start, I can't find the tidy() function in broom, then the modeltests package seems to be still not available and it seems a problem in the column_glossary with the missing method

karissawhiting · 2018-12-06T18:47:48Z

Hi @alexpghayes. I would like to help with some of the tidy.htest strict = TRUE issues.

For tidy.htest/oneway.test in tests/testthat/test-stats-htest.R the issue seems to be that the output of stats::oneway.test has num df and denom df as parameters, but these are not in the column glossary.

Also, these column names should probably be edited to be syntactically valid. In R/stats-htest-tidiers.R I see that there is a message indicating that columns will be renamed to num.df and denom.df if (length(x$parameter) > 1) is TRUE, however I don’t think this renaming actually happens as far as I can tell (although I may be missing something).

x <- oneway.test(extra ~ group, data = sleep)
ret <- x[c("estimate", "statistic", "p.value", "parameter")]

# parameter may have multiple values as well, such as oneway.test
if (length(x$parameter) > 1) {
  ret$parameter <- NULL
  if (is.null(names(x$parameter))) {
    warning("Multiple unnamed parameters in hypothesis test; dropping them")
  } else {
    message(
      "Multiple parameters; naming those columns ",
      paste(make.names(names(x$parameter)), collapse = ", ")
    )
    ret <- append(ret, x$parameter, after = 1)
  }
}
#> Multiple parameters; naming those columns num.df, denom.df
names(ret)
#> [1] NA          "num df"    "denom df"  "statistic" "p.value"

If that is in fact the case, I propose the addition of the following to R/stats-htest-tidiers.R :

x <- oneway.test(extra ~ group, data = sleep)
ret <- x[c("estimate", "statistic", "p.value", "parameter")]

# parameter may have multiple values as well, such as oneway.test
if (length(x$parameter) > 1) {
  ret$parameter <- NULL
  if (is.null(names(x$parameter))) {
    warning("Multiple unnamed parameters in hypothesis test; dropping them")
  } else {
    message(
      "Multiple parameters; naming those columns ",
      paste(make.names(names(x$parameter)), collapse = ", ")
    )
    names(x$parameter) <- make.names(names(x$parameter))
    ret <- append(ret, x$parameter, after = 1)
  }
}
#> Multiple parameters; naming those columns num.df, denom.df
names(ret)
#> [1] NA          "num.df"    "denom.df"  "statistic" "p.value"

Also I propose that denom.df be added to the column glossary. num.df is already in the column glossary, so these changes should allow this to pass the strict test.

If this is an acceptable solution, I will create a branch and PR for the change in R/stats-htest-tidiers.R.

Thank you for tagging these as beginner. This is my first contribution, so any feedback is welcome!

alexpghayes · 2018-12-06T19:01:02Z

Apologies for the delays in responding -- school has been busy! Thanks everybody for your interest!

@michaelpawlus If you're still interested, tackling tidy.map() would be awesome.

@datascientistmc tidy.density() might not be the best place to start. I'm don't have a clear vision for what needs to happen there at the moment. I recommend another one of the tidy.* methods instead.

@karissawhiting That sounds great! Would you open a PR and then we can continue discussion there?

Before y'all get started I encourage you to read:

The fantastic advice by @thisisnic on Git / PRs
This vignette on how to contribute to broom

If you run into issues, please open a new issue / PR describing the problem you're tackling and tag me! I'll try to get back to you a bit faster than I have been -- at the moment finals are about to suck up a bunch of my time, but I should be much more responsive over winter break!

thisisnic · 2018-12-11T08:14:34Z

Got some time off around this weekend, and wanted to get through some more of these, so I'll be looking at tidy.survreg next.

thisisnic · 2019-01-16T16:18:18Z

@alexpghayes:

The reason that tidy.manova() fails tests when strict = TRUE is because the output contains a column called either pillai, wilks, hl, or roy, depending on which of these statistics the user has selected to output.

Given that the output is in a tidy format, and these column names are fixed/predictable values, should I submit a PR to modeltests to add these to the column glossary, rather than setting strict = FALSE?

github-actions · 2021-06-17T00:07:55Z

This issue has been automatically closed due to inactivity.

github-actions · 2021-07-02T00:03:53Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

alexpghayes added help wanted consistency/specification beginner-friendly tests labels Jul 25, 2018

thisisnic mentioned this issue Jul 31, 2018

Update tidier for survival::survdiff #461

Closed

jmuhlenkamp mentioned this issue Nov 20, 2018

Columns in tidy.ridgelm are inconsistent and do not pass strict tests #532

Closed

jmuhlenkamp mentioned this issue Nov 28, 2018

Meet specification more fully (i.e. enable more strict tests) #544

Closed

karissawhiting mentioned this issue Dec 7, 2018

Make column names of output of htest-oneway tidier syntactically valid #549

Merged

alexpghayes removed the help wanted label Mar 3, 2019

github-actions bot added the stale issue label Jun 17, 2021

github-actions bot closed this as completed Jun 17, 2021

github-actions bot locked and limited conversation to collaborators Jul 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Methods that do not pass strict tests #450

Methods that do not pass strict tests #450

alexpghayes commented Jul 25, 2018 •

edited

thisisnic commented Jul 26, 2018 •

edited

ellessenne commented Jul 27, 2018

alexpghayes commented Jul 27, 2018

thisisnic commented Jul 31, 2018

ellessenne commented Jul 31, 2018

alexpghayes commented Jul 31, 2018

alexpghayes commented Jul 31, 2018

ellessenne commented Aug 1, 2018

alexpghayes commented Aug 1, 2018

michaelpawlus commented Aug 30, 2018

jmuhlenkamp commented Nov 4, 2018

jmuhlenkamp commented Nov 20, 2018

ghost commented Nov 28, 2018

karissawhiting commented Dec 6, 2018 •

edited

alexpghayes commented Dec 6, 2018

thisisnic commented Dec 11, 2018 •

edited

thisisnic commented Jan 16, 2019

github-actions bot commented Jun 17, 2021

github-actions bot commented Jul 2, 2021

Methods that do not pass strict tests #450

Methods that do not pass strict tests #450

Comments

alexpghayes commented Jul 25, 2018 • edited

thisisnic commented Jul 26, 2018 • edited

ellessenne commented Jul 27, 2018

alexpghayes commented Jul 27, 2018

thisisnic commented Jul 31, 2018

ellessenne commented Jul 31, 2018

alexpghayes commented Jul 31, 2018

alexpghayes commented Jul 31, 2018

ellessenne commented Aug 1, 2018

alexpghayes commented Aug 1, 2018

michaelpawlus commented Aug 30, 2018

jmuhlenkamp commented Nov 4, 2018

jmuhlenkamp commented Nov 20, 2018

ghost commented Nov 28, 2018

karissawhiting commented Dec 6, 2018 • edited

alexpghayes commented Dec 6, 2018

thisisnic commented Dec 11, 2018 • edited

thisisnic commented Jan 16, 2019

github-actions bot commented Jun 17, 2021

github-actions bot commented Jul 2, 2021

alexpghayes commented Jul 25, 2018 •

edited

thisisnic commented Jul 26, 2018 •

edited

karissawhiting commented Dec 6, 2018 •

edited

thisisnic commented Dec 11, 2018 •

edited