diff --git a/DESCRIPTION b/DESCRIPTION index 2f783a6..2e80fe2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,8 +1,9 @@ Package: arsenal Title: An Arsenal of 'R' Functions for Large-Scale Statistical Summaries -Version: 0.1.2 -Date: 2016-12-29 +Version: 0.2.0 +Date: 2017-01-31 +URL: https://cran.r-project.org/package=arsenal Authors@R: c( person("Ethan", "Heinzen", email = "heinzen.ethan@mayo.edu", role = c("aut", "cre")), person("Jason", "Sinnwell", role="aut"), @@ -10,6 +11,8 @@ Authors@R: c( person("Tina", "Gunderson", role="aut"), person("Gregory", "Dougherty", role="aut"), person("Patrick", "Votruba", role="ctb"), + person("Ryan", "Lennon", role="ctb"), + person("Krista", "Goergen", role="ctb"), person("Emily", "Lundt", role="ctb") ) Description: An Arsenal of 'R' functions for large-scale statistical summaries, @@ -20,22 +23,24 @@ Description: An Arsenal of 'R' functions for large-scale statistical summaries, variable; modelsum(), which performs simple model fits on the same endpoint for many variables (univariate or adjusted for standard covariates); and freqlist(), a powerful frequency table across many categorical variables. -Suggests: knitr, rmarkdown, xtable, survival, testthat, coin, pROC, - MASS, gam, rpart +Suggests: knitr, rmarkdown, xtable, pander, survival, testthat, coin, + pROC, MASS, gam, rpart Depends: R (>= 3.2.0), stats (>= 3.2.0) Imports: broom, stringr VignetteBuilder: knitr License: GPL (>= 2) RoxygenNote: 5.0.1 NeedsCompilation: no -Packaged: 2016-12-29 15:03:54 UTC; m144326 +Packaged: 2017-01-31 23:21:39 UTC; m144326 Author: Ethan Heinzen [aut, cre], Jason Sinnwell [aut], Elizabeth Atkinson [aut], Tina Gunderson [aut], Gregory Dougherty [aut], Patrick Votruba [ctb], + Ryan Lennon [ctb], + Krista Goergen [ctb], Emily Lundt [ctb] Maintainer: Ethan Heinzen Repository: CRAN -Date/Publication: 2016-12-30 01:18:42 +Date/Publication: 2017-02-01 00:41:55 diff --git a/MD5 b/MD5 index cbe9746..a030b39 100644 --- a/MD5 +++ b/MD5 @@ -1,43 +1,46 @@ -966d309a0df5d55be152e36454793ba7 *DESCRIPTION -3f62d030021234740076da4d1ba8f002 *NAMESPACE -a5124b7fb1a608cf22414d0d439c5893 *NEWS.md -babe9da5c34ab0be1462ebb582b67f01 *R/arsenal.R +64ca95954a157e40c441927a12517109 *DESCRIPTION +163a641c4257c9de5b13411233a96e6d *NAMESPACE +ed12d43b046ac8b13ad042d42dd7dcf4 *NEWS.md +9e5c80edccdf89f7401dc898be315b6a *R/arsenal.R 79e2033244f329fa85d87cbc4013f678 *R/formulize.R -cfa9ea4571efbb3e9de3bc79dcc56041 *R/freqlist.R +9b361e8234e2f7dbdb5bf8abad07e12a *R/freqlist.R e42ce669505128b0f0b20ac6c621e91d *R/freqlist.internal.R -02a7e6a76fe38503ed55d6d0da6e12dd *R/internal.functions.R +75cce2779e3a321a78e8cd56f6e42b6a *R/internal.functions.R 2f474534e9fc946996d331ad8103b5ea *R/magic8.R e54a23c0326058d798de6b3c912a8003 *R/mdy.Date.R 1de6a042544ab82b1d3e5d5264e3ffe9 *R/mockstudy.R -8a60fb9fd3e6b75451fd52a671343fdf *R/modelsum.R +db9d7550daea1f2251908395723b6606 *R/modelsum.R 8972abaedd55f0c7f1418ad70f233302 *R/modelsum.control.R 8c879e7ce937bdbedb5fd3fb31a11d3f *R/modelsum.internal.R 0e56f25775a178e261295319e70889bd *R/not.in.R 50cd5149107450c83f3e3f855aeb57ad *R/release_questions.R 915d58ab60615cd9c66bfc92131cfba1 *R/summary.freqlist.R -1dff52d601070cac366b7123de8f8680 *R/summary.modelsum.R +e3eaad9fb39c372f2ed4aeaddf303289 *R/summary.modelsum.R 318e7c0c95e3737f5c209a9ba9ebea7b *R/summary.tableby.R -918558ed28af0fd4f0338c9d9787b926 *R/tableby.R +5df40ef875f0a8d2d7ffdca177c051d5 *R/tableby.R 6cd26e41a5b57be0697d1e98594fb16d *R/tableby.control.R 4b5df64d48a8f090b29f7eb55008b5db *R/tableby.internal.R 9bb29fd9186f2e7c11351693f70aa2e8 *R/tableby.stat.tests.R d795cb92f9d3e5ec846b31c0579badd3 *R/tableby.stats.R -872993fdbe93e6707fae16f6e0a0c99f *R/write2.R -018618b1d1fb15a1945e7892e2868204 *README.md -4aafc7bc18223c036874a46def9050bc *build/vignette.rds +56b73ad8b0d8ae2da6a8cd8e0934749e *R/write2.R +9978346e6d56e7fcdf127d79f01a7649 *README.md +e91dac73925af6b800aafec7b3780ab6 *build/vignette.rds 30cd2d9369ee4d94dd76dcbf5a215559 *data/mockstudy.RData -79433ae6628afba73b4751b5e6bbb32e *inst/doc/freqlist.R -b90b634df89f2a4dc597b834d4a0457d *inst/doc/freqlist.Rmd -197db4ba70c32f0a3579ba56e71b8c7b *inst/doc/freqlist.pdf -2be512c16c57f0c19d46a03559fa66ed *inst/doc/modelsum.R -cfe0b0bd54ea0d5dd64b179d025a5621 *inst/doc/modelsum.Rmd -85570c3c621adb383c345f2e09bd5a1e *inst/doc/modelsum.html -48ac3590253a78deafa97b9c8a30ba4f *inst/doc/tableby.R -0f686caa3862dd071b6538bd17b5d0c8 *inst/doc/tableby.Rmd -78a397ba49bf53164ac6bf6eb5278c0b *inst/doc/tableby.html -fb7f3dd8cf45309fc533d5cb863271b8 *man/arsenal.Rd +6426e345a14c9dec20d62c6456c4e9f0 *inst/doc/freqlist.R +d81e95828e521941e139482c962e9710 *inst/doc/freqlist.Rmd +37d87dba1400ad18ec9abbba8dda1cbf *inst/doc/freqlist.html +92e0994599563842bd6a019ec6ea4102 *inst/doc/modelsum.R +7b5d1fa9bc0c43d57a226738d4b54c7f *inst/doc/modelsum.Rmd +b540f480e5ac4f64684563ef0e97666d *inst/doc/modelsum.html +a68ec9f4260ce8c2c93e5a85f6c673e2 *inst/doc/tableby.R +d19754caf6254260ec847743d8b228db *inst/doc/tableby.Rmd +eb330c6faaa31ef5f91bcf135056faad *inst/doc/tableby.html +2be7ca8e41f22208b476876029e80657 *inst/doc/write2.R +403d798936814c679eeaa22ea6180ff7 *inst/doc/write2.Rmd +2301d10e20887d78abda3bdd2d52d5b3 *inst/doc/write2.html +33f826e2f2c76b4f19e4f032d67fac58 *man/arsenal.Rd a08790dee110e6f91216f1c9f7b9888e *man/as.data.frame.freqlist.Rd -fb7c429f3c12092d0fad2dc88754fd8f *man/as.data.frame.modelsum.Rd +cedee07e345ad606f41d35fb7c5c208c *man/as.data.frame.modelsum.Rd 1326c71d7954bf79cd26895f0aedd30f *man/as.data.frame.tableby.Rd 2133b459a58b59f77b1e835fafc009e6 *man/formulize.Rd c15c3a5f78637e5ac782800687fd2289 *man/freqlist.Rd @@ -45,22 +48,31 @@ b149f67287b2605109b516fbd9ef8f20 *man/freqlist.internal.Rd 5148640645d5b9f0de27a83ba4ed6788 *man/grapes-nin-grapes.Rd 2e1f8c2d941134f65a82f84346247458 *man/mdy.Date.Rd 4cd57758131c3d1dffce65879eb34e76 *man/mockstudy.Rd -fb624ea9b801473dae9d30eefe8a1047 *man/modelsum.Rd +1d876b7eb1a077a5220759b72535f65c *man/modelsum.Rd ee7241120bee38b7c92d9160f86699e0 *man/modelsum.control.Rd 56b926cfc2d81c0539e70ca8a4d68794 *man/modelsum.internal.Rd 562faf3ed641a245644af5887f5704aa *man/summary.freqlist.Rd -1a749d94bb67684290e8d11927700f0d *man/summary.modelsum.Rd +0ab1c3227e5d79e9a4419e0546fbaa77 *man/summary.modelsum.Rd b03d9fe7d9940a052a7b539b58a4f2c5 *man/summary.tableby.Rd -d8894b3a8db813af90570d664d3009dd *man/tableby.Rd +43e9f7d6d6a5c5316f98dc1bdc78e3c1 *man/tableby.Rd e60ebe8ed16f9112a141ec2b24574b10 *man/tableby.control.Rd c00a698159af17705b7306ce44e2070b *man/tableby.internal.Rd a7d8a4fca049fadfeb6be460b35a209f *man/tableby.stats.Rd -6d0a923e8398c923744d26bcbb44ddc3 *man/write2.Rd +fa58b2bb17f00974410a7d06f6b20b83 *man/write2.Rd +013f6556b9cd188c26365af8f42c1ec7 *man/write2specific.Rd b86d96d2b720f7aeeb3b9412b87d67ef *tests/testthat.R -ba8ab9b99798b694a5f5b06b9a28f2d8 *tests/testthat/test_formulize.R -6811553415304e4626ae831d29c1d680 *tests/testthat/test_freqlist.R -d4732617af6ee210bbdc165014b5a4b2 *tests/testthat/test_modelsum.R -357806abad78c1b6ca0c358d6d4d9b4a *tests/testthat/test_tableby.R -b90b634df89f2a4dc597b834d4a0457d *vignettes/freqlist.Rmd -cfe0b0bd54ea0d5dd64b179d025a5621 *vignettes/modelsum.Rmd -0f686caa3862dd071b6538bd17b5d0c8 *vignettes/tableby.Rmd +1c0b875954db84c2384bf6d753d50146 *tests/testthat/test_formulize.R +2943f65b203a330bfde7f22fbd0967d9 *tests/testthat/test_freqlist.R +8016a1934078062f9128e70b1943a8fb *tests/testthat/test_modelsum.R +d37c1af2099852398edf4aa01295a6bf *tests/testthat/test_tableby.R +1c732994556af5cd482c218cf6c039d8 *tests/testthat/test_write2.R +f177f2b74b4f9f751e8edbb6a9fad712 *tests/testthat/write2.freqlist.html +0a7b3e0ee537b49cf7ed5c11b1a4fecd *tests/testthat/write2.kable.html +83bfb15f5b9103e625c16cbccbfad808 *tests/testthat/write2.modelsum.html +61cdf411f2af5411f89903cba990ea9e *tests/testthat/write2.pander.html +f43dc701ee19bf489fa31c12c1f9ffb1 *tests/testthat/write2.tableby.html +f00648a7b68d7c41a326ef44d2129152 *tests/testthat/write2.xtable.html +d81e95828e521941e139482c962e9710 *vignettes/freqlist.Rmd +7b5d1fa9bc0c43d57a226738d4b54c7f *vignettes/modelsum.Rmd +d19754caf6254260ec847743d8b228db *vignettes/tableby.Rmd +403d798936814c679eeaa22ea6180ff7 *vignettes/write2.Rmd diff --git a/NAMESPACE b/NAMESPACE index c522040..ebd0277 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -21,6 +21,13 @@ S3method(summary,freqlist) S3method(summary,modelsum) S3method(summary,tableby) S3method(tests,tableby) +S3method(write2,character) +S3method(write2,default) +S3method(write2,freqlist) +S3method(write2,knitr_kable) +S3method(write2,modelsum) +S3method(write2,tableby) +S3method(write2,xtable) export("%nin%") export("labels<-") export(Date.mdy) @@ -44,6 +51,7 @@ export(q1q3) export(tableby) export(tableby.control) export(tests) +export(write2) export(write2html) export(write2pdf) export(write2word) diff --git a/NEWS.md b/NEWS.md index 74248d6..7f0fa55 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,12 @@ +# arsenal 0.2.0 + +* Vignettes have been updated. + +* `write2()` is now exported and supports all output formats supported by `rmarkdown::render()`. There is now a vignette for it + and the S3 methods have been expanded to handle more inputs, including `knitr::kable()`, `xtable::xtable()`, and `pander::pander_return()`. + +* Fixed a bug in `summary.modelsum()`. + # arsenal 0.1.2 * `broom` and `stringr` have been moved to `imports` instead of `depends`. diff --git a/R/arsenal.R b/R/arsenal.R index 361dbac..7a1527c 100644 --- a/R/arsenal.R +++ b/R/arsenal.R @@ -6,6 +6,8 @@ #' An Arsenal of 'R' functions for large-scale statistical summaries, #' which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and #' which use formulas and versatile summary statistics for summary tables and models. +#' +#' The package download, NEWS, and README are available on CRAN: \url{https://cran.r-project.org/package=arsenal} #' #' @section Functions: #' @@ -19,6 +21,8 @@ #' #' \code{\link{write2word}}, \code{\link{write2html}}, \code{\link{write2pdf}}: Functions to generate a word, html, or pdf document containing a single table. #' +#' \code{\link{write2}}: Functions to generate a single document containing a single table. (Also the S3 backbone behind the \code{write2*} functions.) +#' #' \code{\link{formulize}}: A shortcut to generate one-, two-, or many-sided formulas. #' #' \code{\link{mdy.Date}} and \code{\link{Date.mdy}}: Convert numeric dates for month, day, and year to Date object, and vice versa. @@ -36,7 +40,6 @@ NULL #### commands to build the package using devtools -# devtools::document() # devtools::check_man() # devtools::test() # devtools::check() diff --git a/R/freqlist.R b/R/freqlist.R index f1a3748..245bacb 100644 --- a/R/freqlist.R +++ b/R/freqlist.R @@ -131,7 +131,7 @@ freqlist <- function(tab, sparse = FALSE, na.options = c('include', 'showexclude #' @export print.freqlist <- function(x, ...) { - cat("Freqlist object\n\n") + cat("Freqlist Object\n\n") cat(ncol(x$freqlist) - 4, " variables:\n", sep = "") print(colnames(x$freqlist)[1:(ncol(x$freqlist) - 4)]) invisible(x) diff --git a/R/internal.functions.R b/R/internal.functions.R index bebc788..97d9fda 100644 --- a/R/internal.functions.R +++ b/R/internal.functions.R @@ -498,7 +498,7 @@ makePaddedStr <- function(startStr, size, padChar = ' ') { } # See if can split the string on a reasonable boundary character - for (split in c(" ", "\t", "_", "-", "*", ".", ";", ":")) { + for (split in c(" ", "\t", "_", "-",".", ";", ":")) { results <- strsplit(startStr, split, fixed = TRUE)[[1]] if ((length(results) > 1) && (minStrLen(results) <= size)) return(pastePaddedStr(results, size, sep = split, padChar = padChar, appendSep = TRUE)) diff --git a/R/modelsum.R b/R/modelsum.R index 187d448..077b60a 100644 --- a/R/modelsum.R +++ b/R/modelsum.R @@ -47,13 +47,14 @@ #' #' data(mockstudy) #' -#' tab1 <- modelsum(bmi ~ sex + age, data=mockstudy) -#' summary(tab1, text=TRUE) +#' tab1 <- modelsum(bmi ~ sex + age, data = mockstudy) +#' summary(tab1, text = TRUE) #' -#' tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, family="gaussian",data=mockstudy) -#' summary(tab2, text=TRUE) +#' tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust = ~ age + sex, +#' family = "gaussian", data = mockstudy) +#' summary(tab2, text = TRUE) #' -#' summary(tab2, show.intercept=FALSE, text=TRUE) +#' summary(tab2, show.intercept = FALSE, text = TRUE) #' #' tab2.df <- as.data.frame(tab2) #' diff --git a/R/summary.modelsum.R b/R/summary.modelsum.R index c4c9744..87215ef 100644 --- a/R/summary.modelsum.R +++ b/R/summary.modelsum.R @@ -14,7 +14,7 @@ modelsum.translations <- list() ## adj.r.squared = "adj.rsq", sex = "Sex", sexM #' @param object The data defining the table to display #' @param title Title for the table, defaults to \code{NULL} (no title) #' @param labelTranslations List where name is the label in the output, and value is the label you -#' want displayed e.g. \code{list (q1q3: "Q1, Q3", medsurv = "Median Survival")}. +#' want displayed e.g. \code{list(q1q3 = "Q1, Q3", medsurv = "Median Survival")}. #' @param digits Maximum number of digits to display for floating point numbers. #' If \code{NA} (default), it uses the value from \code{object$control$digits} #' (whose default is 3, which would result in, e.g., 12.3, 1.23, 0.123, and 0.012). diff --git a/R/tableby.R b/R/tableby.R index 502f0b6..573ca24 100644 --- a/R/tableby.R +++ b/R/tableby.R @@ -111,7 +111,7 @@ #' tab1 <- tableby(arm ~ sex + age, data=mockstudy) #' summary(tab1, text=TRUE) #' -#' mylabels <- list( sex = "SEX", age ="Age, yrs") +#' mylabels <- list(sex = "SEX", age ="Age, yrs") #' summary(tab1, labelTranslations = mylabels, text=TRUE) #' #' tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, diff --git a/R/write2.R b/R/write2.R index cf8b979..0068337 100644 --- a/R/write2.R +++ b/R/write2.R @@ -1,86 +1,220 @@ -#' write2word, write2html, write2pdf +#' write2 #' -#' Functions to generate a word, html, or pdf document containing a single table. +#' Functions to generate a single document containing a single table. (Also the S3 backbone behind the \code{write2*} functions.) #' -#' @param object An object whose \code{summary} output looks "good" when using \code{results='asis'} in markdown. +#' @param object An object. #' @param file A single character string denoting the filename for the output document. -#' @param ... Additional arguments to be passed to \code{summary}, \code{rmarkdown::render}, etc. +#' @param ... Additional arguments to be passed to \code{FUN}, \code{rmarkdown::render}, etc. #' One popular option is to use \code{quiet = TRUE} to suppress the command line output. +#' @param FUN The summary-like or print-like function to use to generate the markdown content. Can be passed as a function or a +#' character string. It's expected that \code{FUN(object, ...)} looks "good" when using \code{results='asis'} in markdown. #' @param keep.md Logical, denoting whether to keep the intermediate \code{.md} file. +#' @param output_format One of the following: +#' \enumerate{ +#' \item{An output format object, e.g. \code{rmarkdown::\link[rmarkdown]{html_document}(...)}.} +#' \item{A character string denoting such a format function, e.g. \code{"html_document"}. In this case, the \code{"..."} are NOT passed.} +#' \item{The format function itself, e.g. \code{rmarkdown::html_document}. In this case, the \code{"..."} arguments are passed.} +#' \item{One of \code{"html"}, \code{"pdf"}, and \code{"word"}, shortcuts implemented here. In this case, the \code{"..."} arguments are passed.} +#' \item{\code{NULL}, in which the output is HTML by default.} +#' } +#' See \code{rmarkdown::\link[rmarkdown]{render}} for details. #' @return \code{object} is returned invisibly, and \code{file} is written. -#' @details This is (kind of) an S3 method (the real S3 method is \code{write2}),and the default -#' (used for \code{\link{tableby}}, \code{\link{modelsum}}, \code{\link{freqlist}}, etc.) assumes -#' that there is a \code{summary} method implemented. -#' -#' To generate the appropriate file type, the default uses one of \code{rmarkdown::word_document}, \code{rmarkdown::html_document}, -#' and \code{rmarkdown::pdf_document} to get the job done. \code{"..."} arguments are passed to these functions, too. -#' @seealso \code{\link[rmarkdown]{render}}, \code{\link[rmarkdown]{word_document}}, \code{\link[rmarkdown]{html_document}}, \code{\link[rmarkdown]{pdf_document}} +#' @details \code{write2} is an S3 method, and the default +#' assumes that there is a \code{FUN} method implemented which looks 'good' in Rmarkdown. +#' +#' There are methods implemented for \code{\link{tableby}}, \code{\link{modelsum}}, and \code{\link{freqlist}}, all of which use the +#' \code{summary} function. There are also methods compatible with \code{\link[knitr]{kable}}, \code{\link[xtable]{xtable}}, +#' and \code{\link[pander]{pander_return}}. +#' @seealso \code{\link{write2word}}, \code{\link{write2pdf}}, \code{\link{write2html}}, +#' \code{\link[rmarkdown]{render}}, \code{\link[rmarkdown]{word_document}}, \code{\link[rmarkdown]{html_document}}, \code{\link[rmarkdown]{pdf_document}}, +#' \code{\link[rmarkdown]{rtf_document}}, \code{\link[rmarkdown]{md_document}}, \code{\link[rmarkdown]{odt_document}} #' @examples #' \dontrun{ #' data(mockstudy) #' # tableby example #' tab1 <- tableby(arm ~ sex + age, data=mockstudy) -#' write2html(tab1, "~/ibm/trash.html") -#' -#' # freqlist example -#' tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany") -#' noby <- freqlist(tab.ex, na.options = "include") -#' write2pdf(noby, "~/ibm/trash2.pdf") -#' -#' # A more complicated example -#' write2word(tab1, "~/ibm/trash.doc", keep.md = TRUE, -#' reference_docx = mystyles.docx, # passed to rmarkdown::word_document +#' write2(tab1, "~/trash.rtf", +#' toc = TRUE, # passed to rmarkdown::rtf_document, though in this case it's not practical #' quiet = TRUE, # passed to rmarkdown::render -#' title = "My cool new title" # passed to summary.tableby +#' title = "My cool new title", # passed to summary.tableby +#' output_format = rmarkdown::rtf_document) #' } #' @author Ethan Heinzen, adapted from code from Krista Goergen #' @name write2 NULL #> NULL + +#' @rdname write2 +#' @export write2 <- function(object, file, ..., keep.md, output_format) { UseMethod("write2") } -write2.default <- function(object, file, ..., keep.md = FALSE, output_format = c("html", "pdf", "word")) +############################ write2 for arsenal objects ############################ + +#' @rdname write2 +#' @export +write2.tableby <- function(object, file, ..., keep.md = FALSE, output_format = NULL) +{ + write2.default(object = object, file = file, FUN = summary, ..., keep.md = keep.md, output_format = output_format) +} + +#' @rdname write2 +#' @export +write2.modelsum <- function(object, file, ..., keep.md = FALSE, output_format = NULL) +{ + write2.default(object = object, file = file, FUN = summary, ..., keep.md = keep.md, output_format = output_format) +} + +#' @rdname write2 +#' @export +write2.freqlist <- function(object, file, ..., keep.md = FALSE, output_format = NULL) +{ + write2.default(object = object, file = file, FUN = summary, ..., keep.md = keep.md, output_format = output_format) +} + +############################ write2 for external objects ############################ + +#' @rdname write2 +#' @export +write2.knitr_kable <- function(object, file, ..., keep.md = FALSE, output_format = NULL) +{ + write2.default(object = object, file = file, FUN = print, ..., keep.md = keep.md, output_format = output_format) +} + +#' @rdname write2 +#' @export +write2.xtable <- function(object, file, ..., keep.md = FALSE, output_format = NULL) +{ + write2.default(object = object, file = file, FUN = print, ..., keep.md = keep.md, output_format = output_format) +} + +## this intended for use with pander_return() +#' @rdname write2 +#' @export +write2.character <- function(object, file, ..., keep.md = FALSE, output_format = NULL) +{ + write2.default(object = object, file = file, FUN = cat, ..., sep = "\n", keep.md = keep.md, output_format = output_format) +} + + +#' @rdname write2 +#' @export +write2.default <- function(object, file, FUN, ..., keep.md = FALSE, output_format = NULL) { if(!is.character(file) || length(file) > 1) stop("'file' argument must be a single character string.") if(!is.logical(keep.md) || length(keep.md) > 1) stop("'keep.md' argument must be a single character string.") - output_format <- match.arg(output_format) + FUN <- match.fun(FUN) - output_format <- if(output_format == "html") rmarkdown::html_document else if(output_format == "pdf") rmarkdown::pdf_document else rmarkdown::word_document - dots <- list(...) - utils::capture.output(summary(object, ...), file = paste0(file, ".md")) + if(is.character(output_format) && length(output_format) > 1) + { + warning("At this point, write2() only supports one output type.") + output_format <- output_format[1] + } - output.args <- dots[names(dots) %in% names(formals(output_format))] + + output_format <- if(is.null(output_format) || identical(output_format, "html")) + { + rmarkdown::html_document + } else if(identical(output_format, "pdf")) + { + rmarkdown::pdf_document + } else if(identical(output_format, "word")) + { + rmarkdown::word_document + } else output_format + + filename <- paste0(file, ".md") + file.create(filename) + dots <- list(...) + if(names(formals(FUN))[1] == "...") # this is when the FUN is, e.g., cat(). Any named arguments would still get cat'd, which we don't want + { + ARGS <- c(list(object), dots[names(dots) %in% names(formals(FUN))]) + utils::capture.output(do.call(FUN, ARGS), file = filename) + } else + { + utils::capture.output(FUN(object, ...), file = filename) + } render.args <- dots[names(dots) %in% names(formals(rmarkdown::render))] - render.args$input <- paste0(file, ".md") - render.args$output_format <- do.call(output_format, output.args) + render.args$input <- filename render.args$output_file <- file + + # if output_format is a function, evaluate it with the ... arguments + # otherwise, just return the character string or list which rmarkdown::render() will handle + render.args$output_format <- if(is.function(output_format)) + { + if("..." %in% names(formals(output_format))) + { + do.call(output_format, dots) + } else do.call(output_format, dots[names(dots) %in% names(formals(output_format))]) + } else output_format + do.call(rmarkdown::render, render.args) - if(!keep.md) system(paste0("rm -f ", file, ".md")) + # This short-circuits if they want to keep the .md file. Otherwise, file.remove() returns a logical about successful file removal + if(!keep.md && !file.remove(filename)) warning("Something went wrong removing the temporary .md file.") + invisible(object) } -#' @rdname write2 +######################################################################################################################################################### +######################################################################################################################################################### +######################################################################################################################################################### + + +#' write2word, write2html, write2pdf +#' +#' Functions to generate a word, html, or pdf document containing a single table. +#' +#' @inheritParams write2 +#' @return \code{object} is returned invisibly, and \code{file} is written. +#' @details +#' To generate the appropriate file type, the \code{write2*} functions use one of \code{rmarkdown::word_document}, \code{rmarkdown::html_document}, +#' and \code{rmarkdown::pdf_document} to get the job done. \code{"..."} arguments are passed to these functions, too. +#' @seealso \code{\link{write2}} +#' @examples +#' \dontrun{ +#' data(mockstudy) +#' # tableby example +#' tab1 <- tableby(arm ~ sex + age, data=mockstudy) +#' write2html(tab1, "~/trash.html") +#' +#' # freqlist example +#' tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany") +#' noby <- freqlist(tab.ex, na.options = "include") +#' write2pdf(noby, "~/trash2.pdf") +#' +#' # A more complicated example +#' write2word(tab1, "~/trash.doc", +#' keep.md = TRUE, +#' reference_docx = mystyles.docx, # passed to rmarkdown::word_document +#' quiet = TRUE, # passed to rmarkdown::render +#' title = "My cool new title") # passed to summary.tableby +#' } +#' @author Ethan Heinzen, adapted from code from Krista Goergen +#' @name write2specific +NULL +#> NULL + +#' @rdname write2specific #' @export write2word <- function(object, file, ..., keep.md = FALSE) { write2(object, file, ..., keep.md = keep.md, output_format = "word") } -#' @rdname write2 +#' @rdname write2specific #' @export write2pdf <- function(object, file, ..., keep.md = FALSE) { write2(object, file, ..., keep.md = keep.md, output_format = "pdf") } -#' @rdname write2 +#' @rdname write2specific #' @export write2html <- function(object, file, ..., keep.md = FALSE) { diff --git a/README.md b/README.md index ab4c939..9e35f7c 100644 --- a/README.md +++ b/README.md @@ -4,35 +4,36 @@ The goal of `library(arsenal)` is to make statistical reporting easy. It include in his/her "arsenal" of functions. There are, at this time, 3 main functions, documented below. Each of these functions is motivated by a local SAS macro of similar functionality. -## The `tableby` Function +## The `tableby()` Function -`tableby` is a function to easily summarize a set of independent variables by a categorical variable. +`tableby()` is a function to easily summarize a set of independent variables by a categorical variable. Optionally, an appropriate test is performed to test the distribution of the independent variables across -the levels of the categorical variable. Options for this function are easily controled using `tableby.control`. +the levels of the categorical variable. Options for this function are easily controled using `tableby.control()`. -The `tableby` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary` function. -Other S3 methods are implemented for objects of class `"tableby"`, including `print`, `[`, `as.data.frame`, and `merge`. +The `tableby()` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary()` function. +Other S3 methods are implemented for objects of class `"tableby"`, including `print()`, `[`, `as.data.frame()`, and `merge()`. -## The `modelsum` Function +## The `modelsum()` Function -`modelsum` is a function to fit and summarize models for each independent variable with a response variable, -with options to adjust by variables for each model. Options for this function are easily controled using `modelsum.control`. +`modelsum()` is a function to fit and summarize models for each independent variable with a response variable, +with options to adjust by variables for each model. Options for this function are easily controled using `modelsum.control()`. -The `modelsum` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary` function. -Other S3 methods are implemented for objects of class `"modelsum"`, including `print` and `as.data.frame`. +The `modelsum` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary()` function. +Other S3 methods are implemented for objects of class `"modelsum"`, including `print()` and `as.data.frame()`. -## The `freqlist` Function +## The `freqlist()` Function -`freqlist` is a function to approximate the output from SAS's `PROC FREQ` procedure when using the `/list` option of the `TABLE` statement. +`freqlist()` is a function to approximate the output from SAS's `PROC FREQ` procedure when using the `/list` option of the `TABLE` statement. -The `freqlist` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary` function. -Other S3 methods are implemented for objects of class `"freqlist"`, including `print` and `as.data.frame`. +The `freqlist()` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary()` function. +Other S3 methods are implemented for objects of class `"freqlist"`, including `print()` and `as.data.frame()`. ## Other Notable Functions -* `write2word`, `write2pdf`, and `write2html` are functions to output any of the above objects into a document. They're a shortcut for - "I just want to output this one table but I don't want to open an Rmarkdown script, ugh..." +* `write2word()`, `write2pdf()`, and `write2html()` are functions to output an object into a document, much like SAS's `ODS` procedure. + They're a shortcut for "I just want to output this one table but I don't want to open an Rmarkdown script, ugh..." + The S3 method behind them is `write2()`. -* `formulize` is a shortcut to collapse variable names into a formula. +* `formulize()` is a shortcut to collapse variable names into a formula. -* `mdy.Date` and `Date.mdy` convert numeric dates for month, day, and year to Date object, and vice versa. +* `mdy.Date()` and `Date.mdy()` convert numeric dates for month, day, and year to Date object, and vice versa. diff --git a/build/vignette.rds b/build/vignette.rds index 7d4268a..ff5ce58 100644 Binary files a/build/vignette.rds and b/build/vignette.rds differ diff --git a/inst/doc/freqlist.R b/inst/doc/freqlist.R index 3989afb..699a1c1 100644 --- a/inst/doc/freqlist.R +++ b/inst/doc/freqlist.R @@ -2,12 +2,10 @@ knitr::opts_chunk$set(echo = TRUE, tidy.opts=list(width.cutoff=80), tidy=TRUE, comment=NA) options(width=80, max.print=1000) +## ----message = FALSE---------------------------------------------------------- require(arsenal) -# source("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/R/freqlist.R") -# source("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/R/summary.freqlist.R") -# source("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/R/freqlist.internal.R") -## ----loading and setting up data---------------------------------------------- +## ----loading.data------------------------------------------------------------- # load the data data(mockstudy) @@ -23,7 +21,7 @@ noby <- freqlist(tab.ex) str(noby) # view the data frame portion of freqlist output -noby[["freqlist"]] +noby[["freqlist"]] ## or use as.data.frame(noby) ## ---- results = 'asis'-------------------------------------------------------- summary(noby) @@ -31,6 +29,9 @@ summary(noby) ## ---- results = 'asis'-------------------------------------------------------- summary(noby, caption="Basic freqlist output") +## ----------------------------------------------------------------------------- +head(as.data.frame(noby)) + ## ----labelTranslations, results = 'asis'-------------------------------------- withnames <- freqlist(tab.ex, labelTranslations = c("Treatment Arm","Gender","LASA QOL"), digits = 0) @@ -50,7 +51,7 @@ summary(freqlist(tab.ex, na.options="include")) summary(freqlist(tab.ex, na.options="showexclude")) summary(freqlist(tab.ex, na.options="remove")) -## ----frequency counts, results='asis'----------------------------------------- +## ----freq.counts, results='asis'---------------------------------------------- withby <- freqlist(tab.ex, groupBy = c("arm","sex")) summary(withby) @@ -65,14 +66,12 @@ summary(noby) ## ---- results = 'asis'-------------------------------------------------------- summary(noby, labelTranslations = c("Hi there", "What up", "Bye")) -## ----xtable setup------------------------------------------------------------- +## ----xtable.setup------------------------------------------------------------- require(xtable) -#turn off xtable header -options(xtable.comment = FALSE) -#set up custom function for xtable text +# set up custom function for xtable text italic <- function(x){ -paste0('{\\emph{ ', x, '}}') +paste0('', x, '') } @@ -83,7 +82,7 @@ xftbl <- xtable(noby[["freqlist"]], # change the column names names(xftbl)[1:3] <- c("Arm", "Gender", "LASA QOL") -print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE) +print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE, type = "html", comment = FALSE) ## ----------------------------------------------------------------------------- # base table default removes NAs diff --git a/inst/doc/freqlist.Rmd b/inst/doc/freqlist.Rmd index 74111f0..61b9e75 100644 --- a/inst/doc/freqlist.Rmd +++ b/inst/doc/freqlist.Rmd @@ -3,12 +3,9 @@ title: "The freqlist function" author: "Tina Gunderson" date: '`r format(Sys.time(),"%d %B, %Y")`' output: - pdf_document: + rmarkdown::html_vignette: toc: yes toc_depth: 3 - html_document: - toc: yes - toc_depth: '3' header-includes: \usepackage{tabularx} vignette: | %\VignetteIndexEntry{The freqlist function} @@ -19,11 +16,6 @@ vignette: | ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, tidy.opts=list(width.cutoff=80), tidy=TRUE, comment=NA) options(width=80, max.print=1000) - -require(arsenal) -# source("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/R/freqlist.R") -# source("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/R/summary.freqlist.R") -# source("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/R/freqlist.internal.R") ``` \newpage @@ -34,6 +26,10 @@ require(arsenal) `freqlist` provides options for handling missing or sparse data and can provide cumulative counts and percentages based on subgroups. It depends on the `knitr` package for printing. +```{r message = FALSE} +require(arsenal) +``` + ## Sample dataset For our examples, we'll load the `mockstudy` data included with this package and use it to create a basic table. @@ -41,7 +37,7 @@ Because they have fewer levels, for brevity, we'll use the variables arm, sex, a We'll retain NAs in the table creation. See the appendix for notes regarding default NA handling and other useful information regarding tables in R. -```{r loading and setting up data} +```{r loading.data} # load the data data(mockstudy) @@ -68,7 +64,7 @@ noby <- freqlist(tab.ex) str(noby) # view the data frame portion of freqlist output -noby[["freqlist"]] +noby[["freqlist"]] ## or use as.data.frame(noby) ``` \newpage @@ -91,7 +87,11 @@ summary(noby, caption="Basic freqlist output") ``` You can also easily pull out the `freqlist` data frame for more complicated formatting or manipulation -(e.g. with another function such as `xtable` or `pander`). See below. +(e.g. with another function such as `xtable` or `pander`) using `as.data.frame`: + +```{r} +head(as.data.frame(noby)) +``` \newpage @@ -138,9 +138,11 @@ summary(freqlist(tab.ex, na.options="remove")) ## Frequency counts and percentages subset by factor levels -The groupBy argument internally subsets the data by the specified factor prior to calculating cumulative counts and percentages. By default, when used each subset will print in a separate table. Using the `single = TRUE` option when printing will collapse the subsetted result into a single table. +The groupBy argument internally subsets the data by the specified factor prior to calculating cumulative counts and percentages. +By default, when used each subset will print in a separate table. Using the `single = TRUE` option when printing will collapse +the subsetted result into a single table. -```{r frequency counts, results='asis'} +```{r freq.counts, results='asis'} withby <- freqlist(tab.ex, groupBy = c("arm","sex")) summary(withby) @@ -168,14 +170,12 @@ summary(noby, labelTranslations = c("Hi there", "What up", "Bye")) Fair warning: `xtable` has kind of a steep learning curve. These examples are given without explanation for more advanced users. -```{r xtable setup} +```{r xtable.setup} require(xtable) -#turn off xtable header -options(xtable.comment = FALSE) -#set up custom function for xtable text +# set up custom function for xtable text italic <- function(x){ -paste0('{\\emph{ ', x, '}}') +paste0('', x, '') } ``` @@ -187,7 +187,7 @@ xftbl <- xtable(noby[["freqlist"]], # change the column names names(xftbl)[1:3] <- c("Arm", "Gender", "LASA QOL") -print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE) +print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE, type = "html", comment = FALSE) ``` \newpage diff --git a/inst/doc/freqlist.html b/inst/doc/freqlist.html new file mode 100644 index 0000000..8de9a3a --- /dev/null +++ b/inst/doc/freqlist.html @@ -0,0 +1,3018 @@ + + + + + + + + + + + + + + + +The freqlist function + + + + + + + + + + + + + + + + + +

The freqlist function

+

Tina Gunderson

+

31 January, 2017

+ + +
+ +
+ + +
+

Overview

+

freqlist is a function meant to produce output similar to SAS’s PROC FREQ procedure when using the /list option of the TABLE statement. freqlist provides options for handling missing or sparse data and can provide cumulative counts and percentages based on subgroups. It depends on the knitr package for printing.

+
require(arsenal)
+
+

Sample dataset

+

For our examples, we’ll load the mockstudy data included with this package and use it to create a basic table. Because they have fewer levels, for brevity, we’ll use the variables arm, sex, and mdquality.s to create the example table. We’ll retain NAs in the table creation. See the appendix for notes regarding default NA handling and other useful information regarding tables in R.

+
# load the data
+data(mockstudy)
+
+# examine the data
+str(mockstudy)
+
'data.frame':   1499 obs. of  14 variables:
+ $ case       : int  110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
+ $ age        : atomic  67 74 50 71 69 56 50 57 51 63 ...
+  ..- attr(*, "label")= chr "Age in Years"
+ $ arm        : atomic  F: FOLFOX A: IFL A: IFL G: IROX ...
+  ..- attr(*, "label")= chr "Treatment Arm"
+ $ sex        : Factor w/ 2 levels "Male","Female": 1 2 2 2 2 1 1 1 2 1 ...
+ $ race       : atomic  Caucasian Caucasian Caucasian Caucasian ...
+  ..- attr(*, "label")= chr "Race"
+ $ fu.time    : int  922 270 175 128 233 120 369 421 387 363 ...
+ $ fu.stat    : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ ps         : int  0 1 1 1 0 0 0 0 1 1 ...
+ $ hgb        : num  11.5 10.7 11.1 12.6 13 10.2 13.3 12.1 13.8 12.1 ...
+ $ bmi        : atomic  25.1 19.5 NA 29.4 26.4 ...
+  ..- attr(*, "label")= chr "Body Mass Index (kg/m^2)"
+ $ alk.phos   : int  160 290 700 771 350 569 162 152 231 492 ...
+ $ ast        : int  35 52 100 68 35 27 16 12 25 18 ...
+ $ mdquality.s: int  NA 1 1 1 NA 1 1 1 1 1 ...
+ $ age.ord    : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...
+
# retain NAs when creating the table using the useNA argument
+tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany")
+ +
+
+
+

The freqlist object

+

The freqlist function returns an object of class freqlist, which has three parts: freqlist, byVar, and labels.

+ +
noby <- freqlist(tab.ex)
+
+str(noby)
+
List of 3
+ $ freqlist:'data.frame':   18 obs. of  7 variables:
+  ..$ arm        : Factor w/ 3 levels "A: IFL","F: FOLFOX",..: 1 1 1 1 1 1 2 2 2 2 ...
+  ..$ sex        : Factor w/ 2 levels "Male","Female": 1 1 1 2 2 2 1 1 1 2 ...
+  ..$ mdquality.s: Factor w/ 2 levels "0","1": 1 2 NA 1 2 NA 1 2 NA 1 ...
+  ..$ Freq       : int [1:18] 29 214 34 12 118 21 31 285 95 21 ...
+  ..$ cumFreq    : int [1:18] 29 243 277 289 407 428 459 744 839 860 ...
+  ..$ freqPercent: num [1:18] 1.93 14.28 2.27 0.8 7.87 ...
+  ..$ cumPercent : num [1:18] 1.93 16.21 18.48 19.28 27.15 ...
+ $ byVar   : NULL
+ $ labels  : NULL
+ - attr(*, "class")= chr "freqlist"
+
# view the data frame portion of freqlist output
+noby[["freqlist"]]  ## or use as.data.frame(noby)
+
         arm    sex mdquality.s Freq cumFreq freqPercent cumPercent
+1     A: IFL   Male           0   29      29        1.93       1.93
+2     A: IFL   Male           1  214     243       14.28      16.21
+3     A: IFL   Male        <NA>   34     277        2.27      18.48
+4     A: IFL Female           0   12     289        0.80      19.28
+5     A: IFL Female           1  118     407        7.87      27.15
+6     A: IFL Female        <NA>   21     428        1.40      28.55
+7  F: FOLFOX   Male           0   31     459        2.07      30.62
+8  F: FOLFOX   Male           1  285     744       19.01      49.63
+9  F: FOLFOX   Male        <NA>   95     839        6.34      55.97
+10 F: FOLFOX Female           0   21     860        1.40      57.37
+11 F: FOLFOX Female           1  198    1058       13.21      70.58
+12 F: FOLFOX Female        <NA>   61    1119        4.07      74.65
+13   G: IROX   Male           0   17    1136        1.13      75.78
+14   G: IROX   Male           1  187    1323       12.47      88.26
+15   G: IROX   Male        <NA>   24    1347        1.60      89.86
+16   G: IROX Female           0   14    1361        0.93      90.79
+17   G: IROX Female           1  121    1482        8.07      98.87
+18   G: IROX Female        <NA>   17    1499        1.13     100.00
+ +
+
+

Basic output using summary

+

The summary method for freqlist relies on the kable function (in the knitr package) for printing. knitr::kable converts the output to markdown which can be printed in the console or easily rendered in Word, pdf, or html documents.

+

Note that you must supply results="asis" to properly format the markdown output.

+
summary(noby)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
A: IFLMale029291.931.93
121424314.2816.21
NA342772.2718.48
Female0122890.8019.28
11184077.8727.15
NA214281.4028.55
F: FOLFOXMale0314592.0730.62
128574419.0149.63
NA958396.3455.97
Female0218601.4057.37
1198105813.2170.58
NA6111194.0774.65
G: IROXMale01711361.1375.78
1187132312.4788.26
NA2413471.6089.86
Female01413610.9390.79
112114828.0798.87
NA1714991.13100.00
+

Additional arguments (except digits) in the kable function can be passed through. Perhaps the most useful is caption.

+
summary(noby, caption = "Basic freqlist output")
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Basic freqlist output
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
A: IFLMale029291.931.93
121424314.2816.21
NA342772.2718.48
Female0122890.8019.28
11184077.8727.15
NA214281.4028.55
F: FOLFOXMale0314592.0730.62
128574419.0149.63
NA958396.3455.97
Female0218601.4057.37
1198105813.2170.58
NA6111194.0774.65
G: IROXMale01711361.1375.78
1187132312.4788.26
NA2413471.6089.86
Female01413610.9390.79
112114828.0798.87
NA1714991.13100.00
+

You can also easily pull out the freqlist data frame for more complicated formatting or manipulation (e.g. with another function such as xtable or pander) using as.data.frame:

+
head(as.data.frame(noby))
+
     arm    sex mdquality.s Freq cumFreq freqPercent cumPercent
+1 A: IFL   Male           0   29      29        1.93       1.93
+2 A: IFL   Male           1  214     243       14.28      16.21
+3 A: IFL   Male        <NA>   34     277        2.27      18.48
+4 A: IFL Female           0   12     289        0.80      19.28
+5 A: IFL Female           1  118     407        7.87      27.15
+6 A: IFL Female        <NA>   21     428        1.40      28.55
+ +
+
+

Rounding percentage digits or changing variable names for printing

+

The digits argument takes a single numeric value and controls the rounding of percentages in the output. The labelTranslations argument is a character vector whose length must be equal to the number of factors used in the table. Note: this does not change the names of the data frame in the freqlist object, only those used in printing. Both options are applied in the following example.

+
withnames <- freqlist(tab.ex, labelTranslations = c("Treatment Arm", "Gender", "LASA QOL"), 
+    digits = 0)
+
+summary(withnames)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Treatment ArmGenderLASA QOLFreqcumFreqfreqPercentcumPercent
A: IFLMale0292922
12142431416
NA34277218
Female012289119
1118407827
NA21428129
F: FOLFOXMale031459231
12857441950
NA95839656
Female021860157
119810581371
NA611119475
G: IROXMale0171136176
118713231288
NA241347290
Female0141361191
11211482899
NA1714991100
+ +
+
+

Additional examples

+
+

Including combinations with frequencies of zero

+

The sparse argument takes a single logical value as input. The default option is FALSE. If set to TRUE, the sparse option will include combinations with frequencies of zero in the list of results. As our initial table did not have any such levels, we create a second table to use in our example.

+
# we create a second table example to showcase the sparse argument
+tab.sparse <- table(mockstudy[, c("race", "sex", "arm")])
+
+nobysparse <- freqlist(tab.sparse, sparse = TRUE, digits = 1)
+summary(nobysparse)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
racesexarmFreqcumFreqfreqPercentcumPercent
African-AmMaleA: IFL25251.71.7
F: FOLFOX24491.63.3
G: IROX16651.14.4
FemaleA: IFL14790.95.3
F: FOLFOX251041.77.0
G: IROX111150.77.7
AsianMaleA: IFL01150.07.7
F: FOLFOX101250.78.4
G: IROX11260.18.4
FemaleA: IFL11270.18.5
F: FOLFOX41310.38.8
G: IROX21330.18.9
CaucasianMaleA: IFL24037316.125.0
F: FOLFOX35272523.648.6
G: IROX19592013.161.7
FemaleA: IFL13110518.870.4
F: FOLFOX234128515.786.1
G: IROX13614219.195.2
Hawaii/PacificMaleA: IFL114220.195.3
F: FOLFOX114230.195.4
G: IROX014230.095.4
FemaleA: IFL014230.095.4
F: FOLFOX214250.195.5
G: IROX114260.195.6
HispanicMaleA: IFL814340.596.1
F: FOLFOX1714511.197.3
G: IROX1214630.898.1
FemaleA: IFL414670.398.3
F: FOLFOX1114780.799.1
G: IROX214800.199.2
Native-Am/AlaskaMaleA: IFL114810.199.3
F: FOLFOX014810.099.3
G: IROX214830.199.4
FemaleA: IFL114840.199.5
F: FOLFOX114850.199.5
G: IROX014850.099.5
OtherMaleA: IFL214870.199.7
F: FOLFOX214890.199.8
G: IROX114900.199.9
FemaleA: IFL014900.099.9
F: FOLFOX214920.1100.0
G: IROX014920.0100.0
+
+
+

Options for NA handling

+

The various na.options allow you to include or exclude data with missing values for one or more factor levels in the counts and percentages as well as show the missing data but exclude it from the cumulative counts and percentages. The default option is to include all combinations with missing values.

+
summary(freqlist(tab.ex, na.options = "include"))
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
A: IFLMale029291.931.93
121424314.2816.21
NA342772.2718.48
Female0122890.8019.28
11184077.8727.15
NA214281.4028.55
F: FOLFOXMale0314592.0730.62
128574419.0149.63
NA958396.3455.97
Female0218601.4057.37
1198105813.2170.58
NA6111194.0774.65
G: IROXMale01711361.1375.78
1187132312.4788.26
NA2413471.6089.86
Female01413610.9390.79
112114828.0798.87
NA1714991.13100.00
+
summary(freqlist(tab.ex, na.options = "showexclude"))
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
A: IFLMale029292.332.33
121424317.1619.49
NA34NANANA
Female0122550.9620.45
11183739.4629.91
NA21NANANA
F: FOLFOXMale0314042.4932.40
128568922.8555.25
NA95NANANA
Female0217101.6856.94
119890815.8872.81
NA61NANANA
G: IROXMale0179251.3674.18
1187111215.0089.17
NA24NANANA
Female01411261.1290.30
112112479.70100.00
NA17NANANA
+
summary(freqlist(tab.ex, na.options = "remove"))
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
A: IFLMale029292.332.33
121424317.1619.49
Female0122550.9620.45
11183739.4629.91
F: FOLFOXMale0314042.4932.40
128568922.8555.25
Female0217101.6856.94
119890815.8872.81
G: IROXMale0179251.3674.18
1187111215.0089.17
Female01411261.1290.30
112112479.70100.00
+
+
+

Frequency counts and percentages subset by factor levels

+

The groupBy argument internally subsets the data by the specified factor prior to calculating cumulative counts and percentages. By default, when used each subset will print in a separate table. Using the single = TRUE option when printing will collapse the subsetted result into a single table.

+
withby <- freqlist(tab.ex, groupBy = c("arm", "sex"))
+summary(withby)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
A: IFLMale0292910.4710.47
121424377.2687.73
NA3427712.27100.00
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
A: IFLFemale012127.957.95
111813078.1586.09
NA2115113.91100.00
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
F: FOLFOXMale031317.547.54
128531669.3476.89
NA9541123.11100.00
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
F: FOLFOXFemale021217.507.50
119821970.7178.21
NA6128021.79100.00
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
G: IROXMale017177.467.46
118720482.0289.47
NA2422810.53100.00
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
G: IROXFemale014149.219.21
112113579.6188.82
NA1715211.18100.00
+
# using the single = TRUE argument will collapse results into a single table for
+# printing
+summary(withby, single = TRUE)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
armsexmdquality.sFreqcumFreqfreqPercentcumPercent
A: IFLMale0292910.4710.47
121424377.2687.73
NA3427712.27100.00
Female012127.957.95
111813078.1586.09
NA2115113.91100.00
F: FOLFOXMale031317.547.54
128531669.3476.89
NA9541123.11100.00
Female021217.507.50
119821970.7178.21
NA6128021.79100.00
G: IROXMale017177.467.46
118720482.0289.47
NA2422810.53100.00
Female014149.219.21
112113579.6188.82
NA1715211.18100.00
+
+
+

Change labels on the fly

+

At this time, the labels can be changed just for the variables (e.g. not the frequency columns).

+
labels(noby) <- c("Arm", "Sex", "OtherThing")
+summary(noby)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ArmSexOtherThingFreqcumFreqfreqPercentcumPercent
A: IFLMale029291.931.93
121424314.2816.21
NA342772.2718.48
Female0122890.8019.28
11184077.8727.15
NA214281.4028.55
F: FOLFOXMale0314592.0730.62
128574419.0149.63
NA958396.3455.97
Female0218601.4057.37
1198105813.2170.58
NA6111194.0774.65
G: IROXMale01711361.1375.78
1187132312.4788.26
NA2413471.6089.86
Female01413610.9390.79
112114828.0798.87
NA1714991.13100.00
+

You can also supply labelTranslations to summary.

+
summary(noby, labelTranslations = c("Hi there", "What up", "Bye"))
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Hi thereWhat upByeFreqcumFreqfreqPercentcumPercent
A: IFLMale029291.931.93
121424314.2816.21
NA342772.2718.48
Female0122890.8019.28
11184077.8727.15
NA214281.4028.55
F: FOLFOXMale0314592.0730.62
128574419.0149.63
NA958396.3455.97
Female0218601.4057.37
1198105813.2170.58
NA6111194.0774.65
G: IROXMale01711361.1375.78
1187132312.4788.26
NA2413471.6089.86
Female01413610.9390.79
112114828.0798.87
NA1714991.13100.00
+
+
+

Using xtable to format and print freqlist results

+

Fair warning: xtable has kind of a steep learning curve. These examples are given without explanation for more advanced users.

+
require(xtable)
+
Loading required package: xtable
+
# set up custom function for xtable text
+italic <- function(x) {
+    paste0("<i>", x, "</i>")
+}
+
xftbl <- xtable(noby[["freqlist"]], caption = "xtable formatted output of freqlist data frame", 
+    align = "|r|r|r|r|c|c|c|r|")
+
+# change the column names
+names(xftbl)[1:3] <- c("Arm", "Gender", "LASA QOL")
+
+print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE, type = "html", 
+    comment = FALSE)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+xtable formatted output of freqlist data frame +
+Arm + +Gender + +LASA QOL + +Freq + +cumFreq + +freqPercent + +cumPercent +
+A: IFL + +Male + +0 + +29 + +29 + +1.93 + +1.93 +
+A: IFL + +Male + +1 + +214 + +243 + +14.28 + +16.21 +
+A: IFL + +Male + + +34 + +277 + +2.27 + +18.48 +
+A: IFL + +Female + +0 + +12 + +289 + +0.80 + +19.28 +
+A: IFL + +Female + +1 + +118 + +407 + +7.87 + +27.15 +
+A: IFL + +Female + + +21 + +428 + +1.40 + +28.55 +
+F: FOLFOX + +Male + +0 + +31 + +459 + +2.07 + +30.62 +
+F: FOLFOX + +Male + +1 + +285 + +744 + +19.01 + +49.63 +
+F: FOLFOX + +Male + + +95 + +839 + +6.34 + +55.97 +
+F: FOLFOX + +Female + +0 + +21 + +860 + +1.40 + +57.37 +
+F: FOLFOX + +Female + +1 + +198 + +1058 + +13.21 + +70.58 +
+F: FOLFOX + +Female + + +61 + +1119 + +4.07 + +74.65 +
+G: IROX + +Male + +0 + +17 + +1136 + +1.13 + +75.78 +
+G: IROX + +Male + +1 + +187 + +1323 + +12.47 + +88.26 +
+G: IROX + +Male + + +24 + +1347 + +1.60 + +89.86 +
+G: IROX + +Female + +0 + +14 + +1361 + +0.93 + +90.79 +
+G: IROX + +Female + +1 + +121 + +1482 + +8.07 + +98.87 +
+G: IROX + +Female + + +17 + +1499 + +1.13 + +100.00 +
+ +
+
+
+

Appendix: Notes regarding table options in R

+
+

NAs

+

There are several widely used options for basic tables in R. The table function in base R is probably the most common; by default it excludes NA values. You can change NA handling in base::table using the useNA or exclude arguments.

+
# base table default removes NAs
+tab.d1 <- base::table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany")
+tab.d1
+
, , mdquality.s = 0
+
+           sex
+arm         Male Female
+  A: IFL      29     12
+  F: FOLFOX   31     21
+  G: IROX     17     14
+
+, , mdquality.s = 1
+
+           sex
+arm         Male Female
+  A: IFL     214    118
+  F: FOLFOX  285    198
+  G: IROX    187    121
+
+, , mdquality.s = NA
+
+           sex
+arm         Male Female
+  A: IFL      34     21
+  F: FOLFOX   95     61
+  G: IROX     24     17
+

xtabs is similar to table, but uses a formula-based syntax. However, there is not an option for retaining NAs in the xtabs function; instead, NAs must be added to each level of the factor where present using the addNA function.

+
# without specifying addNA
+tab.d2 <- xtabs(formula = ~arm + sex + mdquality.s, data = mockstudy)
+tab.d2
+
, , mdquality.s = 0
+
+           sex
+arm         Male Female
+  A: IFL      29     12
+  F: FOLFOX   31     21
+  G: IROX     17     14
+
+, , mdquality.s = 1
+
+           sex
+arm         Male Female
+  A: IFL     214    118
+  F: FOLFOX  285    198
+  G: IROX    187    121
+
# now with addNA
+tab.d3 <- xtabs(~arm + sex + addNA(mdquality.s), data = mockstudy)
+tab.d3
+
, , addNA(mdquality.s) = 0
+
+           sex
+arm         Male Female
+  A: IFL      29     12
+  F: FOLFOX   31     21
+  G: IROX     17     14
+
+, , addNA(mdquality.s) = 1
+
+           sex
+arm         Male Female
+  A: IFL     214    118
+  F: FOLFOX  285    198
+  G: IROX    187    121
+
+, , addNA(mdquality.s) = NA
+
+           sex
+arm         Male Female
+  A: IFL      34     21
+  F: FOLFOX   95     61
+  G: IROX     24     17
+
+
+

Table dimname names (dnn)

+

Supplying a data.frame to the table function without giving columns individually will create a contingency table using all variables in the data.frame.

+

However, if the columns of a data.frame or matrix are supplied separately (i.e., as vectors), column names will not be preserved.

+
# providing variables separately (as vectors) drops column names
+tab.d4 <- base::table(mockstudy[, "arm"], mockstudy[, "sex"], mockstudy[, "mdquality.s"])
+tab.d4
+
, ,  = 0
+
+           
+            Male Female
+  A: IFL      29     12
+  F: FOLFOX   31     21
+  G: IROX     17     14
+
+, ,  = 1
+
+           
+            Male Female
+  A: IFL     214    118
+  F: FOLFOX  285    198
+  G: IROX    187    121
+

If desired, you can use the dnn argument to pass variable names.

+
# add the column name labels back using dnn option in base::table
+tab.dnn <- base::table(mockstudy[, "arm"], mockstudy[, "sex"], mockstudy[, "mdquality.s"], 
+    dnn = c("Amy", "Susan", "George"))
+tab.dnn
+
, , George = 0
+
+           Susan
+Amy         Male Female
+  A: IFL      29     12
+  F: FOLFOX   31     21
+  G: IROX     17     14
+
+, , George = 1
+
+           Susan
+Amy         Male Female
+  A: IFL     214    118
+  F: FOLFOX  285    198
+  G: IROX    187    121
+

If using freqlist, you can provide the labels directly to freqlist or to summary using labelTranslations.

+
+
+ + + + + + + + diff --git a/inst/doc/freqlist.pdf b/inst/doc/freqlist.pdf deleted file mode 100644 index 306cb93..0000000 Binary files a/inst/doc/freqlist.pdf and /dev/null differ diff --git a/inst/doc/modelsum.R b/inst/doc/modelsum.R index 1ff4c13..a400bbb 100644 --- a/inst/doc/modelsum.R +++ b/inst/doc/modelsum.R @@ -6,12 +6,8 @@ require(MASS) require(pROC) require(rpart) -opts_chunk$set(comment = NA, echo=TRUE, prompt=TRUE ,collapse=TRUE) +opts_chunk$set(comment = NA, echo=TRUE, prompt=TRUE, collapse=TRUE) -#vignette: > -# %\VignetteIndexEntry{modelsum} -# %\VignetteEngine{knitr::rmarkdown} -# \usepackage[utf8]{inputenc} ## ---- load-data---------------------------------------------------------- require(arsenal) @@ -29,6 +25,9 @@ summary(tab1, text=TRUE) ## ----simple-markdown, results='asis'------------------------------------- summary(tab1) +## ------------------------------------------------------------------------ +as.data.frame(tab1) + ## ----adjust, results="asis"---------------------------------------------- tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, data=mockstudy) summary(tab2) @@ -374,15 +373,17 @@ mockstudy$wts <- gpwts[index] ## show weights by treatment arm group tapply(mockstudy$wts,mockstudy$arm, summary) -## ----results='asis', warning=FALSE--------------------------------------- +## ----results='asis'------------------------------------------------------ mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL') tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), family=binomial) summary(tab1, title='No Case Weights used') +suppressWarnings({ tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), weights=wts, family=binomial) summary(tab2, title='Case Weights used') +}) ## ------------------------------------------------------------------------ summary(tab2, text=T) diff --git a/inst/doc/modelsum.Rmd b/inst/doc/modelsum.Rmd index 03dc754..97a23b6 100644 --- a/inst/doc/modelsum.Rmd +++ b/inst/doc/modelsum.Rmd @@ -3,15 +3,9 @@ title: "The modelsum function" author: "Beth Atkinson, Pat Votruba, Jason Sinnwell, Shannon McDonnell and Greg Dougherty" date: '`r format(Sys.time(),"%d %B, %Y")`' output: - html_document: - toc: yes - toc_depth: '3' - pdf_document: + rmarkdown::html_vignette: toc: yes toc_depth: 3 - word_document: - toc: yes - toc_depth: '3' vignette: | %\VignetteIndexEntry{The modelsum function} %\VignetteEncoding{UTF-8} @@ -26,17 +20,12 @@ require(MASS) require(pROC) require(rpart) -opts_chunk$set(comment = NA, echo=TRUE, prompt=TRUE ,collapse=TRUE) +opts_chunk$set(comment = NA, echo=TRUE, prompt=TRUE, collapse=TRUE) -#vignette: > -# %\VignetteIndexEntry{modelsum} -# %\VignetteEngine{knitr::rmarkdown} -# \usepackage[utf8]{inputenc} ``` -Introduction -============= +# Introduction Very often we are asked to summarize model results from multiple fits into a nice table. The endpoint might be of different types (e.g., survival, case/control, continuous) and there @@ -54,10 +43,9 @@ so the tables could be displayed within an R markdown report. This report provides step-by-step directions for using the functions associated with `modelsum`. All functions presented here are available within the `arsenal` package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial -resource is available at [rmarkdown.rstudio.com](rmarkdown.rstudio.com). +resource is available at [rmarkdown.rstudio.com](http://rmarkdown.rstudio.com). -Simple Example -================ +# Simple Example The first step when using the `modelsum` function is to load the `arsenal` package. All the examples in this report use a dataset called `mockstudy` made available by Paul Novotny which includes a variety of types of variables @@ -82,7 +70,7 @@ If you want to take a quick look at the table, you can use `summary` on your mod print out as text in your R console window. If you use `summary` without any options you will see a number of $\ $ statements which translates to "space" in HTML. -### Pretty text version of table +## Pretty text version of table If you want a nicer version in your console window then adding the `text=TRUE` option. @@ -90,23 +78,25 @@ If you want a nicer version in your console window then adding the `text=TRUE` o summary(tab1, text=TRUE) ``` -### Pretty Rmarkdown version of table +## Pretty Rmarkdown version of table In order for the report to look nice within an R markdown (knitr) report, you just need to specify `results="asis"` when creating the r chunk. This changes the layout slightly (compresses it) and bolds -the variable names. The three single quotes are often located above the tab key. - -`r ''` ```{r results="asis"} - - summary(tab1) - -``` +the variable names. ```{r simple-markdown, results='asis'} summary(tab1) ``` -### Add an adjustor to the model +## Data frame version of table + +If you want a data.frame version, simply use `as.data.frame`. + +```{r} +as.data.frame(tab1) +``` + +## Add an adjustor to the model The argument `adjust` allows the user to indicate that all the variables should be adjusted for these terms. @@ -116,16 +106,14 @@ summary(tab2) ``` -Models for each endpoint type -================================== +# Models for each endpoint type To make sure the correct model is run you need to specify "family". The options available right now are : gaussian, binomial, survival, and poisson. If there is enough interest, additional models can be added. -Gaussian ------------ +## Gaussian -### fit and summarize linear regression model +### Fit and summarize linear regression model Look at whether there is any evidence that AlkPhos values vary by study arm after adjusting for sex and age (assuming a linear age relationship). @@ -163,7 +151,7 @@ termplot(fit3, term=2, se=T, rug=T) In this instance it looks like there isn't enough evidence to say that the relationship is non-linear. -### extract data using the `broom` package +### Extract data using the `broom` package The `broom` package makes it easy to extract information from the fit. @@ -175,7 +163,7 @@ tmp glance(fit3) ``` -### create a summary table using modelsum +### Create a summary table using modelsum ```{r, results="asis"} ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex, @@ -184,10 +172,9 @@ ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age summary(ms.logy) ``` -Binomial ----------- +## Binomial -### fit and summarize logistic regression model +### Fit and summarize logistic regression model ```{r} boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s') @@ -216,7 +203,7 @@ tmp$auc ``` -### extract data using `broom` package +### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. @@ -226,7 +213,7 @@ tidy(fit, exp=T, conf.int=T) # coefficients, p-values, conf.intervals glance(fit) # model summary statistics ``` -### create a summary table using modelsum +### Create a summary table using modelsum ```{r, results="asis"} summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial)) @@ -237,10 +224,9 @@ summary(fitall) ``` -Survival ---------- +## Survival -### fit and summarize a Cox regression model +### Fit and summarize a Cox regression model ```{r survival} require(survival) @@ -271,7 +257,7 @@ summary(fit2)$concordance survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy) ``` -### extract data using `broom` package +### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. @@ -281,7 +267,7 @@ tidy(fit) # coefficients, p-values glance(fit) # model summary statistics ``` -### create a summary table using modelsum +### Create a summary table using modelsum ```{r results="asis"} ##Note: You must use quotes when specifying family="survival" @@ -295,8 +281,7 @@ summary(modelsum(Surv(fu.time, fu.stat) ~ arm, ``` -Poisson --------- +## Poisson Poisson regression is useful when predicting an outcome variable representing counts. It can also be useful when looking at survival data. Cox models and Poisson models are very closely @@ -331,7 +316,7 @@ fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson) summary(fit2) ``` -### extract data using `broom` package +### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. @@ -342,7 +327,7 @@ glance(fit) # model summary statistics ``` -### create a summary table using modelsum +### Create a summary table using modelsum ```{r results='asis'} summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson")) @@ -373,7 +358,7 @@ fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, summary(fit2) ``` -### extract data using `broom` package +### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. @@ -384,7 +369,7 @@ glance(fit) ##model summary statistics ``` -### create a summary table using modelsum +### Create a summary table using `modelsum` Remember that the result from `modelsum` is different from the `fit` above. The `modelsum` summary shows the results for `age + offset(log(fu.time+.01))` then `sex + offset(log(fu.time+.01))` @@ -397,12 +382,12 @@ summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, ``` -Additional Examples -==================== +# Additional Examples + Here are multiple examples showing how to use some of the different options. -###1. Change summary statistics globally +## 1. Change summary statistics globally There are standard settings for each type of model regarding what information is summarized in the table. This behavior can be modified using the modelsum.control function. In fact, you can save your standard @@ -425,7 +410,7 @@ tab3 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, summary(tab3) ``` -###2. Add labels to independent variables +## 2. Add labels to independent variables In the above example, age is shown with a label (Age in Years), but sex is listed "as is". This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. @@ -470,20 +455,20 @@ labels(tab1) summary(tab1) ``` -###2. Don't show intercept values +## 3. Don't show intercept values ```{r, results='asis'} summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE) ``` -###3. Don't show results for adjustment variables +## 4. Don't show results for adjustment variables ```{r, results='asis'} summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial), show.adjust=FALSE) ``` -###4. Summarize multiple variables without typing them out +## 5. Summarize multiple variables without typing them out Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the `paste` command with the `collapse="+"` option. @@ -519,7 +504,7 @@ summary(modelsum(tmp, data=mockstudy, family=binomial)) ``` -###5. Subset the dataset used in the analysis +## 6. Subset the dataset used in the analysis Here are two ways to get the same result (limit the analysis to subjects age>50 and in the F: FOLFOX treatment group). @@ -548,7 +533,7 @@ summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, dat summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy)) ``` -###6. Create combinations of variables on the fly +## 7. Create combinations of variables on the fly ```{r} ## create a variable combining the levels of mdquality.s and sex @@ -559,7 +544,7 @@ with(mockstudy, table(interaction(mdquality.s,sex))) summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy)) ``` -###9. Transform variables on the fly +## 8. Transform variables on the fly Certain transformations need to be surrounded by `I()` so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the @@ -572,7 +557,7 @@ summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s, ``` -###10. Change the ordering of the variables or delete a variable +## 9. Change the ordering of the variables or delete a variable ```{r, results='asis'} mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy) @@ -582,7 +567,7 @@ summary(mytab[c('age','sex')]) summary(mytab[c(3,1)]) ``` -###11. Merge two modelsum objects together +## 10. Merge two `modelsum` objects together It is possible to combine two modelsum objects so that they print out together, however you need to pay attention to the columns that are being displayed. It is easier to combine two models of the same @@ -602,7 +587,7 @@ class(tab12) #summary(tab12) ``` -###12. Add a title to the table +## 11. Add a title to the table When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table. @@ -612,11 +597,12 @@ t1 <- modelsum(bmi ~ sex + age, data=mockstudy) summary(t1, title='Demographics') ``` -###13. Modify how missing values are treated +## 12. Modify how missing values are treated Depending on the report you are writing you have the following options: * Use all values available for each variable + * Use only those subjects who have measurements available for all the variables ```{r} @@ -642,13 +628,16 @@ summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("estimate")))) ``` -###14. Modify the number of digits used +## 13. Modify the number of digits used Within modelsum.control function there are 4 options for controlling the number of significant digits shown. * digits: controls the number of significant digits (counting both before and after the decimal point) for continuous variables + * nsmall: controls the number of digits after the decimal point for the beta and standard error + * nsmall.ratio: controls the number of digits for the ratio statistics (OR, HR, RR), default=2 + * digits.test: controls the number of digits after the decimal point for p-values (default=3) ```{r, results='asis'} @@ -670,7 +659,7 @@ format(pi*100, nsmall=4) format(pi*100, nsmall=2, digits=4) ``` -###15. Use case-weights in the models +## 14. Use case-weights in the models Occasionally it is of interest to fit models using case weights. The `modelsum` function allows you to pass on the weights to the models and it will do the appropriate fit. @@ -692,18 +681,20 @@ mockstudy$wts <- gpwts[index] tapply(mockstudy$wts,mockstudy$arm, summary) ``` -```{r results='asis', warning=FALSE} +```{r results='asis'} mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL') tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), family=binomial) summary(tab1, title='No Case Weights used') +suppressWarnings({ tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), weights=wts, family=binomial) summary(tab2, title='Case Weights used') +}) ``` -###16. Use `modelsum` within an Sweave document +## 15. Use `modelsum` within an Sweave document For those users who wish to create tables within an Sweave document, the following code seems to work. @@ -737,7 +728,7 @@ render("Test.md", pdf_document(keep_tex=TRUE)) \end{document} ``` -###17. Export `modelsum` results to a .CSV file +## 16. Export `modelsum` results to a .CSV file When looking at multiple variables it is sometimes useful to export the results to a csv file. The `as.data.frame` function creates a data frame object that can be exported or further manipulated within R. @@ -750,7 +741,7 @@ tmp # write.csv(tmp, '/my/path/here/mymodel.csv') ``` -###18. Write `modelsum` object to a separate Word or HTML file +## 17. Write `modelsum` object to a separate Word or HTML file ```{r} ## write to an HTML document @@ -760,10 +751,9 @@ tmp # write2word(tab2, "~/ibm/trash.doc", title="My table in Word") ``` -Available Function Options -================================== +# Available Function Options -### Summary statistics +## Summary statistics The available summary statistics, by varible type, are: @@ -827,7 +817,7 @@ The full description of these parameters that can be shown for models include: * `CI.lower.estimate, CI.upper.estimate`: print the confidence interval for the beta coefficient -### modelsum.control settings +## `modelsum.control` settings A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `modelsum.control` or in `summary.modelsum`. @@ -851,7 +841,7 @@ Settings: * poisson.stats, quasipoisson.stats -### summary.modelsum settings +## `summary.modelsum` settings The summary.modelsum function has options that modify how the table appears (such as adding a title or modifying labels). diff --git a/inst/doc/modelsum.html b/inst/doc/modelsum.html index a53f2c5..753030d 100644 --- a/inst/doc/modelsum.html +++ b/inst/doc/modelsum.html @@ -8,119 +8,69 @@ + The modelsum function - - - - - - + - - - - - + - - - - -
- - - - - - - - - - - - -
@@ -186,59 +137,56 @@

29 December, 2016

Introduction

Very often we are asked to summarize model results from multiple fits into a nice table. The endpoint might be of different types (e.g., survival, case/control, continuous) and there may be several independent variables that we want to examine univariately or adjusted for certain variables such as age and sex. Locally, the SAS macros %modelsum, %glmuniv, and %logisuni were written to create such summary tables. With the increasing interest in R, we have developed the function modelsum to create similar tables within the R environment.

In developing the modelsum function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R’s strengths (modeling, method dispersion, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report.

-

This report provides step-by-step directions for using the functions associated with modelsum. All functions presented here are available within the arsenal package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at rmarkdown.rstudio.com.

+

This report provides step-by-step directions for using the functions associated with modelsum. All functions presented here are available within the arsenal package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at rmarkdown.rstudio.com.

Simple Example

The first step when using the modelsum function is to load the arsenal package. All the examples in this report use a dataset called mockstudy made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples.

-
> require(arsenal)
-> data(mockstudy) # load data
-> dim(mockstudy)  # look at how many subjects and variables are in the dataset 
-[1] 1499   14
-> # help(mockstudy) # learn more about the dataset and variables
-> str(mockstudy) # quick look at the data
-'data.frame':   1499 obs. of  14 variables:
- $ case       : int  110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
- $ age        : atomic  67 74 50 71 69 56 50 57 51 63 ...
-  ..- attr(*, "label")= chr "Age in Years"
- $ arm        : atomic  F: FOLFOX A: IFL A: IFL G: IROX ...
-  ..- attr(*, "label")= chr "Treatment Arm"
- $ sex        : Factor w/ 2 levels "Male","Female": 1 2 2 2 2 1 1 1 2 1 ...
- $ race       : atomic  Caucasian Caucasian Caucasian Caucasian ...
-  ..- attr(*, "label")= chr "Race"
- $ fu.time    : int  922 270 175 128 233 120 369 421 387 363 ...
- $ fu.stat    : int  2 2 2 2 2 2 2 2 2 2 ...
- $ ps         : int  0 1 1 1 0 0 0 0 1 1 ...
- $ hgb        : num  11.5 10.7 11.1 12.6 13 10.2 13.3 12.1 13.8 12.1 ...
- $ bmi        : atomic  25.1 19.5 NA 29.4 26.4 ...
-  ..- attr(*, "label")= chr "Body Mass Index (kg/m^2)"
- $ alk.phos   : int  160 290 700 771 350 569 162 152 231 492 ...
- $ ast        : int  35 52 100 68 35 27 16 12 25 18 ...
- $ mdquality.s: int  NA 1 1 1 NA 1 1 1 1 1 ...
- $ age.ord    : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...
+
> require(arsenal)
+> data(mockstudy) # load data
+> dim(mockstudy)  # look at how many subjects and variables are in the dataset 
+[1] 1499   14
+> # help(mockstudy) # learn more about the dataset and variables
+> str(mockstudy) # quick look at the data
+'data.frame':   1499 obs. of  14 variables:
+ $ case       : int  110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
+ $ age        : atomic  67 74 50 71 69 56 50 57 51 63 ...
+  ..- attr(*, "label")= chr "Age in Years"
+ $ arm        : atomic  F: FOLFOX A: IFL A: IFL G: IROX ...
+  ..- attr(*, "label")= chr "Treatment Arm"
+ $ sex        : Factor w/ 2 levels "Male","Female": 1 2 2 2 2 1 1 1 2 1 ...
+ $ race       : atomic  Caucasian Caucasian Caucasian Caucasian ...
+  ..- attr(*, "label")= chr "Race"
+ $ fu.time    : int  922 270 175 128 233 120 369 421 387 363 ...
+ $ fu.stat    : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ ps         : int  0 1 1 1 0 0 0 0 1 1 ...
+ $ hgb        : num  11.5 10.7 11.1 12.6 13 10.2 13.3 12.1 13.8 12.1 ...
+ $ bmi        : atomic  25.1 19.5 NA 29.4 26.4 ...
+  ..- attr(*, "label")= chr "Body Mass Index (kg/m^2)"
+ $ alk.phos   : int  160 290 700 771 350 569 162 152 231 492 ...
+ $ ast        : int  35 52 100 68 35 27 16 12 25 18 ...
+ $ mdquality.s: int  NA 1 1 1 NA 1 1 1 1 1 ...
+ $ age.ord    : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...

To create a simple linear regression table (the default), use a formula statement to specify the variables that you want summarized. The example below predicts BMI with the variables sex and age.

-
> tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)
+
> tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)

If you want to take a quick look at the table, you can use summary on your modelsum object and the table will print out as text in your R console window. If you use summary without any options you will see a number of \(\&nbsp;\) statements which translates to “space” in HTML.

-
-

Pretty text version of table

+
+

Pretty text version of table

If you want a nicer version in your console window then adding the text=TRUE option.

-
> summary(tab1, text=TRUE)
+
> summary(tab1, text=TRUE)
 ----------------------------------------------------------------------------------
-                    estimate        std.error       p.value         adj.r.squared 
------------------- --------------- --------------- --------------- ---------------
-(Intercept)        27.5            0.181           <0.001          0.004          
-sex Female         -0.731          0.29            0.012           .              
-(Intercept)        26.4            0.752           <0.001          0              
-Age in Years       0.013           0.012           0.290           .              
-----------------------------------------------------------------------------------
+ estimate std.error p.value adj.r.squared +------------------ --------------- --------------- --------------- --------------- +(Intercept) 27.5 0.181 <0.001 0.004 +sex Female -0.731 0.29 0.012 . +(Intercept) 26.4 0.752 <0.001 0 +Age in Years 0.013 0.012 0.290 . +----------------------------------------------------------------------------------
-
-

Pretty Rmarkdown version of table

-

In order for the report to look nice within an R markdown (knitr) report, you just need to specify results="asis" when creating the r chunk. This changes the layout slightly (compresses it) and bolds the variable names. The three single quotes are often located above the tab key.

-

```{r results=“asis”}

-

summary(tab1)

-

```

-
> summary(tab1)
+
+

Pretty Rmarkdown version of table

+

In order for the report to look nice within an R markdown (knitr) report, you just need to specify results="asis" when creating the r chunk. This changes the layout slightly (compresses it) and bolds the variable names.

+
> summary(tab1)
@@ -288,11 +236,21 @@

Pretty Rmarkdown version of table

-
-

Add an adjustor to the model

+
+

Data frame version of table

+

If you want a data.frame version, simply use as.data.frame.

+
> as.data.frame(tab1)
+          term model endpoint estimate std.error p.value adj.r.squared
+1  (Intercept)     1      bmi   27.500     0.181      NA         0.004
+2   sex Female     1      bmi   -0.731     0.290   0.012         0.004
+3  (Intercept)     2      bmi   26.400     0.752      NA         0.000
+4 Age in Years     2      bmi    0.013     0.012   0.290         0.000
+
+
+

Add an adjustor to the model

The argument adjust allows the user to indicate that all the variables should be adjusted for these terms.

-
> tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, data=mockstudy)
-> summary(tab2)
+
> tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, data=mockstudy)
+> summary(tab2)
@@ -412,112 +370,112 @@

Models for each endpoint type

Gaussian

-

fit and summarize linear regression model

+

Fit and summarize linear regression model

Look at whether there is any evidence that AlkPhos values vary by study arm after adjusting for sex and age (assuming a linear age relationship).

-
> fit <- lm(alk.phos ~ arm + age + sex, data=mockstudy)
-> summary(fit)
+
> fit <- lm(alk.phos ~ arm + age + sex, data=mockstudy)
+> summary(fit)
 
 Call:
-lm(formula = alk.phos ~ arm + age + sex, data = mockstudy)
+lm(formula = alk.phos ~ arm + age + sex, data = mockstudy)
 
 Residuals:
-    Min      1Q  Median      3Q     Max 
--168.80  -81.45  -47.17   37.39  853.56 
+    Min      1Q  Median      3Q     Max 
+-168.80  -81.45  -47.17   37.39  853.56 
 
 Coefficients:
-              Estimate Std. Error t value Pr(>|t|)    
-(Intercept)  175.54808   20.58665   8.527   <2e-16 ***
-armF: FOLFOX -13.70062    8.72963  -1.569    0.117    
-armG: IROX    -2.24498    9.86004  -0.228    0.820    
-age           -0.01741    0.31878  -0.055    0.956    
-sexFemale      3.01598    7.52097   0.401    0.688    
+              Estimate Std. Error t value Pr(>|t|)    
+(Intercept)  175.54808   20.58665   8.527   <2e-16 ***
+armF: FOLFOX -13.70062    8.72963  -1.569    0.117    
+armG: IROX    -2.24498    9.86004  -0.228    0.820    
+age           -0.01741    0.31878  -0.055    0.956    
+sexFemale      3.01598    7.52097   0.401    0.688    
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-
-Residual standard error: 128.5 on 1228 degrees of freedom
-  (266 observations deleted due to missingness)
-Multiple R-squared:  0.002552,  Adjusted R-squared:  -0.0006969 
-F-statistic: 0.7855 on 4 and 1228 DF,  p-value: 0.5346
-> plot(fit)
-

+Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 + +Residual standard error: 128.5 on 1228 degrees of freedom + (266 observations deleted due to missingness) +Multiple R-squared: 0.002552, Adjusted R-squared: -0.0006969 +F-statistic: 0.7855 on 4 and 1228 DF, p-value: 0.5346 +> plot(fit)
+

The results suggest that the endpoint may need to be transformed. Calculating the Box-Cox transformation suggests a log transformation.

-
> require(MASS)
-> boxcox(fit)
-

-
> fit2 <- lm(log(alk.phos) ~ arm + age + sex, data=mockstudy)
-> summary(fit2)
+
> require(MASS)
+> boxcox(fit)
+

+
> fit2 <- lm(log(alk.phos) ~ arm + age + sex, data=mockstudy)
+> summary(fit2)
 
 Call:
-lm(formula = log(alk.phos) ~ arm + age + sex, data = mockstudy)
+lm(formula = log(alk.phos) ~ arm + age + sex, data = mockstudy)
 
 Residuals:
-    Min      1Q  Median      3Q     Max 
--3.0098 -0.4470 -0.1065  0.4205  2.0620 
+    Min      1Q  Median      3Q     Max 
+-3.0098 -0.4470 -0.1065  0.4205  2.0620 
 
 Coefficients:
-               Estimate Std. Error t value Pr(>|t|)    
-(Intercept)   4.9692474  0.1025239  48.469   <2e-16 ***
-armF: FOLFOX -0.0766798  0.0434746  -1.764    0.078 .  
-armG: IROX   -0.0192828  0.0491041  -0.393    0.695    
-age          -0.0004058  0.0015876  -0.256    0.798    
-sexFemale     0.0179253  0.0374553   0.479    0.632    
+               Estimate Std. Error t value Pr(>|t|)    
+(Intercept)   4.9692474  0.1025239  48.469   <2e-16 ***
+armF: FOLFOX -0.0766798  0.0434746  -1.764    0.078 .  
+armG: IROX   -0.0192828  0.0491041  -0.393    0.695    
+age          -0.0004058  0.0015876  -0.256    0.798    
+sexFemale     0.0179253  0.0374553   0.479    0.632    
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-
-Residual standard error: 0.6401 on 1228 degrees of freedom
-  (266 observations deleted due to missingness)
-Multiple R-squared:  0.003121,  Adjusted R-squared:  -0.0001258 
-F-statistic: 0.9613 on 4 and 1228 DF,  p-value: 0.4278
-> plot(fit2)
-

+Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 + +Residual standard error: 0.6401 on 1228 degrees of freedom + (266 observations deleted due to missingness) +Multiple R-squared: 0.003121, Adjusted R-squared: -0.0001258 +F-statistic: 0.9613 on 4 and 1228 DF, p-value: 0.4278 +> plot(fit2)
+

Finally, look to see whether there there is a non-linear relationship with age.

-
> require(gam)
-> fit3 <- lm(log(alk.phos) ~ arm + ns(age, df=2) + sex, data=mockstudy)
-> 
-> # test whether there is a difference between models 
-> anova(fit2,fit3)
+
> require(gam)
+> fit3 <- lm(log(alk.phos) ~ arm + ns(age, df=2) + sex, data=mockstudy)
+> 
+> # test whether there is a difference between models 
+> anova(fit2,fit3)
 Analysis of Variance Table
 
-Model 1: log(alk.phos) ~ arm + age + sex
-Model 2: log(alk.phos) ~ arm + ns(age, df = 2) + sex
-  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
-1   1228 503.19                              
-2   1227 502.07  1    1.1137 2.7218 0.09924 .
+Model 1: log(alk.phos) ~ arm + age + sex
+Model 2: log(alk.phos) ~ arm + ns(age, df = 2) + sex
+  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
+1   1228 503.19                              
+2   1227 502.07  1    1.1137 2.7218 0.09924 .
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-> 
-> # look at functional form of age
-> termplot(fit3, term=2, se=T, rug=T)
-

+Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +> +> # look at functional form of age +> termplot(fit3, term=2, se=T, rug=T)
+

In this instance it looks like there isn’t enough evidence to say that the relationship is non-linear.

-

extract data using the broom package

+

Extract data using the broom package

The broom package makes it easy to extract information from the fit.

-
> tmp <- tidy(fit3) # coefficients, p-values
-> class(tmp)
-[1] "data.frame"
-> tmp
+
> tmp <- tidy(fit3) # coefficients, p-values
+> class(tmp)
+[1] "data.frame"
+> tmp
               term    estimate  std.error statistic       p.value
-1      (Intercept)  4.76454026 0.14102237 33.785704 1.928465e-177
-2     armF: FOLFOX -0.07668790 0.04344412 -1.765208  7.777754e-02
-3       armG: IROX -0.01945575 0.04906984 -0.396491  6.918118e-01
-4 ns(age, df = 2)1  0.33031939 0.26002425  1.270341  2.042041e-01
-5 ns(age, df = 2)2 -0.10069469 0.09349337 -1.077025  2.816809e-01
-6        sexFemale  0.01829092 0.03742970  0.488674  6.251598e-01
-> 
-> glance(fit3)
+1      (Intercept)  4.76454026 0.14102237 33.785704 1.928465e-177
+2     armF: FOLFOX -0.07668790 0.04344412 -1.765208  7.777754e-02
+3       armG: IROX -0.01945575 0.04906984 -0.396491  6.918118e-01
+4 ns(age, df = 2)1  0.33031939 0.26002425  1.270341  2.042041e-01
+5 ns(age, df = 2)2 -0.10069469 0.09349337 -1.077025  2.816809e-01
+6        sexFemale  0.01829092 0.03742970  0.488674  6.251598e-01
+> 
+> glance(fit3)
   r.squared adj.r.squared     sigma statistic   p.value df    logLik
-1 0.0053278   0.001274531 0.6396787  1.314445 0.2552466  6 -1195.653
+1 0.0053278   0.001274531 0.6396787  1.314445 0.2552466  6 -1195.653
        AIC      BIC deviance df.residual
-1 2405.305 2441.126 502.0747        1227
+1 2405.305 2441.126 502.0747 1227
-

create a summary table using modelsum

-
> ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex, 
-+                     family=gaussian,  
-+                     gaussian.stats=c("estimate","CI.lower.estimate","CI.upper.estimate","p.value"))
-> summary(ms.logy)
+

Create a summary table using modelsum

+
> ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex, 
++                     family=gaussian,  
++                     gaussian.stats=c("estimate","CI.lower.estimate","CI.upper.estimate","p.value"))
+> summary(ms.logy)
@@ -634,51 +592,51 @@

create a summary table using modelsum

Binomial

-

fit and summarize logistic regression model

-
> boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s')
-

-
> 
-> fit <- glm(mdquality.s ~ age + sex, data=mockstudy, family=binomial)
-> summary(fit)
+

Fit and summarize logistic regression model

+
> boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s')
+

+
> 
+> fit <- glm(mdquality.s ~ age + sex, data=mockstudy, family=binomial)
+> summary(fit)
 
 Call:
-glm(formula = mdquality.s ~ age + sex, family = binomial, data = mockstudy)
+glm(formula = mdquality.s ~ age + sex, family = binomial, data = mockstudy)
 
-Deviance Residuals: 
-    Min       1Q   Median       3Q      Max  
--2.1832   0.4500   0.4569   0.4626   0.4756  
+Deviance Residuals: 
+    Min       1Q   Median       3Q      Max  
+-2.1832   0.4500   0.4569   0.4626   0.4756  
 
 Coefficients:
-             Estimate Std. Error z value Pr(>|z|)    
-(Intercept)  2.329442   0.514684   4.526 6.01e-06 ***
-age         -0.002353   0.008256  -0.285    0.776    
-sexFemale    0.039227   0.195330   0.201    0.841    
+             Estimate Std. Error z value Pr(>|z|)    
+(Intercept)  2.329442   0.514684   4.526 6.01e-06 ***
+age         -0.002353   0.008256  -0.285    0.776    
+sexFemale    0.039227   0.195330   0.201    0.841    
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
-(Dispersion parameter for binomial family taken to be 1)
+(Dispersion parameter for binomial family taken to be 1)
 
-    Null deviance: 807.68  on 1246  degrees of freedom
-Residual deviance: 807.55  on 1244  degrees of freedom
-  (252 observations deleted due to missingness)
-AIC: 813.55
+    Null deviance: 807.68  on 1246  degrees of freedom
+Residual deviance: 807.55  on 1244  degrees of freedom
+  (252 observations deleted due to missingness)
+AIC: 813.55
 
-Number of Fisher Scoring iterations: 4
-> 
-> # create Odd's ratio w/ confidence intervals
-> tmp <- data.frame(summary(fit)$coef)
-> tmp
+Number of Fisher Scoring iterations: 4
+> 
+> # create Odd's ratio w/ confidence intervals
+> tmp <- data.frame(summary(fit)$coef)
+> tmp
                 Estimate  Std..Error    z.value     Pr...z..
-(Intercept)  2.329441734 0.514683688  4.5259677 6.011977e-06
-age         -0.002353404 0.008255814 -0.2850602 7.755980e-01
-sexFemale    0.039227292 0.195330166  0.2008256 8.408350e-01
-> 
-> tmp$OR <- round(exp(tmp[,1]),2)
-> tmp$lower.CI <- round(exp(tmp[,1] - 1.96* tmp[,2]),2)
-> tmp$upper.CI <- round(exp(tmp[,1] + 1.96* tmp[,2]),2)
-> names(tmp)[4] <- 'P-value'
-> 
-> kable(tmp[,c('OR','lower.CI','upper.CI','P-value')])
+(Intercept) 2.329441734 0.514683688 4.5259677 6.011977e-06 +age -0.002353404 0.008255814 -0.2850602 7.755980e-01 +sexFemale 0.039227292 0.195330166 0.2008256 8.408350e-01 +> +> tmp$OR <- round(exp(tmp[,1]),2) +> tmp$lower.CI <- round(exp(tmp[,1] - 1.96* tmp[,2]),2) +> tmp$upper.CI <- round(exp(tmp[,1] + 1.96* tmp[,2]),2) +> names(tmp)[4] <- 'P-value' +> +> kable(tmp[,c('OR','lower.CI','upper.CI','P-value')])
@@ -713,37 +671,37 @@

fit and summarize logistic regression model

-
> 
-> # Assess the predictive ability of the model
-> 
-> # code using the pROC package
-> require(pROC)
-> pred <- predict(fit, type='response')
-> tmp <- pROC::roc(mockstudy$mdquality.s[!is.na(mockstudy$mdquality.s)]~ pred, plot=TRUE, percent=TRUE)
-

-
> tmp$auc
-Area under the curve: 50.69%
+
> 
+> # Assess the predictive ability of the model
+> 
+> # code using the pROC package
+> require(pROC)
+> pred <- predict(fit, type='response')
+> tmp <- pROC::roc(mockstudy$mdquality.s[!is.na(mockstudy$mdquality.s)]~ pred, plot=TRUE, percent=TRUE)
+

+
> tmp$auc
+Area under the curve: 50.69%
-

extract data using broom package

+

Extract data using broom package

The broom package makes it easy to extract information from the fit.

-
> tidy(fit, exp=T, conf.int=T) # coefficients, p-values, conf.intervals
+
> tidy(fit, exp=T, conf.int=T) # coefficients, p-values, conf.intervals
          term   estimate   std.error  statistic      p.value  conf.low
-1 (Intercept) 10.2722053 0.514683688  4.5259677 6.011977e-06 3.8305925
-2         age  0.9976494 0.008255814 -0.2850602 7.755980e-01 0.9814436
-3   sexFemale  1.0400068 0.195330166  0.2008256 8.408350e-01 0.7119068
+1 (Intercept) 10.2722053 0.514683688  4.5259677 6.011977e-06 3.8305925
+2         age  0.9976494 0.008255814 -0.2850602 7.755980e-01 0.9814436
+3   sexFemale  1.0400068 0.195330166  0.2008256 8.408350e-01 0.7119068
   conf.high
-1 28.876261
-2  1.013764
-3  1.533763
-> 
-> glance(fit) # model summary statistics
+1 28.876261
+2  1.013764
+3  1.533763
+> 
+> glance(fit) # model summary statistics
   null.deviance df.null    logLik      AIC      BIC deviance df.residual
-1      807.6764    1246 -403.7734 813.5468 828.9323 807.5468        1244
+1 807.6764 1246 -403.7734 813.5468 828.9323 807.5468 1244
-

create a summary table using modelsum

-
> summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial))
+

Create a summary table using modelsum

+
> summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial))
@@ -822,10 +780,10 @@

create a summary table using modelsum

-
> 
-> fitall <- modelsum(mdquality.s ~ age, data=mockstudy, family=binomial,
-+                    binomial.stats=c("Nmiss2","OR","p.value"))
-> summary(fitall)
+
> 
+> fitall <- modelsum(mdquality.s ~ age, data=mockstudy, family=binomial,
++                    binomial.stats=c("Nmiss2","OR","p.value"))
+> summary(fitall)
@@ -861,121 +819,121 @@

create a summary table using modelsum

Survival

-

fit and summarize a Cox regression model

-
> require(survival)
-Loading required package: survival
-> 
-> # multivariable model with all 3 terms
-> fit  <- coxph(Surv(fu.time, fu.stat) ~ age + sex + arm, data=mockstudy)
-> summary(fit)
+

Fit and summarize a Cox regression model

+
> require(survival)
+Loading required package: survival
+> 
+> # multivariable model with all 3 terms
+> fit  <- coxph(Surv(fu.time, fu.stat) ~ age + sex + arm, data=mockstudy)
+> summary(fit)
 Call:
-coxph(formula = Surv(fu.time, fu.stat) ~ age + sex + arm, data = mockstudy)
+coxph(formula = Surv(fu.time, fu.stat) ~ age + sex + arm, data = mockstudy)
 
-  n= 1499, number of events= 1356 
+  n= 1499, number of events= 1356 
 
-                  coef exp(coef)  se(coef)      z Pr(>|z|)    
-age           0.004600  1.004611  0.002501  1.839   0.0659 .  
-sexFemale     0.039893  1.040699  0.056039  0.712   0.4765    
-armF: FOLFOX -0.454650  0.634670  0.064878 -7.008 2.42e-12 ***
-armG: IROX   -0.140785  0.868676  0.072760 -1.935   0.0530 .  
+                  coef exp(coef)  se(coef)      z Pr(>|z|)    
+age           0.004600  1.004611  0.002501  1.839   0.0659 .  
+sexFemale     0.039893  1.040699  0.056039  0.712   0.4765    
+armF: FOLFOX -0.454650  0.634670  0.064878 -7.008 2.42e-12 ***
+armG: IROX   -0.140785  0.868676  0.072760 -1.935   0.0530 .  
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-
-             exp(coef) exp(-coef) lower .95 upper .95
-age             1.0046     0.9954    0.9997    1.0095
-sexFemale       1.0407     0.9609    0.9324    1.1615
-armF: FOLFOX    0.6347     1.5756    0.5589    0.7207
-armG: IROX      0.8687     1.1512    0.7532    1.0018
-
-Concordance= 0.563  (se = 0.009 )
-Rsquare= 0.037   (max possible= 1 )
-Likelihood ratio test= 56.21  on 4 df,   p=1.811e-11
-Wald test            = 56.26  on 4 df,   p=1.77e-11
-Score (logrank) test = 56.96  on 4 df,   p=1.259e-11
-> 
-> # check proportional hazards assumption
-> fit.z <- cox.zph(fit)
-> fit.z
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+
+             exp(coef) exp(-coef) lower .95 upper .95
+age             1.0046     0.9954    0.9997    1.0095
+sexFemale       1.0407     0.9609    0.9324    1.1615
+armF: FOLFOX    0.6347     1.5756    0.5589    0.7207
+armG: IROX      0.8687     1.1512    0.7532    1.0018
+
+Concordance= 0.563  (se = 0.009 )
+Rsquare= 0.037   (max possible= 1 )
+Likelihood ratio test= 56.21  on 4 df,   p=1.811e-11
+Wald test            = 56.26  on 4 df,   p=1.77e-11
+Score (logrank) test = 56.96  on 4 df,   p=1.259e-11
+> 
+> # check proportional hazards assumption
+> fit.z <- cox.zph(fit)
+> fit.z
                  rho chisq     p
-age          -0.0311  1.46 0.226
-sexFemale    -0.0325  1.44 0.230
-armF: FOLFOX  0.0343  1.61 0.205
-armG: IROX    0.0337  1.54 0.214
-GLOBAL            NA  4.59 0.332
-> plot(fit.z[1], resid=FALSE) # makes for a cleaner picture in this case
-> abline(h=coef(fit)[1], col='red')
-

-
> 
-> # check functional form for age using pspline (penalized spline)
-> # results are returned for the linear and non-linear components
-> fit2 <- coxph(Surv(fu.time, fu.stat) ~ pspline(age) + sex + arm, data=mockstudy)
-> fit2
+age          -0.0311  1.46 0.226
+sexFemale    -0.0325  1.44 0.230
+armF: FOLFOX  0.0343  1.61 0.205
+armG: IROX    0.0337  1.54 0.214
+GLOBAL            NA  4.59 0.332
+> plot(fit.z[1], resid=FALSE) # makes for a cleaner picture in this case
+> abline(h=coef(fit)[1], col='red')
+

+
> 
+> # check functional form for age using pspline (penalized spline)
+> # results are returned for the linear and non-linear components
+> fit2 <- coxph(Surv(fu.time, fu.stat) ~ pspline(age) + sex + arm, data=mockstudy)
+> fit2
 Call:
-coxph(formula = Surv(fu.time, fu.stat) ~ pspline(age) + sex + 
-    arm, data = mockstudy)
-
-                         coef se(coef)      se2    Chisq   DF       p
-pspline(age), linear  0.00443  0.00237  0.00237  3.48989 1.00  0.0617
-pspline(age), nonlin                            13.11270 3.08  0.0047
-sexFemale             0.03993  0.05610  0.05607  0.50663 1.00  0.4766
-armF: FOLFOX         -0.46240  0.06494  0.06493 50.69608 1.00 1.1e-12
-armG: IROX           -0.15243  0.07301  0.07299  4.35876 1.00  0.0368
-
-Iterations: 6 outer, 16 Newton-Raphson
-     Theta= 0.954 
-Degrees of freedom for terms= 4.1 1.0 2.0 
-Likelihood ratio test=70.1  on 7.08 df, p=1.59e-12  n= 1499 
-> 
-> # plot smoothed age to visualize why significant
-> termplot(fit2, se=T, terms=1)
-> abline(h=0)
-

-
> 
-> # The c-statistic comes out in the summary of the fit
-> summary(fit2)$concordance
-          C       se(C) 
-0.568432549 0.008779125 
-> 
-> # It can also be calculated using the survConcordance function
-> survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy)
+coxph(formula = Surv(fu.time, fu.stat) ~ pspline(age) + sex + 
+    arm, data = mockstudy)
+
+                         coef se(coef)      se2    Chisq   DF       p
+pspline(age), linear  0.00443  0.00237  0.00237  3.48989 1.00  0.0617
+pspline(age), nonlin                            13.11270 3.08  0.0047
+sexFemale             0.03993  0.05610  0.05607  0.50663 1.00  0.4766
+armF: FOLFOX         -0.46240  0.06494  0.06493 50.69608 1.00 1.1e-12
+armG: IROX           -0.15243  0.07301  0.07299  4.35876 1.00  0.0368
+
+Iterations: 6 outer, 16 Newton-Raphson
+     Theta= 0.954 
+Degrees of freedom for terms= 4.1 1.0 2.0 
+Likelihood ratio test=70.1  on 7.08 df, p=1.59e-12  n= 1499 
+> 
+> # plot smoothed age to visualize why significant
+> termplot(fit2, se=T, terms=1)
+> abline(h=0)
+

+
> 
+> # The c-statistic comes out in the summary of the fit
+> summary(fit2)$concordance
+          C       se(C) 
+0.568432549 0.008779125 
+> 
+> # It can also be calculated using the survConcordance function
+> survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy)
 Call:
-survConcordance(formula = Surv(fu.time, fu.stat) ~ predict(fit2), 
-    data = mockstudy)
+survConcordance(formula = Surv(fu.time, fu.stat) ~ predict(fit2), 
+    data = mockstudy)
 
-  n= 1499 
-Concordance= 0.5684325 se= 0.008779125
-concordant discordant  tied.risk  tied.time   std(c-d) 
- 620221.00  470282.00    5021.00     766.00   19235.49 
+ n= 1499 +Concordance= 0.5684325 se= 0.008779125 +concordant discordant tied.risk tied.time std(c-d) + 620221.00 470282.00 5021.00 766.00 19235.49
-

extract data using broom package

+

Extract data using broom package

The broom package makes it easy to extract information from the fit.

-
> tidy(fit) # coefficients, p-values
+
> tidy(fit) # coefficients, p-values
           term     estimate   std.error  statistic      p.value
-1          age  0.004600011 0.002501114  1.8391844 6.588807e-02
-2    sexFemale  0.039892735 0.056038632  0.7118792 4.765396e-01
-3 armF: FOLFOX -0.454650445 0.064878289 -7.0077441 2.421952e-12
-4   armG: IROX -0.140784996 0.072759529 -1.9349355 5.299821e-02
+1          age  0.004600011 0.002501114  1.8391844 6.588807e-02
+2    sexFemale  0.039892735 0.056038632  0.7118792 4.765396e-01
+3 armF: FOLFOX -0.454650445 0.064878289 -7.0077441 2.421952e-12
+4   armG: IROX -0.140784996 0.072759529 -1.9349355 5.299821e-02
        conf.low    conf.high
-1 -0.0003020836  0.009502105
-2 -0.0699409642  0.149726435
-3 -0.5818095536 -0.327491336
-4 -0.2833910528  0.001821061
-> 
-> glance(fit) # model summary statistics
+1 -0.0003020836  0.009502105
+2 -0.0699409642  0.149726435
+3 -0.5818095536 -0.327491336
+4 -0.2833910528  0.001821061
+> 
+> glance(fit) # model summary statistics
      n nevent statistic.log  p.value.log statistic.sc   p.value.sc
-1 1499   1356      56.21071 1.811218e-11      56.9642 1.258749e-11
+1 1499   1356      56.21071 1.811218e-11      56.9642 1.258749e-11
   statistic.wald p.value.wald  r.squared r.squared.max concordance
-1          56.26 1.770173e-11 0.03680443     0.9999923    0.562838
+1          56.26 1.770173e-11 0.03680443     0.9999923    0.562838
   std.error.concordance    logLik      AIC      BIC
-1           0.008779125 -8797.588 17603.18 17624.03
+1 0.008779125 -8797.588 17603.18 17624.03
-

create a summary table using modelsum

-
> ##Note: You must use quotes when specifying family="survival" 
-> ##      family=survival will not work
-> summary(modelsum(Surv(fu.time, fu.stat) ~ arm, 
-+                  adjust=~age + sex, data=mockstudy, family="survival"))
+

Create a summary table using modelsum

+
> ##Note: You must use quotes when specifying family="survival" 
+> ##      family=survival will not work
+> summary(modelsum(Surv(fu.time, fu.stat) ~ arm, 
++                  adjust=~age + sex, data=mockstudy, family="survival"))
@@ -1030,10 +988,10 @@

create a summary table using modelsum

-
> 
-> ##Note: the pspline term is not working yet
-> #summary(modelsum(Surv(fu.time, fu.stat) ~ arm, 
-> #                adjust=~pspline(age) + sex, data=mockstudy, family='survival'))
+
> 
+> ##Note: the pspline term is not working yet
+> #summary(modelsum(Surv(fu.time, fu.stat) ~ arm, 
+> #                adjust=~pspline(age) + sex, data=mockstudy, family='survival'))
@@ -1042,112 +1000,112 @@

Poisson

Example 1: fit and summarize a Poisson regression model

For the first example, use the solder dataset available in the rpart package. The endpoint skips has a definite Poisson look.

-
> require(rpart) ##just to get access to solder dataset
-> data(solder)
-> hist(solder$skips)
-

-
> 
-> fit <- glm(skips ~ Opening + Solder + Mask , data=solder, family=poisson)
-> anova(fit, test='Chi')
+
> require(rpart) ##just to get access to solder dataset
+> data(solder)
+> hist(solder$skips)
+

+
> 
+> fit <- glm(skips ~ Opening + Solder + Mask , data=solder, family=poisson)
+> anova(fit, test='Chi')
 Analysis of Deviance Table
 
-Model: poisson, link: log
+Model: poisson, link: log
 
-Response: skips
+Response: skips
 
-Terms added sequentially (first to last)
+Terms added sequentially (first to last)
 
-        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
-NULL                      719     6855.7              
-Opening  2  2524.56       717     4331.1 < 2.2e-16 ***
-Solder   1   936.95       716     3394.2 < 2.2e-16 ***
-Mask     3  1653.09       713     1741.1 < 2.2e-16 ***
+        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
+NULL                      719     6855.7              
+Opening  2  2524.56       717     4331.1 < 2.2e-16 ***
+Solder   1   936.95       716     3394.2 < 2.2e-16 ***
+Mask     3  1653.09       713     1741.1 < 2.2e-16 ***
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-> summary(fit)
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+> summary(fit)
 
 Call:
-glm(formula = skips ~ Opening + Solder + Mask, family = poisson, 
-    data = solder)
+glm(formula = skips ~ Opening + Solder + Mask, family = poisson, 
+    data = solder)
 
-Deviance Residuals: 
-    Min       1Q   Median       3Q      Max  
--4.7252  -1.3409  -0.6276   0.6930   5.2342  
+Deviance Residuals: 
+    Min       1Q   Median       3Q      Max  
+-4.7252  -1.3409  -0.6276   0.6930   5.2342  
 
 Coefficients:
-            Estimate Std. Error z value Pr(>|z|)    
-(Intercept) -1.30871    0.08068 -16.222  < 2e-16 ***
-OpeningM     0.25851    0.06656   3.884 0.000103 ***
-OpeningS     1.89349    0.05363  35.306  < 2e-16 ***
-SolderThin   1.09973    0.03864  28.465  < 2e-16 ***
-MaskA3       0.42819    0.07547   5.674  1.4e-08 ***
-MaskB3       1.20225    0.06697  17.953  < 2e-16 ***
-MaskB6       1.86648    0.06310  29.580  < 2e-16 ***
+            Estimate Std. Error z value Pr(>|z|)    
+(Intercept) -1.30871    0.08068 -16.222  < 2e-16 ***
+OpeningM     0.25851    0.06656   3.884 0.000103 ***
+OpeningS     1.89349    0.05363  35.306  < 2e-16 ***
+SolderThin   1.09973    0.03864  28.465  < 2e-16 ***
+MaskA3       0.42819    0.07547   5.674  1.4e-08 ***
+MaskB3       1.20225    0.06697  17.953  < 2e-16 ***
+MaskB6       1.86648    0.06310  29.580  < 2e-16 ***
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
-(Dispersion parameter for poisson family taken to be 1)
+(Dispersion parameter for poisson family taken to be 1)
 
-    Null deviance: 6855.7  on 719  degrees of freedom
-Residual deviance: 1741.1  on 713  degrees of freedom
-AIC: 3337.2
+    Null deviance: 6855.7  on 719  degrees of freedom
+Residual deviance: 1741.1  on 713  degrees of freedom
+AIC: 3337.2
 
-Number of Fisher Scoring iterations: 5
+Number of Fisher Scoring iterations: 5

Overdispersion is when the Residual deviance is larger than the degrees of freedom. This can be tested, approximately using the following code. The goal is to have a p-value that is \(>0.05\).

-
> 1-pchisq(fit$deviance, fit$df.residual)
-[1] 0
+
> 1-pchisq(fit$deviance, fit$df.residual)
+[1] 0

One possible solution is to use the quasipoisson family instead of the poisson family. This adjusts for the overdispersion.

-
> fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson)
-> summary(fit2)
+
> fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson)
+> summary(fit2)
 
 Call:
-glm(formula = skips ~ Opening + Solder + Mask, family = quasipoisson, 
-    data = solder)
+glm(formula = skips ~ Opening + Solder + Mask, family = quasipoisson, 
+    data = solder)
 
-Deviance Residuals: 
-    Min       1Q   Median       3Q      Max  
--4.7252  -1.3409  -0.6276   0.6930   5.2342  
+Deviance Residuals: 
+    Min       1Q   Median       3Q      Max  
+-4.7252  -1.3409  -0.6276   0.6930   5.2342  
 
 Coefficients:
-            Estimate Std. Error t value Pr(>|t|)    
-(Intercept) -1.30871    0.12496 -10.473  < 2e-16 ***
-OpeningM     0.25851    0.10310   2.507 0.012382 *  
-OpeningS     1.89349    0.08307  22.794  < 2e-16 ***
-SolderThin   1.09973    0.05984  18.377  < 2e-16 ***
-MaskA3       0.42819    0.11689   3.663 0.000268 ***
-MaskB3       1.20225    0.10372  11.591  < 2e-16 ***
-MaskB6       1.86648    0.09774  19.097  < 2e-16 ***
+            Estimate Std. Error t value Pr(>|t|)    
+(Intercept) -1.30871    0.12496 -10.473  < 2e-16 ***
+OpeningM     0.25851    0.10310   2.507 0.012382 *  
+OpeningS     1.89349    0.08307  22.794  < 2e-16 ***
+SolderThin   1.09973    0.05984  18.377  < 2e-16 ***
+MaskA3       0.42819    0.11689   3.663 0.000268 ***
+MaskB3       1.20225    0.10372  11.591  < 2e-16 ***
+MaskB6       1.86648    0.09774  19.097  < 2e-16 ***
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
-(Dispersion parameter for quasipoisson family taken to be 2.399074)
+(Dispersion parameter for quasipoisson family taken to be 2.399074)
 
-    Null deviance: 6855.7  on 719  degrees of freedom
-Residual deviance: 1741.1  on 713  degrees of freedom
-AIC: NA
+    Null deviance: 6855.7  on 719  degrees of freedom
+Residual deviance: 1741.1  on 713  degrees of freedom
+AIC: NA
 
-Number of Fisher Scoring iterations: 5
+Number of Fisher Scoring iterations: 5
-

extract data using broom package

+

Extract data using broom package

The broom package makes it easy to extract information from the fit.

-
> tidy(fit) # coefficients, p-values
+
> tidy(fit) # coefficients, p-values
          term   estimate  std.error  statistic       p.value
-1 (Intercept) -1.3087062 0.08067587 -16.221780  3.537930e-59
-2    OpeningM  0.2585107 0.06656163   3.883780  1.028452e-04
-3    OpeningS  1.8934884 0.05363137  35.305612 4.816124e-273
-4  SolderThin  1.0997315 0.03863508  28.464582 3.216362e-178
-5      MaskA3  0.4281934 0.07546810   5.673833  1.396375e-08
-6      MaskB3  1.2022472 0.06696662  17.952933  4.552147e-72
-7      MaskB6  1.8664830 0.06309987  29.579826 2.716304e-192
-> 
-> glance(fit) # model summary statistics
+1 (Intercept) -1.3087062 0.08067587 -16.221780  3.537930e-59
+2    OpeningM  0.2585107 0.06656163   3.883780  1.028452e-04
+3    OpeningS  1.8934884 0.05363137  35.305612 4.816124e-273
+4  SolderThin  1.0997315 0.03863508  28.464582 3.216362e-178
+5      MaskA3  0.4281934 0.07546810   5.673833  1.396375e-08
+6      MaskB3  1.2022472 0.06696662  17.952933  4.552147e-72
+7      MaskB6  1.8664830 0.06309987  29.579826 2.716304e-192
+> 
+> glance(fit) # model summary statistics
   null.deviance df.null    logLik      AIC      BIC deviance df.residual
-1       6855.69     719 -1661.623 3337.247 3369.302  1741.08         713
+1 6855.69 719 -1661.623 3337.247 3369.302 1741.08 713
-

create a summary table using modelsum

-
> summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson"))
+

Create a summary table using modelsum

+
> summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson"))
@@ -1231,7 +1189,7 @@

create a summary table using modelsum

-
> summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="poisson"))
+
> summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="poisson"))
@@ -1319,101 +1277,101 @@

create a summary table using modelsum

Example 2: fit and summarize a Poisson regression model

This second example uses the survival endpoint available in the mockstudy dataset. There is a close relationship between survival and Poisson models, and often it is easier to fit the model using Poisson regression, especially if you want to present absolute risk.

-
> # add .01 to the follow-up time (.01*1 day) in order to keep everyone in the analysis
-> fit <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=poisson)
-> summary(fit)
+
> # add .01 to the follow-up time (.01*1 day) in order to keep everyone in the analysis
+> fit <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=poisson)
+> summary(fit)
 
 Call:
-glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex + 
-    arm, family = poisson, data = mockstudy)
+glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex + 
+    arm, family = poisson, data = mockstudy)
 
-Deviance Residuals: 
-    Min       1Q   Median       3Q      Max  
--3.1188  -0.4041   0.3242   0.9727   4.3588  
+Deviance Residuals: 
+    Min       1Q   Median       3Q      Max  
+-3.1188  -0.4041   0.3242   0.9727   4.3588  
 
 Coefficients:
-              Estimate Std. Error z value Pr(>|z|)    
-(Intercept)  -5.875627   0.108984 -53.913  < 2e-16 ***
-age           0.003724   0.001705   2.184   0.0290 *  
-sexFemale     0.027321   0.038575   0.708   0.4788    
-armF: FOLFOX -0.335141   0.044600  -7.514 5.72e-14 ***
-armG: IROX   -0.107776   0.050643  -2.128   0.0333 *  
+              Estimate Std. Error z value Pr(>|z|)    
+(Intercept)  -5.875627   0.108984 -53.913  < 2e-16 ***
+age           0.003724   0.001705   2.184   0.0290 *  
+sexFemale     0.027321   0.038575   0.708   0.4788    
+armF: FOLFOX -0.335141   0.044600  -7.514 5.72e-14 ***
+armG: IROX   -0.107776   0.050643  -2.128   0.0333 *  
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-
-(Dispersion parameter for poisson family taken to be 1)
-
-    Null deviance: 2113.5  on 1498  degrees of freedom
-Residual deviance: 2048.0  on 1494  degrees of freedom
-AIC: 5888.2
-
-Number of Fisher Scoring iterations: 5
-> 1-pchisq(fit$deviance, fit$df.residual)
-[1] 0
-> 
-> coef(coxph(Surv(fu.time,fu.stat) ~ age + sex + arm, data=mockstudy))
-         age    sexFemale armF: FOLFOX   armG: IROX 
- 0.004600011  0.039892735 -0.454650445 -0.140784996 
-> coef(fit)[-1]
-         age    sexFemale armF: FOLFOX   armG: IROX 
- 0.003723763  0.027320917 -0.335141090 -0.107775577 
-> 
-> # results from the Poisson model can then be described as risk ratios (similar to the hazard ratio)
-> exp(coef(fit)[-1])
-         age    sexFemale armF: FOLFOX   armG: IROX 
-   1.0037307    1.0276976    0.7152372    0.8978291 
-> 
-> # As before, we can model the dispersion which alters the standard error
-> fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, 
-+             data=mockstudy, family=quasipoisson)
-> summary(fit2)
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+
+(Dispersion parameter for poisson family taken to be 1)
+
+    Null deviance: 2113.5  on 1498  degrees of freedom
+Residual deviance: 2048.0  on 1494  degrees of freedom
+AIC: 5888.2
+
+Number of Fisher Scoring iterations: 5
+> 1-pchisq(fit$deviance, fit$df.residual)
+[1] 0
+> 
+> coef(coxph(Surv(fu.time,fu.stat) ~ age + sex + arm, data=mockstudy))
+         age    sexFemale armF: FOLFOX   armG: IROX 
+ 0.004600011  0.039892735 -0.454650445 -0.140784996 
+> coef(fit)[-1]
+         age    sexFemale armF: FOLFOX   armG: IROX 
+ 0.003723763  0.027320917 -0.335141090 -0.107775577 
+> 
+> # results from the Poisson model can then be described as risk ratios (similar to the hazard ratio)
+> exp(coef(fit)[-1])
+         age    sexFemale armF: FOLFOX   armG: IROX 
+   1.0037307    1.0276976    0.7152372    0.8978291 
+> 
+> # As before, we can model the dispersion which alters the standard error
+> fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, 
++             data=mockstudy, family=quasipoisson)
+> summary(fit2)
 
 Call:
-glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex + 
-    arm, family = quasipoisson, data = mockstudy)
+glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex + 
+    arm, family = quasipoisson, data = mockstudy)
 
-Deviance Residuals: 
-    Min       1Q   Median       3Q      Max  
--3.1188  -0.4041   0.3242   0.9727   4.3588  
+Deviance Residuals: 
+    Min       1Q   Median       3Q      Max  
+-3.1188  -0.4041   0.3242   0.9727   4.3588  
 
 Coefficients:
-              Estimate Std. Error t value Pr(>|t|)    
-(Intercept)  -5.875627   0.566666 -10.369   <2e-16 ***
-age           0.003724   0.008867   0.420    0.675    
-sexFemale     0.027321   0.200572   0.136    0.892    
-armF: FOLFOX -0.335141   0.231899  -1.445    0.149    
-armG: IROX   -0.107776   0.263318  -0.409    0.682    
+              Estimate Std. Error t value Pr(>|t|)    
+(Intercept)  -5.875627   0.566666 -10.369   <2e-16 ***
+age           0.003724   0.008867   0.420    0.675    
+sexFemale     0.027321   0.200572   0.136    0.892    
+armF: FOLFOX -0.335141   0.231899  -1.445    0.149    
+armG: IROX   -0.107776   0.263318  -0.409    0.682    
 ---
-Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
-(Dispersion parameter for quasipoisson family taken to be 27.03493)
+(Dispersion parameter for quasipoisson family taken to be 27.03493)
 
-    Null deviance: 2113.5  on 1498  degrees of freedom
-Residual deviance: 2048.0  on 1494  degrees of freedom
-AIC: NA
+    Null deviance: 2113.5  on 1498  degrees of freedom
+Residual deviance: 2048.0  on 1494  degrees of freedom
+AIC: NA
 
-Number of Fisher Scoring iterations: 5
+Number of Fisher Scoring iterations: 5
-

extract data using broom package

+

Extract data using broom package

The broom package makes it easy to extract information from the fit.

-
> tidy(fit) ##coefficients, p-values
+
> tidy(fit) ##coefficients, p-values
           term     estimate   std.error   statistic      p.value
-1  (Intercept) -5.875626610 0.108984423 -53.9125359 0.000000e+00
-2          age  0.003723763 0.001705363   2.1835606 2.899455e-02
-3    sexFemale  0.027320917 0.038575062   0.7082533 4.787879e-01
-4 armF: FOLFOX -0.335141090 0.044600079  -7.5143610 5.718959e-14
-5   armG: IROX -0.107775577 0.050642805  -2.1281518 3.332450e-02
-> 
-> glance(fit) ##model summary statistics
+1  (Intercept) -5.875626610 0.108984423 -53.9125359 0.000000e+00
+2          age  0.003723763 0.001705363   2.1835606 2.899455e-02
+3    sexFemale  0.027320917 0.038575062   0.7082533 4.787879e-01
+4 armF: FOLFOX -0.335141090 0.044600079  -7.5143610 5.718959e-14
+5   armG: IROX -0.107775577 0.050642805  -2.1281518 3.332450e-02
+> 
+> glance(fit) ##model summary statistics
   null.deviance df.null    logLik      AIC      BIC deviance df.residual
-1      2113.504    1498 -2939.082 5888.164 5914.727 2047.979        1494
+1 2113.504 1498 -2939.082 5888.164 5914.727 2047.979 1494
-

create a summary table using modelsum

+

Create a summary table using modelsum

Remember that the result from modelsum is different from the fit above. The modelsum summary shows the results for age + offset(log(fu.time+.01)) then sex + offset(log(fu.time+.01)) instead of age + sex + arm + offset(log(fu.time+.01)).

-
> summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, 
-+                  data=mockstudy, family=poisson))
+
> summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, 
++                  data=mockstudy, family=poisson))
@@ -1475,13 +1433,13 @@

create a summary table using modelsum

Additional Examples

Here are multiple examples showing how to use some of the different options.

-
-

1. Change summary statistics globally

+
+

1. Change summary statistics globally

There are standard settings for each type of model regarding what information is summarized in the table. This behavior can be modified using the modelsum.control function. In fact, you can save your standard settings and use that for future tables.

-
> mycontrols  <- modelsum.control(gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"),
-+                                 show.adjust=FALSE, show.intercept=FALSE)                            
-> tab2 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, control=mycontrols)
-> summary(tab2)
+
> mycontrols  <- modelsum.control(gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"),
++                                 show.adjust=FALSE, show.intercept=FALSE)                            
+> tab2 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, control=mycontrols)
+> summary(tab2)
@@ -1507,10 +1465,10 @@

1. Change summary statistics globally

You can also change these settings directly in the modelsum call.

-
> tab3 <- modelsum(bmi ~  age, adjust=~sex, data=mockstudy,
-+                  gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), 
-+                  show.intercept=FALSE, show.adjust=FALSE)
-> summary(tab3)
+
> tab3 <- modelsum(bmi ~  age, adjust=~sex, data=mockstudy,
++                  gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), 
++                  show.intercept=FALSE, show.adjust=FALSE)
+> summary(tab3)
@@ -1536,42 +1494,42 @@

1. Change summary statistics globally

-
-

2. Add labels to independent variables

+
+

2. Add labels to independent variables

In the above example, age is shown with a label (Age in Years), but sex is listed “as is”. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R.

-
> ## Look at one variable's label
-> attr(mockstudy$age,'label')
-[1] "Age in Years"
-> 
-> ## See all the variables with a label
-> unlist(lapply(mockstudy,'attr','label'))
+
> ## Look at one variable's label
+> attr(mockstudy$age,'label')
+[1] "Age in Years"
+> 
+> ## See all the variables with a label
+> unlist(lapply(mockstudy,'attr','label'))
                        age                        arm 
-            "Age in Years"            "Treatment Arm" 
+            "Age in Years"            "Treatment Arm" 
                       race                        bmi 
-                    "Race" "Body Mass Index (kg/m^2)" 
-> 
-> ## or
-> cbind(sapply(mockstudy,attr,'label'))
-            [,1]                      
-case        NULL                      
-age         "Age in Years"            
-arm         "Treatment Arm"           
-sex         NULL                      
-race        "Race"                    
-fu.time     NULL                      
-fu.stat     NULL                      
-ps          NULL                      
-hgb         NULL                      
-bmi         "Body Mass Index (kg/m^2)"
-alk.phos    NULL                      
-ast         NULL                      
-mdquality.s NULL                      
-age.ord     NULL                      
+ "Race" "Body Mass Index (kg/m^2)" +> +> ## or +> cbind(sapply(mockstudy,attr,'label')) + [,1] +case NULL +age "Age in Years" +arm "Treatment Arm" +sex NULL +race "Race" +fu.time NULL +fu.stat NULL +ps NULL +hgb NULL +bmi "Body Mass Index (kg/m^2)" +alk.phos NULL +ast NULL +mdquality.s NULL +age.ord NULL

If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset.

-
> attr(mockstudy$age,'label')  <- 'Age, yrs'
-> 
-> tab1 <- modelsum(bmi ~  age, adjust=~sex, data=mockstudy)
-> summary(tab1)
+
> attr(mockstudy$age,'label')  <- 'Age, yrs'
+> 
+> tab1 <- modelsum(bmi ~  age, adjust=~sex, data=mockstudy)
+> summary(tab1)
@@ -1614,8 +1572,8 @@

2. Add labels to independent variables

Another option is to add labels after you have created the table

-
> mylabels <- list(sexFemale = "Female", age ="Age, yrs")
-> summary(tab1, labelTranslations = mylabels)
+
> mylabels <- list(sexFemale = "Female", age ="Age, yrs")
+> summary(tab1, labelTranslations = mylabels)
@@ -1658,18 +1616,18 @@

2. Add labels to independent variables

Alternatively, you can check the variable labels and manipulate them with a function called labels, which works on the tableby object.

-
> labels(tab1)
+
> labels(tab1)
                        bmi                        age 
-"Body Mass Index (kg/m^2)"                 "Age, yrs" 
+"Body Mass Index (kg/m^2)"                 "Age, yrs" 
                  sexFemale 
-              "sex Female" 
-> labels(tab1) <- c(sexFemale="Female", age="Baseline Age (yrs)")
-> labels(tab1)
+              "sex Female" 
+> labels(tab1) <- c(sexFemale="Female", age="Baseline Age (yrs)")
+> labels(tab1)
                        bmi                        age 
-"Body Mass Index (kg/m^2)"       "Baseline Age (yrs)" 
+"Body Mass Index (kg/m^2)"       "Baseline Age (yrs)" 
                  sexFemale 
-                  "Female" 
-
> summary(tab1)
+ "Female"
+
> summary(tab1)
@@ -1712,9 +1670,9 @@

2. Add labels to independent variables

-
-

2. Don’t show intercept values

-
> summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE)
+
+

3. Don’t show intercept values

+
> summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE)
@@ -1754,10 +1712,10 @@

2. Don’t show intercept values

-
-

3. Don’t show results for adjustment variables

-
> summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial),
-+         show.adjust=FALSE)  
+
+

4. Don’t show results for adjustment variables

+
> summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial),
++         show.adjust=FALSE)  
@@ -1819,24 +1777,24 @@

3. Don’t show results for adjustment variables

-
-

4. Summarize multiple variables without typing them out

+
+

5. Summarize multiple variables without typing them out

Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the paste command with the collapse="+" option.

-
> # create a vector specifying the variable names
-> myvars <- names(mockstudy)
-> 
-> # select the 8th through the 12th
-> # paste them together, separated by the + sign
-> RHS <- paste(myvars[8:12], collapse="+")
-> RHS
+
> # create a vector specifying the variable names
+> myvars <- names(mockstudy)
+> 
+> # select the 8th through the 12th
+> # paste them together, separated by the + sign
+> RHS <- paste(myvars[8:12], collapse="+")
+> RHS

[1] “ps+hgb+bmi+alk.phos+ast”

-
> 
-> # create a formula using the as.formula function
-> as.formula(paste('mdquality.s ~ ', RHS))
+
> 
+> # create a formula using the as.formula function
+> as.formula(paste('mdquality.s ~ ', RHS))

mdquality.s ~ ps + hgb + bmi + alk.phos + ast

-
> 
-> # use the formula in the modelsum function
-> summary(modelsum(as.formula(paste('mdquality.s ~', RHS)), family=binomial, data=mockstudy))
+
> 
+> # use the formula in the modelsum function
+> summary(modelsum(as.formula(paste('mdquality.s ~', RHS)), family=binomial, data=mockstudy))
@@ -1952,16 +1910,16 @@

4. Summarize multiple variables without typing them out

These steps can also be done using the formulize function.

-
> ## The formulize function does the paste and as.formula steps
-> tmp <- formulize('mdquality.s',myvars[8:10])
-> tmp
-

mdquality.s ~ ps + hgb + bmi <environment: 0x676f4c0>

-
> 
-> ## More complex formulas could also be written using formulize
-> tmp2 <- formulize('mdquality.s',c('ps','hgb','sqrt(bmi)'))
-> 
-> ## use the formula in the modelsum function
-> summary(modelsum(tmp, data=mockstudy, family=binomial))
+
> ## The formulize function does the paste and as.formula steps
+> tmp <- formulize('mdquality.s',myvars[8:10])
+> tmp
+

mdquality.s ~ ps + hgb + bmi <environment: 0x7453118>

+
> 
+> ## More complex formulas could also be written using formulize
+> tmp2 <- formulize('mdquality.s',c('ps','hgb','sqrt(bmi)'))
+> 
+> ## use the formula in the modelsum function
+> summary(modelsum(tmp, data=mockstudy, family=binomial))
@@ -2041,24 +1999,24 @@

4. Summarize multiple variables without typing them out

-
-

5. Subset the dataset used in the analysis

+
+

6. Subset the dataset used in the analysis

Here are two ways to get the same result (limit the analysis to subjects age>50 and in the F: FOLFOX treatment group).

  • The first approach uses the subset function applied to the dataset mockstudy. This example also selects a subset of variables. The modelsum function is then applied to this subsetted data.
-
> newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(age,sex, bmi:alk.phos))
-> dim(mockstudy)
-[1] 1499   14
-> table(mockstudy$arm)
-
-   A: IFL F: FOLFOX   G: IROX 
-      428       691       380 
-> dim(newdata)
-[1] 557   4
-> names(newdata)
-[1] "age"      "sex"      "bmi"      "alk.phos"
-
> summary(modelsum(alk.phos ~ ., data=newdata))
+
> newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(age,sex, bmi:alk.phos))
+> dim(mockstudy)
+[1] 1499   14
+> table(mockstudy$arm)
+
+   A: IFL F: FOLFOX   G: IROX 
+      428       691       380 
+> dim(newdata)
+[1] 557   4
+> names(newdata)
+[1] "age"      "sex"      "bmi"      "alk.phos"
+
> summary(modelsum(alk.phos ~ ., data=newdata))
@@ -2132,7 +2090,7 @@

5. Subset the dataset used in the analysis

  • The second approach does the same analysis but uses the subset argument within modelsum to subset the data.
-
> summary(modelsum(log(alk.phos) ~ sex + ps + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
+
> summary(modelsum(log(alk.phos) ~ sex + ps + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
@@ -2203,7 +2161,7 @@

5. Subset the dataset used in the analysis

-
> summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, data=mockstudy))
+
> summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, data=mockstudy))
@@ -2266,7 +2224,7 @@

5. Subset the dataset used in the analysis

-
> summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy))
+
> summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy))
@@ -2338,14 +2296,14 @@

5. Subset the dataset used in the analysis

-
-

6. Create combinations of variables on the fly

-
> ## create a variable combining the levels of mdquality.s and sex
-> with(mockstudy, table(interaction(mdquality.s,sex)))
-
-  0.Male   1.Male 0.Female 1.Female 
-      77      686       47      437 
-
> summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy))
+
+

7. Create combinations of variables on the fly

+
> ## create a variable combining the levels of mdquality.s and sex
+> with(mockstudy, table(interaction(mdquality.s,sex)))
+
+  0.Male   1.Male 0.Female 1.Female 
+      77      686       47      437 
+
> summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy))
@@ -2401,11 +2359,11 @@

6. Create combinations of variables on the fly

-
-

9. Transform variables on the fly

+
+

8. Transform variables on the fly

Certain transformations need to be surrounded by I() so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols / - + ^ * then surround the new variable by I().

-
> summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s,
-+                  data=mockstudy, family=binomial))
+
> summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s,
++                  data=mockstudy, family=binomial))
@@ -2485,11 +2443,11 @@

9. Transform variables on the fly

-
-

10. Change the ordering of the variables or delete a variable

-
> mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy)
-> mytab2 <- mytab[c('age','sex','alk.phos')]
-> summary(mytab2)
+
+

9. Change the ordering of the variables or delete a variable

+
> mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy)
+> mytab2 <- mytab[c('age','sex','alk.phos')]
+> summary(mytab2)
@@ -2560,7 +2518,7 @@

10. Change the ordering of the variables or delete a variable

-
> summary(mytab[c('age','sex')])
+
> summary(mytab[c('age','sex')])
@@ -2609,7 +2567,7 @@

10. Change the ordering of the variables or delete a variable

-
> summary(mytab[c(3,1)])
+
> summary(mytab[c(3,1)])
@@ -2659,26 +2617,26 @@

10. Change the ordering of the variables or delete a variable

-
-

11. Merge two modelsum objects together

+
+

10. Merge two modelsum objects together

It is possible to combine two modelsum objects so that they print out together, however you need to pay attention to the columns that are being displayed. It is easier to combine two models of the same family (such as two sets of linear models). If you want to combine linear and logistic model results then you would want to display the beta coefficients for the logistic model.

-
> ## demographics
-> tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)
-> ## lab data
-> tab2 <- modelsum(mdquality.s ~ hgb + alk.phos, data=mockstudy, family=binomial)
->                 
-> tab12 <- merge(tab1,tab2)
-> class(tab12)
+
> ## demographics
+> tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)
+> ## lab data
+> tab2 <- modelsum(mdquality.s ~ hgb + alk.phos, data=mockstudy, family=binomial)
+>                 
+> tab12 <- merge(tab1,tab2)
+> class(tab12)

[1] “modelsumList”

-
> 
-> ##ERROR: The merge works, but not the summary
-> #summary(tab12)
+
> 
+> ##ERROR: The merge works, but not the summary
+> #summary(tab12)
-
-

12. Add a title to the table

+
+

11. Add a title to the table

When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table.

-
> t1 <- modelsum(bmi ~ sex + age, data=mockstudy)
-> summary(t1, title='Demographics')
+
> t1 <- modelsum(bmi ~ sex + age, data=mockstudy)
+> summary(t1, title='Demographics')
@@ -2729,24 +2687,24 @@

12. Add a title to the table

Demographics
-
-

13. Modify how missing values are treated

+
+

12. Modify how missing values are treated

Depending on the report you are writing you have the following options:

    -
  • Use all values available for each variable
  • -
  • Use only those subjects who have measurements available for all the variables
  • +
  • Use all values available for each variable

  • +
  • Use only those subjects who have measurements available for all the variables

-
> ## look at how many missing values there are for each variable
-> apply(is.na(mockstudy),2,sum)
+
> ## look at how many missing values there are for each variable
+> apply(is.na(mockstudy),2,sum)
        case         age         arm         sex        race     fu.time 
-          0           0           0           0           7           0 
+          0           0           0           0           7           0 
     fu.stat          ps         hgb         bmi    alk.phos         ast 
-          0         266         266          33         266         266 
+          0         266         266          33         266         266 
 mdquality.s     age.ord 
-        252           0 
-
> ## Show how many subjects have each variable (non-missing)
-> summary(modelsum(bmi ~ ast + age, data=mockstudy,
-+                 control=modelsum.control(gaussian.stats=c("N","estimate"))))
+ 252 0
+
> ## Show how many subjects have each variable (non-missing)
+> summary(modelsum(bmi ~ ast + age, data=mockstudy,
++                 control=modelsum.control(gaussian.stats=c("N","estimate"))))
@@ -2783,10 +2741,10 @@

13. Modify how missing values are treated

-
> 
-> ## Always list the number of missing values
-> summary(modelsum(bmi ~ ast + age, data=mockstudy,
-+                 control=modelsum.control(gaussian.stats=c("Nmiss2","estimate"))))
+
> 
+> ## Always list the number of missing values
+> summary(modelsum(bmi ~ ast + age, data=mockstudy,
++                 control=modelsum.control(gaussian.stats=c("Nmiss2","estimate"))))
@@ -2823,10 +2781,10 @@

13. Modify how missing values are treated

-
> 
-> ## Only show the missing values if there are some (default)
-> summary(modelsum(bmi ~ ast + age, data=mockstudy, 
-+                 control=modelsum.control(gaussian.stats=c("Nmiss","estimate"))))
+
> 
+> ## Only show the missing values if there are some (default)
+> summary(modelsum(bmi ~ ast + age, data=mockstudy, 
++                 control=modelsum.control(gaussian.stats=c("Nmiss","estimate"))))
@@ -2863,10 +2821,10 @@

13. Modify how missing values are treated

-
> 
-> ## Don't show N at all
-> summary(modelsum(bmi ~ ast + age, data=mockstudy, 
-+                 control=modelsum.control(gaussian.stats=c("estimate"))))
+
> 
+> ## Don't show N at all
+> summary(modelsum(bmi ~ ast + age, data=mockstudy, 
++                 control=modelsum.control(gaussian.stats=c("estimate"))))
@@ -2898,16 +2856,16 @@

13. Modify how missing values are treated

-
-

14. Modify the number of digits used

+
+

13. Modify the number of digits used

Within modelsum.control function there are 4 options for controlling the number of significant digits shown.

    -
  • digits: controls the number of significant digits (counting both before and after the decimal point) for continuous variables
  • -
  • nsmall: controls the number of digits after the decimal point for the beta and standard error
  • -
  • nsmall.ratio: controls the number of digits for the ratio statistics (OR, HR, RR), default=2
  • -
  • digits.test: controls the number of digits after the decimal point for p-values (default=3)
  • +
  • digits: controls the number of significant digits (counting both before and after the decimal point) for continuous variables

  • +
  • nsmall: controls the number of digits after the decimal point for the beta and standard error

  • +
  • nsmall.ratio: controls the number of digits for the ratio statistics (OR, HR, RR), default=2

  • +
  • digits.test: controls the number of digits after the decimal point for p-values (default=3)

-
> summary(modelsum(bmi ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2)
+
> summary(modelsum(bmi ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2)
@@ -2971,59 +2929,59 @@

14. Modify the number of digits used

It is important to understand how R treats the digits argument. Here are some summaries for the variable pi. Note that with 4 digits, the number after the decimal point changes after multiplying pi by 10 or 100. However, the nsmall option specifies the number of values after the decimal point. The two can be used together (see the help file for format for more details on how that works).

-
> format(pi, digits=1)
-[1] "3"
-> format(pi, digits=3)
-[1] "3.14"
-> format(pi, digits=4)
-[1] "3.142"
-> format(pi*10, digits=4)
-[1] "31.42"
-> format(pi*100, digits=4)
-[1] "314.2"
-> format(pi*100, nsmall=4)
-[1] "314.1593"
-> format(pi*100, nsmall=2, digits=4)
-[1] "314.16"
+
> format(pi, digits=1)
+[1] "3"
+> format(pi, digits=3)
+[1] "3.14"
+> format(pi, digits=4)
+[1] "3.142"
+> format(pi*10, digits=4)
+[1] "31.42"
+> format(pi*100, digits=4)
+[1] "314.2"
+> format(pi*100, nsmall=4)
+[1] "314.1593"
+> format(pi*100, nsmall=2, digits=4)
+[1] "314.16"
-
-

15. Use case-weights in the models

+
+

14. Use case-weights in the models

Occasionally it is of interest to fit models using case weights. The modelsum function allows you to pass on the weights to the models and it will do the appropriate fit.

-
> mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE)
-> 
-> ## create weights based on agegp and sex distribution
-> tab1 <- with(mockstudy,table(agegp, sex))
-> tab1
+
> mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE)
+> 
+> ## create weights based on agegp and sex distribution
+> tab1 <- with(mockstudy,table(agegp, sex))
+> tab1
          sex
 agegp     Male Female
-  [18,50)  152    110
-  [50,60)  258    178
-  [60,70)  295    173
-  [70,90)  211    122
-> tab2 <- with(mockstudy, table(agegp, sex, arm))
-> gpwts <- rep(tab1, length(unique(mockstudy$arm)))/tab2
-> 
-> ## apply weights to subjects
-> index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(arm)))) 
-> mockstudy$wts <- gpwts[index]
-> 
-> ## show weights by treatment arm group
-> tapply(mockstudy$wts,mockstudy$arm, summary)
-$`A: IFL`
+  [18,50)  152    110
+  [50,60)  258    178
+  [60,70)  295    173
+  [70,90)  211    122
+> tab2 <- with(mockstudy, table(agegp, sex, arm))
+> gpwts <- rep(tab1, length(unique(mockstudy$arm)))/tab2
+> 
+> ## apply weights to subjects
+> index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(arm)))) 
+> mockstudy$wts <- gpwts[index]
+> 
+> ## show weights by treatment arm group
+> tapply(mockstudy$wts,mockstudy$arm, summary)
+$`A: IFL`
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-  2.923   3.225   3.548   3.502   3.844   4.045 
+  2.923   3.225   3.548   3.502   3.844   4.045 
 
-$`F: FOLFOX`
+$`F: FOLFOX`
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-  2.033   2.070   2.201   2.169   2.263   2.303 
+  2.033   2.070   2.201   2.169   2.263   2.303 
 
-$`G: IROX`
+$`G: IROX`
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-  3.667   3.734   4.023   3.945   4.031   4.471 
-
> mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL')
-> tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), 
-+                  family=binomial)
-> summary(tab1, title='No Case Weights used')
+ 3.667 3.734 4.023 3.945 4.031 4.471
+
> mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL')
+> tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), 
++                  family=binomial)
+> summary(tab1, title='No Case Weights used')
@@ -3103,10 +3061,12 @@

15. Use case-weights in the models

No Case Weights used
-
> 
-> tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), 
-+                  weights=wts, family=binomial)
-> summary(tab2, title='Case Weights used')
+
> 
+> suppressWarnings({
++ tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), 
++                  weights=wts, family=binomial)
++ summary(tab2, title='Case Weights used')
++ })
@@ -3187,8 +3147,8 @@

15. Use case-weights in the models

Case Weights used
-
-

16. Use modelsum within an Sweave document

+
+

15. Use modelsum within an Sweave document

For those users who wish to create tables within an Sweave document, the following code seems to work.

\documentclass{article}
 
@@ -3219,51 +3179,51 @@ 

16. Use modelsum within an Sweave document

\end{document}
-
-

17. Export modelsum results to a .CSV file

+
+

16. Export modelsum results to a .CSV file

When looking at multiple variables it is sometimes useful to export the results to a csv file. The as.data.frame function creates a data frame object that can be exported or further manipulated within R.

-
> summary(tab2, text=T)
+
> summary(tab2, text=T)
 -----------------------------------------------------------------------------------------------------------
-                   OR             CI.lower.OR    CI.upper.OR    p.value        concordance    Nmiss        
------------------ -------------- -------------- -------------- -------------- -------------- --------------
-(Intercept)       NA             NA             NA             0.504          0.55           210           
-ast               1              1              1.01           0.068          .              .             
-(Intercept)       NA             NA             NA             0.820          0.5            29            
-bmi               1              0.988          1.02           0.780          .              .             
-(Intercept)       NA             NA             NA             0.039          0.514          210           
-hgb               0.956          0.913          1              0.058          .              .             
+                   OR             CI.lower.OR    CI.upper.OR    p.value        concordance    Nmiss        
+----------------- -------------- -------------- -------------- -------------- -------------- --------------
+(Intercept)       NA             NA             NA             0.504          0.55           210           
+ast               1              1              1.01           0.068          .              .             
+(Intercept)       NA             NA             NA             0.820          0.5            29            
+bmi               1              0.988          1.02           0.780          .              .             
+(Intercept)       NA             NA             NA             0.039          0.514          210           
+hgb               0.956          0.913          1              0.058          .              .             
 -----------------------------------------------------------------------------------------------------------
-> tmp <- as.data.frame(tab2)
-> tmp
+> tmp <- as.data.frame(tab2)
+> tmp
          term model endpoint    OR CI.lower.OR CI.upper.OR p.value
-1 (Intercept)     1  newvarA    NA          NA          NA   0.504
-2         ast     1  newvarA 1.000       1.000        1.01   0.068
-3 (Intercept)     2  newvarA    NA          NA          NA   0.820
-4         bmi     2  newvarA 1.000       0.988        1.02   0.780
-5 (Intercept)     3  newvarA    NA          NA          NA   0.039
-6         hgb     3  newvarA 0.956       0.913        1.00   0.058
+1 (Intercept)     1  newvarA    NA          NA          NA   0.504
+2         ast     1  newvarA 1.000       1.000        1.01   0.068
+3 (Intercept)     2  newvarA    NA          NA          NA   0.820
+4         bmi     2  newvarA 1.000       0.988        1.02   0.780
+5 (Intercept)     3  newvarA    NA          NA          NA   0.039
+6         hgb     3  newvarA 0.956       0.913        1.00   0.058
   concordance Nmiss
-1       0.550   210
-2       0.550   210
-3       0.500    29
-4       0.500    29
-5       0.514   210
-6       0.514   210
-> # write.csv(tmp, '/my/path/here/mymodel.csv')
+1 0.550 210 +2 0.550 210 +3 0.500 29 +4 0.500 29 +5 0.514 210 +6 0.514 210 +> # write.csv(tmp, '/my/path/here/mymodel.csv')
-
-

18. Write modelsum object to a separate Word or HTML file

-
> ## write to an HTML document
-> # write2html(tab2, "~/ibm/trash.html")
-> 
-> ## write to a Word document
-> # write2word(tab2, "~/ibm/trash.doc", title="My table in Word")
+
+

17. Write modelsum object to a separate Word or HTML file

+
> ## write to an HTML document
+> # write2html(tab2, "~/ibm/trash.html")
+> 
+> ## write to a Word document
+> # write2word(tab2, "~/ibm/trash.doc", title="My table in Word")

Available Function Options

-
-

Summary statistics

+
+

Summary statistics

The available summary statistics, by varible type, are:

  • binomial,quasibinomial: Logistic regression models
  • @@ -3317,19 +3277,19 @@

    Summary statistics

  • CI.lower.estimate, CI.upper.estimate: print the confidence interval for the beta coefficient
-
-

modelsum.control settings

+
+

modelsum.control settings

A quick way to see what arguments are possible to utilize in a function is to use the args() command. Settings involving the number of digits can be set in modelsum.control or in summary.modelsum.

-
> args(modelsum.control)
-function (digits = 3, nsmall = NULL, nsmall.ratio = 2, digits.test = 3, 
-    show.adjust = TRUE, show.intercept = TRUE, conf.level = 0.95, 
-    binomial.stats = c("OR", "CI.lower.OR", "CI.upper.OR", "p.value", 
-        "concordance", "Nmiss"), gaussian.stats = c("estimate", 
-        "std.error", "p.value", "adj.r.squared", "Nmiss"), poisson.stats = c("RR", 
-        "CI.lower.RR", "CI.upper.RR", "p.value", "concordance", 
-        "Nmiss"), survival.stats = c("HR", "CI.lower.HR", "CI.upper.HR", 
-        "p.value", "concordance", "Nmiss"), ...) 
-NULL
+
> args(modelsum.control)
+function (digits = 3, nsmall = NULL, nsmall.ratio = 2, digits.test = 3, 
+    show.adjust = TRUE, show.intercept = TRUE, conf.level = 0.95, 
+    binomial.stats = c("OR", "CI.lower.OR", "CI.upper.OR", "p.value", 
+        "concordance", "Nmiss"), gaussian.stats = c("estimate", 
+        "std.error", "p.value", "adj.r.squared", "Nmiss"), poisson.stats = c("RR", 
+        "CI.lower.RR", "CI.upper.RR", "p.value", "concordance", 
+        "Nmiss"), survival.stats = c("HR", "CI.lower.HR", "CI.upper.HR", 
+        "p.value", "concordance", "Nmiss"), ...) 
+NULL

Settings:

  • digits=3 (number of significant digits for beta coefficient and standard error)
  • @@ -3345,15 +3305,15 @@

    modelsum.control settings

  • poisson.stats, quasipoisson.stats
-
-

summary.modelsum settings

+
+

summary.modelsum settings

The summary.modelsum function has options that modify how the table appears (such as adding a title or modifying labels).

-
> args(arsenal:::summary.modelsum)
-function (object, title = NULL, labelTranslations = NULL, digits = NA, 
-    nsmall = NA, nsmall.ratio = NA, digits.test = NA, show.intercept = NA, 
-    show.adjust = NA, text = FALSE, removeBlanks = text, labelSize = 1.2, 
-    pfootnote = TRUE, ...) 
-NULL
+
> args(arsenal:::summary.modelsum)
+function (object, title = NULL, labelTranslations = NULL, digits = NA, 
+    nsmall = NA, nsmall.ratio = NA, digits.test = NA, show.intercept = NA, 
+    show.adjust = NA, text = FALSE, removeBlanks = text, labelSize = 1.2, 
+    pfootnote = TRUE, ...) 
+NULL

Settings:

  • title
  • @@ -3374,19 +3334,6 @@

    summary.modelsum settings

    - -
- - - - - - - - + - - - - - + - - - - -
- - - - - - - - - - - - - @@ -172,19 +123,19 @@

29 December, 2016

Introduction

One of the most common tables in medical literature includes summary statistics for a set of variables, often stratified by some group (e.g. treatment arm). Locally, the SAS macros %table and %summary were written to create summary tables with a single call. With the increasing interest in R, we have developed the function tableby to create similar tables within the R environment.

In developing the tableby function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R’s strengths (modeling, method dispersion, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report.

-

This report provides step-by-step directions for using the functions associated with tableby. All functions presented here are available within the arsenal package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at rmarkdown.rstudio.com.

+

This report provides step-by-step directions for using the functions associated with tableby. All functions presented here are available within the arsenal package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at rmarkdown.rstudio.com.

Simple Example

The first step when using the tableby function is to load the arsenal package. All the examples in this report use a dataset called mockstudy made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples.

-
require(arsenal)
-require(knitr)
-require(survival)
-data(mockstudy) ##load data
-dim(mockstudy)  ##look at how many subjects and variables are in the dataset 
+
require(arsenal)
+require(knitr)
+require(survival)
+data(mockstudy) ##load data
+dim(mockstudy)  ##look at how many subjects and variables are in the dataset 
## [1] 1499   14
-
# help(mockstudy) ##learn more about the dataset and variables
-str(mockstudy) ##quick look at the data
+
# help(mockstudy) ##learn more about the dataset and variables
+str(mockstudy) ##quick look at the data
## 'data.frame':    1499 obs. of  14 variables:
 ##  $ case       : int  110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
 ##  $ age        : atomic  67 74 50 71 69 56 50 57 51 63 ...
@@ -205,12 +156,12 @@ 

Simple Example

## $ mdquality.s: int NA 1 1 1 NA 1 1 1 1 1 ... ## $ age.ord : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...

To create a simple table stratified by treament arm, use a formula statement to specify the variables that you want summarized. The example below uses age (a continuous variable) and sex (a factor).

-
tab1 <- tableby(arm ~ sex + age, data=mockstudy)
+
tab1 <- tableby(arm ~ sex + age, data=mockstudy)

If you want to take a quick look at the table, you can use summary on your tableby object and the table will print out as text in your R console window. If you use summary without any options you will see a number of \(\&nbsp;\) statements which translates to “space” in HTML.

-
-

Pretty text version of table

+
+

Pretty text version of table

If you want a nicer version in your console window then adding the text=TRUE option.

-
summary(tab1, text=TRUE)
+
summary(tab1, text=TRUE)
## ---------------------------------------------------------------------------------------------------------------------------
 ##                          A: IFL (N=428)      F: FOLFOX (N=691)   G: IROX (N=380)     Total (N=1499)      p value           
 ## ----------------------- ------------------- ------------------- ------------------- ------------------- -------------------
@@ -223,13 +174,10 @@ 

Pretty text version of table

## Range 27 - 88 19 - 88 26 - 85 19 - 88 ## ---------------------------------------------------------------------------------------------------------------------------
-
-

Pretty Rmarkdown version of table

+
+

Pretty Rmarkdown version of table

In order for the report to look nice within an R markdown (knitr) report, you just need to specify results="asis" when creating the r chunk. This changes the layout slightly (compresses it) and bolds the variable names.

-

```{r, results=“asis”}

-

summary(tab1)

-

```

-
summary(tab1)
+
summary(tab1)
@@ -309,29 +257,42 @@

Pretty Rmarkdown version of table

-
-

Summaries using standard R code

-
## base R frequency example
-tmp <- table(Gender=mockstudy$sex, "Study Arm"=mockstudy$arm)
-tmp
+
+

Data frame version of table

+

If you want a data.frame version, simply use as.data.frame.

+
as.data.frame(tab1)
+
##           term variable A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
+## 1          Sex      sex                                                                   0.190
+## 2         Male      sex    277 (64.7%)       411 (59.5%)       228 (60%)    916 (61.1%)        
+## 3       Female      sex    151 (35.3%)       280 (40.5%)       152 (40%)    583 (38.9%)        
+## 4 Age in Years      age                                                                   0.614
+## 5    Mean (SD)      age    59.7 (11.4)       60.3 (11.6)     59.8 (11.5)      60 (11.5)        
+## 6       Q1, Q3      age         53, 68            52, 69          52, 68         52, 68        
+## 7        Range      age        27 - 88           19 - 88         26 - 85        19 - 88
+
+
+

Summaries using standard R code

+
## base R frequency example
+tmp <- table(Gender=mockstudy$sex, "Study Arm"=mockstudy$arm)
+tmp
##         Study Arm
 ## Gender   A: IFL F: FOLFOX G: IROX
 ##   Male      277       411     228
 ##   Female    151       280     152
-
# Note: The continuity correction is applied by default in R (not used in %table)
-chisq.test(tmp) 
+
# Note: The continuity correction is applied by default in R (not used in %table)
+chisq.test(tmp) 
## 
 ##  Pearson's Chi-squared test
 ## 
 ## data:  tmp
 ## X-squared = 3.3168, df = 2, p-value = 0.1904
-
## gmodels frequency example
-#require(gmodels)
-#CrossTable(mockstudy$sex, mockstudy$arm, prop.r=F, prop.t=F, 
-#           prop.chisq=F, chisq=T, dnn=c('Gender','Study Arm'))
+
## gmodels frequency example
+#require(gmodels)
+#CrossTable(mockstudy$sex, mockstudy$arm, prop.r=F, prop.t=F, 
+#           prop.chisq=F, chisq=T, dnn=c('Gender','Study Arm'))
 
 ## base R numeric summary example
-tapply(mockstudy$age, mockstudy$arm, summary)
+tapply(mockstudy$age, mockstudy$arm, summary)
## $`A: IFL`
 ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 ##   27.00   53.00   61.00   59.67   68.00   88.00 
@@ -343,7 +304,7 @@ 

Summaries using standard R code

## $`G: IROX` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 26.00 52.00 61.00 59.76 68.00 85.00
-
summary(aov(age ~ arm, data=mockstudy))
+
summary(aov(age ~ arm, data=mockstudy))
##               Df Sum Sq Mean Sq F value Pr(>F)
 ## arm            2    129    64.7   0.487  0.614
 ## Residuals   1496 198628   132.8
@@ -351,23 +312,23 @@

Summaries using standard R code

Modifying Output

-
-

Add labels

+
+

Add labels

In the above example, age is shown with a label (Age in Years), but sex is listed “as is” with lower case letters. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R.

-
## Look at one variable's label
-attr(mockstudy$age,'label')
+
## Look at one variable's label
+attr(mockstudy$age,'label')
## [1] "Age in Years"
-
## See all the variables with a label
-unlist(lapply(mockstudy,'attr','label'))
-
##                        age                        arm 
-##             "Age in Years"            "Treatment Arm" 
-##                       race                        bmi 
-##                     "Race" "Body Mass Index (kg/m^2)"
+
## See all the variables with a label
+unlist(lapply(mockstudy,'attr','label'))
+
##                        age                        arm                       race 
+##             "Age in Years"            "Treatment Arm"                     "Race" 
+##                        bmi 
+## "Body Mass Index (kg/m^2)"

If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset.

-
attr(mockstudy$sex,'label')  <- 'Gender'
+
attr(mockstudy$sex,'label')  <- 'Gender'
 
-tab1 <- tableby(arm ~ sex + age, data=mockstudy)
-summary(tab1)
+tab1 <- tableby(arm ~ sex + age, data=mockstudy) +summary(tab1)
@@ -447,8 +408,8 @@

Add labels

Another option is to add labels after you have created the table

-
mylabels <- list( sex = "SEX", age ="Age, yrs")
-summary(tab1, labelTranslations = mylabels)
+
mylabels <- list( sex = "SEX", age ="Age, yrs")
+summary(tab1, labelTranslations = mylabels)
@@ -528,14 +489,14 @@

Add labels

Alternatively, you can check the variable labels and manipulate them with a function called labels, which works on the tableby object.

-
labels(tab1)
+
labels(tab1)
##             arm             sex             age 
 ## "Treatment Arm"        "Gender"  "Age in Years"
-
labels(tab1) <- c(arm="Treatment Assignment", age="Baseline Age (yrs)")
-labels(tab1)
+
labels(tab1) <- c(arm="Treatment Assignment", age="Baseline Age (yrs)")
+labels(tab1)
##                    arm                    sex                    age 
 ## "Treatment Assignment"               "Gender"   "Baseline Age (yrs)"
-
summary(tab1)
+
summary(tab1)
@@ -615,16 +576,16 @@

Add labels

-
-

Change summary statistics globally

+
+

Change summary statistics globally

Currently the default behavior is to summarize continuous variables with: Number of missing values, Mean (SD), 25th - 75th quantiles, and Minimum-Maximum values with an ANOVA (t-test with equal variances) p-value. For categorical variables the default is to show: Number of missing values and count (column percent) with a chi-square p-value. This behavior can be modified using the tableby.control function. In fact, you can save your standard settings and use that for future tables. Note that test=FALSE and total=FALSE results in the total column and p-value column not being printed.

-
mycontrols  <- tableby.control(test=FALSE, total=FALSE,
-                               numeric.test="kwt", cat.test="chisq",
-                               numeric.stats=c("N", "median", "q1q3"),
-                               cat.stats=c("countpct"),
-                               stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3'))                            
-tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols)
-summary(tab2)
+
mycontrols  <- tableby.control(test=FALSE, total=FALSE,
+                               numeric.test="kwt", cat.test="chisq",
+                               numeric.stats=c("N", "median", "q1q3"),
+                               cat.stats=c("countpct"),
+                               stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3'))
+tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols)
+summary(tab2)
@@ -686,9 +647,9 @@

Change summary statistics globally

You can also change these settings directly in the tableby call.

-
tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, 
-                numeric.stats=c("median","q1q3"), numeric.test="kwt")
-summary(tab3)
+
tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, 
+                numeric.stats=c("median","q1q3"), numeric.test="kwt")
+summary(tab3)
@@ -744,17 +705,17 @@

Change summary statistics globally

-
-

Change summary statistics within the formula

+
+

Change summary statistics within the formula

In addition to modifying summary options globally, it is possible to modify the test and summary statistics for specific variables within the formula statement. For example, both the kwt (Kruskal-Wallis rank-based) and anova (asymptotic analysis of variance) tests apply to numeric variables and we can use one for the variable “age” and another for the variable “ast”. A list of all the options is shown at the end of the vignette.

The tests function can do a quick check on what tests were performed on each variable in tableby.

-
tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + kwt(ast), data=mockstudy)
-tests(tab.test)
+
tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + kwt(ast), data=mockstudy)
+tests(tab.test)
##                     Variable    p.value                       Method
 ## age             Age in Years 0.63906143 Kruskal-Wallis rank sum test
 ## bmi Body Mass Index (kg/m^2) 0.89165522           Linear Model ANOVA
 ## ast                      ast 0.03902803 Kruskal-Wallis rank sum test
-
summary(tab.test)
+
summary(tab.test)
@@ -890,9 +851,9 @@

Change summary statistics within the formula

Summary statistics for any individual variable can also be modified, but it must be done as secondary arguments to the test function. The function names must be strings that are functions already written for tableby, built-in R functions like mean and range, or user-defined functions.

-
tab.test <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") +
-                    kwt(bmi, "Nmiss","median"), data=mockstudy)
-summary(tab.test)
+
tab.test <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") +
+                    kwt(bmi, "Nmiss","median"), data=mockstudy)
+summary(tab.test)
@@ -988,8 +949,8 @@

Change summary statistics within the formula

-
-

Modifying the look & feel in Word documents

+
+

Modifying the look & feel in Word documents

You can easily create Word versions of tableby output via an Rmarkdown report and the default options will give you a reasonable table in Word - just select the “Knit Word” option in RStudio.

The functionality listed in this next paragraph is coming soon but needs an upgraded version of RStudio If you want to modify fonts used for the table, then you’ll need to add an extra line to your header at the beginning of your file. You can take the WordStylesReference01.docx file and modify the fonts (storing the format preferences in your project directory). To see how this works, run your report once using WordStylesReference01.docx and then WordStylesReference02.docx.

output: word_document
@@ -1000,10 +961,10 @@ 

Modifying the look & feel in Word documents

Additional Examples

Here are multiple examples showing how to use some of the different options.

-
-

1. Summarize without a group/by variable

-
tab.noby <- tableby(~ bmi + sex + age, data=mockstudy)
-summary(tab.noby)
+
+

1. Summarize without a group/by variable

+
tab.noby <- tableby(~ bmi + sex + age, data=mockstudy)
+summary(tab.noby)
@@ -1067,9 +1028,9 @@

1. Summarize without a group/by variable

-
-

2. Display footnotes indicating which “test” was used

-
summary(tab.test) #, pfootnote=TRUE)
+
+

2. Display footnotes indicating which “test” was used

+
summary(tab.test) #, pfootnote=TRUE)
@@ -1165,11 +1126,11 @@

2. Display footnotes indicating which “test” was used

-
-

3. Summarize an ordered factor

+
+

3. Summarize an ordered factor

When comparing groups of ordered data there are a couple of options. The default uses a general independence test available from the coin package. For two-group comparisons, this is essentially the Armitage trend test. The other option is to specify the Kruskal Wallis test. The example below shows both options.

-
mockstudy$age.ordnew <- ordered(c("a",NA,as.character(mockstudy$age.ord[-(1:2)])))
-table(mockstudy$age.ord, mockstudy$sex)
+
mockstudy$age.ordnew <- ordered(c("a",NA,as.character(mockstudy$age.ord[-(1:2)])))
+table(mockstudy$age.ord, mockstudy$sex)
##        
 ##         Male Female
 ##   10-19    1      0
@@ -1180,7 +1141,7 @@ 

3. Summarize an ordered factor

## 60-69 298 170 ## 70-79 168 101 ## 80-89 20 9
-
table(mockstudy$age.ordnew, mockstudy$sex)
+
table(mockstudy$age.ordnew, mockstudy$sex)
##        
 ##         Male Female
 ##   10-19    1      0
@@ -1192,9 +1153,9 @@ 

3. Summarize an ordered factor

## 70-79 168 100 ## 80-89 20 9 ## a 1 0
-
class(mockstudy$age.ord)
+
class(mockstudy$age.ord)
## [1] "ordered" "factor"
-
summary(tableby(sex ~ age.ordnew, data = mockstudy)) #, pfootnote = TRUE)
+
summary(tableby(sex ~ age.ordnew, data = mockstudy)) #, pfootnote = TRUE)
@@ -1292,7 +1253,7 @@

3. Summarize an ordered factor

-
summary(tableby(sex ~ kwt(age.ord), data = mockstudy)) #) #, pfootnote = TRUE)
+
summary(tableby(sex ~ kwt(age.ord), data = mockstudy)) #) #, pfootnote = TRUE)
@@ -1384,16 +1345,16 @@

3. Summarize an ordered factor

-
-

4. Summarize a survival variable

+
+

4. Summarize a survival variable

First look at the information that is presented by the survfit function, then see how the same results can be seen with tableby. The default is to show the median survival (time at which the probability of survival = 50%).

-
survfit(Surv(fu.time, fu.stat)~sex, data=mockstudy)
+
survfit(Surv(fu.time, fu.stat)~sex, data=mockstudy)
## Call: survfit(formula = Surv(fu.time, fu.stat) ~ sex, data = mockstudy)
 ## 
 ##              n events median 0.95LCL 0.95UCL
 ## sex=Male   916    829    550     515     590
 ## sex=Female 583    527    543     511     575
-
survdiff(Surv(fu.time, fu.stat)~sex, data=mockstudy)
+
survdiff(Surv(fu.time, fu.stat)~sex, data=mockstudy)
## Call:
 ## survdiff(formula = Surv(fu.time, fu.stat) ~ sex, data = mockstudy)
 ## 
@@ -1402,7 +1363,7 @@ 

4. Summarize a survival variable

## sex=Female 583 527 526 0.000583 0.000956 ## ## Chisq= 0 on 1 degrees of freedom, p= 0.975
-
summary(tableby(sex ~ Surv(fu.time, fu.stat), data=mockstudy))
+
summary(tableby(sex ~ Surv(fu.time, fu.stat), data=mockstudy))
@@ -1445,7 +1406,7 @@

4. Summarize a survival variable

It is also possible to obtain summaries of the %survival at certain time points (say the probability of surviving 1-year).

-
summary(survfit(Surv(fu.time/365.25, fu.stat)~sex, data=mockstudy), times=1:5)
+
summary(survfit(Surv(fu.time/365.25, fu.stat)~sex, data=mockstudy), times=1:5)
## Call: survfit(formula = Surv(fu.time/365.25, fu.stat) ~ sex, data = mockstudy)
 ## 
 ##                 sex=Male 
@@ -1463,7 +1424,7 @@ 

4. Summarize a survival variable

## 3 95 90 0.1701 0.0157 0.1420 0.204 ## 4 51 32 0.1093 0.0133 0.0861 0.139 ## 5 18 12 0.0745 0.0126 0.0534 0.104
-
summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv")))
+
summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv")))
## Warning in tableby(sex ~ Surv(fu.time/365.25, fu.stat), data = mockstudy, : unused arguments: times
@@ -1577,14 +1538,14 @@

4. Summarize a survival variable

-
-

5. Summarize date variables

+
+

5. Summarize date variables

Date variables by default are summarized with the number of missing values, the median, and the range. For example purposes we’ve created a random date. Missing values are introduced for impossible February dates.

-
set.seed(100)
-N <- nrow(mockstudy)
-mockstudy$dtentry <- mdy.Date(month=sample(1:12,N,replace=T), day=sample(1:29,N,replace=T), 
-                              year=sample(2005:2009,N,replace=T))
-summary(tableby(sex ~ dtentry, data=mockstudy))
+
set.seed(100)
+N <- nrow(mockstudy)
+mockstudy$dtentry <- mdy.Date(month=sample(1:12,N,replace=T), day=sample(1:29,N,replace=T), 
+                              year=sample(2005:2009,N,replace=T))
+summary(tableby(sex ~ dtentry, data=mockstudy))
@@ -1634,22 +1595,22 @@

5. Summarize date variables

-
-

6. Summarize multiple variables without typing them out

+
+

6. Summarize multiple variables without typing them out

Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the paste command with the collapse="+" option.

-
## create a vector specifying the variable names
-myvars <- names(mockstudy)
+
## create a vector specifying the variable names
+myvars <- names(mockstudy)
 
 ## select the 8th through the last variables
 ## paste them together, separated by the + sign
-RHS <- paste(myvars[8:10], collapse="+")
-RHS
+RHS <- paste(myvars[8:10], collapse="+") +RHS

[1] “ps+hgb+bmi”

-
## create a formula using the as.formula function
-as.formula(paste('arm ~ ', RHS))
+
## create a formula using the as.formula function
+as.formula(paste('arm ~ ', RHS))

arm ~ ps + hgb + bmi

-
## use the formula in the tableby function
-summary(tableby(as.formula(paste('arm ~', RHS)), data=mockstudy))
+
## use the formula in the tableby function
+summary(tableby(as.formula(paste('arm ~', RHS)), data=mockstudy))
@@ -1793,15 +1754,15 @@

6. Summarize multiple variables without typing them out

These steps can also be done using the formulize function.

-
## The formulize function does the paste and as.formula steps
-tmp <- formulize('arm',myvars[8:10])
-tmp
-

arm ~ ps + hgb + bmi <environment: 0x6593680>

-
## More complex formulas could also be written using formulize
-tmp2 <- formulize('arm',c('ps','hgb^2','bmi'))
+
## The formulize function does the paste and as.formula steps
+tmp <- formulize('arm',myvars[8:10])
+tmp
+

arm ~ ps + hgb + bmi <environment: 0x5ca23f8>

+
## More complex formulas could also be written using formulize
+tmp2 <- formulize('arm',c('ps','hgb^2','bmi'))
 
 ## use the formula in the tableby function
-summary(tableby(tmp, data=mockstudy))
+summary(tableby(tmp, data=mockstudy))
@@ -1945,24 +1906,24 @@

6. Summarize multiple variables without typing them out

-
-

7. Subset the dataset used in the analysis

+
+

7. Subset the dataset used in the analysis

Here are two ways to get the same result (limit the analysis to subjects age>5 and in the F: FOLFOX treatment group).

  • The first approach uses the subset function applied to the dataset mockstudy. This example also selects a subset of variables. The tableby function is then applied to this subsetted data.
-
newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(sex,ps:bmi))
-dim(mockstudy)
+
newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(sex,ps:bmi))
+dim(mockstudy)
## [1] 1499   16
-
table(mockstudy$arm)
+
table(mockstudy$arm)
## 
 ##    A: IFL F: FOLFOX   G: IROX 
 ##       428       691       380
-
dim(newdata)
+
dim(newdata)
## [1] 557   4
-
names(newdata)
+
names(newdata)
## [1] "sex" "ps"  "hgb" "bmi"
-
summary(tableby(sex ~ ., data=newdata))
+
summary(tableby(sex ~ ., data=newdata))
@@ -2091,7 +2052,7 @@

7. Subset the dataset used in the analysis

  • The second approach does the same analysis but uses the subset argument within tableby to subset the data.
-
summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
+
summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
@@ -2218,14 +2179,14 @@

7. Subset the dataset used in the analysis

-
-

8. Create combinations of variables on the fly

-
## create a variable combining the levels of mdquality.s and sex
-with(mockstudy, table(interaction(mdquality.s,sex)))
+
+

8. Create combinations of variables on the fly

+
## create a variable combining the levels of mdquality.s and sex
+with(mockstudy, table(interaction(mdquality.s,sex)))
## 
 ##   0.Male   1.Male 0.Female 1.Female 
 ##       77      686       47      437
-
summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy))
+
summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy))
@@ -2296,8 +2257,8 @@

8. Create combinations of variables on the fly

-
## create a new grouping variable with combined levels of arm and sex
-summary(tableby(interaction(mdquality.s, sex) ~  age + bmi, data=mockstudy, subset=arm=="F: FOLFOX"))
+
## create a new grouping variable with combined levels of arm and sex
+summary(tableby(interaction(mdquality.s, sex) ~  age + bmi, data=mockstudy, subset=arm=="F: FOLFOX"))
@@ -2404,12 +2365,12 @@

8. Create combinations of variables on the fly

-
-

9. Transform variables on the fly

+
+

9. Transform variables on the fly

Certain transformations need to be surrounded by I() so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols / - + ^ * then surround the new variable by I().

-
trans <- tableby(arm ~ I(age/10) + log(bmi) + factor(mdquality.s, levels=0:1, labels=c('N','Y')),
-                 data=mockstudy)
-summary(trans)
+
trans <- tableby(arm ~ I(age/10) + log(bmi) + factor(mdquality.s, levels=0:1, labels=c('N','Y')),
+                 data=mockstudy)
+summary(trans)
@@ -2537,7 +2498,7 @@

9. Transform variables on the fly

The labels for these variables isn’t exactly what we’d like so we can change modify those after the fact. Instead of typing out the very long variable names you can modify specific labels by position.

-
labels(trans)
+
labels(trans)
##                                                           arm 
 ##                                               "Treatment Arm" 
 ##                                                     I(age/10) 
@@ -2546,8 +2507,8 @@ 

9. Transform variables on the fly

## "log(bmi)" ## factor(mdquality.s, levels = 0:1, labels = c("N", "Y")) ## "factor(mdquality.s, levels = 0:1, labels = c(\"N\", \"Y\"))"
-
labels(trans)[2:4] <- c('Age per 10 yrs', 'log(BMI)', 'MD Quality')
-labels(trans)
+
labels(trans)[2:4] <- c('Age per 10 yrs', 'log(BMI)', 'MD Quality')
+labels(trans)
##                                                     arm 
 ##                                         "Treatment Arm" 
 ##                                               I(age/10) 
@@ -2556,7 +2517,7 @@ 

9. Transform variables on the fly

## "log(BMI)" ## factor(mdquality.s, levels = 0:1, labels = c("N", "Y")) ## "MD Quality"
-
summary(trans)
+
summary(trans)
@@ -2684,9 +2645,9 @@

9. Transform variables on the fly

Note that if we had not changed mdquality.s to a factor, it would have been summarized as though it were a continuous variable.

-
class(mockstudy$mdquality.s)
+
class(mockstudy$mdquality.s)

[1] “integer”

-
summary(tableby(arm~mdquality.s, data=mockstudy))
+
summary(tableby(arm~mdquality.s, data=mockstudy))
@@ -2750,7 +2711,7 @@

9. Transform variables on the fly

Another option would be to specify the test and summary statistics. In fact, if I had a set of variables coded 0/1 and that was all I was summarizing, then I could change the global option for continuous variables to use the chi-square test and show countpct.

-
summary(tableby(arm ~ chisq(mdquality.s, "Nmiss","countpct"), data=mockstudy))
+
summary(tableby(arm ~ chisq(mdquality.s, "Nmiss","countpct"), data=mockstudy))
@@ -2806,11 +2767,11 @@

9. Transform variables on the fly

-
-

10. Change the ordering of the variables or delete a variable

-
mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy)
-mytab2 <- mytab[c('age','sex','alk.phos')]
-summary(mytab2)
+
+

10. Change the ordering of the variables or delete a variable

+
mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy)
+mytab2 <- mytab[c('age','sex','alk.phos')]
+summary(mytab2)
@@ -2929,7 +2890,7 @@

10. Change the ordering of the variables or delete a variable

-
summary(mytab[c('age','sex')], nsmall = 2)
+
summary(mytab[c('age','sex')], nsmall = 2)
@@ -3008,7 +2969,7 @@

10. Change the ordering of the variables or delete a variable

-
summary(mytab[c(3,1)], nsmall = 3)
+
summary(mytab[c(3,1)], nsmall = 3)
@@ -3088,26 +3049,26 @@

10. Change the ordering of the variables or delete a variable

-
-

11. Merge two tableby objects together

+
+

11. Merge two tableby objects together

It is possible to combine two tableby objects so that they print out together.

-
## demographics
-tab1 <- tableby(arm ~ sex + age, data=mockstudy,
-                control=tableby.control(numeric.stats=c("Nmiss","meansd"), total=FALSE))
+
## demographics
+tab1 <- tableby(arm ~ sex + age, data=mockstudy,
+                control=tableby.control(numeric.stats=c("Nmiss","meansd"), total=FALSE))
 ## lab data
-tab2 <- tableby(arm ~ hgb + alk.phos, data=mockstudy,
-                control=tableby.control(numeric.stats=c("Nmiss","median","q1q3"),
-                                        numeric.test="kwt", total=FALSE))
-names(tab1$x)
+tab2 <- tableby(arm ~ hgb + alk.phos, data=mockstudy, + control=tableby.control(numeric.stats=c("Nmiss","median","q1q3"), + numeric.test="kwt", total=FALSE)) +names(tab1$x)

[1] “sex” “age”

-
names(tab2$x)
+
names(tab2$x)

[1] “hgb” “alk.phos”

-
tab12 <- merge(tab1,tab2)
-class(tab12)
+
tab12 <- merge(tab1,tab2)
+class(tab12)

[1] “tableby”

-
names(tab12$x)
+
names(tab12$x)

[1] “sex” “age” “hgb” “alk.phos”

-
summary(tab12) #, pfootnote=TRUE)
+
summary(tab12) #, pfootnote=TRUE)
@@ -3220,11 +3181,11 @@

11. Merge two tableby objects together

-
-

12. Add a title to the table

+
+

12. Add a title to the table

When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table.

-
t1 <- tableby(arm ~ sex + age, data=mockstudy)
-summary(t1, title='Demographics')
+
t1 <- tableby(arm ~ sex + age, data=mockstudy)
+summary(t1, title='Demographics')
@@ -3305,20 +3266,24 @@

12. Add a title to the table

Demographics
-
-

13. Modify how missing values are displayed

-

Depending on the report you are writing you have the following options: * Show how many subjects have each variable * Show how many subjects are missing each variable * Show how many subjects are missing each variable only if there are any missing values * Don’t indicate missing values at all

-
## look at how many missing values there are for each variable
-apply(is.na(mockstudy),2,sum)
-
##        case         age         arm         sex        race     fu.time 
-##           0           0           0           0           7           0 
-##     fu.stat          ps         hgb         bmi    alk.phos         ast 
-##           0         266         266          33         266         266 
-## mdquality.s     age.ord  age.ordnew     dtentry 
-##         252           0           1           5
-
## Show how many subjects have each variable (non-missing)
-summary(tableby(sex ~ ast + age, data=mockstudy,
-                control=tableby.control(numeric.stats=c("N","median"), total=FALSE)))
+
+

13. Modify how missing values are displayed

+

Depending on the report you are writing you have the following options:

+
    +
  • Show how many subjects have each variable

  • +
  • Show how many subjects are missing each variable

  • +
  • Show how many subjects are missing each variable only if there are any missing values

  • +
  • Don’t indicate missing values at all

  • +
+
## look at how many missing values there are for each variable
+apply(is.na(mockstudy),2,sum)
+
##        case         age         arm         sex        race     fu.time     fu.stat          ps 
+##           0           0           0           0           7           0           0         266 
+##         hgb         bmi    alk.phos         ast mdquality.s     age.ord  age.ordnew     dtentry 
+##         266          33         266         266         252           0           1           5
+
## Show how many subjects have each variable (non-missing)
+summary(tableby(sex ~ ast + age, data=mockstudy,
+                control=tableby.control(numeric.stats=c("N","median"), total=FALSE)))
@@ -3373,9 +3338,9 @@

13. Modify how missing values are displayed

-
## Always list the number of missing values
-summary(tableby(sex ~ ast + age, data=mockstudy,
-                control=tableby.control(numeric.stats=c("Nmiss2","median"), total=FALSE)))
+
## Always list the number of missing values
+summary(tableby(sex ~ ast + age, data=mockstudy,
+                control=tableby.control(numeric.stats=c("Nmiss2","median"), total=FALSE)))
@@ -3430,9 +3395,9 @@

13. Modify how missing values are displayed

-
## Only show the missing values if there are some (default)
-summary(tableby(sex ~ ast + age, data=mockstudy, 
-                control=tableby.control(numeric.stats=c("Nmiss","mean"),total=FALSE)))
+
## Only show the missing values if there are some (default)
+summary(tableby(sex ~ ast + age, data=mockstudy, 
+                control=tableby.control(numeric.stats=c("Nmiss","mean"),total=FALSE)))
@@ -3481,9 +3446,9 @@

13. Modify how missing values are displayed

-
## Don't show N at all
-summary(tableby(sex ~ ast + age, data=mockstudy, 
-                control=tableby.control(numeric.stats=c("mean"),total=FALSE)))
+
## Don't show N at all
+summary(tableby(sex ~ ast + age, data=mockstudy, 
+                control=tableby.control(numeric.stats=c("mean"),total=FALSE)))
@@ -3527,16 +3492,16 @@

13. Modify how missing values are displayed

-
-

14. Modify the number of digits used

+
+

14. Modify the number of digits used

Within tableby.control function there are 4 options for controlling the number of significant digits shown.

    -
  • digits: controls the number of significant digits (counting both before and after the decimal point) for continuous variables
  • -
  • nsmall: controls the number of digits after the decimal point for continous variables
  • -
  • nsmall.pct: controls the number of digits after the decimal point for percentages
  • -
  • digits.test: controls the number of digits after the decimal point for p-values (default=3)
  • +
  • digits: controls the number of significant digits (counting both before and after the decimal point) for continuous variables

  • +
  • nsmall: controls the number of digits after the decimal point for continous variables

  • +
  • nsmall.pct: controls the number of digits after the decimal point for percentages

  • +
  • digits.test: controls the number of digits after the decimal point for p-values (default=3)

-
summary(tableby(arm ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2, nsmall.pct=1)
+
summary(tableby(arm ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2, nsmall.pct=1)
@@ -3648,31 +3613,31 @@

14. Modify the number of digits used

It is important to understand how R treats the digits argument. Here are some summaries for the variable pi. Note that with 4 digits, the number after the decimal point changes after multiplying pi by 10 or 100. However, the nsmall option specifies the number of values after the decimal point. The two can be used together (see the help file for format for more details on how that works).

-
format(pi, digits=1)
+
format(pi, digits=1)
## [1] "3"
-
format(pi, digits=3)
+
format(pi, digits=3)
## [1] "3.14"
-
format(pi, digits=4)
+
format(pi, digits=4)
## [1] "3.142"
-
format(pi*10, digits=4)
+
format(pi*10, digits=4)
## [1] "31.42"
-
format(pi*100, digits=4)
+
format(pi*100, digits=4)
## [1] "314.2"
-
format(pi*100, nsmall=4)
+
format(pi*100, nsmall=4)
## [1] "314.1593"
-
format(pi*100, nsmall=2, digits=4)
+
format(pi*100, nsmall=2, digits=4)
## [1] "314.16"
-
-

15. Create a user-defined summary statistic

+
+

15. Create a user-defined summary statistic

For purposes of this example, the code below creates a trimmed mean function (trims 10%) and use that to summarize the data. Note the use of the ... which tells R to pass extra arguments on - this is required for user-defined functions. In this case, na.rm=T is passed to myfunc. The weights argument is also required, even though it isn’t passed on to the internal function in this particular example.

-
myfunc <- function(x, weights=rep(1,length(x)), ...){
-  mean(x, trim=.1, ...)
+
myfunc <- function(x, weights=rep(1,length(x)), ...){
+  mean(x, trim=.1, ...)
 }
 
-summary(tableby(sex ~ hgb, data=mockstudy, 
-                control=tableby.control(numeric.stats=c("Nmiss","myfunc"), numeric.test="kwt",
-                    stats.labels=list(Nmiss='Missing values', myfunc="Trimmed Mean, 10%"))))
+summary(tableby(sex ~ hgb, data=mockstudy, + control=tableby.control(numeric.stats=c("Nmiss","myfunc"), numeric.test="kwt", + stats.labels=list(Nmiss='Missing values', myfunc="Trimmed Mean, 10%"))))
@@ -3715,20 +3680,20 @@

15. Create a user-defined summary statistic

-
-

16. Use case-weights for creating summary statistics

+
+

16. Use case-weights for creating summary statistics

When comparing groups, they are often unbalanced when it comes to nuisances such as age and sex. The tableby function allows you to create weighted summary statistics. If this option us used then p-values are not calculated (test=FALSE).

-
##create fake group that is not balanced by age/sex 
-set.seed(200)
-mockstudy$fake_arm <- ifelse(mockstudy$age>60 & mockstudy$sex=='Female',sample(c('A','B'),replace=T, prob=c(.2,.8)),
-                            sample(c('A','B'),replace=T, prob=c(.8,.4)))
+
##create fake group that is not balanced by age/sex 
+set.seed(200)
+mockstudy$fake_arm <- ifelse(mockstudy$age>60 & mockstudy$sex=='Female',sample(c('A','B'),replace=T, prob=c(.2,.8)),
+                            sample(c('A','B'),replace=T, prob=c(.8,.4)))
 
-mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE)
+mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE)
 
 ## create weights based on agegp and sex distribution
-tab1 <- with(mockstudy,table(agegp, sex))
-tab2 <- with(mockstudy, table(agegp, sex, fake_arm))
-tab2
+tab1 <- with(mockstudy,table(agegp, sex)) +tab2 <- with(mockstudy, table(agegp, sex, fake_arm)) +tab2
## , , fake_arm = A
 ## 
 ##          sex
@@ -3746,15 +3711,15 @@ 

16. Use case-weights for creating summary statistics

## [50,60) 130 84 ## [60,70) 156 166 ## [70,90) 109 122
-
gpwts <- rep(tab1, length(unique(mockstudy$fake_arm)))/tab2
-gpwts[gpwts>50] <- 30
+
gpwts <- rep(tab1, length(unique(mockstudy$fake_arm)))/tab2
+gpwts[gpwts>50] <- 30
 
 ## apply weights to subjects
-index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(fake_arm)))) 
-mockstudy$wts <- gpwts[index]
+index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(fake_arm)))) 
+mockstudy$wts <- gpwts[index]
 
 ## show weights by treatment arm group
-tapply(mockstudy$wts,mockstudy$fake_arm, summary)
+tapply(mockstudy$wts,mockstudy$fake_arm, summary)
## $A
 ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 ##   1.774   1.894   2.069   2.276   2.082  24.710 
@@ -3762,8 +3727,8 @@ 

16. Use case-weights for creating summary statistics

## $B ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 1.000 1.042 1.924 1.677 1.985 2.292
-
orig <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, test=FALSE)
-summary(orig, title='No Case Weights used')
+
orig <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, test=FALSE)
+summary(orig, title='No Case Weights used')
@@ -3843,8 +3808,8 @@

16. Use case-weights for creating summary statistics

No Case Weights used
-
tab1 <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, weights=wts)
-summary(tab1, title='Case Weights used')
+
tab1 <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, weights=wts)
+summary(tab1, title='Case Weights used')
@@ -3925,15 +3890,15 @@

16. Use case-weights for creating summary statistics

Case Weights used
-
-

17. Create your own p-value and add it to the table

+
+

17. Create your own p-value and add it to the table

When using weighted summary statistics, it is often desirable to then show a p-value from a model that corresponds to the weighted analysis. It is possible to add your own p-value and modify the column title for that new p-value. Another use for this would be to add standardized differences or confidence intervals instead of a p-value.

To add the p-value you simply need to create a data frame and use the function modpval.tableby. The first 2 columns in the dataframe are required and are the variable name and the new p-value. The third column can be used to indicate what method was used to calculate the p-value. If you specify use.pname=TRUE then the column name indicating the p-value will be also be used in the tableby summary.

-
mypval <- data.frame(variable=c('age','sex','Surv(fu.time/365, fu.stat)'), 
-                     adj.pvalue=c(.953,.811,.01), 
-                     method=c('Age/Sex adjusted model results'))
-tab2 <- modpval.tableby(tab1, mypval, use.pname=TRUE)
-summary(tab2, title='Case Weights used, p-values added') #, pfootnote=TRUE)
+
mypval <- data.frame(variable=c('age','sex','Surv(fu.time/365, fu.stat)'), 
+                     adj.pvalue=c(.953,.811,.01), 
+                     method=c('Age/Sex adjusted model results'))
+tab2 <- modpval.tableby(tab1, mypval, use.pname=TRUE)
+summary(tab2, title='Case Weights used, p-values added') #, pfootnote=TRUE)
@@ -4026,13 +3991,13 @@

17. Create your own p-value and add it to the table

Case Weights used, p-values added
-
-

18. For two-level categorical variables, only display one level.

+
+

18. For two-level categorical variables, only display one level.

If the cat.simplify option is set to TRUE then only the second level of the group. In the example below sex has the levels and “Female” is the second level, hence only the %female is shown in the table. Similarly, “mdquality.s” was turned to a factor and “1” is the second level, hence

-
levels(mockstudy$sex)
+
levels(mockstudy$sex)

[1] “Male” “Female”

-
table2 <- tableby(arm~sex + factor(mdquality.s), data=mockstudy, cat.simplify=TRUE)
-summary(table2, labelTranslations=c(sex="Female", "factor(mdquality.s)"="MD Quality"))
+
table2 <- tableby(arm~sex + factor(mdquality.s), data=mockstudy, cat.simplify=TRUE)
+summary(table2, labelTranslations=c(sex="Female", "factor(mdquality.s)"="MD Quality"))
@@ -4088,8 +4053,8 @@

18. For two-level categorical variables, only display one level.

-
-

19. Use tableby within an Sweave document

+
+

19. Use tableby within an Sweave document

For those users who wish to create tables within an Sweave document, the following code seems to work.

\documentclass{article}
 
@@ -4120,11 +4085,11 @@ 

19. Use tableby within an Sweave document

\end{document}
-
-

20. Export tableby object to a .CSV file

+
+

20. Export tableby object to a .CSV file

When looking at multiple variables it is sometimes useful to export the results to a csv file. The as.data.frame function creates a data frame object that can be exported or further manipulated within R.

-
tab1 <- tableby(arm~sex+age, data=mockstudy)
-summary(tab1, text=T)
+
tab1 <- tableby(arm~sex+age, data=mockstudy)
+summary(tab1, text=T)
## ---------------------------------------------------------------------------------------------------------------------------
 ##                          A: IFL (N=428)      F: FOLFOX (N=691)   G: IROX (N=380)     Total (N=1499)      p value           
 ## ----------------------- ------------------- ------------------- ------------------- ------------------- -------------------
@@ -4136,40 +4101,32 @@ 

20. Export tableby object to a .CSV file

## Q1, Q3 53, 68 52, 69 52, 68 52, 68 ## Range 27 - 88 19 - 88 26 - 85 19 - 88 ## ---------------------------------------------------------------------------------------------------------------------------
-
tmp <- as.data.frame(tab1)
-tmp
-
##           term variable A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380)
-## 1       Gender      sex                                                 
-## 2         Male      sex    277 (64.7%)       411 (59.5%)       228 (60%)
-## 3       Female      sex    151 (35.3%)       280 (40.5%)       152 (40%)
-## 4 Age in Years      age                                                 
-## 5    Mean (SD)      age    59.7 (11.4)       60.3 (11.6)     59.8 (11.5)
-## 6       Q1, Q3      age         53, 68            52, 69          52, 68
-## 7        Range      age        27 - 88           19 - 88         26 - 85
-##   Total (N=1499) p value
-## 1                  0.190
-## 2    916 (61.1%)        
-## 3    583 (38.9%)        
-## 4                  0.614
-## 5      60 (11.5)        
-## 6         52, 68        
-## 7        19 - 88
-
# write.csv(tmp, '/my/path/here/mymodel.csv')
+
tmp <- as.data.frame(tab1)
+tmp
+
##           term variable A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
+## 1       Gender      sex                                                                   0.190
+## 2         Male      sex    277 (64.7%)       411 (59.5%)       228 (60%)    916 (61.1%)        
+## 3       Female      sex    151 (35.3%)       280 (40.5%)       152 (40%)    583 (38.9%)        
+## 4 Age in Years      age                                                                   0.614
+## 5    Mean (SD)      age    59.7 (11.4)       60.3 (11.6)     59.8 (11.5)      60 (11.5)        
+## 6       Q1, Q3      age         53, 68            52, 69          52, 68         52, 68        
+## 7        Range      age        27 - 88           19 - 88         26 - 85        19 - 88
+
# write.csv(tmp, '/my/path/here/mymodel.csv')
-
-

21. Write tableby object to a separate Word or HTML file

-
## write to an HTML document
-tab1 <- tableby(arm ~ sex + age, data=mockstudy)
-# write2html(tab1, "~/ibm/trash.html")
+
+

21. Write tableby object to a separate Word or HTML file

+
## write to an HTML document
+tab1 <- tableby(arm ~ sex + age, data=mockstudy)
+write2html(tab1, "~/trash.html")
 
 ## write to a Word document
-# write2word(tab1, "~/ibm/trash.doc", title="My table in Word")
+write2word(tab1, "~/trash.doc", title="My table in Word")

Available Function Options

-
-

Summary statistics

+
+

Summary statistics

The default summary statistics, by varible type, are:

  • cont: Continuous variables will show by default Nmiss, meansd, q1q3, range
  • @@ -4193,8 +4150,8 @@

    Summary statistics

  • medsurv: print the median survival
-
-

Testing options

+
+

Testing options

The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function.

The following tests are accepted:

    @@ -4206,10 +4163,10 @@

    Testing options

  • trend: The independence_test function from the coin is used to test for trends. Whenthe grouping variable has two levels, it is equivalent to the Armitage trend test. This is the default for ordered factors

-
-

tableby.control settings

+
+

tableby.control settings

A quick way to see what arguments are possible to utilize in a function is to use the args() command. Settings involving the number of digits can be set in tableby.control or in summary.tableby.

-
args(tableby.control)
+
args(tableby.control)
## function (test = TRUE, total = TRUE, test.pname = NULL, cat.simplify = FALSE, 
 ##     numeric.test = "anova", cat.test = "chisq", ordered.test = "trend", 
 ##     surv.test = "logrank", date.test = "kwt", numeric.stats = c("Nmiss", 
@@ -4239,10 +4196,10 @@ 

tableby.control settings

  • surv.test, surv.stats
  • -
    -

    summary.tableby settings

    -

    The summary.tableby function has options that modify how the table appears (such as adding a title or modifying labels).

    -
    args(arsenal:::summary.tableby)
    +
    +

    summary.tableby settings

    +

    The summary.tableby function has options that modify how the table appears (such as adding a title or modifying labels).

    +
    args(arsenal:::summary.tableby)
    ## function (object, title = NULL, labelTranslations = NULL, digits = NA, 
     ##     nsmall = NA, nsmall.pct = NA, digits.test = NA, text = FALSE, 
     ##     removeBlanks = text, labelSize = 1.2, test = NA, test.pname = NA, 
    @@ -4267,19 +4224,6 @@ 

    summary.tableby settings

    - -
    - - - + + + diff --git a/man/arsenal.Rd b/man/arsenal.Rd index 7452805..77734dd 100644 --- a/man/arsenal.Rd +++ b/man/arsenal.Rd @@ -9,6 +9,8 @@ An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. + +The package download, NEWS, and README are available on CRAN: \url{https://cran.r-project.org/package=arsenal} } \section{Functions}{ @@ -23,6 +25,8 @@ Below are listed some of the most widely used functions available in \code{arsen \code{\link{write2word}}, \code{\link{write2html}}, \code{\link{write2pdf}}: Functions to generate a word, html, or pdf document containing a single table. +\code{\link{write2}}: Functions to generate a single document containing a single table. (Also the S3 backbone behind the \code{write2*} functions.) + \code{\link{formulize}}: A shortcut to generate one-, two-, or many-sided formulas. \code{\link{mdy.Date}} and \code{\link{Date.mdy}}: Convert numeric dates for month, day, and year to Date object, and vice versa. diff --git a/man/as.data.frame.modelsum.Rd b/man/as.data.frame.modelsum.Rd index 0bf4fe8..ed42a90 100644 --- a/man/as.data.frame.modelsum.Rd +++ b/man/as.data.frame.modelsum.Rd @@ -17,7 +17,7 @@ \item{title}{Title for the table, defaults to \code{NULL} (no title)} \item{labelTranslations}{List where name is the label in the output, and value is the label you -want displayed e.g. \code{list (q1q3: "Q1, Q3", medsurv = "Median Survival")}.} +want displayed e.g. \code{list(q1q3 = "Q1, Q3", medsurv = "Median Survival")}.} \item{digits}{Maximum number of digits to display for floating point numbers. If \code{NA} (default), it uses the value from \code{object$control$digits} diff --git a/man/modelsum.Rd b/man/modelsum.Rd index dcbe264..1fad245 100644 --- a/man/modelsum.Rd +++ b/man/modelsum.Rd @@ -59,13 +59,14 @@ Fit and summarize models for each independent (x) variable with a response varia data(mockstudy) -tab1 <- modelsum(bmi ~ sex + age, data=mockstudy) -summary(tab1, text=TRUE) +tab1 <- modelsum(bmi ~ sex + age, data = mockstudy) +summary(tab1, text = TRUE) -tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, family="gaussian",data=mockstudy) -summary(tab2, text=TRUE) +tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust = ~ age + sex, + family = "gaussian", data = mockstudy) +summary(tab2, text = TRUE) -summary(tab2, show.intercept=FALSE, text=TRUE) +summary(tab2, show.intercept = FALSE, text = TRUE) tab2.df <- as.data.frame(tab2) diff --git a/man/summary.modelsum.Rd b/man/summary.modelsum.Rd index f19222b..7ffe351 100644 --- a/man/summary.modelsum.Rd +++ b/man/summary.modelsum.Rd @@ -15,7 +15,7 @@ \item{title}{Title for the table, defaults to \code{NULL} (no title)} \item{labelTranslations}{List where name is the label in the output, and value is the label you -want displayed e.g. \code{list (q1q3: "Q1, Q3", medsurv = "Median Survival")}.} +want displayed e.g. \code{list(q1q3 = "Q1, Q3", medsurv = "Median Survival")}.} \item{digits}{Maximum number of digits to display for floating point numbers. If \code{NA} (default), it uses the value from \code{object$control$digits} diff --git a/man/tableby.Rd b/man/tableby.Rd index 79879de..eeaeddc 100644 --- a/man/tableby.Rd +++ b/man/tableby.Rd @@ -116,7 +116,7 @@ data(mockstudy) tab1 <- tableby(arm ~ sex + age, data=mockstudy) summary(tab1, text=TRUE) -mylabels <- list( sex = "SEX", age ="Age, yrs") +mylabels <- list(sex = "SEX", age ="Age, yrs") summary(tab1, labelTranslations = mylabels, text=TRUE) tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, diff --git a/man/write2.Rd b/man/write2.Rd index 6d194ac..0d15c5b 100644 --- a/man/write2.Rd +++ b/man/write2.Rd @@ -2,64 +2,93 @@ % Please edit documentation in R/write2.R \name{write2} \alias{write2} -\alias{write2html} -\alias{write2pdf} -\alias{write2word} -\title{write2word, write2html, write2pdf} +\alias{write2.character} +\alias{write2.default} +\alias{write2.freqlist} +\alias{write2.knitr_kable} +\alias{write2.modelsum} +\alias{write2.tableby} +\alias{write2.xtable} +\title{write2} \usage{ -write2word(object, file, ..., keep.md = FALSE) +write2(object, file, ..., keep.md, output_format) -write2pdf(object, file, ..., keep.md = FALSE) +\method{write2}{tableby}(object, file, ..., keep.md = FALSE, + output_format = NULL) -write2html(object, file, ..., keep.md = FALSE) +\method{write2}{modelsum}(object, file, ..., keep.md = FALSE, + output_format = NULL) + +\method{write2}{freqlist}(object, file, ..., keep.md = FALSE, + output_format = NULL) + +\method{write2}{knitr_kable}(object, file, ..., keep.md = FALSE, + output_format = NULL) + +\method{write2}{xtable}(object, file, ..., keep.md = FALSE, + output_format = NULL) + +\method{write2}{character}(object, file, ..., keep.md = FALSE, + output_format = NULL) + +\method{write2}{default}(object, file, FUN, ..., keep.md = FALSE, + output_format = NULL) } \arguments{ -\item{object}{An object whose \code{summary} output looks "good" when using \code{results='asis'} in markdown.} +\item{object}{An object.} \item{file}{A single character string denoting the filename for the output document.} -\item{...}{Additional arguments to be passed to \code{summary}, \code{rmarkdown::render}, etc. +\item{...}{Additional arguments to be passed to \code{FUN}, \code{rmarkdown::render}, etc. One popular option is to use \code{quiet = TRUE} to suppress the command line output.} \item{keep.md}{Logical, denoting whether to keep the intermediate \code{.md} file.} + +\item{output_format}{One of the following: +\enumerate{ + \item{An output format object, e.g. \code{rmarkdown::\link[rmarkdown]{html_document}(...)}.} + \item{A character string denoting such a format function, e.g. \code{"html_document"}. In this case, the \code{"..."} are NOT passed.} + \item{The format function itself, e.g. \code{rmarkdown::html_document}. In this case, the \code{"..."} arguments are passed.} + \item{One of \code{"html"}, \code{"pdf"}, and \code{"word"}, shortcuts implemented here. In this case, the \code{"..."} arguments are passed.} + \item{\code{NULL}, in which the output is HTML by default.} +} +See \code{rmarkdown::\link[rmarkdown]{render}} for details.} + +\item{FUN}{The summary-like or print-like function to use to generate the markdown content. Can be passed as a function or a +character string. It's expected that \code{FUN(object, ...)} looks "good" when using \code{results='asis'} in markdown.} } \value{ \code{object} is returned invisibly, and \code{file} is written. } \description{ -Functions to generate a word, html, or pdf document containing a single table. +Functions to generate a single document containing a single table. (Also the S3 backbone behind the \code{write2*} functions.) } \details{ -This is (kind of) an S3 method (the real S3 method is \code{write2}),and the default - (used for \code{\link{tableby}}, \code{\link{modelsum}}, \code{\link{freqlist}}, etc.) assumes - that there is a \code{summary} method implemented. - - To generate the appropriate file type, the default uses one of \code{rmarkdown::word_document}, \code{rmarkdown::html_document}, - and \code{rmarkdown::pdf_document} to get the job done. \code{"..."} arguments are passed to these functions, too. +\code{write2} is an S3 method, and the default + assumes that there is a \code{FUN} method implemented which looks 'good' in Rmarkdown. + + There are methods implemented for \code{\link{tableby}}, \code{\link{modelsum}}, and \code{\link{freqlist}}, all of which use the + \code{summary} function. There are also methods compatible with \code{\link[knitr]{kable}}, \code{\link[xtable]{xtable}}, + and \code{\link[pander]{pander_return}}. } \examples{ \dontrun{ data(mockstudy) # tableby example tab1 <- tableby(arm ~ sex + age, data=mockstudy) -write2html(tab1, "~/ibm/trash.html") - -# freqlist example -tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany") -noby <- freqlist(tab.ex, na.options = "include") -write2pdf(noby, "~/ibm/trash2.pdf") - -# A more complicated example -write2word(tab1, "~/ibm/trash.doc", keep.md = TRUE, - reference_docx = mystyles.docx, # passed to rmarkdown::word_document +write2(tab1, "~/trash.rtf", + toc = TRUE, # passed to rmarkdown::rtf_document, though in this case it's not practical quiet = TRUE, # passed to rmarkdown::render - title = "My cool new title" # passed to summary.tableby + title = "My cool new title", # passed to summary.tableby + output_format = rmarkdown::rtf_document) } } \author{ Ethan Heinzen, adapted from code from Krista Goergen } \seealso{ -\code{\link[rmarkdown]{render}}, \code{\link[rmarkdown]{word_document}}, \code{\link[rmarkdown]{html_document}}, \code{\link[rmarkdown]{pdf_document}} +\code{\link{write2word}}, \code{\link{write2pdf}}, \code{\link{write2html}}, + \code{\link[rmarkdown]{render}}, \code{\link[rmarkdown]{word_document}}, \code{\link[rmarkdown]{html_document}}, \code{\link[rmarkdown]{pdf_document}}, + \code{\link[rmarkdown]{rtf_document}}, \code{\link[rmarkdown]{md_document}}, \code{\link[rmarkdown]{odt_document}} } diff --git a/man/write2specific.Rd b/man/write2specific.Rd new file mode 100644 index 0000000..a62d2df --- /dev/null +++ b/man/write2specific.Rd @@ -0,0 +1,62 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/write2.R +\name{write2specific} +\alias{write2html} +\alias{write2pdf} +\alias{write2specific} +\alias{write2word} +\title{write2word, write2html, write2pdf} +\usage{ +write2word(object, file, ..., keep.md = FALSE) + +write2pdf(object, file, ..., keep.md = FALSE) + +write2html(object, file, ..., keep.md = FALSE) +} +\arguments{ +\item{object}{An object.} + +\item{file}{A single character string denoting the filename for the output document.} + +\item{...}{Additional arguments to be passed to \code{FUN}, \code{rmarkdown::render}, etc. +One popular option is to use \code{quiet = TRUE} to suppress the command line output.} + +\item{keep.md}{Logical, denoting whether to keep the intermediate \code{.md} file.} +} +\value{ +\code{object} is returned invisibly, and \code{file} is written. +} +\description{ +Functions to generate a word, html, or pdf document containing a single table. +} +\details{ +To generate the appropriate file type, the \code{write2*} functions use one of \code{rmarkdown::word_document}, \code{rmarkdown::html_document}, + and \code{rmarkdown::pdf_document} to get the job done. \code{"..."} arguments are passed to these functions, too. +} +\examples{ +\dontrun{ +data(mockstudy) +# tableby example +tab1 <- tableby(arm ~ sex + age, data=mockstudy) +write2html(tab1, "~/trash.html") + +# freqlist example +tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany") +noby <- freqlist(tab.ex, na.options = "include") +write2pdf(noby, "~/trash2.pdf") + +# A more complicated example +write2word(tab1, "~/trash.doc", + keep.md = TRUE, + reference_docx = mystyles.docx, # passed to rmarkdown::word_document + quiet = TRUE, # passed to rmarkdown::render + title = "My cool new title") # passed to summary.tableby +} +} +\author{ +Ethan Heinzen, adapted from code from Krista Goergen +} +\seealso{ +\code{\link{write2}} +} + diff --git a/tests/testthat/test_formulize.R b/tests/testthat/test_formulize.R index 7db4245..26bea20 100644 --- a/tests/testthat/test_formulize.R +++ b/tests/testthat/test_formulize.R @@ -98,3 +98,7 @@ test_that("Two-sided formula, mixed input", { ) }) + +########################################################################################################### +#### Reported bugs for formulize +########################################################################################################### \ No newline at end of file diff --git a/tests/testthat/test_freqlist.R b/tests/testthat/test_freqlist.R index 243e835..65dbce5 100644 --- a/tests/testthat/test_freqlist.R +++ b/tests/testthat/test_freqlist.R @@ -251,3 +251,8 @@ test_that("Changing the labels", { ) }) + + +########################################################################################################### +#### Reported bugs for freqlist +########################################################################################################### diff --git a/tests/testthat/test_modelsum.R b/tests/testthat/test_modelsum.R index f9352b6..ab73e7e 100644 --- a/tests/testthat/test_modelsum.R +++ b/tests/testthat/test_modelsum.R @@ -24,7 +24,7 @@ attr(mdat$trt, "label") <- "Treatment Arm" attr(mdat$Age, "label") <- "Age in Years" ########################################################################################################### -#### Basic two-sided tableby +#### Basic modelsum call ########################################################################################################### test_that("A basic modelsum call--no labels, no missings", { @@ -110,3 +110,38 @@ test_that("A basic modelsum call--suppressing intercept and/or adjustment vars", ) ) }) + +########################################################################################################### +#### Reported bugs for modelsum +########################################################################################################### + +set.seed(3248) +dat <- data.frame(short.name = rnorm(100), really.long.name = rnorm(100), + why.would.you.name.something = rnorm(100), + as.long.as.this = rnorm(100)) + + +test_that("01/26/2017: Brendan Broderick's Bold Text Wrapping Problem", { + expect_identical( + capture.output(summary(modelsum(short.name ~ really.long.name + as.long.as.this, adjust = ~ why.would.you.name.something, data = dat))), + c("--------------------------------------------------------------------------------------------", + " estimate std.error p.value adj.r.squared ", + "-------------------- ----------------- ----------------- ----------------- -----------------", + "(Intercept) 0.035 0.099 0.721 -0.001 ", + "" , + "**really.long.name** 0.099 0.099 0.319 . ", + "" , + "**why.would.you.name -0.083 0.09 0.361 . ", + ".something** ", + "" , + "(Intercept) 0.048 0.097 0.624 0.023 ", + "" , + "**as.long.as.this** 0.198 0.106 0.066 . ", + "" , + "**why.would.you.name -0.09 0.089 0.314 . ", + ".something** ", + "" , + "--------------------------------------------------------------------------------------------" + ) + ) +}) diff --git a/tests/testthat/test_tableby.R b/tests/testthat/test_tableby.R index 6c6c682..a53d673 100644 --- a/tests/testthat/test_tableby.R +++ b/tests/testthat/test_tableby.R @@ -397,5 +397,8 @@ test_that("Changing tests", { }) +########################################################################################################### +#### Reported bugs for tableby +########################################################################################################### diff --git a/tests/testthat/test_write2.R b/tests/testthat/test_write2.R new file mode 100644 index 0000000..ca65fce --- /dev/null +++ b/tests/testthat/test_write2.R @@ -0,0 +1,92 @@ +## Tests for write2 + + +context("Testing the write2 output") + +data(mockstudy) + +expect_write2_worked <- function(FUN, object, reference, ...) +{ + FUN <- match.fun(FUN) + filename <- tempfile(fileext = ".html") + on.exit(expect_true(file.remove(filename))) + if(!file.exists(reference)) skip("Couldn't find the reference file.") + if(!file.create(filename)) skip("Couldn't create the temporary file.") + if(!grepl("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/", getwd(), fixed = TRUE)) skip("These tests only run in Ethan's space.") + FUN(object, file = filename, ...) + generated <- readLines(filename) + expect_output_file(cat(generated, sep = "\n"), reference) +} + +########################################################################################################### +#### HTML output +########################################################################################################### + +test_that("write2.tableby -> HTML", { + expect_write2_worked(write2html, tableby(arm ~ sex + age, data=mockstudy), reference = "write2.tableby.html", quiet = TRUE, + title = "My test table", labelTranslations = list(sex = "SEX", age ="Age, yrs"), total = FALSE) +}) + +test_that("write2.modelsum -> HTML", { + expect_write2_worked(write2html, modelsum(alk.phos ~ arm + ps + hgb, adjust= ~ age + sex, family = "gaussian", data = mockstudy), + reference = "write2.modelsum.html", quiet = TRUE, + title = "My test table", show.intercept = FALSE, digits = 5) +}) + +test_that("write2.freqlist -> HTML", { + expect_write2_worked(write2html, freqlist(table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany"), groupBy = c("arm", "sex")), + reference = "write2.freqlist.html", quiet = TRUE, single = TRUE) +}) + +test_that("write2.knitr_kable -> HTML", { + if(require(knitr)) + { + expect_write2_worked(write2html, knitr::kable(head(mockstudy)), reference = "write2.kable.html", quiet = TRUE) + } else skip("library(knitr) not available.") +}) + +test_that("write2.xtable -> HTML", { + if(require(xtable)) + { + expect_write2_worked(write2html, xtable::xtable(head(mockstudy), caption = "My xtable"), reference = "write2.xtable.html", quiet = TRUE, + type = "html", comment = FALSE, include.rownames = FALSE, caption.placement = 'top') + } else skip("library(xtable) not available.") +}) + + +test_that("write2.character (pander) -> HTML", { + if(require(pander)) + { + expect_write2_worked(write2html, pander::pander_return(head(mockstudy)), reference = "write2.pander.html", quiet = TRUE) + } else skip("library(pander) not available.") +}) + +########################################################################################################### +#### Code used to generate the files +########################################################################################################### +# +# write2html(tableby(arm ~ sex + age, data=mockstudy), "/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/tests/testthat/write2.tableby.html", +# title = "My test table", labelTranslations = list(sex = "SEX", age ="Age, yrs"), total = FALSE) +# +# write2html(modelsum(alk.phos ~ arm + ps + hgb, adjust= ~ age + sex, family = "gaussian", data = mockstudy), +# "/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/tests/testthat/write2.modelsum.html", +# title = "My test table", show.intercept = FALSE, digits = 5) +# +# write2html(freqlist(table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany"), groupBy = c("arm", "sex")), +# "/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/tests/testthat/write2.freqlist.html", single = TRUE) +# +# write2html(knitr::kable(head(mockstudy)), +# "/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/tests/testthat/write2.kable.html") +# +# write2html(xtable::xtable(head(mockstudy), caption = "My xtable"), +# "/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/tests/testthat/write2.xtable.html", +# type = "html", comment = FALSE, include.rownames = FALSE, caption.placement = "top") +# +# write2html(pander::pander_return(head(mockstudy)), +# "/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/tests/testthat/write2.pander.html") + +########################################################################################################### +#### Reported bugs for write2 +########################################################################################################### + + diff --git a/tests/testthat/write2.freqlist.html b/tests/testthat/write2.freqlist.html new file mode 100644 index 0000000..dca7c90 --- /dev/null +++ b/tests/testthat/write2.freqlist.html @@ -0,0 +1,326 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    armsexmdquality.sFreqcumFreqfreqPercentcumPercent
    A: IFLMale0292910.4710.47
    121424377.2687.73
    NA3427712.27100.00
    Female012127.957.95
    111813078.1586.09
    NA2115113.91100.00
    F: FOLFOXMale031317.547.54
    128531669.3476.89
    NA9541123.11100.00
    Female021217.507.50
    119821970.7178.21
    NA6128021.79100.00
    G: IROXMale017177.467.46
    118720482.0289.47
    NA2422810.53100.00
    Female014149.219.21
    112113579.6188.82
    NA1715211.18100.00
    + + + + +
    + + + + + + + + diff --git a/tests/testthat/write2.kable.html b/tests/testthat/write2.kable.html new file mode 100644 index 0000000..9b20e4e --- /dev/null +++ b/tests/testthat/write2.kable.html @@ -0,0 +1,274 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    caseagearmsexracefu.timefu.statpshgbbmialk.phosastmdquality.sage.ord
    111075467F: FOLFOXMaleCaucasian9222011.525.0986116035NA60-69
    29970674A: IFLFemaleCaucasian2702110.719.4978629052170-79
    410527150A: IFLFemaleCaucasian1752111.1NA700100140-49
    510500171G: IROXFemaleCaucasian1282112.629.4292277168170-79
    711226369F: FOLFOXFemaleNA2332013.026.3535235035NA60-69
    88620556G: IROXMaleCaucasian1202010.219.0367356927150-59
    + + + + +
    + + + + + + + + diff --git a/tests/testthat/write2.modelsum.html b/tests/testthat/write2.modelsum.html new file mode 100644 index 0000000..90a931d --- /dev/null +++ b/tests/testthat/write2.modelsum.html @@ -0,0 +1,240 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + +++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    My test table
    estimatestd.errorp.valueadj.r.squared
    Treatment Arm F: FOLFOX-13.78.72960.117-7e-04
    Treatment Arm G: IROX-2.2459.860.820.
    sex Female3.0167.5210.688.
    Age in Years-0.017410.318780.956.
    ps46.7215.987<0.0010.04501
    sex Female1.16937.3430.874.
    Age in Years-0.08420.311080.787.
    hgb-13.842.137<0.0010.03078
    sex Female-5.987.51620.426.
    Age in Years0.094550.314030.763.
    + + + + +
    + + + + + + + + diff --git a/tests/testthat/write2.pander.html b/tests/testthat/write2.pander.html new file mode 100644 index 0000000..ab1acf5 --- /dev/null +++ b/tests/testthat/write2.pander.html @@ -0,0 +1,322 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + ++++++++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Table continues below
     caseagearmsexracefu.timefu.statpshgb
    111075467F: FOLFOXMaleCaucasian9222011.5
    29970674A: IFLFemaleCaucasian2702110.7
    410527150A: IFLFemaleCaucasian1752111.1
    510500171G: IROXFemaleCaucasian1282112.6
    711226369F: FOLFOXFemaleNA2332013
    88620556G: IROXMaleCaucasian1202010.2
    + ++++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
     bmialk.phosastmdquality.sage.ord
    125.116035NA60-69
    219.529052170-79
    4NA700100140-49
    529.4377168170-79
    726.3535035NA60-69
    819.0456927150-59
    + + + + +
    + + + + + + + + diff --git a/tests/testthat/write2.tableby.html b/tests/testthat/write2.tableby.html new file mode 100644 index 0000000..51972b4 --- /dev/null +++ b/tests/testthat/write2.tableby.html @@ -0,0 +1,219 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + +++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    My test table
    A: IFL (N=428)F: FOLFOX (N=691)G: IROX (N=380)p value
    SEX0.190
        Male277 (64.7%)411 (59.5%)228 (60%)
        Female151 (35.3%)280 (40.5%)152 (40%)
    Age, yrs0.614
        Mean (SD)59.7 (11.4)60.3 (11.6)59.8 (11.5)
        Q1, Q353, 6852, 6952, 68
        Range27 - 8819 - 8826 - 85
    + + + + +
    + + + + + + + + diff --git a/tests/testthat/write2.xtable.html b/tests/testthat/write2.xtable.html new file mode 100644 index 0000000..8c7b931 --- /dev/null +++ b/tests/testthat/write2.xtable.html @@ -0,0 +1,458 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +My xtable +
    +case + +age + +arm + +sex + +race + +fu.time + +fu.stat + +ps + +hgb + +bmi + +alk.phos + +ast + +mdquality.s + +age.ord +
    +110754 + +67 + +F: FOLFOX + +Male + +Caucasian + +922 + +2 + +0 + +11.50 + +25.10 + +160 + +35 + + +60-69 +
    +99706 + +74 + +A: IFL + +Female + +Caucasian + +270 + +2 + +1 + +10.70 + +19.50 + +290 + +52 + +1 + +70-79 +
    +105271 + +50 + +A: IFL + +Female + +Caucasian + +175 + +2 + +1 + +11.10 + + +700 + +100 + +1 + +40-49 +
    +105001 + +71 + +G: IROX + +Female + +Caucasian + +128 + +2 + +1 + +12.60 + +29.43 + +771 + +68 + +1 + +70-79 +
    +112263 + +69 + +F: FOLFOX + +Female + + +233 + +2 + +0 + +13.00 + +26.35 + +350 + +35 + + +60-69 +
    +86205 + +56 + +G: IROX + +Male + +Caucasian + +120 + +2 + +0 + +10.20 + +19.04 + +569 + +27 + +1 + +50-59 +
    + + + + +
    + + + + + + + + diff --git a/vignettes/freqlist.Rmd b/vignettes/freqlist.Rmd index 74111f0..61b9e75 100644 --- a/vignettes/freqlist.Rmd +++ b/vignettes/freqlist.Rmd @@ -3,12 +3,9 @@ title: "The freqlist function" author: "Tina Gunderson" date: '`r format(Sys.time(),"%d %B, %Y")`' output: - pdf_document: + rmarkdown::html_vignette: toc: yes toc_depth: 3 - html_document: - toc: yes - toc_depth: '3' header-includes: \usepackage{tabularx} vignette: | %\VignetteIndexEntry{The freqlist function} @@ -19,11 +16,6 @@ vignette: | ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, tidy.opts=list(width.cutoff=80), tidy=TRUE, comment=NA) options(width=80, max.print=1000) - -require(arsenal) -# source("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/R/freqlist.R") -# source("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/R/summary.freqlist.R") -# source("/data5/bsi/adhoc/s200555.R-infrastructure/devel/eph/arsenal-eph/R/freqlist.internal.R") ``` \newpage @@ -34,6 +26,10 @@ require(arsenal) `freqlist` provides options for handling missing or sparse data and can provide cumulative counts and percentages based on subgroups. It depends on the `knitr` package for printing. +```{r message = FALSE} +require(arsenal) +``` + ## Sample dataset For our examples, we'll load the `mockstudy` data included with this package and use it to create a basic table. @@ -41,7 +37,7 @@ Because they have fewer levels, for brevity, we'll use the variables arm, sex, a We'll retain NAs in the table creation. See the appendix for notes regarding default NA handling and other useful information regarding tables in R. -```{r loading and setting up data} +```{r loading.data} # load the data data(mockstudy) @@ -68,7 +64,7 @@ noby <- freqlist(tab.ex) str(noby) # view the data frame portion of freqlist output -noby[["freqlist"]] +noby[["freqlist"]] ## or use as.data.frame(noby) ``` \newpage @@ -91,7 +87,11 @@ summary(noby, caption="Basic freqlist output") ``` You can also easily pull out the `freqlist` data frame for more complicated formatting or manipulation -(e.g. with another function such as `xtable` or `pander`). See below. +(e.g. with another function such as `xtable` or `pander`) using `as.data.frame`: + +```{r} +head(as.data.frame(noby)) +``` \newpage @@ -138,9 +138,11 @@ summary(freqlist(tab.ex, na.options="remove")) ## Frequency counts and percentages subset by factor levels -The groupBy argument internally subsets the data by the specified factor prior to calculating cumulative counts and percentages. By default, when used each subset will print in a separate table. Using the `single = TRUE` option when printing will collapse the subsetted result into a single table. +The groupBy argument internally subsets the data by the specified factor prior to calculating cumulative counts and percentages. +By default, when used each subset will print in a separate table. Using the `single = TRUE` option when printing will collapse +the subsetted result into a single table. -```{r frequency counts, results='asis'} +```{r freq.counts, results='asis'} withby <- freqlist(tab.ex, groupBy = c("arm","sex")) summary(withby) @@ -168,14 +170,12 @@ summary(noby, labelTranslations = c("Hi there", "What up", "Bye")) Fair warning: `xtable` has kind of a steep learning curve. These examples are given without explanation for more advanced users. -```{r xtable setup} +```{r xtable.setup} require(xtable) -#turn off xtable header -options(xtable.comment = FALSE) -#set up custom function for xtable text +# set up custom function for xtable text italic <- function(x){ -paste0('{\\emph{ ', x, '}}') +paste0('', x, '') } ``` @@ -187,7 +187,7 @@ xftbl <- xtable(noby[["freqlist"]], # change the column names names(xftbl)[1:3] <- c("Arm", "Gender", "LASA QOL") -print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE) +print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE, type = "html", comment = FALSE) ``` \newpage diff --git a/vignettes/modelsum.Rmd b/vignettes/modelsum.Rmd index 03dc754..97a23b6 100644 --- a/vignettes/modelsum.Rmd +++ b/vignettes/modelsum.Rmd @@ -3,15 +3,9 @@ title: "The modelsum function" author: "Beth Atkinson, Pat Votruba, Jason Sinnwell, Shannon McDonnell and Greg Dougherty" date: '`r format(Sys.time(),"%d %B, %Y")`' output: - html_document: - toc: yes - toc_depth: '3' - pdf_document: + rmarkdown::html_vignette: toc: yes toc_depth: 3 - word_document: - toc: yes - toc_depth: '3' vignette: | %\VignetteIndexEntry{The modelsum function} %\VignetteEncoding{UTF-8} @@ -26,17 +20,12 @@ require(MASS) require(pROC) require(rpart) -opts_chunk$set(comment = NA, echo=TRUE, prompt=TRUE ,collapse=TRUE) +opts_chunk$set(comment = NA, echo=TRUE, prompt=TRUE, collapse=TRUE) -#vignette: > -# %\VignetteIndexEntry{modelsum} -# %\VignetteEngine{knitr::rmarkdown} -# \usepackage[utf8]{inputenc} ``` -Introduction -============= +# Introduction Very often we are asked to summarize model results from multiple fits into a nice table. The endpoint might be of different types (e.g., survival, case/control, continuous) and there @@ -54,10 +43,9 @@ so the tables could be displayed within an R markdown report. This report provides step-by-step directions for using the functions associated with `modelsum`. All functions presented here are available within the `arsenal` package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial -resource is available at [rmarkdown.rstudio.com](rmarkdown.rstudio.com). +resource is available at [rmarkdown.rstudio.com](http://rmarkdown.rstudio.com). -Simple Example -================ +# Simple Example The first step when using the `modelsum` function is to load the `arsenal` package. All the examples in this report use a dataset called `mockstudy` made available by Paul Novotny which includes a variety of types of variables @@ -82,7 +70,7 @@ If you want to take a quick look at the table, you can use `summary` on your mod print out as text in your R console window. If you use `summary` without any options you will see a number of $\ $ statements which translates to "space" in HTML. -### Pretty text version of table +## Pretty text version of table If you want a nicer version in your console window then adding the `text=TRUE` option. @@ -90,23 +78,25 @@ If you want a nicer version in your console window then adding the `text=TRUE` o summary(tab1, text=TRUE) ``` -### Pretty Rmarkdown version of table +## Pretty Rmarkdown version of table In order for the report to look nice within an R markdown (knitr) report, you just need to specify `results="asis"` when creating the r chunk. This changes the layout slightly (compresses it) and bolds -the variable names. The three single quotes are often located above the tab key. - -`r ''` ```{r results="asis"} - - summary(tab1) - -``` +the variable names. ```{r simple-markdown, results='asis'} summary(tab1) ``` -### Add an adjustor to the model +## Data frame version of table + +If you want a data.frame version, simply use `as.data.frame`. + +```{r} +as.data.frame(tab1) +``` + +## Add an adjustor to the model The argument `adjust` allows the user to indicate that all the variables should be adjusted for these terms. @@ -116,16 +106,14 @@ summary(tab2) ``` -Models for each endpoint type -================================== +# Models for each endpoint type To make sure the correct model is run you need to specify "family". The options available right now are : gaussian, binomial, survival, and poisson. If there is enough interest, additional models can be added. -Gaussian ------------ +## Gaussian -### fit and summarize linear regression model +### Fit and summarize linear regression model Look at whether there is any evidence that AlkPhos values vary by study arm after adjusting for sex and age (assuming a linear age relationship). @@ -163,7 +151,7 @@ termplot(fit3, term=2, se=T, rug=T) In this instance it looks like there isn't enough evidence to say that the relationship is non-linear. -### extract data using the `broom` package +### Extract data using the `broom` package The `broom` package makes it easy to extract information from the fit. @@ -175,7 +163,7 @@ tmp glance(fit3) ``` -### create a summary table using modelsum +### Create a summary table using modelsum ```{r, results="asis"} ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex, @@ -184,10 +172,9 @@ ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age summary(ms.logy) ``` -Binomial ----------- +## Binomial -### fit and summarize logistic regression model +### Fit and summarize logistic regression model ```{r} boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s') @@ -216,7 +203,7 @@ tmp$auc ``` -### extract data using `broom` package +### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. @@ -226,7 +213,7 @@ tidy(fit, exp=T, conf.int=T) # coefficients, p-values, conf.intervals glance(fit) # model summary statistics ``` -### create a summary table using modelsum +### Create a summary table using modelsum ```{r, results="asis"} summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial)) @@ -237,10 +224,9 @@ summary(fitall) ``` -Survival ---------- +## Survival -### fit and summarize a Cox regression model +### Fit and summarize a Cox regression model ```{r survival} require(survival) @@ -271,7 +257,7 @@ summary(fit2)$concordance survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy) ``` -### extract data using `broom` package +### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. @@ -281,7 +267,7 @@ tidy(fit) # coefficients, p-values glance(fit) # model summary statistics ``` -### create a summary table using modelsum +### Create a summary table using modelsum ```{r results="asis"} ##Note: You must use quotes when specifying family="survival" @@ -295,8 +281,7 @@ summary(modelsum(Surv(fu.time, fu.stat) ~ arm, ``` -Poisson --------- +## Poisson Poisson regression is useful when predicting an outcome variable representing counts. It can also be useful when looking at survival data. Cox models and Poisson models are very closely @@ -331,7 +316,7 @@ fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson) summary(fit2) ``` -### extract data using `broom` package +### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. @@ -342,7 +327,7 @@ glance(fit) # model summary statistics ``` -### create a summary table using modelsum +### Create a summary table using modelsum ```{r results='asis'} summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson")) @@ -373,7 +358,7 @@ fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, summary(fit2) ``` -### extract data using `broom` package +### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. @@ -384,7 +369,7 @@ glance(fit) ##model summary statistics ``` -### create a summary table using modelsum +### Create a summary table using `modelsum` Remember that the result from `modelsum` is different from the `fit` above. The `modelsum` summary shows the results for `age + offset(log(fu.time+.01))` then `sex + offset(log(fu.time+.01))` @@ -397,12 +382,12 @@ summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, ``` -Additional Examples -==================== +# Additional Examples + Here are multiple examples showing how to use some of the different options. -###1. Change summary statistics globally +## 1. Change summary statistics globally There are standard settings for each type of model regarding what information is summarized in the table. This behavior can be modified using the modelsum.control function. In fact, you can save your standard @@ -425,7 +410,7 @@ tab3 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, summary(tab3) ``` -###2. Add labels to independent variables +## 2. Add labels to independent variables In the above example, age is shown with a label (Age in Years), but sex is listed "as is". This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. @@ -470,20 +455,20 @@ labels(tab1) summary(tab1) ``` -###2. Don't show intercept values +## 3. Don't show intercept values ```{r, results='asis'} summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE) ``` -###3. Don't show results for adjustment variables +## 4. Don't show results for adjustment variables ```{r, results='asis'} summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial), show.adjust=FALSE) ``` -###4. Summarize multiple variables without typing them out +## 5. Summarize multiple variables without typing them out Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the `paste` command with the `collapse="+"` option. @@ -519,7 +504,7 @@ summary(modelsum(tmp, data=mockstudy, family=binomial)) ``` -###5. Subset the dataset used in the analysis +## 6. Subset the dataset used in the analysis Here are two ways to get the same result (limit the analysis to subjects age>50 and in the F: FOLFOX treatment group). @@ -548,7 +533,7 @@ summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, dat summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy)) ``` -###6. Create combinations of variables on the fly +## 7. Create combinations of variables on the fly ```{r} ## create a variable combining the levels of mdquality.s and sex @@ -559,7 +544,7 @@ with(mockstudy, table(interaction(mdquality.s,sex))) summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy)) ``` -###9. Transform variables on the fly +## 8. Transform variables on the fly Certain transformations need to be surrounded by `I()` so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the @@ -572,7 +557,7 @@ summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s, ``` -###10. Change the ordering of the variables or delete a variable +## 9. Change the ordering of the variables or delete a variable ```{r, results='asis'} mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy) @@ -582,7 +567,7 @@ summary(mytab[c('age','sex')]) summary(mytab[c(3,1)]) ``` -###11. Merge two modelsum objects together +## 10. Merge two `modelsum` objects together It is possible to combine two modelsum objects so that they print out together, however you need to pay attention to the columns that are being displayed. It is easier to combine two models of the same @@ -602,7 +587,7 @@ class(tab12) #summary(tab12) ``` -###12. Add a title to the table +## 11. Add a title to the table When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table. @@ -612,11 +597,12 @@ t1 <- modelsum(bmi ~ sex + age, data=mockstudy) summary(t1, title='Demographics') ``` -###13. Modify how missing values are treated +## 12. Modify how missing values are treated Depending on the report you are writing you have the following options: * Use all values available for each variable + * Use only those subjects who have measurements available for all the variables ```{r} @@ -642,13 +628,16 @@ summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("estimate")))) ``` -###14. Modify the number of digits used +## 13. Modify the number of digits used Within modelsum.control function there are 4 options for controlling the number of significant digits shown. * digits: controls the number of significant digits (counting both before and after the decimal point) for continuous variables + * nsmall: controls the number of digits after the decimal point for the beta and standard error + * nsmall.ratio: controls the number of digits for the ratio statistics (OR, HR, RR), default=2 + * digits.test: controls the number of digits after the decimal point for p-values (default=3) ```{r, results='asis'} @@ -670,7 +659,7 @@ format(pi*100, nsmall=4) format(pi*100, nsmall=2, digits=4) ``` -###15. Use case-weights in the models +## 14. Use case-weights in the models Occasionally it is of interest to fit models using case weights. The `modelsum` function allows you to pass on the weights to the models and it will do the appropriate fit. @@ -692,18 +681,20 @@ mockstudy$wts <- gpwts[index] tapply(mockstudy$wts,mockstudy$arm, summary) ``` -```{r results='asis', warning=FALSE} +```{r results='asis'} mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL') tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), family=binomial) summary(tab1, title='No Case Weights used') +suppressWarnings({ tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), weights=wts, family=binomial) summary(tab2, title='Case Weights used') +}) ``` -###16. Use `modelsum` within an Sweave document +## 15. Use `modelsum` within an Sweave document For those users who wish to create tables within an Sweave document, the following code seems to work. @@ -737,7 +728,7 @@ render("Test.md", pdf_document(keep_tex=TRUE)) \end{document} ``` -###17. Export `modelsum` results to a .CSV file +## 16. Export `modelsum` results to a .CSV file When looking at multiple variables it is sometimes useful to export the results to a csv file. The `as.data.frame` function creates a data frame object that can be exported or further manipulated within R. @@ -750,7 +741,7 @@ tmp # write.csv(tmp, '/my/path/here/mymodel.csv') ``` -###18. Write `modelsum` object to a separate Word or HTML file +## 17. Write `modelsum` object to a separate Word or HTML file ```{r} ## write to an HTML document @@ -760,10 +751,9 @@ tmp # write2word(tab2, "~/ibm/trash.doc", title="My table in Word") ``` -Available Function Options -================================== +# Available Function Options -### Summary statistics +## Summary statistics The available summary statistics, by varible type, are: @@ -827,7 +817,7 @@ The full description of these parameters that can be shown for models include: * `CI.lower.estimate, CI.upper.estimate`: print the confidence interval for the beta coefficient -### modelsum.control settings +## `modelsum.control` settings A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `modelsum.control` or in `summary.modelsum`. @@ -851,7 +841,7 @@ Settings: * poisson.stats, quasipoisson.stats -### summary.modelsum settings +## `summary.modelsum` settings The summary.modelsum function has options that modify how the table appears (such as adding a title or modifying labels). diff --git a/vignettes/tableby.Rmd b/vignettes/tableby.Rmd index 3e7d5a8..8628504 100755 --- a/vignettes/tableby.Rmd +++ b/vignettes/tableby.Rmd @@ -3,11 +3,8 @@ title: "The tableby function" author: "Beth Atkinson, Jason Sinnwell, Shannon McDonnell and Greg Dougherty" date: '`r format(Sys.time(),"%d %B, %Y")`' output: - html_document: + rmarkdown::html_vignette: toc: yes - toc_depth: '3' - pdf_document: - toc: true toc_depth: 3 vignette: | %\VignetteIndexEntry{The tableby function} @@ -15,8 +12,11 @@ vignette: | %\VignetteEngine{knitr::rmarkdown} --- -Introduction -============= +```{r echo = FALSE} +options(width = 100) +``` + +# Introduction One of the most common tables in medical literature includes summary statistics for a set of variables, often stratified by some group (e.g. treatment arm). Locally, the SAS macros `%table` and `%summary` were @@ -32,10 +32,9 @@ so the tables could be displayed within an R markdown report. This report provides step-by-step directions for using the functions associated with `tableby`. All functions presented here are available within the `arsenal` package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource -is available at [rmarkdown.rstudio.com](rmarkdown.rstudio.com). +is available at [rmarkdown.rstudio.com](http://rmarkdown.rstudio.com). -Simple Example -================ +# Simple Example The first step when using the `tableby` function is to load the `arsenal` package. All the examples in this report use a dataset called `mockstudy` made available by Paul Novotny which includes a variety of types of variables (character, @@ -62,7 +61,7 @@ If you want to take a quick look at the table, you can use `summary` on your tab will print out as text in your R console window. If you use `summary` without any options you will see a number of $\ $ statements which translates to "space" in HTML. -### Pretty text version of table +## Pretty text version of table If you want a nicer version in your console window then adding the `text=TRUE` option. @@ -70,22 +69,24 @@ If you want a nicer version in your console window then adding the `text=TRUE` o summary(tab1, text=TRUE) ``` -### Pretty Rmarkdown version of table +## Pretty Rmarkdown version of table In order for the report to look nice within an R markdown (knitr) report, you just need to specify `results="asis"` when creating the r chunk. This changes the layout slightly (compresses it) and bolds the variable names. -`r ''` ```{r, results="asis"} - - summary(tab1) - -``` - ```{r, simple-markdown, results='asis'} summary(tab1) ``` -### Summaries using standard R code +## Data frame version of table + +If you want a data.frame version, simply use `as.data.frame`. + +```{r} +as.data.frame(tab1) +``` + +## Summaries using standard R code ```{r} ## base R frequency example @@ -106,10 +107,9 @@ summary(aov(age ~ arm, data=mockstudy)) ``` -Modifying Output -================ +# Modifying Output -### Add labels +## Add labels In the above example, age is shown with a label (Age in Years), but sex is listed "as is" with lower case letters. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R. @@ -150,7 +150,7 @@ labels(tab1) summary(tab1) ``` -### Change summary statistics globally +## Change summary statistics globally Currently the default behavior is to summarize continuous variables with: Number of missing values, Mean (SD), 25th - 75th quantiles, and Minimum-Maximum values with an ANOVA (t-test with equal variances) p-value. @@ -165,7 +165,7 @@ mycontrols <- tableby.control(test=FALSE, total=FALSE, numeric.test="kwt", cat.test="chisq", numeric.stats=c("N", "median", "q1q3"), cat.stats=c("countpct"), - stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3')) + stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3')) tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols) summary(tab2) ``` @@ -179,7 +179,7 @@ summary(tab3) ``` -### Change summary statistics within the formula +## Change summary statistics within the formula In addition to modifying summary options globally, it is possible to modify the test and summary statistics for specific variables within the formula statement. For example, both the kwt (Kruskal-Wallis rank-based) and anova @@ -207,7 +207,7 @@ tab.test <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") + summary(tab.test) ``` -### Modifying the look & feel in Word documents +## Modifying the look & feel in Word documents You can easily create Word versions of `tableby` output via an Rmarkdown report and the default options will give you a reasonable table in Word - just select the "Knit Word" option in RStudio. @@ -224,25 +224,24 @@ output: word_document For more informating on changing the look/feel of your Word document, see the [Rmarkdown documentation](http://rmarkdown.rstudio.com/word_document_format.html) website. -Additional Examples -============================ +# Additional Examples Here are multiple examples showing how to use some of the different options. -###1. Summarize without a group/by variable +## 1. Summarize without a group/by variable ```{r, nobyvar, results='asis'} tab.noby <- tableby(~ bmi + sex + age, data=mockstudy) summary(tab.noby) ``` -###2. Display footnotes indicating which "test" was used +## 2. Display footnotes indicating which "test" was used ```{r, results="asis"} summary(tab.test) #, pfootnote=TRUE) ``` -###3. Summarize an ordered factor +## 3. Summarize an ordered factor When comparing groups of ordered data there are a couple of options. The **default** uses a general independence test available from the `coin` package. For two-group comparisons, this is essentially the Armitage trend test. The other option is to specify the Kruskal Wallis test. @@ -260,7 +259,7 @@ summary(tableby(sex ~ age.ordnew, data = mockstudy)) #, pfootnote = TRUE) summary(tableby(sex ~ kwt(age.ord), data = mockstudy)) #) #, pfootnote = TRUE) ``` -###4. Summarize a survival variable +## 4. Summarize a survival variable First look at the information that is presented by the `survfit` function, then see how the same results can be seen with tableby. The default is to show the median survival (time at which the probability of survival = 50%). @@ -284,7 +283,7 @@ summary(survfit(Surv(fu.time/365.25, fu.stat)~sex, data=mockstudy), times=1:5) summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv"))) ``` -###5. Summarize date variables +## 5. Summarize date variables Date variables by default are summarized with the number of missing values, the median, and the range. For example purposes we've created a random date. Missing values are introduced for impossible February dates. @@ -297,7 +296,7 @@ mockstudy$dtentry <- mdy.Date(month=sample(1:12,N,replace=T), day=sample(1:29,N, summary(tableby(sex ~ dtentry, data=mockstudy)) ``` -###6. Summarize multiple variables without typing them out +## 6. Summarize multiple variables without typing them out Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the `paste` command with the `collapse="+"` option. @@ -332,7 +331,7 @@ tmp2 <- formulize('arm',c('ps','hgb^2','bmi')) summary(tableby(tmp, data=mockstudy)) ``` -###7. Subset the dataset used in the analysis +## 7. Subset the dataset used in the analysis Here are two ways to get the same result (limit the analysis to subjects age>5 and in the F: FOLFOX treatment group). @@ -359,7 +358,7 @@ argument within `tableby` to subset the data. summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy)) ``` -###8. Create combinations of variables on the fly +## 8. Create combinations of variables on the fly ```{r} ## create a variable combining the levels of mdquality.s and sex @@ -375,7 +374,7 @@ summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy)) summary(tableby(interaction(mdquality.s, sex) ~ age + bmi, data=mockstudy, subset=arm=="F: FOLFOX")) ``` -###9. Transform variables on the fly +## 9. Transform variables on the fly Certain transformations need to be surrounded by `I()` so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols `/ - + ^ *` then surround the new variable by `I()`. @@ -416,7 +415,7 @@ summary(tableby(arm ~ chisq(mdquality.s, "Nmiss","countpct"), data=mockstudy)) ``` -###10. Change the ordering of the variables or delete a variable +## 10. Change the ordering of the variables or delete a variable ```{r, results='asis'} mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy) @@ -427,7 +426,7 @@ summary(mytab[c(3,1)], nsmall = 3) ``` -###11. Merge two tableby objects together +## 11. Merge two `tableby` objects together It is possible to combine two tableby objects so that they print out together. @@ -447,7 +446,7 @@ names(tab12$x) summary(tab12) #, pfootnote=TRUE) ``` -###12. Add a title to the table +## 12. Add a title to the table When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table. @@ -456,12 +455,16 @@ t1 <- tableby(arm ~ sex + age, data=mockstudy) summary(t1, title='Demographics') ``` -###13. Modify how missing values are displayed +## 13. Modify how missing values are displayed Depending on the report you are writing you have the following options: + * Show how many subjects have each variable + * Show how many subjects are missing each variable + * Show how many subjects are missing each variable only if there are any missing values + * Don't indicate missing values at all ```{r} @@ -487,13 +490,16 @@ summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("mean"),total=FALSE))) ``` -###14. Modify the number of digits used +## 14. Modify the number of digits used Within tableby.control function there are 4 options for controlling the number of significant digits shown. * digits: controls the number of significant digits (counting both before and after the decimal point) for continuous variables + * nsmall: controls the number of digits after the decimal point for continous variables + * nsmall.pct: controls the number of digits after the decimal point for percentages + * digits.test: controls the number of digits after the decimal point for p-values (default=3) ```{r, results='asis'} @@ -515,9 +521,12 @@ format(pi*100, nsmall=4) format(pi*100, nsmall=2, digits=4) ``` -###15. Create a user-defined summary statistic +## 15. Create a user-defined summary statistic -For purposes of this example, the code below creates a trimmed mean function (trims 10%) and use that to summarize the data. Note the use of the `...` which tells R to pass extra arguments on - this is required for user-defined functions. In this case, `na.rm=T` is passed to `myfunc`. The *weights* argument is also required, even though it isn't passed on to the internal function in this particular example. +For purposes of this example, the code below creates a trimmed mean function (trims 10%) +and use that to summarize the data. Note the use of the `...` which tells R to pass extra arguments on - this is required +for user-defined functions. In this case, `na.rm=T` is passed to `myfunc`. The *weights* argument is also required, even though +it isn't passed on to the internal function in this particular example. ```{r, results='asis'} myfunc <- function(x, weights=rep(1,length(x)), ...){ @@ -530,7 +539,7 @@ summary(tableby(sex ~ hgb, data=mockstudy, ``` -###16. Use case-weights for creating summary statistics +## 16. Use case-weights for creating summary statistics When comparing groups, they are often unbalanced when it comes to nuisances such as age and sex. The `tableby` function allows you to create weighted summary statistics. If this option us used then p-values are not calculated (`test=FALSE`). @@ -566,7 +575,7 @@ summary(tab1, title='Case Weights used') ``` -###17. Create your own p-value and add it to the table +## 17. Create your own p-value and add it to the table When using weighted summary statistics, it is often desirable to then show a p-value from a model that corresponds to the weighted analysis. It is possible to add your own p-value and modify the column title for that new p-value. Another use for this would be to add standardized @@ -586,7 +595,7 @@ tab2 <- modpval.tableby(tab1, mypval, use.pname=TRUE) summary(tab2, title='Case Weights used, p-values added') #, pfootnote=TRUE) ``` -###18. For two-level categorical variables, only display one level. +## 18. For two-level categorical variables, only display one level. If the `cat.simplify` option is set to TRUE then only the second level of the group. In the example below sex has the levels and "Female" is the second level, hence only the %female is shown in the table. Similarly, "mdquality.s" @@ -598,7 +607,7 @@ table2 <- tableby(arm~sex + factor(mdquality.s), data=mockstudy, cat.simplify=TR summary(table2, labelTranslations=c(sex="Female", "factor(mdquality.s)"="MD Quality")) ``` -###19. Use `tableby` within an Sweave document +## 19. Use `tableby` within an Sweave document For those users who wish to create tables within an Sweave document, the following code seems to work. @@ -633,7 +642,7 @@ render("Test.md", pdf_document(keep_tex=TRUE)) \end{document} ``` -###20. Export `tableby` object to a .CSV file +## 20. Export `tableby` object to a .CSV file When looking at multiple variables it is sometimes useful to export the results to a csv file. The `as.data.frame` function creates a data frame object that can be exported or further manipulated within R. @@ -647,22 +656,21 @@ tmp # write.csv(tmp, '/my/path/here/mymodel.csv') ``` -###21. Write `tableby` object to a separate Word or HTML file +## 21. Write `tableby` object to a separate Word or HTML file -```{r} +```{r eval = FALSE} ## write to an HTML document tab1 <- tableby(arm ~ sex + age, data=mockstudy) -# write2html(tab1, "~/ibm/trash.html") +write2html(tab1, "~/trash.html") ## write to a Word document -# write2word(tab1, "~/ibm/trash.doc", title="My table in Word") +write2word(tab1, "~/trash.doc", title="My table in Word") ``` -Available Function Options -================================== +# Available Function Options -### Summary statistics +## Summary statistics The **default** summary statistics, by varible type, are: @@ -687,7 +695,7 @@ however there are a number of extra functions defined specifically for the table * `Nevents`: print number of events for a survival object within each grouping level * `medsurv`: print the median survival -### Testing options +## Testing options The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function. @@ -712,7 +720,7 @@ The following tests are accepted: * `trend`: The `independence_test` function from the `coin` is used to test for trends. Whenthe grouping variable has two levels, it is equivalent to the Armitage trend test. This is the default for ordered factors -### tableby.control settings +## `tableby.control` settings A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `tableby.control` or in `summary.tableby`. @@ -736,9 +744,9 @@ Settings: * ordered.test, ordered.stats * surv.test, surv.stats -### summary.tableby settings +## `summary.tableby` settings -The summary.tableby function has options that modify how the table appears (such as adding a title or modifying labels). +The `summary.tableby` function has options that modify how the table appears (such as adding a title or modifying labels). ```{r} args(arsenal:::summary.tableby) diff --git a/vignettes/write2.Rmd b/vignettes/write2.Rmd new file mode 100644 index 0000000..d652713 --- /dev/null +++ b/vignettes/write2.Rmd @@ -0,0 +1,177 @@ +--- +title: "The write2 function" +author: "Ethan Heinzen" +date: '`r format(Sys.time(),"%d %B, %Y")`' +output: + rmarkdown::html_vignette: + toc: yes + toc_depth: 3 +vignette: | + %\VignetteIndexEntry{The write2 function} + %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +--- + +```{r include = FALSE} +knitr::opts_chunk$set(eval = FALSE) +``` + +# Introduction + +The `write2*()` functions were designed as an alternative to SAS's `ODS` procedure for useRs who want to save R Markdown tables +to separate Word, HTML, or PDF files without needing separate R Markdown programs. + +There are three shortcut functions for the most common output types: HTML, PDF, and Word. Each of these three functions calls +`write2()`, an S3 function which accepts many file output types (see the help pages for `rmarkdown::render()`). Methods have been implemented for +`tableby()`, `modelsum()`, and `freqlist()`, but also `knitr::kable()`, `xtable::xtable()`, and `pander::pander_return()`. + +The two most important things to recognize with `write2()` are the following: + +1. Which function is being used to output the object. Sometimes the `write2` functions use `summary()`, while other times they will use `print()`. + The details for each object specifically are described below. + +2. How the `...` arguments are passed. To change the options for the summary-like or print-like function, you can pass named arguments which will + in turn get passed to the appropriate function. Details for each object specifically are described below. + +# Examples Using `arsenal` Objects + +```{r} +library(arsenal) +data(mockstudy) +tmpdir <- tempdir() +``` + +## `tableby` + +For `tableby` objects, the output function in `write2()` is `summary()`. For available arguments, see the help pages for `summary.tableby()`. +Don't use the option `text = TRUE` with the `write2` functions. + +```{r} +mylabels <- list(sex = "SEX", age ="Age, yrs") +tab1 <- tableby(arm ~ sex + age, data=mockstudy) + +write2html(tab1, paste0(tmpdir, "/test.tableby.html"), quiet = TRUE, + title = "My test table", # passed to summary.tableby + labelTranslations = mylabels, # passed to summary.tableby + total = FALSE # passed to summary.tableby + ) +``` + +## `modelsum` + +For `modelsum` objects, the output function in `write2()` is `summary()`. For available arguments, see the help pages for `summary.modelsum()`. +Don't use the option `text = TRUE` with the `write2` functions. + +```{r} +tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~ age + sex, family = "gaussian", data = mockstudy) + +write2pdf(tab2, paste0(tmpdir, "/test.modelsum.pdf"), quiet = TRUE, + title = "My test table", # passed to summary.modelsum + show.intercept = FALSE, # passed to summary.modelsum + digits = 5 # passed to summary.modelsum + ) +``` + +## `freqlist` + +For `freqlist` objects, the output function in `write2()` is `summary()`. For available arguments, see the help pages for `summary.freqlist()`. + +```{r} +tab3 <- freqlist(table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany"), groupBy = c("arm", "sex")) + +write2word(tab3, paste0(tmpdir, "/test.freqlist.doc"), quiet = TRUE, + single = FALSE # passed to summary.freqlist + ) +``` + +# Examples Using Other Objects + +## `knitr::kable()` + +For objects resulting from a call to `kable()`, the output function in `write2()` is `print()`. There aren't any arguments to the `print.knitr_kable()` function. + +```{r} +write2html(knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.html"), quiet = TRUE) +``` + +## `xtable::xtable()` + +For `xtable` objects, the output function in `write2()` is `print()`. For available arguments, see the help pages for `print.xtable()`. + +```{r} +write2pdf(xtable::xtable(head(mockstudy), caption = "My xtable"), paste0(tmpdir, "/test.xtable.pdf"), quiet = TRUE, + comment = FALSE, # passed to print.xtable to turn off the default message about xtable version + include.rownames = FALSE, # passed to print.xtable + caption.placement = "top" # passed to print.xtable + ) +``` + +To make an HTML document, use the `print.xtable()` option `type = "html"`. + +```{r} +write2html(xtable::xtable(head(mockstudy), caption = "My xtable"), paste0(tmpdir, "/test.xtable.html"), quiet = TRUE, + type = "html", # passed to print.xtable + comment = FALSE, # passed to print.xtable to turn off the default message about xtable version + include.rownames = FALSE, # passed to print.xtable + caption.placement = "top" # passed to print.xtable + ) +``` + +User beware! `xtable()` is not compatible with `write2word()`. + +## `pander::pander_return()` + +Pander is a little bit more tricky. Since `pander::pander()` doesn't return an object, the useR should instead use +`pander::pander_return()`. For this (and for all character vectors), the the output function in `write2()` is `cat(sep = '\n')`. + +```{r} +write2word(pander::pander_return(head(mockstudy)), file = paste0(tmpdir, "/test.pander.doc"), quiet = TRUE) +``` + +# FAQs + +## How do I suppress the note about my document getting rendered? + +This is easily accomplished by using the argument `quiet = TRUE` (passed to the `rmarkdown::render()` function). + +```{r} +write2html(knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.quiet.html"), + quiet = TRUE # passed to rmarkdown::render + ) +``` + +## How do I look at the temporary `.md` file? + +This is easily accomplished by using the option `keep.md = TRUE`. + +```{r} +write2html(knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.keep.md.html"), + quiet = TRUE, # passed to rmarkdown::render + keep.md = TRUE + ) +``` + +## How do I tweak the default format from `write2word()`, `write2html()`, or `write2pdf()`? + +You can pass arguments to the format functions used behind the scenes. + +```{r} +write2html(knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.theme.html"), + quiet = TRUE, # passed to rmarkdown::render + theme = "yeti" # passed to rmarkdown::html_document + ) +``` + +See the help pages for `rmarkdown::word_document()`, `rmarkdown::html_document()`, and `rmarkdown::pdf_document()`. + +## How do I output to a file format other than word, HTML, and PDF? + +This can be done using the generic `write2()` function. The last argument in the function can be another format specification. +For details on the acceptable inputs, see the help page for `write2()`. + +```{r} +write2(knitr::kable(head(mockstudy[, 1:4])), paste0(tmpdir, "/test.kable.rtf"), + quiet = TRUE, # passed to rmarkdown::render + output_format = rmarkdown::rtf_document + ) +```