Commit

version 0.2.0

George Ostrouchov authored and cran-robot committed Oct 14, 2023
1 parent 4da62b5 commit ec9cc3a
Showing 43 changed files with 3,622 additions and 1,041 deletions.
14 changes: 8 additions & 6 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: clustra
-Version: 0.1.6
-Date: 2022-01-15
+Version: 0.2.0
+Date: 2023-09-30
Title: Clustering Longitudinal Trajectories
Authors@R: c(person("George", "Ostrouchov", role = c("aut", "cre"), email =
"ostrouchovg@ornl.gov"),
@@ -15,7 +15,7 @@ Authors@R: c(person("George", "Ostrouchov", role = c("aut", "cre"), email =
Depends: R (>= 3.5.0)
Imports: data.table, graphics, grDevices, methods, mgcv, MixSim,
parallel, stats
-Suggests: ggplot2, knitr, rmarkdown
+Suggests: haven, knitr, rmarkdown
Description: Clusters longitudinal trajectories over time (can be unequally
spaced, unequal length time series and/or partially overlapping series) on
a common time axis. Performs k-means clustering on a single continuous
@@ -29,10 +29,12 @@ LazyLoad: yes
License: BSD 2-clause License + file LICENSE
Encoding: UTF-8
Maintainer: George Ostrouchov <ostrouchovg@ornl.gov>
-RoxygenNote: 7.1.2
+RoxygenNote: 7.2.3
VignetteBuilder: knitr
+LazyData: true
+LazyDataCompression: xz
NeedsCompilation: no
-Packaged: 2022-01-16 06:17:17 UTC; ost
+Packaged: 2023-10-14 02:36:54 UTC; ost
Author: George Ostrouchov [aut, cre],
David Gagnon [aut],
Hanna Gerlovin [aut],
@@ -42,4 +44,4 @@ Author: George Ostrouchov [aut, cre],
U.S. Department of Veteran's Affairs [fnd] (Project: Million Veteran
Program Data Core)
Repository: CRAN
-Date/Publication: 2022-01-16 06:42:41 UTC
+Date/Publication: 2023-10-14 04:50:02 UTC
70 changes: 41 additions & 29 deletions MD5
@@ -1,33 +1,45 @@
ae121f6809579bf05fda0c8c6b771de2 *DESCRIPTION
bf6bb4c2da96b4aef130bbaa08961d2c *DESCRIPTION
d77b82eba56279e0051c4ed0aa2d3d14 *LICENSE
f07e83dd5ecbba2409157731fb119f18 *NAMESPACE
948199ff0d18147f65ade7d783792867 *NEWS.md
1fbafabaa8a7849792b161b5f0252624 *R/clustra-package.R
12792f1999d8d48b88abdad420cb30d1 *R/deltime.R
ad486d951b8311f77df2bf3067ad186c *R/evaluate.R
3534dc49b1e832c9ed4a506849b8387d *R/generate.R
44128efe7b6205859cd88cd0cf7cab30 *R/trajectories.R
f892964351d35b44e315b7e485409138 *README.md
a2a47c4ee05a09fac1d105406f210c03 *build/vignette.rds
a79ae835c09a1f148886243d109e19cc *inst/doc/clustra_vignette.R
d3f1d4f9581adb344d7377abf0d8cc12 *inst/doc/clustra_vignette.Rmd
dc47946aecb755214952c762f7bb2a7e *inst/doc/clustra_vignette.html
d78e251d946302a2e73d1a9a82d437b2 *man/allpair_RandIndex.Rd
a67570371de37a3c8f08593f14d583ce *NAMESPACE
30162adb1ed353c1d203859cc523d23c *NEWS.md
485f56b7aab5249415ff87d4c11785d8 *R/clustra-package.R
05ab1d058667c2f85eac5de8a9258c94 *R/data.R
6e76e9734a495c2a70cb1601aee44f88 *R/evaluate.R
b72e454bd78fbd237a2ac2ad28b7b100 *R/generate.R
cfa3f1aae5bfe9214be83d4154e48354 *R/graphics.R
843c1f1e0f9eb5706d4b4dcc54a44faa *R/trajectories.R
1094de6c85a3e5ba4926087ebfec355c *README.md
b2318daec94d5cbeee1a0b3668469cfe *build/vignette.rds
c73a08a398ce2fb7f459a9af27f7dc07 *data/bp10k.rda
49182df60f0f80bbdfa6bc7ff0589ae3 *inst/doc/clustra_bp_vignette.R
6103e7b56c9560c3dfa3b307b6a87854 *inst/doc/clustra_bp_vignette.Rmd
784628d47f98150562f38319061bcf23 *inst/doc/clustra_bp_vignette.html
ba3c6507e9b05a0f9ebef0bee0a0b380 *inst/doc/clustra_vignette.R
32bebaa7b43e992247be9ac095896fa1 *inst/doc/clustra_vignette.Rmd
2436255775a196fde9eb9cbbf9a7801f *inst/doc/clustra_vignette.html
9a621a0814463bdcd4e84273d306887b *man/allpair_RandIndex.Rd
e41cf6db975649f7790ca4d52f5e7d53 *man/bp10k.Rd
11bddacb11edeea0ff4abefa8b87eb78 *man/check_df.Rd
db5cb195b00920258ca6a8728934da9d *man/clustra-package.Rd
63ddbad49eb963063df940092b813a8e *man/clustra.Rd
f040a52147eb8e8739466bb7e815ae29 *man/clustra_rand.Rd
f41032dde7adc59b0bf5758a64b8c848 *man/clustra_sil.Rd
e53f041da70c153a81d22241522dc212 *man/deltime.Rd
87aa255659bca54218a4bd5eb3cef0e6 *man/gen_traj_data.Rd
8c191ed6fae6f480296c8476d0b4aa49 *man/gendata.Rd
f8435bedba26420d29e656d945ca270d *man/mse_g.Rd
abf510cc017f615edf09a99a05723011 *man/oneid.Rd
c85df281c34b43faca311dcb588ea8d9 *man/clustra-package.Rd
4d73c629f7b45c0463b7d47d87def60f *man/clustra.Rd
0954142e2cfe56f0917c00e7c0ed08d4 *man/clustra_rand.Rd
5fecd414147a086986b5af10c3eb56e8 *man/clustra_sil.Rd
14e08f58acd7f4f6a56311ca9c77f8c8 *man/deltime.Rd
f925a1187aa090c6007b8ea1328efbbc *man/gen_traj_data.Rd
a018bd3229b45ac6d5168beb22b57275 *man/gendata.Rd
5c00608c226ef6459cb151c6f6a6f697 *man/ic_fun.Rd
0b4220c7dd63ddfde815583fcefbf0a3 *man/kchoose.Rd
6c51bf8e160fcac3b51019271056d9bb *man/mse_g.Rd
913b11585b088045bfb87b7af658f4f2 *man/oneid.Rd
48b0c57ef49281a57b54981dab41839b *man/plot_sample.Rd
0bad7f54613a3456ba96a6856ca1fa08 *man/plot_silhouette.Rd
37ce3f143ec9a4b7264eecf32546689f *man/plot_smooths.Rd
b66ba2ec5b253d50d3917b897f1a3666 *man/pred_g.Rd
6d9ea4624be633b222d40a3232fd372d *man/rand_plot.Rd
a1e7dee9687951730fed122a853581e7 *man/start_groups.Rd
a92b8f68410dbaaa8c6d748d11e55a21 *man/tps_g.Rd
b95504dbbbd1027877deacebe23f7a24 *man/traj_rep.Rd
6d5f50137a1a76dbe65cafcaced178c8 *man/trajectories.Rd
e0f7e1b3cb8155538b837ff0426f327f *man/rand_plot.Rd
4c9ca622008540e6f08be6b46ec8b69b *man/start_groups.Rd
a630b92c92702f317adfc8ae8786232a *man/tps_g.Rd
8fa349672b5650ad6c853756de3e115a *man/traj_rep.Rd
9415e9e55b7c46daef2e893a65e1842a *man/trajectories.Rd
e06416343fabc98d9e050a0c6889edbf *man/xit_report.Rd
d3f1d4f9581adb344d7377abf0d8cc12 *vignettes/clustra_vignette.Rmd
6103e7b56c9560c3dfa3b307b6a87854 *vignettes/clustra_bp_vignette.Rmd
32bebaa7b43e992247be9ac095896fa1 *vignettes/clustra_vignette.Rmd
15 changes: 14 additions & 1 deletion NAMESPACE
@@ -4,22 +4,35 @@ export(clustra)
export(clustra_rand)
export(clustra_sil)
export(deltime)
export(deltimeT)
export(gen_traj_data)
export(plot_sample)
export(plot_silhouette)
export(plot_smooths)
export(rand_plot)
export(start_groups)
export(trajectories)
importFrom(data.table,":=")
importFrom(grDevices,colorRampPalette)
importFrom(grDevices,dev.off)
importFrom(grDevices,pdf)
importFrom(graphics,abline)
importFrom(graphics,axis)
importFrom(graphics,barplot)
importFrom(graphics,box)
importFrom(graphics,close.screen)
importFrom(graphics,image)
importFrom(graphics,layout)
importFrom(graphics,legend)
importFrom(graphics,lines)
importFrom(graphics,mtext)
importFrom(graphics,par)
importFrom(graphics,points)
importFrom(graphics,screen)
importFrom(graphics,split.screen)
importFrom(graphics,text)
importFrom(methods,is)
importFrom(stats,dist)
importFrom(stats,median)
importFrom(stats,predict)
importFrom(stats,rnorm)
importFrom(stats,rpois)
13 changes: 13 additions & 0 deletions NEWS.md
@@ -1,3 +1,16 @@
# clustra v0.2.0
* Added a 10,000 id data set `bp10k`
* Added a second vignette to reproduce associated paper graphics and more
* Reduced vignette dependence to only base graphics
* New function `plot_sample` built on base graphics
* New function `plot_smooths` built on base graphics
* New function `plot_silhouette` built on base graphics
* Expanded capability of `gen_traj_data` beyond 3 clusters with `type` and `intercept` parameters
* Modified parameters in `clustra`
* Added `starts = "distant"` option in function `start_groups`
* Added parameter `starts` in `clustra_sil` and `clustra_rand`
* New internal functions `kchoose` and `ic_fun` (not exported)

# clustra v0.1.6
* Original ids are now included in `clustra` output
* Added the ability to reuse previous clustra runs by `clustra_sil` for silhouette plots
28 changes: 21 additions & 7 deletions R/clustra-package.R
@@ -1,18 +1,32 @@
#' clustra-package
#'
#' Clusters medical trajectories (unequally spaced and unequal lengths) aligned
#' by an intervention time. Performs k-means clustering, where each mean is a
#' thin plate spline fit to all points in a cluster. Distance is MSE across
#' trajectory points to cluster spline. Provides silhouette plots and Adjusted
#' Rand Index evaluations of the number of clusters. Scales well to large data
#' with multicore parallelism available to speed computation.
#' Clusters trajectories (unequally spaced and unequal length time series) on
#' a common time axis. Clustering proceeds by an EM algorithm that iterates
#' switching between fitting a thin plate spline (TPS) to combined responses
#' within each cluster (M-step) and reassigning cluster membership based on the
#' nearest fitted TPS (E-step). Initial cluster assignments are random or
#' distant trajectories. The fitting is done with the *mgcv* package function
#' *bam*, which scales well to very large data sets. Additional parallelism
#' available via multicore on unix and mac platforms.
#'
#' @name clustra-package
#' @docType package
#' @author George Ostrouchov, David Gagnon, Hanna Gerlovin
#' @keywords Package
#' @details
#' This research is based on data from the Million Veteran Program, Office of
#' Research and Development, Veterans Health Administration, and was supported
#' by award No.~MVP000. This research used resources from the Knowledge
#' Discovery Infrastructure (KDI) at Oak Ridge National Laboratory, which is
#' supported by the Office of Science of the US Department of Energy under
#' Contract No. DE-AC05-00OR22725.
#'
#' This research used resources of the Compute and Data Environment
#' for Science (CADES) at the Oak Ridge National Laboratory, which is supported
#' by the Office of Science of the U.S. Department of Energy under Contract No.
#' DE-AC05-00OR22725.
#'
#' # Import package operators
#' @importFrom data.table ":="
#'
NULL

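The package description in this file summarizes the clustering loop: fit a smooth curve to the pooled responses of each cluster (M-step), then move each trajectory to the cluster whose curve is nearest (E-step), repeating until assignments stop changing. The base-R sketch below illustrates that loop under stated assumptions; it is not clustra's implementation. It swaps in `stats::smooth.spline` for the *mgcv* `bam` thin plate spline, uses random starts only (no "distant" starts), measures nearness as per-id mean squared error, and the name `traj_kmeans_sketch` and its arguments are hypothetical.

```r
# Sketch of trajectory k-means with a smooth mean curve per cluster.
# Assumptions (not clustra's code): stats::smooth.spline replaces the
# mgcv::bam thin plate spline, starts are random only, and the function
# name traj_kmeans_sketch is hypothetical.
traj_kmeans_sketch <- function(dat, k, maxit = 20, df = 5, seed = 1) {
  set.seed(seed)                                  # reproducible random start
  key <- as.character(sort(unique(dat$id)))       # one entry per trajectory
  # Random initial assignment, balanced so every cluster starts non-empty
  grp <- setNames(sample(rep_len(seq_len(k), length(key))), key)
  for (it in seq_len(maxit)) {
    # M-step: fit one smoother to all points currently in each cluster
    fits <- lapply(seq_len(k), function(g) {
      pts <- dat[grp[as.character(dat$id)] == g, ]
      if (length(unique(pts$time)) <= df) return(NULL)  # too few times to fit
      smooth.spline(pts$time, pts$response, df = df)
    })
    # E-step: mean squared error of each id to each cluster's fitted curve
    mse <- matrix(Inf, nrow = length(key), ncol = k, dimnames = list(key, NULL))
    for (g in seq_len(k)) {
      if (is.null(fits[[g]])) next                # an emptied cluster stays empty
      pred <- predict(fits[[g]], dat$time)$y
      mse[, g] <- tapply((dat$response - pred)^2, dat$id, mean)[key]
    }
    new_grp <- setNames(max.col(-mse, ties.method = "first"), key)
    if (all(new_grp == grp)) break                # no id moved: converged
    grp <- new_grp
  }
  list(group = grp, iterations = it)
}
```

Here `dat` is assumed to be a data frame with columns `id`, `time`, and `response`, one row per observation; the returned `group` vector is named by id. Refitting each cluster's curve from all of its points, rather than averaging per-id curves, is what lets unequally spaced and unequal-length trajectories share one time axis.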
26 changes: 26 additions & 0 deletions R/data.R
@@ -0,0 +1,26 @@
#' Simulated blood pressure data
#'
#' A sample of 10,000 individuals from the full 80,000 individuals in a dataset
#' available on GitHub at
#' https://github.com/MVP-CHAMPION/clustra-SAS/raw/main/bp_data/simulated_data_27June2023.csv.gz
#'
#' The full data set contains 80,000 individuals, each with an average of about
#' 17 observations in 5 clusters with scatter. The individuals are generated
#' from a 5-cluster thin spline model of actual blood pressures collected from
#' roughly the same number of individuals at U.S. Department of Veterans Affairs
#' facilities in connection with the MVP-CHAMPION project. Each cluster-mean
#' generated individual has a random number of observations at random times with
#' one observation at intervention time 0, and with added standard normal error.
#' The resulting data has 1,353,910 rows and 4 columns.
#'
#' @format ## `bp10k`
#' A "data.table" and "data.frame" with 167,277 rows and 4 columns:
#' \describe{
#' \item{id}{An integer in 1:80000.}
#' \item{group}{An integer in 1:5.}
#' \item{time}{An integer between -365 and 730, giving observation day with
#' reference to an intervention at time 0.}
#' \item{response}{The systolic blood pressure on that day.}
#' }
#'
"bp10k"
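The `bp10k` documentation describes how each simulated individual is produced: drawn from a cluster mean curve, observed on a random number of random days with one observation fixed at intervention time 0, plus standard normal error. A base-R sketch of that scheme follows. The mean curves, the Poisson draw for the observation count (chosen so the average is about 17), and the name `sim_bp_sketch` are illustrative assumptions; the actual 5-cluster spline model fitted to VA blood pressures is not reproduced here.

```r
# Sketch of the bp10k-style generation scheme.
# Assumptions: the mean curves passed in `means`, the Poisson draw for the
# number of extra observations, and the name sim_bp_sketch are hypothetical.
sim_bp_sketch <- function(n_id, means, mean_obs = 17, seed = 1) {
  set.seed(seed)
  days <- setdiff(-365:730, 0L)                  # candidate non-zero days
  rows <- lapply(seq_len(n_id), function(i) {
    g <- sample.int(length(means), 1)            # cluster for this individual
    n_extra <- min(rpois(1, mean_obs - 1), length(days))
    # one observation at intervention time 0 plus n_extra random days
    t <- sort(c(0L, sample(days, n_extra)))
    data.frame(id = i, group = g, time = t,
               response = means[[g]](t) + rnorm(length(t)))  # N(0, 1) error
  })
  do.call(rbind, rows)
}

# Three illustrative cluster-mean curves (systolic pressure vs. day)
means <- list(function(t) 130 + 0 * t,
              function(t) 150 - 0.02 * t,
              function(t) 120 + 0.01 * t)
bp <- sim_bp_sketch(100, means)
```

The result has the same four columns as `bp10k` (`id`, `group`, `time`, `response`), with every id observed exactly once at day 0 and all times within -365 to 730.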
17 changes: 0 additions & 17 deletions R/deltime.R

This file was deleted.
