version 0.1.6
George Ostrouchov authored and cran-robot committed Jan 16, 2022
1 parent d7cb739 commit 4da62b5
Showing 18 changed files with 325 additions and 222 deletions.
22 changes: 12 additions & 10 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: clustra
Version: 0.1.5
Date: 2021-11-19
Title: Clustering Trajectories Anchored at Intervention Time
Version: 0.1.6
Date: 2022-01-15
Title: Clustering Longitudinal Trajectories
Authors@R: c(person("George", "Ostrouchov", role = c("aut", "cre"), email =
"ostrouchovg@ornl.gov"),
person("David", "Gagnon", role = "aut"),
@@ -16,11 +16,13 @@ Depends: R (>= 3.5.0)
Imports: data.table, graphics, grDevices, methods, mgcv, MixSim,
parallel, stats
Suggests: ggplot2, knitr, rmarkdown
Description: Clusters medical trajectories (unequally spaced and unequal
length time series) aligned by an intervention time. Performs k-means
clustering, where each mean is a thin plate spline fit to all points in
a cluster. Distance is MSE across trajectory points to cluster spline.
Provides silhouette plots and Adjusted Rand Index evaluations of the number
Description: Clusters longitudinal trajectories over time (can be unequally
spaced, unequal length time series and/or partially overlapping series) on
a common time axis. Performs k-means clustering on a single continuous
variable measured over time, where each mean is defined by a thin plate
spline fit to all points in a cluster. Distance is MSE across trajectory
points to cluster spline. Provides graphs of derived cluster splines,
silhouette plots, and Adjusted Rand Index evaluations of the number
of clusters. Scales well to large data with multicore parallelism available
to speed computation.
LazyLoad: yes
@@ -30,7 +32,7 @@ Maintainer: George Ostrouchov <ostrouchovg@ornl.gov>
RoxygenNote: 7.1.2
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2021-11-22 00:34:34 UTC; ost
Packaged: 2022-01-16 06:17:17 UTC; ost
Author: George Ostrouchov [aut, cre],
David Gagnon [aut],
Hanna Gerlovin [aut],
@@ -40,4 +42,4 @@ Author: George Ostrouchov [aut, cre],
U.S. Department of Veteran's Affairs [fnd] (Project: Million Veteran
Program Data Core)
Repository: CRAN
Date/Publication: 2021-11-22 09:00:08 UTC
Date/Publication: 2022-01-16 06:42:41 UTC
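The updated Description says each cluster mean is a thin plate spline and that distance is the MSE from a trajectory's points to a cluster spline. The following is a minimal sketch of that idea, not the package's implementation: it assumes `mgcv::gam` with a `bs = "tp"` smooth as the spline fitter, and the toy data, the basis size `k = 10`, and the helper `mse` are all hypothetical.

```r
library(mgcv)  # recommended package, shipped with standard R installs

set.seed(1)
# Toy data (assumption): two clusters with different mean curves
t1 <- runif(200, 0, 10); y1 <- sin(t1) + rnorm(200, sd = 0.1)
t2 <- runif(200, 0, 10); y2 <- 0.2 * t2 + rnorm(200, sd = 0.1)

# One thin plate regression spline per cluster, fit to all points in it
fit1 <- gam(y ~ s(t, bs = "tp", k = 10), data = data.frame(t = t1, y = y1))
fit2 <- gam(y ~ s(t, bs = "tp", k = 10), data = data.frame(t = t2, y = y2))

# A new trajectory observed at unequally spaced times
traj <- data.frame(t = c(0.5, 1.2, 3.9, 4.0, 7.7))
traj$y <- sin(traj$t)

# Hypothetical helper: mean squared error of the trajectory's points
# against the spline of one cluster
mse <- function(fit, traj) mean((traj$y - predict(fit, newdata = traj))^2)

d <- c(mse(fit1, traj), mse(fit2, traj))
assigned <- which.min(d)  # the trajectory joins the closest cluster
```

Because the distance is averaged over whatever points a trajectory has, series of unequal length and spacing can be compared against the same cluster curves.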
33 changes: 17 additions & 16 deletions MD5
@@ -1,32 +1,33 @@
2b90c982f490ea6861928a4194e5d291 *DESCRIPTION
ae121f6809579bf05fda0c8c6b771de2 *DESCRIPTION
d77b82eba56279e0051c4ed0aa2d3d14 *LICENSE
f07e83dd5ecbba2409157731fb119f18 *NAMESPACE
948199ff0d18147f65ade7d783792867 *NEWS.md
1fbafabaa8a7849792b161b5f0252624 *R/clustra-package.R
12792f1999d8d48b88abdad420cb30d1 *R/deltime.R
d8e4525f577b00585133e851ddf49172 *R/evaluate.R
ad486d951b8311f77df2bf3067ad186c *R/evaluate.R
3534dc49b1e832c9ed4a506849b8387d *R/generate.R
7b145095ce61eebf3b441e3a19cf1fa3 *R/trajectories.R
fab2a33402c4f4d7c1ae27e46915d0a1 *README.md
3f83cf54103f0c01d6bf04e52e7ba77e *build/vignette.rds
695dd6b5a2122afb77e1fb93f15f220b *inst/doc/clustra_vignette.R
e3c91511b828afea2e7e59e5925c9776 *inst/doc/clustra_vignette.Rmd
08339c61af474a0ddfc9fe706b7f86d3 *inst/doc/clustra_vignette.html
44128efe7b6205859cd88cd0cf7cab30 *R/trajectories.R
f892964351d35b44e315b7e485409138 *README.md
a2a47c4ee05a09fac1d105406f210c03 *build/vignette.rds
a79ae835c09a1f148886243d109e19cc *inst/doc/clustra_vignette.R
d3f1d4f9581adb344d7377abf0d8cc12 *inst/doc/clustra_vignette.Rmd
dc47946aecb755214952c762f7bb2a7e *inst/doc/clustra_vignette.html
d78e251d946302a2e73d1a9a82d437b2 *man/allpair_RandIndex.Rd
11bddacb11edeea0ff4abefa8b87eb78 *man/check_df.Rd
db5cb195b00920258ca6a8728934da9d *man/clustra-package.Rd
d9de1bb033e5170c74caf3b5c8a71dd9 *man/clustra.Rd
4f086a1ed0c8bef3a17591080e3852a7 *man/clustra_rand.Rd
6b9bb428cf225a41c822b7b0659f0861 *man/clustra_sil.Rd
63ddbad49eb963063df940092b813a8e *man/clustra.Rd
f040a52147eb8e8739466bb7e815ae29 *man/clustra_rand.Rd
f41032dde7adc59b0bf5758a64b8c848 *man/clustra_sil.Rd
e53f041da70c153a81d22241522dc212 *man/deltime.Rd
87aa255659bca54218a4bd5eb3cef0e6 *man/gen_traj_data.Rd
8c191ed6fae6f480296c8476d0b4aa49 *man/gendata.Rd
f8435bedba26420d29e656d945ca270d *man/mse_g.Rd
abf510cc017f615edf09a99a05723011 *man/oneid.Rd
b66ba2ec5b253d50d3917b897f1a3666 *man/pred_g.Rd
6d9ea4624be633b222d40a3232fd372d *man/rand_plot.Rd
4670c6d1df26bd8446cad644c3bfc35e *man/start_groups.Rd
a1e7dee9687951730fed122a853581e7 *man/start_groups.Rd
a92b8f68410dbaaa8c6d748d11e55a21 *man/tps_g.Rd
eb0335f05f367751b08081a60ec30440 *man/traj_rep.Rd
b1a799b63d18b312a3ce87aca4be4e66 *man/trajectories.Rd
31c7dd60050ff23e5746382316f2a145 *man/xit_report.Rd
e3c91511b828afea2e7e59e5925c9776 *vignettes/clustra_vignette.Rmd
b95504dbbbd1027877deacebe23f7a24 *man/traj_rep.Rd
6d5f50137a1a76dbe65cafcaced178c8 *man/trajectories.Rd
e06416343fabc98d9e050a0c6889edbf *man/xit_report.Rd
d3f1d4f9581adb344d7377abf0d8cc12 *vignettes/clustra_vignette.Rmd
7 changes: 7 additions & 0 deletions NEWS.md
@@ -0,0 +1,7 @@
# clustra v0.1.6
* Original ids are now included in `clustra` output
* Added the ability to reuse previous clustra runs by `clustra_sil` for silhouette plots
* `iter` parameter changed to `conv` = c(iter, minchange)

# clustra v0.1.5
Initial release on CRAN
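The NEWS entry says only that `iter` became `conv = c(iter, minchange)`. As a rough illustration of how such a two-element vector could drive a stopping rule, the sketch below assumes that iteration ends when either the iteration cap is reached or the number of changes in a pass is at or below `minchange`. The functions `iterate` and `toy_step` are hypothetical and not part of clustra; the actual rule is in `trajectories`, which this diff does not show.

```r
# Hypothetical driver: `step` runs one iteration and returns how many
# items changed; `conv = c(iter, minchange)` bundles both stopping criteria.
iterate <- function(step, conv = c(10, 0)) {
  max_iter <- conv[1]
  min_change <- conv[2]
  changes <- Inf
  i <- 0
  # Assumed rule: stop at the cap or once changes drop to min_change or below
  while (i < max_iter && changes > min_change) {
    i <- i + 1
    changes <- step(i)
  }
  list(iterations = i, changes = changes)
}

# Toy step: the number of changes halves each iteration (100, 50, 25, ...)
toy_step <- function(i) floor(200 / 2^i)

res_default <- iterate(toy_step, conv = c(10, 0))  # runs until no changes
res_early   <- iterate(toy_step, conv = c(10, 5))  # tolerates up to 5 changes
res_capped  <- iterate(toy_step, conv = c(3, 0))   # hits the iteration cap
```

Under this reading, the default `conv = c(10, 0)` in the new signatures corresponds to the old `iter = 10` with no early stop while anything is still changing.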
80 changes: 59 additions & 21 deletions R/evaluate.R
@@ -146,50 +146,90 @@ rand_plot = function(rand_pairs, name = NULL) {
invisible(name)
}

#' clustra_sil:
#' Performs \code{\link{clustra}} runs for several k and makes silhouette plots.
#' Computes a proxy silhouette index based on distances to cluster
#' clustra_sil: Prepare silhouette plot data for several k or for a previous
#' clustra run
#'
#' Performs \code{\link{clustra}} runs for several k and prepares silhouette
#' plot data. Computes a proxy silhouette index based on distances to cluster
#' centers rather than trajectory pairs. The cost is essentially that of
#' running clustra for several k as this information is
#' available directly from clustra.
#' running clustra for several k as this information is available directly from
#' clustra. Can also reuse a previous clustra run and produce data for a single
#' silhouette plot.
#'
#' @param data
#' The data (see \code{\link{clustra}} description).
#' Either a data.frame (`data` parameter of \code{\link{trajectories}})
#' or the output from a `clustra` run. See Details.
#' @param k
#' Vector of k values to try.
#' Vector of k values to try. If output from `clustra` is the `data` parameter,
#' `k` can be left NULL or set to the number of clusters used.
#' @param mccores
#' See \code{\link{trajectories}}.
#' @param maxdf
#' Fitting parameters. See \code{\link{trajectories}}.
#' @param iter
#' @param conv
#' Fitting parameters. See \code{\link{trajectories}}.
#' @param save
#' Logical. When TRUE, save all results as file `clustra_sil.Rdata`.
#' @param verbose
#' Logical. When TRUE, information about each run of clustra is printed.
#'
#' @details
#' When given the raw data as the first parameter (input `data` parameter of
#' \code{\link{trajectories}}), `k` can also specify a vector of cluster numbers
#' to run `clustra` and then produce silhouette plots for each of them.
#' Alternatively, the input can be the output from a `clustra` run, in which
#' case data for a single silhouette plot will be made without rerunning
#' `clustra`.
#'
#' @return Invisibly returns a list of length `length(k)`, where each element is
#' a matrix with `nrow(data)` rows and three columns `cluster`, `neighbor`,
#' `silhouette`. This list of matrices can be used to draw a silhouette plot.
#'
#' @export
clustra_sil = function(data, k, mccores, maxdf = 30, iter = 10,
clustra_sil = function(data, k = NULL, mccores = 1, maxdf = 30, conv = c(10, 0),
save = FALSE, verbose = FALSE) {

sil = function(x) {
ord = order(x)
ck = ord[1]
nk = ord[2]
s = (x[nk] - x[ck])/max(x[ck], x[nk])
c(ck, nk, s)
}

## verify data and k agreement
if(is.data.frame(data) && is.null(k)) {
cat("clustra_sil: error: must specify k \n")
return(NULL)
} else if(is.matrix(data$loss)) {
if(is.null(k)) {
k = ncol(data$loss)
} else if(k != ncol(data$loss)) {
cat("clustra: misspecified k")
return()
}
} else if(is.data.frame(data)) {
if(length(k) < 1) {
cat("clustra: misspecified k")
return(NULL)
}
} else {
cat("clustra_sil: error: Expecting data.frame or output from clustra.\n")
return(NULL)
}

results = vector("list", length(k))

for(j in 1:length(k)) {
kj = k[j]
a_0 = deltime()

f = clustra(data, kj, mccores = mccores, maxdf = maxdf, iter = iter,
if(is.data.frame(data)) {
f = clustra(data, kj, mccores = mccores, maxdf = maxdf, conv = conv,
verbose = verbose)
} else {
f = list(loss = data$loss)
}

## prepare data for silhouette plot
smat = as.data.frame(t(apply(f$loss, 1, sil)))
@@ -199,8 +239,6 @@ clustra_sil = function(data, k, mccores, maxdf = 30, iter = 10,
smat$cluster = as.factor(smat$cluster)
results[[j]] = smat

if(verbose) cat("\nSil:", kj, "Dev:", f$deviance, "Err:", f$try_errors, "LCh:",
f$changes, "\n")
rm(f, smat)
gc()
}
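To make the proxy silhouette in `clustra_sil` concrete, here is a small worked example that applies the same per-row rule as the `sil` helper to a made-up loss matrix. The matrix values are hypothetical, and the column names are taken from the `@return` documentation rather than from the elided lines of the hunk.

```r
# Same per-row rule as the `sil` helper in clustra_sil
sil <- function(x) {
  ord <- order(x)
  ck <- ord[1]  # closest cluster (smallest loss)
  nk <- ord[2]  # nearest neighboring cluster (second smallest loss)
  s <- (x[nk] - x[ck]) / max(x[ck], x[nk])
  c(ck, nk, s)
}

# Hypothetical loss matrix: 3 trajectories (rows) by 3 clusters (columns)
loss <- matrix(c(1, 4, 8,
                 6, 2, 3,
                 5, 5, 1), nrow = 3, byrow = TRUE)

# One row per trajectory: assigned cluster, neighbor, proxy silhouette
smat <- as.data.frame(t(apply(loss, 1, sil)))
names(smat) <- c("cluster", "neighbor", "silhouette")
print(smat)
```

Since the neighbor's loss is never smaller than the assigned cluster's loss, the value reduces to `1 - x[ck] / x[nk]`: it approaches 1 when a trajectory sits much closer to its own cluster spline than to the next one, and 0 when the two are equally close.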
@@ -221,16 +259,16 @@ clustra_sil = function(data, k, mccores, maxdf = 30, iter = 10,
#' Integer number of clusters.
#' @param maxdf
#' Fitting parameters. See \code{\link{trajectories}}.
#' @param iter
#' @param conv
#' Fitting parameters. See \code{\link{trajectories}}.
#'
#' @return
#' See return of {\code{\link{trajectories}}}.
#'
traj_rep = function(group, data, k, maxdf, iter) {
traj_rep = function(group, data, k, maxdf, conv) {
id = ..group = NULL
data[, group:=..group[id]] # expand group to all data
trajectories(data, k, group, maxdf, iter, 1, verbose = FALSE)
trajectories(data, k, group, maxdf, conv, 1, verbose = FALSE)
}

#' clustra_rand: Rand Index cluster evaluation
@@ -251,7 +289,7 @@ traj_rep = function(group, data, k, maxdf, iter) {
#' Number of replicates for each k.
#' @param maxdf
#' Fitting parameters. See \code{link{trajectories}}.
#' @param iter
#' @param conv
#' Fitting parameters. See \code{link{trajectories}}.
#' @param save
#' Logical. When TRUE, save all results as file \code{results.Rdata}.
@@ -262,7 +300,7 @@ traj_rep = function(group, data, k, maxdf, iter) {
#'
#' @export
clustra_rand = function(data, k, mccores, replicates = 10, maxdf = 30,
iter = 10, save = FALSE, verbose = FALSE) {
conv = c(10, 0), save = FALSE, verbose = FALSE) {
id = .GRP = ..group = NULL # for data.table R CMD check
results = vector("list", replicates*length(k))

@@ -283,10 +321,10 @@ clustra_rand = function(data, k, mccores, replicates = 10, maxdf = 30,
## Set starting groups outside parallel section to guarantee reproducibility
grp = lapply(rep(kj, replicates), sample.int, size = n_id, replace = TRUE)
f = parallel::mclapply(grp, traj_rep, data = data, k = kj, maxdf = maxdf,
iter = iter, mc.cores = mccores)
fer = lapply(f, function(f, maxdf, iter)
if(!is.null( (er = xit_report(f, maxdf, iter)) )) {
return(er)} else return(NULL), maxdf = maxdf, iter = iter)
conv = conv, mc.cores = mccores)
fer = lapply(f, function(f, maxdf, conv)
if(!is.null( (er = xit_report(f, maxdf, conv)) )) {
return(er)} else return(NULL), maxdf = maxdf, conv = conv)
for(i in 1:replicates) {
results[[(j - 1)*replicates + i]] = list(k = as.integer(kj),
rep = as.integer(i),