
version 1.0.1

Jarrett D. Phillips authored and cran-robot committed May 15, 2019
1 parent bb19c45 commit 68e492c488e2c9e74af7cd9202d42b604f6f8730
Showing with 73 additions and 32 deletions.
  1. +5 −5 DESCRIPTION
  2. +10 −10 MD5
  3. +1 −1 R/HAC.sim.R
  4. +2 −2 R/zzz.R
  5. BIN build/partial.rdb
  6. +12 −4 man/HAC.simrep.Rd
  7. +1 −1 man/HACClass.Rd
  8. +3 −3 man/HACHypothetical.Rd
  9. +3 −3 man/HACReal.Rd
  10. +1 −1 man/HACSim-package.Rd
  11. +35 −2 man/envr.Rd
10 DESCRIPTION
@@ -2,17 +2,17 @@ Package: HACSim
Type: Package
Title: Iterative Extrapolation of Species' Haplotype Accumulation
Curves
Version: 1.0.0
Date: 2019-04-15
Version: 1.0.1
Date: 2019-05-15
Author: Jarrett D. Phillips [aut, cre], Steven H. French [ctb]
Maintainer: Jarrett D. Phillips <phillipsjarrett1@gmail.com>
Description: Performs iterative extrapolation of species' haplotype accumulation curves using a nonparametric stochastic (Monte Carlo) method for assessment of specimen sampling completeness based on the approach of Phillips et al. (2015) <doi:10.1515/dna-2015-0008> and Phillips et al. (2019) <doi:10.1002/ece3.4757>.
Description: Performs iterative extrapolation of species' haplotype accumulation curves using a nonparametric stochastic (Monte Carlo) optimization method for assessment of specimen sampling completeness based on the approach of Phillips et al. (2015) <doi:10.1515/dna-2015-0008> and Phillips et al. (2019) <doi:10.1002/ece3.4757>.
License: GPL-3
NeedsCompilation: yes
Imports: ape (>= 5.2), graphics (>= 3.5.1), pegas (>= 0.11), Rcpp (>=
1.0.0), stats (>= 3.5.1), utils (>= 3.5.1)
LinkingTo: Rcpp, RcppArmadillo
RoxygenNote: 6.1.1
Packaged: 2019-05-06 15:18:39 UTC; jarrettphillips
Packaged: 2019-05-15 20:17:31 UTC; jarrettphillips
Repository: CRAN
Date/Publication: 2019-05-09 14:10:06 UTC
Date/Publication: 2019-05-15 23:00:03 UTC
20 MD5
@@ -1,19 +1,19 @@
fc973a37a76d6ee7c1a3137fe6e834b4 *DESCRIPTION
62e4d24c58e3a22e6fde531936fb3faf *DESCRIPTION
53417a4458902b9315ab690311e5b445 *NAMESPACE
d76f255f9e9099a4c331e46aeb4626ce *R/HAC.object.R
7600eb39ac29d7c7fef3533f14aa652a *R/HAC.sim.R
a31679636ee0e33e826838a51fcc8dd9 *R/HAC.sim.R
61bd352de5e7168eb0721cb140f715c6 *R/HAC.simrep.R
8afb4bf5691029cd41ea83e229b4030d *R/RcppExports.R
e1d039cde7962991d78d10bc74aabe24 *R/zzz.R
edfdda266b46cc6d3d1ee183ce123c99 *build/partial.rdb
13e8241720af1c93ffd4bf3cdda63893 *R/zzz.R
65a6f6015940c543cb2e9e677187d4b0 *build/partial.rdb
21b38be82934c915752eced2746b9385 *man/HAC.sim.Rd
3221d93dbf5c97ccd6e4a103ee27cd14 *man/HAC.simrep.Rd
4f3ff492ab3ee6912505c6788774b1d4 *man/HACClass.Rd
8396e3c069912f9b8d969095103f6c30 *man/HACHypothetical.Rd
0ee9083f7c76d1f64dd6f97870ae9271 *man/HACReal.Rd
339680740f0eb72698318e8f6d8787d8 *man/HACSim-package.Rd
7a5b1ae99b74d3c0806f060e672ab2e0 *man/HAC.simrep.Rd
4e5dbbcd722356118d7526099aecd93c *man/HACClass.Rd
0f0b2fb64f29242565a48e6a8e684a46 *man/HACHypothetical.Rd
359eff1bac8fd4be1ef038155957c16a *man/HACReal.Rd
02f01569423acbc654e33ac8deb6dc35 *man/HACSim-package.Rd
dcb6636c0430f01f3d2cf34205a57dd8 *man/accumulate.Rd
71901a5c31abcd13f4a1077135703845 *man/envr.Rd
799f8ea31d8c2f9c09dd28025cbcac2f *man/envr.Rd
afe1990f43bde8ab9e65249cb2519208 *src/Makevars
61059660eb073d93e00e8ee054237071 *src/Makevars.win
d4d834c897f4ca7d807bd910c10f6e05 *src/RcppExports.cpp
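Each MD5 entry above pairs a checksum with a package-relative path, which is why editing DESCRIPTION, R/HAC.sim.R, R/zzz.R and the .Rd files changes their hashes. The following is a minimal sketch of re-checking such entries against files on disk with base R's tools::md5sum(). The helper names parse_md5_file() and check_md5_entries() are hypothetical, and the sketch assumes the "<hash> *<path>" layout shown in the listing.

```r
# Hypothetical helpers; assumes each MD5 line looks like "<hash> *<path>".
parse_md5_file <- function(lines) {
  hashes <- sub(" \\*.*$", "", lines)          # text before " *"
  paths  <- sub("^[0-9a-f]+ \\*", "", lines)   # text after "<hash> *"
  stats::setNames(hashes, paths)
}

check_md5_entries <- function(entries, pkg_dir) {
  # tools::md5sum() hashes each file (NA if the file is missing)
  actual <- tools::md5sum(file.path(pkg_dir, names(entries)))
  # TRUE where the recorded hash matches the file on disk
  stats::setNames(unname(actual) == unname(entries), names(entries))
}

# Usage on an unpacked source package directory `pkg_dir`:
# check_md5_entries(parse_md5_file(readLines(file.path(pkg_dir, "MD5"))), pkg_dir)
```

For installed packages, the tools package also provides checkMD5sums(), which performs a comparable check.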
2 R/HAC.sim.R
@@ -102,7 +102,7 @@ HAC.sim <- function(N,
stop("H* must be greater than 1")
}

if (sum(probs) != 1) {
if (!isTRUE(all.equal(1, sum(probs), tolerance = .Machine$double.eps^0.25))) {
stop("probs must sum to 1")
}
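The change above swaps an exact floating-point test, sum(probs) != 1, for a tolerance-based one. Exact equality can reject a frequency vector whose sum differs from 1 only by rounding error. The following sketch contrasts the two expressions taken from the diff. The wrapper names exact_check() and tolerant_check() are hypothetical.

```r
# Hypothetical wrappers around the 1.0.0 and 1.0.1 comparison expressions.
exact_check <- function(probs) sum(probs) == 1

tolerant_check <- function(probs) {
  # all.equal() compares the relative difference against `tolerance`
  # (.Machine$double.eps^0.25 is about 1.2e-4); it returns a character
  # message on failure, so isTRUE() maps that case to FALSE
  isTRUE(all.equal(1, sum(probs), tolerance = .Machine$double.eps^0.25))
}

p <- c(0.5, 0.5 + 1e-12)      # off from 1 by a rounding-scale amount
exact_check(p)                # FALSE: 1.0.0 would stop("probs must sum to 1")
tolerant_check(p)             # TRUE: accepted by 1.0.1
tolerant_check(c(0.5, 0.4))   # FALSE: a genuinely wrong vector is still rejected
```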

4 R/zzz.R
@@ -5,8 +5,8 @@ envr <- NULL
}

.onAttach <- function(...) {
packageStartupMessage("This is HACSim 1.0.0 \n
packageStartupMessage("This is HACSim 1.0.1 \n
Type ?HACHypothetical to see how to set up objects to run \n simulations of haplotype accumulation for hypothetical species \n
Type ?HACReal to see how to set up objects to run \n simulations of haplotype accumulation for real species \n
Type ?HAC.sim to see how run simulations of haplotype \n accumulation curves")
Type ?HAC.simrep to see how to run simulations of haplotype \n accumulation curves")
}
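Because the banner is emitted through packageStartupMessage(), users who find it noisy can load the package with base R's suppressPackageStartupMessages(library(HACSim)). The following is a self-contained sketch of that mechanism. Here fake_on_attach() is a hypothetical stand-in for the package's .onAttach() hook, and collect_messages() is a hypothetical helper that records and muffles any messages that escape.

```r
# Hypothetical stand-in for HACSim's .onAttach() banner.
fake_on_attach <- function() {
  packageStartupMessage("This is HACSim 1.0.1")
}

# Records the text of every message signalled by `expr` and muffles it.
collect_messages <- function(expr) {
  msgs <- character(0)
  withCallingHandlers(expr, message = function(m) {
    msgs <<- c(msgs, conditionMessage(m))
    invokeRestart("muffleMessage")
  })
  msgs
}

shown  <- collect_messages(fake_on_attach())                                  # banner captured
hidden <- collect_messages(suppressPackageStartupMessages(fake_on_attach()))  # nothing escapes
```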
BIN -565 Bytes (90%) build/partial.rdb
Binary file not shown.
16 man/HAC.simrep.Rd
@@ -3,16 +3,24 @@

\title{Run a simulation of haplotype accumulation curves for hypothetical or real species}

\description{Runs the \code{HACSim} algorithm to iteratively extrapolate haplotype
accumulation curves to determine likely specimen sample sizes for hypothetical or real species
\description{Runs the \code{HACSim} algorithm by successively calling \code{HAC.sim} to iteratively extrapolate haplotype accumulation curves to determine likely specimen sample sizes for hypothetical or real species

The algorithm employs the following iterative method when calculating the "Measures of Sampling Closeness":

\deqn{N^*_{i+1} = \frac{N_iH^*}{H_i},}

where \eqn{H_i} is stochastically determined through sampling from \code{probs}, the observed species' haplotype frequency distribution vector.
As the algorithm proceeds, \eqn{H_i} will approach \eqn{H^*} asymptotically (and hence, \eqn{N_i} will converge to \eqn{N^*}), but will likely fluctuate randomly from one iteration to the next. However, estimates of \eqn{N^*} found at each iteration will be monotonically increasing.
}
\usage{HAC.simrep(HACSObject)}
\arguments{\item{HACSObject}{object containing desired simulation parameters}
\arguments{\item{HACSObject}{object containing the desired simulation parameters}
}
\value{Iteration results are outputted to the console and graphs displayed in plot window. Additionally, iteration results are optionally saved to a CSV file. Subsampled DNA sequences are saved to a FASTA file.}
\value{Iteration results are outputted to the console and graphs displayed in the plot window. Plots depict haplotype accumulation (along with shaded confidence intervals for the mean number of haplotypes found). Dashed lines correspond to the endpoint of the curve and reflect haplotype recovery for a user-defined cutoff (default \code{p} = 0.95, 95\% haplotype diversity). Output from the first iteration is useful for judging levels of haplotype diversity and recovery found in observed intraspecific sequence datasets, reflecting current sampling depth. The required sample size is displayed in the second-last iteration. All other information corresponding to the extrapolated sample size can be found in the last iteration.
Iteration results can optionally be saved to a CSV file. Subsampled DNA sequences are automatically saved to a FASTA file.}
\note{When simulating real species via \code{HACReal(...)}, a pop-up window will appear prompting the user to select an intraspecific FASTA file of aligned/trimmed DNA sequences. The alignment must not contain missing or ambiguous nucleotides (i.e., it should only contain A, C, G or T); otherwise, haplotype diversity may be overestimated. Excluding sequences or alignment sites with missing/ambiguous data is an option.
}
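The update rule documented above, \eqn{N^*_{i+1} = N_i H^* / H_i}, can be illustrated with a short Monte Carlo loop. This is a minimal sketch under stated assumptions, not HACSim's implementation. It assumes that specimens are drawn with replacement from probs, that \eqn{H_i} is the mean number of distinct haplotypes over perms replicates, and that the loop stops once \eqn{H_i \ge p H^*}. The functions mean_haps() and extrapolate_n() are hypothetical.

```r
# Hypothetical helper: mean number of distinct haplotypes among `size`
# specimens drawn with replacement from `probs`, over `perms` replicates.
mean_haps <- function(size, probs, perms = 1000) {
  found <- replicate(perms, {
    draws <- sample.int(length(probs), size, replace = TRUE, prob = probs)
    length(unique(draws))
  })
  mean(found)
}

# Hypothetical driver for N*_{i+1} = N_i * H* / H_i (assumed stopping rule).
extrapolate_n <- function(N, probs, p = 0.95, perms = 1000, max_iter = 50) {
  Hstar <- length(probs)              # number of observed unique haplotypes
  for (i in seq_len(max_iter)) {
    H_i <- mean_haps(N, probs, perms)
    if (H_i >= p * Hstar) break       # target proportion recovered on average
    # Here H_i < H*, so the ratio exceeds 1 and N strictly increases,
    # matching the monotone estimates described above
    N <- ceiling(N * Hstar / H_i)
  }
  N
}

set.seed(1)
extrapolate_n(N = 2, probs = c(0.5, 0.3, 0.2), p = 0.95, perms = 500)
```

Rounding up with ceiling() keeps N an integer specimen count. Because H_i is estimated stochastically, repeated runs without a fixed seed can stop at slightly different sample sizes.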
2 man/HACClass.Rd
@@ -3,6 +3,6 @@

\title{Internal R code}

\description{\code{HACClass} comprises internal R code used to generate object used by \code{HAC.simrep}. It is not directly called by the user.
\description{\code{HACClass} comprises internal R code used to generate an object used by \code{HAC.simrep}. It is not directly called by the user.
}

6 man/HACHypothetical.Rd
@@ -15,13 +15,13 @@ progress = TRUE, filename = NULL)}
\item{perms}{Number of permutations (replications)}
\item{p}{Proportion of haplotypes to recover}
\item{conf.level}{Desired confidence level for graphical output and interval estimation}
\item{subsample}{Is a subsample of haplotype labels desired (TRUE/FALSE)?}
\item{subsample}{Is a subsample of haplotype labels desired?}
\item{prop}{If subsample = TRUE, the proportion of haplotype labels to subsample}
\item{progress}{Should iteration output be printed to the R console? Default is TRUE.}
\item{progress}{Should iteration output be printed to the R console?}
\item{filename}{Name of file where simulation results are to be saved}
}
\value{An object with 12 elements that can be passed to HAC.simrep()
\value{An object with 13 elements that can be passed to \code{HAC.simrep}
}
\note{\code{N} must be greater than 1 and greater than or equal to \code{Hstar}.
6 man/HACReal.Rd
@@ -11,13 +11,13 @@ subsample = FALSE, prop = NULL, progress = TRUE, filename = NULL)}
\arguments{\item{perms}{Number of permutations (replications)}
\item{p}{Proportion of haplotypes to recover}
\item{conf.level}{Desired confidence level for graphical output and interval estimation}
\item{subsample}{Is a subsample of DNA sequences desired (TRUE/FALSE)?}
\item{subsample}{Is a subsample of DNA sequences desired?}
\item{prop}{If subsample = TRUE, the proportion of DNA sequences to subsample}
\item{progress}{Should iteration output be printed to the R console? Default is TRUE.}
\item{progress}{Should iteration output be printed to the R console?}
\item{filename}{Name of file where simulation results are to be saved}
}

\value{An object with 12 elements that can be passed to HAC.simrep()
\value{An object with 13 elements that can be passed to \code{HAC.simrep}
}

\examples{
2 man/HACSim-package.Rd
@@ -5,7 +5,7 @@
\packageTitle{HACSim}
}
\description{
HACSim (\strong{H}aplotype \strong{A}ccumulation \strong{C}urve \strong{Sim}ulator) employs a novel nonparametric stochastic (Monte Carlo) method of iteratively generating species' haplotype accumulation curves through extrapolation to assess sampling completeness based on the approach outlined in Phillips et al. (2015) <doi:10.1515/dna-2015-0008> and Phillips et al. (2019) <doi:10.1002/ece3.4757>. The package outputs a number of useful summary statistics of sampling coverage ("Measures of Sampling Closeness"), including an estimate of the likely required sample size (along with desired level confidence intervals) necessary to recover a given number/proportion of observed unique species' haplotypes.
HACSim (\strong{H}aplotype \strong{A}ccumulation \strong{C}urve \strong{Sim}ulator) employs a novel nonparametric stochastic (Monte Carlo) optimization method of iteratively generating species' haplotype accumulation curves through extrapolation to assess sampling completeness based on the approach outlined in Phillips et al. (2015) <doi:10.1515/dna-2015-0008> and Phillips et al. (2019) <doi:10.1002/ece3.4757>. The package outputs a number of useful summary statistics of sampling coverage ("Measures of Sampling Closeness"), including an estimate of the likely required sample size (along with desired level confidence intervals) necessary to recover a given number/proportion of observed unique species' haplotypes.
}
\details{

37 man/envr.Rd
@@ -1,7 +1,40 @@
\name{envr}
\alias{envr}

\title{Internal R code}
\title{Simulation variable storage environment}

\description{\code{envr} is a new environment that is created when \code{HACSim} is loaded.
\description{\code{envr} is a new (initially empty) environment that is created when \code{HACSim} is loaded.
}

\value{When a simulation is run via \code{HAC.simrep}, \code{envr} will contain 20 elements as follows:

\item{conf.level}{The desired confidence level. Default is \code{conf.level = 0.95}.}
\item{d}{A dataframe with three columns: specimens (specs), accumulated haplotypes (means), and standard deviation (sd)}
\item{filename}{The name of the file where results are to be saved. Default is NULL.}
\item{high}{The upper endpoint of the desired level confidence interval for the 'true' required sample size}
\item{Hstar}{Number of unique species' haplotypes}
\item{input.seqs}{Should DNA sequences be inputted? Default is FALSE.}
\item{iters}{The number of iterations required to reach convergence}
\item{low}{The lower endpoint of the desired level confidence interval for the 'true' required sample size}
\item{N}{The starting sample size used to initialize the algorithm}
\item{Nstar}{The final (extrapolated) sample size}
\item{p}{The user-specified level of haplotype recovery. Default is \code{p} = 0.95.}
\item{perms}{The user-specified number of permutations (replications). Default is \code{perms} = 10000.}
\item{probs}{Haplotype frequency distribution vector}
\item{progress}{Should iteration results be outputted to the console? Default is TRUE.}
\item{prop.haps}{If \code{subset.haps} = TRUE, the user-specified proportion of haplotype labels to recover}
\item{prop.seqs}{If \code{subset.seqs} = TRUE, the user-specified proportion of DNA sequences to recover}
\item{ptm}{A timer to track progress of the algorithm in seconds}
\item{R}{The proportion of haplotypes recovered by the algorithm}
\item{subset.haps}{Should a subsample of haplotype labels be taken? Default is FALSE.}
\item{subset.seqs}{Should a subsample of DNA sequences be taken? Default is FALSE.}
}
\examples{
# Returns the frequencies of each haplotype in the extrapolated sample
max(envr$d$specs) * envr$probs
# Returns the extrapolated sample size corresponding to the dotted line
# in the last iteration plot
envr$d[which(envr$d$means >= envr$p * envr$Hstar), ][1, 1]
}
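The two \examples expressions above only work after HAC.simrep() has populated envr. To see what they compute without running a simulation, the following sketch evaluates the same expressions against a hypothetical stand-in environment, sim. Its d follows the documented specs/means/sd layout, but every number in it is made up for illustration.

```r
# Hypothetical stand-in for a populated envr; all values are invented.
sim <- new.env()
sim$probs <- c(0.5, 0.3, 0.2)   # haplotype frequency distribution vector
sim$Hstar <- 3                  # number of unique haplotypes
sim$p     <- 0.95               # target recovery proportion
sim$d <- data.frame(specs = 1:6,
                    means = c(1, 1.62, 2.03, 2.41, 2.72, 2.91),
                    sd    = c(0, 0.49, 0.62, 0.58, 0.45, 0.29))

# Frequencies of each haplotype at the largest sample size in d
max(sim$d$specs) * sim$probs                              # 3.0 1.8 1.2
# First sample size whose mean haplotype count reaches p * Hstar (2.85)
sim$d[which(sim$d$means >= sim$p * sim$Hstar), ][1, 1]   # 6
```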
