Skip to content

Commit

Permalink
version 1.8.2
Browse files Browse the repository at this point in the history
  • Loading branch information
squidlobster authored and cran-robot committed Jun 30, 2024
1 parent fe46204 commit 3fd8671
Show file tree
Hide file tree
Showing 8 changed files with 226 additions and 20 deletions.
8 changes: 4 additions & 4 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
Package: castor
Type: Package
Title: Efficient Phylogenetics on Large Trees
Version: 1.8.0
Date: 2023-12-28
Version: 1.8.2
Date: 2024-06-29
Author: Stilianos Louca
Maintainer: Stilianos Louca <louca.research@gmail.com>
Description: Efficient phylogenetic analyses on massive phylogenies comprising up to millions of tips. Functions include pruning, rerooting, calculation of most-recent common ancestors, calculating distances from the tree root and calculating pairwise distances. Calculation of phylogenetic signal and mean trait depth (trait conservatism), ancestral state reconstruction and hidden character prediction of discrete characters, simulating and fitting models of trait evolution, fitting and simulating diversification models, dating trees, comparing trees, and reading/writing trees in Newick format. Citation: Louca, Stilianos and Doebeli, Michael (2017) <doi:10.1093/bioinformatics/btx701>.
Expand All @@ -14,6 +14,6 @@ Suggests: nloptr, ape
LinkingTo: Rcpp
RoxygenNote: 7.1.2
NeedsCompilation: yes
Packaged: 2024-01-09 08:47:48 UTC; stilianoslouca
Packaged: 2024-06-29 08:08:54 UTC; stilianoslouca
Repository: CRAN
Date/Publication: 2024-01-09 12:10:07 UTC
Date/Publication: 2024-06-29 08:40:02 UTC
12 changes: 7 additions & 5 deletions MD5
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
f6876a1ec47953692119c4c3829a2287 *DESCRIPTION
fee14e3b7d515e99cc60ec7460f90560 *NAMESPACE
889e58f560abe8a8314e74eac41fcfd1 *DESCRIPTION
ae7796e4b84cafa1cbe7df512c91bba8 *NAMESPACE
913da055bed292caf25ebe2a746bb557 *R/RcppExports.R
48bbb8e6675b8b36f45420f212377339 *R/asr_empirical_probabilities.R
989b7786c5d618f4b4a278f60d277a71 *R/asr_independent_contrasts.R
bd143fa1ee70e02d379843ae2ddd3b27 *R/asr_max_parsimony.R
854e7d283cf2ceec80a7bc8c38ba1758 *R/asr_mk_model.R
085126368e9a7434358a27646b8765a6 *R/asr_squared_change_parsimony.R
f6a1c71281b5713a4f8fe75745bbc76f *R/asr_subtree_averaging.R
38ac32e3c612fea30d0544eb4d9a300b *R/auxiliary_routines.R
78b4b583e527eebd9dd7ed1701cb8389 *R/auxiliary_routines.R
3d3a8ca9e053dbba55087b1d077daa4e *R/biom.R
e1de6dad7f288ea93beb0393dedda65d *R/clade_densities.R
3701dd6dd45a1e72c5d7e8ebeb49fb6d *R/collapse_monofurcations.R
Expand Down Expand Up @@ -49,7 +49,7 @@ d95ae2abead28302ed7476945a97b185 *R/fit_hbd_pdr_parametric.R
c51b79935430d889545599e31b0d8465 *R/fit_hbd_psr_parametric.R
aaa485599a20efed371b8bad2ab4f0c3 *R/fit_hbds_model_on_grid.R
0d41c4d362dd87f8b41d3d32a52d95a0 *R/fit_hbds_model_parametric.R
9c0e174bcd6dfa10e0e05bed4af7618f *R/fit_mk.R
cb39a7f3e67765cb1d98c0f9b7f96f18 *R/fit_mk.R
ef7baef13c0512d808601ab52a802020 *R/fit_musse.R
4bde1da98a75c0c0838a312e59114d80 *R/fit_sbm_const.R
a9826b9ab94cc1a59249349bb0bfea00 *R/fit_sbm_const_biased_sampling_unfinished.R
Expand Down Expand Up @@ -142,6 +142,7 @@ d7036d7e94303864d600a9a26026c408 *R/simulate_sbm.R
0f7486d79feff9043f34b7eda231ced7 *R/tree_distance.R
78773fd2f41a0b07b16a854592719f2b *R/tree_from_branching_ages.R
621f576a6ac533d6a4e6e050991978d6 *R/tree_from_sampling_branching_ages.R
f09e3f1b8aa56cd50042360fb270f403 *R/tree_from_taxa.R
5a1a2314ffe0ee81e70f345f0669f510 *R/tree_imbalance.R
ca129429443d9d6247752c0038da7f6b *R/trim_tree_at_height.R
5871b15620f4a6ff77501be1c50d0e13 *R/write_tree.R
Expand Down Expand Up @@ -179,7 +180,7 @@ d2864b744bb78215c2c8cb84cc94bd1b *man/extract_tip_radius.Rd
1f8f19a1dbf406b0c66c10709bbae720 *man/find_farthest_tips.Rd
df6f54e120cd7b965687f14a427c5953 *man/find_nearest_tips.Rd
474164c84fb3892a906b791c416915dc *man/find_root.Rd
e5b3c517b235b600c18a27ef3ce044a3 *man/find_root_of_monophyletic_tips.Rd
cc9b23c296ba59675d2da55541b68e89 *man/find_root_of_monophyletic_tips.Rd
9fc352fb0bea5b5d31e591f8c486f467 *man/fit_and_compare_bm_models.Rd
f2a33e470ae16209e620a26940590594 *man/fit_and_compare_sbm_const.Rd
ca8ee397cdfa23a943f3e3ec22a526cb *man/fit_bm_model.Rd
Expand Down Expand Up @@ -278,6 +279,7 @@ b1831a13ee7dc57a2a3c68a94690d293 *man/spline_coefficients.Rd
0b5cf3ee37b5379958a75967697805ac *man/tree_distance.Rd
850c5d0a06bf71e4ef12a955aa5ee27f *man/tree_from_branching_ages.Rd
8de1c40338e59229fc6faf72b1c1cd99 *man/tree_from_sampling_branching_ages.Rd
8183f1591f94d757938f5e07fc6c9475 *man/tree_from_taxa.Rd
6ebaa422630c326b0a09b0ce3192f4ca *man/tree_imbalance.Rd
6fa52aa5090a4e99df1b23f06119d825 *man/trim_tree_at_height.Rd
2d6d4490a937045e02b449eb19aac446 *man/write_tree.Rd
Expand Down
3 changes: 2 additions & 1 deletion NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -161,4 +161,5 @@ export(consensus_taxonomies)
export(correlate_phylo_geodistances)
export(expanded_tree_from_jplace)
export(expanded_tree_from_jplace)
export(place_tips_taxonomically)
export(place_tips_taxonomically)
export(tree_from_taxa)
16 changes: 10 additions & 6 deletions R/auxiliary_routines.R
Original file line number Diff line number Diff line change
Expand Up @@ -2456,16 +2456,16 @@ bootstrap_Kolmogorov_Smirnov_test = function( bootstraps, # list of length NB, l
empirical_KS = max(abs(empirical_CDF_values-reference_CDF_values))

# calculate KS distance of each boostrap to the reference CDF
boostrap_KSs = rep(NA, NB)
bootstrap_KSs = rep(NA, NB)
for(b in seq_len(NB)){
boostrap_KSs[b] = max(abs(bootstrap_CDF_values[b,]-reference_CDF_values))
bootstrap_KSs[b] = max(abs(bootstrap_CDF_values[b,]-reference_CDF_values))
}

# compare empirical_KS to bootstrap_KSs
mean_bootstrap_KS = mean(boostrap_KSs)
std_bootstrap_KS = sd(boostrap_KSs)
median_bootstrap_KS = median(boostrap_KSs)
Pvalue = mean(boostrap_KSs>=empirical_KS)
mean_bootstrap_KS = mean(bootstrap_KSs)
std_bootstrap_KS = sd(bootstrap_KSs)
median_bootstrap_KS = median(bootstrap_KSs)
Pvalue = mean(bootstrap_KSs>=empirical_KS)
return(list(empirical_KS = empirical_KS,
mean_bootstrap_KS = mean_bootstrap_KS,
std_bootstrap_KS = std_bootstrap_KS,
Expand Down Expand Up @@ -3170,3 +3170,7 @@ open_file = function(file_path, mode){
return (if(endsWith(tolower(file_path),".gz")) gzfile(file_path, mode) else file(file_path,mode))
}





6 changes: 3 additions & 3 deletions R/fit_mk.R
Original file line number Diff line number Diff line change
Expand Up @@ -189,7 +189,7 @@ fit_mk = function( trees, # either a single tree in phylo format, or a
objective_function = function(dense_rates, trial){
if(any(is.nan(dense_rates)) || any(is.infinite(dense_rates))){
if(diagnostics) cat(sprintf("%s Trial %d: Objective requested for invalid rates: %s\n",verbose_prefix,trial,paste(sprintf("%.4g", dense_rates), collapse=", ")))
return(Inf)
return(if(optim_algorithm == "optim") 1e100 else Inf)
}
Q = get_transition_matrix_from_rate_vector(dense_rates, index_matrix, Nstates)
if(root_prior[1]=="stationary"){
Expand All @@ -215,7 +215,7 @@ fit_mk = function( trees, # either a single tree in phylo format, or a
max_polynomials = 1000)
if((!results$success) || is.na(results$loglikelihood) || is.nan(results$loglikelihood)){
if(diagnostics) cat(sprintf("%s Trial %d, tree %d (%d tips): Model evaluation failed: %s\n",verbose_prefix,trial,tr,length(focal_tree$tip.label),results$error))
return(Inf)
return(if(optim_algorithm == "optim") 1e100 else Inf)
}
loglikelihood = loglikelihood + results$loglikelihood
}
Expand Down Expand Up @@ -259,7 +259,7 @@ fit_mk = function( trees, # either a single tree in phylo format, or a
if(Ntrials>1){
# randomly choose multiple parameter starting points and keep the Ntrials-1 most promising ones (i.e., with smallest objective values) plus the defaut_start
Nscouts = (if(is.null(Nscouts)) min(10000,10*Nrates*Ntrials) else max(Ntrials-1,Nscouts))
if(verbose) cat(sprintf("%sGenerating %d random parameter starts and selecting the most promising ones..\n",verbose_prefix,Nscouts))
if(verbose) cat(sprintf("%sGenerating %d random parameter starts ('scouts') and selecting the most promising ones..\n",verbose_prefix,Nscouts))
starts_pool = lapply(seq_len(Nscouts), FUN=function(k) first_guess_rate * 10**runif(n=Nrates, min=-((k/Nscouts)^2)*power_range/2, max=((k/Nscouts)^2)*power_range/2))
# compute the objective values for each start in the pool
if((Nthreads>1) && (.Platform$OS.type!="windows")){
Expand Down
135 changes: 135 additions & 0 deletions R/tree_from_taxa.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
# Given a collection of taxon lists, construct a rooted taxonomic tree.
# Each taxon list is defined by an parent name and the names of its children (i.e., immediate descendants).
# Rules:
# Each taxon must appear at most once as an parent and at most once as a child.
# Any taxa found as parents but not as children, will be assumed to descend directly from the root. If only one such taxon was found, it will become the root itself.
# Any taxa found as a child but not as an parent, will become tips.
# Any parents without children will be considered tips.
# Since the returned tree is a taxonomy, it will contain no edge lengths.
tree_from_taxa = function(taxa){
# basic input checking
NTL = length(taxa)
if(!(class(taxa) %in% c("list"))) stop("taxa must be a list of vectors")
parent_names = names(taxa)
if(any(parent_names=="")) stop("One of the parent names is empty, which is not allowed")
duplicate_parents = unique(parent_names[duplicated(parent_names)])
if(length(duplicate_parents)>0) stop(sprintf("%d %s multiple times as parent: %s",length(duplicate_parents),ifelse(length(duplicate_parents)==1,"taxon appears","taxa appear"),paste(duplicate_parents,collapse=", ")))
child_names = unlist(taxa)
duplicate_children = unique(child_names[duplicated(child_names)])
if(length(duplicate_children)>0) stop(sprintf("%d %s multiple times as a child: %s",length(duplicate_children),ifelse(length(duplicate_children)==1,"taxon appears","taxa appear"),paste(duplicate_children,collapse=", ")))

# determine tips & nodes
taxon_sizes = sapply(taxa, FUN=function(taxon_list) length(taxon_list))
node_names = parent_names[taxon_sizes>=1]
tip_names = c(setdiff(child_names, node_names), parent_names[taxon_sizes==0])
root_names = setdiff(parent_names, child_names)
if(length(root_names)==0) stop("Missing root: All parents also appear as children")

# determine tree dimensions
Ntips = length(tip_names)
Nnodes = ifelse(length(root_names)==1, length(node_names), length(node_names)+1)
Nclades = Ntips + Nnodes
Nedges = sum(taxon_sizes) + ifelse(length(root_names)==1, 0, length(root_names))

# define edges
edges = matrix(0L, nrow=Nedges, ncol=2)
if(length(root_names)==1){
# found a unique root. By convention, this should be the first node in the tree.
node_names = c(root_names, setdiff(node_names,root_names))
clade_name2index = setNames(c(1:Nclades), nm=c(tip_names,node_names))
next_edge = 1
}else{
# found multiple roots, so combine them as children of a single true root with empty name ("")
# By convention, the root will be the first node in the tree
node_names = c("", node_names)
clade_name2index = setNames(c(1:Nclades), nm=c(tip_names,node_names))
edges[1:length(root_names),1] = Ntips+1
edges[1:length(root_names),2] = clade_name2index[root_names]
next_edge = 1 + length(root_names)
}
for(tl in seq_len(NTL)){
# define edges connecting the parent of this taxon list to all of its children
parent_clade = clade_name2index[parent_names[tl]]
child_clades = clade_name2index[taxa[[tl]]]
edges[next_edge:(next_edge+taxon_sizes[tl]-1),1] = parent_clade
edges[next_edge:(next_edge+taxon_sizes[tl]-1),2] = child_clades
next_edge = next_edge + taxon_sizes[tl]
}

# construct the tree object
tree = list(Nnode = Nnodes,
tip.label = tip_names,
node.label = node_names,
edge = edges,
root = Ntips+1)
class(tree) = "phylo"
attr(tree,"order") = NULL
return(tree)
}



# construct a rooted tree from taxon lists loaded from a text file
# Each taxon list in the text file begins with the parent name, followed by the children names in separate lines.
# Taxon lists are separated by at least one empty line.
# Comments are allowed, and must be preceded by the character #.
# For example:
# Enterobacteriaceae # family
# Enterobacter # genus
# Escherichia # genus
# Klebsielle # genus
#
# Erwiniaceae # family
# Erwinia # genus
# Buchnera # genus
# Pantoea # genus
#
# Escherichia # genus
# E. albertii # species
# E. coli # species
# E. hermannii # species
tree_from_taxa_file = function( file_path, # path to an input text file, containing taxon lists. This file may be gzipped.
prefix_sep = NULL){ # optional separator character, any part prior to which in taxon names will be omitted. For example, if ":", then "genus:Escherichia" will become "Escherichia", while "Staphylococcus" will remain "Staphylococcus".
fin = open_file(file_path, "rt")
lines = suppressWarnings(readLines(fin, file.info(file_path)$size))
close(fin)

new_taxon = TRUE
children = character(0)
parent = NULL
taxa = list()
for(line in lines){
line = gsub("#.*","",line,fixed=FALSE) # remove any comments
line = trimws(line) # remove any flanking whitespace

# remove prefix is applicable
if(!is.null(prefix_sep)){
parts = strsplit(line, prefix_sep, fixed=TRUE)[[1]]
if(length(parts)>1){
line = paste(parts[2:length(parts)], collapse=prefix_sep)
}
}

if(line==""){
if(!is.null(parent)){
taxa[[parent]] = children
children = character(0)
parent = NULL
}
}else if(is.null(parent)){
parent = line
}else{
children = c(children, line)
}
}

# add remaining taxon
if(!is.null(parent)){
taxa[[parent]] = children
children = character(0)
parent = NULL
}

return(list(taxa=taxa, tree=tree_from_taxa(taxa)))

}
2 changes: 1 addition & 1 deletion man/find_root_of_monophyletic_tips.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ Logical, specifying whether the input tree can be assumed to be rooted. If you a
}

\details{
The input tree may include an arbitrary number of incoming and outgoing edges per node (but only one edge per tip), and the direction of these edges can be arbitrary. Of course, the undirected graph defined by all edges must still be a valid tree (i.e. a connected acyclic graph). Note that this function does not change the tree, it just determines which tip or node should be made root for the target tips to be a monophyletic group.
The input tree may include an arbitrary number of incoming and outgoing edges per node (but only one edge per tip), and the direction of these edges can be arbitrary. Of course, the undirected graph defined by all edges must still be a valid tree (i.e. a connected acyclic graph). This function does not change the tree, it just determines which tip or node should be made root for the target tips to be a monophyletic group. If the target tips do not form a monophyletic group regardless of root placement (this is typical if the tips are simply chosen randomly), this function returns \code{NA}.


The asymptotic time complexity of this function is O(Nedges).
Expand Down
64 changes: 64 additions & 0 deletions man/tree_from_taxa.Rd
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
\name{tree_from_taxa}
\alias{tree_from_taxa}
\title{
Construct a rooted tree from lists of taxa.
}
\description{
Given a collection of taxon lists, construct a rooted taxonomic tree. Each taxon list is defined by a parent name and the names of its children (i.e., immediate descendants).
}
\usage{
tree_from_taxa(taxa)

}
%- maybe also 'usage' for other objects documented here.
\arguments{
\item{taxa}{
Named list, whose elements are character vectors, each representing a parent and its children. The element names of \code{taxa} are parents. Each element \code{taxa[n]} is a character vector listing an arbitrary number of taxon names (the children), which immediately descend from taxon \code{names(taxa)[n]}.
}
}


\details{
The following rules apply:
\itemize{
\item Each taxon must appear at most once as a parent and at most once as a child.
\item Any taxa found as parents but not as children, will be assumed to descend directly from the root. If only one such taxon was found, it will become the root itself.
\item Any taxa found as a child but not as a parent, will become tips.
\item Any parents without children will be considered tips.
\item Empty parent names (i.e., "") are not allowed.
\item Taxa can be specified in any arbitrary order, including breadth-first, depth-first etc.
}
Since the returned tree is a taxonomy, it will contain no edge lengths.
}


\value{
A rooted tree of class "phylo".
}

\author{Stilianos Louca}

%\references{
%}

\seealso{
\code{\link{consensus_taxonomies}},
\code{\link{place_tips_taxonomically}}
}


\examples{
# define a list of taxa, with taxon "A" being the root
# Taxa G, H, I, J, K, L, M, N and O will be tips
taxa = list(A = c("B","C","D"),
B = c("E","I"),
C = c("J","F"),
D = c("M", "N", "O"),
E = c("G", "H"),
F = c("K", "L"))
tree = castor::tree_from_taxa(taxa)
}
% Add one or more standard keywords, see file 'KEYWORDS' in the
% R documentation directory.
% Only 1 keyword per line
\keyword{taxonomy}

0 comments on commit 3fd8671

Please sign in to comment.