phylogenetic tree construction in DADA2? #88

FabianRoger · 2016-06-29T08:27:54Z

hej,

I was wondering if there is (did I miss it?) - or if you plan to add - the possibility to construct a phylogeny within the DADA2 pipeline? I was thinking something like Fasttree.

Along that line I was also thinking if you had any thoughts on how the fact that the sequence resolution of DADA2 is higher than the 3% OTU threshold might interfere with the construction of a tree. I'm think wether it would be a problem if two close sequences both matched the same sequence in the reference database? Should then both be assigned to the same tip and their abundances summed? Or maybe they would simply be split into two tips with distance 0 as it might be done already?

best

Fabian

benjjneb · 2016-06-29T17:25:43Z

Hi Fabian,

There is no phylogeny construction in the dada2 package, and none is planned. There are tools for making phylogenetic trees already in R though, and these can be used in a fairly straightforward fashion.

For an example of this, see our recent workflow paper: http://f1000research.com/articles/5-1492/v1 (Construct the phylogenetic tree section). For more details on tree building, the phangorn package is the relevant R tree-building package.

It is worth noting that the FastTree algorithm is not, to my knowledge, available in R. For building trees on large datasets (thousands of sequences or more) it may be necessary to use an external application.

FabianRoger · 2016-07-01T11:31:18Z

thanks a lot, your workflow paper is super helpful!

It is worth noting though that the function used for alignment msa(seqs, method="ClustalW", type="dna", order="input") is very slow and scales exponentially with number of reads with the time almost quadrupling if the number of sequences doubles. From the two other methods ClustalOmega is not suited for nucleotides (according to the documentation) and Muscle failed (for me) with a cryptic error message *** ERROR *** MSA::SetIdCount: cannot increase count

However AlignSeqsfrom the DECIPHER package scales linear with number of sequences and aligned my dataset (~1000 short reads, ~420 bp) in ~20 seconds.

of cause I know that time is not the only criterion, but for an amplicon workflow it is maybe worth looking into?

thanks for all your help!

Fabian

spholmes · 2016-07-01T13:57:18Z

Thanks a lot for the pointer, when we do another revision for the workflow
paper we will
try DECIPHER as well,
Susan

On Fri, Jul 1, 2016 at 4:31 AM, FabianRoger notifications@github.com
wrote:

thanks a lot, your workflow paper is super helpful!

It is worth noting though that the function used for alignment msa(seqs,
method="ClustalW", type="dna", order="input") is very slow and scales
exponentially with number of reads with the time almost quadrupling if the
number of sequences doubles. From the two other methods ClustalOmega is
not suited for nucleotides (according to the documentation) and Muscle
failed (for me) with a cryptic error message *** ERROR ***
MSA::SetIdCount: cannot increase count

However AlignSeqsfrom the DECIPHER package scales linear with number of
sequences and aligned my dataset (~1000 short reads, ~420 bp) in ~20
seconds.

of cause I know that time is not the only criterion, but for an amplicon
workflow it is maybe worth looking into?

thanks for all your help!

Fabian

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#88 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/ABJcvTV6MGF5MyOjxsHhiScPOjOyRNAXks5qRPqGgaJpZM4JA42u
.

Susan Holmes
Professor, Statistics and BioX
John Henry Samter Fellow in Undergraduate Education
Sequoia Hall,
390 Serra Mall
Stanford, CA 94305
http://www-stat.stanford.edu/~susan/

benjjneb · 2016-11-03T17:39:33Z

The revised version of the F1000 workflow (https://f1000research.com/articles/5-1492/v2) now uses DECIPHER for the multiple alignment (thanks for suggestion @FabianRoger), and phangorn for phylogenetic construction.

We remain interested in other options within the R universe for phylogenetic construction, but still do not plan to implement it within the dada2 package.

giriarteS · 2017-03-09T21:24:57Z

I used DECIPHER to align ITS sequences (5542 sequences) and then tried to construct the tree with phangorn but it was too slow. I finally constructed a tree with raxml:
alignment <- read.dna("alignment.fasta",format="fasta",as.matrix=TRUE)
alignment.rax.gtr <- raxml(alignment,
m="GTRGAMMAIX", # model
f="a", # best tree and bootstrap
p=1234, # random number seed
x=2345, # random seed for rapid bootstrapping
N=100, # number of bootstrap replicates
file="alignment", # name of output files
exec="raxmlHPC-PTHREADS-SSE3", # name of executable
threads=20
)

benjjneb · 2017-03-09T23:00:34Z

We should consider revisiting our recommendations here. It is definitely not uncommon for datasets to be too large for phangorn to work in reasonable amounts of time.

joey711 · 2017-03-09T23:02:51Z

I agree. I usually assume with modern datasets that I need to use something like raxml.

giriarteS · 2017-03-10T01:20:48Z

I forgot to mention that I used the raxml funtion in the the ips package(v 0.0.7).

josemseoane · 2017-03-29T16:40:33Z

Hello,

Thanks a lot for the workflow, is really great and I have learned a lot going through it!
Triying to deal with a lot of files I swithed to raxml in the ips package as suggested by giriarteS and everything went fine except that RAxML is reporting a weird error about my taxon lenght name and reported error "status 65535". Here is wha I did, I would greatly appreciate any help you may provide!

Thanks a lot!

Jose

exec<- "C:/Users/R100295/Documents/Bioinfo/RAxML/raxmlHPC-PTHREADS-SSE3.exe"
alignment.rax.gtr <- raxml(alignment,

                       m="GTRGAMMAIX", # model

                       f="a", # best tree and bootstrap

                       p=1234, # random number seed

                       x=2345, # random seed for rapid bootstrapping

                       N=100, # number of bootstrap replicates

                       file="alignment", # name of output files

                       exec= exec # name of executable

```
                       #threads=8
```
)
[1] "C:/Users/R100295/Documents/Bioinfo/RAxML/raxmlHPC-PTHREADS-SSE3.exe -f a -p 1234 -x 2345 -m GTRGAMMAIX -N 100 -s alignment.phy -n alignment"

WARNING: The number of threads is currently set to 0
You can specify the number of threads to run via -T numberOfThreads
NumberOfThreads must be set to an integer value greater than 1

RAxML, will now set the number of threads automatically to 2 !

Taxon Name too long at taxon 1, adapt constant nmlngth in
axml.h, current setting 256
Error in file(file, "r") : cannot open the connection
In addition: Warning messages:
1: running command 'C:/Users/R100295/Documents/Bioinfo/RAxML/raxmlHPC-PTHREADS-SSE3.exe -f a -p 1234 -x 2345 -m GTRGAMMAIX -N 100 -s alignment.phy -n alignment' had status 65535
2: In file(file, "r") :
cannot open file 'RAxML_info.alignment': No such file or directory

giriarteS · 2017-03-29T18:00:43Z

If you need longer taxon names you can adapt the constant #define nmlngth 256 in file axml.h appropriately. Check the lenght of your SV ID (sequence) and change that parameter in the axml.h file. I just gave a short name to each single variant (SV1.....SVn), I work with ITS and my sequence lengths are really variable. Below I describe my workflow:
seqtab.nochim1 <- seqtab.nochim
colnames(seqtab.nochim1) <- paste0("SV", 1:ncol(seqtab.nochim1))
Then added taxonomy to the original seqtab.nochim
taxtab <- assignTaxonomy(seqtab.nochim, ref_fasta_UNITE, minBoot = 50, tryRC = TRUE,
outputBootstraps = FALSE, taxLevels = c("Kingdom", "Phylum", "Class",
"Order", "Family", "Genus", "Species"), multithread = TRUE,
verbose = TRUE)
and change the names again (the order was the same as in seqtab.nochim)
taxtab1 <- taxtab
rownames(taxtab1) <- paste0("SV", 1:nrow(taxtab1))

The SV table and SV sequences were exported also:
export_taxa_table_and_seqs = function(seqtab.nochim, file_seqtab, file_seqs) {
seqtab.t = as.data.frame(t(seqtab.nochim))
seqs = row.names(seqtab.t)
row.names(seqtab.t) = paste0("SV", 1:nrow(seqtab.t))
outlist = list(data_loaded = seqtab.t)
mctoolsr::export_taxa_table(outlist, file_seqtab)
seqs = as.list(seqs)
seqinr::write.fasta(seqs, row.names(seqtab.t), file_seqs)
}
export_taxa_table_and_seqs(seqtab.nochim,"/path_to your_file/SV_table.txt",
"/path_to your_file/SV_seqs.fa")

Finally you can load your SV fasta file and align the sequences with DECIPHER:
fas <- "SV_seqs.fa"
seqs <- readDNAStringSet(fas)
seqs <- OrientNucleotides(seqs) ## I found duplicated SVs after I reoriented the sequences
alignment <- AlignSeqs(seqs, anchor=NA)

I exported my alignment to a server to run RAXML, it took 9 days for 5000 sequences.
phang.align <- phyDat(as(alignment, "matrix"), type="DNA")
write.phyDat(phang.align, file="alignment.fasta", format="fasta")

TS3alignment <- read.dna("alignment.fasta",format="fasta",as.matrix=TRUE)
alignment.rax.gtr <- raxml(myalignment,
m="GTRGAMMAIX", # model
f="a", # best tree and bootstrap
p=1234, # random number seed
x=2345, # random seed for rapid bootstrapping
N=100, # number of bootstrap replicates
file="alignment", # name of output files
exec="raxmlHPC-PTHREADS-SSE3", # name of executable
threads=20
)

Then you incorporate everything to your phyloseq object:
tree <- read_tree("RAxML_bipartitionsBranchLabels.alignment")
metadata_path <- "Metadata.csv"
samdf <- read.csv(metadata_path, header=TRUE)
all(rownames(seqtab.nochim1) %in% samdf$SampleID)
rownames(samdf) <- samdf$SampleID

ps <- phyloseq(tax_table(taxtab1), sample_data(samdf),otu_table(seqtab.nochim1, taxa_are_rows = FALSE), phy_tree(tree))
saveRDS(ps, file="ps.rds")

This workflow works well, if any of the steps seen redundant or if anybody has suggestions on how to improve it, please let me know.

josemseoane · 2017-03-30T05:43:00Z

Thanks a lot @giriarteS, that was a smart way of solving the lenght issue, it worked out perfectly well!

jarrodscott · 2017-06-10T20:57:10Z

anyone try 'RAxML next generation' for tree construction?

https://github.com/amkozlov/raxml-ng/

benjjneb · 2017-06-12T15:46:18Z

@jjscarpa I haven't tried it, but would also be interested in results from those who have!

jarrodscott · 2017-06-12T15:56:59Z

I am running both RaxML and RaxML-ng on the same data set now. Happy to share results when the jobs are done and I analyze results. Curious also if others have tried this

…

On Monday, June 12, 2017, Benjamin Callahan ***@***.***> wrote: @jjscarpa <https://github.com/jjscarpa> I haven't tried it, but would also be interested in results from those who have! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#88 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/APcStrv5udQ3xN71RFh6d0bADhgkEwCPks5sDV1MgaJpZM4JA42u> .

rachaelantwis · 2018-02-16T07:39:13Z

Hi all,
I'm having a bit of trouble with using decipher in dada2.
The code I'm using is as follows:

#Make the phylogenetic tree
library(DECIPHER)
seqs <- getSequences(seqtab.nochim)
names(seqs) <- seqs # This propagates to the tip labels of the tree
alignment <- AlignSeqs(DNAStringSet(seqs), anchor=NA)

library(phangorn)
phang.align <- phyDat(as(alignment, "matrix"), type="DNA")
dm <- dist.ml(phang.align)
treeNJ <- NJ(dm) # Note, tip order != sequence order
fit = pml(treeNJ, data=phang.align)

#negative edges length changed to 0!

fitGTR <- update(fit, k=4, inv=0.2)
fitGTR <- optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE,
rearrangement = "stochastic", control = pml.control(trace = 0))
tree <- read.tree(fitGTR$tree)

#EXPORT TO PHYLOSEQ AND MERGE WITH SAMPLE DATA (replace "..." with path name of sample data, i.e. metadata)

  #Tidying Up Sample Data
    samples.out <- data.frame(Sample=rownames(seqtab.nochim))
  
  #Import sample dataset
  library(readr)
  Sample_data <- read_csv("/Users/rachaelantwis/Dropbox/Research/Current/Papers/Ponies/Pony_sample_data.csv")
  View(Sample_data)
  
    sample.info<-Sample_data
    head(sample.info)
    
  #Samples as factor
    sample.info$Sample<-as.factor(sample.info$Sample)
    
  #Merge the Sample ID data from dada2 with the metadata  
    sample.merge<-merge(sample.info,samples.out,by="Sample",all.y=TRUE)
    nrow(sample.merge)
    #tail(sample.merge)
    
  #Rownames of sample data are sample IDs
    rownames(sample.merge)<-sample.merge$Sample
    
  #Add an indicator variable for whether a sample is a sample or a negative
    sample.merge$sampletype<-"sample"
    sample.merge$sampletype[grep("BLANK",sample.merge$Sample)]<-"negative"
  
  #Trim names of Taxa Table - label them as sequence variants (SVs)
      #note this strips out the sequence data whcih dada2 uses as a name element
      #so this is where to find the sequence if you need them
    taxa.slim<-unname(tt.plus)
    rownames(taxa.slim)<- paste("SV",seq(nrow(tt.plus)))  
    head(taxa.slim)
  
  #Trim Names of OTU Table
    seqtab.slim<-t(seqtab.nochim)
    rownames(seqtab.slim)<-paste("SV",seq(nrow(tt.plus)))
      
  #Export to Phyloseq
    library(phyloseq)
    
    ps <- phyloseq(otu_table(seqtab.slim,taxa_are_rows = TRUE), 
                   sample_data(sample.merge), 
                   tax_table((taxa.slim)), phy_tree(tree))

And I get the following error:
Error in validObject(.Object) : invalid class “phyloseq” object:
Component taxa/OTU names do not match.
Taxa indices are critical to analysis.
Try taxa_names()

If I remove the tree component this isn't a problem. Does anyone know what's going wrong?
Many thanks

benjjneb · 2018-02-16T16:37:28Z

I think if you remove the read.tree call, that this will work. You should be able to use fitGTR$tree directly as in the F1000 workflow.

read.tree is intended for reading trees into R from Newick-format files.

rachaelantwis · 2018-02-22T06:37:17Z

Thanks - I still get the error message:

Error in validObject(.Object) : invalid class “phyloseq” object:
Component taxa/OTU names do not match.
Taxa indices are critical to analysis.
Try taxa_names()

When I try:

taxa_names(fitGTR)
NULL

This is what I get when I inspect fitGTR:

fitGTR

loglikelihood: -140407

unconstrained loglikelihood: -1712.417
Proportion of invariant sites: 0.1513297
Discrete gamma model
Number of rate categories: 4
Shape parameter: 0.4878634

Rate matrix:
a c g t
a 0.0000000 0.7836721 3.3499360 1.740702
c 0.7836721 0.0000000 0.5706275 4.415296
g 3.3499360 0.5706275 0.0000000 1.000000
t 1.7407018 4.4152960 1.0000000 0.000000

Base frequencies:
0.2315264 0.2043853 0.2952618 0.2688266

I'm assuming taxa_names should be populated in some way?
Many thanks

benjjneb · 2018-02-22T14:07:44Z

Oh... the problem I think is that you have renamed the taxa as "SV" etc. in seqtab and tax, but not in the tree.

I recommend leaving the sequences as the names until you get everything into the phyloseq object, then use taxa_names(ps) <- new.names to rename the phyloseq object (and each slot in it) consistently.

rachaelantwis · 2018-02-22T14:18:34Z

Ah thanks, this is working now! Code as follows (for the next person):

#library(DECIPHER)
seqs <- getSequences(seqtab.nochim)
names(seqs) <- seqs # This propagates to the tip labels of the tree
alignment <- AlignSeqs(DNAStringSet(seqs), anchor=NA)

#library(phangorn)
phang.align <- phyDat(as(alignment, "matrix"), type="DNA")
dm <- dist.ml(phang.align)
treeNJ <- NJ(dm) # Note, tip order != sequence order
fit = pml(treeNJ, data=phang.align)

#negative edges length changed to 0!
fitGTR <- update(fit, k=4, inv=0.2)
fitGTR <- optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE,
rearrangement = "stochastic", control = pml.control(trace = 0))

  #Tidying Up Sample Data
    samples.out <- data.frame(Sample=rownames(seqtab.nochim))
  
  #Import sample dataset
  library(readr)
  Sample_data <- read_csv("/Users/rachaelantwis/Dropbox/Research/Current/Papers/Ponies/Pony_sample_data.csv")
  View(Sample_data)
  
    sample.info<-Sample_data
    head(sample.info)
    
  #Samples as factor
    sample.info$Sample<-as.factor(sample.info$Sample)

  #Merge the Sample ID data from dada2 with the metadata  
    sample.merge<-merge(sample.info,samples.out,by="Sample",all.y=TRUE)
    nrow(sample.merge)
    #tail(sample.merge)
    
  #Rownames of sample data are sample IDs
    rownames(sample.merge)<-sample.merge$Sample
    
  #Add an indicator variable for whether a sample is a sample or a negative
    sample.merge$sampletype<-"sample"
    sample.merge$sampletype[grep("BLANK",sample.merge$Sample)]<-"negative"
  
  #Export to Phyloseq
    library(phyloseq)

ps <- phyloseq(tax_table(tt.plus), sample_data(sample.merge), otu_table(seqtab.nochim, taxa_are_rows = FALSE), phy_tree(fitGTR$tree))

    taxa_names(ps) <- paste("SV",seq(nrow(tt.plus)))`

ndanckert · 2018-02-27T21:54:36Z

Hi,

I am also having trouble with this section of the DADA2 script and am relatively new at this work. I am processing my 16s sequences through the DICIPHER / Phangorn pipeline from your published workflow.

#My script runs as follows:
Seqs <- getSequences(Seqtab)
names(Seqs) <- Seqs
alignment <- AlignSeqs(DNAStringSet(Seqs), anchor=NA)

phang.align <- phyDat(as(alignment, "matrix"), type="DNA")
dm <- dist.ml(phang.align)
tree.NJ <- NJ(dm)
fit = pml(tree.NJ, data=phang.align)

fitGTR <- update(fit, k=4, inv=0.2)
fitGTR <-optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE, rearrangement = "stochastic", control = pml.control(trace = 0)) 
detach("package:phangorn", unload=TRUE)

After the 'fitGTR <-optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE, rearrangement = "stochastic", control = pml.control(trace = 0))' command I receive the following error:

"Error in if (((ll1 - ll)/ll < control$eps) && rounds > 2) opti <- FALSE : 
  missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50) 
#The error messages are:
1: In optimize(f = fn, interval = c(0.1, 500), lower = 0.1,  ... :
  NA/Inf replaced by maximum positive value"

Could someone please point me in the right direction to solve this.

Much appreciated.
Nathan

benjjneb · 2018-02-28T03:02:14Z

Hm....

There were no errors or warnings in the alignment or NJ tree code before this?

I'm not sure on this one, this is an error in phangorn code so I'm not an expert here, but I would take a look if you can share the fitGTR object causing the problem, i.e. saveRDS(fitGTR, "fitGTR.rds") right before the error-causing command, and post that here.

ndanckert · 2018-02-28T03:13:34Z

Thanks Benjamin. The only warnings I am getting are as follows "Warning messages: 1: package ‘phangorn’ was built under R version 3.3.3 2: package ‘ape’ was built under R version 3.3.3 Please find the file attached. Thanks, Nathan

…

On Wed, Feb 28, 2018 at 2:02 PM, Benjamin Callahan ***@***.*** > wrote: Hm.... There were no errors or warnings in the alignment or NJ tree code before this? I'm not sure on this one, this is an error in phangorn code so I'm not an expert here, but I would take a look if you can share the fitGTR object causing the problem, i.e. saveRDS(fitGTR, "fitGTR.rds") right before the error-causing command, and post that here. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#88 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AjMVCdf9gXIWm3TaopzRChhF3v1y1E3bks5tZMG4gaJpZM4JA42u> .

benjjneb · 2018-02-28T03:29:20Z

@ndanckert

I don't see it attached, might be too big. You can email me:

benjamin DOT j DOT callahan AT gmail DOT com

benjjneb · 2018-03-03T13:41:44Z

@ndanckert

I was able to complete the command that is giving you trouble on my machine. That suggests this might be something solved by updating your libraries or R version. See my setup below. Are you using older versions of phangorn, ape or R?

sessionInfo()
R version 3.4.3 (2017-11-30)

Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] phangorn_2.3.1 ape_5.0

loaded via a namespace (and not attached):
[1] compiler_3.4.3 Matrix_1.2-12 magrittr_1.5 parallel_3.4.3 tools_3.4.3 igraph_1.1.2 yaml_2.1.15
[8] fastmatch_1.1-0 Rcpp_0.12.15 nlme_3.1-131 grid_3.4.3 pkgconfig_2.0.1 lattice_0.20-35 quadprog_1.5-5

ndanckert · 2018-03-06T07:10:59Z

Thanks Ben, I've double checked my computer at uni and you're correct. The error was due to the software not being up to date. Thank you for the help.

…

On Sun, Mar 4, 2018 at 12:41 AM, Benjamin Callahan ***@***.*** > wrote: @ndanckert <https://github.com/ndanckert> I was able to complete the command that is giving you trouble on my machine. That suggests this might be something solved by updating your libraries or R version. See my setup below. Are you using older versions of phangorn, ape or R? sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.6 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/ A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/ Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] phangorn_2.3.1 ape_5.0 loaded via a namespace (and not attached): [1] compiler_3.4.3 Matrix_1.2-12 magrittr_1.5 parallel_3.4.3 tools_3.4.3 igraph_1.1.2 yaml_2.1.15 [8] fastmatch_1.1-0 Rcpp_0.12.15 nlme_3.1-131 grid_3.4.3 pkgconfig_2.0.1 lattice_0.20-35 quadprog_1.5-5 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#88 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AjMVCaa_-S64jN_Xt8gn2AZbXoR_8ki8ks5tap2agaJpZM4JA42u> .

akalichen · 2018-04-16T23:40:23Z

Hello, I have also run into a similar problem with this revised workflow "http://web.stanford.edu/class/bios221/MicrobiomeWorkflowII.html".
when I tried to execute this commandline.

phangAlign <- phyDat(as(alignment, "matrix"), type="DNA")
dm <- dist.ml(phangAlign)
treeNJ <- NJ(dm) # Note, tip order != sequence order

fit = pml(treeNJ, data=phangAlign)
fitGTR <- update(fit, k=4, inv=0.2)
fitGTR <- optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE,
        rearrangement = "stochastic", control = pml.control(trace = 0))
detach("package:phangorn", unload=TRUE)

I get error message like this.

"NA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive value"

akalichen · 2018-04-16T23:43:39Z

my Rstudio and my packages were uptodate.

benjjneb · 2018-04-17T13:06:51Z

@akalichen Can you repost this issue at the GitHub issues page for the workflow itself? https://github.com/spholmes/F1000_workflow/issues

This error isn't really related to the dada2 package.

akalichen · 2018-04-17T15:24:42Z

Thanks benjjneb! I will do that right now.

zhilongjia · 2018-08-04T13:06:52Z

@jarrodscott how about the performance and speed of RaxML and RaxML-ng? Thank you.

ricmedveterinario · 2018-09-24T05:42:14Z

Hello,

I have the workflow after dada2:

#multiple-alignment
library("DECIPHER", lib.loc="~/R/win-library/3.4")
setwd ("C:/Users/AVELL/Desktop/2_fase")

seqs <- read.table(file = "seqs_2 _teste.txt")
seqs <- as.matrix(seqs)
names(seqs) <- seqs
alignment <- AlignSeqs(DNAStringSet(seqs), anchor=NA)

#Construct the phylogenetic tree
library("phangorn")

phang.align <- phyDat(as(alignment, "matrix"), type="DNA")
dm <- dist.ml(phang.align)
treeNJ <- NJ(dm)
treeNJ# Note, tip order != sequence order
fit = pml(treeNJ, data=phang.align)

But I have problems with the following error in FIT:

fit = pml(treeNJ, data=phang.align)

negative edges length changed to 0!
Error in pml(treeNJ, data = phang.align) : tip labels are not in data

Could someone please point me in the right direction to solve this.

Much appreciated.
Richard

benjjneb · 2018-09-25T01:24:57Z

seqs <- read.table(file = "seqs_2 _teste.txt")
seqs <- as.matrix(seqs)
names(seqs) <- seqs

This workflow is expecting seqs to be a character vector of the DNA sequences, not a matrix. You have read in a matrix it seems? You need to get seqs into the right format, and then it should work as expected.

…, see benjjneb/dada2#88 for details

Aircus · 2019-05-24T12:15:58Z

Thanks Ben, I've double checked my computer at uni and you're correct. The error was due to the software not being up to date. Thank you for the help.
…
On Sun, Mar 4, 2018 at 12:41 AM, Benjamin Callahan @.*** > wrote: @ndanckert https://github.com/ndanckert I was able to complete the command that is giving you trouble on my machine. That suggests this might be something solved by updating your libraries or R version. See my setup below. Are you using older versions of phangorn, ape or R? sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.6 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/ A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/ Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] phangorn_2.3.1 ape_5.0 loaded via a namespace (and not attached): [1] compiler_3.4.3 Matrix_1.2-12 magrittr_1.5 parallel_3.4.3 tools_3.4.3 igraph_1.1.2 yaml_2.1.15 [8] fastmatch_1.1-0 Rcpp_0.12.15 nlme_3.1-131 grid_3.4.3 pkgconfig_2.0.1 lattice_0.20-35 quadprog_1.5-5 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#88 (comment)>, or mute the thread https://github.com/notifications/unsubscribe-auth/AjMVCaa_-S64jN_Xt8gn2AZbXoR_8ki8ks5tap2agaJpZM4JA42u .

hi!
How didnt you solve this problem? I have the same error now

Best regards!
Tong

khuch123 · 2019-09-06T16:30:00Z

@giriarteS
The workflow you mentioned produced these errors.
Any help please?

ChrisTrivedi · 2020-02-17T13:21:49Z

If you need longer taxon names you can adapt the constant #define nmlngth 256 in file axml.h appropriately. Check the lenght of your SV ID (sequence) and change that parameter in the axml.h file. I just gave a short name to each single variant (SV1.....SVn), I work with ITS and my sequence lengths are really variable. Below I describe my workflow:
seqtab.nochim1 <- seqtab.nochim
colnames(seqtab.nochim1) <- paste0("SV", 1:ncol(seqtab.nochim1))
Then added taxonomy to the original seqtab.nochim
taxtab <- assignTaxonomy(seqtab.nochim, ref_fasta_UNITE, minBoot = 50, tryRC = TRUE,
outputBootstraps = FALSE, taxLevels = c("Kingdom", "Phylum", "Class",
"Order", "Family", "Genus", "Species"), multithread = TRUE,
verbose = TRUE)
and change the names again (the order was the same as in seqtab.nochim)
taxtab1 <- taxtab
rownames(taxtab1) <- paste0("SV", 1:nrow(taxtab1))

The SV table and SV sequences were exported also:
export_taxa_table_and_seqs = function(seqtab.nochim, file_seqtab, file_seqs) {
seqtab.t = as.data.frame(t(seqtab.nochim))
seqs = row.names(seqtab.t)
row.names(seqtab.t) = paste0("SV", 1:nrow(seqtab.t))
outlist = list(data_loaded = seqtab.t)
mctoolsr::export_taxa_table(outlist, file_seqtab)
seqs = as.list(seqs)
seqinr::write.fasta(seqs, row.names(seqtab.t), file_seqs)
}
export_taxa_table_and_seqs(seqtab.nochim,"/path_to your_file/SV_table.txt",
"/path_to your_file/SV_seqs.fa")

Finally you can load your SV fasta file and align the sequences with DECIPHER:
fas <- "SV_seqs.fa"
seqs <- readDNAStringSet(fas)
seqs <- OrientNucleotides(seqs) ## I found duplicated SVs after I reoriented the sequences
alignment <- AlignSeqs(seqs, anchor=NA)

I exported my alignment to a server to run RAXML, it took 9 days for 5000 sequences.
phang.align <- phyDat(as(alignment, "matrix"), type="DNA")
write.phyDat(phang.align, file="alignment.fasta", format="fasta")

TS3alignment <- read.dna("alignment.fasta",format="fasta",as.matrix=TRUE)
alignment.rax.gtr <- raxml(myalignment,
m="GTRGAMMAIX", # model
f="a", # best tree and bootstrap
p=1234, # random number seed
x=2345, # random seed for rapid bootstrapping
N=100, # number of bootstrap replicates
file="alignment", # name of output files
exec="raxmlHPC-PTHREADS-SSE3", # name of executable
threads=20
)

Then you incorporate everything to your phyloseq object:
tree <- read_tree("RAxML_bipartitionsBranchLabels.alignment")
metadata_path <- "Metadata.csv"
samdf <- read.csv(metadata_path, header=TRUE)
all(rownames(seqtab.nochim1) %in% samdf$SampleID)
rownames(samdf) <- samdf$SampleID

ps <- phyloseq(tax_table(taxtab1), sample_data(samdf),otu_table(seqtab.nochim1, taxa_are_rows = FALSE), phy_tree(tree))
saveRDS(ps, file="ps.rds")

This workflow works well, if any of the steps seen redundant or if anybody has suggestions on how to improve it, please let me know.

Hi all, apologies for posting on a closed topic, but I'm having a few issues with incorporating @giriarteS 's workflow. I'm using it for 16S/18S/ITS which have >5000 seqs to align. The one thing I don't want to do (especially for 16/18S) is rename my sequences to SV1...

I tried to skip those portions of the workflow by using the normal seqtabl.nochim and assign taxonomy instructions by B. Callahan - then going directly into the DECIPHER alignment:

tree.sequences <- getSequences(seqtab.nochim)
names(tree.sequences) <- tree.sequences  # this propagates to the tip labels of the tree
tree.alignment <- AlignSeqs(DNAStringSet(tree.sequences), anchor=NA)
phang_align <- phyDat(as(tree.alignment, 'matrix'), type='DNA')
write.phyDat(phang_align, file="alignment.fasta", format="fasta")
16S_alignment <- read.dna("alignment2.fasta",format="fasta",as.matrix=TRUE)

but I get an error from the Ape package (I'm assuming), which says Error: unexpected symbol in "16S_alignment" which I can't seem to track.

Additionally, I tried to skip phangorn altogether by exporting the alignment from DECIPHER and reimporting via Ape:

tree.sequences <- getSequences(seqtab.nochim)
names(tree.sequences) <- tree.sequences  # this propagates to the tip labels of the tree
tree.alignment <- AlignSeqs(DNAStringSet(tree.sequences), anchor=NA)
writeXStringSet(tree.alignment, file="alignment2.fasta")
16S_alignment <- read.dna("alignment2.fasta",format="fasta",as.matrix=TRUE)

But I was also unsuccessful. Any help would be appreciated. Not quite sure what I'm doing wrong, and maybe this is a better question for a DECIPHER/Ape thread.

ZuaniaColon · 2023-02-08T15:47:57Z

Hi, is there a way to export and read the tree "alignment.rax.gtr"? Because of time constrains, I would like to run two different codes. I saw that you read it with

tree <- read_tree("RAxML_bipartitionsBranchLabels.alignment"),

but I cannot follow when and how you saved it.

Thanks,

benjjneb closed this as completed Nov 3, 2016

benjjneb mentioned this issue May 18, 2017

do we expect step of constructing phen.tree take too long? spholmes/F1000_workflow#17

Open

spholmes mentioned this issue Jun 26, 2017

Problem with the F1000 workflow #272

Closed

benjjneb mentioned this issue Sep 21, 2018

Construct phylogenetic tree error #567

Closed

cwshin mentioned this issue Jan 8, 2019

Prevalence filtering issue spholmes/F1000_workflow#29

Open

benjjneb mentioned this issue Jan 23, 2019

Assign Taxonomy using DECIPHER #658

Closed

mattjmeier added a commit to mattjmeier/ECCC that referenced this issue Feb 7, 2019

Updated DADA2 processing script, switching to raxml for tree building…

852a66a

…, see benjjneb/dada2#88 for details

benjjneb mentioned this issue Sep 5, 2019

removeBimeraDenovo: System Crashes #833

Closed

csmiguel mentioned this issue May 25, 2021

recommended software/parameters for tree reconstruction #1348

Closed

rjhuang5 mentioned this issue Sep 14, 2021

Construct phylogenetic tree - max number sequences? #1410

Closed

phylogenetic tree construction in DADA2? #88

phylogenetic tree construction in DADA2? #88

Comments

FabianRoger commented Jun 29, 2016

benjjneb commented Jun 29, 2016

FabianRoger commented Jul 1, 2016

spholmes commented Jul 1, 2016

benjjneb commented Nov 3, 2016

giriarteS commented Mar 9, 2017

benjjneb commented Mar 9, 2017

joey711 commented Mar 9, 2017

giriarteS commented Mar 10, 2017

josemseoane commented Mar 29, 2017

giriarteS commented Mar 29, 2017

josemseoane commented Mar 30, 2017

jarrodscott commented Jun 10, 2017

benjjneb commented Jun 12, 2017

jarrodscott commented Jun 12, 2017 via email

rachaelantwis commented Feb 16, 2018

benjjneb commented Feb 16, 2018

rachaelantwis commented Feb 22, 2018 • edited Loading

benjjneb commented Feb 22, 2018 • edited Loading

rachaelantwis commented Feb 22, 2018 • edited Loading

ndanckert commented Feb 27, 2018 • edited Loading

benjjneb commented Feb 28, 2018

ndanckert commented Feb 28, 2018 via email

benjjneb commented Feb 28, 2018

benjjneb commented Mar 3, 2018

ndanckert commented Mar 6, 2018 via email

akalichen commented Apr 16, 2018

akalichen commented Apr 16, 2018

benjjneb commented Apr 17, 2018

akalichen commented Apr 17, 2018

zhilongjia commented Aug 4, 2018

ricmedveterinario commented Sep 24, 2018

benjjneb commented Sep 25, 2018

Aircus commented May 24, 2019

khuch123 commented Sep 6, 2019

ChrisTrivedi commented Feb 17, 2020

ZuaniaColon commented Feb 8, 2023

rachaelantwis commented Feb 22, 2018 •

edited

Loading

benjjneb commented Feb 22, 2018 •

edited

Loading

rachaelantwis commented Feb 22, 2018 •

edited

Loading

ndanckert commented Feb 27, 2018 •

edited

Loading