phylogenetic tree construction in DADA2? #88
Comments
Hi Fabian, There is no phylogeny construction in the dada2 package, and none is planned. There are tools for making phylogenetic trees already in R though, and these can be used in a fairly straightforward fashion. For an example of this, see our recent workflow paper: http://f1000research.com/articles/5-1492/v1 (Construct the phylogenetic tree section). For more details on tree building, the phangorn package is the relevant R tree-building package. It is worth noting that the FastTree algorithm is not, to my knowledge, available in R. For building trees on large datasets (thousands of sequences or more) it may be necessary to use an external application. |
Thanks a lot, your workflow paper is super helpful! It is worth noting though that the function used for alignment [...] However, of course I know that time is not the only criterion, but for an amplicon workflow it is maybe worth looking into? Thanks for all your help! Fabian |
Thanks a lot for the pointer, when we do another revision for the workflow
Susan Holmes |
The revised version of the F1000 workflow (https://f1000research.com/articles/5-1492/v2) now uses DECIPHER for the multiple alignment (thanks for suggestion @FabianRoger), and phangorn for phylogenetic construction. We remain interested in other options within the R universe for phylogenetic construction, but still do not plan to implement it within the dada2 package. |
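For readers who want the DECIPHER plus phangorn route in one place, here is a rough sketch of the steps. The `phyDat`, `update` and `optim.pml` calls appear verbatim later in this thread; the alignment, distance and NJ steps are filled in from the workflow as I remember it, so treat them as assumptions and check them against the paper. The function name `build_gtr_tree` is made up for illustration, and nothing runs until you call it with DECIPHER and phangorn installed.

```r
# Sketch only: wraps the alignment and tree-building steps in one function.
# `seqs` is assumed to be a character vector of amplicon sequences.
build_gtr_tree <- function(seqs) {
  # Loaded inside the function so the definition itself needs no packages.
  library(DECIPHER)
  library(phangorn)

  names(seqs) <- seqs  # tip labels become the sequences themselves
  alignment <- AlignSeqs(DNAStringSet(seqs), anchor = NA)

  # Convert the alignment to phangorn's format and build an NJ starting tree.
  phang.align <- phyDat(as(alignment, "matrix"), type = "DNA")
  dm <- dist.ml(phang.align)
  treeNJ <- NJ(dm)

  # Fit a GTR+G+I maximum likelihood tree starting from the NJ tree.
  fit <- pml(treeNJ, data = phang.align)
  fitGTR <- update(fit, k = 4, inv = 0.2)
  optim.pml(fitGTR, model = "GTR", optInv = TRUE, optGamma = TRUE,
            rearrangement = "stochastic", control = pml.control(trace = 0))
}
```

Usage would look something like `fitGTR <- build_gtr_tree(colnames(seqtab.nochim))`, after which `fitGTR$tree` is the tree passed to `phy_tree()`. As noted further down the thread, this can be too slow for thousands of sequences.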
I used DECIPHER to align ITS sequences (5542 sequences) and then tried to construct the tree with phangorn but it was too slow. I finally constructed a tree with raxml: |
We should consider revisiting our recommendations here. It is definitely not uncommon for datasets to be too large for phangorn to work in reasonable amounts of time. |
I agree. I usually assume with modern datasets that I need to use something like |
I forgot to mention that I used the raxml function in the ips package (v 0.0.7). |
Hello, Thanks a lot for the workflow, is really great and I have learned a lot going through it! Thanks a lot! Jose
WARNING: The number of threads is currently set to 0 RAxML, will now set the number of threads automatically to 2 ! Taxon Name too long at taxon 1, adapt constant nmlngth in |
If you need longer taxon names you can adapt the constant #define nmlngth 256 in file axml.h appropriately. Check the length of your SV IDs (sequences) and change that parameter in the axml.h file. I just gave a short name to each single variant (SV1...SVn); I work with ITS and my sequence lengths are really variable.
Below I describe my workflow:
The SV table and SV sequences were also exported:
Finally you can load your SV fasta file and align the sequences with DECIPHER:
I exported my alignment to a server to run RAxML; it took 9 days for 5000 sequences.
TS3alignment <- read.dna("alignment.fasta", format="fasta", as.matrix=TRUE)
Then you incorporate everything into your phyloseq object:
ps <- phyloseq(tax_table(taxtab1), sample_data(samdf), otu_table(seqtab.nochim1, taxa_are_rows = FALSE), phy_tree(tree))
This workflow works well. If any of the steps seem redundant or if anybody has suggestions on how to improve it, please let me know. |
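The renaming and export steps described above lost their code in this thread, so here is a minimal base-R sketch of one way they could look. This is not the original poster's code: the toy `seqtab` matrix, the `sv_map` table and the output file name are all made up for illustration.

```r
# Hypothetical stand-in for a sequence table: columns are named by the full
# amplicon sequences, rows are samples.
seqtab <- matrix(c(10L, 0L, 3L, 7L, 5L, 2L), nrow = 2,
                 dimnames = list(c("sampleA", "sampleB"),
                                 c("ACGTACGTAA", "ACGTTCGTAA", "ACGGACGTAT")))

seqs <- colnames(seqtab)

# Short IDs (SV1..SVn) keep taxon names under RAxML's name-length limit.
sv_ids <- paste0("SV", seq_along(seqs))

# Keep the ID-to-sequence mapping so tree tips can be traced back later.
sv_map <- data.frame(id = sv_ids, sequence = seqs, stringsAsFactors = FALSE)

# Write a FASTA file: one ">" header line followed by the sequence, per variant.
fasta_lines <- as.vector(rbind(paste0(">", sv_ids), seqs))
fasta_path <- file.path(tempdir(), "sv_seqs.fasta")
writeLines(fasta_lines, fasta_path)

# Rename the table columns with the same IDs so they agree with the tree tips.
seqtab_sv <- seqtab
colnames(seqtab_sv) <- sv_ids
```

The point of keeping `sv_map` is that the table, the FASTA headers and eventually the tree tips all use the same IDs, and the full sequences stay recoverable.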
Thanks a lot @giriarteS, that was a smart way of solving the length issue, it worked out perfectly well! |
anyone try 'RAxML next generation' for tree construction? |
@jjscarpa I haven't tried it, but would also be interested in results from those who have! |
I am running both RAxML and RAxML-NG on the same data set now. Happy to
share results when the jobs are done and I analyze results. Curious also if
others have tried this.
|
Hi all,
#Make the phylogenetic tree
library(phangorn)
#negative edges length changed to 0!
fitGTR <- update(fit, k=4, inv=0.2)
#EXPORT TO PHYLOSEQ AND MERGE WITH SAMPLE DATA (replace "..." with path name of sample data, i.e. metadata)
And I get the following error: If I remove the tree component this isn't a problem. Does anyone know what's going wrong? |
I think if you remove the
|
Thanks - I still get the error message:
Error in validObject(.Object) : invalid class “phyloseq” object:
When I try:
This is what I get when I inspect fitGTR:
loglikelihood: -140407
unconstrained loglikelihood: -1712.417
Rate matrix:
Base frequencies:
I'm assuming taxa_names should be populated in some way? |
Oh... the problem I think is that you have renamed the taxa as "SV" etc. in I recommend leaving the sequences as the names until you get everything into the phyloseq object, then use |
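To make the mismatch concrete, here is a small base-R sketch of the kind of check one could run before calling `phyloseq()`. The helper `check_tip_labels` is hypothetical, and the toy vectors stand in for `fitGTR$tree$tip.label` and `colnames(seqtab.nochim)`; presumably the validity error above comes from these two sets of names not overlapping.

```r
# Hypothetical helper: report which names are missing on either side.
check_tip_labels <- function(tip_labels, taxa) {
  list(
    missing_from_tree  = setdiff(taxa, tip_labels),
    missing_from_table = setdiff(tip_labels, taxa),
    ok = setequal(tip_labels, taxa)
  )
}

# Stand-ins for the tree tip labels and the table's taxa names.
tip_labels <- c("ACGTACGTAA", "ACGTTCGTAA", "ACGGACGTAT")
renamed_taxa <- c("SV1", "SV2", "SV3")      # table renamed before merging
sequence_taxa <- tip_labels[c(2, 1, 3)]     # table still named by sequence

mismatch <- check_tip_labels(tip_labels, renamed_taxa)
agree <- check_tip_labels(tip_labels, sequence_taxa)

print(mismatch$ok)  # names differ, so the tree and table cannot be matched
print(agree$ok)     # same names (order does not matter here)
```

If the check fails, either build the tree and table from the same names, or rename both consistently before combining them.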
Ah thanks, this is working now! Code as follows (for the next person):
#library(phangorn) #negative edges length changed to 0!
ps <- phyloseq(tax_table(tt.plus), sample_data(sample.merge), otu_table(seqtab.nochim, taxa_are_rows = FALSE), phy_tree(fitGTR$tree))
|
Hi, I am also having trouble with this section of the DADA2 script and am relatively new at this work. I am processing my 16S sequences through the DECIPHER / phangorn pipeline from your published workflow.
After the 'fitGTR <-optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE, rearrangement = "stochastic", control = pml.control(trace = 0))' command I receive the following error:
Could someone please point me in the right direction to solve this? Much appreciated. |
Hm.... There were no errors or warnings in the alignment or NJ tree code before this? I'm not sure on this one, this is an error in phangorn code so I'm not an expert here, but I would take a look if you can share the fitGTR object causing the problem, i.e. saveRDS(fitGTR, "fitGTR.rds") right before the error-causing command, and post that here. |
Thanks Benjamin.
The only warnings I am getting are as follows:
Warning messages:
1: package ‘phangorn’ was built under R version 3.3.3
2: package ‘ape’ was built under R version 3.3.3
Please find the file attached.
Thanks,
Nathan
|
I don't see it attached, might be too big. You can email me: benjamin DOT j DOT callahan AT gmail DOT com |
I was able to complete the command that is giving you trouble on my machine. That suggests this might be something solved by updating your libraries or R version. See my setup below. Are you using older versions of phangorn, ape or R?
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] phangorn_2.3.1 ape_5.0
loaded via a namespace (and not attached):
[1] compiler_3.4.3 Matrix_1.2-12 magrittr_1.5 parallel_3.4.3 tools_3.4.3 igraph_1.1.2 yaml_2.1.15
[8] fastmatch_1.1-0 Rcpp_0.12.15 nlme_3.1-131 grid_3.4.3 pkgconfig_2.0.1 lattice_0.20-35 quadprog_1.5-5
|
Thanks Ben,
I've double checked my computer at uni and you're correct. The error was due to the software not being up to date.
Thank you for the help.
|
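For anyone hitting the same error, a quick way to compare a local setup against the versions that worked in this thread is sketched below. The reference versions (phangorn 2.3.1, ape 5.0) are the ones from the sessionInfo() output quoted here; the helpers `is_older` and `report_versions` are made up for illustration, and nothing here says these are minimum required versions, only that they are known to have worked.

```r
# Versions reported as working in this thread.
reference <- c(phangorn = "2.3.1", ape = "5.0")

# Hypothetical helper: TRUE if the installed version is older than the reference.
is_older <- function(installed, required) {
  package_version(installed) < package_version(required)
}

# Print one status line per package; packages that are not installed are
# reported rather than causing an error.
report_versions <- function(reference) {
  for (pkg in names(reference)) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      message(pkg, ": not installed")
    } else {
      installed <- as.character(utils::packageVersion(pkg))
      status <- if (is_older(installed, reference[[pkg]])) "older than" else "at least"
      message(pkg, " ", installed, " is ", status, " ", reference[[pkg]])
    }
  }
  invisible(NULL)
}

report_versions(reference)
message("R ", as.character(getRversion()))
```

If a package turns out to be older, updating it (and R itself) is the fix that worked above.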
Hello, I have also run into a similar problem with this revised workflow "http://web.stanford.edu/class/bios221/MicrobiomeWorkflowII.html".
I get an error message like this, repeated many times: "NA/Inf replaced by maximum positive value" |
My RStudio and my packages were up to date. |
@akalichen Can you repost this issue at the GitHub issues page for the workflow itself? https://github.com/spholmes/F1000_workflow/issues This error isn't really related to the dada2 package. |
Thanks benjjneb! I will do that right now. |
@jarrodscott how about the performance and speed of RaxML and RaxML-ng? Thank you. |
Hello, I have the workflow after dada2:
#multiple-alignment
seqs <- read.table(file = "seqs_2 _teste.txt")
#Construct the phylogenetic tree
phang.align <- phyDat(as(alignment, "matrix"), type="DNA")
But I have problems with the following error in FIT:
negative edges length changed to 0!
Could someone please point me in the right direction to solve this? Much appreciated. |
This workflow is expecting |
hi! Best regards! |
@giriarteS |
Hi all, apologies for posting on a closed topic, but I'm having a few issues with incorporating @giriarteS 's workflow. I'm using it for 16S/18S/ITS which have >5000 seqs to align. The one thing I don't want to do (especially for 16/18S) is rename my sequences to SV1... I tried to skip those portions of the workflow by using the normal
but I get an error from the Ape package (I'm assuming), which says Additionally, I tried to skip phangorn altogether by exporting the alignment from DECIPHER and reimporting via Ape:
But I was also unsuccessful. Any help would be appreciated. Not quite sure what I'm doing wrong, and maybe this is a better question for a DECIPHER/Ape thread. |
Hi, is there a way to export and read the tree "alignment.rax.gtr"? Because of time constraints, I would like to run two different scripts. I saw that you read it with tree <- read_tree("RAxML_bipartitionsBranchLabels.alignment"), but I cannot follow when and how you saved it. Thanks, |
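One generic option, not specific to RAxML or to the workflow above, is the `saveRDS()` approach already suggested earlier in this thread for sharing `fitGTR`: it stores any R object in a file that a second script can read back with `readRDS()`. A minimal sketch, with a plain list standing in for the tree object and a made-up file name:

```r
# Stand-in for a tree object; any R object is handled the same way.
tree_obj <- list(tip.label = c("SV1", "SV2", "SV3"),
                 note = "placeholder, not a real tree")

tree_path <- file.path(tempdir(), "tree.rds")

# End of the first script: write the object to disk.
saveRDS(tree_obj, tree_path)

# Start of the second script: read it back unchanged.
restored <- readRDS(tree_path)
```

This only covers passing an R object between two R sessions; whether the object in question holds what you need from the RAxML run is a separate question.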
hej,
I was wondering if there is (did I miss it?) - or if you plan to add - the possibility to construct a phylogeny within the DADA2 pipeline? I was thinking of something like FastTree.
Along that line I was also wondering if you had any thoughts on how the fact that the sequence resolution of DADA2 is higher than the 3% OTU threshold might interfere with the construction of a tree. I'm thinking about whether it would be a problem if two close sequences both matched the same sequence in the reference database. Should both then be assigned to the same tip and their abundances summed? Or maybe they would simply be split into two tips with distance 0, as might be done already?
best
Fabian