change otu.table() sequence header to numbered OTUs with taxonomy #1030

Closed
RJ333 opened this issue Oct 31, 2018 · 4 comments

Comments

RJ333 commented Oct 31, 2018

Hello everyone,

I recently got into dada2 and phyloseq using this workflow: https://f1000research.com/articles/5-1492/v2
I followed it until all the prepared data was combined into a phyloseq object. I posted an example using data from that article on Stack Overflow:

https://stackoverflow.com/questions/53032504/combine-otu-and-tax-table-and-replace-actual-sequences-with-otu-ids-phyloseq-da

Now I'm wondering how I can get an otu_table(ps) that does not have the actual sequences as headers. I would like to combine the information from the taxonomy table and the OTU table, but so far no one could help, and I thought this might be a more appropriate place to ask. I am trying to create a format like this:

        full_taxonomy_OTU00001  full_taxonomy_OTU00002  full_taxonomy_OTU00003
F3D0    counts                  counts                  counts
F3D1    counts                  counts                  counts
F3D11   counts                  counts                  counts
F3D125  counts                  counts                  counts

I assume I'm just not finding an obvious function for this? I've looked around quite a bit but couldn't find something specific that was helpful.

Thank you very much

René

jjrahn commented Nov 1, 2018

I am having the same problem. The OTUs are labeled with the full sequence, which makes any plot that uses the OTU names impossible to display.

Contributor

spholmes commented Nov 2, 2018 via email

Author

RJ333 commented Nov 6, 2018

Thank you for your comment, and apologies for the late reply. I cross-linked this post to my question on stackoverflow.com, but I'm still unable to add the taxonomy to the header (which is now Seq_001, Seq_002, etc.). I'll try to find a way to do that.

René

Author

RJ333 commented Nov 6, 2018

I think I found a (still dirty-looking) way to achieve my goal. I will show the code here; more is explained at

https://stackoverflow.com/questions/53032504/combine-otu-and-tax-table-and-replace-actual-sequences-with-otu-ids-phyloseq-da

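# this changes the header from the actual sequence to Seq_001, Seq_002 etc
taxa_names(ps)                        # before: the full sequences
n_seqs <- seq_len(ntaxa(ps))          # seq_len() is also safe for zero taxa
len_n_seqs <- nchar(max(n_seqs))      # number of digits needed for zero-padding
taxa_names(ps) <- paste("Seq", formatC(n_seqs,
                                       width = len_n_seqs,
                                       flag = "0"), sep = "_")
taxa_names(ps)                        # after: Seq_001, Seq_002, ...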
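
As a side note, renaming this way drops the sequences from the names, so it may be worth keeping a lookup table. The following is only a base-R sketch of that idea, not a phyloseq feature: `seqs` is made-up toy data standing in for the original `taxa_names(ps)`, and `seq_lookup` is a hypothetical name.

```r
# Toy stand-in for the original taxa_names(ps): 12 made-up sequences
seqs <- paste0("ACGTACGT",
               rep(c("A", "C", "G", "T"), times = 3),
               rep(c("AA", "CC", "GG"), each = 4))

# Zero-padded IDs, built the same way as above
width <- nchar(length(seqs))
ids <- paste("Seq", formatC(seq_along(seqs), width = width, flag = "0"),
             sep = "_")

# Lookup table so the sequence behind each ID can be recovered later
seq_lookup <- data.frame(id = ids, sequence = seqs, stringsAsFactors = FALSE)
print(head(seq_lookup, 3))
```

With the real data, `seqs` would be captured from `taxa_names(ps)` before the names are overwritten.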

A possible way to get the taxonomy included in the header is the following (continuing from above):

# generate a vector containing the full taxonomy path for all OTUs
wholetax <- do.call(paste, c(as.data.frame(tax_table(ps))
                  [c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")], 
                  sep = "__"))  # to distinguish from "_" within tax ranks

# turn the otu_table into a data.frame (taxa are the columns here)
otu_export <- as.data.frame(otu_table(ps))

# paste wholetax and OTU_ids together and overwrite the old names
# (paste() is vectorized, so no loop is needed)
names(otu_export) <- paste(wholetax, names(otu_export), sep = "__")

> head(otu_export)[5]

# output:  
     Bacteria__Bacteroidetes__Bacteroidia__Bacteroidales__Bacteroidaceae__Bacteroides__Seq_005
F3D0                                                                                         146
F3D1                                                                                         126
F3D11                                                                                        496
F3D125                                                                                       440
F3D13                                                                                       1183
F3D141                                                                                       184

Maybe it is possible to include a function that performs the above in a clean way within the phyloseq structure? It might also be possible to provide the wholetax vector to the taxa_names() command, but I haven't tested that, and I'm not sure whether this is desirable for further downstream analysis.
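
To make the request concrete, here is a rough sketch of what such a helper could look like. It is written in base R on toy matrices, and `label_otu_columns` is a hypothetical name, not an existing phyloseq function. It assumes the counts have taxa as columns and that the matrices could be obtained with something like `as(otu_table(ps), "matrix")` and `as(tax_table(ps), "matrix")`. The numbers and taxonomy below are made up for illustration.

```r
# Hypothetical helper, not part of phyloseq: prefix each OTU column name
# with its taxonomy path, matching taxa by name rather than by position.
label_otu_columns <- function(counts, tax, sep = "__") {
  # counts: samples x taxa matrix; tax: taxa x ranks character matrix
  stopifnot(all(colnames(counts) %in% rownames(tax)))
  tax <- tax[colnames(counts), , drop = FALSE]  # align taxonomy to count columns
  tax[is.na(tax)] <- "unclassified"             # avoid literal "NA" in the labels
  path <- apply(tax, 1, paste, collapse = sep)  # one taxonomy string per taxon
  out <- as.data.frame(counts)
  names(out) <- paste(path, colnames(counts), sep = sep)
  out
}

# Made-up toy data: 2 samples x 3 taxa, taxonomy rows deliberately shuffled
counts <- matrix(c(12L, 7L, 0L, 3L, 5L, 1L), nrow = 2,
                 dimnames = list(c("F3D0", "F3D1"),
                                 c("Seq_1", "Seq_2", "Seq_3")))
tax <- matrix(c("Bacteria", "Firmicutes",    NA,
                "Bacteria", "Bacteroidetes", "Bacteroides",
                "Bacteria", "Bacteroidetes", "Bacteroides"),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("Seq_3", "Seq_1", "Seq_2"),
                              c("Kingdom", "Phylum", "Genus")))

otu_export <- label_otu_columns(counts, tax)
print(names(otu_export))
```

In this toy data Seq_1 and Seq_2 share the same taxonomy path, so the path alone would not be a unique label. Keeping the Seq ID as a suffix is what keeps the column names distinct, which is one reason to be careful about passing the bare wholetax vector to taxa_names().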
