Specify reference taxonomy files (e.g., %OTU ID) used for annotation of expected-taxonomy.tsv files #22

Closed
nbokulich opened this issue May 19, 2016 · 0 comments

Expected composition (expected-taxonomy.tsv) files must match not only the database and version, but also the exact reference taxonomy file used for taxonomy assignment of the observed data. In other words, if 97% OTUs are used for taxonomy assignment, a 97% OTU expected taxonomy file must be generated (that's what we have now); if 99% OTUs, a 99% OTU expected taxonomy file, and so on.

Perhaps we should include this information somewhere. Any ideas how/where to do this? Perhaps changing the directory structure to:
database-name/version/OTU%
or
database-name/version-OTU%

Specifying this in the directory name has two issues: 1) the name can be ambiguous (e.g., "97" is not very specific), and 2) OTU %ID may not be the only difference between file types (e.g., if using a curated subset of reference seqs) and is specific to marker-gene ref dbs, e.g., it does not apply to metagenome ref dbs. We will need to be very descriptive in naming (e.g., "97-otus" instead of "97"), or perhaps add a README file to each directory? READMEs could get cumbersome.
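
To make the two layouts concrete, here is a rough sketch of how the path could be built with a descriptive label. The function name `expected_taxonomy_dir`, its parameters, and the placeholder values `mydb` and `v1` are all hypothetical and not part of the repository; rejecting bare numbers is just one possible way to enforce "97-otus" over "97".

```python
from pathlib import PurePosixPath


def expected_taxonomy_dir(database, version, ref_label, nested=True):
    """Build the directory that would hold expected-taxonomy.tsv.

    nested=True  -> database-name/version/ref_label
    nested=False -> database-name/version-ref_label

    All names here are illustrative assumptions, not existing code.
    """
    # A bare number such as "97" is ambiguous, so ask for a descriptive label.
    if ref_label.isdigit():
        raise ValueError(
            f"ambiguous reference label {ref_label!r}; use e.g. '97-otus'"
        )
    if nested:
        return PurePosixPath(database) / version / ref_label
    return PurePosixPath(database) / f"{version}-{ref_label}"


# Hypothetical database and version names, for illustration only.
print(expected_taxonomy_dir("mydb", "v1", "97-otus"))
print(expected_taxonomy_dir("mydb", "v1", "97-otus", nested=False))
```

A free-form label would also cover the non-OTU cases above (e.g., a curated subset of reference seqs), since nothing in the sketch assumes the label is a percent identity.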
