added relevant COI mock community info and rep seqs #92

Open: wants to merge 12 commits into base: master

devonorourke

Hi mockrobiota folk,
I've added the fasta file for the mock COI dataset I've used in a few bat guano related projects. Though I don't have a publication to link these data to at the moment, @nbokulich is on the forthcoming paper that describes their use. Reads are dumped as BioSamples via NCBI and I've provided a link in the README.md file for users to access.
Please let me know what other information you'd like me to add.
Cheers

@nbokulich
Contributor

Thanks @devonorourke !

It looks like the tests failed; could you please fix those so that I can review once the tests pass? The error suggests that the header line of the dataset metadata file is space-delimited, not tab-delimited.

@devonorourke
Author

Sorry; I fixed the dataset-metadata.tsv file so that it was tab-delimited.
Should be okay now
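
For anyone hitting the same test failure, a quick local check along these lines can confirm that a metadata header is tab-delimited before pushing. This is only a sketch, not the repository's actual test: the function name `header_is_tab_delimited` and the example column names are made up for illustration.

```python
def header_is_tab_delimited(text, min_columns=2):
    """Return True if the first line of `text` splits into at least
    `min_columns` non-empty, tab-separated fields.

    Hypothetical helper for illustration; the real mockrobiota tests
    may check the file differently.
    """
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    fields = first_line.split("\t")
    # A space-delimited header yields a single field here, so it fails.
    return len(fields) >= min_columns and all(f.strip() for f in fields)


if __name__ == "__main__":
    # Column names below are placeholders, not the required header.
    print(header_is_tab_delimited("key\tvalue\nfoo\tbar\n"))
    print(header_is_tab_delimited("key value\nfoo bar\n"))
```

To check a real file, read it first, e.g. `header_is_tab_delimited(open("dataset-metadata.tsv").read())`.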

Contributor

@nbokulich left a comment

Looks good @devonorourke ! Just a couple minor comments and a request.

In addition to the source formats (which can be provided as-is), would it be possible to provide expected taxonomy files?

  1. See here for an example file
  2. The directory structure should be .../mock-29/<database-name>/<database-version-or-download-date-MMDDYYYY>/<OTU-cluster-percent>/
  3. The taxonomy file will contain taxon names (as row names) that match valid taxa in the reference database/version/OTU% that you used. Ideally these should be formatted for use with QIIME 2 (e.g., semicolon-delimited).
  4. The "database identifier" file is a list of reference database identifiers that match the expected taxon names
  5. If you base this off of a custom database, just make sure the database is available on github, zenodo, or elsewhere (I think this is what you are already doing with your databases, correct?), and make sure it is all well documented (e.g., you can link to a github repo with code describing how the database was made)
  6. Note, a long time ago I put together some shoddy untested code for automatically generating the expected taxonomy files. Specifically, you want this.
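
The directory layout and taxon formatting described in the list above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the generator code mentioned in item 6: the helper names, the database name `custom-coi-db`, the date, and the example ranks are all hypothetical.

```python
from datetime import date
from pathlib import Path


def expected_taxonomy_dir(mock_dir, database_name, version_or_date, otu_percent):
    """Build <mock_dir>/<database-name>/<version-or-date>/<OTU-percent>.

    If a `date` is given, it is rendered as MMDDYYYY, matching the
    download-date convention in the directory structure.
    """
    if isinstance(version_or_date, date):
        version_or_date = version_or_date.strftime("%m%d%Y")
    return Path(mock_dir) / database_name / str(version_or_date) / str(otu_percent)


def format_taxon(ranks):
    """Join taxonomic ranks into one semicolon-delimited string."""
    return ";".join(rank.strip() for rank in ranks)


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    target = expected_taxonomy_dir("mock-29", "custom-coi-db", date(2019, 3, 7), 99)
    print(target.as_posix())
    print(format_taxon(["k__Animalia", "p__Arthropoda", "c__Insecta"]))
```

The rank prefixes and the exact delimiter (with or without a trailing space) should follow whatever the chosen reference database uses, since the taxon names have to match it exactly.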

The expected taxonomy files are not required at submission, so if this is too much to ask right now that is fine.

Thanks!

@@ -0,0 +1,12 @@
# mock-coi1
Contributor

let's call this mock-29 (to keep consistent)

Note:
The mock sample described above was sequenced in conjunction with hundreds of bat guano samples in a single MiSeq run. All data are available as BioSamples [here at NCBI](https://www.ncbi.nlm.nih.gov/bioproject/518082). Individual sequence data specific to the mock sample are found in the `dataset-metadata.tsv` document.

These reads contain dual-index barcodes modeled after the Schloss lab [workflow described here](https://github.com/SchlossLab/MiSeq_WetLab_SOP/blob/master/MiSeq_WetLab_SOP.md). Reads were processed in QIIME2 as described in [this GitHub repo](https://github.com/devonorourke/tidybug/blob/master/docs/sequence_filtering.md#raw-sequence-data-processing).
Contributor

it may be useful to provide a snippet of code showing how to import these reads into QIIME 2 (note that dual-index barcode support is now available in QIIME 2!)
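
One possible shape for such a snippet is sketched below. It assumes the mock sample's reads have already been downloaded from NCBI as demultiplexed, paired-end FASTQ files, which may not match how the data are actually distributed. It writes a manifest in QIIME 2's `PairedEndFastqManifestPhred33V2` layout; the sample ID and file names are placeholders. Reads that are still multiplexed with dual-index barcodes would need a different import route, which is not shown here.

```python
import csv
from pathlib import Path

# Column names used by the PairedEndFastqManifestPhred33V2 format.
MANIFEST_HEADER = [
    "sample-id",
    "forward-absolute-filepath",
    "reverse-absolute-filepath",
]


def write_paired_manifest(samples, manifest_path):
    """Write a tab-separated QIIME 2 manifest.

    `samples` maps a sample ID to a (forward, reverse) pair of FASTQ
    paths; relative paths are converted to absolute paths.
    """
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(MANIFEST_HEADER)
        for sample_id, (fwd, rev) in sorted(samples.items()):
            writer.writerow(
                [sample_id, str(Path(fwd).resolve()), str(Path(rev).resolve())]
            )
    return manifest_path


if __name__ == "__main__":
    # Placeholder sample ID and file names, for illustration only.
    write_paired_manifest(
        {"mock-sample": ("mock_R1.fastq.gz", "mock_R2.fastq.gz")},
        "manifest.tsv",
    )
    # The manifest can then be imported from the command line with:
    #   qiime tools import \
    #     --type 'SampleData[PairedEndSequencesWithQuality]' \
    #     --input-path manifest.tsv \
    #     --output-path demux.qza \
    #     --input-format PairedEndFastqManifestPhred33V2
```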

2 participants