Processing unclassified metagenomics contigs #795

Closed
wants to merge 4 commits into from

yesimon
Contributor

@yesimon yesimon commented Feb 27, 2018

Assemble reads with Trinity.

Run the following analyses on the assembled contigs:

  • Prodigal ORF prediction -> rpsblast against the CDD database
  • Infernal against an RNA database
  • blastn
  • blastx
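
As a rough illustration of the steps listed above (not the code in this PR), the sketch below builds one command line per analysis for a set of already-assembled contigs. The function name, output file names, and all database paths are hypothetical placeholders; in particular, using an Rfam covariance-model file with Infernal's `cmscan` is an assumption, since the description only says "RNA database".

```python
from pathlib import Path


def contig_annotation_commands(contigs, outdir, cdd_db, rfam_cm, nt_db, nr_db):
    """Return the external commands for each analysis, in run order.

    All paths are caller-supplied placeholders, not values taken from the PR.
    """
    out = Path(outdir)
    proteins = out / "orfs.faa"
    return [
        # Prodigal in metagenomic mode; -a writes the translated ORFs.
        ["prodigal", "-p", "meta", "-i", str(contigs),
         "-a", str(proteins), "-o", str(out / "orfs.gbk")],
        # rpsblast searches the predicted proteins against the CDD profiles.
        ["rpsblast", "-query", str(proteins), "-db", str(cdd_db),
         "-outfmt", "6", "-out", str(out / "cdd.tsv")],
        # Infernal's cmscan searches the contigs against covariance models.
        ["cmscan", "--tblout", str(out / "rna.tbl"), str(rfam_cm), str(contigs)],
        # Nucleotide and translated searches of the contigs themselves.
        ["blastn", "-query", str(contigs), "-db", str(nt_db),
         "-outfmt", "6", "-out", str(out / "blastn.tsv")],
        ["blastx", "-query", str(contigs), "-db", str(nr_db),
         "-outfmt", "6", "-out", str(out / "blastx.tsv")],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`; the Trinity assembly step is left out here and assumed to have produced the contigs file already.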

@yesimon
Contributor Author

yesimon commented Feb 27, 2018

Let's address #795 with this.

@yesimon yesimon force-pushed the sy-cdd branch 2 times, most recently from 9188175 to 583fd84 on March 1, 2018 06:05
@dpark01
Member

dpark01 commented Mar 1, 2018

Instead of the self-referencing #795, did you mean to say #784?

@dpark01
Member

dpark01 commented Mar 1, 2018

Two comments:

  1. Does this need to be a PR right now? PRs should be entirely unnecessary for dev purposes until you're sure it's ready to go (including tests passing).
  2. Instead of pulling from NCBI FTP for nr, switch to a standard WDL Input file. We already know that production at-scale dependencies on NCBI can cause all sorts of non-deterministic failures (which is why we comment out most of the NCBI-based unit tests from Travis). Additionally, it means we lose control of database versioning if we pull straight from the source. Also, using the normal Input mechanism allows for underlying platform optimizations (such as DNAnexus's piped input file) to reduce staging time. Use a .tar.lz4 for input.
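
To make the `.tar.lz4` suggestion concrete, here is a minimal sketch of how a task might unpack such an input once the platform has staged it. It assumes the `lz4` command-line tool is on the PATH and pipes its output into Python's `tarfile` in streaming mode; the function name and the `decompress_cmd` parameter are hypothetical and not part of this repository.

```python
import subprocess
import tarfile


def extract_tar_stream(archive, dest, decompress_cmd=("lz4", "-dc")):
    """Unpack a compressed tarball by piping it through an external decompressor."""
    proc = subprocess.Popen([*decompress_cmd, str(archive)], stdout=subprocess.PIPE)
    try:
        # Mode "r|" reads the tar as a forward-only stream, so the
        # decompressed archive is never written to disk as a whole.
        with tarfile.open(fileobj=proc.stdout, mode="r|") as tar:
            tar.extractall(dest)
        # Drain any trailing padding so the decompressor can exit cleanly.
        while proc.stdout.read(1 << 16):
            pass
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"{decompress_cmd[0]} exited with status {returncode}")
```

Something like `extract_tar_stream("nr.tar.lz4", "db/")` would then populate a local database directory, where both names are placeholders.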

@yesimon
Contributor Author

yesimon commented Mar 1, 2018

Yeah, #784. Maybe the best option would be a config switch between downloading from NCBI and downloading a controlled version, with the controlled, versioned db as the default.

@dpark01
Member

dpark01 commented Mar 1, 2018

Is there any reason to prefer an uncontrolled version? On the snakemake side, the general pattern is to have a separate rule for the download. On the WDL side, maybe there could be a dedicated Task just for a one-off download-and-tar-up job... I certainly think that hitting their FTP server in parallel isn't going to bode well...
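
One way to picture the "tar-up" half of such a one-off job is sketched below: it packages an already-downloaded database directory into a tarball and returns a SHA-256 digest that could serve as a version pin. This is only an illustration under assumptions; the function name is hypothetical, the download itself is omitted, and lz4 compression would be a separate step afterwards.

```python
import hashlib
import tarfile
from pathlib import Path


def package_database(db_dir, tarball):
    """Tar up a downloaded database directory and return the tarball's SHA-256."""
    db_dir = Path(db_dir)
    with tarfile.open(tarball, "w") as tar:
        # Store members under the directory's own name, not its full path.
        tar.add(db_dir, arcname=db_dir.name)
    digest = hashlib.sha256()
    with open(tarball, "rb") as handle:
        # Hash in 1 MiB chunks so large databases do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running this once and distributing the resulting file means parallel jobs read a fixed copy rather than each contacting the FTP server.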

@tomkinsc
Member

tomkinsc commented Jun 7, 2018

@yesimon: Just wanted to check in on this PR. Do you have any work in mind before it's ready to merge (apart from correcting the DNAnexus file ID mentioned in the most recent Travis failure)?

@dpark01
Member

dpark01 commented Jun 7, 2018

Does this replace #737? (Should we close that one?)

@dpark01 dpark01 closed this Jan 9, 2019

3 participants