Integrating workflows into a single workflow #17

Open
4 of 10 tasks
mahesh-panchal opened this issue Apr 8, 2020 · 4 comments

@mahesh-panchal
Collaborator

mahesh-panchal commented Apr 8, 2020

It would be helpful if all stages of genome annotation could be run in a single workflow.

@Juke34 provided this diagram of how the single workflow might look.
[Figure: Single workflow concept]

With the introduction of Nextflow modules, each stage can be built as an independent workflow and
then imported into a single workflow.

e.g. (note: modules must be locally available).

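nextflow.enable.dsl = 2

// Import the foo component from a local module file
include { foo } from './some/module'

workflow {
    // Channel of input files passed to the imported component
    data = Channel.fromPath('/some/data/*.txt')
    foo(data)
}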
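
For illustration, the module file behind the `foo` import above could look something like the sketch below. The process body, its output file name, and the `wc -l` command are hypothetical placeholders, not taken from any of the NBIS or IKMB modules; the only point is that a module is an ordinary Nextflow script defining a process (or workflow) with declared inputs and outputs.

```nextflow
// ./some/module.nf (hypothetical contents, for illustration only)
process foo {
    input:
    path txt_file               // one file from the channel passed in by the caller

    output:
    path 'foo_out.txt'          // placeholder output name

    script:
    """
    wc -l ${txt_file} > foo_out.txt
    """
}
```

Because the inputs and outputs are declared in the module itself, the importing workflow only needs to supply a channel of matching shape, which is where interoperability between modules has to be agreed on.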

The suggested workflow proposes to include part of another workflow:
https://github.com/ikmb-denbi/genome-annotation

A collaboration has been suggested, so that we only need to include the proposed part(s) in the single workflow. This requires the ikmb-denbi workflow to be converted to DSL2 and modularized. Together, we need to ensure interoperability between modules. The NBIS annotation workflows are almost ready to be used as modules.

Tasks:

Should GeneMark training be included in the Abinitio training pipeline?

Which pipeline should the agat_sp_complement_annotations process go into?

Trinity will be added to the transcript assembly pipeline.

@mahesh-panchal
Collaborator Author

The include statement could perhaps be placed in config files rather than hard-coded in the workflow script. This would potentially allow profile-specific include statements. This needs to be tested.

An alternative would be to use a workflow parameter to provide the module path.
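
A minimal, untested sketch of the parameter-based alternative is shown below. The parameter name `module_dir` and the `./modules` directory are assumptions made up for this example, and whether an interpolated string is accepted as the include path may depend on the Nextflow version in use, so this needs the same testing as the config-file idea.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical parameter: directory holding the module scripts.
// It could be overridden on the command line or per profile in a config file.
params.module_dir = './modules'

// Assumption: the include path is resolved from the parameter at run time
include { foo } from "${params.module_dir}/foo"

workflow {
    data = Channel.fromPath('/some/data/*.txt')
    foo(data)
}
```

With this layout, a run could in principle switch module sets with something like `nextflow run main.nf --module_dir ./other_modules`, where `main.nf` and the directory name are again placeholders.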

@mahesh-panchal
Collaborator Author

Marc's (IKMB) cleaned-up code, converted to DSL2:
https://github.com/ikmb/esga

@mahesh-panchal
Collaborator Author

Just to clarify what's in Marc's modules:

  • assembly: Assembly stats and assembly filtering.
  • augustus: Takes input from PASA and trains Augustus with it.
  • evm: Runs EVM (EVidenceModeler).
  • pasa: Includes two workflows, fast and standard. Runs PASA and creates models.
  • proteins: Produces annotation hints from protein sequences with Diamondx and Exonerate.
  • repeatmasker: Runs RepeatMasker.
  • repeatmodeler: Runs RepeatModeler.
  • rnaseq: QCs reads with fastp and maps them to a reference with HISAT2.
  • transcripts: Generates hints from ESTs.
  • trinity: Genome-guided assembly using Trinity.
  • fasta.nf: Some FASTA utilities.
  • util.nf: Some GFF utilities.

If someone with a bit more knowledge could also annotate the new module suggestions with what could be reused from here, it would be appreciated.

@mahesh-panchal
Collaborator Author

nf-core has a #denovohybrid channel where they are building a de novo genome assembly and annotation workflow.
