Bactopia v2 Overview #233
Comments
This is a Work-In-Progress, and evolving. Please feel free to provide any feedback and suggestions. |
@rpetit3 I completely agree with the design pattern that you're aiming to implement. Creating specific workflows for some genera makes complete sense and could be easily implemented with DSL2. Let me know if I can make myself useful on those tasks! |
Regarding the inclusion of outlined modules into nf-core, @Mxrcon and I are happy to take this forward as discussed on slack. I'm dividing these into two groups, for initiating a baseline draft module (we'll be collaborating extensively of course)
P.S. @rpetit3 , if possible could you please add us as collaborators, so that we could start making use of https://github.com/bactopia/bactopia/projects/1 as well? |
Looking forward to Bactopia v2. |
@rpetit3 , I need a bit of guidance here regarding the contributions to nf-core modules.
|
1. Which sub-commands exactly do we need from the tools mentioned here Bactopia v2 Overview #233 (comment)? This is necessary to understand since the structure of modules is TOOL/SUB_COMMAND.
I'll get this put together, but I think most of them don't have subcommands.
2. The nf-core/modules generally rely on the https://github.com/nf-core/test-datasets/ repo, which only has human and sars-cov datasets, so we might need to decide which bacterial dataset we want to add to test these modules. I'm thinking E. coli by default, but happy to hear your thoughts.
I noticed that as well. To get around this I created https://github.com/bactopia/bactopia-tests which is modeled after nf-core's test-datasets, just customized for bactopia. For these tests I'm using Candidatus Portiera aleyrodidarum, which only has a genome size of 350kb. Here's some more info: https://github.com/bactopia/bactopia/tree/dsl2/tests
I've also modeled the tests.config after nf-core/modules (glad I was there for that hackathon!). The small genome allows me to test functionality quickly and, I think more importantly, with limited resources (my desktop, and eventually GitHub Actions).
Now, I think eventually another set of genomes (multiple organisms, E. coli being one) will be needed as a validation set. These would make sure Bactopia is still giving the same results, and also that larger genomes are not producing errors we wouldn't have seen using a small genome. I think this would be a pre-release test set, but again something we could ideally get to fit on GitHub Actions (not sure that'll be possible, unless the memory is increased >7gb on the Linux instance) |
Small update. I've been doing some cleanup and integrating more nf-core inspirations into v2. The cleanup is to make things easier to maintain, and the nf-core additions are because I think they are great practices to follow, and including them now would make a potential nf-core transition for Bactopia easier.
|
Small update, I took the In doing so, this provides:
It also keeps Bactopia on track for maintaining convergent evolution with nf-core practices, which would make any potential transitions easier.
Bactopia related additions
More Examples
I think at this point I'm going to get a basic workflow without any datasets, then work on cleaning up the Dataset imports and validations. Also, Bactopia has a lot of parameters! |
V2 is getting super close! I've linked the datasets into the DSL2 framework (also fixed an issue with the cache when using
I realize a lot of the code is hidden in the modules, but this is the Bactopia Workflow in its entirety. It's so much cleaner, and much easier to work with now.
Next I will implement a species-specific workflow (e.g. Staphopia). Then I think we'll be at the point where we can tidy up and start prepping for the V2 release |
Small update - I've incorporated nf-core/modules usage of the I updated the tests for this change, and I've also implemented this in GitHub Actions. So we can now quickly run all the tests that we're making |
I'm also finding some cases where the SARS-CoV-2 test data doesn't work well with bacterial tools. To address this I've submitted a PR (nf-core/test-datasets#344) to add some basic bacterial test-data to the |
Small update: 11 of 18 bactopia tools have either been merged into I'm hoping to get a few more PRs open tomorrow. |
Significant update - Bactopia now officially supports Nanopore reads!
Nanopore support
I have added QC and assembly support for Nanopore reads. QC is done:
And assembly is done with Dragonflye.
Adopting
|
I could use your opinion. In Bactopia v2, you will be able to create custom workflows. For example:
You can basically include any available bactopia tools. However, I think some should be excluded and only available independently (e.g. GTDB, eggnog, pirate, roary, etc...), due to runtimes or input DB size requirements. I'm curious how you would like to do something like this. Some ideas I have are:
Command-line
Automated for species (can be disabled)
Config file
Command to generate static workflows
Each of the above examples is dynamic and has many moving parts that can cause issues. An alternative would be to create a bactopia command ( I'm open to pretty much anything, so please feel free to toss out ideas.
Tagging a few folks who have given feedback in the past (please feel free to unsubscribe from notifications!) @embatty @lskatz @kusandeep @simone-pignotti @tauqeer9 @uloeber @marcelladane @haruosuz |
The command line seems perfect for new users and seems to be a nice path to follow in development, but I think it might be a problem to maintain, as every new tool will require a new parameter. A tool for generating new workflows seems awesome, but for specific customization it will require more experienced users; I'll certainly use it to develop custom workflows. You think that automated use of the bactopia tools will be a standard on the V2? This seems to be a good addition as the v2 is moving in the direction of species specific workflows. |
Thank you for the feedback @Mxrcon!
You think that automated use of the bactopia tools will be a standard on the V2? This seems to be a good addition as the v2 is moving in the direction of species specific workflows.
I think yes, but we'll have to strike a balance. There are some tools (e.g. But if certain species-specific tools have significant runtime or resource requirements, then I think those should be executed separately. |
First of all, congrats on all the awesome improvements! Concerning species-specific workflows that are automatically generated by the species param, it's certainly nice to have but I personally wouldn't mind adding the specific tools I need in a config file.
For sure I would prefer anything that takes more than a few minutes per sample to NOT be automatically added to the workflow! |
@simone-pignotti thanks as always for the feedback. I think I'm starting to be convinced on the idea of custom workflows generated by config files, because it kind of creates a pathway for a curated set of user provided workflow configs. For the power users, I'm thinking in the short term I'll create documentation for the inputs and outputs of each module, as well as make some examples of adding 'custom' modules. Most advanced Nextflow users could probably use this as a baseline and get started pretty quickly hacking away. |
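To make the config-file idea a bit more concrete, here is a minimal Python sketch of how a workflow config might be resolved into an ordered list of steps while refusing tools that should only run independently. Everything in it is an assumption for illustration: the `tools` key, the `STANDALONE_ONLY` set, and `resolve_workflow` are hypothetical names, not Bactopia's actual implementation.

```python
from typing import Dict, List, Set

# Hypothetical: tools suggested in this thread as better run on their own
# because of runtime or database size.
STANDALONE_ONLY: Set[str] = {"gtdb", "eggnog", "pirate", "roary"}


def resolve_workflow(config: Dict[str, List[str]], available: Set[str]) -> List[str]:
    """Turn a config like {"tools": [...]} into an ordered list of steps.

    The main "bactopia" step always runs first; requested tools follow in
    the order given, with duplicates dropped.
    """
    requested = config.get("tools", [])

    unknown = [tool for tool in requested if tool not in available]
    if unknown:
        raise ValueError("Unknown tools: " + ", ".join(unknown))

    heavy = [tool for tool in requested if tool in STANDALONE_ONLY]
    if heavy:
        raise ValueError("Run these independently: " + ", ".join(heavy))

    steps = ["bactopia"]
    for tool in requested:
        if tool not in steps:
            steps.append(tool)
    return steps


if __name__ == "__main__":
    available = {"staph-typer", "spatyper", "roary"}
    print(resolve_workflow({"tools": ["staph-typer", "spatyper"]}, available))
```

Failing early on a standalone-only tool, rather than silently dropping it, is one possible design choice; it keeps a shared config from producing a different workflow than its author expected.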
I hadn't thought of it this way, but this may actually be the best reason to opt for the config file solution! For the custom pipelines, is there an easy way to just add bactopia as a process inside a more complex workflow? |
That's a great question @simone-pignotti, I think that since Bactopia V2 is based on Nextflow DSL2, it should be doable. Though given the amount of configs we use, it might be worth testing this officially and then having a showcase extension workflow to guide the community a bit. Nested params are still somewhat problematic with NF i.e. |
@simone-pignotti totally agree with @abhi18av's points. I think for the initial v2 release we might not be quite ready for direct import of bactopia modules (eg. |
Progress Update - I have a rough framework for implementing bactopia-tools into the main bactopia workflow (e.g. custom workflows). One thing I want to make sure of is that, whether the program is executing as a Bactopia Tool or as a part of the main bactopia workflow, the subworkflow script would be the same (e.g. we don't need to have separate scripts for each). Here's an overview:
Current Framework
|
I think we are getting super close to being ready for the v2 release. I was able to submit a few more PRs to nf-core/modules ( I've converted almost all of the v1 Bactopia Tools to DSL2. It's a very manual process at the moment, but not too tedious and easy to work with. I am grouping the Bactopia Tools into two groups:
There is a parameter
I'll have to document this process, but I think for power users, it should be pretty straightforward. I've set in place a path to have the documentation autogenerated from |
This is awesome - I am learning a lot from the DSL1 -> DSL2 (+ nf-core modules) in Bactopia for some other workflows I've written. Thanks for sharing the updates @rpetit3! 😊 |
Some of these aren't quite ready for V2, or need v2 before they can be accomplished. For now I'm going to put them here, and start a project board to better capture these.
Implement
|
Happy to report v2 has been released https://github.com/bactopia/bactopia/releases/tag/v2.0.0 Thank you very much @Mxrcon and @abhi18av for your help, and everyone for your feedback! Super excited to see where Bactopia goes from here! |
Bactopia v2 Overview
With tremendous effort by @Mxrcon and @abhi18av, the foundation for migrating Bactopia to DSL2 has been laid out. This transition represents the key milestone to push Bactopia to v2! (Super excited about this Davi and Abhinav!)
By switching to DSL2, the door for creating custom Bactopia workflows has been opened. For example, let's say you have some Staphylococcus aureus samples, and you want to run Bactopia and then the Bactopia Tool staph-typer. With DSL2, we can instead create a sub-workflow (e.g. Staphopia) that will automatically run Bactopia and staph-typer. In other words, we can start creating organism-specific sub-workflows, as well as sub-workflows that only include certain steps such as assembly.
I think this is also a good time to start cleaning up some things and adding features that will make long-term maintenance more sustainable.
House Cleaning
These are to help reduce the burden required to maintain Bactopia long-term. They are really about standardizing things in such a way that we can automate them. For example, printing usage across each of the workflows can be configured through config files (e.g. nf-core JSON schema). There are also a lot of shared functions for checking inputs, creating channels, etc. These duplications are no longer necessary in DSL2.
Automate Version and Citation tracking
bactopia citations
bactopia versions (now handled by versions.yml)
Reduce code duplication
(lib/*)
(lib/*)
Organize DSL2 structure
Fixes
Additional Features
--skip_compression disables this feature
gnl|
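For the version tracking item in the list above, a rough Python sketch of what automated collection could look like: each process reports a mapping of tool name to version, and the reports are merged into one summary. The record layout, the function name `merge_versions`, and the version strings are assumptions made up for illustration, not the actual `versions.yml` handling in Bactopia.

```python
from typing import Dict, Iterable, List, Set


def merge_versions(records: Iterable[Dict[str, Dict[str, str]]]) -> Dict[str, List[str]]:
    """Merge per-process {process: {tool: version}} records into {tool: [versions]}.

    A tool reported with more than one version keeps all of them, which makes
    inconsistent environments easy to spot.
    """
    merged: Dict[str, Set[str]] = {}
    for record in records:
        for tools in record.values():
            for tool, version in tools.items():
                merged.setdefault(tool, set()).add(version)
    return {tool: sorted(versions) for tool, versions in sorted(merged.items())}


if __name__ == "__main__":
    # Placeholder process names and version strings, for illustration only.
    reports = [
        {"PROCESS_A": {"tool_x": "1.0.0"}},
        {"PROCESS_B": {"tool_x": "1.0.0", "tool_y": "2.1.0"}},
    ]
    print(merge_versions(reports))
```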
Implement pytest for testing
I'd like to create a suite of tests that are operated by pytest and pytest-workflows. The nf-core/modules team has a framework that can be extended to Bactopia.
annotate_genome
antimicrobial_resistance
ariba_analysis
assemble_genome
assembly_qc
blast
call_variants
count_31mers (merged into minmer_sketch)
download_references (merged into call_variants)
estimate_genome_size (merged into gather_samples)
fastq_status (merged into gather_samples)
gather_samples
mapping_query
minmer_query
minmer_sketch
qc_reads
sequence_type
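As a rough idea of the kind of check such a test suite could run for each process in the list above, here is a small pytest-style sketch in plain Python that verifies expected output files exist and are not empty. It is not the pytest-workflow YAML format used by nf-core/modules; the helper `missing_outputs` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_outputs(outdir: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected relative paths that are absent or empty under outdir."""
    missing = []
    for rel in expected:
        path = outdir / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


def test_reports_missing_and_empty_files():
    # pytest collects functions named test_*; the files here are made up.
    with tempfile.TemporaryDirectory() as tmp:
        outdir = Path(tmp)
        (outdir / "sample.fna").write_text(">contig1\nACGT\n")
        (outdir / "empty.txt").write_text("")
        result = missing_outputs(outdir, ["sample.fna", "empty.txt", "versions.yml"])
        assert result == ["empty.txt", "versions.yml"]


if __name__ == "__main__":
    test_reports_missing_and_empty_files()
```

Treating empty files as missing is a deliberate choice in this sketch, since a process that crashes partway can still leave zero-byte outputs behind.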
agrvate
bakta
ectyper
emmtyper
eggnog
fastani
hicap
ismapper
kleborate
lissero
mashtree
meningotype
ngmaster
pangenome
seqsero2
spatyper
staph-typer
staphopiasccmec
tbprofiler
Convert some processes to nf-core/modules
There are a few tools used by Bactopia that are the only tool in their process. Most of these tools are in the Bactopia Tools. I think it's best that these tools be transferred to nf-core/modules. Many of these will need to be added to nf-core, but they are in need of some bacterial genomic tool love, so it's ok!
agrvate: module for agrvate (nf-core/modules#693)
bakta: add bakta module (nf-core/modules#1085)
clonalframe: add clonalframeml module (nf-core/modules#974)
ectyper: add ectyper module (nf-core/modules#948)
eggnog_mapper: add eggnog-mapper module (nf-core/modules#1020)
emmtyper: add emmtyper module (nf-core/modules#1028)
fastani: module fastani (nf-core/modules#695)
fastq-scan: add module for fastq-scan (nf-core/modules#935)
gtdb: Add gtdbtk/classifywf module (nf-core/modules#765)
hicap: add hicap module (nf-core/modules#772)
ismapper: add ismapper module (nf-core/modules#773)
kleborate: Add module kleborate (nf-core/modules#711)
lissero: add lissero module (nf-core/modules#1026)
mashtree: add mashtree module (nf-core/modules#767)
meningotype: add meningotype module (nf-core/modules#1022)
ngmaster: add ngmaster module (nf-core/modules#1024)
phyloflash: Add phyloflash module (nf-core/modules#786)
pirate: add pirate module (nf-core/modules#777)
roary: add roary module (nf-core/modules#776)
scoary: add scoary module (nf-core/modules#1034)
seqsero2: add seqsero2 module (nf-core/modules#1016)
snp-dists: module: snp-dists (nf-core/modules#697)
spatyper: add spatyper module (nf-core/modules#784)
staphopia-sccmec: add staphopia-sccmec module (nf-core/modules#702)
tb-profiler: add module for tbprofiler (nf-core/modules#947)
Curated Datasets
I think one of the best features of Bactopia is the ability to include public datasets. This works great for general datasets, but organism-specific datasets are kind of lost. I think it would be great to start a set of curated datasets that users can add data to.
Here's an example of a curated Staphylococcus aureus Bactopia Dataset. This dataset can easily be imported and allow users to rapidly analyze their samples with a curated dataset specific to their organism.
I think it would also be nice if these curated datasets included SRA accessions linked to publications. But this exceeds my capabilities and would require extensive community support.
Species specific Workflows
With DSL2, we can create species-specific workflows by combining the main Bactopia workflow with some Bactopia Tools. The main example, which will act as a proof-of-concept, will be Staphopia. Staphopia is essentially Bactopia + the Bactopia Tool staph-typer.