Young edited this page Mar 1, 2024 · 4 revisions

Frequently asked questions

What do I do if I encounter an error?

TELL US ABOUT IT!!!

Be sure to include the command used, what config file was used, and what the nextflow error was.
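One way to gather those three items before opening an issue is sketched below. It assumes the workflow was launched from the current directory, where Nextflow writes its `.nextflow.log` file. The function name `collect_bug_report` and the file names it creates are made up for illustration and are not part of Donut Falls.

```shell
# Hypothetical helper: bundle the requested pieces into one directory.
# Usage: collect_bug_report <config file> <output dir> <command used...>
collect_bug_report() {
    local config="$1" outdir="$2"
    shift 2
    mkdir -p "$outdir"
    # The exact command line that was run.
    printf '%s\n' "$*" > "$outdir/command.txt"
    # The config file passed with -c, if there was one.
    if [ -f "$config" ]; then cp "$config" "$outdir/"; fi
    # Nextflow writes .nextflow.log in the launch directory; it holds the error.
    if [ -f .nextflow.log ]; then cp .nextflow.log "$outdir/nextflow.log"; fi
}
```

For example, `collect_bug_report edit_me.config report nextflow run UPHL-BioNGS/Donut_Falls -c edit_me.config` would leave the command, the config file, and the log in `report/`, ready to attach.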

Do you have some recommended light reading for assembly rationale?

Dr. Wick has a wonderful tutorial available at https://github.com/rrwick/Perfect-bacterial-genome-tutorial

Which assembler is best?

We are flattered that some think we can condense an answer into something that fits in a GitHub readme. In short, the three assemblers do different things and have different issues. The default is flye because of its popularity.

When four samples were assembled with flye, miniasm, and raven and their gfa files were visualized in Bandage, the assemblies looked very similar. There are instances, however, where one assembler closed the genome but another could not circularize it. Sometimes the number of plasmids differs as well.

[Table of Bandage assembly graphs. Columns: flye, miniasm and minipolish, raven. Rows: sample 1, sample 2, sample 3, sample 4.]

If perfection is the goal, please try Trycycler, which is covered on the trycycler wiki page.

Trycycler is a useful tool that reconciles consensus sequences generated by other assemblers, but it has manual steps. There was an attempt to automate these steps in this workflow, but the assemblies did not achieve the desired quality. Other devs more talented than we are may have solved the issues that we were experiencing.

Is there anything for quality evaluation?

In general, longer reads are better.

Currently, the bulk of QC is handled by NanoPlot, busco, and circulocov.

NanoPlot graph of pore health

The greener the better. This graph is produced when the optional sequencing_summary param is set (params.sequencing_summary = "nanopore sequencing summary file").

Nanopore read quality

Samples with "few" or "short" reads likely need a different workflow. All reads shorter than 1000 bases are discarded.
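The effect of the length cutoff can be pictured with the sketch below. It is not the workflow's own filtering step, only an illustration of what discarding reads under 1000 means for a plain FASTQ file with four lines per record. The function name `filter_short_reads` is hypothetical.

```shell
# Illustration only: keep FASTQ records whose sequence is at least min_len long.
# Assumes uncompressed FASTQ with exactly four lines per record.
# Usage: filter_short_reads <min length> < in.fastq > out.fastq
filter_short_reads() {
    awk -v min_len="$1" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { bases = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            # The fourth line is the quality string; emit the whole record
            # only when the sequence passes the length cutoff.
            if (length(bases) >= min_len) {
                print header; print bases; print plus; print $0
            }
        }'
}
```

Running `filter_short_reads 1000 < sample.fastq | grep -c '^+$'` gives a rough idea of how many reads would survive the cutoff for a sample (rough because a quality line can, rarely, also be a lone `+`).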

Circulocov Read Depth

Contigs with few reads mapping to them should be discarded.

2735071_flye_reoriented_edge_1
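As a rough sketch of how low-depth contigs could be listed for removal, the snippet below reads a tab-separated depth table. The table layout (header row, contig name in column 1, mean depth in column 2), the cutoff value, and the function name `low_depth_contigs` are all assumptions for illustration; check the actual circulocov output columns before adapting it.

```shell
# Illustration only: list contigs whose mean depth is below a cutoff.
# Assumes a HYPOTHETICAL tab-separated table: header row, contig name in
# column 1, mean read depth in column 2. Adjust the columns to the real file.
# Usage: low_depth_contigs <min depth> < depth_table.tsv
low_depth_contigs() {
    awk -F '\t' -v min_depth="$1" 'NR > 1 && $2 + 0 < min_depth + 0 { print $1 }'
}
```

What counts as "few" reads depends on the sequencing run, so the cutoff is left as an argument rather than fixed.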

Where is an example config file?

To get a copy of the template file that Donut Falls uses by default, run

nextflow run UPHL-BioNGS/Donut_Falls --config_file true

This creates an edit_me.config file in the current directory that the End User can edit for their own purpose. This file can be renamed with no penalty.

To use this edited config file, simply use -c on the command line.

nextflow run UPHL-BioNGS/Donut_Falls -r main -profile singularity --reads reads -c edit_me.config

Why does flye keep failing?

Most failures come down to the quality of the nanopore reads and flye's internal workings. Right now, flye errors are set to be ignored by default, but this can be adjusted in a config file. A common error and ways around it are found in this issue thread. Working around it means that the End User will need a config file with the specified flye parameters.

Some example lines for a config file

process {
    withName: flye {
        ext.args = "--asm-coverage 50"
    }
}

Or something like this

process {
    withName: flye {
        ext.args = "--meta"
    }
}
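Because flye errors are ignored by default, a failed assembly can pass unnoticed. A minimal sketch of changing that from a config file follows. It assumes the default behaviour is set through Nextflow's standard `errorStrategy` process directive, which has not been confirmed against the workflow source. The file name `flye_strict.config` is made up.

```shell
# Write a small Nextflow config that makes flye failures stop the run.
# Assumption: the default "ignore" behaviour comes from the errorStrategy
# directive, which a withName selector in a config file can override.
cat > flye_strict.config <<'EOF'
process {
    withName: flye {
        errorStrategy = 'terminate'
    }
}
EOF
```

The resulting file is passed with -c like any other config file, for example `nextflow run UPHL-BioNGS/Donut_Falls -r main -profile singularity --reads reads -c flye_strict.config`.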

How were flye, raven, and unicycler chosen for this workflow?

They perform well.

We did attempt adding canu, but the assembly took forever.

We used to include dragonflye in this workflow, but this assembler was dropped because versions later than 1.0.14 stopped working on our system.

We would like to include plassembler or hybracter as options, but the documentation for these requires more effort than we can put in at the moment.

Note: both dragonflye and hybracter already perform many of the steps of Donut Falls and End Users may want to use those tools instead.

If the End User prefers other assemblers, please let us know and we'll work in some options.

Warning: If there's not a reliable container of the suggested tool, we'll request that the End User create a container for that tool and contribute it to StaPH-B's docker repositories.

Why was pilon or "insert name of other polisher" not included?

Polishing is tricky: the goal is to remove nanopore sequencing errors without introducing new ones. As such, this workflow does only a minimal amount of polishing. If there is a desire to add polishers or more rounds of polishing, please read about linking to another nextflow workflow, which is covered on a different wiki page.

What if I'm starting with fast5 files?

Then the End User needs to do basecalling and demultiplexing with guppy first.

Something like the following to get the fastq files.

# With config file
guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
# With flowcell and kit name
guppy_basecaller -i <input path> -s <save path> --flowcell <flowcell name> --kit <kit name>

Something like the following to get the demultiplexed fastq files.

guppy_barcoder -i <input fastq path> -s <save path> --barcode_kits <kit name>
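After demultiplexing, the reads for one sample are often spread over several fastq files. The sketch below combines them into one file per barcode in a directory that could then be given to --reads. It assumes guppy_barcoder wrote one sub-directory per barcode (barcode01, barcode02, ...) under the save path and that the files are uncompressed. The function name `merge_barcode_fastqs` is hypothetical.

```shell
# Sketch: combine the per-barcode FASTQ files into one file per barcode.
# Assumption: <save path> holds one sub-directory per barcode (barcode01, ...)
# containing uncompressed .fastq files.
# Usage: merge_barcode_fastqs <guppy_barcoder save path> <reads dir>
merge_barcode_fastqs() {
    local demux_dir="$1" reads_dir="$2" dir name
    mkdir -p "$reads_dir"
    for dir in "$demux_dir"/barcode*/; do
        # Skip the unexpanded pattern when no barcode directories exist.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        cat "$dir"/*.fastq > "$reads_dir/$name.fastq"
    done
}
```

For example, `merge_barcode_fastqs demux reads` followed by the usual `nextflow run UPHL-BioNGS/Donut_Falls -r main -profile singularity --reads reads` command.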

Linking a nextflow workflow to Donut Falls is discussed in a separate wiki page if there is a desire to create a nextflow basecalling workflow that uses Donut Falls for assembly.