NextDenovo Parameter Reference

NextDenovo requires at least one read file (option: input_fofn) as input. It accepts FASTA and FASTQ files, plain or gzip-compressed, and uses a config file to pass options.

Input

input_fofn (one file per line)

ls reads1.fasta reads2.fastq reads3.fasta.gz reads4.fastq.gz ... > input.fofn
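The fofn is simply a text file with one read file per line. The sketch below builds it from Python instead of ls. Several details are our own assumptions: the suffix list (taken from the example names above) and the choice to write absolute paths so the file does not depend on the launch directory. The helper names fofn_lines and write_fofn are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed suffixes, taken from the example file names above; extend as needed.
READ_SUFFIXES = (".fasta", ".fastq", ".fasta.gz", ".fastq.gz")

def fofn_lines(paths: Iterable[str]) -> List[str]:
    """Keep paths with a recognised read suffix and make them absolute."""
    return [str(Path(p).resolve()) for p in paths if p.endswith(READ_SUFFIXES)]

def write_fofn(paths: Iterable[str], fofn: str = "input.fofn") -> int:
    """Write one read file per line to `fofn`; return how many were written."""
    lines = fofn_lines(paths)
    Path(fofn).write_text("".join(line + "\n" for line in lines))
    return len(lines)

if __name__ == "__main__":
    import sys
    n = write_fofn(sys.argv[1:])
    print(f"wrote {n} file(s) to input.fofn")
```

Usage (hypothetical script name): python make_fofn.py reads*.fast*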

config file

A config file is a text file of key=value pairs that set runtime parameters for NextDenovo. The pairs are grouped into sections ([General], [correct_option], [assemble_option]). The following is a typical config file, which is also located in doc/run.cfg.

[General]
job_type = local
job_prefix = nextDenovo
task = all
rewrite = yes
deltmp = yes 
parallel_jobs = 20
input_type = raw
read_type = clr # clr, ont, hifi
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 1g # estimated genome size
sort_options = -m 20g -t 15
minimap2_options_raw = -t 8
pa_correction = 3
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8 
nextgraph_options = -a 1
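Because the file uses INI-style sections with key = value lines, Python's standard configparser can read it for a quick pre-flight check. This is only an assumption about how you might inspect a config before a run. It is not a description of how NextDenovo parses the file. The helpers load_cfg and missing_required are hypothetical. The required keys come from the option list below, where input_fofn and read_type are marked (required).

```python
import configparser
from typing import List

# Shortened copy of the example above.
RUN_CFG = """\
[General]
job_type = local
task = all
read_type = clr # clr, ont, hifi
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
genome_size = 1g # estimated genome size
pa_correction = 3

[assemble_option]
nextgraph_options = -a 1
"""

def load_cfg(text: str) -> configparser.ConfigParser:
    # '#' preceded by whitespace starts an inline comment,
    # so "clr # clr, ont, hifi" is read as "clr".
    cfg = configparser.ConfigParser(inline_comment_prefixes=("#",))
    cfg.read_string(text)
    return cfg

def missing_required(cfg: configparser.ConfigParser) -> List[str]:
    """Return required [General] keys that are absent from the config."""
    return [key for key in ("input_fofn", "read_type")
            if not cfg.has_option("General", key)]

cfg = load_cfg(RUN_CFG)
print(cfg["General"]["read_type"])           # clr
print(cfg["correct_option"]["genome_size"])  # 1g
print(missing_required(cfg))                 # []
```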

Output

workdir/03.ctg_graph/nd.asm.fasta

Contigs in FASTA format. Each FASTA header includes the contig ID, type, length, and node count. Within a sequence, a run of consecutive lowercase bases indicates a weak connection, and a single lowercase base marks a low-quality base.
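To locate these marks in a contig, you can scan for lowercase runs and separate single bases from longer runs. The minimal sketch below does this. It assumes that a run of two or more lowercase bases is a weak connection and a run of exactly one is a low-quality base. It also assumes that adjacent marks cannot be told apart, because they merge into one run. classify_lowercase is a hypothetical helper.

```python
import re
from typing import List, Tuple

def classify_lowercase(seq: str) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split lowercase runs into weak-connection regions and low-quality bases.

    Returns (weak_regions, low_quality_positions) in 0-based coordinates;
    regions are half-open (start, end).
    """
    weak: List[Tuple[int, int]] = []
    low_quality: List[int] = []
    for m in re.finditer(r"[a-z]+", seq):
        if m.end() - m.start() == 1:
            low_quality.append(m.start())   # single lowercase base
        else:
            weak.append((m.start(), m.end()))  # consecutive lowercase region
    return weak, low_quality

print(classify_lowercase("ACGTaCGTTacgtACG"))  # ([(9, 13)], [4])
```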
workdir/03.ctg_graph/nd.asm.fasta.stat

Basic assembly statistics (N10 to N90, total size, etc.).
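The N10 to N90 values follow the usual Nx definition: sort contigs longest first, then take the length at which the running total first reaches x% of the total assembly size. The sketch below can cross-check the stat file against your own contig lengths. nx is a hypothetical helper, and NextDenovo's exact rounding and tie conventions are not documented here.

```python
from typing import Iterable

def nx(lengths: Iterable[int], x: float) -> int:
    """Length L such that contigs >= L cover at least x% of the total size."""
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * x / 100
    covered = 0
    for length in ordered:
        covered += length
        if covered >= target:
            return length
    return 0  # empty input

contigs = [100, 80, 50, 30, 20]           # total size 280
print(nx(contigs, 10), nx(contigs, 50), nx(contigs, 90))  # 100 80 30
```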

Options

Global options

job_type = sge

job system to use: local, sge, pbs, lsf, slurm, ... (default: sge)

job_prefix = nextDenovo

prefix tag for jobs. (default: nextDenovo)

task = <all, correct, assemble>

task to run: correct = run only the correction step, assemble = run only the assembly step (only works if input_type = corrected or read_type = hifi), all = correct + assemble. (default: all)
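The restriction on task = assemble can be checked before submitting anything. The sketch below encodes only the rule stated above. task_allowed is a hypothetical pre-flight helper and is not part of NextDenovo.

```python
VALID_TASKS = ("all", "correct", "assemble")

def task_allowed(task: str, input_type: str, read_type: str) -> bool:
    """Return True if this task/input_type/read_type combination is allowed."""
    if task not in VALID_TASKS:
        return False
    if task == "assemble":
        # assembly alone is only valid for reads that skip correction
        return input_type == "corrected" or read_type == "hifi"
    return True

print(task_allowed("assemble", "raw", "clr"))   # False
print(task_allowed("assemble", "raw", "hifi"))  # True
```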

rewrite = no

overwrite an existing directory [yes, no]. (default: no)

deltmp = yes

delete intermediate results. (default: yes)

rerun = 3

re-run unfinished jobs until they finish or the number of rerun loops is reached, 0 = no re-run. (default: 3)
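The sketch below shows one way to read the rerun semantics: every job runs once, and any job still unfinished is retried for at most rerun extra loops. With rerun = 0, jobs are never retried. run_with_rerun and the run_job callback are hypothetical. Paralleltask's actual scheduling loop is not described here.

```python
from typing import Callable, Iterable, List

def run_with_rerun(jobs: Iterable[str],
                   run_job: Callable[[str], bool],
                   rerun: int = 3) -> List[str]:
    """Run each job once, then retry unfinished ones for up to `rerun` loops.

    `run_job` returns True when a job finished. Returns the jobs that are
    still unfinished after the last loop.
    """
    pending = [job for job in jobs if not run_job(job)]
    for _ in range(rerun):
        if not pending:
            break
        pending = [job for job in pending if not run_job(job)]
    return pending
```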

parallel_jobs = 10

number of tasks to run in parallel. (default: 10)

input_type = raw

input reads type [raw, corrected]. (default: raw)

input_fofn = input.fofn

file listing the input read files, one file per line. (required)

read_type = {clr, hifi, ont}

read type: clr = PacBio continuous long reads, hifi = PacBio highly accurate long reads, ont = Oxford Nanopore 1D reads. (required)

workdir = 01.workdir

work directory. (default: ./)

usetempdir = /tmp/test

temporary directory on compute nodes, used to avoid high I/O wait. (default: None)

nodelist = avanode.list.fofn

a file listing the hostnames of available nodes, one node per line; used together with usetempdir for non-sge job_type.

submit = auto

command to submit a job, auto = automatically set by Paralleltask.

kill = auto

command to kill a job, auto = automatically set by Paralleltask.

check_alive = auto

command to check a job status, auto = automatically set by Paralleltask.

job_id_regex = auto

regular expression used to parse the job ID from the output of the submit command, auto = automatically set by Paralleltask.
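If you override job_id_regex, the pattern has to pull the job ID out of whatever your submit command prints. The sketch below tests candidate patterns against typical submit messages from common schedulers. The pattern table and parse_job_id are hypothetical examples. The exact regex form Paralleltask expects and its auto values are not documented here, so check candidates against your scheduler's real output.

```python
import re
from typing import Optional

# Example patterns for typical submit output; not Paralleltask's built-in values.
SUBMIT_PATTERNS = {
    "slurm": r"Submitted batch job (\d+)",  # sbatch
    "sge": r"Your job (\d+)",               # SGE qsub
    "lsf": r"Job <(\d+)>",                  # bsub
    "pbs": r"^(\d+)",                       # PBS qsub prints e.g. "1234.server"
}

def parse_job_id(output: str, pattern: str) -> Optional[str]:
    """Return the first captured group of `pattern` in `output`, or None."""
    match = re.search(pattern, output)
    return match.group(1) if match else None

print(parse_job_id("Submitted batch job 4242", SUBMIT_PATTERNS["slurm"]))  # 4242
```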

use_drmaa = no

use drmaa to submit and control jobs.

Correction options

read_cutoff = 1k

filter reads with length < read_cutoff. (default: 1k)

genome_size = 1g

estimated genome size (suffixes K/M/G are recognized), used to calculate seed_cutoff, seed_cutfiles, blocksize, and the average depth. It can be omitted when seed_cutoff is set manually.

seed_depth = 45

expected seed depth, used together with genome_size to calculate seed_cutoff. Try values between 30 and 45 to get a better assembly result. (default: 45)

seed_cutoff = 0

minimum seed length; a value <= 0 means it is calculated automatically using bin/seq_stat.
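Conceptually, genome_size and seed_depth define a base budget of genome_size × seed_depth. A natural cutoff is the largest read length L such that reads of length >= L together reach that budget. The sketch below illustrates this idea only. bin/seq_stat's actual algorithm may differ. The decimal K/M/G multipliers are an assumption, and parse_size and estimate_seed_cutoff are hypothetical helpers.

```python
from typing import Iterable, Optional

# Assumption: decimal multipliers (1k = 1,000 bases).
_UNITS = {"k": 10**3, "m": 10**6, "g": 10**9}

def parse_size(text: str) -> int:
    """Parse sizes such as '1g', '1.5m' or '500' into a number of bases."""
    s = text.strip().lower()
    if s and s[-1] in _UNITS:
        return int(round(float(s[:-1]) * _UNITS[s[-1]]))
    return int(s)

def estimate_seed_cutoff(lengths: Iterable[int], genome_size: str,
                         seed_depth: int = 45) -> Optional[int]:
    """Largest length L such that reads >= L hold genome_size * seed_depth bases.

    Returns None when all reads together fall short of that target.
    """
    target = parse_size(genome_size) * seed_depth
    total = 0
    for length in sorted(lengths, reverse=True):  # longest reads first
        total += length
        if total >= target:
            return length
    return None

# 20x of a 1 kb genome needs 20,000 bases: 10000 + 8000 + 6000 reaches it.
print(estimate_seed_cutoff([10000, 8000, 6000, 4000, 2000], "1k", 20))  # 6000
```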

seed_cutfiles = 5

split seed reads into seed_cutfiles subfiles. (default: pa_correction)
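The reference does not say how seed reads are distributed across the seed_cutfiles subfiles. One common way to keep subfiles similar in size is greedy balancing: take reads longest first and put each one into the currently smallest subfile. split_into_n illustrates that general technique. It is a hypothetical helper, not NextDenovo's documented behavior.

```python
import heapq
from typing import List

def split_into_n(lengths: List[int], n: int) -> List[List[int]]:
    """Assign read indices to n bins, longest read first, smallest bin first."""
    heap = [(0, k) for k in range(n)]  # (bases in bin, bin index)
    bins: List[List[int]] = [[] for _ in range(n)]
    for i in sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True):
        total, k = heapq.heappop(heap)      # currently smallest bin
        bins[k].append(i)
        heapq.heappush(heap, (total + lengths[i], k))
    return bins

print(split_into_n([10, 9, 8, 7, 1], 2))  # [[0, 3, 4], [1, 2]]
```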

blocksize = 10g

block size for parallel running: non-seed reads are split into small files, each at most blocksize in size. (default: 10g)
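The sketch below shows in-order chunking where no chunk exceeds a size limit; a single read larger than the limit gets a chunk of its own. split_by_blocksize is a hypothetical helper. Sizes here are counted in bases, and NextDenovo's actual splitting (for example, whether it counts bases or file bytes) may differ.

```python
from typing import List

def split_by_blocksize(lengths: List[int], blocksize: int) -> List[List[int]]:
    """Group read indices, in input order, into chunks of at most blocksize bases."""
    chunks: List[List[int]] = []
    current: List[int] = []
    size = 0
    for i, length in enumerate(lengths):
        if current and size + length > blocksize:
            chunks.append(current)  # next read would overflow: close chunk
            current, size = [], 0
        current.append(i)
        size += length
    if current:
        chunks.append(current)
    return chunks

print(split_by_blocksize([4, 3, 5, 2, 6], 8))  # [[0, 1], [2, 3], [4]]
```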

pa_correction = 3

number of correction tasks to run in parallel. Each correction task requires ~TOTAL_INPUT_BASES/4 bytes of memory. This value overrides parallel_jobs for this step only. (default: 3)
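Because each correction task needs ~TOTAL_INPUT_BASES/4 bytes, the memory for this step scales with pa_correction. For example, 60 Gb of input reads means about 15 GB per task, or about 45 GB with pa_correction = 3. The sketch below does this arithmetic and also inverts it to pick pa_correction for a memory budget. The helper names are hypothetical, and the estimate covers only the per-task figure stated above, not other memory the pipeline may use.

```python
def per_task_memory_gb(total_input_bases: float) -> float:
    """~TOTAL_INPUT_BASES/4 bytes per correction task, in GB (1 GB = 1e9 bytes)."""
    return total_input_bases / 4 / 1e9

def correction_memory_gb(total_input_bases: float, pa_correction: int = 3) -> float:
    """Rough memory for pa_correction tasks running at once."""
    return pa_correction * per_task_memory_gb(total_input_bases)

def max_pa_correction(total_input_bases: float, memory_gb: float) -> int:
    """Largest pa_correction whose combined per-task memory fits in memory_gb."""
    return int(memory_gb // per_task_memory_gb(total_input_bases))

print(correction_memory_gb(60e9, 3))  # 45.0
print(max_pa_correction(60e9, 100))   # 6
```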

minimap2_options_raw = -t 10

minimap2 options, used to find overlaps between raw reads, see minimap2-nd for details.

sort_options = -m 40g -t 10

sort options, see ovl_sort for details.

correction_options = -p 10

correction options:
-p, --process, set the number of processes used for correction. (default: 10)
-b, --blacklist, disable the filter step to produce more corrected data.
-s, --split, split corrected seeds at uncorrected regions. (default: False)
-fast, 0.5-1 times faster mode with slightly lower accuracy. (default: False)
-dbuf, disable caching of 2bit files, reducing memory usage by ~TOTAL_INPUT_BASES/4 bytes. (default: False)
-max_lq_length, maximum length of a continuous low-quality region in a corrected seed; a larger max_lq_length produces more corrected data with lower accuracy. (default: auto [pb/1k, ont/10k])

Assembly options

minimap2_options_cns = -t 8 -k17 -w17

minimap2 options, used to find overlaps between corrected reads.

minimap2_options_map = -t 10

minimap2 options, used to map reads back to the assembly.

nextgraph_options = -a 1

nextgraph options, see nextgraph for details.
