After running with --mode initiate, CUT&RUN-Flow copies the task configuration template into your current working directory. For a full example of this file, see Task nextflow.config.

# Configure:
$ <vim/nano...> ./my_task/nextflow.config   # Task Input, Steps, etc. Configuration

Task-level inputs such as input files and reference fasta files must be configured here (see Input File Setup). Additional task-specific settings are also configured here, such as output read naming rules and output file locations (see Output Setup).

Note
These settings are provided for user customizability, but in the majority of cases the default settings should work fine.
Many pipeline settings can justifiably be configured either on a task-specific basis (in Task nextflow.config) or as defaults for the pipeline (in Pipe nextflow.config). These include Nextflow "executor" settings for use of SLURM and Environment Modules, and associated settings such as memory and CPU usage. These settings are described here, in Executor Setup, but can also be set in the Pipe nextflow.config.
Likewise, settings for individual pipeline components, such as Trimmomatic tag trimming parameters or the qval used for MACS2 peak calling, can be provided in either config file, or both (for a description of these parameters, see Workflow).

Note
If any settings are provided in both the above Task nextflow.config file and the Pipe nextflow.config file located in the pipe directory, the task-directory settings will take precedence. For more information on Nextflow configuration precedence, see config.
CUT&RUN-Flow handles reference database preparation with a series of steps run using --mode prep_fasta. The location of the fasta used for preparation is provided to the ref_fasta parameter as either a file path or URL. Reference preparation is then performed using:
$ nextflow CnR-flow --mode prep_fasta
This will place the prepared reference files in the directory specified by refs_dir (see Output Setup). Once prepared, this parameter can be dynamically used during pipeline execution to detect the reference name and location, depending on the value of the ref_mode parameter.
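As a minimal sketch of the reference settings described above, the task nextflow.config might contain something like the following. The params block layout, the URL, and the directory are assumptions made for illustration; only the parameter names ref_mode, ref_fasta, and refs_dir come from this documentation, so check the copied task template for the exact form.

```groovy
// Hypothetical values: replace with your own reference location.
params {
    ref_mode  = 'fasta'                                 // derive the reference name from ref_fasta
    ref_fasta = 'https://example.org/my_genome.fa.gz'   // placeholder; a local file path also works
    refs_dir  = './ref_dbs'                             // placeholder directory for prepared references
}
```

With settings like these in place, running the prep_fasta mode shown above would prepare the reference under refs_dir.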
- Ref Modes:
  'fasta': Get reference name from ref_fasta (which must then be set)
  'name': Get reference name from ref_name (which must then be set)
  'manual': Set required parameters manually:
- Ref Required Manual Parameters:
  ref_name: Reference Name
  ref_bt2db_path: Reference Bowtie2 Alignment Reference Path
  ref_chrom_sizes_path: Path to <reference>.chrom_sizes file
  ref_eff_genome_size: Effective genome size for reference.

The ref_mode parameter also applies to the preparation and location of the fasta used for the normalization reference if do_norm is set. These parameters are named in parallel using a norm_[ref...] prefix and are autodetected from the value of norm_ref_fasta or norm_ref_name, depending on the value of ref_mode. For details on normalization steps, see Normalization Steps.
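For the 'manual' reference mode, a configuration along the following lines may be a useful starting point. This is a sketch under stated assumptions: the params block layout, all paths, the reference name, and the genome size are placeholders, and the expected value types (for example, whether ref_eff_genome_size is given as a number or a string) should be confirmed against the task template.

```groovy
// Hypothetical manual reference setup; every value below is a placeholder.
params {
    ref_mode             = 'manual'
    ref_name             = 'my_genome'
    ref_bt2db_path       = '/path/to/bowtie2_index/my_genome'
    ref_chrom_sizes_path = '/path/to/my_genome.chrom_sizes'
    ref_eff_genome_size  = 2700000000   // placeholder effective genome size
}
```

In this mode nothing is autodetected, so all four reference parameters need to be supplied together.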
Two (mutually-exclusive) options are provided for supplying input sample fastq[.gz] files to the workflow.
- Single Sample Group:
A single group of samples with zero or one (post-combination) control sample(s) for all treatment samples.
treat_fastqs
ctrl_fastqs
- Multiple Sample Group:
A multi-group layout, with groups of samples provided where each group has a control sample. (All groups are required to have a control sample in this mode.)
fastq_groups
Multiple pairs of files representing the same sample/replicate that were sequenced on different lanes can be automatically recognized and combined (default: true). For more information, see MergeFastqs.

Note
For convenience, if the same file is found both as a treatment and as a control, the copy passed as treatment will be ignored (this facilitates easy pattern matching).
Warning
Input files must be paired-end and in fastq[.gz] format. Nextflow requires the use of the (strange-looking) R{1,2} naming construct (which matches either R1 or R2) to ensure that files are fed into the pipeline as pairs.
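To illustrate the single sample group layout and the R{1,2} construct, a sketch of the relevant settings could look like this. The file names are hypothetical, and both the params block and the use of a list of glob patterns are assumptions; the copied task template shows the exact form expected, including the layout of fastq_groups for the multiple-group alternative.

```groovy
// Hypothetical file names; R{1,2} matches both the R1 and R2 file of each pair.
params {
    treat_fastqs = ['./data/my_sample_R{1,2}.fastq.gz']
    ctrl_fastqs  = ['./data/my_control_R{1,2}.fastq.gz']
}
```

Because the two input options are mutually exclusive, fastq_groups would be left unset when treat_fastqs and ctrl_fastqs are used.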
Nextflow provides extensive options for using cluster-based job scheduling, such as SLURM, PBS, etc. These options are worth reviewing in the Nextflow docs: executor. The specific executor is selected with the configuration setting: process.executor = 'option'. The default value of process.executor = 'local' runs processes on the local machine.
- Specific settings of note:
  Option                   Example
  process.executor         'slurm'
  process.memory           '4 GB'
  process.cpus             4
  process.time             '1h'
  process.clusterOptions   '--qos=low'
- To facilitate process efficiency (and for adequate capacity) for different parts of the process, memory-related process labels have been applied to the processes: 'small_mem', 'norm_mem', and 'big_mem'. These are specified using process.withLabel: my_label { key = value }. Example: process.withLabel: big_mem { memory = '16 GB' }.
- A 1n/2n/4n or 1n/2n/8n strategy is recommended for the respective small_mem/norm_mem/big_mem options (for details on Nextflow process labels, see process). Additionally, multiple CPU usage is disabled for processes that do not support multiple CPUs (or aren't significantly more effective with them), and so the process.cpus setting only applies to processes within the pipeline with multiple CPUs enabled.
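The executor settings and memory labels above can be combined in one process scope. The sketch below uses the block form of the withLabel selector from the Nextflow configuration documentation and applies a 1n/2n/4n strategy with n = 4 GB; the specific memory values and the '--qos=low' option are illustrative choices, not recommendations for any particular cluster.

```groovy
// Sketch: SLURM executor with a 1n/2n/4n memory strategy (n = 4 GB, chosen for illustration).
process {
    executor       = 'slurm'
    memory         = '4 GB'        // default for processes without a more specific setting
    cpus           = 4             // only used by processes with multiple CPUs enabled
    time           = '1h'
    clusterOptions = '--qos=low'   // site-specific; adjust or remove for your scheduler

    withLabel: small_mem { memory = '4 GB' }    // 1n
    withLabel: norm_mem  { memory = '8 GB' }    // 2n
    withLabel: big_mem   { memory = '16 GB' }   // 4n
}
```

Placing this in the Pipe nextflow.config would make it the pipeline default, while placing it in the Task nextflow.config would apply it to a single task and take precedence.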
Output options can control the quantity, naming, and location of output files from the pipeline.
- publish_files:
  Three modes are available for selecting the number of output files from the pipeline:
  minimal: Only the final alignments are output. (Trimmed fastqs are excluded)
  default: Multiple types of alignments are output. (Trimmed fastqs are included)
  all: All files produced by the pipeline (excluding deleted intermediates) are output.
  This option is selected with publish_files.
- publish_mode:
  This mode selects the value for the Nextflow process.publishDir mode used to output files (for details, see: publishDir). Available options are:
  'copy': Copy output files (from the Nextflow working directory) to the output folder.
  'symlink': Link to the output files located in the Nextflow working directory.
- trim_name_prefix & trim_name_suffix:
  These options allow trimming of a prefix or suffix from sample names (after any merging steps).
- out_dir:
  Location for output of the files.
- refs_dir:
  Location for placing and searching for reference directories.
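Taken together, the output options might be set as in the following sketch. The params block layout, the quoting of the publish_files value, and every value shown (directories, prefix, and suffix) are assumptions for illustration only; the defaults in the task template are expected to work in most cases.

```groovy
// Hypothetical output settings; all values are placeholders.
params {
    publish_files    = 'default'         // one of: minimal, default, all
    publish_mode     = 'copy'            // or 'symlink'
    trim_name_prefix = 'run01_'          // hypothetical prefix to strip from sample names
    trim_name_suffix = '_001'            // hypothetical suffix to strip from sample names
    out_dir          = './cnr_output'    // placeholder output location
    refs_dir         = './ref_dbs'       // placeholder reference location
}
```

Choosing 'copy' keeps the published files independent of the Nextflow working directory, whereas 'symlink' saves disk space but leaves the outputs dependent on that directory remaining in place.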