Add header to NF files #152

Merged 4 commits on Jun 20, 2022
95 changes: 9 additions & 86 deletions main.nf
@@ -12,87 +12,16 @@
@Organization : JAX Li Lab
----------------------------------------------------------------------------------------
*/
// We now support both the latest and older versions, because Lifebit CloudOS only supports 21.04
// We now support both the latest and older versions, because Lifebit CloudOS only supports 20.04
// Note: NXF_VER=20.04.1 nextflow run main.nf -profile test,singularity
if( nextflow.version.matches(">= 20.07.1") ){
nextflow.enable.dsl=2
nextflow.enable.dsl = 2
} else {
// Support lower version of nextflow
nextflow.preview.dsl=2
nextflow.preview.dsl = 2
}

def helpMessage() {
log.info"""
NANOME - Nextflow PIPELINE (v$workflow.manifest.version)
by Sheng Li Lab at The Jackson Laboratory
https://github.com/LabShengLi/nanome
=================================
Usage:
The typical command is as follows:

nextflow run LabShengLi/nanome -profile test,docker
nextflow run LabShengLi/nanome -profile test,singularity
nextflow run LabShengLi/nanome -profile [docker/singularity] \\
--dsname DSNAME --input INPUT --genome GENOME

Mandatory arguments:
--dsname Dataset/analysis name
--input Input path for raw fast5 files (folders, tar/tar.gz files)
--genome Genome reference name ('hg38', 'ecoli', or 'hg38_chr22') or a directory, the directory must contain only one .fasta file with .fasta.fai index file. Default is hg38

General options:
--processors Processors used for each task
--outdir Output dir, default is 'results'
--chrSet Chromosomes used in analysis; default is chr1-22, X, and Y for human. For E. coli data, the default is 'NC_000913.3'. For other reference genomes, specify the chromosomes separated by spaces.
--cleanAnalyses If clean old basecalling info in fast5 files
--skipBasecall Skip redo basecalling if users provide basecalled inputs

--cleanup If clean work dir after complete, default is false

Tools specific options:
--run[Tool-name] By default, we run top four performers in nanome paper, specify '--run[Tool-name]' can include other tool, supported tools: NANOME, Megalodon, Nanopolish, DeepSignal, Guppy, Tombo, METEORE, and DeepMod
--rerioDir Rerio dir for Megalodon model, default will get online
--MEGALODON_MODEL Megalodon model name, default is 'res_dna_r941_min_modbases_5mC_v001.cfg'
--guppyDir Guppy installation local directory, used only for conda environment
--GUPPY_BASECALL_MODEL Guppy basecalling model, default is 'dna_r9.4.1_450bps_hac.cfg'
--GUPPY_METHCALL_MODEL Guppy methylation calling model, default is 'dna_r9.4.1_450bps_modbases_5mc_hac.cfg'
--deepsignalDir DeepSignal model dir, default will get online
--tomboResquiggleOptions Tombo resquiggle options for super long/damaged sequencing, set to '--signal-length-range 0 500000 --sequence-length-range 0 50000'
--moveOption If using move table for DeepMod, default is true
--useDeepModCluster If using DeepMod cluster model for human, default is false
--METEOREDir METEORE model dir, default will get online

Running environment options:
--docker_name Docker name used for pipeline, default is 'liuyangzzu/nanome:latest'
--singularity_name Singularity name used for pipeline, default is 'docker://liuyangzzu/nanome:latest'
--singularity_cache Singularity cache dir, default is 'local_singularity_cache'
--conda_name Conda name used for pipeline, default is 'nanome'
--conda_base_dir Conda base directory, default is '/opt/conda'

Platform specific options:
--queue SLURM job submission queue name, e.g., 'gpu'
--qos SLURM job submission QOS name, e.g., 'inference'
--gresOptions SLURM job submission GPU allocation option, e.g., 'gpu:v100:1'
--time SLURM job submission running time, e.g., '2h', '1d'
--memory SLURM job submission memory, e.g., '32GB'

--projectCloud Google Cloud Platform (GCP) project name for google-lifesciences
--config Lifebit CloudOS config file, e.g., 'conf/executors/lifebit.config'

-profile options:
Use this parameter to choose a predefined configuration profile. Profiles can give configuration presets for different compute environments.

test A bundle of input params for ecoli test
test_human A bundle of input params for human test
docker A generic configuration profile to be used with Docker, pulls software from Docker Hub: liuyangzzu/nanome:latest
singularity A generic configuration profile to be used with Singularity, pulls software from: docker://liuyangzzu/nanome:latest
conda Please only use conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity. Check our GitHub for how to install local conda environment
hpc A generic configuration profile to be used on HPC cluster with SLURM
google A generic configuration profile to be used on Google Cloud platform with 'google-lifesciences'

Contact to https://github.com/LabShengLi/nanome/issues for bug report.
""".stripIndent()
}
include {helpMessage} from './modules/HELP'

// Show help message
if (params.help){
@@ -156,17 +85,12 @@ projectDir = workflow.projectDir
ch_utils = Channel.fromPath("${projectDir}/utils", type: 'dir', followLinks: false)
ch_src = Channel.fromPath("${projectDir}/src", type: 'dir', followLinks: false)

// Reference genome, deepmod cluster settings
def referenceGenome = "reference_genome/${params.GENOME_FN}"
def chromSizesFile = "reference_genome/${params.CHROM_SIZE_FN}"

// Reference genome, chrom size file
params.referenceGenome = "${params.GENOME_DIR}/${params.GENOME_FN}"
params.chromSizesFile = "${params.GENOME_DIR}/${params.CHROM_SIZE_FN}"


if (dataType == 'human') { isDeepModCluster = params.useDeepModCluster }
else { isDeepModCluster = false }

params.isDeepModCluster = isDeepModCluster


@@ -184,7 +108,7 @@ if (params.input.endsWith(".filelist.txt")) {
}
.set{ inputCh }
} else if (params.input.contains('*') || params.input.contains('?')) {
// match all files in the folder, note: input must use '', prevent expand in advance
// match all files in the folder; note: the input must be single-quoted ('') to prevent the shell from expanding it in advance
// such as --input '/fastscratch/liuya/nanome/NA12878/NA12878_CHR22/input_chr22/*'
Channel.fromPath(params.input, type: 'any', checkIfExists: true)
.set{ inputCh }
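The comment in this hunk says the `--input` pattern has to be quoted so that it is not expanded in advance. The sketch below illustrates that shell behavior in isolation. It is only an illustration under stated assumptions: `count_args` is a hypothetical helper and the three `.fast5` file names are made up; nothing here is part of the NANOME pipeline.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments the shell actually passed.
count_args() {
    echo "$#"
}

# Scratch directory with three made-up example files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.fast5" "$demo_dir/b.fast5" "$demo_dir/c.fast5"

# Unquoted: the shell expands the pattern before the command sees it,
# so the command receives one argument per matching file.
unquoted=$(count_args $demo_dir/*)

# Quoted: the pattern arrives as a single literal string, which is what
# a program that does its own glob matching would need to receive.
quoted=$(count_args "$demo_dir"'/*')

echo "unquoted pattern -> $unquoted argument(s)"
echo "quoted pattern   -> $quoted argument(s)"

rm -rf "$demo_dir"
```

With the pattern unquoted, a command would receive a list of file paths instead of the pattern itself, which is presumably why the comment asks for `''` around the `--input` value.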
@@ -256,7 +180,6 @@ if (params.hmc) { summary['hmc'] = params.hmc }
if (params.ctg_name) { summary['ctg_name'] = params.ctg_name }



summary['\nModel summary'] = "--------"
if (params.runBasecall && !params.skipBasecall) summary['GUPPY_BASECALL_MODEL'] = params.GUPPY_BASECALL_MODEL
if (params.runMethcall && params.runMegalodon)
@@ -287,8 +210,8 @@ if (workflow.revision) summary['Pipeline Release'] = workflow.revision
if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container"
summary['errorStrategy'] = params.errorStrategy
summary['maxRetries'] = params.maxRetries
if (params.echo) summary['echo'] = params.echo
if (params.cleanup) summary['cleanup'] = params.cleanup
if (params.echo) summary['echo'] = params.echo
if (params.cleanup) summary['cleanup'] = params.cleanup

if (workflow.profile.contains('hpc') || workflow.profile.contains('winter') ||\
workflow.profile.contains('sumner') ) {
@@ -508,7 +431,7 @@ workflow {
if (params.runNewTool && params.newModuleConfigs) {
newModuleCh = Channel.of( params.newModuleConfigs ).flatten()
// ref: https://www.nextflow.io/docs/latest/operator.html#combine
NewTool(newModuleCh.combine(BASECALL.out.basecall), ENVCHECK.out.reference_genome, referenceGenome)
NewTool(newModuleCh.combine(BASECALL.out.basecall), ENVCHECK.out.reference_genome, params.referenceGenome)
NewToolComb(NewTool.out.batch_out.collect(), newModuleCh, ch_src)

s_new = NewToolComb.out.site_unify
13 changes: 13 additions & 0 deletions modules/ALIGNMENT.nf
@@ -1,3 +1,16 @@
/*
=========================================================================================
NANOME(Nanopore methylation) pipeline for Oxford Nanopore sequencing
=========================================================================================
NANOME Analysis Pipeline.
#### Homepage / Documentation
https://github.com/LabShengLi/nanome
@Author : Yang Liu
@FileName : ALIGNMENT.nf
@Software : NANOME project
@Organization : JAX Sheng Li Lab
----------------------------------------------------------------------------------------
*/
// Align each basecalled outputs
process ALIGNMENT {
tag "${basecallDir.baseName}"
13 changes: 13 additions & 0 deletions modules/BASECALL.nf
@@ -1,3 +1,16 @@
/*
=========================================================================================
NANOME(Nanopore methylation) pipeline for Oxford Nanopore sequencing
=========================================================================================
NANOME Analysis Pipeline.
#### Homepage / Documentation
https://github.com/LabShengLi/nanome
@Author : Yang Liu
@FileName : BASECALL.nf
@Software : NANOME project
@Organization : JAX Sheng Li Lab
----------------------------------------------------------------------------------------
*/
// basecall of subfolders named 'M1', ..., 'M10', etc.
process BASECALL {
tag "${fast5Untar.baseName}"
24 changes: 24 additions & 0 deletions modules/COMMONS.nf
@@ -0,0 +1,24 @@
/*
=========================================================================================
NANOME(Nanopore methylation) pipeline for Oxford Nanopore sequencing
=========================================================================================
NANOME Analysis Pipeline.
#### Homepage / Documentation
https://github.com/LabShengLi/nanome
@Author : Yang Liu
@FileName : COMMONS.nf
@Software : NANOME project
@Organization : JAX Sheng Li Lab
----------------------------------------------------------------------------------------
*/
// check nextflow version, then declare DSL2 in two ways
def nextflowVersionCheck() {
// We now support both the latest and older versions, because Lifebit CloudOS only supports 20.04
// Note: NXF_VER=20.04.1 nextflow run main.nf -profile test,singularity
if( nextflow.version.matches(">= 20.07.1") ){
nextflow.enable.dsl = 2
} else {
// Support lower version of nextflow
nextflow.preview.dsl = 2
}
}
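The version gate in `nextflowVersionCheck()` can also be reasoned about from the command line, for example when deciding which `NXF_VER` to pin. The following is a minimal sketch, not the pipeline's own logic: `version_ge` and `dsl_flag_for` are hypothetical helper names, and the comparison relies on `sort -V` (version sort, assumed to be available as in GNU coreutils) as a stand-in for Nextflow's `nextflow.version.matches(">= 20.07.1")`.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Assumes `sort -V` is available; the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: name the DSL2 switch the check above would select
# for a given Nextflow version string.
dsl_flag_for() {
    if version_ge "$1" "20.07.1"; then
        echo "nextflow.enable.dsl"
    else
        echo "nextflow.preview.dsl"
    fi
}

# Example versions chosen for illustration only.
for v in 20.04.1 20.07.1 21.10.6; do
    echo "$v -> $(dsl_flag_for "$v")"
done
```

Under these assumptions, a run pinned with `NXF_VER=20.04.1` would take the `nextflow.preview.dsl` branch, while 20.07.1 and later would take `nextflow.enable.dsl`.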
15 changes: 14 additions & 1 deletion modules/DEEPSIGNAL.nf
@@ -1,3 +1,16 @@
/*
=========================================================================================
NANOME(Nanopore methylation) pipeline for Oxford Nanopore sequencing
=========================================================================================
NANOME Analysis Pipeline.
#### Homepage / Documentation
https://github.com/LabShengLi/nanome
@Author : Yang Liu
@FileName : DEEPSIGNAL.nf
@Software : NANOME project
@Organization : JAX Sheng Li Lab
----------------------------------------------------------------------------------------
*/
// DeepSignal runs on resquiggled subfolders named 'M1', ..., 'M10', etc.
process DEEPSIGNAL {
tag "${indir.baseName}"
@@ -106,4 +119,4 @@ process DPSIGCOMB {
. $task.cpus 12 ${params.sort ? true : false} "${params.chrSet1.replaceAll(',', ' ')}"
echo "### DeepSignal combine DONE"
"""
}
}
13 changes: 13 additions & 0 deletions modules/ENVCHECK.nf
@@ -1,3 +1,16 @@
/*
=========================================================================================
NANOME(Nanopore methylation) pipeline for Oxford Nanopore sequencing
=========================================================================================
NANOME Analysis Pipeline.
#### Homepage / Documentation
https://github.com/LabShengLi/nanome
@Author : Yang Liu
@FileName : ENVCHECK.nf
@Software : NANOME project
@Organization : JAX Sheng Li Lab
----------------------------------------------------------------------------------------
*/
// Check all tools work well
process ENVCHECK {
tag "${params.dsname}"
86 changes: 86 additions & 0 deletions modules/HELP.nf
@@ -0,0 +1,86 @@
/*
=========================================================================================
NANOME(Nanopore methylation) pipeline for Oxford Nanopore sequencing
=========================================================================================
NANOME Analysis Pipeline.
#### Homepage / Documentation
https://github.com/LabShengLi/nanome
@Author : Yang Liu
@FileName : HELP.nf
@Software : NANOME project
@Organization : JAX Sheng Li Lab
----------------------------------------------------------------------------------------
*/
def helpMessage() {
log.info"""
NANOME - Nextflow PIPELINE (v$workflow.manifest.version)
by Sheng Li Lab at The Jackson Laboratory
https://github.com/LabShengLi/nanome
=================================
Usage:
The typical command is as follows:

nextflow run LabShengLi/nanome -profile test,docker
nextflow run LabShengLi/nanome -profile test,singularity
nextflow run LabShengLi/nanome -profile [docker/singularity] \\
--dsname DSNAME --input INPUT --genome GENOME

Mandatory arguments:
--dsname Dataset/analysis name
--input Input path for raw fast5 files (folders, tar/tar.gz files)
--genome Genome reference name ('hg38', 'ecoli', or 'hg38_chr22') or a directory, the directory must contain only one .fasta file with .fasta.fai index file. Default is hg38

General options:
--processors Processors used for each task
--outdir Output dir, default is 'results'
--chrSet Chromosomes used in analysis; default is chr1-22, X, and Y for human. For E. coli data, the default is 'NC_000913.3'. For other reference genomes, specify the chromosomes separated by spaces.
--cleanAnalyses If clean old basecalling info in fast5 files
--skipBasecall Skip redo basecalling if users provide basecalled inputs

--cleanup If clean work dir after complete, default is false

Tools specific options:
--run[Tool-name] By default, we run top four performers in nanome paper, specify '--run[Tool-name]' can include other tool, supported tools: NANOME, Megalodon, Nanopolish, DeepSignal, Guppy, Tombo, METEORE, and DeepMod
--rerioDir Rerio dir for Megalodon model, default will get online
--MEGALODON_MODEL Megalodon model name, default is 'res_dna_r941_min_modbases_5mC_v001.cfg'
--guppyDir Guppy installation local directory, used only for conda environment
--GUPPY_BASECALL_MODEL Guppy basecalling model, default is 'dna_r9.4.1_450bps_hac.cfg'
--GUPPY_METHCALL_MODEL Guppy methylation calling model, default is 'dna_r9.4.1_450bps_modbases_5mc_hac.cfg'
--deepsignalDir DeepSignal model dir, default will get online
--tomboResquiggleOptions Tombo resquiggle options for super long/damaged sequencing, set to '--signal-length-range 0 500000 --sequence-length-range 0 50000'
--moveOption If using move table for DeepMod, default is true
--useDeepModCluster If using DeepMod cluster model for human, default is false
--METEOREDir METEORE model dir, default will get online

Running environment options:
--docker_name Docker name used for pipeline, default is 'liuyangzzu/nanome:latest'
--singularity_name Singularity name used for pipeline, default is 'docker://liuyangzzu/nanome:latest'
--singularity_cache Singularity cache dir, default is 'local_singularity_cache'
--conda_name Conda name used for pipeline, default is 'nanome'
--conda_base_dir Conda base directory, default is '/opt/conda'

Platform specific options:
--queue SLURM job submission queue name, e.g., 'gpu'
--qos SLURM job submission QOS name, e.g., 'inference'
--gresOptions SLURM job submission GPU allocation option, e.g., 'gpu:v100:1'
--time SLURM job submission running time, e.g., '2h', '1d'
--memory SLURM job submission memory, e.g., '32GB'

--projectCloud Google Cloud Platform (GCP) project name for google-lifesciences
--config Lifebit CloudOS config file, e.g., 'conf/executors/lifebit.config'

-profile options:
Use this parameter to choose a predefined configuration profile. Profiles can give configuration presets for different compute environments.

test A bundle of input params for ecoli test
test_human A bundle of input params for human test
docker A generic configuration profile to be used with Docker, pulls software from Docker Hub: liuyangzzu/nanome:latest
singularity A generic configuration profile to be used with Singularity, pulls software from: docker://liuyangzzu/nanome:latest
conda Please only use conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity. Check our GitHub for how to install local conda environment
hpc A generic configuration profile to be used on HPC cluster with SLURM
google A generic configuration profile to be used on Google Cloud platform with 'google-lifesciences'

Contact to https://github.com/LabShengLi/nanome/issues for bug report.
""".stripIndent()
}

14 changes: 14 additions & 0 deletions modules/MEGALODON.nf
@@ -1,3 +1,16 @@
/*
=========================================================================================
NANOME(Nanopore methylation) pipeline for Oxford Nanopore sequencing
=========================================================================================
NANOME Analysis Pipeline.
#### Homepage / Documentation
https://github.com/LabShengLi/nanome
@Author : Yang Liu
@FileName : MEGALODON.nf
@Software : NANOME project
@Organization : JAX Sheng Li Lab
----------------------------------------------------------------------------------------
*/
// Megalodon runs on resquiggled subfolders named 'M1', ..., 'M10', etc.
process MEGALODON {
tag "${fast5Untar.baseName}"
@@ -151,3 +164,4 @@ process MGLDNCOMB {
echo "### Megalodon combine DONE"
"""
}

15 changes: 14 additions & 1 deletion modules/NANOPOLISH.nf
@@ -1,3 +1,16 @@
/*
=========================================================================================
NANOME(Nanopore methylation) pipeline for Oxford Nanopore sequencing
=========================================================================================
NANOME Analysis Pipeline.
#### Homepage / Documentation
https://github.com/LabShengLi/nanome
@Author : Yang Liu
@FileName : NANOPOLISH.nf
@Software : NANOME project
@Organization : JAX Sheng Li Lab
----------------------------------------------------------------------------------------
*/
// Nanopolish runs on resquiggled subfolders named 'M1', ..., 'M10', etc.
process NANOPOLISH {
tag "${basecallDir.baseName}"
@@ -108,4 +121,4 @@ process NPLSHCOMB {

echo "### Nanopolish combine DONE"
"""
}
}