while we wait for four five two
a very special plugin, just for you
Nextflow plugin for experimental features that want to become core features!
Currently includes the following features:
- automatic deletion of temporary files (boost.cleanup)
- exec operator for creating an inline native (i.e. exec) process
- mergeCsv function for saving records to a CSV file
- mergeText function for saving items to a text file (similar to the collectFile operator)
- scan operator for, well, scan operations
- then and thenMany operators for defining custom operators in your pipeline
To use nf-boost, include it in your Nextflow config and add any desired settings:
plugins {
    id 'nf-boost'
}

boost {
    cleanup = true
}
The plugin requires Nextflow 23.10.0 or later. Starting with version 0.4.0, it requires Nextflow 24.04.0 or later.
If a release hasn't been published to the main registry yet, you can still use it by specifying the following environment variable so that Nextflow can find the plugin:
export NXF_PLUGINS_TEST_REPOSITORY="https://github.com/bentsherman/nf-boost/releases/download/0.3.2/nf-boost-0.3.2-meta.json"
Check out the examples directory for example pipelines that demonstrate how to use the features in this plugin.
boost.cleanup
Set to true to enable automatic cleanup (default: false). Temporary files will be deleted automatically as soon as they are no longer needed.
The default cleanup observer uses publishDir directives to determine whether a file should be published before it is deleted. Setting boost.cleanup = 'v2' selects an alternate cleanup observer that tracks publishing through the new workflow publish definition instead of publishDir.
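The idea that a file is deleted "as soon as it is no longer needed, but only after it has been published" can be pictured as a reference count over output files. The plain-Python model below is a sketch under stated assumptions and is not the plugin's implementation. It assumes that each file's consumer tasks are known up front and that a file marked for publishing must be published before deletion. The names CleanupTracker, register, task_completed, and file_published are hypothetical.

```python
# Hypothetical model of automatic cleanup; NOT nf-boost's actual code.
# Assumption: a file may be deleted once every task that consumes it has
# completed and, if it is marked for publishing, once it has been published.
from dataclasses import dataclass, field


@dataclass
class FileState:
    consumers: set          # ids of tasks that still need this file
    needs_publish: bool     # whether the file must be published first
    published: bool = False


@dataclass
class CleanupTracker:
    files: dict = field(default_factory=dict)
    deleted: list = field(default_factory=list)

    def register(self, path, consumers, needs_publish=False):
        self.files[path] = FileState(set(consumers), needs_publish)
        self._maybe_delete(path)

    def task_completed(self, task_id):
        # A completed task no longer needs any of the files it consumed.
        for path in list(self.files):
            self.files[path].consumers.discard(task_id)
            self._maybe_delete(path)

    def file_published(self, path):
        if path in self.files:
            self.files[path].published = True
            self._maybe_delete(path)

    def _maybe_delete(self, path):
        state = self.files[path]
        if state.consumers:
            return  # some task still needs the file
        if state.needs_publish and not state.published:
            return  # must be published before it can go
        # A real observer would remove the file from storage here.
        del self.files[path]
        self.deleted.append(path)


tracker = CleanupTracker()
tracker.register("work/ab/sorted.bam", consumers={"INDEX", "STATS"}, needs_publish=True)
tracker.task_completed("INDEX")
tracker.task_completed("STATS")
print(tracker.deleted)  # [] (still waiting to be published)
tracker.file_published("work/ab/sorted.bam")
print(tracker.deleted)  # ['work/ab/sorted.bam']
```

In this model, resume support would need the tracker's state to survive restarts. That is consistent with the resume limitation listed below, though the plugin's real reasons may differ.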
Limitations:
- Resume is not supported with automatic cleanup at this time. Tasks whose outputs were deleted will be re-executed on a resumed run. Resume will be supported when this feature is finalized in Nextflow.
- Helper files and log files created by Nextflow (e.g. .command.run, .command.log) are not deleted. Consider using a cleanup policy on the underlying filesystem or object storage to delete these files automatically over time.
- Input files that are staged into the work directory (e.g. from an HTTP/FTP server or S3 bucket) are not deleted.
- Files created by operators (e.g. collectFile, splitFastq) cannot be tracked and so are not deleted. For optimal performance, consider refactoring such operators into processes:
  - Splitter operators such as splitFastq can also be used as functions in a native process:

        process SPLIT_FASTQ {
            input:
            val(fastq)

            output:
            path(chunks)

            exec:
            chunks = splitFastq(fastq, file: true)
        }

  - The collectFile operator can be replaced with mergeText (in this plugin) in a native process. See the examples directory for example usage.
boost.cleanupInterval
Specify how often to scan for cleanup (default: '60s').
mergeCsv( records, path, [opts] )
Save a list of records (i.e. tuples or maps) to a CSV file.
Available options:
- header: When true, the keys of the first record are used as the column names (default: false). Can also be a list of column names.
- sep: The character used to separate values (default: ',').
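The header and sep options are easiest to see on concrete output. The sketch below is a plain-Python model of the documented options and is not the plugin's code. Quoting and value formatting come from Python's csv module and may differ from nf-boost's output. The behaviours marked "Assumption" are guesses: rejecting header=True for tuple records, and writing map records in the first record's key order. merge_csv is a hypothetical name.

```python
# Plain-Python model of mergeCsv's header/sep options; NOT the plugin's code.
import csv
import io


def merge_csv(records, header=False, sep=","):
    """Return CSV text for a list of tuples or dicts (a model of mergeCsv)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, lineterminator="\n")
    first = records[0] if records else None
    keys = list(first.keys()) if isinstance(first, dict) else None
    if header is True:
        if keys is None:
            # Assumption: header=True only makes sense for map records.
            raise ValueError("header=True requires map (dict) records in this model")
        writer.writerow(keys)
    elif isinstance(header, (list, tuple)):
        writer.writerow(header)  # explicit column names
    for record in records:
        if isinstance(record, dict):
            # Assumption: every map is written in the key order of the first one.
            writer.writerow([record[k] for k in keys])
        else:
            writer.writerow(list(record))
    return buf.getvalue()


rows = [{"id": "s1", "reads": 100}, {"id": "s2", "reads": 250}]
print(merge_csv(rows, header=True), end="")
# id,reads
# s1,100
# s2,250
```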
mergeText( items, path, [opts] )
Save a list of items (i.e. files or strings) to a text file.
Available options:
- keepHeader: Prepend the resulting file with the header of the first file (default: false). The number of header lines is set by the skip option, which determines how many lines are removed from the beginning of each file.
- newLine: Append a newline character after each entry (default: false).
- skip: The number of lines to skip at the beginning of each entry (default: 1 when keepHeader is true, 0 otherwise).
exec( name, body )
The exec operator creates and invokes an inline native (i.e. exec) process with the given name. The closure corresponds to the exec: section of a native process.
The inline process can be configured from the config file like any other process, including the use of process selectors (i.e. withName).
Limitations:
- Inline process directives are not supported yet.
- The inline exec body should accept a single value and return a single value. Multiple inputs/outputs are not supported yet.
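Given the single-input, single-output limitation, an inline exec process behaves roughly like a named map over the incoming values. In this model, each value becomes one task. The plain-Python model below sketches only that shape and is not Nextflow code. It ignores concurrency, caching, and output ordering, none of which are guaranteed to match a real process. The function name inline_exec and the "NAME (n)" labels are illustrative assumptions.

```python
# Hypothetical model of exec( name, body ): one task per incoming value,
# each applying a one-argument body and producing one result.
# Concurrency, caching, and output ordering of real tasks are ignored.

def inline_exec(name, body, values):
    results = []
    for index, value in enumerate(values, start=1):
        label = f"{name} ({index})"  # illustrative task label
        results.append((label, body(value)))
    return results


print(inline_exec("SQUARE", lambda x: x * x, [1, 2, 3]))
# [('SQUARE (1)', 1), ('SQUARE (2)', 4), ('SQUARE (3)', 9)]
```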
scan( [seed], accumulator )
The scan operator is similar to reduce -- it applies an accumulator function sequentially to each value in a channel -- however, whereas reduce only emits the final result, scan emits each partially accumulated value.
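The difference between reduce and scan is easiest to see side by side. The sketch below is plain Python over a list and is not the plugin's code. The following seed handling is an assumption, since it is not documented here: without a seed, the first value is emitted as-is (as Python's itertools.accumulate does); with a seed, the seed is used as the starting accumulator but is not itself emitted.

```python
# reduce returns only the final result; scan returns every partial result.
# Assumption: seed handling mirrors the description in the lead-in, which
# may differ from the plugin's actual behavior.
from functools import reduce

_NO_SEED = object()


def scan(values, accumulator, seed=_NO_SEED):
    results = []
    it = iter(values)
    if seed is _NO_SEED:
        try:
            acc = next(it)
        except StopIteration:
            return results  # nothing to accumulate
        results.append(acc)
    else:
        acc = seed
    for value in it:
        acc = accumulator(acc, value)
        results.append(acc)
    return results


values = [1, 2, 3, 4]
print(reduce(lambda a, b: a + b, values))            # 10
print(scan(values, lambda a, b: a + b))              # [1, 3, 6, 10]
print(scan(values, lambda a, b: a + b, seed=100))    # [101, 103, 106, 110]
```

The last element of a scan is always the value reduce would have produced, which is why scan is often used to watch a running total.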
then( onNext, [opts] )
then( [others...], opts )
thenMany( onNext, emits: <emits>, [opts] )
thenMany( [others...], emits: <emits>, opts )
The then operator is a generic operator that can be used to implement nearly any operator you can imagine.
It accepts any of three event handlers: onNext, onComplete, and onError (similar to subscribe). Each event handler has access to the following methods:
- emit( value ): emit a value to the output channel (used only by then)
- emit( name, value ): emit a value to an output channel (used only by thenMany)
- done(): signal that no more values will be emitted
When there is only one source channel, the done() method will be called automatically when the source channel sends the onComplete event. You can still call it manually, e.g. to finalize the output earlier. When there are multiple source channels, you are responsible for calling done() at the appropriate time -- if you don't call it, your operator will wait forever.
When there are multiple source channels, onNext and onComplete events are synchronized. This way, you don't need to worry about making your event handlers thread-safe, because they will be invoked on one event at a time.
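The handler contract above can be made concrete with a small event loop. The plain-Python model below is a sketch under stated assumptions and is not the plugin's API. ThenModel is a hypothetical name, and handlers receive the operator explicitly as op instead of calling emit/done on an implicit delegate. Multiple sources are delivered round-robin, one event at a time, which mimics the synchronization guarantee without real threads. A missing done() with multiple sources raises an error instead of hanging.

```python
# Hypothetical model of the `then` event-handler contract; NOT nf-boost's API.
# Handlers get the operator as `op` and may call op.emit(value) or op.done().


class ThenModel:
    def __init__(self, on_next, on_complete=None):
        self.on_next = on_next
        self.on_complete = on_complete
        self.output = []
        self.finished = False

    def emit(self, value):
        self.output.append(value)

    def done(self):
        self.finished = True

    def run(self, *sources):
        multi = len(sources) > 1
        iters = [iter(source) for source in sources]
        active = list(range(len(sources)))
        # Deliver events round-robin, one at a time, so handlers never overlap.
        while active and not self.finished:
            for i in list(active):
                if self.finished:
                    break  # done() was called: stop delivering events
                try:
                    value = next(iters[i])
                except StopIteration:
                    active.remove(i)
                    if self.on_complete is not None:
                        if multi:
                            self.on_complete(self, i)  # index of the source
                        else:
                            self.on_complete(self)
                    continue
                if multi:
                    self.on_next(self, value, i)
                else:
                    self.on_next(self, value)
        if not multi:
            self.done()  # single source: done() is implied on completion
        if not self.finished:
            raise RuntimeError("done() was never called; the real operator would wait forever")
        return self.output


# Single source: keep even numbers (done() is implied).
evens = ThenModel(lambda op, v: op.emit(v) if v % 2 == 0 else None)
print(evens.run([1, 2, 3, 4]))  # [2, 4]

# Two sources: tag each value with its source index and call done()
# only after both sources have completed.
completed = set()


def on_complete(op, i):
    completed.add(i)
    if len(completed) == 2:
        op.done()


tagged = ThenModel(lambda op, v, i: op.emit((i, v)), on_complete)
print(tagged.run(["a", "b"], ["x"]))  # [(0, 'a'), (1, 'x'), (0, 'b')]
```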
Available options:
- emits: List of output channel names when using thenMany. Whereas then emits a single channel, thenMany emits a multi-channel output (similar to processes and workflows) where each output can be accessed by name.
- onNext( value, [i] ): Closure that is invoked when a value is emitted by a source channel. Equivalent to providing a closure as the first argument. When there are multiple source channels, the closure is invoked with a second argument corresponding to the index of the source channel.
- onComplete( [i] ): Closure that is invoked after the last value is emitted by a source channel. When there are multiple source channels, the closure is invoked with the index of the source channel.
- onError( error ): Closure that is invoked when an exception is raised while handling an onNext event. It is invoked with the exception that caused the error. No further calls will be made to onNext or onComplete after this event. By default, the error is logged and the workflow is terminated.
- singleton: Whether the output channel should be a value (i.e. singleton) channel. By default, it is determined by the source channel: if the source is a value channel, the output will also be a value channel, and vice versa.
The easiest way to build and test nf-boost locally is to run make install. This builds the plugin and installs it to your Nextflow plugins directory (e.g. ~/.nextflow/plugins), using the version defined in MANIFEST.MF. Then specify the plugin in your Nextflow config with this exact version, and you can use it locally like a regular plugin.
Refer to the nf-hello README for more information about building and publishing Nextflow plugins.