nf-iav-illumina is a bioinformatics analysis pipeline for assembly and H/N subtyping of Influenza A virus Illumina sequence data.
The pipeline is implemented in Nextflow and performs assembly with IRMA, followed by H/N subtyping by nucleotide BLAST against the NCBI Influenza DB.
- Download latest NCBI Influenza DB sequences and metadata (or use user-specified files)
- Merge reads of re-sequenced samples (`cat`) (if needed)
- Assembly of Influenza gene segments with IRMA using the built-in FLU module
- Nucleotide BLAST search against NCBI Influenza DB
- H/N subtype prediction and Excel XLSX report generation based on BLAST results
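The read-merging step works because concatenated gzip members still form a valid gzip stream, so re-sequenced FASTQ files can be joined with plain `cat` without decompressing them first. The following is a minimal sketch of that idea only; the file names and the toy reads are hypothetical, and the pipeline's own merging process may differ in detail.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical example data: two sequencing runs of the same sample.
workdir="$(mktemp -d)"
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/sampleA_run1_R1.fastq.gz"
printf '@read2\nTGCA\n+\nIIII\n' | gzip -c > "$workdir/sampleA_run2_R1.fastq.gz"

# Concatenated gzip members form a valid gzip stream, so plain `cat` is enough.
cat "$workdir/sampleA_run1_R1.fastq.gz" "$workdir/sampleA_run2_R1.fastq.gz" \
  > "$workdir/sampleA_merged_R1.fastq.gz"

# Each FASTQ record spans 4 lines, so the read count is lines / 4.
merged_lines="$(gzip -dc "$workdir/sampleA_merged_R1.fastq.gz" | wc -l)"
merged_reads=$(( merged_lines / 4 ))
echo "merged reads: $merged_reads"
```

For paired-end data the same concatenation would be applied separately to the R1 and R2 files, keeping the run order identical so that read pairs stay in sync.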
- Install Nextflow (`>=21.04.0`).
- Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort).
- Download the pipeline and test it on a minimal dataset with a single command:
  `nextflow run peterk87/nf-iav-illumina -profile test,<docker/singularity/podman/shifter/charliecloud/conda>`
  - If you are using `singularity` then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the `nf-core download` command to pre-download all of the required containers before running the pipeline and to set the `NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir` Nextflow options to be able to store and re-use the images from a central location for future pipeline runs.
  - If you are using `conda`, it is highly recommended to use the `NXF_CONDA_CACHEDIR` or `conda.cacheDir` settings to store the environments in a central location for future pipeline runs.
- Run your own analysis
  - [Optional] Generate an input samplesheet from a directory containing Illumina FASTQ files (e.g. `/path/to/illumina_run/Data/Intensities/Basecalls/`) with the included Python script `fastq_dir_to_samplesheet.py` before you run the pipeline (requires Python 3 installed locally), e.g.
    `python ~/.nextflow/assets/peterk87/nf-iav-illumina/bin/fastq_dir_to_samplesheet.py -i /path/to/illumina_run/Data/Intensities/Basecalls/ -o samplesheet.csv`
  - Typical command:
    `nextflow run peterk87/nf-iav-illumina --input samplesheet.csv -profile <docker/singularity/podman/shifter/charliecloud/conda>`
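The cache settings mentioned in the steps above can be set as environment variables in the shell before launching Nextflow. The sketch below assumes a per-user cache root under the home directory; that location and the `cache_root` variable are hypothetical choices, and a shared filesystem path may suit a multi-user system better.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical central cache root; adjust to a shared location on your system.
cache_root="${HOME:-/tmp}/nextflow-cache"

# Environment variables read by Nextflow for image and environment caching.
export NXF_SINGULARITY_CACHEDIR="$cache_root/singularity"
export NXF_CONDA_CACHEDIR="$cache_root/conda"
mkdir -p "$NXF_SINGULARITY_CACHEDIR" "$NXF_CONDA_CACHEDIR"

echo "Singularity images: $NXF_SINGULARITY_CACHEDIR"
echo "Conda environments: $NXF_CONDA_CACHEDIR"

# Then launch the pipeline as usual, for example:
# nextflow run peterk87/nf-iav-illumina --input samplesheet.csv -profile singularity
```

Adding the two `export` lines to a shell startup file makes the cache location persist across sessions, so images and environments are downloaded once and re-used by later runs.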
The nf-iav-illumina pipeline builds on the following resources and projects:
- NCBI Influenza FTP site
- IRMA Iterative Refinement Meta-Assembler
- nf-core project for establishing Nextflow workflow development best-practices, nf-core tools and nf-core modules
- nf-core/viralrecon for inspiration and setting a high standard for viral sequence data analysis pipelines
- Conda and Bioconda project for making it easy to install, distribute and use bioinformatics software.
- Biocontainers for automatic creation of Docker and Singularity containers for bioinformatics software in Bioconda