GitHub - ammaraziz/wfi: whofluIRMA

wfi (WhoFlu IRMA)

Snakemake pipeline for running IRMA. Designed for Influenza and RSV Illumina Sequencing.

Note: This pipeline is not ready for general use, while it generally works, it is very brittle.

Requirements:

Linux Distro (or *unix system like MacOS)
Conda
Snakemake
Cutadapt
IRMA
R
R Packages: ggplot2 dplyr stringr tidyr cowplot gridExtra furrr

Installation - short

Install miniconda: https://docs.conda.io/en/latest/miniconda.html

Install the latest mamba:

conda install -n base -c conda-forge mamba

Git clone this repo:

git clone --depth 1 https://github.com/ammaraziz/wfi

Install dependencies:
```
cd wfi
mamba env create -f conda.yaml
```
Activate wfi environment:
```
conda activate wfi
```

Installation - long

Install miniconda: https://docs.conda.io/en/latest/miniconda.html

Install snakemake: https://snakemake.readthedocs.io/en/stable/getting_started/installation.html

conda install -n base -c conda-forge mamba
mamba install -c bioconda -c conda-forge snakemake-minimal python=3.9

Install cutadapt and biopython:

mamba install -c bioconda -c conda-forge cutadapt biopython bbmap

Install R (>3.6 should work) and R packages:

mamba install -c conda-forge r-base 
mamba install -c r r-ggplot2 r-dplyr r-tidyr r-cowplot r-gridExtra r-optparse r-furrr

Note: installing r packages through conda is troublesome for some, if so install manually in R.

Install custom verison of IRMA which contains the RSV module:
```
mamba install -c ammaraziz irma 
```
Finally, download this repository and store in your /bin/

Usage

To use the pipeline, follow these steps:

Navigate to config.yaml and modify as appropriate:

Params	Values	Information
input_dir	path	input directory - location of the raw fastq files for input
output_dir	path	output directory - location to output results - same dir where the config sits
second_assembly	`True`/`False`	if you suspect mixtures, set to `True` . It will increase run time substantially
subset	`True`/`False`	if you are only sequencing HA/NA/MP set this to `True` else leave as `False`
trim_prog	`standard`/`tile`	Trimming program to use, tile (bbduk) or standard (cutadapt)
trim_org	`h1`/`h3`	Influenza only, Flu subtype
technology	`illumina`/`ont`/`pgm`	seq technology used, will change the module by IRMA

Check snakemake is installed, if an error is produced it means snakemake was not found or it is not installed.

% snakemake --version
% 5.10.0

Test the pipeline, this will output all the commands that will be run. Look for errors (red).

% snakemake -nq

Run the pipeline, with option -j to specify number of cores to use.

% snakemake -j 8

Output structure:

Pipeline will output correctly formatted names located in:

{output_dir}/assemblies/rename/

Sorted by subtype - most likely the disired output:

{output_dir}/assemblies/rename/type/FLU{A|B}

IRMA assembly specific files, see: https://wonder.cdc.gov/amd/flu/irma/output.html

{output_dir}/assemblies/{sampleID}/

Files for depth and summary info located in:

{output_dir}/assemblies/{sampleID}/figures/ {output_dir}/assemblies/{sampleID}/tables/

Dependencies

BLAT for the match step
LABEL, which also packages certain resources used by IRMA:
- Sequence Alignment and Modeling System (SAM) for both the rough align and sort steps
- Shogun Toolbox, which is an essential part of LABEL, is used in the sort step
SSW for the final assembly step, download our minor modifications to SSW
samtools for BAM-SAM conversion as well as BAM sorting and indexing
GNU Parallel for single node parallelization
R and these R packages: optparse, ggplot2, dplyr, tidyr, stringr, cowplot, gridExtra

Troubleshooting problems:

1. Error regarding path directories		Check input and output directorys you've specified end with a '/'

2. Error: Nothing to be done			Check config file and ensure you've changed the input/output directories. 

3. A job crashed. What do I do?			Two options, delete the output directory so snakemake can run everything again. 
						Or find out where it crashed and delete the whole folder/sample. 
						Example, sometimes IRMA produces errors, find the sample which crashed, 
						go to assemblies and delete the corresponding {sampleID} folder. Rerun snakemake.
4. I'm very confusd 										
   or I need more help			
   or I've screwed something up badly!		Shoot me an email

For any issues please submit a github issue.

Name		Name	Last commit message	Last commit date
Latest commit History 123 Commits
.github/workflows		.github/workflows
bin		bin
config		config
tools		tools
.gitignore		.gitignore
conda.yaml		conda.yaml
readme.MD		readme.MD
snakefile		snakefile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

wfi (WhoFlu IRMA)

Requirements:

Installation - short

Installation - long

Usage

Output structure:

Dependencies

Troubleshooting problems:

About

Releases 8

Packages

Languages

ammaraziz/wfi

Folders and files

Latest commit

History

Repository files navigation

wfi (WhoFlu IRMA)

Requirements:

Installation - short

Installation - long

Usage

Output structure:

Dependencies

Troubleshooting problems:

About

Resources

Stars

Watchers

Forks

Releases 8

Packages 0

Languages

Packages