Snakemake workflow: PacBio Iso-Seq processing pipeline

Introduction

A Snakemake workflow for processing PacBio raw subreads.bam into polished mRNA isoforms in FASTA format.
Optionally, the long assembled mRNAs can be aligned against a genomic reference to generate a genome annotation in GFF3 format.

Steps

The workflow follows the standard Iso-Seq analysis, which consists of the following steps:

Get Circular Consensus Sequence (CCS) reads.
Get Full Length (FL) reads.
Get refined Full-Length, Non-Concatemer (FLNC) reads.
Get transcript isoforms from (refined and clustered) FLNC reads.
Optionally, align these transcript isoforms to a genome reference and create a GFF3 annotation file.
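The four core steps above can be sketched as a dry run that only prints one command per step. This is an illustration, not the workflow's actual rules: it assumes the steps wrap the ccs, lima and isoseq3 tools from pbbioconda, and every file name (including primers.fasta) is a hypothetical placeholder. The optional alignment step is left out because this README does not name the aligner or the GFF3 converter.

```shell
# Dry-run sketch: print the command for each Iso-Seq step instead of running it.
# Assumption: the steps map onto the ccs, lima and isoseq3 command-line tools.
# All file names below are hypothetical placeholders.
isoseq_dry_run() {
    sample="$1"
    # Step 1: Circular Consensus Sequence (CCS) reads from raw subreads.
    printf 'ccs %s.subreads.bam %s.ccs.bam\n' "$sample" "$sample"
    # Step 2: primer removal to obtain Full Length (FL) reads.
    # Note: lima usually appends the primer pair to the output file name;
    # the name is simplified here.
    printf 'lima %s.ccs.bam primers.fasta %s.fl.bam --isoseq\n' "$sample" "$sample"
    # Step 3: refine FL reads into Full-Length, Non-Concatemer (FLNC) reads.
    printf 'isoseq3 refine %s.fl.bam primers.fasta %s.flnc.bam\n' "$sample" "$sample"
    # Step 4: cluster FLNC reads into transcript isoforms.
    printf 'isoseq3 cluster %s.flnc.bam %s.clustered.bam --use-qvs\n' "$sample" "$sample"
}

isoseq_dry_run sample1
```

Printing the commands rather than executing them keeps the sketch runnable on a machine without the PacBio tools installed; the flags actually used by the workflow are defined in its rules and may differ.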

PacBio Iso-Seq terminology

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog and also here.

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and its DOI (see above).

Install conda and mamba

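Each rule runs in its own dedicated Conda/Mamba environment, so both conda and mamba must be available on the machine where you run the workflow (for example, the crunchomics cluster).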

To install the conda package manager from the lightweight Miniconda distribution, follow the instructions here.

To install the mamba package manager, follow the instructions here.

Create a 'snakemake' environment

This will be your starting environment with:

Snakemake version 7.0.1.
Pandas version 1.4.3.
Biopython version 1.79.

To create it, run mamba env create -f config/environment.yaml to install these three Python dependencies.
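As a hedged sketch, the creation step can be wrapped in a small function that checks its prerequisites before calling mamba. The function name create_snakemake_env is hypothetical; only the mamba env create -f config/environment.yaml command comes from this README.

```shell
# Sketch: create the starting environment only when the prerequisites are met.
# create_snakemake_env is a hypothetical helper, not part of the workflow.
create_snakemake_env() {
    env_file="${1:-config/environment.yaml}"
    # Fail early if the environment definition is missing.
    if [ ! -f "$env_file" ]; then
        echo "Environment file not found: $env_file" >&2
        return 1
    fi
    # Fail early if mamba is not installed or not on PATH.
    if ! command -v mamba >/dev/null 2>&1; then
        echo "mamba not found on PATH; install it first" >&2
        return 2
    fi
    mamba env create -f "$env_file"
}
```

Call create_snakemake_env from the repository root, then activate the new environment. The environment name is whatever the name field in config/environment.yaml says; it is presumably snakemake, in which case conda activate snakemake would follow.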

Run Snakemake with conda

Snakemake will use the conda environment defined in envs/ for each rule. It installs these environments with mamba, so make sure mamba is available by running which mamba.

To run Snakemake interactively, execute snakemake --use-conda -j X, where X is the number of cores to use.
Otherwise, submit the job through the SLURM job manager: sbatch pacbio_snakemake_sbatch.sh.
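For interactive runs, the value of X can be filled in automatically. The sketch below builds the command line from a core count, defaulting to the number of cores detected on the machine; the helper name snakemake_command is hypothetical, and it prints the command rather than running it.

```shell
# Sketch: build the interactive Snakemake command for a given number of cores.
# snakemake_command is a hypothetical helper; it only prints the command.
snakemake_command() {
    # Default to the detected core count when no argument is given.
    cores="${1:-$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)}"
    # Reject anything that is not a positive integer.
    case "$cores" in
        ''|*[!0-9]*|0)
            echo "Number of cores must be a positive integer: '$cores'" >&2
            return 1
            ;;
    esac
    printf 'snakemake --use-conda -j %s\n' "$cores"
}
```

For example, snakemake_command 4 prints snakemake --use-conda -j 4, which can then be pasted into the terminal. Adding -n to that command asks Snakemake for a dry run that lists the planned jobs without executing them.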

Pipeline maintainers

Tijs Bliek, technician, Plant Development and Epigenetics, SILS, University of Amsterdam.
Marc Galland, support data scientist, Plant Physiology, SILS, University of Amsterdam.

References

PacBio conda tools

https://github.com/PacificBiosciences/pbbioconda

PacBio Iso-Seq workflow

TODO

Replace <owner> and <repo> everywhere in the template (also under .github/workflows) with the correct <repo> name and owning user or organization.
Replace <name> with the workflow name (can be the same as <repo>).
Replace <description> with a description of what the workflow does.
The workflow will appear in the snakemake-workflow-catalog once it has been made public. The link under "Usage" will then point to the usage instructions, provided <owner> and <repo> were set correctly.
