
Gene Regulation @ Calanques BCF

Authors : Edlira Nano, Claire Rioualen

This work was funded by the IFB project T5 : https://www.france-bioinformatique.fr/fr/projets2015/t5-project

This program holds shared code to produce workflows for the analysis of Next Generation Sequencing data related to gene regulation: ChIP-seq, RNA-seq, and related technologies, as used at the Calanques BCF (Bioinformatics Core Facility, TAGC & IBDM labs).

This program is a fork of the original France Genomique Workpackage 2.6 - Gene Regulation project by Claire Rioualen: https://github.com/rioualen/gene-regulation

This version of the gene-regulation package follows the guidelines of:

  1. report of the RNA-seq meeting @ TAGC

Installation

This program is written using the Snakemake workflow management system [1]. The Snakemake workflows call Python, shell and R scripts, as well as several well-known NGS analysis tools.

Most of this software is packaged for Debian-based systems. For the tools that are not, see the install.md instructions file.

Prerequisites

For this program to run, you will need to install the following software:

  • R 3+ (debian packaged)
  • Python 2.7/3.4 (debian packaged)
  • Snakemake 3.4+ (debian packaged)
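Before running a workflow, it can help to confirm that these prerequisites are reachable from the command line. The following is a minimal sketch, not part of the package: the executable names `R`, `python3` and `snakemake` are assumptions and may differ on your system, and the check only looks for presence on the PATH, not for the minimum versions listed above.

```python
import shutil

# Executable names are assumptions; adjust them to match your system.
REQUIRED_TOOLS = ["R", "python3", "snakemake"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

The `which` parameter exists only so that the lookup can be replaced when testing; in normal use the default `shutil.which` is enough.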

Depending on the rules and workflows you use in your analysis, you may also need to install the following:

Installing software not packaged for Debian

Homer install

Homer needs the seqlogo and blat programs to run; neither is packaged for Debian.

To install blat on Linux:

  1. take the latest blatSrcXX.zip archive from https://users.soe.ucsc.edu/~kent/src/
  2. install it in /usr/local, following the instructions from http://nix-bio.blogspot.fr/2013/10/installing-blat-and-blast.html
  3. to avoid the "jkweb.a no rule" problem, compile it with make MACHTYPE=$MACHTYPE

To install seqlogo:

Install the weblogo archive from http://weblogo.berkeley.edu/. The archive already contains the seqlogo binary, ready for use.

SARtools install

On the SARtools GitHub page, follow the install instructions in the README file to install it either within R or using bioconda.

Alternative installation options

The original project also contains a makefile that installs all the tools and dependencies used by gene-regulation. [NB: currently only ChIP-seq dependencies are included; RNA-seq specific tools are to be included soon.]

make -f gene-regulation/scripts/makefiles/install_tools_and_libs.mk all
source ~/.bashrc

Virtualization

The original project recommends following one of its virtualization tutorials, in order to run the workflows under a Unix system without damaging your installation. Full tutorials can be found in the doc section: doc/gene-regulation_tutorials. They cover the creation of a virtual machine or Docker container, the installation of all the tools and dependencies, and the execution of the workflows. These tutorials were developed for ChIP-seq studies; RNA-seq pipelines are to be included soon.

The workflows

The workflows and the individual rules that form them are written in Snakemake. Snakemake is a programming tool for creating analysis pipelines, based on the Python language and on the make concepts of rules and targets.

A rule contains the commands to generate a given target. A workflow can be defined as a series of files generated by successive rules.
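To make the notions of rule and target concrete, here is a toy illustration in plain Python. It is not how Snakemake is implemented, and the file names and rule names are hypothetical. Each rule is keyed by the file it produces (its target) and lists the files it needs; asking for a final target yields the rules to run, in dependency order.

```python
# Toy model of "rules" and "targets"; file and rule names are hypothetical.
# Each entry maps an output file (the target) to the rule that produces it.
RULES = {
    "sample.bam": {"name": "align", "inputs": ["sample.fastq", "genome.index"]},
    "genome.index": {"name": "index", "inputs": ["genome.fa"]},
    "sample.counts": {"name": "count", "inputs": ["sample.bam"]},
}


def plan(target, rules, done=None):
    """Return the rule names to run, in order, to produce `target`."""
    if done is None:
        done = []
    rule = rules.get(target)
    if rule is None:
        # No rule produces this file: assume it is a source file that exists.
        return done
    # Resolve every input first, so that dependencies run before this rule.
    for dependency in rule["inputs"]:
        plan(dependency, rules, done)
    if rule["name"] not in done:
        done.append(rule["name"])
    return done


print(plan("sample.counts", RULES))  # ['index', 'align', 'count']
```

This sketch does no cycle detection and never checks whether files already exist or are up to date; it only shows how a series of files generated by successive rules forms a workflow.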

This program contains several reusable NGS-specific rules, as well as a few workflow examples for standard ChIP-seq and RNA-seq analyses.

A tutorial on Snakemake basic usage is available in the doc section: doc/snakemake_tutorial.

Example study cases (data and workflow)

In the examples directory you will find several case studies ready to be tested. Each study includes the following files:

  • config.yml, a configuration file that contains the necessary paths and parameters
  • samples.tab contains a list of sample IDs, and possibly any additional info on samples
  • design.tab contains the samples to be compared (typically, pairs of ChIP/input in a ChIP-seq study)
  • README.md, a file describing how to execute the corresponding workflow

Follow the instructions in the README.md file in order to execute the whole workflow, using one of the ready-to-use workflows.
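As an illustration of how samples.tab and design.tab relate to each other, the sketch below reads two small tab-separated tables and checks that every sample named in the design is declared in the sample list. The column names (`ID`, `Type`, `Treatment`, `Control`) and the sample IDs are invented for the example; the actual files in each study may use a different layout.

```python
import csv
import io


def read_tab(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical file contents; the real column names and IDs may differ.
SAMPLES_TAB = "ID\tType\nGSM1\tChIP\nGSM2\tinput\n"
DESIGN_TAB = "Treatment\tControl\nGSM1\tGSM2\n"

samples = read_tab(io.StringIO(SAMPLES_TAB))
design = read_tab(io.StringIO(DESIGN_TAB))

sample_ids = {row["ID"] for row in samples}


def unknown_samples(design_rows, known_ids):
    """Return sample IDs used in the design but absent from the sample list."""
    return sorted(
        value
        for row in design_rows
        for value in row.values()
        if value not in known_ids
    )


print(unknown_samples(design, sample_ids))  # []
```

With real files, `io.StringIO(...)` would be replaced by `open("samples.tab", newline="")` and the equivalent for design.tab.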

These workflows generate a flowchart of the analysis; for example, here is the chart for the ChIP-seq_SE_GSE20870 study case:

The differential expression analysis performed via SARtools in the workflow can produce text or HTML reports.

An RNA-seq workflow example

The Flavio analysis is a tumor versus control case study of mouse paired-end RNA-seq fastq files. The Flavio workflow we designed performs:

  1. quality control with fastqc
  2. mapping with subread-align
  3. indexing with subread-index
  4. feature count with subread
  5. differential expression in R with SARtools (edgeR and DESeq2)

Here is the chart for this workflow :

Documentation

To start building your own workflow for your data, you may want to check the how_to_build_a_new_workflow guide (incomplete, work in progress).

More documentation can be found in the doc directory.

It includes:

  • A Snakemake tutorial section (snakemake_tutorial)
  • General tutorials on NGS tools installation, RSAT installation... (install_protocols)
  • Instructions for building a virtual machine on the IFB cloud / under VirtualBox / using a Docker image in order to run a snakemake workflow (gene-regulation_tutorials)

Some general information about NGS can be found in the Wiki section.

Contact

At the BCF:

At the TAGC: (original gene-regulation package)

References

  1. Köster, Johannes and Rahmann, Sven. "Snakemake - A scalable bioinformatics workflow engine". Bioinformatics 2012.