Gene Regulation @ Calanques BCF
Authors : Edlira Nano, Claire Rioualen
This work was funded by the IFB project T5 : https://www.france-bioinformatique.fr/fr/projets2015/t5-project
This program holds shared code to produce workflows for the analysis of Next Generation Sequencing data related to gene regulation: ChIP-seq, RNA-seq, and related technologies, as used at the Calanques BCF (Bioinformatics Core Facility, TAGC & IBDM labs).
This program is a fork of the original France Genomique Workpackage 2.6 - Gene Regulation : https://github.com/rioualen/gene-regulation from Claire Rioualen.
This version of the gene-regulation package follows the guidelines of:
This program is written for the Snakemake workflow management system. The Snakemake workflows call Python, shell and R scripts, as well as several well-known NGS analysis tools.
For this program to run you will need to install the following software :
- R 3+ (debian packaged)
- Python 2.7/3.4 (debian packaged)
- Snakemake 3.4+ (debian packaged)
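A quick way to check that the required programs are reachable on your PATH is sketched below. This is only an illustration and needs Python 3. The executable names (R, python3, snakemake) are assumptions about how the packages are installed on your system, and the helper name find_missing is hypothetical.

```python
import shutil

# Executable names are assumptions: adjust them to match your installation.
REQUIRED = ["R", "python3", "snakemake"]


def find_missing(programs):
    """Return the programs from `programs` that are not found on the PATH."""
    return [name for name in programs if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing programs: " + ", ".join(missing))
    else:
        print("All required programs were found.")
```

The same helper can be called with the names of the optional tools listed further down, depending on the rules you plan to use.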
Depending on the rules and workflow you use in your analysis, you may also need to install the following :
- SRA Toolkit (debian packaged)
- Sickle (debian packaged)
- BWA (debian packaged)
- Bowtie (debian packaged)
- Bowtie 2 (debian packaged)
- SAMtools 1.3+ (debian packaged)
- FastQC 0.11.2+ (debian packaged)
- bedtools (debian packaged)
- HOMER, blast, weblogo/seqlogo (see install.md file)
- SARtools (packaged for R or bioconda, see install.md file)
- MACS 14 (1.4.3)
Installing non debian-packaged software
The blat and weblogo programs are both not debian-packaged and must be installed manually:
blat on Linux:
1. Take the latest blatSrcXX.zip archive from https://users.soe.ucsc.edu/~kent/src/
2. Install it in /usr/local following the instructions from http://nix-bio.blogspot.fr/2013/10/installing-blat-and-blast.html
3. To avoid the "jkweb.a no rule" problem, compile it with make MACHTYPE=$MACHTYPE
You have to install the weblogo archive from http://weblogo.berkeley.edu/
The archive already contains the seqlogo binary file ready for use.
On the SARtools GitHub page, follow the install instructions in the README file to install it either within R or using bioconda.
Alternative installation options
The original project also contains a makefile that installs all the tools and dependencies used by gene-regulation. [NB currently only ChIP-seq dependencies are included, RNA-seq specific tools are to be included soon.]
    make -f gene-regulation/scripts/makefiles/install_tools_and_libs.mk all
    source ~/.bashrc
The original project recommends using one of the tutorials on virtualization, in order to run the workflows under a Unix system without damaging your installation.
Full tutorials can be found in the doc section, including the creation of a virtual machine/docker container, the installation of all the tools and dependencies and the execution of the workflows:
These tutorials have been developed for ChIP-seq studies; however RNA-seq pipelines are soon to be included.
The workflows and the individual rules that form them are written in Snakemake. Snakemake is a programming tool that enables the creation of analysis pipelines, based on the Python language and the make concepts of rules and targets.
A rule contains the commands to generate a given target. A workflow can be defined as a series of files generated by successive rules.
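The idea of rules and targets can be illustrated with a toy sketch in plain Python. This is not Snakemake code and does not use the Snakemake API. The file names and the build_order helper are hypothetical, and the sketch assumes the rules contain no circular dependencies.

```python
# Each hypothetical rule maps a target file to the input files it needs.
RULES = {
    "sample.fastq": [],                  # raw data, no prerequisites
    "sample.bam": ["sample.fastq"],      # produced by a mapping rule
    "sample_peaks.bed": ["sample.bam"],  # produced by a peak-calling rule
}


def build_order(target, rules, done=None):
    """Return the targets to generate, prerequisites first, ending with `target`."""
    if done is None:
        done = []
    # Resolve every input of the target before the target itself.
    for dep in rules[target]:
        build_order(dep, rules, done)
    if target not in done:
        done.append(target)
    return done


print(build_order("sample_peaks.bed", RULES))
# prints ['sample.fastq', 'sample.bam', 'sample_peaks.bed']
```

Asking for the final target is enough: the chain of files leading to it is worked out from the rules, which is the behaviour a workflow relies on.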
This program contains several reusable NGS-specific rules, as well as a few workflow examples for standard ChIP-seq and RNA-seq analyses.
A tutorial on Snakemake basic usage is available in the doc section:
Example case studies (data and workflow)
In the example directory you will find several case studies ready to be tested. Each study includes the following files:
- config.yml, a configuration file that contains the necessary paths and parameters
- samples.tab contains a list of sample IDs, and possibly additional information on the samples
- design.tab contains the samples to be compared (typically, pairs of ChIP/input in a ChIP-seq study)
- README.md, a file describing how to execute the corresponding workflow
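As a rough illustration of how the two tab-delimited files relate to each other, the sketch below reads a sample list and a design table with Python's csv module. The column names (ID, condition, treatment, control) and the file contents are assumptions made up for this example; the files shipped with each case study may be laid out differently.

```python
import csv
import io

# Hypothetical file contents; real files and column names may differ.
SAMPLES_TAB = "ID\tcondition\nS1\tChIP\nS2\tinput\n"
DESIGN_TAB = "treatment\tcontrol\nS1\tS2\n"


def read_tab(handle):
    """Read a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


samples = read_tab(io.StringIO(SAMPLES_TAB))
design = read_tab(io.StringIO(DESIGN_TAB))

sample_ids = [row["ID"] for row in samples]
# Each design row names one comparison, here a ChIP sample against its input.
pairs = [(row["treatment"], row["control"]) for row in design]

print(sample_ids)  # prints ['S1', 'S2']
print(pairs)       # prints [('S1', 'S2')]
```

With real files, the in-memory strings would be replaced by file handles, for example `open("samples.tab", newline="")`.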
Follow the instructions in the README.md file in order to execute the whole workflow, using one of the ready-to-use workflows.
The differential expression analysis performed via SARtools in the workflow can produce text or HTML reports.
An RNA-seq workflow example
The Flavio analysis is a tumor versus control case study on mouse RNA-seq paired-end fastq files. The Flavio workflow we designed performs:
- quality control with fastqc
- mapping with subread-align
- indexing with subread-index
- feature count with subread
- differential expression in R with SARtools (edgeR and DESeq2)
To start building your own workflow for your own data, you may want to check the how_to_build_a_new_workflow guide (incomplete, work in progress).
More documentation can be found in the doc section:
- A Snakemake tutorial section
- General tutorials on NGS tools installation, RSAT installation...
- Instructions for building a virtual machine on the IFB cloud / under VirtualBox / using a Docker image in order to run a snakemake workflow
Some general information about NGS can be found in the Wiki section.
At the BCF:
At the TAGC: (original gene-regulation package)
- Köster, Johannes and Rahmann, Sven. "Snakemake - A scalable bioinformatics workflow engine". Bioinformatics 2012.