Quick overview


This document describes the differential expression part of the workflow.

An analysis should not take more than 10 minutes.

Set up Tools


First, check that the following tools are installed on your server or computer.

The scripts provided here are written in Python 3.
Installing Conda is not required, but it is advised if Python 3 is not set up on your computer.
It makes it easier to install tools or to switch to an older Python version if needed.

Conda : here

DIFFERENTIAL EXPRESSION :

R : R here

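Install the following packages for R:

	# biocLite is the legacy Bioconductor installer (R < 3.5); on newer R versions, use BiocManager::install() instead.
	source("https://bioconductor.org/biocLite.R")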
	biocLite("data.table")
	biocLite("reshape2")
	biocLite("edgeR")
	biocLite("DESeq2")
	biocLite("limma")
	biocLite("RColorBrewer")
	biocLite("gplots")
	biocLite("heatmap3")
	biocLite("grDevices")
	biocLite("genefilter")
	biocLite("ggplot2")
	biocLite("GenomicFeatures")
	biocLite("AnnotationDbi")
	biocLite("biomaRt")
	biocLite("stringr")
	biocLite("org.Hs.eg.db")
	#biocLite("vsn")
	biocLite("plyr")
	biocLite("pheatmap")
	biocLite("PoiClaClu")
	biocLite("gtools")

Create config file in json format


You need an init.json and a diff_exp.json to launch this script.

init.json is loaded automatically. Create the file in the configs directory.

Only the scriptDir variable needs to be set in your init.json:

	"scriptDir"                      : "/home/jean-philippe.villemin/code/RNA-SEQ/",

For an overview of the JSON files, look in the configs directory.
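As a minimal sketch, an init.json containing only the scriptDir key could be generated as follows. The helper name `write_init_config` and the example paths are hypothetical; the only detail taken from this document is that scriptDir is the one variable that must be set.

```python
import json
from pathlib import Path


def write_init_config(config_dir, script_dir):
    """Write a minimal init.json holding only the scriptDir key.

    Hypothetical helper: the workflow itself only requires that the file
    exists in the configs directory with scriptDir set.
    """
    config_path = Path(config_dir) / "init.json"
    # Create the configs directory if it does not exist yet.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps({"scriptDir": script_dir}, indent=4))
    return config_path


if __name__ == "__main__":
    # Example paths are placeholders; replace them with your own.
    print(write_init_config("configs", "/path/to/code/RNA-SEQ/"))
```

Any other keys present in the example files of the configs directory are left out here on purpose.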

Here is an example of diff_exp.json:

[image: example diff_exp.json]

Launch differential expression


	python3 pathTo/diffGeneExp.py -c pathToConfigFile/diff_exp.json -p TestConditionName_vs_NormalConditionName

This script is a wrapper that calls an R script named diff_exp.R. diff_exp.R uses the Design.csv and Raw_read_counts.csv files that the Python wrapper creates from the JSON configuration file.

Design.csv & Raw_read_counts.csv should be in $path_to_output/output/$project_name/ directory.
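Before going further, it can help to confirm that both files are where the R script expects them. The sketch below assumes the directory layout stated above; the function names `expected_inputs` and `missing_inputs` are hypothetical and not part of the workflow.

```python
from pathlib import Path


def expected_inputs(path_to_output, project_name):
    """Return the expected paths of Design.csv and Raw_read_counts.csv."""
    base = Path(path_to_output) / "output" / project_name
    return base / "Design.csv", base / "Raw_read_counts.csv"


def missing_inputs(path_to_output, project_name):
    """Return the expected input files that are not present on disk."""
    return [p for p in expected_inputs(path_to_output, project_name)
            if not p.is_file()]


if __name__ == "__main__":
    # Placeholder values; use your own output path and project name.
    for path in missing_inputs("/path/to/results", "my_project"):
        print(f"Missing input file: {path}")
```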

If you already have Design.csv and Raw_read_counts.csv, you can run the R script directly as follows:

	Rscript ${PATH_TO_SCRIPT}/diff_exp.R --dir ${PATH_TO_DATA}/[DIR_NAME] --cond1 [COND1] --cond2 [COND2] ${PATH_TO_DATA}/[DESIGN.csv] ${PATH_TO_DATA}/[GENE_READ_COUNT.csv]
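The same call can be assembled from Python, for instance to run several comparisons in a loop. This is a sketch that mirrors the command template above; `build_rscript_command` is a hypothetical helper, and all argument values in the example are placeholders.

```python
from posixpath import join


def build_rscript_command(script_dir, data_dir, dir_name,
                          cond1, cond2, design_csv, counts_csv):
    """Build the argument list for a direct call to diff_exp.R.

    The argument order follows the command template shown above:
    options first, then the design file, then the read-count file.
    """
    return [
        "Rscript", join(script_dir, "diff_exp.R"),
        "--dir", join(data_dir, dir_name),
        "--cond1", cond1,
        "--cond2", cond2,
        join(data_dir, design_csv),
        join(data_dir, counts_csv),
    ]


if __name__ == "__main__":
    cmd = build_rscript_command("/path/to/scripts", "/path/to/data", "my_project",
                                "Treated", "Control",
                                "Design.csv", "Raw_read_counts.csv")
    print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without going through a shell, which avoids quoting problems with paths that contain spaces.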

This is how Design.csv should look:

[image: example Design.csv]

When you call the script, the -p parameter must be set in the form TestConditionName_vs_NormalConditionName. The condition names must match what you wrote in Design.csv.

Note: the last column is not needed.
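To illustrate the naming convention of the -p parameter, the sketch below splits a comparison name into its test and normal conditions and checks them against a set of condition names. Both function names are hypothetical, and the set of conditions is assumed to have been read from Design.csv by other means, since the column layout is not described here.

```python
def split_comparison(name):
    """Split 'Test_vs_Normal' into ('Test', 'Normal').

    The split happens at the first '_vs_', so condition names should not
    themselves contain that separator.
    """
    test, sep, normal = name.partition("_vs_")
    if not sep or not test or not normal:
        raise ValueError(f"Expected Test_vs_Normal, got: {name!r}")
    return test, normal


def unknown_conditions(name, design_conditions):
    """Return the conditions in the comparison name absent from the design."""
    return [c for c in split_comparison(name) if c not in design_conditions]


if __name__ == "__main__":
    # Placeholder condition names for illustration only.
    print(split_comparison("Treated_vs_Control"))
    print(unknown_conditions("Treated_vs_Control", {"Treated", "Untreated"}))
```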

This is how Raw_read_counts.csv should look:

[image: example Raw_read_counts.csv]

Finally, you get the following directories as output:

[image: output directories]