Mfuzz_RNAseq.R

An R script to cluster gene expression time-series RNA-seq data with Mfuzz.

Required R libraries: optparse, tools, Mfuzz, GenomicFeatures, DESeq, edgeR

Mfuzz webpage: http://mfuzz.sysbiolab.eu/
Mfuzz paper: http://w3.ualg.pt/%7Emfutschik/publications/bioinformation.pdf

Mfuzz_RNAseq.R takes as input a set of RNA-seq count tables, one per sample (for example, produced by HTSeq-count). All the count tables must be in the same folder, and this folder is given as input to the script.

For example, consider a folder containing four count files: Sample1.txt, Sample2.txt, Sample3.txt and Sample4.txt.

Sample1.txt contains the following tab-separated data, without a header:

GeneID1	S1Count1
GeneID2	S1Count2
GeneID3	S1Count3
GeneID4	S1Count4

Mfuzz_RNAseq.R reads all the files and merges them into a single table:

GeneID	Sample1	Sample2	Sample3	Sample4
GeneID1	S1Count1	S2Count1	S3Count1	S4Count1
GeneID2	S1Count2	S2Count2	S3Count2	S4Count2
GeneID3	S1Count3	S2Count3	S3Count3	S4Count3
GeneID4	S1Count4	S2Count4	S3Count4	S4Count4
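The merge step can be illustrated with a short Python sketch. It is not the script's R code: it assumes every file in the folder is a header-less, tab-separated two-column table (gene ID, count), that all files list the same gene IDs, that files are taken in alphabetical order, and that the sample name is the file name without its extension. It also skips the `__`-prefixed summary rows (such as `__no_feature`) that HTSeq-count appends to its output. `merge_count_tables` is a hypothetical name.

```python
import csv
import os
import tempfile


def merge_count_tables(folder):
    """Merge header-less, tab-separated per-sample count files into one table.

    Returns (samples, table), where `table` maps each gene ID to its list of
    counts, one per sample, in the same order as `samples`.
    """
    samples = []
    table = {}
    # Assumption: files are processed in alphabetical order of their names.
    for name in sorted(os.listdir(folder)):
        samples.append(os.path.splitext(name)[0])  # "Sample1.txt" -> "Sample1"
        with open(os.path.join(folder, name), newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                # Skip blank lines and HTSeq-count summary rows such as "__no_feature".
                if not row or row[0].startswith("__"):
                    continue
                table.setdefault(row[0], []).append(int(row[1]))
    return samples, table


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Write two tiny example count files, then merge them.
        for sample, counts in {"Sample1": [5, 0], "Sample2": [7, 3]}.items():
            with open(os.path.join(folder, sample + ".txt"), "w") as handle:
                for i, count in enumerate(counts, start=1):
                    handle.write(f"GeneID{i}\t{count}\n")
        samples, table = merge_count_tables(folder)
        print("GeneID", *samples, sep="\t")
        for gene_id, counts in table.items():
            print(gene_id, *counts, sep="\t")
```

Because the counts are appended file by file, a gene missing from one file would shift its values; a real implementation would need to check that every file lists the same genes.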

From this table, Mfuzz_RNAseq.R normalizes the RNA-seq data and then uses the Mfuzz package to perform soft clustering of the gene expression time series.

Normalization steps: from the input count tables, Mfuzz_RNAseq.R first normalizes for library size with the DESeq method, then adjusts these normalized counts for gene length (normalized count / gene length). These steps make the samples comparable with each other, which Mfuzz requires.
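The DESeq library-size step estimates one size factor per sample with the median-of-ratios method: each count is divided by the gene's geometric mean across samples, and the sample's size factor is the median of these ratios, using only genes with no zero count. The Python sketch below reproduces that calculation on the log scale, as DESeq does, and then divides by gene length. The function names are hypothetical, the length unit is whatever the lengths are supplied in, and the sketch is not claimed to match the script's output exactly.

```python
import math
import statistics


def deseq_size_factors(counts):
    """Median-of-ratios size factors; `counts` is a list of per-gene rows of raw counts."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue  # geometric mean is 0: the gene cannot serve as a reference
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # exp(median of log ratios): the log-scale form of the median of ratios
    return [math.exp(statistics.median(r)) for r in log_ratios]


def length_normalized(counts, lengths):
    """Divide each count by its sample's size factor, then by the gene length."""
    factors = deseq_size_factors(counts)
    return [
        [c / s / length for c, s in zip(row, factors)]
        for row, length in zip(counts, lengths)
    ]


if __name__ == "__main__":
    counts = [[10, 20], [20, 40], [30, 60], [0, 5]]
    print(deseq_size_factors(counts))
    print(length_normalized(counts, [2, 4, 6, 1]))
```

In this example the second sample was sequenced twice as deeply, so its size factor is twice the first one, and after division both samples give the same values for the first three genes.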

Soft clustering steps: using these length-adjusted normalized data (called RPKN data), Mfuzz_RNAseq.R clusters the genes with the Mfuzz package and outputs the clusters together with their associated gene lists.
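Mfuzz's soft clustering is fuzzy c-means: each gene profile is first standardised to mean 0 and standard deviation 1, then every gene receives a membership value between 0 and 1 for each cluster, with a gene's memberships summing to 1. The pure-Python sketch below illustrates the technique; it is not Mfuzz's implementation (Mfuzz calls `cmeans` from the e1071 package). The fuzzifier `m=1.25`, the seeded random initialisation, the fixed iteration count and the membership cutoff are illustrative choices, and `cluster_members` only approximates the idea of Mfuzz's membership-based cluster cores. Genes with zero standard deviation must be filtered out before standardising.

```python
import math
import random
import statistics


def standardise(profile):
    """Scale one gene's time profile to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(profile)
    sd = statistics.stdev(profile)
    return [(x - mean) / sd for x in profile]


def fuzzy_cmeans(data, c, m=1.25, iterations=100, seed=0, eps=1e-9):
    """Plain fuzzy c-means; returns (centers, memberships), one membership row per gene."""
    rng = random.Random(seed)
    n, dims = len(data), len(data[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        w = [rng.random() for _ in range(c)]
        u.append([x / sum(w) for x in w])
    for _ in range(iterations):
        # Centers: membership^m weighted means of the profiles.
        centers = []
        for k in range(c):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append([sum(weights[i] * data[i][t] for i in range(n)) / total
                            for t in range(dims)])
        # Memberships: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        for i in range(n):
            d = [max(math.dist(data[i], centers[k]), eps) for k in range(c)]
            u[i] = [1.0 / sum((d[k] / d[j]) ** (2 / (m - 1)) for j in range(c))
                    for k in range(c)]
    return centers, u


def cluster_members(gene_ids, u, cutoff):
    """Gene list per cluster: genes whose membership in that cluster reaches `cutoff`."""
    clusters = {}
    for gene, memberships in zip(gene_ids, u):
        for k, value in enumerate(memberships):
            if value >= cutoff:
                clusters.setdefault(k, []).append(gene)
    return clusters


if __name__ == "__main__":
    profiles = [[1, 2, 3, 4], [2, 3, 4, 6], [10, 8, 5, 1], [9, 7, 4, 2]]
    centers, u = fuzzy_cmeans([standardise(p) for p in profiles], c=2)
    print(cluster_members(["g1", "g2", "g3", "g4"], u, cutoff=0.5))
```

With a cutoff above 0.5, a gene can appear in at most one gene list, because its memberships sum to 1.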

The script has three main inputs:

The argument "--folder" or "-f" is the directory containing all the RNA-seq count tables, and only these files. Mfuzz_RNAseq.R reads and merges all of these tables, then performs the normalization steps.
The argument "--annotation" or "-a" is the path to a gene/transcript annotation file (GFF or GTF format), used to compute gene lengths: the summed length of each gene's exons, with overlapping exons counted only once. These lengths are used for the gene-length normalization.
The argument "--time" or "-t" gives the time point of each file, listed in the same order as the files in the folder, as a comma-separated list such as 'time1,time1,time1,time2,time2,time2,time3'. If several files correspond to the same time point (replicates), give them the same time value: the script then averages the normalized counts of all samples sharing a time point before performing the soft clustering.
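The gene length used for normalization is the length of the union of a gene's exon intervals, so that overlapping exons are counted once; a common GenomicFeatures idiom computes the same quantity with `reduce()` on `exonsBy(txdb, by = "gene")`. The Python sketch below computes it from GTF lines. It is illustrative only: it handles GTF-style `key "value"` attributes (GFF3 `key=value` attributes would need a different parser), groups exons by `gene_id` by default, and all function names are hypothetical.

```python
import re


def parse_exons(gtf_lines, attribute="gene_id"):
    """Group (start, end) exon intervals by the value of `attribute` in GTF column 9."""
    pattern = re.compile(r'(?:^|;)\s*' + re.escape(attribute) + r' "([^"]+)"')
    exons = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # comment or header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        match = pattern.search(fields[8])
        if match:
            exons.setdefault(match.group(1), []).append((int(fields[3]), int(fields[4])))
    return exons


def union_length(intervals):
    """Total length covered by 1-based closed intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlapping exon: extend the current block
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_lengths(gtf_lines, attribute="gene_id"):
    return {gene: union_length(iv) for gene, iv in parse_exons(gtf_lines, attribute).items()}


if __name__ == "__main__":
    lines = [
        'chr1\tsrc\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
        'chr1\tsrc\texon\t150\t249\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    ]
    print(gene_lengths(lines))
```

The two exons above overlap on positions 150 to 199, so the gene length is 150 rather than the naive sum of 200.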

For a description of the optional arguments, run: /usr/bin/Rscript Mfuzz_RNAseq.R -h

Minimal command: /usr/bin/Rscript Mfuzz_RNAseq.R -f count_files_folder -a annotation -t time

Complete command: /usr/bin/Rscript Mfuzz_RNAseq.R -f count_files_folder -a annotation -b gene_name_attribute -t time -n nb_clusters -m membership_cutoff -s min_std -e exclude_thres -r replacement_mode -o output_directory
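The `-t` string pairs each time value with one count file, in folder order, and replicates sharing a time value are averaged. A minimal Python sketch of that averaging, assuming the samples are listed in the same order as the time values (for instance alphabetical file order; this ordering is an assumption, not something the script documents) and that time values are compared as plain strings. `average_replicates` is a hypothetical name.

```python
import statistics


def average_replicates(samples, rows, time_arg):
    """Average per-gene values over the samples that share a time point.

    samples:  sample names, in the same order as the values in `time_arg`
    rows:     gene ID -> list of normalized values, one per sample
    time_arg: comma-separated time values, e.g. "0h,0h,6h,12h"
    """
    times = [t.strip() for t in time_arg.split(",")]
    if len(times) != len(samples):
        raise ValueError(f"{len(times)} time values given for {len(samples)} samples")
    time_points = list(dict.fromkeys(times))  # unique time points, first-seen order
    averaged = {}
    for gene, values in rows.items():
        averaged[gene] = [
            statistics.mean([v for v, t in zip(values, times) if t == tp])
            for tp in time_points
        ]
    return time_points, averaged


if __name__ == "__main__":
    print(average_replicates(["S1", "S2", "S3", "S4"],
                             {"GeneID1": [1.0, 3.0, 5.0, 8.0]},
                             "0h,0h,6h,12h"))
```

Raising an error when the number of time values differs from the number of files catches the most likely mistake in the `-t` argument early.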

a-velt/Mfuzz_RNAseq

Folders and files

Latest commit

History

Repository files navigation

Mfuzz_RNAseq.R

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages