R package for identifying differentially expressed genes from genome-wide gene expression profiling studies.

edge: Extraction of Differential Gene Expression

Introduction

The edge package implements methods for carrying out differential expression analyses of genome-wide gene expression studies. Significance tests based on the optimal discovery procedure and on generalized likelihood ratio tests (equivalent to F-tests and t-tests) are implemented for general study designs. Special functions are available to facilitate the analysis of common study designs, including time course experiments. Other packages such as snm, sva, and qvalue are integrated into edge to provide a wide range of tools for gene expression analysis.

Installation and Documentation

To install the Bioconductor release version, open R and type:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("edge")

To install the development version, open R and type:

install.packages("devtools")
library("devtools")
install_github(c("jdstorey/qvalue","jdstorey/edge"), build_vignettes = TRUE)

Instructions on using edge can be viewed by typing:

library("edge")
browseVignettes("edge")

Main functions

  • build_models
  • build_study
  • odp
  • lrt
  • fit_models
  • kl_clust
  • apply_sva
  • apply_snm
  • apply_qvalue

Quick start guide

To get started, first load the kidney dataset included in the package:

library(edge)
data(kidney)
names(kidney)

The kidney study aims to identify genes that are differentially expressed with respect to age in kidney tissue. The age variable gives the age of each subject and the sex variable records whether the subject was male or female. The expression values for the genes are contained in the kidexpr variable.

kidexpr <- kidney$kidexpr
age <- kidney$age
sex <- kidney$sex

Once the data have been loaded, there are two ways to create the experimental models: build_models or build_study. If the experimental models are unknown to the user, build_study can be used to create them:

edge_obj <- build_study(data = kidexpr, adj.var = sex, tme = age, sampling = "timecourse")
full_model <- fullModel(edge_obj)
null_model <- nullModel(edge_obj)

The sampling argument describes the type of experiment performed, adj.var is the adjustment variable, and tme is the time variable in the study. For more complex experiments, type ?build_study to see additional arguments.

If the alternative and null models are known to the user then build_models can be used to make a deSet object:

library(splines)
cov <- data.frame(sex = sex, age = age)
null_model <- ~sex
full_model <- ~sex + ns(age, df=4)
edge_obj <- build_models(data = kidexpr, cov = cov, null.model = null_model, full.model = full_model)
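
As a quick sanity check on formulas like the ones above, base R's model.matrix shows the design matrices they expand to. The sketch below is a minimal illustration that does not use edge; the covariate values are made-up stand-ins rather than the kidney data, and the names null_design and full_design are hypothetical.

```r
library(splines)  # ships with R; provides ns()

# Made-up stand-ins for the covariates (NOT the real kidney data)
cov <- data.frame(
  sex = factor(rep(c("F", "M"), times = 5)),
  age = seq(25, 70, by = 5)
)

# Design matrices implied by the null and full model formulas
null_design <- model.matrix(~sex, data = cov)
full_design <- model.matrix(~sex + ns(age, df = 4), data = cov)

dim(null_design)  # one row per sample; intercept + sex
dim(full_design)  # one row per sample; intercept + sex + 4 spline terms

# Every variable named in the formulas should be a column of cov
vars_ok <- all(all.vars(~sex + ns(age, df = 4)) %in% names(cov))
vars_ok
```

The full design contains every column of the null design plus the spline terms for age, which is what makes the two models nested.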

Here cov is a data frame of covariates, null.model is the null model, and full.model is the alternative model. The column names of cov must match the variable names used in the alternative and null models. Once the models have been generated, it is often useful to normalize the gene expression matrix using apply_snm and/or adjust for unmodelled variables using apply_sva.

edge_norm <- apply_snm(edge_obj, int.var=1:ncol(exprs(edge_obj)), diagnose=FALSE)
edge_sva <- apply_sva(edge_norm)

The odp or lrt function can be used on edge_sva to implement either the optimal discovery procedure or the likelihood ratio test, respectively:

# optimal discovery procedure
edge_odp <- odp(edge_sva, bs.its = 30, verbose=FALSE)
# likelihood ratio test
edge_lrt <- lrt(edge_sva)
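
To give a feel for what a likelihood ratio test between nested models amounts to for a single gene, here is a minimal base-R sketch on simulated data. It is not the edge implementation: the simulated expression vector y, the covariates, and the size of the age effect are all assumptions made for illustration, and the comparison uses the standard F-test from anova.

```r
library(splines)  # ships with R; provides ns()

set.seed(1)
n   <- 60
age <- runif(n, 20, 80)
sex <- factor(rep(c("F", "M"), length.out = n))

# Simulated expression for one hypothetical gene with a linear age effect
y <- 5 + 0.03 * age + rnorm(n, sd = 0.5)

null_fit <- lm(y ~ sex)                    # null model: no age effect
full_fit <- lm(y ~ sex + ns(age, df = 4))  # full model: smooth age effect

# F-test comparing the two nested fits
cmp    <- anova(null_fit, full_fit)
f_stat <- cmp$F[2]
p_val  <- cmp$`Pr(>F)`[2]
c(F = f_stat, p = p_val)
```

In a genome-wide analysis the same comparison is made for every gene, which is why the resulting p-values are then passed through a multiple-testing step.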

To access the estimated proportion of null p-values (pi0), the p-values, the q-values, and the local false discovery rates for each gene, use the function qvalueObj:

qval_obj <- qvalueObj(edge_odp)
qvals <- qval_obj$qvalues
pvals <- qval_obj$pvalues
lfdr <- qval_obj$lfdr
pi0 <- qval_obj$pi0
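
A common next step is to call genes significant at a chosen false discovery rate. The sketch below is a hedged illustration that runs without edge: it simulates p-values and uses Benjamini-Hochberg adjusted values from base R's p.adjust as a stand-in for the q-values (with edge, the qvals vector extracted above would be used instead). The gene names and the 0.05 cutoff are assumptions.

```r
set.seed(1)

# Simulated p-values: 900 null genes (uniform) and 100 genes with signal
pvals <- c(runif(900), rbeta(100, 0.1, 10))
names(pvals) <- paste0("gene", seq_along(pvals))

# Stand-in for q-values; with edge, use qval_obj$qvalues instead
qvals <- p.adjust(pvals, method = "BH")

# Genes called significant at the chosen cutoff (an assumed threshold)
fdr_cutoff <- 0.05
sig_genes  <- names(pvals)[qvals < fdr_cutoff]
length(sig_genes)

# The ten genes with the smallest adjusted values
head(sort(qvals), 10)
```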

See the vignette for more detailed explanations of the edge package.