GO-a-GO

GO-a-GO annotates Gene Ontology terms that are enriched in a given set of gene pairs. The enrichment is calculated from a permutation test for overrepresentation of gene pairs that are associated with a shared term. Such gene pairs are counted for the original set of gene pairs and compared against randomized sets in which the structure of the pairs is preserved, but the gene identities (including the associated terms) are permuted.

Installation

You can install the development version of GO-a-GO from GitHub with:

# install.packages("devtools")
devtools::install_github("ajank/GOaGO")

Example

Let us run GO-a-GO on a set of gene pairs associated with chromatin loops identified in human cell line GM12878. We can inspect a few rows of the dataset:

library(GOaGO)
#> 

data("genePairsGM12878")
head(genePairsGM12878)
#>    interactionID chrom1    start1      end1 geneID1      tss1 strand1 chrom2
#>            <int> <char>     <int>     <int>  <char>     <int>  <char> <char>
#> 1:             3  chr10 101190001 101195000    2805 101190530       -  chr10
#> 2:             7  chr10 102100001 102110000    6319 102106772       +  chr10
#> 3:             7  chr10 102100001 102110000    6319 102106772       +  chr10
#> 4:            10  chr10 102810001 102815000   81621 102820999       +  chr10
#> 5:            19  chr10 103600001 103605000   30819 103603677       -  chr10
#> 6:            23  chr10 103985001 103990000   83401 103986143       +  chr10
#>       start2      end2 geneID2      tss2 strand2
#>        <int>     <int>  <char>     <int>  <char>
#> 1: 101370001 101375000   81894 101372701       -
#> 2: 102270001 102280000   25956 102276754       -
#> 3: 102270001 102280000   25956 102279595       -
#> 4: 102900001 102905000    3195 102891061       +
#> 5: 103830001 103835000   79803 103825124       +
#> 6: 104160001 104165000    5662 104169041       -

The column names ending with 1 and 2 refer to the first and second loop anchors, respectively. The essential columns are: geneID1 and geneID2 (gene identifiers from a database of choice) and interactionID (identifier of a chromatin loop). As some loop anchors overlapped multiple Transcription Start Sites, possibly of many genes, for these loops the dataset contains all combinations.

When running GO-a-GO, we should specify that gene identifiers are from the Entrez database, and use the Bioconductor org.Hs.eg.db package as the source of Gene Ontology annotations for human genes, taking all three subontologies (Biological Process, Molecular Function, and Cellular Component). We will run the default 10,000 permutations, and also use the default p-value cutoff of 0.05 with Benjamini-Hochberg correction:

library(org.Hs.eg.db)

# set the number of CPU threads to use
library(BiocParallel)
options(MulticoreParam = MulticoreParam(workers = 2))

goago <- GOaGO(genePairsGM12878,
    keyType = "ENTREZID", OrgDb = org.Hs.eg.db, ont = "ALL"
)
#> Warning in uniqueGenePairs(genePairs): removing 509 duplicated gene pair(s)
#> Warning in uniqueGenePairs(genePairs): removing 87 gene pair(s) containing the
#> same gene twice

We can inspect the enriched GO terms:

head(as.data.frame(goago))
#>   ONTOLOGY         ID
#> 1       BP GO:0034728
#> 2       BP GO:0006334
#> 3       BP GO:0071824
#> 4       BP GO:0040029
#> 5       BP GO:0002495
#> 6       BP GO:0019886
#>                                                                         Description
#> 1                                                           nucleosome organization
#> 2                                                               nucleosome assembly
#> 3                                                  protein-DNA complex organization
#> 4                                          epigenetic regulation of gene expression
#> 5           antigen processing and presentation of peptide antigen via MHC class II
#> 6 antigen processing and presentation of exogenous peptide antigen via MHC class II
#>   Count   PairRatio      BgRatio FoldEnrichment pvalue p.adjust qvalue
#> 1    19 0.010900746 1.580608e-04       68.96552      0        0      0
#> 2    19 0.010900746 1.053356e-04      103.48584      0        0      0
#> 3    19 0.010900746 3.423982e-04       31.83646      0        0      0
#> 4    15 0.008605852 4.303500e-04       19.99733      0        0      0
#> 5     3 0.001721170 1.491681e-05      115.38462      0        0      0
#> 6     3 0.001721170 1.187608e-05      144.92754      0        0      0

Note that some of the p-values have the value of zero, which means that for all the randomizations fewer gene pairs were associated with a given term than in the input dataset.

We can plot the enriched GO terms as a dotplot. The X-axis shows the fold enrichment of the fraction of gene pairs sharing the given term, calculated as a quotient of PairRatio (the fraction in the input gene pairs) and BgRatio (the fraction in permuted ones). Point size indicates the number of input gene pairs sharing the given term, and point color – the adjusted p-value:

DotPlot(goago)

Note that by default only the terms associated with at least 5 gene pairs are shown; you can change this by setting minCount to any other value.

We can also see the sampling distributions of numbers of gene pairs sharing each GO term, obtained for the randomized gene pairs. From these distributions, empirical p-values were calculated:

RidgePlot(goago)
#> Picking joint bandwidth of 0.0684

Name		Name	Last commit message	Last commit date
Latest commit History 98 Commits
R		R
data		data
inst		inst
man		man
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GO-a-GO

Installation

Example

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GO-a-GO

Installation

Example

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages