# Table of Contents

- [1 PEB Belgrade - Bioconductor workshop](#PEB-Belgrade---Bioconductor-workshop)
    - [1.1 Requirements](#Requirements)
    - [1.2 Which libraries are we installing?](#Which-libraries-are-we-installing?)
- [2 The Annotation packages in Bioconductor](#The-Annotation-packages-in-Bioconductor)
- [3 The Homo.sapiens package](#The-Homo.sapiens-package)
    - [3.1 Gene symbols and IDs: the org.Hs.eg.db package](#Gene-symbols-and-IDs:-the-org.Hs.eg.db-package)
        - [3.1.1 Entrez and symbols](#Entrez-and-symbols)
            - [3.1.1.1 Exercise 1: symbol and Ensembl](#Exercise-1:-symbol-and-Ensembl)
            - [3.1.1.2 Exercise 2- Gene Ontology](#Exercise-2--Gene-Ontology)
        - [3.1.2 Gene Ontology Enrichment with DOSE and clusterProfiler](#Gene-Ontology-Enrichment-with-DOSE-and-clusterProfiler)
    - [3.2 Enrichment using other ontologies](#Enrichment-using-other-ontologies)
    - [3.3 Getting gene coordinates: the TxDB packages](#Getting-gene-coordinates:-the-TxDB-packages)
- [4 Calculating Enrichment](#Calculating-Enrichment)
- [5 Annotation Hub](#Annotation-Hub)

# PEB Belgrade - Bioconductor workshop

Giovanni M. Dall'Olio, GSK. 10/09/2016. http://bioinfoblog.it

Welcome to the Bioconductor / Data Integration workshop.

This workshop is heavily inspired by the Coursera Bioconductor course. See here for materials: http://kasperdanielhansen.github.io/genbioconductor/


## Requirements

This workshop requires several Bioconductor libraries, which take a while to install.

Please start the installation by copying and pasting the commands below. We'll continue the lecture while they install:

```
# dplyr
install.packages(c("dplyr"))

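# Bioconductor (biocLite was the installer at the time of this workshop;
# Bioconductor 3.8 and later use BiocManager::install() instead)
source("https://bioconductor.org/biocLite.R")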
biocLite("Homo.sapiens")
biocLite("rtracklayer")
biocLite("DOSE")
biocLite("clusterProfiler")
biocLite("AnnotationHub")
```
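
Once the installation has finished, a quick check like the one below can confirm that every package is available. This is a minimal sketch, not part of the original workshop material; the variable names `workshop_pkgs`, `is_installed` and `missing_pkgs` are only illustrative.

```r
# Packages installed by the commands above
workshop_pkgs <- c("dplyr", "Homo.sapiens", "rtracklayer",
                   "DOSE", "clusterProfiler", "AnnotationHub")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(workshop_pkgs, requireNamespace,
                       logical(1), quietly = TRUE)

# Report anything that is still missing
missing_pkgs <- workshop_pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All workshop packages are installed.")
}
```

If a package is reported as missing, re-running its install command from the block above is usually enough.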

## Which libraries are we installing?

- **Homo.sapiens**: wrapper that loads several packages for working with *H. sapiens* data:
    - **TxDb**: coordinates for genes, transcripts, exons...
    - **org.Hs.eg.db**: gene symbols and identifiers
    - **GenomicRanges**: tools for working with gene coordinates
- **rtracklayer**: imports BED files and other formats
- **DOSE** and **clusterProfiler**: ontology enrichment (GO, Disease Ontology, Reactome)
- **AnnotationHub**: downloads data from UCSC and many other sources

# The Annotation packages in Bioconductor

Bioconductor contains several annotation data packages (https://www.bioconductor.org/packages/release/data/annotation/), which contain datasets from public sources for multiple organisms.

Some examples:

- **TxDb** objects: coordinates for genes, transcripts, exons...
- **BSgenome**: genome sequences
- **microarray ids** (e.g. hgu133): probe-to-gene conversions for Affymetrix and Illumina arrays
- **org.\*.eg.db**: gene symbol to ID conversion (Entrez, Ensembl, GO, ...)

In addition, two packages provide access to large dataset repositories:

- **biomaRt**: any BioMart installation, e.g. Ensembl, HGNC (see http://www.biomart.org/)
- **AnnotationHub**: access to several resources, e.g. any track in the UCSC browser, and more

In this tutorial we will see some of these (TxDb, org.\*.eg.db, AnnotationHub).

# The Homo.sapiens package

Let's load the Homo.sapiens package. You will see that it will load several other packages:

In [4]:
suppressPackageStartupMessages(library(Homo.sapiens))

Bioconductor contains similar wrapper packages for the most common model species (e.g. mouse, rat). For other species, similar data packages may be available, although not organized with a similar wrapper.

## Gene symbols and IDs: the org.Hs.eg.db package

The org.\*.eg.db packages let you retrieve the gene symbols and IDs for a given species (see [list of all packages](https://www.bioconductor.org/packages/release/data/annotation/)). The data is updated twice a year, with each Bioconductor release, which is a good compromise between reproducibility and getting recent data.

To see which data is included in this package, we can open its help page:
```
library(help=org.Hs.eg.db)
```
Alternatively, we can use the columns() function:

In [3]:
columns(org.Hs.eg.db)

This means that the org.Hs.eg.db package contains mappings between Entrez IDs, PFAM, Prosite, gene names, GO, etc. for all human genes.



### Entrez and symbols

The select() function from AnnotationDbi gives access to all the IDs and symbols in the OrgDb object.

For example, here is how to get the Entrez ID and description of the genes MGAT2 and MGAT3:

In [8]:
AnnotationDbi::select(org.Hs.eg.db, keys=as.character(c("MGAT2", "MGAT3")), keytype='SYMBOL', columns=c('ENTREZID','GENENAME'))

SYMBOL,ENTREZID,GENENAME
MGAT2,4247,"mannosyl (alpha-1,6-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase"
MGAT3,4248,"mannosyl (beta-1,4-)-glycoprotein beta-1,4-N-acetylglucosaminyltransferase"


The "keys" argument defines the symbols that we want to search. The "keytype" and "columns" define the type of input and output. Use the functions keytypes(org.Hs.eg.db) and columns(org.Hs.eg.db) to see which values are supported. 
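
When only one output column is needed, `mapIds()` from AnnotationDbi is a compact alternative to `select()`: it returns a named vector with one value per key. The sketch below is an addition to the workshop material and assumes org.Hs.eg.db is installed; `my_symbols` and `entrez_ids` are illustrative names.

```r
library(org.Hs.eg.db)

my_symbols <- c("MGAT2", "MGAT3")

# List some of the identifier types that can be used as input (keytype)
head(keytypes(org.Hs.eg.db))

# mapIds() returns a named character vector: the names are the input symbols,
# the values are the matching Entrez IDs. multiVals = "first" keeps only the
# first match when a key maps to several values.
entrez_ids <- mapIds(org.Hs.eg.db, keys = my_symbols,
                     keytype = "SYMBOL", column = "ENTREZID",
                     multiVals = "first")
entrez_ids
```

Because the result is a plain named vector, it can be indexed directly by symbol, e.g. `entrez_ids["MGAT2"]`.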

#### Exercise 1: symbol and Ensembl

What are the symbol and the Ensembl ID of the gene with Entrez ID 1234?

In [9]:
AnnotationDbi::select(org.Hs.eg.db, keys=as.character(c("1234")), keytype='ENTREZID', columns=c('SYMBOL','ENSEMBL'))

ENTREZID,SYMBOL,ENSEMBL
1234,CCR5,ENSG00000160791


#### Exercise 2- Gene Ontology

The Gene Ontology database annotates every gene with terms describing its biological process, molecular function, and cellular component.

Can you get all the Gene Ontology (GO) terms associated with PTEN?

In [5]:
# Call select() with its namespace, since dplyr also exports a select() function
head(AnnotationDbi::select(org.Hs.eg.db, keys='PTEN', keytype='SYMBOL', columns='GO'))

SYMBOL,GO,EVIDENCE,ONTOLOGY
PTEN,GO:0000079,TAS,BP
PTEN,GO:0000287,IEA,MF
PTEN,GO:0001525,IEA,BP
PTEN,GO:0001933,IDA,BP
PTEN,GO:0002902,IEA,BP
PTEN,GO:0004438,IDA,MF


To get the term names for these GO IDs, we can use the GO.db database:

In [6]:
PTEN.go = AnnotationDbi::select(org.Hs.eg.db, keys='PTEN', keytype='SYMBOL', columns='GO')
PTEN.go$TERM = AnnotationDbi::select(GO.db, keys=as.character(PTEN.go$GO), columns="TERM")$TERM
head(PTEN.go)

SYMBOL,GO,EVIDENCE,ONTOLOGY,TERM
PTEN,GO:0000079,TAS,BP,regulation of cyclin-dependent protein serine/threonine kinase activity
PTEN,GO:0000287,IEA,MF,magnesium ion binding
PTEN,GO:0001525,IEA,BP,angiogenesis
PTEN,GO:0001933,IDA,BP,negative regulation of protein phosphorylation
PTEN,GO:0002902,IEA,BP,regulation of B cell apoptotic process
PTEN,GO:0004438,IDA,MF,phosphatidylinositol-3-phosphatase activity


### Gene Ontology Enrichment with DOSE and clusterProfiler

When we have a long list of genes, listing all the Gene Ontology terms associated with each gene is not really useful, as we don't have the time to go through them manually.

It is more useful to run an ontology enrichment analysis, to see which terms are over-represented in the list.

One way to do an enrichment analysis is to use the enrichGO function from clusterProfiler:

In [5]:
# The gcSample data contains a list of genes
library(clusterProfiler)
data(gcSample)  
mygenes = gcSample$X1
print(mygenes[1:10])

 [1] "4597"  "7111"  "5266"  "2175"  "755"   "23046" "3931"  "6770"  "993"  
[10] "229"  


In [6]:
# Let's calculate the enrichment
# (recent clusterProfiler versions expect OrgDb=org.Hs.eg.db in place of "human")
myenrich = enrichGO(mygenes, "human", ont="BP", readable=TRUE)
summary(myenrich)

,ID,Description,GeneRatio,BgRatio,pvalue,p.adjust,qvalue,geneID,Count
GO:0008150,GO:0008150,biological_process,193/193,16362/18585,1.831414e-11,5.730495e-08,5.093259e-08,MED6/EIF1/MSLN/KLHL41/RAPGEF3/HEXIM1/POSTN/CXCR6/PLK4/NMU/ADAMTS7/SLC2A6/MAP4K1/CYP4F8/CHRNA4/CLCN4/ADD2/COL3A1/ZNF280B/CRYBA2/CSN3/TOM1L2/CTSW/DGKG/CADM4/ELF5/ELK4/EMX1/EMX2/EPHB1/FABP7/FANCA/FCER2/PHACTR1/LAMB4/MRAS/NLRP1/ALDOB/KIF21B/ARHGAP26/DDN/NFASC/NACAD/LARP4B/TRIM2/ZDHHC17/CDC42EP4/KCNE5/FUT7/FYB/GABRA5/TRIM58/NCR3/KLK13/PLA2G2D/OR2W1/OR2J2/GFRA3/B9D1/GJB5/SIGLEC9/GPR162/GNAT2/GP2/GPR17/CDH19/CTNNA3/GRM5/CXCL3/TMOD2/ICOS/NPC1L1/LRP12/HIST1H1A/HBE1/CTAG2/HK3/HMGA1/APLP1/IL2RA/CXCL8/IL12RB2/AQP9/ITIH1/KCND3/KCNG1/KCNH2/KNG1/KRT16/AFF3/LCAT/TRPM1/MUC7/MVD/NF2/NOVA1/NRTN/ATP2B2/PAWR/PAX6/TRAT1/PCDH9/LEF1/CYP39A1/TLR8/NME8/ANGPT4/VGLL1/RHCG/PDGFRA/ENPP1/MPP6/PI3/PKP1/PLA2G1B/UBASH3A/S1PR5/PPARD/HYDIN/DCHS2/HAUS6/SCN3B/TMPRSS4/NCLN/TTYH1/CREBZF/BCAT1/MAP4K2/ACE2/PRDM13/RSU1/S100A8/SERPINB3/CCL20/CXCL5/DPEP3/LHX5/ROBO3/EFCAB6/BMP5/SLC7A1/SLC15A1/SMARCA4/SNRNP70/SNTB1/SOX15/DST/BPI/STAR/BUB1/TMOD1/TNNI2/TRO/UGT8/C21orf2/ZNF214/IL1R2/MMP28/THAP9/EPHX3/TNIP3/FOXRED2/PNPLA3/PDCD1LG2/ALX1/ACTL8/COLQ/FZD9/KCNAB2/LGR5/LY6D/RUNX1T1/MARCO/B3GALT2/SERPINB7/TNFRSF10D/CDKL1/FGF18/NRP2/HIST1H2BJ/CD6/CD8B/MS4A3/LHX2/CPNE6/NRXN3/HAND2/TBX4/GABBR2/CD47/MAGI2/CDC25A/XYLB,193
GO:0044707,GO:0044707,single-multicellular organism process,105/193,6392/18585,8.310631e-09,1.300198e-05,1.155615e-05,MSLN/KLHL41/RAPGEF3/HEXIM1/POSTN/PLK4/NMU/ADAMTS7/CHRNA4/ADD2/COL3A1/CRYBA2/CSN3/DGKG/ELF5/EMX1/EMX2/EPHB1/FABP7/FANCA/MRAS/NLRP1/ARHGAP26/NFASC/KCNE5/FUT7/GABRA5/OR2W1/OR2J2/GFRA3/B9D1/GJB5/GNAT2/CTNNA3/GRM5/TMOD2/ICOS/NPC1L1/HBE1/APLP1/IL2RA/CXCL8/IL12RB2/AQP9/KCNH2/KNG1/KRT16/AFF3/LCAT/TRPM1/NF2/NRTN/ATP2B2/PAWR/PAX6/PCDH9/LEF1/CYP39A1/TLR8/NME8/ANGPT4/PDGFRA/ENPP1/PKP1/PLA2G1B/UBASH3A/S1PR5/PPARD/HYDIN/SCN3B/ACE2/PRDM13/S100A8/SERPINB3/CCL20/LHX5/ROBO3/BMP5/SLC15A1/SMARCA4/SNTB1/SOX15/BPI/STAR/TMOD1/TNNI2/TRO/UGT8/PDCD1LG2/ALX1/FZD9/KCNAB2/LGR5/LY6D/SERPINB7/CDKL1/FGF18/NRP2/LHX2/CPNE6/NRXN3/HAND2/TBX4/CD47/MAGI2,105
GO:0050789,GO:0050789,regulation of biological process,144/193,10322/18585,2.804956e-08,2.845883e-05,2.529418e-05,MED6/EIF1/KLHL41/RAPGEF3/HEXIM1/POSTN/CXCR6/PLK4/NMU/ADAMTS7/MAP4K1/CHRNA4/CLCN4/ADD2/COL3A1/ZNF280B/CSN3/TOM1L2/DGKG/ELF5/ELK4/EMX1/EMX2/EPHB1/FABP7/FANCA/FCER2/PHACTR1/MRAS/NLRP1/ALDOB/ARHGAP26/DDN/LARP4B/TRIM2/ZDHHC17/CDC42EP4/KCNE5/FYB/GABRA5/NCR3/OR2W1/OR2J2/GFRA3/B9D1/SIGLEC9/GPR162/GNAT2/GPR17/CTNNA3/GRM5/CXCL3/TMOD2/ICOS/NPC1L1/LRP12/HBE1/HMGA1/APLP1/IL2RA/CXCL8/IL12RB2/ITIH1/KCND3/KCNG1/KCNH2/KNG1/KRT16/AFF3/LCAT/TRPM1/MVD/NF2/NOVA1/NRTN/ATP2B2/PAWR/PAX6/TRAT1/LEF1/TLR8/NME8/ANGPT4/VGLL1/PDGFRA/ENPP1/PI3/PKP1/PLA2G1B/UBASH3A/S1PR5/PPARD/SCN3B/NCLN/TTYH1/CREBZF/MAP4K2/ACE2/PRDM13/RSU1/S100A8/SERPINB3/CCL20/CXCL5/LHX5/EFCAB6/BMP5/SMARCA4/SNRNP70/SOX15/DST/BPI/STAR/BUB1/TMOD1/TNNI2/TRO/C21orf2/ZNF214/IL1R2/MMP28/TNIP3/PDCD1LG2/ALX1/FZD9/KCNAB2/LGR5/RUNX1T1/MARCO/SERPINB7/TNFRSF10D/CDKL1/FGF18/NRP2/CD8B/MS4A3/LHX2/NRXN3/HAND2/TBX4/GABBR2/CD47/MAGI2/CDC25A,144
GO:0032501,GO:0032501,multicellular organismal process,106/193,6644/18585,3.711725e-08,2.845883e-05,2.529418e-05,MSLN/KLHL41/RAPGEF3/HEXIM1/POSTN/PLK4/NMU/ADAMTS7/CHRNA4/ADD2/COL3A1/CRYBA2/CSN3/DGKG/ELF5/EMX1/EMX2/EPHB1/FABP7/FANCA/MRAS/NLRP1/ARHGAP26/NFASC/KCNE5/FUT7/GABRA5/OR2W1/OR2J2/GFRA3/B9D1/GJB5/GNAT2/CTNNA3/GRM5/TMOD2/ICOS/NPC1L1/HIST1H1A/HBE1/APLP1/IL2RA/CXCL8/IL12RB2/AQP9/KCNH2/KNG1/KRT16/AFF3/LCAT/TRPM1/NF2/NRTN/ATP2B2/PAWR/PAX6/PCDH9/LEF1/CYP39A1/TLR8/NME8/ANGPT4/PDGFRA/ENPP1/PKP1/PLA2G1B/UBASH3A/S1PR5/PPARD/HYDIN/SCN3B/ACE2/PRDM13/S100A8/SERPINB3/CCL20/LHX5/ROBO3/BMP5/SLC15A1/SMARCA4/SNTB1/SOX15/BPI/STAR/TMOD1/TNNI2/TRO/UGT8/PDCD1LG2/ALX1/FZD9/KCNAB2/LGR5/LY6D/SERPINB7/CDKL1/FGF18/NRP2/LHX2/CPNE6/NRXN3/HAND2/TBX4/CD47/MAGI2,106
GO:0065007,GO:0065007,biological regulation,148/193,10810/18585,4.547592e-08,2.845883e-05,2.529418e-05,MED6/EIF1/KLHL41/RAPGEF3/HEXIM1/POSTN/CXCR6/PLK4/NMU/ADAMTS7/MAP4K1/CHRNA4/CLCN4/ADD2/COL3A1/ZNF280B/CSN3/TOM1L2/DGKG/ELF5/ELK4/EMX1/EMX2/EPHB1/FABP7/FANCA/FCER2/PHACTR1/MRAS/NLRP1/ALDOB/ARHGAP26/DDN/LARP4B/TRIM2/ZDHHC17/CDC42EP4/KCNE5/FYB/GABRA5/NCR3/OR2W1/OR2J2/GFRA3/B9D1/SIGLEC9/GPR162/GNAT2/GPR17/CTNNA3/GRM5/CXCL3/TMOD2/ICOS/NPC1L1/LRP12/HBE1/HK3/HMGA1/APLP1/IL2RA/CXCL8/IL12RB2/AQP9/ITIH1/KCND3/KCNG1/KCNH2/KNG1/KRT16/AFF3/LCAT/TRPM1/MVD/NF2/NOVA1/NRTN/ATP2B2/PAWR/PAX6/TRAT1/LEF1/TLR8/NME8/ANGPT4/VGLL1/RHCG/PDGFRA/ENPP1/PI3/PKP1/PLA2G1B/UBASH3A/S1PR5/PPARD/SCN3B/NCLN/TTYH1/CREBZF/MAP4K2/ACE2/PRDM13/RSU1/S100A8/SERPINB3/CCL20/CXCL5/LHX5/EFCAB6/BMP5/SMARCA4/SNRNP70/SOX15/DST/BPI/STAR/BUB1/TMOD1/TNNI2/TRO/C21orf2/ZNF214/IL1R2/MMP28/TNIP3/PDCD1LG2/ALX1/COLQ/FZD9/KCNAB2/LGR5/RUNX1T1/MARCO/SERPINB7/TNFRSF10D/CDKL1/FGF18/NRP2/CD8B/MS4A3/LHX2/NRXN3/HAND2/TBX4/GABBR2/CD47/MAGI2/CDC25A,148
GO:0022610,GO:0022610,biological adhesion,37/193,1372/18585,6.794327e-08,3.287788e-05,2.922183e-05,MSLN/POSTN/COL3A1/CADM4/EPHB1/LAMB4/NFASC/FUT7/SIGLEC9/CDH19/CTNNA3/ICOS/APLP1/IL2RA/CXCL8/KNG1/NF2/PAWR/PCDH9/LEF1/PDGFRA/PKP1/PPARD/DCHS2/TTYH1/ACE2/RSU1/S100A8/DST/TRO/PDCD1LG2/LY6D/NRP2/CD6/CD8B/NRXN3/CD47,37
GO:0050794,GO:0050794,regulation of cellular process,139/193,9916/18585,7.355230e-08,3.287788e-05,2.922183e-05,MED6/EIF1/KLHL41/RAPGEF3/HEXIM1/POSTN/CXCR6/PLK4/NMU/ADAMTS7/MAP4K1/CHRNA4/CLCN4/ADD2/COL3A1/ZNF280B/TOM1L2/DGKG/ELF5/ELK4/EMX1/EMX2/EPHB1/FABP7/FANCA/FCER2/MRAS/NLRP1/ARHGAP26/DDN/LARP4B/TRIM2/ZDHHC17/CDC42EP4/KCNE5/FYB/GABRA5/OR2W1/OR2J2/GFRA3/B9D1/SIGLEC9/GPR162/GNAT2/GPR17/CTNNA3/GRM5/CXCL3/TMOD2/ICOS/NPC1L1/LRP12/HBE1/HMGA1/APLP1/IL2RA/CXCL8/IL12RB2/ITIH1/KCND3/KCNG1/KCNH2/KNG1/KRT16/AFF3/LCAT/TRPM1/MVD/NF2/NOVA1/NRTN/ATP2B2/PAWR/PAX6/TRAT1/LEF1/TLR8/NME8/ANGPT4/VGLL1/PDGFRA/ENPP1/PI3/PKP1/PLA2G1B/UBASH3A/S1PR5/PPARD/SCN3B/NCLN/CREBZF/MAP4K2/ACE2/PRDM13/RSU1/S100A8/SERPINB3/CCL20/CXCL5/LHX5/EFCAB6/BMP5/SMARCA4/SNRNP70/SOX15/DST/BPI/STAR/BUB1/TMOD1/TNNI2/TRO/C21orf2/ZNF214/IL1R2/MMP28/TNIP3/PDCD1LG2/ALX1/FZD9/KCNAB2/LGR5/RUNX1T1/MARCO/SERPINB7/TNFRSF10D/CDKL1/FGF18/NRP2/CD8B/MS4A3/LHX2/NRXN3/HAND2/TBX4/GABBR2/CD47/MAGI2/CDC25A,139
GO:0007155,GO:0007155,cell adhesion,36/193,1365/18585,1.832290e-07,7.166542e-05,6.369617e-05,MSLN/POSTN/COL3A1/CADM4/EPHB1/LAMB4/NFASC/FUT7/SIGLEC9/CDH19/CTNNA3/ICOS/APLP1/IL2RA/CXCL8/KNG1/NF2/PAWR/PCDH9/LEF1/PDGFRA/PKP1/PPARD/DCHS2/TTYH1/RSU1/S100A8/DST/TRO/PDCD1LG2/LY6D/NRP2/CD6/CD8B/NRXN3/CD47,36
GO:0044699,GO:0044699,single-organism process,166/193,13071/18585,2.365895e-07,8.225428e-05,7.310754e-05,MSLN/KLHL41/RAPGEF3/HEXIM1/POSTN/CXCR6/PLK4/NMU/ADAMTS7/SLC2A6/MAP4K1/CYP4F8/CHRNA4/CLCN4/ADD2/COL3A1/CRYBA2/CSN3/TOM1L2/DGKG/ELF5/ELK4/EMX1/EMX2/EPHB1/FABP7/FANCA/FCER2/PHACTR1/MRAS/NLRP1/ALDOB/KIF21B/ARHGAP26/NFASC/TRIM2/ZDHHC17/CDC42EP4/KCNE5/FUT7/FYB/GABRA5/NCR3/PLA2G2D/OR2W1/OR2J2/GFRA3/B9D1/GJB5/SIGLEC9/GPR162/GNAT2/GP2/GPR17/CTNNA3/GRM5/CXCL3/TMOD2/ICOS/NPC1L1/LRP12/HIST1H1A/HBE1/HK3/HMGA1/APLP1/IL2RA/CXCL8/IL12RB2/AQP9/KCND3/KCNG1/KCNH2/KNG1/KRT16/AFF3/LCAT/TRPM1/MUC7/MVD/NF2/NOVA1/NRTN/ATP2B2/PAWR/PAX6/TRAT1/PCDH9/LEF1/CYP39A1/TLR8/NME8/ANGPT4/RHCG/PDGFRA/ENPP1/PKP1/PLA2G1B/UBASH3A/S1PR5/PPARD/HYDIN/HAUS6/SCN3B/NCLN/TTYH1/BCAT1/MAP4K2/ACE2/PRDM13/RSU1/S100A8/SERPINB3/CCL20/CXCL5/DPEP3/LHX5/ROBO3/BMP5/SLC7A1/SLC15A1/SMARCA4/SNTB1/SOX15/DST/BPI/STAR/BUB1/TMOD1/TNNI2/TRO/UGT8/C21orf2/IL1R2/MMP28/THAP9/TNIP3/FOXRED2/PNPLA3/PDCD1LG2/ALX1/ACTL8/COLQ/FZD9/KCNAB2/LGR5/LY6D/MARCO/B3GALT2/SERPINB7/TNFRSF10D/CDKL1/FGF18/NRP2/CD8B/MS4A3/LHX2/CPNE6/NRXN3/HAND2/TBX4/GABBR2/CD47/MAGI2/CDC25A/XYLB,166
GO:0048731,GO:0048731,system development,72/193,3975/18585,2.853872e-07,8.929765e-05,7.936767e-05,MSLN/KLHL41/RAPGEF3/HEXIM1/POSTN/PLK4/ADAMTS7/ADD2/COL3A1/CRYBA2/CSN3/DGKG/ELF5/EMX1/EMX2/EPHB1/FABP7/FANCA/MRAS/ARHGAP26/NFASC/FUT7/GABRA5/GFRA3/B9D1/GJB5/GNAT2/GRM5/TMOD2/ICOS/APLP1/IL2RA/CXCL8/KRT16/TRPM1/NF2/NRTN/ATP2B2/PAX6/PCDH9/LEF1/ANGPT4/PDGFRA/ENPP1/S1PR5/PPARD/HYDIN/SCN3B/PRDM13/SERPINB3/LHX5/ROBO3/BMP5/SMARCA4/SOX15/STAR/UGT8/ALX1/FZD9/KCNAB2/LGR5/LY6D/SERPINB7/CDKL1/FGF18/NRP2/LHX2/CPNE6/NRXN3/HAND2/TBX4/MAGI2,72


Behind the scenes, enrichGO retrieves the GO annotations from the org.Hs.eg.db package and tests each term for over-representation in the gene list.
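
Enrichment tests of this kind are commonly based on the hypergeometric distribution (a one-sided Fisher's exact test). As a rough sketch of the underlying calculation, and assuming this is the test applied here, the counts from the "cell adhesion" row of the table above can be plugged into base R's `phyper()`. The variable names are illustrative, and the result is not guaranteed to match the package output exactly.

```r
# Counts taken from the "cell adhesion" row (GeneRatio 36/193, BgRatio 1365/18585)
genes_in_list_with_term <- 36     # genes in our list annotated with the term
genes_in_list           <- 193    # annotated genes in our list
genes_in_bg_with_term   <- 1365   # background genes annotated with the term
genes_in_bg             <- 18585  # annotated genes in the background

# Probability of observing 36 or more annotated genes by chance:
# P(X >= k) = P(X > k - 1), hence the "- 1" with lower.tail = FALSE
p_value <- phyper(genes_in_list_with_term - 1,
                  genes_in_bg_with_term,
                  genes_in_bg - genes_in_bg_with_term,
                  genes_in_list,
                  lower.tail = FALSE)

# Fold enrichment: proportion in the list divided by proportion in the background
fold_enrichment <- (genes_in_list_with_term / genes_in_list) /
                   (genes_in_bg_with_term / genes_in_bg)

p_value
fold_enrichment
```

The p.adjust column in the table then corrects these raw p-values for the large number of GO terms tested; base R offers `p.adjust()` for this kind of correction.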

We can also plot the enrichment:

In [4]:
plot(myenrich)

## Enrichment using other ontologies

We can also run an enrichment analysis on other ontologies.

For example, here is an enrichment on the Disease Ontology, using the enrichDO function from DOSE:

In [11]:
library(DOSE)
myenrich.do = enrichDO(mygenes)
plot(myenrich.do)


## Getting gene coordinates: the TxDB packages



# Calculating Enrichment

# Annotation Hub

In [10]:
library(AnnotationHub)
ahub = AnnotationHub()
ahub



Attaching package: ‘AnnotationHub’

The following object is masked from ‘package:Biobase’:

    cache



AnnotationHub with 35306 records
# snapshotDate(): 2016-08-15 
# $dataprovider: BroadInstitute, UCSC, Ensembl, NCBI, Haemcode, Inparanoid8,...
# $species: Homo sapiens, Mus musculus, Bos taurus, Pan troglodytes, Danio r...
# $rdataclass: GRanges, FaFile, BigWigFile, OrgDb, ChainFile, Inparanoid8Db,...
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype 
# retrieve records with, e.g., 'object[["AH2"]]' 

            title                                               
  AH2     | Ailuropoda_melanoleuca.ailMel1.69.dna.toplevel.fa   
  AH3     | Ailuropoda_melanoleuca.ailMel1.69.dna_rm.toplevel.fa
  AH4     | Ailuropoda_melanoleuca.ailMel1.69.dna_sm.toplevel.fa
  AH5     | Ailuropoda_melanoleuca.ailMel1.69.ncrna.fa          
  AH6     | Ailuropoda_melanoleuca.ailMel1.69.pep.all.fa        
  ...       ...                                                 
  AH49436 | Xiphophorus_maculatus.Xipmac4.4.2.dna_rm.toplevel.fa
  AH49437 | Xiphophorus_maculatus.Xipm

In [11]:
library(Homo.sapiens)
genes(TxDb.Hsapiens.UCSC.hg19.knownGene)



GRanges object with 23056 ranges and 1 metadata column:
        seqnames                 ranges strand   |     gene_id
           <Rle>              <IRanges>  <Rle>   | <character>
      1    chr19 [ 58858172,  58874214]      -   |           1
     10     chr8 [ 18248755,  18258723]      +   |          10
    100    chr20 [ 43248163,  43280376]      -   |         100
   1000    chr18 [ 25530930,  25757445]      -   |        1000
  10000     chr1 [243651535, 244006886]      -   |       10000
    ...      ...                    ...    ... ...         ...
   9991     chr9 [114979995, 115095944]      -   |        9991
   9992    chr21 [ 35736323,  35743440]      +   |        9992
   9993    chr22 [ 19023795,  19109967]      -   |        9993
   9994     chr6 [ 90539619,  90584155]      +   |        9994
   9997    chr22 [ 50961997,  50964905]      -   |        9997
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome