03 Pathway Genes
There are two ways to build the pathway-gene mapping required by pathway_dsge(): GAF files (mode A) or Bioconductor OrgDb (mode B). Both produce the same named-list output format.
`get_pathway_genes()` splits a GAF table into a named list keyed by GO term. Each element is a data.frame of the genes annotated to that pathway.
```r
# gaf: output of read_gaf(); go: output of read_obo()
pw <- get_pathway_genes(
  gaf,
  genes = c("db_object_id", "db_object_symbol"),
  unique = TRUE,
  min_size = 5,
  qualifier = NULL,
  evidence = NULL,
  aspect = NULL,
  go_names = go
)
```

| Parameter | Default | Description |
|---|---|---|
| `gaf_data` | (required) | Output of `read_gaf()` |
| `genes` | `c("db_object_id", "db_object_symbol")` | Columns kept for downstream matching |
| `unique` | `TRUE` | Remove duplicate gene entries per term |
| `min_size` | `5` | Discard pathways below this gene count |
| `qualifier` | `NULL` | Filter by GAF qualifier (e.g. `"enables"`, `"involved_in"`) |
| `evidence` | `NULL` | Filter by evidence code (e.g. `c("IDA", "IPI")`) |
| `aspect` | `NULL` | Filter by ontology: `"P"` (BP), `"F"` (MF), `"C"` (CC) |
| `go_names` | `NULL` | Output of `read_obo()`; adds `gs_name` and `gs_source` columns |
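To make the output format concrete, here is a minimal base-R sketch of the split-by-term idea with `unique` and `min_size` applied. It is not the package's implementation: the helper `split_by_term()`, the toy table, and the `go_id` column name are assumptions made for illustration. Only `db_object_id` and `db_object_symbol` come from the defaults above.

```r
# Toy GAF-like table. `go_id` is an assumed column name; the real
# read_gaf() output may name it differently.
gaf_toy <- data.frame(
  db_object_id     = c("G1", "G2", "G2", "G3", "G1", "G4"),
  db_object_symbol = c("abc1", "abc2", "abc2", "abc3", "abc1", "abc4"),
  go_id            = c("GO:0000001", "GO:0000001", "GO:0000001",
                       "GO:0000001", "GO:0000002", "GO:0000002"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, for illustration only
split_by_term <- function(gaf, genes, min_size = 5, unique = TRUE) {
  # One data.frame per GO term, keeping only the requested gene columns
  by_term <- split(gaf[, genes, drop = FALSE], gaf$go_id)
  if (unique) {
    # Drop repeated gene rows within each term
    by_term <- lapply(by_term, function(df) df[!duplicated(df), , drop = FALSE])
  }
  # Discard terms with fewer than `min_size` genes
  by_term[vapply(by_term, nrow, integer(1)) >= min_size]
}

pw_toy <- split_by_term(gaf_toy,
                        genes = c("db_object_id", "db_object_symbol"),
                        min_size = 3)
names(pw_toy)
```

With `min_size = 3`, only `GO:0000001` survives: it has three distinct genes after deduplication, while `GO:0000002` has two.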
`get_pathway_genes_db()` is an alternative that uses Bioconductor's OrgDb packages, avoiding the need for GAF and OBO files.
```r
# Load the OrgDb package for your organism
library(org.Hs.eg.db)    # human
library(org.Mm.eg.db)    # mouse
library(org.Dr.eg.db)    # zebrafish
library(org.Rn.eg.db)    # rat
library(org.Dm.eg.db)    # fruit fly
library(org.Ce.eg.db)    # C. elegans
library(org.Sc.sgd.db)   # yeast
library(org.At.tair.db)  # Arabidopsis

pw <- get_pathway_genes_db(org.Hs.eg.db)
```

For organisms without a dedicated OrgDb package, an OrgDb can be retrieved from AnnotationHub:

```r
library(AnnotationHub)
hub <- AnnotationHub()
query(hub, "Ovis aries")          # search for sheep
sheep_orgdb <- hub[["AH72269"]]   # load the OrgDb
pw <- get_pathway_genes_db(sheep_orgdb)
```

| Parameter | Default | Description |
|---|---|---|
| `orgdb` | (required) | An OrgDb object (e.g., `org.Hs.eg.db`) |
| `keytype` | `"ENTREZID"` | Key type for gene IDs in the OrgDb |
| `gene_id_col` | `"db_object_id"` | Gene ID column name in output |
| `gene_symbol_col` | `"db_object_symbol"` | Gene symbol column name in output |
| `min_size` | `5` | Drop pathways below this gene count |
| `aspect` | `NULL` | Ontology filter: `"BP"`, `"MF"`, `"CC"`, or `NULL` (all) |
| `evidence` | `NULL` | Evidence code filter (e.g., `"IDA"`); `NULL` = all |
| `attach_names` | `TRUE` | Fetch pathway names (requires GO.db) |
| `use_goall` | `FALSE` | If `TRUE`, propagate annotations to all ancestor GO terms (broader pathway set, consistent with clusterProfiler default) |
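The effect of `use_goall` is easiest to see on a toy hierarchy. The sketch below is a base-R illustration of ancestor propagation, not the package's code: the term IDs, the `parents` list, and the `ancestors()` helper are all made up. OrgDb objects do expose both direct (`GO`) and propagated (`GOALL`) annotation columns, but that `get_pathway_genes_db()` switches between them is an assumption based on the parameter name.

```r
# Toy GO hierarchy: child term -> direct parent terms (hypothetical IDs)
parents <- list(
  "GO:C" = c("GO:B"),
  "GO:B" = c("GO:A"),
  "GO:A" = character(0)
)

# All ancestors of a term, found by walking up the parent links
ancestors <- function(term) {
  found <- character(0)
  queue <- parents[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!current %in% found) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

# Direct annotations: each gene is annotated to exactly one term
direct <- data.frame(
  gene = c("g1", "g2", "g3"),
  term = c("GO:C", "GO:B", "GO:A"),
  stringsAsFactors = FALSE
)

# Propagated annotations: each gene also counts toward every ancestor term
propagated <- do.call(rbind, lapply(seq_len(nrow(direct)), function(i) {
  data.frame(
    gene = direct$gene[i],
    term = c(direct$term[i], ancestors(direct$term[i])),
    stringsAsFactors = FALSE
  )
}))
propagated <- unique(propagated)

table(direct$term)      # one gene per term
table(propagated$term)  # GO:A now holds g1, g2 and g3
```

Propagation enlarges the pathways near the root of the ontology, so more terms pass the `min_size` threshold when `use_goall = TRUE`.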
`get_pathway_genes_kegg()` extracts KEGG pathway annotations from an OrgDb's PATH column, with online name lookup via KEGGREST.
```r
library(org.Hs.eg.db)
# Human KEGG pathways (auto-detects organism)
pw_kegg <- get_pathway_genes_kegg(org.Hs.eg.db, min_size = 5L)

# Mouse
library(org.Mm.eg.db)
pw_kegg <- get_pathway_genes_kegg(org.Mm.eg.db, min_size = 5L)
```

| Parameter | Default | Description |
|---|---|---|
| `orgdb` | (required) | An OrgDb object |
| `keytype` | `"ENTREZID"` | Key type for querying the OrgDb |
| `gene_id_col` | `"kegg_gene_id"` | Gene ID column name in output |
| `gene_symbol_col` | `"kegg_gene_symbol"` | Gene symbol column name in output |
| `min_size` | `5` | Drop pathways below this gene count |
| `attach_path_names` | `TRUE` | Fetch pathway names via `KEGGREST::keggList()` (requires network) |
```r
str(pw_kegg[[1]])
# Classes 'kegg_pathway' and 'data.frame': 65 obs. of 4 variables:
#  $ kegg_name       : chr "Glycolysis / Gluconeogenesis - Homo sapiens (human)"
#  $ organism_code   : chr "hsa"
#  $ kegg_gene_id    : chr "124" "125" "126" ...
#  $ kegg_gene_symbol: chr "ADH1A" "ADH1B" "ADH1C" ...
```

KEGG pathway IDs (the list names) include the organism prefix, e.g. `"hsa00010"`.
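As a rough illustration of how such prefixed IDs can be assembled, the sketch below maps a species name to its KEGG organism code and prepends it to bare pathway numbers. The helper `make_kegg_id()` and the `org_codes` vector are hypothetical. The three codes are taken from the species table on this page, and the assumption that the OrgDb stores pathway numbers without a prefix is mine, not something this page states.

```r
# Subset of the species-to-code mapping, keyed by scientific name
# (the form returned by AnnotationDbi::species() for an OrgDb)
org_codes <- c(
  "Homo sapiens"      = "hsa",
  "Mus musculus"      = "mmu",
  "Rattus norvegicus" = "rno"
)

# Hypothetical helper: prepend the organism code to bare pathway numbers
make_kegg_id <- function(species, path_number) {
  code <- org_codes[[species]]  # errors if the species is not in the table
  paste0(code, path_number)
}

make_kegg_id("Homo sapiens", c("00010", "04110"))
make_kegg_id("Mus musculus", "00010")
```

The first call yields `"hsa00010"` and `"hsa04110"`, matching the list-name format shown above.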
The built-in `KEGG_ORG_CODES` table covers 15 common species. The organism code is auto-detected from the OrgDb via `AnnotationDbi::species()`.
| Species | Code | OrgDb |
|---|---|---|
| Human | `hsa` | `org.Hs.eg.db` |
| Mouse | `mmu` | `org.Mm.eg.db` |
| Rat | `rno` | `org.Rn.eg.db` |
| Zebrafish | `dre` | `org.Dr.eg.db` |
| Fruit fly | `dme` | `org.Dm.eg.db` |
| C. elegans | `cel` | `org.Ce.eg.db` |
| Yeast | `sce` | `org.Sc.sgd.db` |
| Arabidopsis | `ath` | `org.At.tair.db` |
| Pig | `ssc` | `org.Ss.eg.db` |
| Cow | `bta` | `org.Bt.eg.db` |
| Dog | `cfa` | `org.Cf.eg.db` |
| Rhesus macaque | `mcc` | `org.Mmu.eg.db` |
| Chicken | `gga` | `org.Gg.eg.db` |
| Frog | `xtr` | `org.Xt.eg.db` |

```r
BiocManager::install(c("KEGGREST", "AnnotationDbi"))
```

`get_pathway_genes_reactome()` extracts Reactome pathway annotations via reactome.db (local, no network needed) and resolves gene symbols from an OrgDb.
```r
library(org.Hs.eg.db)
# Human Reactome pathways
pw_react <- get_pathway_genes_reactome(org.Hs.eg.db, min_size = 5L)

# Mouse Reactome
library(org.Mm.eg.db)
pw_react <- get_pathway_genes_reactome(org.Mm.eg.db, species_prefix = "R-MMU")
```

| Parameter | Default | Description |
|---|---|---|
| `orgdb` | (required) | An OrgDb object (for gene symbol resolution) |
| `keytype` | `"ENTREZID"` | Key type for querying the OrgDb |
| `gene_id_col` | `"reactome_gene_id"` | Gene ID column name in output |
| `gene_symbol_col` | `"reactome_gene_symbol"` | Gene symbol column name in output |
| `min_size` | `5` | Drop pathways below this gene count |
| `attach_path_names` | `TRUE` | Fetch pathway names from reactome.db (local, no network) |
| `species_prefix` | `"R-HSA"` | Reactome species prefix; `NULL` = all species |
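The documented meaning of `species_prefix` can be sketched in a few lines of base R. This is an illustration only: `filter_by_prefix()` is a hypothetical helper, the pathway IDs are made up, and the package may filter differently internally.

```r
# Toy Reactome-style pathway IDs (made up for illustration)
path_ids <- c("R-HSA-000001", "R-HSA-000002", "R-MMU-000001", "R-RNO-000001")

# Hypothetical helper mirroring the documented meaning of `species_prefix`
filter_by_prefix <- function(ids, species_prefix = "R-HSA") {
  if (is.null(species_prefix)) {
    return(ids)  # NULL keeps pathways from every species
  }
  ids[startsWith(ids, species_prefix)]
}

filter_by_prefix(path_ids)            # human only (the default)
filter_by_prefix(path_ids, "R-MMU")   # mouse only
filter_by_prefix(path_ids, NULL)      # all species
```

Because the default is `"R-HSA"`, the mouse call above passes `species_prefix = "R-MMU"` explicitly alongside the mouse OrgDb.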
```r
str(pw_react[[1]])
# Classes 'reactome_pathway' and 'data.frame': 96 obs. of 3 variables:
#  $ reactome_name       : chr "Signaling by Interleukins"
#  $ reactome_gene_id    : chr "1" "2" "3" ...
#  $ reactome_gene_symbol: chr "A1BG" "A2M" "A2MP1" ...
```

| Species | Prefix | OrgDb |
|---|---|---|
| Human | `R-HSA` | `org.Hs.eg.db` |
| Mouse | `R-MMU` | `org.Mm.eg.db` |
| Rat | `R-RNO` | `org.Rn.eg.db` |
| Zebrafish | `R-DRE` | `org.Dr.eg.db` |
| Fly | `R-DME` | `org.Dm.eg.db` |
| Worm | `R-CEL` | `org.Ce.eg.db` |

```r
BiocManager::install(c("reactome.db", "AnnotationDbi"))
```