
03 Pathway Genes

songif edited this page Jun 15, 2026 · 4 revisions

Building the Pathway-Gene Map

There are four ways to build the pathway-gene mapping required by pathway_dsge(): GAF files (mode A), a Bioconductor OrgDb (mode B), KEGG annotations (mode C), or Reactome annotations (mode D). Each returns a named list with one data.frame of genes per pathway.

Mode A: get_pathway_genes()

Splits a GAF table into a named list by GO term. Each element is a data.frame of genes in that pathway.

pw <- get_pathway_genes(
  gaf,
  genes     = c("db_object_id", "db_object_symbol"),
  unique    = TRUE,
  min_size  = 5,
  qualifier = NULL,
  evidence  = NULL,
  aspect    = NULL,
  go_names  = go
)

Parameters

| Parameter | Default | Description |
|---|---|---|
| gaf_data | (required) | Output of read_gaf() |
| genes | c("db_object_id", "db_object_symbol") | Columns kept for downstream matching |
| unique | TRUE | Remove duplicate gene entries per term |
| min_size | 5 | Discard pathways below this gene count |
| qualifier | NULL | Filter by GAF qualifier (e.g. "enables", "involved_in") |
| evidence | NULL | Filter by evidence code (e.g. c("IDA", "IPI")) |
| aspect | NULL | Filter by ontology: "P" (BP), "F" (MF), "C" (CC) |
| go_names | NULL | Output of read_obo(); adds gs_name and gs_source columns |
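
To make the effect of unique, min_size, and aspect concrete, here is a minimal base-R sketch of the split-by-term idea on a toy table. It is not the implementation of get_pathway_genes(): the helper split_by_term() is hypothetical, and the go_id and aspect column names and all rows are assumptions made for illustration.

```r
# Toy GAF-like table. The go_id and aspect column names are assumed;
# the rows are made up for illustration.
gaf_toy <- data.frame(
  db_object_id     = c("G1", "G2", "G2", "G3", "G4", "G1"),
  db_object_symbol = c("a",  "b",  "b",  "c",  "d",  "a"),
  go_id            = c("GO:1", "GO:1", "GO:1", "GO:1", "GO:2", "GO:2"),
  aspect           = c("P", "P", "P", "P", "F", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a table into one data.frame per GO term.
split_by_term <- function(gaf, genes, unique = TRUE, min_size = 5, aspect = NULL) {
  # Optional ontology filter
  if (!is.null(aspect)) gaf <- gaf[gaf$aspect %in% aspect, , drop = FALSE]
  # One data.frame per term, keeping only the requested gene columns
  pw <- split(gaf[, genes, drop = FALSE], gaf$go_id)
  # Drop repeated gene rows within each term
  if (unique) pw <- lapply(pw, function(df) df[!duplicated(df), , drop = FALSE])
  # Keep only terms with at least min_size genes
  pw[vapply(pw, nrow, integer(1)) >= min_size]
}

pw_toy <- split_by_term(
  gaf_toy,
  genes    = c("db_object_id", "db_object_symbol"),
  min_size = 3
)
names(pw_toy)           # only "GO:1" survives the size filter
nrow(pw_toy[["GO:1"]])  # 3 rows once the duplicate G2 entry is removed
```

Filtering before splitting means min_size is applied to the gene count that remains after the aspect filter, which is usually what you want when restricting to one ontology.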

Mode B: get_pathway_genes_db()

Builds the mapping from a Bioconductor OrgDb package, so no GAF or OBO files are needed.

Common model organisms

# Load only the OrgDb package for your organism
library(org.Hs.eg.db)    # human
library(org.Mm.eg.db)    # mouse
library(org.Dr.eg.db)    # zebrafish
library(org.Rn.eg.db)    # rat
library(org.Dm.eg.db)    # fruit fly
library(org.Ce.eg.db)    # C. elegans
library(org.Sc.sgd.db)   # yeast
library(org.At.tair.db)  # Arabidopsis

pw <- get_pathway_genes_db(org.Hs.eg.db)

Non-model organisms via AnnotationHub

library(AnnotationHub)
hub <- AnnotationHub()
query(hub, "Ovis aries")               # search for sheep records
sheep_orgdb <- hub[["AH72269"]]        # load the OrgDb by hub ID (take the ID from the query output; IDs change between snapshots)

pw <- get_pathway_genes_db(sheep_orgdb)

Parameters

| Parameter | Default | Description |
|---|---|---|
| orgdb | (required) | An OrgDb object (e.g., org.Hs.eg.db) |
| keytype | "ENTREZID" | Key type for gene IDs in the OrgDb |
| gene_id_col | "db_object_id" | Gene ID column name in output |
| gene_symbol_col | "db_object_symbol" | Gene symbol column name in output |
| min_size | 5 | Drop pathways below this gene count |
| aspect | NULL | Ontology filter: "BP", "MF", "CC", or NULL (all) |
| evidence | NULL | Evidence code filter (e.g., "IDA"); NULL = all |
| attach_names | TRUE | Fetch pathway names (requires GO.db) |
| use_goall | FALSE | If TRUE, propagate annotations to all ancestor GO terms (broader pathway set, consistent with clusterProfiler default) |
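
The use_goall option is easiest to understand on a toy ontology. The sketch below illustrates ancestor propagation in base R. It is only an illustration of the idea, not how get_pathway_genes_db() or an OrgDb computes it, and the term names, genes, and the helpers ancestors() and propagate() are all hypothetical.

```r
# Direct annotations: GO term -> genes (toy values)
direct <- list(
  "GO:child"  = c("g1", "g2"),
  "GO:parent" = c("g3"),
  "GO:root"   = character(0)
)

# Toy ontology: term -> immediate parents
parents <- list(
  "GO:child"  = "GO:parent",
  "GO:parent" = "GO:root",
  "GO:root"   = character(0)
)

# Collect every ancestor of a term by walking the parent links
ancestors <- function(term, parents) {
  out  <- character(0)
  todo <- parents[[term]]
  while (length(todo) > 0) {
    cur  <- todo[1]
    todo <- todo[-1]
    if (!cur %in% out) {
      out  <- c(out, cur)
      todo <- c(todo, parents[[cur]])
    }
  }
  out
}

# Add each term's genes to all of its ancestors
propagate <- function(direct, parents) {
  res <- direct
  for (term in names(direct)) {
    for (anc in ancestors(term, parents)) {
      res[[anc]] <- union(res[[anc]], direct[[term]])
    }
  }
  res
}

propagated <- propagate(direct, parents)
lengths(direct)      # gene counts from direct annotations only
lengths(propagated)  # ancestors now also contain their descendants' genes
```

Propagation makes broad terms larger, so more pathways pass min_size and the resulting gene sets overlap more than with direct annotations alone.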

Mode C: get_pathway_genes_kegg()

Extracts KEGG pathway annotations from an OrgDb's PATH column, with online name lookup via KEGGREST.

Usage

library(org.Hs.eg.db)

# Human KEGG pathways (auto-detects organism)
pw_kegg <- get_pathway_genes_kegg(org.Hs.eg.db, min_size = 5L)

# Mouse
library(org.Mm.eg.db)
pw_kegg <- get_pathway_genes_kegg(org.Mm.eg.db, min_size = 5L)

Parameters

| Parameter | Default | Description |
|---|---|---|
| orgdb | (required) | An OrgDb object |
| keytype | "ENTREZID" | Key type for querying the OrgDb |
| gene_id_col | "kegg_gene_id" | Gene ID column name in output |
| gene_symbol_col | "kegg_gene_symbol" | Gene symbol column name in output |
| min_size | 5 | Drop pathways below this gene count |
| attach_path_names | TRUE | Fetch pathway names via KEGGREST::keggList() (requires network) |

Output format

str(pw_kegg[[1]])
# Classes 'kegg_pathway' and 'data.frame':  65 obs. of  4 variables:
#  $ kegg_name       : chr  "Glycolysis / Gluconeogenesis - Homo sapiens (human)"
#  $ organism_code   : chr  "hsa"
#  $ kegg_gene_id    : chr  "124" "125" "126" ...
#  $ kegg_gene_symbol: chr  "ADH1A" "ADH1B" "ADH1C" ...

KEGG pathway IDs (list names) include the organism prefix, e.g. "hsa00010".
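
Because the result is a plain named list of data.frames, ordinary base-R tools are enough to inspect it. The sketch below builds a small stand-in list with the column layout shown above and then computes pathway sizes and a long-format table. The second pathway's name and genes are placeholders, not real KEGG content.

```r
# Stand-in for the output of get_pathway_genes_kegg(); values are illustrative
pw_kegg_toy <- list(
  hsa00010 = data.frame(
    kegg_name        = "Glycolysis / Gluconeogenesis - Homo sapiens (human)",
    organism_code    = "hsa",
    kegg_gene_id     = c("124", "125", "126"),
    kegg_gene_symbol = c("ADH1A", "ADH1B", "ADH1C"),
    stringsAsFactors = FALSE
  ),
  hsa00020 = data.frame(
    kegg_name        = "Placeholder pathway",
    organism_code    = "hsa",
    kegg_gene_id     = c("id1", "id2"),
    kegg_gene_symbol = c("geneA", "geneB"),
    stringsAsFactors = FALSE
  )
)

# Number of genes per pathway, named by pathway ID
sizes <- vapply(pw_kegg_toy, nrow, integer(1))

# One long table with the pathway ID as an extra column
long <- do.call(
  rbind,
  Map(function(id, df) cbind(pathway_id = id, df), names(pw_kegg_toy), pw_kegg_toy)
)
rownames(long) <- NULL
head(long)
```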

Supported organisms

The built-in KEGG_ORG_CODES table covers common species, including the 14 listed below. The organism code is auto-detected from the OrgDb via AnnotationDbi::species().

| Species | Code | OrgDb |
|---|---|---|
| Human | hsa | org.Hs.eg.db |
| Mouse | mmu | org.Mm.eg.db |
| Rat | rno | org.Rn.eg.db |
| Zebrafish | dre | org.Dr.eg.db |
| Fruit fly | dme | org.Dm.eg.db |
| C. elegans | cel | org.Ce.eg.db |
| Yeast | sce | org.Sc.sgd.db |
| Arabidopsis | ath | org.At.tair.db |
| Pig | ssc | org.Ss.eg.db |
| Cow | bta | org.Bt.eg.db |
| Dog | cfa | org.Cf.eg.db |
| Rhesus macaque | mcc | org.Mmu.eg.db |
| Chicken | gga | org.Gg.eg.db |
| Frog | xtr | org.Xt.eg.db |
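
The auto-detection step can be pictured as a lookup from species name to organism code, followed by prefixing each pathway number. The sketch below is a guess at the general shape, not the package's code: kegg_codes_toy and kegg_code_for() are hypothetical, the real KEGG_ORG_CODES object may be structured differently, and the bare pathway numbers are assumed to look like "00010".

```r
# Hypothetical miniature of a species -> KEGG organism code lookup
kegg_codes_toy <- c(
  "Homo sapiens"      = "hsa",
  "Mus musculus"      = "mmu",
  "Rattus norvegicus" = "rno"
)

# Return the code for a species, failing loudly when it is unknown
kegg_code_for <- function(species, codes = kegg_codes_toy) {
  code <- unname(codes[species])
  if (is.na(code)) stop("No KEGG organism code known for: ", species)
  code
}

# Assumed input: bare pathway numbers without an organism prefix
path_ids <- c("00010", "04110")
full_ids <- paste0(kegg_code_for("Homo sapiens"), path_ids)
full_ids
```

With a real OrgDb, the species string would come from AnnotationDbi::species(orgdb), as described above.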

Requirements

BiocManager::install(c("KEGGREST", "AnnotationDbi"))

Mode D: get_pathway_genes_reactome()

Extracts Reactome pathway annotations via reactome.db (local, no network needed) and resolves gene symbols from an OrgDb.

Usage

library(org.Hs.eg.db)

# Human Reactome pathways
pw_react <- get_pathway_genes_reactome(org.Hs.eg.db, min_size = 5L)

# Mouse Reactome
library(org.Mm.eg.db)
pw_react <- get_pathway_genes_reactome(org.Mm.eg.db, species_prefix = "R-MMU")

Parameters

| Parameter | Default | Description |
|---|---|---|
| orgdb | (required) | An OrgDb object (for gene symbol resolution) |
| keytype | "ENTREZID" | Key type for querying the OrgDb |
| gene_id_col | "reactome_gene_id" | Gene ID column name in output |
| gene_symbol_col | "reactome_gene_symbol" | Gene symbol column name in output |
| min_size | 5 | Drop pathways below this gene count |
| attach_path_names | TRUE | Fetch pathway names from reactome.db (local, no network) |
| species_prefix | "R-HSA" | Reactome species prefix; NULL = all species |

Output format

str(pw_react[[1]])
# Classes 'reactome_pathway' and 'data.frame':  96 obs. of  3 variables:
#  $ reactome_name       : chr  "Signaling by Interleukins"
#  $ reactome_gene_id    : chr  "1" "2" "3" ...
#  $ reactome_gene_symbol: chr  "A1BG" "A2M" "A2MP1" ...

Common species prefixes

| Species | Prefix | OrgDb |
|---|---|---|
| Human | R-HSA | org.Hs.eg.db |
| Mouse | R-MMU | org.Mm.eg.db |
| Rat | R-RNO | org.Rn.eg.db |
| Zebrafish | R-DRE | org.Dr.eg.db |
| Fly | R-DME | org.Dm.eg.db |
| Worm | R-CEL | org.Ce.eg.db |
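
The species_prefix argument amounts to a filter on pathway identifiers. A minimal sketch of that behaviour, assuming identifiers of the form prefix-number, is shown below. filter_by_prefix() is a hypothetical helper and the IDs are example values in Reactome's stable-ID format, not output from the package.

```r
# Hypothetical helper: keep only pathway IDs for one species.
# NULL keeps everything, matching the documented "NULL = all species".
filter_by_prefix <- function(pathway_ids, species_prefix = "R-HSA") {
  if (is.null(species_prefix)) return(pathway_ids)
  pathway_ids[startsWith(pathway_ids, paste0(species_prefix, "-"))]
}

# Example identifiers (illustrative)
ids <- c("R-HSA-449147", "R-MMU-449147", "R-HSA-109582")

human_ids <- filter_by_prefix(ids)                          # default "R-HSA"
mouse_ids <- filter_by_prefix(ids, "R-MMU")
all_ids   <- filter_by_prefix(ids, species_prefix = NULL)   # no filtering
```

The prefix should match the OrgDb you pass in, since gene symbols are resolved against that OrgDb.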

Requirements

BiocManager::install(c("reactome.db", "AnnotationDbi"))