
03 Pathway Genes

songif edited this page Jun 12, 2026 · 4 revisions

Building the Pathway-Gene Map

The pathway-gene map required by pathway_dsge() can be built in two ways: from GAF files (Mode A) or from a Bioconductor OrgDb package (Mode B). Both produce the same named-list output format.

Mode A: get_pathway_genes()

Splits a GAF table into a named list by GO term. Each element is a data.frame of genes in that pathway.

pw <- get_pathway_genes(
  gaf,                  # output of read_gaf()
  genes     = c("db_object_id", "db_object_symbol"),
  unique    = TRUE,
  min_size  = 5,
  qualifier = NULL,
  evidence  = NULL,
  aspect    = NULL,
  go_names  = go        # output of read_obo()
)

Parameters

| Parameter | Default | Description |
|---|---|---|
| `gaf_data` | (required) | Output of `read_gaf()` |
| `genes` | `c("db_object_id", "db_object_symbol")` | Columns kept for downstream matching |
| `unique` | `TRUE` | Remove duplicate gene entries per term |
| `min_size` | `5` | Discard pathways below this gene count |
| `qualifier` | `NULL` | Filter by GAF qualifier (e.g. `"enables"`, `"involved_in"`) |
| `evidence` | `NULL` | Filter by evidence code (e.g. `c("IDA", "IPI")`) |
| `aspect` | `NULL` | Filter by ontology: `"P"` (BP), `"F"` (MF), `"C"` (CC) |
| `go_names` | `NULL` | Output of `read_obo()`; adds `go_name` and `go_namespace` columns |
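
To make the output shape concrete, here is a minimal base-R sketch of the split-and-filter idea described above, run on a toy table. It is not the package's implementation: the toy data, the `go_id` and `aspect` column names, and the helper name `split_by_term()` are all assumptions made for illustration.

```r
# Toy GAF-like table. Only db_object_id and db_object_symbol are names taken
# from this page; go_id and aspect are assumed column names.
gaf_toy <- data.frame(
  db_object_id     = c("G1", "G2", "G2", "G3", "G1", "G4"),
  db_object_symbol = c("abc1", "abc2", "abc2", "abc3", "abc1", "abc4"),
  go_id            = c("GO:0000001", "GO:0000001", "GO:0000001",
                       "GO:0000001", "GO:0000002", "GO:0000002"),
  aspect           = c("P", "P", "P", "P", "F", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: optionally filter by aspect, split by GO term,
# de-duplicate gene rows within each term, then drop terms below min_size.
split_by_term <- function(tbl, genes, min_size = 5, aspect = NULL) {
  if (!is.null(aspect)) {
    tbl <- tbl[tbl$aspect %in% aspect, , drop = FALSE]
  }
  by_term <- split(tbl[, genes, drop = FALSE], tbl$go_id)
  by_term <- lapply(by_term, unique)
  by_term[vapply(by_term, nrow, integer(1)) >= min_size]
}

pw_toy <- split_by_term(
  gaf_toy,
  genes    = c("db_object_id", "db_object_symbol"),
  min_size = 3
)

names(pw_toy)   # GO terms that survived the size filter
pw_toy[[1]]     # one data.frame of genes per term
```

In this toy run the second term has only two distinct genes, so it falls below `min_size = 3` and is dropped. The duplicated `G2` row in the first term is collapsed before the size check, which is why de-duplication and the size threshold interact.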

Mode B: get_pathway_genes_db()

An alternative that uses Bioconductor's OrgDb packages, avoiding the need for GAF + OBO files.

Common model organisms

library(org.Hs.eg.db)    # human
library(org.Mm.eg.db)    # mouse
library(org.Dr.eg.db)    # zebrafish
library(org.Rn.eg.db)    # rat
library(org.Dm.eg.db)    # fruit fly
library(org.Ce.eg.db)    # C. elegans
library(org.Sc.sgd.db)   # yeast
library(org.At.tair.db)  # Arabidopsis

pw <- get_pathway_genes_db(org.Hs.eg.db)

Non-model organisms via AnnotationHub

library(AnnotationHub)
hub <- AnnotationHub()
query(hub, "Ovis aries")               # search for sheep; note the OrgDb record's AH ID
sheep_orgdb <- hub[["AH72269"]]        # load the OrgDb (substitute the ID your query returns)

pw <- get_pathway_genes_db(sheep_orgdb)

Parameters

| Parameter | Default | Description |
|---|---|---|
| `orgdb` | (required) | An OrgDb object (e.g., `org.Hs.eg.db`) |
| `keytype` | `"ENTREZID"` | Key type for gene IDs in the OrgDb |
| `gene_id_col` | `"db_object_id"` | Gene ID column name in output |
| `gene_symbol_col` | `"db_object_symbol"` | Gene symbol column name in output |
| `min_size` | `5` | Drop pathways below this gene count |
| `aspect` | `NULL` | Ontology filter: `"BP"`, `"MF"`, `"CC"`, or `NULL` (all) |
| `evidence` | `NULL` | Evidence code filter (e.g., `"IDA"`); `NULL` = all |
| `attach_go_names` | `TRUE` | Fetch GO term names via GO.db |
| `use_goall` | `FALSE` | If `TRUE`, propagate annotations to all ancestor GO terms (broader pathway set, consistent with clusterProfiler default) |
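
The effect of `use_goall` can be illustrated with a small base-R sketch of ancestor propagation. This is a toy model under stated assumptions, not how `get_pathway_genes_db()` is implemented: the gene IDs, the term IDs, and the hand-written `ancestors` lookup are all made up for illustration.

```r
# Toy direct annotations: each gene is annotated to exactly one term.
direct <- data.frame(
  gene  = c("G1", "G2", "G3"),
  go_id = c("GO:child_a", "GO:child_b", "GO:parent"),
  stringsAsFactors = FALSE
)

# Toy ancestor lookup (assumed): each term maps to all of its ancestors.
ancestors <- list(
  "GO:child_a" = c("GO:parent", "GO:root"),
  "GO:child_b" = c("GO:parent", "GO:root"),
  "GO:parent"  = "GO:root",
  "GO:root"    = character(0)
)

# Propagate: annotate each gene to its direct term plus every ancestor.
propagated <- unique(do.call(rbind, lapply(seq_len(nrow(direct)), function(i) {
  terms <- c(direct$go_id[i], ancestors[[direct$go_id[i]]])
  data.frame(gene = direct$gene[i], go_id = terms, stringsAsFactors = FALSE)
})))

# Gene sets per term, without and with propagation.
direct_sets <- split(direct$gene, direct$go_id)
goall_sets  <- split(propagated$gene, propagated$go_id)

lengths(direct_sets)
lengths(goall_sets)
```

With direct annotations only, the parent term holds a single gene. After propagation it also collects the genes of both child terms, and a root term appears that had no direct annotations at all. This is why enabling propagation tends to yield more pathways that pass `min_size`.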
