03 Pathway Genes

Building the Pathway-Gene Map

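This page describes four ways to build the pathway-gene mapping required by pathway_dsge(): GAF files (Mode A), GO annotations from a Bioconductor OrgDb (Mode B), KEGG pathways (Mode C), and Reactome pathways (Mode D). Each function returns a named list with one data.frame of genes per pathway.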

Mode A: `get_pathway_genes()`

Splits a GAF table into a named list by GO term. Each element is a data.frame of genes in that pathway.

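# gaf: output of read_gaf(); go: output of read_obo()
pw <- get_pathway_genes(
  gaf,
  genes     = c("db_object_id", "db_object_symbol"),  # columns to keep
  unique    = TRUE,   # drop duplicate gene entries per term
  min_size  = 5,      # discard terms with fewer than 5 genes
  qualifier = NULL,   # NULL = no filtering
  evidence  = NULL,
  aspect    = NULL,
  go_names  = go      # attach GO term names
)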

Parameters

Parameter	Default	Description
`gaf_data`	(required)	Output of `read_gaf()`
`genes`	`c("db_object_id", "db_object_symbol")`	Columns kept for downstream matching
`unique`	`TRUE`	Remove duplicate gene entries per term
`min_size`	`5`	Discard pathways below this gene count
`qualifier`	`NULL`	Filter by GAF qualifier (e.g. `"enables"`, `"involved_in"`)
`evidence`	`NULL`	Filter by evidence code (e.g. `c("IDA", "IPI")`)
`aspect`	`NULL`	Filter by ontology: `"P"` (BP), `"F"` (MF), `"C"` (CC)
`go_names`	`NULL`	Output of `read_obo()` — adds `gs_name`, `gs_source` columns
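
To make the splitting and filtering concrete, here is a minimal base-R sketch of the same idea on a toy table. It is not the package's implementation: the helper `split_by_term()` and the `go_id` and `aspect` column names are assumptions made for illustration; only `db_object_id` and `db_object_symbol` come from the defaults above.

```r
# Toy GAF-like table. The go_id and aspect column names are assumptions.
gaf_toy <- data.frame(
  db_object_id     = c("G1", "G2", "G2", "G3", "G4"),
  db_object_symbol = c("abc1", "abc2", "abc2", "abc3", "abc4"),
  go_id            = c("GO:0000001", "GO:0000001", "GO:0000001",
                       "GO:0000001", "GO:0000002"),
  aspect           = c("P", "P", "P", "P", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: optional aspect filter, split by term,
# de-duplicate genes within each term, then apply the size cutoff.
split_by_term <- function(gaf, genes, min_size = 5, aspect = NULL) {
  if (!is.null(aspect)) {
    gaf <- gaf[gaf$aspect %in% aspect, , drop = FALSE]
  }
  by_term <- split(gaf[, genes, drop = FALSE], gaf$go_id)
  by_term <- lapply(by_term, unique)
  sizes   <- vapply(by_term, nrow, integer(1))
  by_term[sizes >= min_size]
}

pw_toy <- split_by_term(
  gaf_toy,
  genes    = c("db_object_id", "db_object_symbol"),
  min_size = 3
)
names(pw_toy)
```

In this toy input the first term keeps three unique genes after de-duplication and passes the cutoff, while the second term has a single gene and is dropped.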

Mode B: `get_pathway_genes_db()`

Builds the same GO-based map from a Bioconductor OrgDb package, so no GAF or OBO files are needed.

Common model organisms

library(org.Hs.eg.db)    # human
library(org.Mm.eg.db)    # mouse
library(org.Dr.eg.db)    # zebrafish
library(org.Rn.eg.db)    # rat
library(org.Dm.eg.db)    # fruit fly
library(org.Ce.eg.db)    # C. elegans
library(org.Sc.sgd.db)   # yeast
library(org.At.tair.db)  # Arabidopsis

pw <- get_pathway_genes_db(org.Hs.eg.db)

Non-model organisms via AnnotationHub

library(AnnotationHub)
hub <- AnnotationHub()
query(hub, "Ovis aries")               # search for sheep
sheep_orgdb <- hub[["AH72269"]]        # load the OrgDb

pw <- get_pathway_genes_db(sheep_orgdb)

Parameters

Parameter	Default	Description
`orgdb`	(required)	An `OrgDb` object (e.g., `org.Hs.eg.db`)
`keytype`	`"ENTREZID"`	Key type for gene IDs in the OrgDb
`gene_id_col`	`"db_object_id"`	Gene ID column name in output
`gene_symbol_col`	`"db_object_symbol"`	Gene symbol column name in output
`min_size`	`5`	Drop pathways below this gene count
`aspect`	`NULL`	Ontology filter: `"BP"`, `"MF"`, `"CC"`, or `NULL` (all)
`evidence`	`NULL`	Evidence code filter (e.g., `"IDA"`); `NULL` = all
`attach_names`	`TRUE`	Fetch pathway names (requires `GO.db`)
`use_goall`	`FALSE`	If `TRUE`, propagate annotations to all ancestor GO terms (broader pathway set, consistent with clusterProfiler default)
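
The effect of `use_goall = TRUE` can be pictured with a small base-R sketch. The three-term ontology, the `ancestors` lookup, and the `propagate()` helper below are all hypothetical; the sketch only illustrates what "propagating annotations to ancestor terms" means, not how the package or the OrgDb does it.

```r
# Direct annotations: one row per (gene, GO term) pair.
direct <- data.frame(
  gene  = c("g1", "g2", "g3"),
  go_id = c("GO:C", "GO:C", "GO:B"),
  stringsAsFactors = FALSE
)

# Hypothetical ancestor lookup: GO:A is the parent of GO:B,
# and GO:B is the parent of GO:C.
ancestors <- list(
  "GO:A" = character(0),
  "GO:B" = "GO:A",
  "GO:C" = c("GO:B", "GO:A")
)

# For each annotation, emit the term itself plus all of its ancestors.
propagate <- function(annot, ancestors) {
  rows <- lapply(seq_len(nrow(annot)), function(i) {
    terms <- c(annot$go_id[i], ancestors[[annot$go_id[i]]])
    data.frame(gene = annot$gene[i], go_id = terms, stringsAsFactors = FALSE)
  })
  unique(do.call(rbind, rows))
}

propagated <- propagate(direct, ancestors)

# Gene counts per term before and after propagation.
table(direct$go_id)
table(propagated$go_id)
```

Broader terms gain genes after propagation, so more terms tend to clear the `min_size` cutoff and the resulting pathway list grows.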

Mode C: `get_pathway_genes_kegg()`

Extracts KEGG pathway annotations from an OrgDb's PATH column, with online name lookup via KEGGREST.

Usage

library(org.Hs.eg.db)

# Human KEGG pathways (auto-detects organism)
pw_kegg <- get_pathway_genes_kegg(org.Hs.eg.db, min_size = 5L)

# Mouse
library(org.Mm.eg.db)
pw_kegg <- get_pathway_genes_kegg(org.Mm.eg.db, min_size = 5L)

Parameters

Parameter	Default	Description
`orgdb`	(required)	An `OrgDb` object
`keytype`	`"ENTREZID"`	Key type for querying the OrgDb
`gene_id_col`	`"kegg_gene_id"`	Gene ID column name in output
`gene_symbol_col`	`"kegg_gene_symbol"`	Gene symbol column name in output
`min_size`	`5`	Drop pathways below this gene count
`attach_path_names`	`TRUE`	Fetch pathway names via `KEGGREST::keggList()` (requires network)

Output format

str(pw_kegg[[1]])
# Classes 'kegg_pathway' and 'data.frame':  65 obs. of  4 variables:
#  $ kegg_name       : chr  "Glycolysis / Gluconeogenesis - Homo sapiens (human)"
#  $ organism_code   : chr  "hsa"
#  $ kegg_gene_id    : chr  "124" "125" "126" ...
#  $ kegg_gene_symbol: chr  "ADH1A" "ADH1B" "ADH1C" ...

KEGG pathway IDs (list names) include the organism prefix, e.g. "hsa00010".
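
Because the organism code is part of each list name, it can be separated from the pathway number with plain string handling. The sketch below uses illustrative IDs in the format described above rather than real function output.

```r
# Illustrative pathway IDs: organism code followed by a pathway number.
kegg_ids <- c("hsa00010", "hsa04110", "mmu00010")

org_code   <- sub("[0-9]+$", "", kegg_ids)  # leading letters
pathway_no <- sub("^[a-z]+", "", kegg_ids)  # trailing digits

id_table <- data.frame(id = kegg_ids, org_code, pathway_no,
                       stringsAsFactors = FALSE)
id_table
```

The same approach applies to a real result, for example `pw_kegg[startsWith(names(pw_kegg), "hsa")]` to keep only human pathways.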

Supported organisms

The built-in KEGG_ORG_CODES table covers 15 common species. The organism code is auto-detected from the OrgDb via AnnotationDbi::species().

Species	Code	OrgDb
Human	`hsa`	`org.Hs.eg.db`
Mouse	`mmu`	`org.Mm.eg.db`
Rat	`rno`	`org.Rn.eg.db`
Zebrafish	`dre`	`org.Dr.eg.db`
Fruit fly	`dme`	`org.Dm.eg.db`
C. elegans	`cel`	`org.Ce.eg.db`
Yeast	`sce`	`org.Sc.sgd.db`
Arabidopsis	`ath`	`org.At.tair.db`
Pig	`ssc`	`org.Ss.eg.db`
Cow	`bta`	`org.Bt.eg.db`
Dog	`cfa`	`org.Cf.eg.db`
Rhesus macaque	`mcc`	`org.Mmu.eg.db`
Chicken	`gga`	`org.Gg.eg.db`
Frog	`xtr`	`org.Xt.eg.db`
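
Auto-detection amounts to a lookup from a species name to a KEGG code. The following is a rough sketch under assumptions: `org_codes` and `lookup_kegg_code()` are hypothetical stand-ins, not the package's KEGG_ORG_CODES object, and only three rows of the table are reproduced. The keys are scientific names of the kind `AnnotationDbi::species()` returns (for example "Homo sapiens").

```r
# Hypothetical stand-in for a species-to-code table (three rows only).
org_codes <- c(
  "Homo sapiens"      = "hsa",
  "Mus musculus"      = "mmu",
  "Rattus norvegicus" = "rno"
)

# Look up the code for one species name; fail loudly when it is unknown.
lookup_kegg_code <- function(species_name, codes = org_codes) {
  code <- codes[species_name]
  if (is.na(code)) {
    stop("No KEGG code known for species: ", species_name)
  }
  unname(code)
}

mouse_code <- lookup_kegg_code("Mus musculus")
mouse_code
```

Failing with an explicit error for an unlisted species is a design choice of this sketch; how the package handles that case is not documented here.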

Requirements

BiocManager::install(c("KEGGREST", "AnnotationDbi"))

Mode D: `get_pathway_genes_reactome()`

Extracts Reactome pathway annotations via reactome.db (local, no network needed) and resolves gene symbols from an OrgDb.

Usage

library(org.Hs.eg.db)

# Human Reactome pathways
pw_react <- get_pathway_genes_reactome(org.Hs.eg.db, min_size = 5L)

# Mouse Reactome pathways (override the default "R-HSA" prefix)
library(org.Mm.eg.db)
pw_react <- get_pathway_genes_reactome(org.Mm.eg.db, species_prefix = "R-MMU")

Parameters

Parameter	Default	Description
`orgdb`	(required)	An `OrgDb` object (for gene symbol resolution)
`keytype`	`"ENTREZID"`	Key type for querying the OrgDb
`gene_id_col`	`"reactome_gene_id"`	Gene ID column name in output
`gene_symbol_col`	`"reactome_gene_symbol"`	Gene symbol column name in output
`min_size`	`5`	Drop pathways below this gene count
`attach_path_names`	`TRUE`	Fetch pathway names from `reactome.db` (local, no network)
`species_prefix`	`"R-HSA"`	Reactome species prefix; `NULL` = all species

Output format

str(pw_react[[1]])
# Classes 'reactome_pathway' and 'data.frame':  96 obs. of  3 variables:
#  $ reactome_name       : chr  "Signaling by Interleukins"
#  $ reactome_gene_id    : chr  "1" "2" "3" ...
#  $ reactome_gene_symbol: chr  "A1BG" "A2M" "A2MP1" ...

Common species prefixes

Species	Prefix	OrgDb
Human	`R-HSA`	`org.Hs.eg.db`
Mouse	`R-MMU`	`org.Mm.eg.db`
Rat	`R-RNO`	`org.Rn.eg.db`
Zebrafish	`R-DRE`	`org.Dr.eg.db`
Fly	`R-DME`	`org.Dm.eg.db`
Worm	`R-CEL`	`org.Ce.eg.db`
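
One way to picture `species_prefix` is as a prefix match on Reactome pathway IDs, with `NULL` disabling the filter. The sketch below assumes that behaviour; `filter_by_prefix()` is a hypothetical helper and the IDs are made up for illustration.

```r
# Made-up IDs that follow the "R-<species>-<number>" pattern.
reactome_ids <- c("R-HSA-0000001", "R-HSA-0000002",
                  "R-MMU-0000001", "R-DRE-0000001")

# Hypothetical helper: keep IDs starting with the prefix; NULL keeps all.
filter_by_prefix <- function(ids, species_prefix = "R-HSA") {
  if (is.null(species_prefix)) {
    return(ids)
  }
  ids[startsWith(ids, species_prefix)]
}

human_ids <- filter_by_prefix(reactome_ids)           # default prefix
mouse_ids <- filter_by_prefix(reactome_ids, "R-MMU")
all_ids   <- filter_by_prefix(reactome_ids, NULL)     # no filtering
```

Under this reading, passing a non-human OrgDb without changing `species_prefix` would still select human pathways, which is why the mouse example above sets the prefix explicitly.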

Requirements

BiocManager::install(c("reactome.db", "AnnotationDbi"))
