
02 Input Data

songif edited this page Jun 15, 2026 · 6 revisions

Input Data

1. Differential Expression Results

DSGE accepts p-values from any differential expression tool (DESeq2, edgeR, limma, Seurat, etc.).

Required columns:

  • pvalue: nominal p-values (not adjusted)
  • geneName (or similar): gene symbols/identifiers, must be unique

Optional columns:

  • baseMean (or AveExpr): mean expression for expression-level filtering
  • log2FoldChange (or similar): direction vector for NDS computation

Example (limma output; in this file the gene-symbol column is named gene):

res <- read.csv("inst/data_exp/limma_FLT3_IR_vs_FLT3.csv", stringsAsFactors = FALSE)

# Remove genes without a valid symbol (empty string or NA)
res <- subset(res, gene != "" & !is.na(gene))
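Different DE tools name their output columns differently, so a small renaming step can bring results into the shape described above. The sketch below is a hypothetical helper (standardize_de() is not part of DSGE). It assumes the usual column names of limma (P.Value, logFC), edgeR (PValue, logFC) and Seurat FindMarkers (p_val, avg_log2FC); DESeq2 already uses pvalue and log2FoldChange. It also enforces the unique-gene requirement by keeping, for each duplicated symbol, the row with the smallest nominal p-value. That is one reasonable choice, not documented DSGE behavior.

```r
# Hypothetical helper (not part of DSGE): harmonize DE-tool column names and
# enforce the "unique gene" requirement described above.
standardize_de <- function(df, gene_col) {
  # Assumed mapping: common output names -> names used on this page
  rename_map <- c(P.Value = "pvalue", PValue = "pvalue", p_val = "pvalue",
                  logFC = "log2FoldChange", avg_log2FC = "log2FoldChange")
  hits <- names(df) %in% names(rename_map)
  names(df)[hits] <- unname(rename_map[names(df)[hits]])
  names(df)[names(df) == gene_col] <- "geneName"
  stopifnot(all(c("pvalue", "geneName") %in% names(df)))

  # Drop rows with a missing/empty gene symbol or a missing p-value
  ok <- !is.na(df$geneName) & df$geneName != "" & !is.na(df$pvalue)
  df <- df[ok, , drop = FALSE]

  # One row per gene: keep the row with the smallest nominal p-value
  df <- df[order(df$pvalue), , drop = FALSE]
  df <- df[!duplicated(df$geneName), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Toy limma-style table (illustrative values only)
toy <- data.frame(gene    = c("TP53", "", "MYC", "TP53", NA),
                  logFC   = c(1.2, 0.5, -0.8, 0.9, 0.1),
                  AveExpr = c(8, 5, 9, 7, 4),
                  P.Value = c(0.04, 0.20, 0.001, 0.01, 0.3),
                  stringsAsFactors = FALSE)
res_std <- standardize_de(toy, gene_col = "gene")
res_std$geneName  # "MYC" "TP53"
```

For Seurat::FindMarkers() output, gene names are stored as row names, so copy them into a column first (for example df$gene <- rownames(df)).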

2. GO via OrgDb — get_pathway_genes_db()

Instead of GAF + OBO files, you can build the pathway-gene map directly from a Bioconductor OrgDb package. This is simpler for common model organisms and avoids managing external files.

library(org.Hs.eg.db)
pw <- get_pathway_genes_db(org.Hs.eg.db)

Parameters

| Parameter | Default | Description |
|---|---|---|
| orgdb | (required) | An OrgDb object (e.g., org.Hs.eg.db) |
| keytype | "ENTREZID" | Key type for gene IDs in the OrgDb |
| min_size | 5 | Drop pathways below this gene count |
| aspect | NULL | Ontology filter: "BP", "MF", "CC", or NULL (all) |
| evidence | NULL | Evidence code filter (e.g., "IDA"); NULL = all |
| attach_names | TRUE | Fetch pathway names (requires GO.db, KEGGREST, or reactome.db, depending on the source) |

For the complete parameter list and usage details (non-model organisms, AnnotationHub, etc.), see the 03-Pathway-Genes page.
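To make the aspect, evidence and min_size filters concrete, here is a base-R sketch that applies them to a toy long-format table. The column names ENTREZID, GO, EVIDENCE and ONTOLOGY mirror the columns AnnotationDbi::select() returns when the GO column is requested from an OrgDb. The helper filter_go_sets() and the toy rows are hypothetical, and this is not necessarily how get_pathway_genes_db() works internally.

```r
# Hypothetical helper (not DSGE code): turn a long gene-to-GO table into
# GO-term gene sets, applying aspect / evidence / min_size filters.
filter_go_sets <- function(ann, aspect = NULL, evidence = NULL, min_size = 5L) {
  ann <- ann[!is.na(ann$GO), , drop = FALSE]           # genes without GO terms
  if (!is.null(aspect))   ann <- ann[ann$ONTOLOGY %in% aspect, , drop = FALSE]
  if (!is.null(evidence)) ann <- ann[ann$EVIDENCE %in% evidence, , drop = FALSE]
  sets <- lapply(split(ann$ENTREZID, ann$GO), unique)  # one entry per GO term
  sets[lengths(sets) >= min_size]                      # drop small terms
}

# Toy annotation rows (made-up gene-term pairs)
ann <- data.frame(
  ENTREZID = c("1", "2", "3", "1", "2", "3", "4"),
  GO       = c("GO:0000001", "GO:0000001", "GO:0000001",
               "GO:0000002", "GO:0000002", "GO:0000003", "GO:0000003"),
  EVIDENCE = c("IDA", "IEA", "IDA", "IDA", "IDA", "IDA", "IDA"),
  ONTOLOGY = c("BP", "BP", "BP", "MF", "MF", "BP", "BP"),
  stringsAsFactors = FALSE)

bp_sets  <- filter_go_sets(ann, aspect = "BP", min_size = 3L)
names(bp_sets)   # "GO:0000001"
ida_sets <- filter_go_sets(ann, evidence = "IDA", min_size = 2L)
names(ida_sets)  # "GO:0000001" "GO:0000002" "GO:0000003"
```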

3. GAF Annotations — read_gaf()

Reads tab-separated GAF 2.2 files with data.table::fread. Header/comment lines starting with ! are skipped automatically.

gaf <- read_gaf("data_exp/goa_human.gaf/goa_human.gaf")

# Inspect the file metadata
head(get_gaf_header("data_exp/goa_human.gaf/goa_human.gaf"))

Parameters

| Parameter | Default | Description |
|---|---|---|
| file | (required) | Path to the GAF file |
| col_names | GAF_COLUMNS | The 17 GAF 2.2 column names; set to NULL to auto-detect |
| ... | | Additional arguments passed to data.table::fread |

GAF 2.2 Columns

The 17 standard columns: db, db_object_id, db_object_symbol, qualifier, go_id, db_reference, evidence_code, with_from, aspect, db_object_name, db_object_synonym, db_object_type, taxon, date, assigned_by, annotation_extension, gene_product_form_id.
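The column layout above is enough to turn GAF rows into GO-term gene sets. This base-R sketch is not DSGE's code (which uses data.table::fread). It parses a small in-memory GAF snippet, skips ! header lines, and groups gene symbols by GO ID. Dropping rows whose qualifier starts with NOT is an assumption about how negated annotations should be treated, and the cutoff of 2 genes is a toy stand-in for a min_size-style filter.

```r
# GAF 2.2 column names, in the order listed above
gaf_cols <- c("db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
              "db_reference", "evidence_code", "with_from", "aspect",
              "db_object_name", "db_object_synonym", "db_object_type",
              "taxon", "date", "assigned_by", "annotation_extension",
              "gene_product_form_id")

# Build one toy 17-field GAF line (placeholder values)
make_row <- function(symbol, qualifier, go_id, aspect) {
  fields <- c("UniProtKB", paste0("ID_", symbol), symbol, qualifier, go_id,
              "PMID:0", "IDA", "", aspect, "", "", "protein", "taxon:9606",
              "20240101", "UniProt", "", "")
  paste(fields, collapse = "\t")
}
gaf_lines <- c("!gaf-version: 2.2",
               make_row("GENEA", "enables",     "GO:0000001", "F"),
               make_row("GENEB", "enables",     "GO:0000001", "F"),
               make_row("GENEC", "NOT|enables", "GO:0000001", "F"),
               make_row("GENEA", "involved_in", "GO:0000002", "P"),
               make_row("GENEA", "enables",     "GO:0000001", "F"))

# Tab-separated, no header row, "!" lines skipped, no quote processing
gaf <- read.delim(text = gaf_lines, header = FALSE, quote = "",
                  comment.char = "!", col.names = gaf_cols,
                  colClasses = "character")

gaf <- gaf[!grepl("^NOT", gaf$qualifier), , drop = FALSE]  # assumed: drop negated rows
go2genes <- lapply(split(gaf$db_object_symbol, gaf$go_id), unique)
go2genes <- go2genes[lengths(go2genes) >= 2L]              # toy size cutoff
names(go2genes)  # "GO:0000001"
```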

4. GO Term Names via OBO — read_obo()

Extracts id, name, and namespace from each [Term] stanza in OBO format. Only needed in GAF mode (not needed with get_pathway_genes_db()).

go <- read_obo("data_exp/go.obo")

Parameters

| Parameter | Default | Description |
|---|---|---|
| file | (required) | Path to the OBO file |
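The [Term] stanza extraction described above can be illustrated with a short base-R parser. This is a sketch under simple assumptions: one tag: value pair per line, the first id/name/namespace tag in each stanza wins, and trailing comments or modifiers are not handled. parse_obo_terms() is a hypothetical name, not part of DSGE. The toy input uses two GO terms plus a [Typedef] stanza, which is skipped.

```r
# Hypothetical parser (not DSGE's read_obo): id, name and namespace
# from each [Term] stanza; other stanza types are ignored.
parse_obo_terms <- function(lines) {
  lines  <- trimws(lines)
  header <- grepl("^\\[.+\\]$", lines)  # stanza headers: [Term], [Typedef], ...
  stanza <- cumsum(header)              # stanza number of every line (0 = file header)
  types  <- lines[header]               # types[k] is the header of stanza k
  rows <- lapply(which(types == "[Term]"), function(k) {
    body <- lines[stanza == k & !header]
    tag_value <- function(tag) {        # first "tag: value" line, or NA
      hit <- body[startsWith(body, paste0(tag, ": "))]
      if (length(hit) == 0L) NA_character_ else substring(hit[1L], nchar(tag) + 3L)
    }
    data.frame(id = tag_value("id"), name = tag_value("name"),
               namespace = tag_value("namespace"), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy OBO content; in practice: obo_lines <- readLines("data_exp/go.obo")
obo_lines <- c("format-version: 1.2", "",
               "[Term]", "id: GO:0000001", "name: mitochondrion inheritance",
               "namespace: biological_process", "",
               "[Term]", "id: GO:0000002", "name: mitochondrial genome maintenance",
               "namespace: biological_process", "",
               "[Typedef]", "id: part_of", "name: part of")
terms <- parse_obo_terms(obo_lines)
terms$id  # "GO:0000001" "GO:0000002"
```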

Input Data Sources

| File | Format | Where to get it |
|---|---|---|
| Differential expression results | CSV/table with pvalue and geneName columns, optionally baseMean (column names can be adapted) | Your own DE analysis (DESeq2, edgeR, Seurat, limma, etc.) |
| GAF annotations (mode A) | GAF 2.2 (tab-separated, 17 columns) | GOA Human |
| OBO ontology (mode A) | OBO 1.2/1.4 | Gene Ontology Downloads |
| OrgDb (mode B, GO) | Bioconductor OrgDb package | BiocManager::install("org.Hs.eg.db") or AnnotationHub |
| KEGGREST (KEGG pathway names) | Online API | BiocManager::install("KEGGREST") |
| reactome.db (Reactome) | Bioconductor annotation package | BiocManager::install("reactome.db") |

5. KEGG Pathway Data — get_pathway_genes_kegg()

Extracts KEGG pathway-to-gene mappings from a Bioconductor OrgDb (via the PATH column). The organism is auto-detected from the OrgDb.

library(org.Hs.eg.db)
pw <- get_pathway_genes_kegg(org.Hs.eg.db, min_size = 5L)

Parameters

| Parameter | Default | Description |
|---|---|---|
| orgdb | (required) | An OrgDb object (e.g., org.Hs.eg.db) |
| keytype | "ENTREZID" | Key type for gene IDs in the OrgDb |
| min_size | 5 | Drop pathways below this gene count |
| attach_path_names | TRUE | Fetch pathway names via KEGGREST::keggList() (online) |

Requires: AnnotationDbi + KEGGREST. Name lookup requires network access; when unavailable, names are set to NA with a warning.

For the complete parameter list and supported organisms, see the 03-Pathway-Genes page.
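The name lookup and its offline fallback can be sketched as a join between PATH identifiers and a named character vector of pathway titles. Assumptions, each labeled in the code: the OrgDb PATH column holds 5-digit pathway numbers such as 04110; the organism code (here hsa) is prepended to build KEGG pathway IDs; and the lookup vector has the shape returned by KEGGREST::keggList("pathway", "hsa"), whose names may or may not carry a path: prefix depending on the KEGGREST version. attach_kegg_names() is a hypothetical helper, not DSGE's implementation.

```r
# Hypothetical helper: map 5-digit PATH codes to pathway names, with an
# NA-plus-warning fallback when no name list is available (e.g. offline).
attach_kegg_names <- function(path_ids, org_code, name_lookup) {
  full_ids <- paste0(org_code, path_ids)  # assumed ID form: "04110" -> "hsa04110"
  if (is.null(name_lookup)) {
    warning("KEGG pathway names unavailable; setting names to NA")
    return(setNames(rep(NA_character_, length(full_ids)), full_ids))
  }
  # Normalize IDs: some KEGGREST versions prefix names with "path:"
  names(name_lookup) <- sub("^path:", "", names(name_lookup))
  setNames(unname(name_lookup[full_ids]), full_ids)  # unknown IDs -> NA
}

# In practice the lookup might come from:
#   tryCatch(KEGGREST::keggList("pathway", "hsa"), error = function(e) NULL)
toy_lookup <- c("path:hsa04110" = "Cell cycle", "hsa04210" = "Apoptosis")

nm  <- attach_kegg_names(c("04110", "04210", "99999"), "hsa", toy_lookup)
off <- suppressWarnings(attach_kegg_names("04110", "hsa", NULL))  # all NA
```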

6. Reactome Pathway Data — get_pathway_genes_reactome()

Extracts Reactome pathway-to-gene mappings from reactome.db (local, no network). Gene symbols are resolved from the OrgDb.

library(org.Hs.eg.db)
pw <- get_pathway_genes_reactome(org.Hs.eg.db, min_size = 5L)

Parameters

| Parameter | Default | Description |
|---|---|---|
| orgdb | (required) | An OrgDb object (used for gene symbol resolution) |
| keytype | "ENTREZID" | Key type for querying the OrgDb |
| min_size | 5 | Drop pathways below this gene count |
| attach_path_names | TRUE | Fetch pathway names from reactome.db (local) |
| species_prefix | "R-HSA" | Reactome species prefix; NULL = all species |

Requires: AnnotationDbi + reactome.db. Both pathway data and names are fetched locally.

For the complete parameter list and species prefixes, see the 03-Pathway-Genes page.
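The species_prefix filter and the symbol-resolution step can be sketched on a toy Entrez-to-pathway table. reactome.db provides such a mapping (for example through reactomeEXTID2PATHID), and Reactome stable IDs begin with a species code such as R-HSA (human) or R-MMU (mouse). The helper reactome_sets(), the toy rows, and the symbol lookup vector are hypothetical stand-ins for what reactome.db and the OrgDb would supply. DSGE's own implementation may differ.

```r
# Hypothetical helper: Reactome gene sets filtered by species prefix,
# with Entrez IDs resolved to symbols and small pathways dropped.
reactome_sets <- function(ext2path, id2symbol, species_prefix = "R-HSA",
                          min_size = 5L) {
  if (!is.null(species_prefix)) {   # NULL keeps all species
    keep <- startsWith(ext2path$pathway, paste0(species_prefix, "-"))
    ext2path <- ext2path[keep, , drop = FALSE]
  }
  sets <- lapply(split(ext2path$entrez, ext2path$pathway), function(ids) {
    sym <- id2symbol[ids]           # NA when an ID has no symbol
    unique(unname(sym[!is.na(sym)]))
  })
  sets[lengths(sets) >= min_size]
}

# Toy inputs (made-up IDs)
ext2path <- data.frame(
  entrez  = c("1", "2", "3", "1", "2", "9"),
  pathway = c("R-HSA-1", "R-HSA-1", "R-HSA-1", "R-MMU-1", "R-MMU-1", "R-HSA-2"),
  stringsAsFactors = FALSE)
id2symbol <- c("1" = "GENEA", "2" = "GENEB", "3" = "GENEC")

hs     <- reactome_sets(ext2path, id2symbol, min_size = 2L)
all_sp <- reactome_sets(ext2path, id2symbol, species_prefix = NULL, min_size = 2L)
names(all_sp)  # "R-HSA-1" "R-MMU-1"
```

R-HSA-2 disappears in both calls because its only gene has no symbol in the lookup, which leaves it below min_size.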
