In [1]:
# Load library and example datasets
library("pacman")

pacman::p_load("pathview", "gage", "tidyverse", "MetaboAnalystR", "KEGGREST")
data(gse16873.d)
# Load human pathways data
data(paths.hsa)
# Load demo pathway data: 3 pathway IDs and related plotting parameters,
# stored as a named list
data(demo.paths)

source("~/maca-utils/maca-kegg-utils.R")


## Get DE metabs

In [6]:
fn_auc_csv <- "test_dme_data.csv"
tbl0 <- call_maca_normalization(fn_auc_csv)
tbl0 <- get_de_metabs(tbl0, 0.01, "treatment", "control", 0.05)

[1] "MetaboAnalyst R objects initialized ..."
 [1] "Successfully passed sanity check!"                                                                    
 [2] "Samples are not paired."                                                                              
 [3] "2 groups were detected in samples."                                                                   
 [4] "Only English letters, numbers, underscore, hyphen and forward slash (/) are allowed."                 
 [5] "<font color=\"orange\">Other special characters or punctuations (if any) will be stripped off.</font>"
 [6] "All data values are numeric."                                                                         
 [7] "A total of 2 (0.1%) missing values were detected."                                                    
 [8] "<u>By default, these values will be replaced by a small value.</u>"                                   
 [9] "Click <b>Skip</b> button if you accept the default practice"                

“Duplicated column names deduplicated: 'Alanylglycine' => 'Alanylglycine_1' [59]”

[1] "MetaboAnalyst R objects initialized ..."
[1] "Loaded files from MetaboAnalyst web-server."
[1] "Loaded files from MetaboAnalyst web-server."
[1] "Loaded files from MetaboAnalyst web-server."
NULL


## Get enriched pathways from MACA

In [4]:
# Run the pathway analysis only if `x` is not already in the session
if (!exists("x")) x <- call_maca_pw_analysis("test_dme_data.csv", "dme")
tbl1 <- x[[1]]
pw_dict <- x[[2]]
pw_names <- get_kegg_pw_ref_tbl("dme")

# Manually replace some pw_names
tbl1$pw_name[tbl1$pw_name=="Fatty acid elongation in mitochondria"] <- "Fatty acid elongation"
tbl1$pw_name[tbl1$pw_name=="Glycolysis or Gluconeogenesis"] <- "Glycolysis / Gluconeogenesis"

# Join pw enrichment output with pw IDs
tbl1 <- inner_join(tbl1, pw_names, by="pw_name")


In [None]:
pw_name_ls0 <- as.vector(unlist(tbl0["pw_name"]))
pw_name_ls1 <- as.vector(unlist(tbl1["pw_name"]))

for (pw in pw_name_ls0) {
    if (!(pw %in% pw_name_ls1)) {
        print(pw)
    }
}
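
The loop above can also be written as a set difference. The sketch below uses two small made-up tables (`enriched` and `kegg_ref` are hypothetical stand-ins, not the real `tbl1` and `pw_names`) to show how names that an inner join on `pw_name` would drop can be listed in one call.

```r
# Toy stand-ins for the enrichment table and the KEGG reference table
# (hypothetical values, not taken from the real analysis).
enriched <- data.frame(
  pw_name = c("Glycolysis / Gluconeogenesis",
              "Fatty acid elongation in mitochondria",
              "Aminoacyl-tRNA biosynthesis"),
  stringsAsFactors = FALSE
)
kegg_ref <- data.frame(
  pw_name = c("Glycolysis / Gluconeogenesis",
              "Fatty acid elongation",
              "Aminoacyl-tRNA biosynthesis"),
  pw_id = c("dme00010", "dme00062", "dme00970"),
  stringsAsFactors = FALSE
)

# Names present in the enrichment output but absent from the reference;
# an inner join on pw_name would silently drop these rows.
unmatched <- setdiff(enriched$pw_name, kegg_ref$pw_name)
print(unmatched)
```

Any name printed here is a candidate for a manual rename before the join.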

## Call pathview

* Use `pathview` to retrieve the sanitized KEGG graphs (which it claims to support).
* It also does the data mapping, given a single named vector of logFC values (names are KEGG IDs). This only colours the nodes, though; bar charts would be better.
* Expect it to write many `.png` and `.xml` files.
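
As a minimal sketch of the input described above, the following builds a named numeric vector from a toy table. The table `de_tbl` and its values are made up for illustration; only the column names `KEGG` and `log2_fc` and the `"undef"` placeholder follow this notebook.

```r
# Toy table standing in for the differential-abundance results
# (hypothetical values; "undef" marks metabolites without a KEGG ID).
de_tbl <- data.frame(
  KEGG = c("C00041", "undef", "C00025"),
  log2_fc = c(1.2, -0.4, -2.1),
  stringsAsFactors = FALSE
)

# Keep only rows with a usable KEGG compound ID
mapped <- de_tbl[de_tbl$KEGG != "undef", ]

# Named numeric vector: values are log2 fold changes, names are KEGG IDs
cpd_vec <- setNames(mapped$log2_fc, mapped$KEGG)
print(cpd_vec)
```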

In [28]:
# Get input compound data
# Named vector of log2 fold changes; names are KEGG IDs
tbl_tmp <- tbl0 %>% dplyr::select(c("KEGG", "log2_fc")) %>% filter(KEGG != "undef")
cpd_data_ls <- unlist(tbl_tmp$log2_fc)
names(cpd_data_ls) <- unlist(tbl_tmp$KEGG)

# Get pathway numbers (pathway IDs without the "dme" species prefix)
pw_num_ls <- sub("^dme", "", unlist(tbl1$pw_id))
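
A small sketch of the ID handling, assuming the pathway IDs follow the usual KEGG form of species code plus five digits. The IDs and the names `pw_ids`, `pw_nums` and `pw_args` are illustrative only.

```r
# Hypothetical pathway IDs in KEGG "species code + 5 digits" form
pw_ids <- c("dme00970", "dme00010")

# Drop the leading species code to get the bare 5-digit pathway number
pw_nums <- sub("^dme", "", pw_ids)

# Fail early if anything does not look like a 5-digit pathway number
stopifnot(all(grepl("^[0-9]{5}$", pw_nums)))

# One row per pathway: the number to pass as pathway.id and a matching suffix
pw_args <- data.frame(
  pathway_id = pw_nums,
  out_suffix = pw_ids,
  stringsAsFactors = FALSE
)
print(pw_args)
```

Anchoring the pattern with `^` removes the species code only at the start of the string, so a malformed ID is caught by the check instead of being silently altered.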

In [31]:
i <- 1
suffix_i <- paste0("dme", pw_num_ls[[i]])
pathview(cpd.data = cpd_data_ls,
         pathway.id = pw_num_ls[[i]],
         species = "dme",
         out.suffix = suffix_i,
         keys.align = "y",
         kegg.native = TRUE
         )

Info: Downloading xml files for dme00970, 1/1 pathways..
Info: Downloading png files for dme00970, 1/1 pathways..
“Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
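
The cell above renders a single pathway. One possible way to extend it to all pathways is sketched below. `render_all`, `render_fn` and `stub_render` are hypothetical names; in this notebook `render_fn` would wrap the `pathview()` call shown above, but a stub is used here so that the sketch runs without network access.

```r
# Run a rendering function over several pathways without letting one
# failure stop the rest. `render_fn` is a hypothetical wrapper; in the
# notebook it would call pathview() for one pathway number.
render_all <- function(pw_nums, render_fn) {
  vapply(pw_nums, function(pw) {
    tryCatch({
      # suppressWarnings() hides all warnings raised by render_fn,
      # including the repeated deprecation warnings
      suppressWarnings(render_fn(pw))
      "ok"
    }, error = function(e) paste("failed:", conditionMessage(e)))
  }, character(1))
}

# Stub used only for demonstration: fails for one made-up pathway number
stub_render <- function(pw) {
  if (pw == "99999") stop("pathway not found")
  invisible(NULL)
}

status <- render_all(c("00970", "99999"), stub_render)
print(status)
```

Collecting a status string per pathway makes it easy to see afterwards which pathways need a second look.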

In [30]:
?pathview

pathview {pathview} (R Documentation)

| Argument | Description |
|---|---|
| gene.data | either vector (single sample) or a matrix-like data (multiple sample). Vector should be numeric with gene IDs as names or it may also be character of gene IDs. Character vector is treated as discrete or count data. Matrix-like data structure has genes as rows and samples as columns. Row names should be gene IDs. Here gene ID is a generic concept, including multiple types of gene, transcript and protein uniquely mappable to KEGG gene IDs. KEGG ortholog IDs are also treated as gene IDs as to handle metagenomic data. Check details for mappable ID types. Default gene.data=NULL. numeric, character, continuous |
| cpd.data | the same as gene.data, except named with IDs mappable to KEGG compound IDs. Over 20 types of IDs included in CHEMBL database can be used here. Check details for mappable ID types. Default cpd.data=NULL. Note that gene.data and cpd.data can't be NULL simultaneously. |
| pathway.id | character vector, the KEGG pathway ID(s), usually 5 digit, may also include the 3 letter KEGG species code. |
| species | character, either the kegg code, scientific name or the common name of the target species. This applies to both pathway and gene.data or cpd.data. When KEGG ortholog pathway is considered, species="ko". Default species="hsa", it is equivalent to use either "Homo sapiens" (scientific name) or "human" (common name). |
| kegg.dir | character, the directory of KEGG pathway data file (.xml) and image file (.png). Users may supply their own data files in the same format and naming convention of KEGG's (species code + pathway id, e.g. hsa04110.xml, hsa04110.png etc) in this directory. Default kegg.dir="." (current working directory). |
| cpd.idtype | character, ID type used for the cpd.data. Default cpd.idtype="kegg" (include compound, glycan and drug accessions). |
| gene.idtype | character, ID type used for the gene.data, case insensitive. Default gene.idtype="entrez", i.e. Entrez Gene, which are the primary KEGG gene ID for many common model organisms. For other species, gene.idtype should be set to "KEGG" as KEGG use other types of gene IDs. For the common model organisms (to check the list, do: data(bods); bods), you may also specify other types of valid IDs. To check the ID list, do: data(gene.idtype.list); gene.idtype.list. |
| gene.annotpkg | character, the name of the annotation package to use for mapping between other gene ID types including symbols and Entrez gene ID. Default gene.annotpkg=NULL. |
| min.nnodes | integer, minimal number of nodes of type "gene", "enzyme", "compound" or "ortholog" for a pathway to be considered. Default min.nnodes=3. |
| kegg.native | logical, whether to render pathway graph as native KEGG graph (.png) or using graphviz layout engine (.pdf). Default kegg.native=TRUE. |

| Argument | Description |
|---|---|
| plot.data.gene | data.frame returned by node.map function for rendering mapped gene nodes, including node name, type, positions (x, y), sizes (width, height), and mapped gene.data. This data is also used as input for pseudo-color coding through node.color function. Default plot.data.gene=NULL. |
| plot.data.cpd | same as plot.data.gene, except for mapped compound node data. Default plot.data.cpd=NULL. Note that plot.data.gene and plot.data.cpd can't be NULL simultaneously. |
| cols.ts.gene | vector or matrix of colors returned by node.color function for rendering gene.data. Dimensionality is the same as the latter. Default cols.ts.gene=NULL. |
| cols.ts.cpd | same as cols.ts.gene, except corresponding to cpd.data. Default cols.ts.cpd=NULL. Note that cols.ts.gene and cols.ts.cpd can't be NULL simultaneously. |
| node.data | list returned by node.info function, which parses the KGML file directly or indirectly, and extracts the node data. |
| pathway.name | character, the full KEGG pathway name in the format of 3-letter species code with 5-digit pathway id, e.g. "hsa04612". |
| out.suffix | character, the suffix to be added after the pathway name as part of the output graph file. Sample names or column names of the gene.data or cpd.data are also added when there are multiple samples. Default out.suffix="pathview". |
| multi.state | logical, whether multiple states (samples or columns) of gene.data or cpd.data should be integrated and plotted in the same graph. Default multi.state=TRUE. In other words, gene or compound nodes will be sliced into multiple pieces corresponding to the number of states in the data. |
| match.data | logical, whether the samples of gene.data and cpd.data are paired. Default match.data=TRUE. Let the sample sizes of gene.data and cpd.data be m and n; when m>n, extra columns of NA's (mapped to no color) will be added to cpd.data to make the sample sizes the same. This will result in the same number of slices in gene nodes and compound nodes when multi.state=TRUE. |
| same.layer | logical, control plotting layers: 1) if node colors be plotted in the same layer as the pathway graph when kegg.native=TRUE, 2) if edge/node type legend be plotted in the same page when kegg.native=FALSE. |

| Column | Description |
|---|---|
| kegg.names | standard KEGG IDs/Names for mapped nodes. It's Entrez Gene ID or KEGG Compound Accessions. |
| labels | Node labels to be used when needed. |
| all.mapped | All molecule (gene or compound) IDs mapped to this node. |
| type | node type, currently 4 types are supported: "gene", "enzyme", "compound" and "ortholog". |
| x | x coordinate in the original KEGG pathway graph. |
| y | y coordinate in the original KEGG pathway graph. |
| width | node width in the original KEGG pathway graph. |
| height | node height in the original KEGG pathway graph. |
| other columns | columns of the mapped gene/compound data and corresponding pseudo-color codes for individual samples |
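
The help text above says `cpd.data` may be matrix-like, with IDs as row names and samples as columns, and that `multi.state` slices each node by sample. A minimal sketch of such an input follows; the compound IDs are real KEGG accessions, but the values and the column names `cond_a` and `cond_b` are made up, and the matrix is not passed to `pathview()` here.

```r
# Hypothetical log2 fold changes for two conditions (columns) and
# three compounds (rows named by KEGG compound ID).
cpd_mat <- matrix(
  c(1.2, -2.1, 0.3,   # cond_a
    0.8, -1.5, NA),   # cond_b (NA for a compound not measured there)
  ncol = 2,
  dimnames = list(c("C00041", "C00025", "C00031"), c("cond_a", "cond_b"))
)

# Basic shape checks: numeric values and compound IDs as row names
stopifnot(is.numeric(cpd_mat), !is.null(rownames(cpd_mat)))
print(cpd_mat)
```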
