
Biochemical and Chemical Similarity Networks

Dmitry Grapov edited this page May 6, 2013 · 10 revisions

###Building Biochemical and Chemical Similarity Networks

The easiest way to follow along with the example is to take a look at the accessory files:

  1. a PowerPoint presentation detailing all the steps

  2. examples of an edge list, a node attributes file and a Cytoscape network file

Alternatively, download the full tutorial HERE.

Take a look at the PowerPoint presentation (see above), follow along with the calculations done in R (see below), and finally visualize the network in Cytoscape.

Load the needed functions from the devium package, which is stored on GitHub:

source("http://pastebin.com/raw.php?i=Y0YYEBia")

Take a look at some chemical identifiers here.

Use PubChem compound identifiers (CIDs).

# PubChem CIDs are stored in cids
cids # overview
nrow(cids) # how many
str(cids) # structure, want numeric
cids<-as.numeric(as.character(unlist(cids))) # hack to convert the factor to numeric values
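
The conversion through `as.character()` matters because calling `as.numeric()` directly on a factor returns the internal level codes rather than the printed values. A minimal sketch with made-up identifiers (the data frame `toy` is hypothetical and not part of the tutorial data):

```r
# Hypothetical CIDs stored as a factor, as can happen when reading a text file
toy <- data.frame(CID = factor(c("5793", "962", "5793")))

# Direct conversion returns the factor's level codes, not the CIDs
wrong <- as.numeric(toy$CID)

# Going through character first recovers the actual identifiers
right <- as.numeric(as.character(unlist(toy)))

print(wrong)
print(right)
```

Comparing the two printed vectors shows why the extra `as.character()` step is needed before any lookup by CID.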

Get biological product to precursor relationships

Based on KEGG reactant pairs (RPAIRS)

# make an edge list based on CIDs from KEGG reactant pairs
KEGG.edge.list<-CID.to.KEGG.pairs(cid=cids,database=get.KEGG.pairs(),lookup=get.CID.KEGG.pairs())
head(KEGG.edge.list)
dim(KEGG.edge.list) # a two column list with CID to CID connections based on KEGG RPAIRS
# how did I get this?
#1) convert from CID to KEGG using get.CID.KEGG.pairs(), which is a table stored at https://gist.github.com/dgrapov/4964546
#2) get KEGG RPAIRS using get.KEGG.pairs(), which is a table stored at https://gist.github.com/dgrapov/4964564
#3) return CID pairs
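
The three steps listed above can be sketched in base R with toy tables. This is only an illustration of the idea, not the devium implementation: the tables `lookup` and `rpairs`, their column names, the placeholder identifiers and the function `toy_cid_pairs` are all hypothetical.

```r
# Hypothetical CID-to-KEGG lookup (stand-in for the table from get.CID.KEGG.pairs())
lookup <- data.frame(CID  = c(101, 102, 103, 104),
                     KEGG = c("K1", "K2", "K3", "K4"),
                     stringsAsFactors = FALSE)

# Hypothetical reactant pairs (stand-in for the table from get.KEGG.pairs())
rpairs <- data.frame(source = c("K1", "K2", "K3"),
                     target = c("K2", "K9", "K4"),
                     stringsAsFactors = FALSE)

toy_cid_pairs <- function(cid, rpairs, lookup) {
  # 1) translate the query CIDs to KEGG identifiers
  kegg <- lookup$KEGG[match(cid, lookup$CID)]
  kegg <- kegg[!is.na(kegg)]
  # 2) keep reactant pairs where both members are among the query compounds
  keep <- rpairs$source %in% kegg & rpairs$target %in% kegg
  hits <- rpairs[keep, , drop = FALSE]
  # 3) translate the surviving pairs back to CIDs
  data.frame(source = lookup$CID[match(hits$source, lookup$KEGG)],
             target = lookup$CID[match(hits$target, lookup$KEGG)])
}

toy.edges <- toy_cid_pairs(cid = c(101, 102, 103), rpairs = rpairs, lookup = lookup)
print(toy.edges)
```

Only pairs with both members in the query set survive, which is why the result can have far fewer rows than the full reactant pair table.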

Get chemical similarity based on CIDs (Tanimoto coefficient > 0.7)

tanimoto.edges<-CID.to.tanimoto(cids=cids, cut.off = .7, parallel=FALSE)
head(tanimoto.edges)
# how did I get this?
#1) use the R package ChemmineR to query PubChem PUG to get molecular fingerprints
#2) calculate the similarity coefficient
#3) return edges with similarity above cut.off
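
For binary fingerprints the Tanimoto coefficient is the number of bits set in both compounds divided by the number of bits set in either. The sketch below works through steps 2 and 3 in base R on made-up 8-bit fingerprints; it does not call ChemmineR or PubChem, and the matrix `fp`, its row names and the output column names are assumptions for illustration only.

```r
# Hypothetical 8-bit fingerprints, one row per compound (row names are made-up CIDs)
fp <- rbind("101" = c(1, 1, 0, 1, 0, 0, 1, 0),
            "102" = c(1, 1, 0, 1, 0, 0, 0, 0),
            "103" = c(0, 0, 1, 0, 1, 1, 0, 0))

# Tanimoto coefficient: shared on-bits / bits on in either fingerprint
shared <- fp %*% t(fp)        # pairwise counts of shared on-bits
n.on   <- rowSums(fp)         # number of on-bits per compound
tani   <- shared / (outer(n.on, n.on, "+") - shared)

# keep each unordered pair once and apply the cut-off
cut.off <- 0.7
idx <- which(upper.tri(tani) & tani > cut.off, arr.ind = TRUE)
toy.tanimoto <- data.frame(source = rownames(tani)[idx[, "row"]],
                           target = colnames(tani)[idx[, "col"]],
                           value  = tani[idx],
                           stringsAsFactors = FALSE)
print(toy.tanimoto)
```

Restricting the search to the upper triangle avoids self-edges and reporting the same pair twice.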

Formatting and making the network

After a little bit of formatting, make a combined KEGG + Tanimoto edge list.
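
One possible version of that formatting step is sketched below. The toy tables stand in for `KEGG.edge.list` and `tanimoto.edges`; their column names, the `type` column and the rule of keeping the KEGG edge when a pair appears in both tables are assumptions, not something the tutorial specifies.

```r
# Hypothetical stand-ins for KEGG.edge.list and tanimoto.edges
kegg.toy     <- data.frame(source = c(101, 102), target = c(102, 104))
tanimoto.toy <- data.frame(source = c(101, 103), target = c(102, 105),
                           value = c(0.75, 0.82))

# give both tables the same columns and label the edge type
kegg.fmt <- data.frame(kegg.toy[, c("source", "target")],
                       type = "KEGG", stringsAsFactors = FALSE)
tani.fmt <- data.frame(tanimoto.toy[, c("source", "target")],
                       type = "tanimoto", stringsAsFactors = FALSE)

# stack the tables; KEGG rows come first, so they win when a pair is duplicated
combined <- rbind(kegg.fmt, tani.fmt)
pair.key <- paste(pmin(combined$source, combined$target),
                  pmax(combined$source, combined$target))
combined <- combined[!duplicated(pair.key), ]

# write a delimited file that can be imported as a network table
out.file <- file.path(tempdir(), "edge_list.csv")
write.csv(combined, out.file, row.names = FALSE)
print(combined)
```

Keeping an explicit `type` column makes it straightforward to style the two kinds of edges differently once the table is imported.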

Now upload this and a node attributes table to Cytoscape to make an amazing network.
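
A node attributes table needs one row per node, keyed by the same identifier used in the edge list. A minimal sketch, assuming a hypothetical edge list `edges` and a hypothetical table of per-compound results `stats` (names and values are invented for illustration):

```r
# Hypothetical edge list and per-compound statistics
edges <- data.frame(source = c(101, 102), target = c(102, 104))
stats <- data.frame(CID = c(101, 102, 104),
                    name = c("compound A", "compound B", "compound C"),
                    p.value = c(0.01, 0.20, 0.04),
                    stringsAsFactors = FALSE)

# one row per node that appears in the network
nodes <- data.frame(CID = sort(unique(c(edges$source, edges$target))))

# attach the attributes; nodes without statistics are kept with NA values
node.attributes <- merge(nodes, stats, by = "CID", all.x = TRUE)

node.file <- file.path(tempdir(), "node_attributes.csv")
write.csv(node.attributes, node.file, row.names = FALSE)
print(node.attributes)
```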

Here is an example of a network connected based on chemical relationships (green edges) and structural similarities (gray edges). This network displays the results from a multivariate classification model built to discriminate between two groups, whose individual values for key factors are shown as box plots within the nodes.