# Group Nodes
## Yihang Xin
## 2020-12-17

This vignette shows how to use the node grouping functions in py4cytoscape to manipulate graphs in Cytoscape.



# Prerequisites
In addition to this package (py4cytoscape version 0.0.6), you will need:

* The latest version of Cytoscape, which can be downloaded from https://cytoscape.org/download.html. Follow the on-screen installation instructions.

* Complete the installation wizard.

* Launch Cytoscape.

# Import the required package


In [1]:
import os
import sys
import pandas as pd
import py4cytoscape as p4c

# Setup Cytoscape


In [2]:
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.8.2',
 'automationAPIVersion': '1.0.0',
 'py4cytoscapeVersion': '0.0.6'}

# Background
The ability to group nodes together into “metanodes” and collapse them to a single node in a graph is useful for simplifying views of a complex network.

The example in this vignette applies node grouping functions to data that include protein-protein interactions and clustered correlations of protein post-translational modifications (Grimes et al., 2018). It plots five proteins and their modifications, then uses the node grouping functions to manipulate the graph in Cytoscape.



# Example
First we set up the node and edge data frames.

In [3]:
net_nodes = ["ALK", "ALK p Y1078", "ALK p Y1096", "ALK p Y1586", "CTNND1", "CTNND1 p Y193", "CTNND1 p Y217", "CTNND1 p Y228", "CTNND1 p Y241", "CTNND1 p Y248", "CTNND1 p Y302", "CTNND1 p Y904", "CTTN", "CTTN ack K107", "CTTN ack K124", "CTTN ack K147", "CTTN ack K161", "CTTN ack K235", "CTTN ack K390", "CTTN ack K87", "CTTN p S113", "CTTN p S224", "CTTN p Y104", "CTTN p Y154", "CTTN p Y162", "CTTN p Y228", "CTTN p Y334", "CTTN p Y421", "IRS1", "IRS1 p Y632", "IRS1 p Y941", "IRS1 p Y989", "NPM1", "NPM1 ack K154", "NPM1 ack K223", "NPM1 p S214", "NPM1 p S218"]
parent = ["", "ALK", "ALK", "ALK", "", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "", "IRS1", "IRS1", "IRS1", "", "NPM1", "NPM1", "NPM1", "NPM1"]
nodeType = ["protein", "modification", "modification", "modification", "protein", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "protein", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "protein", "modification", "modification", "modification", "protein", "modification", "modification", "modification", "modification"]

In [4]:
net_genes = []
for n in net_nodes:
    s = n.split()[0]
    net_genes.append(s)

In [5]:
d = {'id': net_nodes, 'Gene.name':net_genes, 'parent': parent, 'nodeType':nodeType}

In [6]:
netnodes_df = pd.DataFrame(data=d)
netnodes_df.head()

| | id | Gene.name | parent | nodeType |
|---|---|---|---|---|
| 0 | ALK | ALK | | protein |
| 1 | ALK p Y1078 | ALK | ALK | modification |
| 2 | ALK p Y1096 | ALK | ALK | modification |
| 3 | ALK p Y1586 | ALK | ALK | modification |
| 4 | CTNND1 | CTNND1 | | protein |


In [7]:
# Define edge data
source_nodes = ["ALK", "ALK", "ALK", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "IRS1", "IRS1", "IRS1", "NPM1", "NPM1", "NPM1", "NPM1", "ALK p Y1096", "CTNND1 p Y193", "CTNND1 p Y193", "CTNND1 p Y228", "CTNND1 p Y904", "CTNND1 p Y217", "CTNND1 p Y241", "CTNND1 p Y248", "ALK p Y1078", "ALK p Y1096", "ALK p Y1586", "IRS1 p Y941", "CTTN ack K147", "CTTN ack K107", "CTTN ack K235", "CTTN ack K87", "CTTN ack K147", "CTTN ack K124", "CTTN ack K147", "CTTN ack K235", "CTTN ack K161", "CTTN ack K390", "NPM1 ack K223", "NPM1 ack K154", "NPM1 ack K223", "ALK", "CTNND1", "CTNND1", "CTTN", "IRS1"]
target_nodes = ["ALK p Y1078", "ALK p Y1096", "ALK p Y1586", "CTNND1 p Y193", "CTNND1 p Y217", "CTNND1 p Y228", "CTNND1 p Y241", "CTNND1 p Y248", "CTNND1 p Y302", "CTNND1 p Y904", "CTTN ack K107", "CTTN ack K124", "CTTN ack K147", "CTTN ack K161", "CTTN ack K235", "CTTN ack K390", "CTTN ack K87", "CTTN p S113", "CTTN p S224", "CTTN p Y104", "CTTN p Y154", "CTTN p Y162", "CTTN p Y228", "CTTN p Y334", "CTTN p Y421", "IRS1 p Y632", "IRS1 p Y941", "IRS1 p Y989", "NPM1 ack K154", "NPM1 ack K223", "NPM1 p S214", "NPM1 p S218", "ALK p Y1586", "CTNND1 p Y228", "CTNND1 p Y302", "CTNND1 p Y302", "CTTN p Y154", "CTTN p Y162", "CTTN p Y162", "CTTN p Y334", "IRS1 p Y632", "IRS1 p Y989", "IRS1 p Y989", "IRS1 p Y989", "CTTN p S113", "CTTN p S224", "CTTN p S224", "CTTN p S224", "CTTN p Y104", "CTTN p Y228", "CTTN p Y228", "CTTN p Y228", "CTTN p Y421", "CTTN p Y421", "NPM1 p S214", "NPM1 p S218", "NPM1 p S218", "IRS1", "CTTN", "IRS1", "NPM1", "NPM1"]
Weight = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 0.8060606, 0.7575758, 0.7454545, 0.9393939, 0.8949096, 0.7329699, 0.7553845, 0.7866191, 0.775, 0.6969697, 0.7818182, 0.8424242, -0.7714286, -0.8385965, -0.5017544, -0.7473684, -0.5252838, -0.9428571, -0.8285714, -0.6713287, -0.5508772, -0.9428571, -0.8857143, -0.6310881, -0.8285714, 0.6123365, 2.115272, 0.002461723, 0.3354451, 0.5661711]
d = {'source': source_nodes, 'target':target_nodes, 'Weight': Weight}
netedges_df = pd.DataFrame(data=d)
netedges_df.head()

| | source | target | Weight |
|---|---|---|---|
| 0 | ALK | ALK p Y1078 | 100.0 |
| 1 | ALK | ALK p Y1096 | 100.0 |
| 2 | ALK | ALK p Y1586 | 100.0 |
| 3 | CTNND1 | CTNND1 p Y193 | 100.0 |
| 4 | CTNND1 | CTNND1 p Y217 | 100.0 |
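
Because the node and edge lists are typed by hand, it can help to confirm that every edge endpoint names a node in the node list before building the network. The sketch below is plain Python over a few entries copied from the lists above; `unknown_endpoints` is a hypothetical helper, not part of py4cytoscape.

```python
def unknown_endpoints(node_ids, sources, targets):
    """Return the sorted edge endpoints that do not appear in node_ids."""
    known = set(node_ids)
    endpoints = list(sources) + list(targets)
    return sorted({name for name in endpoints if name not in known})


# A small excerpt of the example data
nodes = ["ALK", "ALK p Y1078", "ALK p Y1096", "ALK p Y1586"]
sources = ["ALK", "ALK", "ALK", "ALK p Y1096"]
targets = ["ALK p Y1078", "ALK p Y1096", "ALK p Y1586", "ALK p Y1586"]

print(unknown_endpoints(nodes, sources, targets))          # []
# A deliberately mistyped target is reported
print(unknown_endpoints(nodes, ["ALK"], ["ALK p Y1087"]))  # ['ALK p Y1087']
```

The same function could be called with `net_nodes`, `source_nodes`, and `target_nodes` from the cells above.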


In [8]:
# Create network from data frames
net_suid = p4c.create_network_from_data_frames(netnodes_df, netedges_df, title="Group Nodes Test", collection="Py4cytoscape Vignettes")

Applying default style...
Applying preferred layout


In [9]:
p4c.layout_network('force-directed')

{}

Note that, for convenience, the node data frame records whether each node is a protein or a modification (the nodeType column) and the parent protein of each modification (the parent column).
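
The node names in this example follow a regular pattern: a bare gene name for a protein, and the gene name followed by a modification label for a modification. The parent and nodeType values could therefore be derived rather than typed out. A minimal sketch, assuming that naming pattern holds for every node; `describe_node` is a hypothetical helper:

```python
def describe_node(node_id):
    """Infer (gene, parent, node_type) from a node name such as 'ALK p Y1078'."""
    gene = node_id.split()[0]
    is_protein = node_id == gene  # a bare gene name carries no modification label
    parent = "" if is_protein else gene
    node_type = "protein" if is_protein else "modification"
    return gene, parent, node_type


print(describe_node("ALK"))          # ('ALK', '', 'protein')
print(describe_node("ALK p Y1078"))  # ('ALK', 'ALK', 'modification')
```

Applying `describe_node` to each entry of `net_nodes` would give the same values as the Gene.name, parent, and nodeType lists defined above.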

By default, the function select_nodes identifies nodes by SUID, which can be retrieved with get_table_columns. Alternatively, nodes can be matched on another column with the by_col argument, so the data frame can be used to distinguish proteins from modifications.

In [10]:
nodedata = p4c.get_table_columns(table='node')
edgedata = p4c.get_table_columns(table='edge')
nodedata.head()

| | SUID | shared name | id | Gene.name | parent | nodeType | NumChildren | NumDescendents | name | selected |
|---|---|---|---|---|---|---|---|---|---|---|
| 826 | 826 | ALK | ALK | ALK | | protein | | | ALK | False |
| 827 | 827 | ALK p Y1078 | ALK p Y1078 | ALK | ALK | modification | | | ALK p Y1078 | False |
| 828 | 828 | ALK p Y1096 | ALK p Y1096 | ALK | ALK | modification | | | ALK p Y1096 | False |
| 829 | 829 | ALK p Y1586 | ALK p Y1586 | ALK | ALK | modification | | | ALK p Y1586 | False |
| 830 | 830 | CTNND1 | CTNND1 | CTNND1 | | protein | | | CTNND1 | False |
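
To make the link between node names and SUIDs concrete, the sketch below builds a name-to-SUID lookup from the five rows shown in the node table output above. The SUID values are the ones from this run and will generally differ in another Cytoscape session; `suids_for` is a hypothetical helper, not a py4cytoscape function.

```python
# (SUID, id) pairs copied from the node table output above
node_table = [
    (826, "ALK"),
    (827, "ALK p Y1078"),
    (828, "ALK p Y1096"),
    (829, "ALK p Y1586"),
    (830, "CTNND1"),
]

suid_by_id = {node_id: suid for suid, node_id in node_table}


def suids_for(names):
    """Translate node names to SUIDs, failing loudly on unknown names."""
    missing = [name for name in names if name not in suid_by_id]
    if missing:
        raise KeyError(f"not in node table: {missing}")
    return [suid_by_id[name] for name in names]


print(suids_for(["ALK", "CTNND1"]))  # [826, 830]
```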


In [11]:
genes = netnodes_df[netnodes_df['nodeType'].str.contains("protein")]['id']
genes = pd.DataFrame(genes)
genes

| | id |
|---|---|
| 0 | ALK |
| 4 | CTNND1 |
| 12 | CTTN |
| 28 | IRS1 |
| 32 | NPM1 |


In [12]:
# Select by gene SUIDs
geneSUIDs = nodedata[nodedata['nodeType'].str.contains("protein")]['SUID']
p4c.select_nodes(nodes=list(geneSUIDs), preserve_current_selection=False)

{'nodes': [838, 854, 858, 826, 830], 'edges': []}

In [13]:
# or by names in the "id" column
p4c.select_nodes(nodes=["ALK","IRS1"],by_col='id', preserve_current_selection=False)

{'nodes': [854, 826], 'edges': []}

In [14]:
# or by names based on dataframe subsetting
modifications = netnodes_df[netnodes_df['nodeType'].str.contains("modification")]['id']
p4c.select_nodes(nodes=list(modifications),by_col='id', preserve_current_selection=False)

{'nodes': [827,
  828,
  829,
  831,
  832,
  833,
  834,
  835,
  836,
  837,
  839,
  840,
  841,
  842,
  843,
  844,
  845,
  846,
  847,
  848,
  849,
  850,
  851,
  852,
  853,
  855,
  856,
  857,
  859,
  860,
  861,
  862],
 'edges': []}

In [15]:
# Now select one protein and all its modifications
deltacatnodes = netnodes_df[netnodes_df['Gene.name'].str.contains("CTNND1")]['id']
p4c.select_nodes(nodes=list(deltacatnodes),by_col='id', preserve_current_selection=False)

{'nodes': [831, 832, 833, 834, 835, 836, 837, 830], 'edges': []}

Let’s create a new group of the selected nodes and collapse it into one node…



In [16]:
p4c.create_group(group_name='delta catenin group')
p4c.collapse_group(groups='delta catenin group')

{'groups': [1025]}

…then expand it again.



In [17]:
group_number = p4c.expand_group(groups='delta catenin group')
group_number['groups']

[1025]

For these data, we can create a group for each protein together with its modifications, naming each group by its gene name. First, delete the example group created above.

In [18]:
p4c.delete_group(groups=group_number['groups'])

{'groups': [1025]}
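
With the example group removed, one way to build a group per protein is to collect each gene's nodes and pass them to create_group by name. The following is a sketch, not the original author's code: it assumes `create_group` accepts `nodes` and `nodes_by_col` arguments, and it takes the py4cytoscape module as a parameter so the grouping logic can be tried without a running Cytoscape. `members_by_gene` and `create_gene_groups` are hypothetical helper names.

```python
from collections import defaultdict


def members_by_gene(node_ids):
    """Map each gene name to the node names that start with it."""
    groups = defaultdict(list)
    for node_id in node_ids:
        groups[node_id.split()[0]].append(node_id)
    return dict(groups)


def create_gene_groups(p4c, node_ids, collapse=True):
    """Create one group per gene, optionally collapsing each one.

    `p4c` is expected to be the imported py4cytoscape module; the
    `nodes` and `nodes_by_col` keyword arguments are assumptions.
    """
    grouped = members_by_gene(node_ids)
    for gene, members in grouped.items():
        p4c.create_group(group_name=gene, nodes=members, nodes_by_col='id')
        if collapse:
            p4c.collapse_group(groups=gene)
    return grouped


# The grouping step alone needs no Cytoscape connection
example = ["IRS1", "IRS1 p Y632", "NPM1", "NPM1 ack K154"]
print(members_by_gene(example))
# {'IRS1': ['IRS1', 'IRS1 p Y632'], 'NPM1': ['NPM1', 'NPM1 ack K154']}
```

With Cytoscape running, `create_gene_groups(p4c, net_nodes)` would be the call to try; for the node list in this vignette it would attempt five groups, one per gene.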

# Reference

Grimes et al., 2018. Sci. Signal. Vol. 11, Issue 531. DOI: 10.1126/scisignal.aaq1087. http://stke.sciencemag.org/content/11/531/eaaq1087

