<a href="https://colab.research.google.com/github/cytoscape/cytoscape-automation/blob/master/for-scripters/Python/group-nodes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Group Nodes
## Yihang Xin and Alex Pico
## 2023-05-30

This notebook will show you how to use node grouping functions to manipulate graphs in Cytoscape.



# Installation
The following chunk of code installs the `py4cytoscape` module, along with `python-igraph`, `requests`, `pandas`, and `networkx`.

In [1]:
%%capture
!python3 -m pip install python-igraph requests pandas networkx
!python3 -m pip install py4cytoscape

If you are using a remote notebook environment such as Google Colab, run the cell below. If you are running the notebook locally, you can skip it.



In [2]:
#_PY4CYTOSCAPE = 'git+https://github.com/cytoscape/py4cytoscape@1.7.0' # optional
import requests
exec(requests.get("https://raw.githubusercontent.com/cytoscape/jupyter-bridge/master/client/p4c_init.py").text)
IPython.display.Javascript(_PY4CYTOSCAPE_BROWSER_CLIENT_JS) # Start browser client





Loading Javascript client ... ff8f7b6e-99e4-42ed-abe7-176ece2fb241 on https://jupyter-bridge.cytoscape.org


<IPython.core.display.Javascript object>

# Prerequisites
In addition to this package (py4cytoscape version 1.7.0), you will need:

* Latest version of Cytoscape, which can be downloaded from https://cytoscape.org/download.html. Simply follow the installation instructions on screen.

* Complete installation wizard

* Launch Cytoscape

You can also install Cytoscape apps from within a Python notebook by running `py4cytoscape.install_app('Your App')`.

# Import the required package


In [3]:
import os
import sys
import pandas as pd
import py4cytoscape as p4c

# Setup Cytoscape


In [4]:
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.9.0',
 'automationAPIVersion': '1.2.0',
 'py4cytoscapeVersion': '0.0.10'}

# Background
The ability to group nodes together into “metanodes” and collapse them to a single node in a graph is useful for simplifying views of a complex network.

The example in this vignette describes application of node grouping functions to data that includes protein-protein interactions and clustered correlations of protein post-translational modifications (Grimes, et al., 2018). This vignette plots five proteins and their modifications, and uses the node grouping functions to manipulate the graph in Cytoscape.



# Example
First we set up the node and edge data frames.

In [5]:
net_nodes = ["ALK", "ALK p Y1078", "ALK p Y1096", "ALK p Y1586", "CTNND1", "CTNND1 p Y193", "CTNND1 p Y217", "CTNND1 p Y228", "CTNND1 p Y241", "CTNND1 p Y248", "CTNND1 p Y302", "CTNND1 p Y904", "CTTN", "CTTN ack K107", "CTTN ack K124", "CTTN ack K147", "CTTN ack K161", "CTTN ack K235", "CTTN ack K390", "CTTN ack K87", "CTTN p S113", "CTTN p S224", "CTTN p Y104", "CTTN p Y154", "CTTN p Y162", "CTTN p Y228", "CTTN p Y334", "CTTN p Y421", "IRS1", "IRS1 p Y632", "IRS1 p Y941", "IRS1 p Y989", "NPM1", "NPM1 ack K154", "NPM1 ack K223", "NPM1 p S214", "NPM1 p S218"]
parent = ["", "ALK", "ALK", "ALK", "", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "", "IRS1", "IRS1", "IRS1", "", "NPM1", "NPM1", "NPM1", "NPM1"]
nodeType = ["protein", "modification", "modification", "modification", "protein", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "protein", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "modification", "protein", "modification", "modification", "modification", "protein", "modification", "modification", "modification", "modification"]
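The `parent` and `nodeType` lists above are typed out by hand. Because every modification id in this data set begins with the name of its protein, both lists could also be derived from the ids. The following is a minimal sketch under that assumption; the helper name `describe_node` is hypothetical, and only a few ids from the list are used.

```python
def describe_node(node_id):
    """Return (parent, node_type) for an id such as 'ALK' or 'ALK p Y1078'."""
    tokens = node_id.split()
    if len(tokens) == 1:
        # A bare gene symbol is a protein and has no parent.
        return "", "protein"
    # Anything longer is a modification whose parent is the leading gene symbol.
    return tokens[0], "modification"


# A few ids taken from net_nodes above.
sample_ids = ["ALK", "ALK p Y1078", "CTTN", "CTTN ack K107"]
derived = [describe_node(n) for n in sample_ids]
derived_parent = [p for p, _ in derived]
derived_type = [t for _, t in derived]

print(derived_parent)
print(derived_type)
```

Comparing the derived lists with the hand-written ones is one way to catch a misplaced entry before the network is built.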

In [6]:
net_genes = []
for n in net_nodes:
    s = n.split()[0]
    net_genes.append(s)

In [7]:
d = {'id': net_nodes, 'Gene.name':net_genes, 'parent': parent, 'nodeType':nodeType}

In [8]:
netnodes_df = pd.DataFrame(data=d)
netnodes_df.head()

|   | id | Gene.name | parent | nodeType |
|---|---|---|---|---|
| 0 | ALK | ALK |  | protein |
| 1 | ALK p Y1078 | ALK | ALK | modification |
| 2 | ALK p Y1096 | ALK | ALK | modification |
| 3 | ALK p Y1586 | ALK | ALK | modification |
| 4 | CTNND1 | CTNND1 |  | protein |


In [9]:
# Define edge data
source_nodes = ["ALK", "ALK", "ALK", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTNND1", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "CTTN", "IRS1", "IRS1", "IRS1", "NPM1", "NPM1", "NPM1", "NPM1", "ALK p Y1096", "CTNND1 p Y193", "CTNND1 p Y193", "CTNND1 p Y228", "CTNND1 p Y904", "CTNND1 p Y217", "CTNND1 p Y241", "CTNND1 p Y248", "ALK p Y1078", "ALK p Y1096", "ALK p Y1586", "IRS1 p Y941", "CTTN ack K147", "CTTN ack K107", "CTTN ack K235", "CTTN ack K87", "CTTN ack K147", "CTTN ack K124", "CTTN ack K147", "CTTN ack K235", "CTTN ack K161", "CTTN ack K390", "NPM1 ack K223", "NPM1 ack K154", "NPM1 ack K223", "ALK", "CTNND1", "CTNND1", "CTTN", "IRS1"]
target_nodes = ["ALK p Y1078", "ALK p Y1096", "ALK p Y1586", "CTNND1 p Y193", "CTNND1 p Y217", "CTNND1 p Y228", "CTNND1 p Y241", "CTNND1 p Y248", "CTNND1 p Y302", "CTNND1 p Y904", "CTTN ack K107", "CTTN ack K124", "CTTN ack K147", "CTTN ack K161", "CTTN ack K235", "CTTN ack K390", "CTTN ack K87", "CTTN p S113", "CTTN p S224", "CTTN p Y104", "CTTN p Y154", "CTTN p Y162", "CTTN p Y228", "CTTN p Y334", "CTTN p Y421", "IRS1 p Y632", "IRS1 p Y941", "IRS1 p Y989", "NPM1 ack K154", "NPM1 ack K223", "NPM1 p S214", "NPM1 p S218", "ALK p Y1586", "CTNND1 p Y228", "CTNND1 p Y302", "CTNND1 p Y302", "CTTN p Y154", "CTTN p Y162", "CTTN p Y162", "CTTN p Y334", "IRS1 p Y632", "IRS1 p Y989", "IRS1 p Y989", "IRS1 p Y989", "CTTN p S113", "CTTN p S224", "CTTN p S224", "CTTN p S224", "CTTN p Y104", "CTTN p Y228", "CTTN p Y228", "CTTN p Y228", "CTTN p Y421", "CTTN p Y421", "NPM1 p S214", "NPM1 p S218", "NPM1 p S218", "IRS1", "CTTN", "IRS1", "NPM1", "NPM1"]
Weight = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 0.8060606, 0.7575758, 0.7454545, 0.9393939, 0.8949096, 0.7329699, 0.7553845, 0.7866191, 0.775, 0.6969697, 0.7818182, 0.8424242, -0.7714286, -0.8385965, -0.5017544, -0.7473684, -0.5252838, -0.9428571, -0.8285714, -0.6713287, -0.5508772, -0.9428571, -0.8857143, -0.6310881, -0.8285714, 0.6123365, 2.115272, 0.002461723, 0.3354451, 0.5661711]
d = {'source': source_nodes, 'target':target_nodes, 'Weight': Weight}
netedges_df = pd.DataFrame(data=d)
netedges_df.head()

|   | source | target | Weight |
|---|---|---|---|
| 0 | ALK | ALK p Y1078 | 100.0 |
| 1 | ALK | ALK p Y1096 | 100.0 |
| 2 | ALK | ALK p Y1586 | 100.0 |
| 3 | CTNND1 | CTNND1 p Y193 | 100.0 |
| 4 | CTNND1 | CTNND1 p Y217 | 100.0 |
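The edge lists refer to nodes by the same strings used in the node `id` column, so a typo in either list would point an edge at a node that was never defined. A quick check before creating the network may help. This is only a sketch: the function name `missing_endpoints` is hypothetical and the sample values are a small subset of the data above.

```python
def missing_endpoints(node_ids, sources, targets):
    """Return the sorted edge endpoints that do not appear in the node id list."""
    known = set(node_ids)
    return sorted({n for n in list(sources) + list(targets) if n not in known})


sample_nodes = ["ALK", "ALK p Y1078", "IRS1"]

# Every endpoint is a known node, so nothing is reported.
ok_result = missing_endpoints(sample_nodes, ["ALK", "ALK"], ["ALK p Y1078", "IRS1"])

# "NPM1" is not in sample_nodes, so it is reported.
bad_result = missing_endpoints(sample_nodes, ["ALK"], ["NPM1"])

print(ok_result)
print(bad_result)
```

In this notebook the same check could be run as `missing_endpoints(net_nodes, source_nodes, target_nodes)`.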


In [10]:
# Create the network from the node and edge data frames
net_suid = p4c.create_network_from_data_frames(netnodes_df, netedges_df, title="Group Nodes Test", collection="Py4cytoscape Vignettes")

Applying default style...
Applying preferred layout


In [11]:
p4c.layout_network('force-directed')

{}

Note that, for convenience, the data frame defines whether each node is a protein or a modification, and gives the parent node for each modification.

The function `select_nodes` looks up nodes by SUID by default; SUIDs can be retrieved with `get_table_columns`. Alternatively, the data frame can be used to distinguish proteins from modifications.

In [12]:
nodedata = p4c.get_table_columns(table='node')
edgedata = p4c.get_table_columns(table='edge')
nodedata.head()

|   | SUID | shared name | name | selected | id | Gene.name | parent | nodeType |
|---|---|---|---|---|---|---|---|---|
| 77312 | 77312 | IRS1 | IRS1 | False | IRS1 | IRS1 |  | protein |
| 77315 | 77315 | IRS1 p Y632 | IRS1 p Y632 | False | IRS1 p Y632 | IRS1 | IRS1 | modification |
| 77318 | 77318 | IRS1 p Y941 | IRS1 p Y941 | False | IRS1 p Y941 | IRS1 | IRS1 | modification |
| 77321 | 77321 | IRS1 p Y989 | IRS1 p Y989 | False | IRS1 p Y989 | IRS1 | IRS1 | modification |
| 77324 | 77324 | NPM1 | NPM1 | False | NPM1 | NPM1 |  | protein |
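The node table pairs each human-readable name with the SUID that Cytoscape assigned to it. As a rough illustration of how the two relate, the sketch below builds a name-to-SUID lookup from three rows copied from the output above. The variable names are hypothetical, and the SUID values are specific to the session that produced this output.

```python
# (SUID, name, nodeType) rows copied from the node table output above.
# SUIDs are assigned by Cytoscape and will differ in other sessions.
rows = [
    (77312, "IRS1", "protein"),
    (77315, "IRS1 p Y632", "modification"),
    (77324, "NPM1", "protein"),
]

# Lookup from a human-readable name to the SUID Cytoscape uses internally.
suid_by_name = {name: suid for suid, name, _ in rows}

# SUIDs of the protein nodes only, in table order.
protein_suids = [suid for suid, _, node_type in rows if node_type == "protein"]

print(suid_by_name["IRS1"])
print(protein_suids)
```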


In [13]:
genes = netnodes_df[netnodes_df['nodeType'].str.contains("protein")]['id']
genes = pd.DataFrame(genes)
genes

|   | id |
|---|---|
| 0 | ALK |
| 4 | CTNND1 |
| 12 | CTTN |
| 28 | IRS1 |
| 32 | NPM1 |


In [14]:
#select by gene SUIDs
geneSUIDs = nodedata[nodedata['nodeType'].str.contains("protein")]['SUID']
geneSUIDs = pd.DataFrame(geneSUIDs)
p4c.select_nodes(nodes=list(geneSUIDs['SUID']), preserve_current_selection=False)

{'nodes': [77264, 77312, 77224, 77240, 77324], 'edges': []}

In [15]:
# or by names in the "id" column
p4c.select_nodes(nodes=["ALK","IRS1"],by_col='id', preserve_current_selection=False)

{'nodes': [77312, 77224], 'edges': []}

In [16]:
# or by names based on dataframe subsetting
modifications = netnodes_df[netnodes_df['nodeType'].str.contains("modification")]['id']
p4c.select_nodes(nodes=list(modifications),by_col='id', preserve_current_selection=False)

{'nodes': [77231,
  77234,
  77297,
  77300,
  77237,
  77303,
  77306,
  77243,
  77246,
  77309,
  77249,
  77252,
  77315,
  77318,
  77255,
  77258,
  77321,
  77261,
  77327,
  77330,
  77267,
  77270,
  77333,
  77336,
  77273,
  77276,
  77279,
  77282,
  77285,
  77288,
  77291,
  77294],
 'edges': []}

In [17]:
# Now select one protein and all its modifications
deltacatnodes = netnodes_df[netnodes_df['Gene.name'].str.contains("CTNND1")]['id']
p4c.select_nodes(nodes=list(deltacatnodes),by_col='id', preserve_current_selection=False)

{'nodes': [77249, 77252, 77240, 77255, 77258, 77243, 77246, 77261],
 'edges': []}
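One caveat about the subsetting above: `str.contains` matches substrings (and by default treats its argument as a regular expression), so a gene symbol that happens to be part of a longer symbol would pull in both. That does not occur with these five proteins, but an exact comparison avoids the question. The following is a minimal sketch in plain Python with a hypothetical helper name, `nodes_for_gene`, and a small subset of the node ids.

```python
def nodes_for_gene(node_ids, gene_names, gene):
    """Return the node ids whose gene name equals `gene` exactly."""
    return [n for n, g in zip(node_ids, gene_names) if g == gene]


sample_nodes = ["CTNND1", "CTNND1 p Y193", "CTTN", "CTTN p Y154"]
sample_genes = [n.split()[0] for n in sample_nodes]

exact = nodes_for_gene(sample_nodes, sample_genes, "CTNND1")
partial = nodes_for_gene(sample_nodes, sample_genes, "CT")  # no gene is named "CT"

print(exact)
print(partial)
```

With the data frame, the equivalent exact filter would be `netnodes_df[netnodes_df['Gene.name'] == "CTNND1"]['id']`.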

Let’s create a new group of the selected nodes and collapse it into one node…



In [18]:
p4c.create_group(group_name='delta catenin group')
p4c.collapse_group(groups='delta catenin group')

{'groups': [77776]}

…then expand it again.



In [19]:
group_number = p4c.expand_group(groups='delta catenin group')
group_number['groups']

[77776]

For these data, we can create groups of all proteins together with their modifications, naming each group by its gene name. Before doing so, delete the test group created above.

In [20]:
p4c.delete_group(groups=group_number['groups'])

{'groups': [77776]}
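One way the per-gene grouping might look is sketched below. It reuses only the `select_nodes`, `create_group`, and `collapse_group` calls already shown in this notebook, in the same select-then-group pattern as the delta catenin example. The helper names `members_by_gene` and `create_gene_groups` are hypothetical, and the second function assumes a running Cytoscape session with the network loaded, so it is defined here but not called.

```python
from collections import defaultdict


def members_by_gene(node_ids):
    """Map each gene symbol (first token of a node id) to all of its node ids."""
    groups = defaultdict(list)
    for node_id in node_ids:
        groups[node_id.split()[0]].append(node_id)
    return dict(groups)


def create_gene_groups(groups, collapse=True):
    """Create one Cytoscape group per gene. Requires a running Cytoscape session."""
    import py4cytoscape as p4c  # imported here so members_by_gene works without Cytoscape

    for gene, members in groups.items():
        # Select the protein and its modifications, then group the selection.
        p4c.select_nodes(nodes=members, by_col='id', preserve_current_selection=False)
        p4c.create_group(group_name=gene)
        if collapse:
            p4c.collapse_group(groups=gene)


# A few ids taken from net_nodes above.
sample_ids = ["ALK", "ALK p Y1078", "ALK p Y1096", "IRS1", "IRS1 p Y632"]
gene_groups = members_by_gene(sample_ids)
print(gene_groups)
```

In this notebook, `create_gene_groups(members_by_gene(net_nodes))` would attempt to build and collapse one group for each of the five proteins.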

# Reference

Grimes, et al., 2018. Sci. Signal. Vol. 11, Issue 531, DOI: 10.1126/scisignal.aaq1087, http://stke.sciencemag.org/content/11/531/eaaq1087.

