<a href="https://colab.research.google.com/github/cytoscape/cytoscape-automation/blob/master/for-scripters/Python/importing-network-from-table.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Importing Network From Table


## Yihang Xin and Alex Pico
## 2021-11-16

In addition to importing networks in network file formats such as SIF and XGMML, Cytoscape also supports importing networks from tabular data. In this notebook, the data table represents protein-protein interaction data from a mass-spectrometry experiment.



# Installation
The following cell installs the `py4cytoscape` module and the other packages this notebook uses.

In [1]:
%%capture
!python3 -m pip install python-igraph requests pandas networkx
!python3 -m pip install py4cytoscape

If you are running this notebook in a remote environment such as Google Colab, execute the cell below. If you are running it locally, skip this step.



In [2]:
import requests
exec(requests.get("https://raw.githubusercontent.com/cytoscape/jupyter-bridge/master/client/p4c_init.py").text)
IPython.display.Javascript(_PY4CYTOSCAPE_BROWSER_CLIENT_JS) # Start browser client





Loading Javascript client ... 6108da5e-e09d-4a20-b962-dd54025645a0 on https://jupyter-bridge.cytoscape.org


<IPython.core.display.Javascript object>

# Prerequisites
In addition to this package (py4cytoscape version 0.0.11 or later), you will need:

* The latest version of Cytoscape, which can be downloaded from https://cytoscape.org/download.html. Follow the installation instructions on screen.

* Complete the installation wizard.

* Launch Cytoscape.

You can also install Cytoscape apps from within the notebook by running `py4cytoscape.install_app('Your App')`. This notebook later runs the MCODE app, which can be installed this way.

# Import the required packages


In [3]:
import os
import sys
import pandas as pd
import py4cytoscape as p4c

# Set up Cytoscape


In [4]:
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.9.1',
 'automationAPIVersion': '1.7.0',
 'py4cytoscapeVersion': '1.7.0'}
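If you want the notebook to stop early on an older Cytoscape release, one option is to parse the `cytoscapeVersion` string returned above and compare it with a minimum. This is a minimal sketch: the `version_tuple` and `meets_minimum` helpers and the `3.9.0` threshold are hypothetical, chosen only for illustration.

```python
import re


def version_tuple(version: str, width: int = 3) -> tuple:
    """Turn a dotted version string such as '3.9.1' into a tuple of ints."""
    nums = []
    for part in version.split('.'):
        # Keep only the leading digits of each part: '0-SNAPSHOT' -> 0
        match = re.match(r'\d+', part)
        if match is None:
            break
        nums.append(int(match.group()))
    # Pad with zeros so that '3.9' compares equal to '3.9.0'
    nums += [0] * (width - len(nums))
    return tuple(nums[:width])


def meets_minimum(info: dict, minimum: str = '3.9.0') -> bool:
    """Check info['cytoscapeVersion'] against a (hypothetical) minimum version."""
    return version_tuple(info['cytoscapeVersion']) >= version_tuple(minimum)
```

In the notebook this would be called as `meets_minimum(p4c.cytoscape_version_info())`. Comparing integer tuples avoids the string-comparison trap where `'3.10'` sorts before `'3.9'`.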

# Background
The data used for this protocol represent interactions between human and HIV proteins reported by Jäger et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3310911/). In this quantitative AP-MS experiment, a relatively small number of bait proteins were used to pull down a larger set of prey proteins.



# Import Network


First we need to read in the example data file:



In [42]:
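# Read the filtered gu_core network; adjust this path to where the PlaqueMS data lives on your machine
gu_core_data = pd.read_table("/Users/shuhanliu/Downloads/individual_project/PlaqueMS_data/PlaqueMS_data/Networks/gu_core_filtered_directed_network.txt")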

In [43]:
gu_core_data.head()

|   | Regulator | Target | MI | pvalue | directionality |
|---|---|---|---|---|---|
| 0 | SFRP2 | ITB1 | 0.31047 | 0.04063898 | -0.050244 |
| 1 | TRFE | IGKC | 0.406766 | 2.180235e-07 | 0.324126 |
| 2 | FBLN1 | SAA1 | 0.241169 | 0.002924214 | 0.192902 |
| 3 | ANXA6 | LRP1 | 0.396194 | 0.04063898 | 0.017116 |
| 4 | SAA4 | RARR2 | 0.293181 | 0.04063898 | 0.047233 |


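Now we can create a data frame for the network edges (interactions) from the imported data, mapping the `Regulator` and `Target` columns to `source` and `target`. We also keep the `MI`, `pvalue`, and `directionality` columns so they are loaded as edge attributes: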

In [44]:
edge_data = {'source':gu_core_data["Regulator"],
             'target':gu_core_data["Target"],
             'MI':gu_core_data["MI"],
             'pvalue':gu_core_data["pvalue"],
             'directionality':gu_core_data["directionality"]
            }
edges = pd.DataFrame(data=edge_data, columns=['source', 'target','MI','pvalue','directionality'])
edges.head()

|   | source | target | MI | pvalue | directionality |
|---|---|---|---|---|---|
| 0 | SFRP2 | ITB1 | 0.31047 | 0.04063898 | -0.050244 |
| 1 | TRFE | IGKC | 0.406766 | 2.180235e-07 | 0.324126 |
| 2 | FBLN1 | SAA1 | 0.241169 | 0.002924214 | 0.192902 |
| 3 | ANXA6 | LRP1 | 0.396194 | 0.04063898 | 0.017116 |
| 4 | SAA4 | RARR2 | 0.293181 | 0.04063898 | 0.047233 |
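The same edge frame can be built more compactly by renaming `Regulator`/`Target` to the `source`/`target` column names that `create_network_from_data_frames` looks for by default, and selecting the attribute columns in one step. A minimal sketch, using the first two rows shown above as stand-in data:

```python
import pandas as pd

# Stand-in for gu_core_data: the first two rows printed by gu_core_data.head() above
gu_core_data = pd.DataFrame({
    'Regulator': ['SFRP2', 'TRFE'],
    'Target': ['ITB1', 'IGKC'],
    'MI': [0.31047, 0.406766],
    'pvalue': [0.04063898, 2.180235e-07],
    'directionality': [-0.050244, 0.324126],
})

# Rename the endpoint columns, then keep the endpoints plus the edge attribute columns
edges = gu_core_data.rename(columns={'Regulator': 'source', 'Target': 'target'})[
    ['source', 'target', 'MI', 'pvalue', 'directionality']
]
print(edges)
```

In the notebook itself you would skip the stand-in frame and apply the same `rename` and column selection to the full `gu_core_data`.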


Finally, we use the edge data frame to create the network. Note that we don’t need to define a data frame for nodes, as all nodes in this case are represented in the edge data frame.
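Because every node appears in the edge frame, the node set is just the union of the `source` and `target` columns. If you later want to attach node attributes at creation time, a node frame with the `id` column that `create_network_from_data_frames` uses for nodes could be derived as below. This is a sketch; the stand-in rows are copied from the edge preview above.

```python
import pandas as pd

# Stand-in for the edges frame built above (first three rows of edges.head())
edges = pd.DataFrame({
    'source': ['SFRP2', 'TRFE', 'FBLN1'],
    'target': ['ITB1', 'IGKC', 'SAA1'],
})

# Flatten both endpoint columns row by row and keep each name once, in order of first appearance
node_ids = pd.unique(edges[['source', 'target']].to_numpy().ravel())
nodes = pd.DataFrame({'id': node_ids})
print(len(nodes), 'nodes')
```

Such a frame could then be passed as `p4c.create_network_from_data_frames(nodes=nodes, edges=edges, ...)` after adding attribute columns to it.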



In [45]:
# Create the network from the edge table; the return value is the SUID of the new network
p4c.create_network_from_data_frames(edges=edges, title='gu_core network', collection="gu_core collection")

Applying default style...
Applying preferred layout


89824

In [46]:
p4c.get_table_columns()

|   | SUID | shared name | id | name | selected |
|---|---|---|---|---|---|
| 90113 | 90113 | HPLN1 | HPLN1 | HPLN1 | False |
| 90626 | 90626 | IGHG4 | IGHG4 | IGHG4 | False |
| 90629 | 90629 | ANXA1 | ANXA1 | ANXA1 | False |
| 90116 | 90116 | LG3BP | LG3BP | LG3BP | False |
| 90119 | 90119 | CO4A2 | CO4A2 | CO4A2 | False |
| ... | ... | ... | ... | ... | ... |
| 90104 | 90104 | CFAB | CFAB | CFAB | False |
| 90107 | 90107 | LEG1 | LEG1 | LEG1 | False |
| 90620 | 90620 | TSP1 | TSP1 | TSP1 | False |
| 90623 | 90623 | C1R | C1R | C1R | False |


The imported network consists of multiple smaller subnetworks, each representing a bait node and its associated prey nodes.
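To check how many disconnected subnetworks the import produced, you can count the connected components of the edge list, treating edges as undirected. The sketch below uses a small union-find so it needs only the standard library; `networkx.number_connected_components` (networkx is installed in the first cell) would give the same count. The helper name `count_components` is hypothetical.

```python
def count_components(pairs):
    """Count connected components of an undirected graph given as (u, v) pairs."""
    parent = {}

    def find(x):
        # Register unseen nodes, then walk to the root with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in pairs:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    # Each distinct root is one component
    return len({find(node) for node in list(parent)})
```

In the notebook this would be called as `count_components(zip(edges['source'], edges['target']))`.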



Next, we join the log2FC data (differential expression results, calcified vs. non-calcified) to the network.

In [48]:
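# Read the differential expression results, indexed by protein ID; adjust this path to your local copy
gu_core_CalcifiedVSNon_calcified_data = pd.read_table("/Users/shuhanliu/Downloads/individual_project/PlaqueMS_data/PlaqueMS_data/Statistics/diff_exp_resultsCalcifiedVSNon-calcified_gu_core.txt",index_col=0)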
gu_core_CalcifiedVSNon_calcified_data.head()

|   | logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | Unnamed: 9 |
|---|---|---|---|---|---|---|---|---|---|
| FETUA | 1.556751 | 0.96042 | 2.153082 | 6.096402 | -5.174453 | 1e-06 | 0.000328 | 5.331106 | |
| OSTP | 1.298994 | 0.71085 | 1.887138 | 5.993654 | -4.377802 | 2.8e-05 | 0.004291 | 2.308815 | |
| CERU | -0.737408 | -1.10048 | -0.374336 | 6.605667 | 4.025763 | 0.000106 | 0.008182 | 1.082575 | |
| APOC2 | 0.764442 | 0.379058 | 1.149826 | 6.588996 | -3.931726 | 0.000149 | 0.008182 | 0.767583 | |
| COLA1 | 0.602471 | 0.298252 | 0.90669 | 6.361054 | -3.925384 | 0.000153 | 0.008182 | 0.746535 | |


In [49]:
# Keep only the statistics columns; this drops the empty trailing "Unnamed: 9" column
stat_cols = ['logFC', 'CI.L', 'CI.R', 'AveExpr', 't', 'P.Value', 'adj.P.Val', 'B']
df = gu_core_CalcifiedVSNon_calcified_data[stat_cols].copy()
df.head()

|   | logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|---|---|
| FETUA | 1.556751 | 0.96042 | 2.153082 | 6.096402 | -5.174453 | 1e-06 | 0.000328 | 5.331106 |
| OSTP | 1.298994 | 0.71085 | 1.887138 | 5.993654 | -4.377802 | 2.8e-05 | 0.004291 | 2.308815 |
| CERU | -0.737408 | -1.10048 | -0.374336 | 6.605667 | 4.025763 | 0.000106 | 0.008182 | 1.082575 |
| APOC2 | 0.764442 | 0.379058 | 1.149826 | 6.588996 | -3.931726 | 0.000149 | 0.008182 | 0.767583 |
| COLA1 | 0.602471 | 0.298252 | 0.90669 | 6.361054 | -3.925384 | 0.000153 | 0.008182 | 0.746535 |
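By default, `load_table_data` matches the data frame's row names (`data_key_column='row.names'`) against the node table's `name` column, which is why a `row.names` column shows up in the node table after loading. Only nodes whose name matches a row receive values. Before loading, a quick pandas check shows which node names have a matching row. The `preview_join` helper is hypothetical, and the stand-in inputs are a subset copied from the tables in this notebook.

```python
import pandas as pd


def preview_join(node_names, data):
    """Split node names into those with and without a matching row label in `data`."""
    names = pd.Index(node_names)
    has_row = names.isin(data.index)
    return names[has_row], names[~has_row]


# Stand-in inputs; ANXA1 is deliberately left out of `data` so one node has no match
node_names = ['HPLN1', 'IGHG4', 'ANXA1']
data = pd.DataFrame({'logFC': [0.084057, -0.436433]}, index=['HPLN1', 'IGHG4'])

matched, unmatched = preview_join(node_names, data)
print(f'{len(matched)} of {len(node_names)} nodes have a matching row')
print('no data for:', list(unmatched))
```

In the notebook, the node names could come from something like `p4c.get_table_columns(columns='name')['name']`, with `df` as the data argument. Nodes without a match keep empty values in the new columns.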


In [50]:
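# Match the data frame's row names (protein IDs) to the node table's 'name' column (these are the defaults)
p4c.load_table_data(df, data_key_column='row.names', table='node', table_key_column='name')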

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_subset[col] = col_val


'Success: Data loaded in defaultnode table'

In [51]:
p4c.get_table_columns()

|   | SUID | shared name | id | name | selected | logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | row.names |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 90113 | 90113 | HPLN1 | HPLN1 | HPLN1 | False | 0.084057 | -0.263428 | 0.431542 | 5.576715 | -0.479479 | 0.632565 | 0.794564 | -6.113795 | HPLN1 |
| 90626 | 90626 | IGHG4 | IGHG4 | IGHG4 | False | -0.436433 | -0.763791 | -0.109075 | 6.475532 | 2.642576 | 0.009448 | 0.069512 | -2.945488 | IGHG4 |
| 90629 | 90629 | ANXA1 | ANXA1 | ANXA1 | False | -0.015234 | -0.245899 | 0.215432 | 6.482024 | 0.130906 | 0.896093 | 0.953087 | -6.216917 | ANXA1 |
| 90116 | 90116 | LG3BP | LG3BP | LG3BP | False | -0.426231 | -0.775095 | -0.077367 | 6.033263 | 2.421710 | 0.017110 | 0.096128 | -3.456676 | LG3BP |
| 90119 | 90119 | CO4A2 | CO4A2 | CO4A2 | False | 0.356603 | 0.021515 | 0.691692 | 6.162156 | -2.109397 | 0.037217 | 0.147816 | -4.110942 | CO4A2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 90104 | 90104 | CFAB | CFAB | CFAB | False | -0.308860 | -0.553196 | -0.064523 | 6.748105 | 2.505566 | 0.013715 | 0.084237 | -3.267239 | CFAB |
| 90107 | 90107 | LEG1 | LEG1 | LEG1 | False | 0.008214 | -0.349820 | 0.366248 | 6.126948 | -0.045475 | 0.963812 | 0.976452 | -6.224229 | LEG1 |
| 90620 | 90620 | TSP1 | TSP1 | TSP1 | False | 0.060086 | -0.266739 | 0.386911 | 6.519813 | -0.364411 | 0.716263 | 0.854537 | -6.160834 | TSP1 |
| 90623 | 90623 | C1R | C1R | C1R | False | -0.162295 | -0.442498 | 0.117908 | 6.744808 | 1.148061 | 0.253476 | 0.518702 | -5.589659 | C1R |


Clustering

The code below colors the nodes by their log2 fold change (the `logFC` column loaded above) using a diverging red-blue palette:
from py4cytoscape import palette_color_brewer_d_RdBu
# Map the logFC node column to node fill color with a diverging red-blue palette
p4c.set_node_color_mapping(**p4c.gen_node_color_map('logFC', palette_color_brewer_d_RdBu()))

Running MCODE clustering on the network then generates a set of clusters.
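The MCODE command is one long string of `key=value` pairs. If you plan to vary the parameters, one option is to assemble the string from keyword arguments. The `build_command` helper below is a hypothetical convenience; with the values shown it reproduces exactly the command string used in the MCODE cell, with booleans written in lowercase as that command does.

```python
def build_command(name, **params):
    """Join a command name and keyword parameters into a Cytoscape command string."""
    def fmt(value):
        # Write booleans as lowercase true/false, matching the MCODE command in this notebook
        if isinstance(value, bool):
            return str(value).lower()
        return str(value)

    # Keyword arguments keep their insertion order (Python 3.7+)
    return ' '.join([name] + [f'{key}={fmt(value)}' for key, value in params.items()])


mcode_cmd = build_command(
    'mcode cluster',
    degreeCutoff=2, fluff=True, fluffNodeDensityCutoff=0.1, haircut=True,
    includeLoops=False, kCore=2, maxDepthFromStart=100, network='current',
    nodeScoreCutoff=0.2, scope='NETWORK',
)
print(mcode_cmd)
```

Passing the result to `p4c.commands.commands_post(mcode_cmd)` would send the same request as the cell that follows.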

In [53]:
# Requires the MCODE app; install it first with p4c.install_app('MCODE') if needed
p4c.commands.commands_post('mcode cluster degreeCutoff=2 fluff=true fluffNodeDensityCutoff=0.1 haircut=true includeLoops=false kCore=2 maxDepthFromStart=100 network=current nodeScoreCutoff=0.2 scope=NETWORK')

{'id': 1,
 'parameters': {'scope': 'NETWORK',
  'includeLoops': False,
  'degreeCutoff': 2,
  'kCore': 2,
  'maxDepthFromStart': 100,
  'nodeScoreCutoff': 0.2,
  'haircut': True,
  'fluff': True,
  'fluffNodeDensityCutoff': 0.1,
  'selectedNodes': []},
 'clusters': [{'rank': 1,
   'name': 'Cluster 1',
   'score': 41.22605363984674,
   'seedNode': 90227,
   'nodes': [90113,
    90497,
    90182,
    90122,
    90701,
    90062,
    90257,
    90389,
    89876,
    90710,
    90653,
    90524,
    90593,
    89891,
    90275,
    90473,
    90536,
    90602,
    90668,
    90608,
    90227,
    90674,
    90356,
    90041,
    90617,
    90296,
    89900,
    89933,
    89960,
    89966,
    89975,
    89993,
    90005,
    90026,
    90029,
    90089,
    90125,
    90137,
    90155,
    90170,
    90197,
    90221,
    90236,
    90281,
    90290,
    90302,
    90311,
    90317,
    90362,
    90374,
    90422,
    90440,
    90482,
    90491,
    90494,
    90509,
    90515,
    9054