# Loading Networks
## Yihang Xin
## 2020-11-14

In Cytoscape, network data can be loaded from a variety of sources, and in several different formats. Where you get your network data depends on your biological question and analysis plan. This tutorial outlines how to load network data from several popular sources and formats.

1. Public databases
    * NDEx
    * PSICQUIC
    * STRING/STITCH
    * WikiPathways
2. Local and remote files
3. Cytoscape apps (BioPAX, KEGG, and other formats)


# Prerequisites
## In addition to this package (py4cytoscape latest version 0.0.6), you will need:
* Download the latest Cytoscape from http://www.cytoscape.org/download.php
* Complete installation wizard
* Launch Cytoscape

For this vignette, you’ll also need the WikiPathways and STRING apps to access the WikiPathways and STRING databases from within Cytoscape.

Install the WikiPathways app from http://apps.cytoscape.org/apps/wikipathways

Install the STRING app from https://apps.cytoscape.org/apps/stringapp

## Import the required packages


In [1]:
import os
import sys
import requests
import pandas as pd
import py4cytoscape as p4c
import ndex2.client as nc
import wget
from lxml import etree as ET

In [2]:
# Check Version
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.8.2',
 'automationAPIVersion': '1.0.0',
 'py4cytoscapeVersion': '0.0.6'}

# Networks from Public Data
Cytoscape includes a Network Search tool for easy import of public network data. In addition to core apps that are included with your Cytoscape installation (NDEx and PSICQUIC), the resources listed here will depend on which apps you have installed.

In [3]:
p4c.apps.get_installed_apps()

[{'appName': 'CyNDEx-2',
  'version': '3.3.1',
  'description': 'null',
  'status': 'Installed'},
 {'appName': 'Diffusion',
  'version': '1.6.1',
  'description': 'null',
  'status': 'Installed'},
 {'appName': 'WikiPathways',
  'version': '3.3.7',
  'description': 'null',
  'status': 'Installed'},
 {'appName': 'Network Merge',
  'version': '3.9.2',
  'description': 'null',
  'status': 'Installed'},
 {'appName': 'JSON Support',
  'version': '3.7.0',
  'description': 'null',
  'status': 'Installed'},
 {'appName': 'BioPAX Reader',
  'version': '3.4.0',
  'description': 'null',
  'status': 'Installed'},
 {'appName': 'cyBrowser',
  'version': '1.2.3',
  'description': 'null',
  'status': 'Installed'},
 {'appName': 'PSI-MI Reader',
  'version': '3.4.0',
  'description': 'null',
  'status': 'Installed'},
 {'appName': 'SBML Reader',
  'version': '3.4.0',
  'description': 'null',
  'status': 'Installed'},
 {'appName': 'cyChart',
  'version': '0.3.0',
  'description': 'null',
  'status': 'Instal
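
Because the available resources depend on the installed apps, it can help to check the list before running the later sections. The following is a minimal sketch that works on a list of records shaped like the output above. The helper name `missing_apps` is hypothetical, and the app name `stringApp` is an assumption about how the STRING app reports itself; adjust it to match your own output.

```python
def missing_apps(installed, required):
    """Return the required app names that are absent or not marked 'Installed'."""
    active = {app["appName"] for app in installed if app.get("status") == "Installed"}
    return [name for name in required if name not in active]


# A few records copied from the output above.
installed = [
    {"appName": "CyNDEx-2", "version": "3.3.1", "description": "null", "status": "Installed"},
    {"appName": "WikiPathways", "version": "3.3.7", "description": "null", "status": "Installed"},
    {"appName": "BioPAX Reader", "version": "3.4.0", "description": "null", "status": "Installed"},
]

# 'stringApp' is an assumed app name, used here only for illustration.
required = ["CyNDEx-2", "WikiPathways", "stringApp"]
print(missing_apps(installed, required))
```

In a live session, the `installed` list would come from `p4c.apps.get_installed_apps()` instead of the hard-coded sample.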

# NDEx
The NDEx Project provides an open-source framework where scientists and organizations can share, store, manipulate, and publish biological network knowledge.
* To search NDEx run the following code chunk. Here, we use “TP53 AND BARD1” as our search terms.

In [4]:
# Connect anonymously to the public NDEx server
anon_ndex = nc.Ndex2("http://public.ndexbio.org")
anon_ndex.update_status()

In [5]:
networks = anon_ndex.search_networks(search_string='TP53 AND BARD1')
df_dict = networks["networks"]

In [6]:
ownerUUID_list = []
externalId_list = []
nodeCount_list = []
edgeCount_list = []

In [7]:
for d in df_dict:
    ownerUUID_list.append(d["ownerUUID"])
    externalId_list.append(d["externalId"])
    nodeCount_list.append(d["nodeCount"])
    edgeCount_list.append(d["edgeCount"])

In [8]:
df = pd.DataFrame(
    list(zip(ownerUUID_list, externalId_list, nodeCount_list, edgeCount_list)),
    columns=['ownerUUID', 'externalId', 'nodeCount', 'edgeCount'],
)
df.head()

| | ownerUUID | externalId | nodeCount | edgeCount |
|---|---|---|---|---|
| 0 | 301a91c6-a37b-11e4-bda0-000c29202374 | 5a1fcfb9-78c3-11e8-a4bf-0ac135e8bacf | 30 | 101 |
| 1 | 363f49e0-4cf0-11e9-9f06-0ac135e8bacf | 0d4f26c3-f912-11ea-99da-0ac135e8bacf | 255 | 403 |
| 2 | 363f49e0-4cf0-11e9-9f06-0ac135e8bacf | 7f6602f1-f916-11ea-99da-0ac135e8bacf | 213 | 198 |
| 3 | 363f49e0-4cf0-11e9-9f06-0ac135e8bacf | c8a2cdf5-204b-11ea-bb65-0ac135e8bacf | 213 | 198 |
| 4 | 363f49e0-4cf0-11e9-9f06-0ac135e8bacf | fdfc44e6-f911-11ea-99da-0ac135e8bacf | 59 | 51 |
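
The four parallel lists used to build this table can also be avoided by selecting the fields of interest from each search record directly. The sketch below is one possible alternative, not the method used in this tutorial. The helper `select_fields` is hypothetical, and the `name` key in the sample records is an invented extra field included only to show that unselected keys are dropped.

```python
FIELDS = ["ownerUUID", "externalId", "nodeCount", "edgeCount"]


def select_fields(records, fields=FIELDS):
    """Keep only the requested keys from each record, in the given order."""
    return [{field: record.get(field) for field in fields} for record in records]


# Sample records using values from the table above; 'name' is a made-up extra key.
records = [
    {"ownerUUID": "301a91c6-a37b-11e4-bda0-000c29202374",
     "externalId": "5a1fcfb9-78c3-11e8-a4bf-0ac135e8bacf",
     "nodeCount": 30, "edgeCount": 101, "name": "example network A"},
    {"ownerUUID": "363f49e0-4cf0-11e9-9f06-0ac135e8bacf",
     "externalId": "0d4f26c3-f912-11ea-99da-0ac135e8bacf",
     "nodeCount": 255, "edgeCount": 403, "name": "example network B"},
]

rows = select_fields(records)
print(rows[0]["externalId"])
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)`, which uses the dictionary keys as column names.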


In [9]:
networkId = df["externalId"][0]

To import the network into Cytoscape, run the following code chunk.


In [10]:
p4c.cy_ndex.import_network_from_ndex(networkId)

9677

# STRING/STITCH
STRING is a database of known and predicted protein-protein interactions, and STITCH stores known and predicted interactions between chemicals and proteins. Data types include:

* Genomic Context Predictions
* High-throughput Lab Experiments
* (Conserved) Co-Expression
* Automated Textmining
* Previous Knowledge in Databases

To search STRING with the disease keyword “ovarian cancer”, run the following code chunk. (The resulting network will load automatically.)

In [11]:
string_cmd_list = ['string disease query','disease="ovarian cancer"']
string_cmd = " ".join(string_cmd_list)
p4c.commands.commands_run(string_cmd)

["Loaded network 'String Network - ovarian cancer - 1' with 100 nodes and 2238 edges"]
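
Cytoscape command strings follow a `command key=value` pattern, with values that contain spaces wrapped in double quotes. As a rough sketch, a small helper can assemble such strings from keyword arguments. The function name `build_command` is hypothetical and is not part of py4cytoscape; it only builds the text that would be passed to `p4c.commands.commands_run`.

```python
def build_command(base, **args):
    """Assemble a Cytoscape command string, quoting values that contain spaces."""
    parts = [base]
    for key, value in args.items():
        text = str(value)
        if " " in text:
            text = '"' + text + '"'
        parts.append(key + "=" + text)
    return " ".join(parts)


print(build_command("string disease query", disease="ovarian cancer"))
print(build_command("string change confidence", confidence=0.9, network="CURRENT"))
```

Keeping the quoting rule in one place avoids stray spaces or missing quotes when a command is pieced together from several fragments.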

* Networks load with a STRING-specific style, which includes 3D protein structure diagrams.


In [12]:
p4c.network_views.export_image('ovarian_cancer', type='PNG')

{'file': 'C:\\Users\\YihangXin\\CytoscapeConfiguration\\filetransfer\\default_sandbox\\ovarian_cancer.png'}

* STRING networks also include data as node/interaction attributes that can be used to create a Style.

In [13]:
column_names = p4c.tables.get_table_column_names()
# Leave out the 'stringdb::structures' column when fetching the node table
column_names.remove('stringdb::structures')

In [14]:
df = p4c.tables.get_table_columns(columns=column_names)
df.head()

| | SUID | shared name | name | selected | stringdb::canonical name | display name | stringdb::full name | stringdb::database identifier | stringdb::description | @id | ... | tissue::muscle | tissue::nervous system | tissue::pancreas | tissue::saliva | tissue::skin | tissue::spleen | tissue::stomach | tissue::thyroid gland | tissue::urine | stringdb::disease score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9984 | 9984 | 9606.ENSP00000386559 | 9606.ENSP00000386559 | False | P01732 | CD8A | | 9606.ENSP00000386559 | T-lymphocyte differentiation antigen T8/Leu-2;... | stringdb:9606.ENSP00000386559 | ... | 2.71942 | 3.33795 | 2.62511 | 2.6662 | 3.09994 | 4.76379 | 2.81467 | 2.36684 | 2.27309 | 2.937 |
| 9985 | 9985 | 9606.ENSP00000288602 | 9606.ENSP00000288602 | False | Q3MIN6 | BRAF | | 9606.ENSP00000288602 | v-Raf murine sarcoma viral oncogene homolog B1... | stringdb:9606.ENSP00000288602 | ... | 2.15169 | 4.61342 | 1.86505 | 1.29641 | 2.39333 | 2.30836 | 1.81651 | 2.63527 | 1.30629 | 3.14864 |
| 9986 | 9986 | 9606.ENSP00000320147 | 9606.ENSP00000320147 | False | Q15910 | EZH2 | | 9606.ENSP00000320147 | Enhancer of zeste 2 polycomb repressive comple... | stringdb:9606.ENSP00000320147 | ... | 2.35256 | 4.65875 | 4.42839 | 1.28562 | 2.96434 | 2.30101 | 2.12785 | 1.856 | 1.49375 | 2.57885 |
| 9987 | 9987 | 9606.ENSP00000263025 | 9606.ENSP00000263025 | False | P27361 | MAPK3 | | 9606.ENSP00000263025 | Extracellular signal-regulated kinase 1; Serin... | stringdb:9606.ENSP00000263025 | ... | 3.57996 | 4.67236 | 3.02153 | 2.6346 | 3.41705 | 3.20468 | 3.21876 | 3.08721 | 2.4265 | 2.61742 |
| 9988 | 9988 | 9606.ENSP00000354558 | 9606.ENSP00000354558 | False | P42345 | MTOR | | 9606.ENSP00000354558 | FK506-binding protein 12-rapamycin complex-ass... | stringdb:9606.ENSP00000354558 | ... | 3.44592 | 4.90812 | 2.93785 | 2.15169 | 3.943 | 2.82402 | 2.56994 | 2.6197 | 1.99613 | 2.58686 |


* The STRING app includes options to change the interaction confidence level, expand the network, and so on.

In [15]:
p4c.networks.get_edge_count()  # Before changing the interaction confidence level

2238

In [16]:
string_cmd_list = ['string change confidence confidence=0.9 network=CURRENT']
string_cmd = " ".join(string_cmd_list)
p4c.commands.commands_run(string_cmd)

['']

In [17]:
p4c.networks.get_edge_count()  # After changing the interaction confidence level

443
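
To put the two edge counts in perspective, the effect of raising the confidence cutoff to 0.9 can be worked out with simple arithmetic. The short sketch below uses only the counts reported above; the variable names are illustrative.

```python
edges_before = 2238  # edge count before changing the confidence level
edges_after = 443    # edge count after setting confidence=0.9

removed = edges_before - edges_after
retained_fraction = edges_after / edges_before

print(f"removed {removed} edges; retained {retained_fraction:.1%}")
# prints: removed 1795 edges; retained 19.8%
```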

In [18]:
p4c.network_views.export_image('before_expand', type='PNG')

{'file': 'C:\\Users\\YihangXin\\CytoscapeConfiguration\\filetransfer\\default_sandbox\\before_expand.png'}

In [19]:
string_cmd_list = ['string expand network=CURRENT']
string_cmd = " ".join(string_cmd_list)
p4c.commands.commands_run(string_cmd)

["Loaded network 'String Network - ovarian cancer - 1' with 110 nodes and 613 edges"]

In [20]:
p4c.network_views.export_image('after_expand', type='PNG')

{'file': 'C:\\Users\\YihangXin\\CytoscapeConfiguration\\filetransfer\\default_sandbox\\after_expand.png'}

# WikiPathways

WikiPathways is a collaborative wiki platform with manually curated pathway models. It currently covers over 2,600 pathways in 25 species-specific collections.

* To search WikiPathways, define a find_pathways_by_text helper that queries the WikiPathways web service, then call it with your search terms (here we use ‘statin’ as the term).

In [21]:
def find_pathways_by_text(query, species):
    base_iri = 'http://webservice.wikipathways.org/'
    request_params = {'query':query, 'species':species}
    response = requests.get(base_iri + 'findPathwaysByText', params=request_params)
    return response

In [22]:
response = find_pathways_by_text("statin", "Homo sapiens") # restrict the results to Homo sapiens

In [23]:
def find_pathway_dataframe(response):
    # Parse the XML body returned by the web service
    data = response.text
    dom = ET.fromstring(data)
    pathways = []
    NAMESPACES = {'ns1':'http://www.wso2.org/php/xsd','ns2':'http://www.wikipathways.org/webservice/'}
    for node in dom.findall('ns1:result', NAMESPACES):
        pathway_using_api_terms = {}
        for child in node:
            # Key each field by its tag name without the namespace
            pathway_using_api_terms[ET.QName(child).localname] = child.text
            # Appending inside this loop adds the same record once per field,
            # so rows repeat; the duplicates are dropped in a later cell
            pathways.append(pathway_using_api_terms)
    # Collect each field into its own list
    id_list = []
    score_list = []
    url_list = []
    name_list = []
    species_list = []
    revision_list = []
    for p in pathways:
        id_list.append(p["id"])
        score_list.append(p["score"])
        url_list.append(p["url"])
        name_list.append(p["name"])
        species_list.append(p["species"])
        revision_list.append(p["revision"])
    # Assemble the lists into a DataFrame, one row per collected record
    df = pd.DataFrame(
        list(zip(id_list, score_list, url_list, name_list, species_list, revision_list)),
        columns=['id', 'score', 'url', 'name', 'species', 'revision'],
    )
    return df
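
The parsing step can also be sketched with the standard library alone, collecting one record per `result` element so that no repeated rows are produced. This is an illustrative variant, not the code used in this tutorial. The sample XML is hypothetical: its root element name and its reduced set of fields are assumptions, and only the two namespace URIs are taken from the function above.

```python
import xml.etree.ElementTree as ET

NS = {"ns1": "http://www.wso2.org/php/xsd"}

# Hypothetical response, trimmed to three fields per result for brevity.
SAMPLE = """<ns1:findPathwaysByTextResponse xmlns:ns1="http://www.wso2.org/php/xsd"
    xmlns:ns2="http://www.wikipathways.org/webservice/">
  <ns1:result>
    <ns2:id>WP430</ns2:id>
    <ns2:name>Statin Pathway</ns2:name>
    <ns2:species>Homo sapiens</ns2:species>
  </ns1:result>
  <ns1:result>
    <ns2:id>WP3590</ns2:id>
    <ns2:name>Demo</ns2:name>
    <ns2:species>Homo sapiens</ns2:species>
  </ns1:result>
</ns1:findPathwaysByTextResponse>"""


def parse_results(xml_text):
    """Return one dict per <ns1:result> element, keyed by local tag name."""
    root = ET.fromstring(xml_text)
    records = []
    for result in root.findall("ns1:result", NS):
        record = {}
        for child in result:
            # Tags look like '{namespace}local'; keep only the local part.
            local_name = child.tag.rsplit("}", 1)[-1]
            record[local_name] = child.text
        # Append once per result, after all of its fields are collected.
        records.append(record)
    return records


for record in parse_results(SAMPLE):
    print(record["id"], record["name"])
```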

In [24]:
df = find_pathway_dataframe(response)
df.head(10)

| | id | score | url | name | species | revision |
|---|---|---|---|---|---|---|
| 0 | WP430 | 4.9863944 | https://www.wikipathways.org/index.php/Pathway... | Statin Pathway | Homo sapiens | 108375 |
| 1 | WP430 | 4.9863944 | https://www.wikipathways.org/index.php/Pathway... | Statin Pathway | Homo sapiens | 108375 |
| 2 | WP430 | 4.9863944 | https://www.wikipathways.org/index.php/Pathway... | Statin Pathway | Homo sapiens | 108375 |
| 3 | WP430 | 4.9863944 | https://www.wikipathways.org/index.php/Pathway... | Statin Pathway | Homo sapiens | 108375 |
| 4 | WP430 | 4.9863944 | https://www.wikipathways.org/index.php/Pathway... | Statin Pathway | Homo sapiens | 108375 |
| 5 | WP430 | 4.9863944 | https://www.wikipathways.org/index.php/Pathway... | Statin Pathway | Homo sapiens | 108375 |
| 6 | WP3590 | 3.3777823 | https://www.wikipathways.org/index.php/Pathway... | Demo | Homo sapiens | 106743 |
| 7 | WP3590 | 3.3777823 | https://www.wikipathways.org/index.php/Pathway... | Demo | Homo sapiens | 106743 |
| 8 | WP3590 | 3.3777823 | https://www.wikipathways.org/index.php/Pathway... | Demo | Homo sapiens | 106743 |
| 9 | WP3590 | 3.3777823 | https://www.wikipathways.org/index.php/Pathway... | Demo | Homo sapiens | 106743 |


In [25]:
df = df.drop_duplicates()
df = df.reset_index(drop=True)
df

| | id | score | url | name | species | revision |
|---|---|---|---|---|---|---|
| 0 | WP430 | 4.9863944 | https://www.wikipathways.org/index.php/Pathway... | Statin Pathway | Homo sapiens | 108375 |
| 1 | WP3590 | 3.3777823 | https://www.wikipathways.org/index.php/Pathway... | Demo | Homo sapiens | 106743 |
| 2 | WP3539 | 3.2797348 | https://www.wikipathways.org/index.php/Pathway... | WikiPathways Tutorial: demo_step3 | Homo sapiens | 106739 |
| 3 | WP3418 | 3.2633936 | https://www.wikipathways.org/index.php/Pathway... | Demo_complete | Homo sapiens | 106736 |


In [26]:
# Place the pathway ID directly inside the quotes, with no padding spaces
cmd = 'wikipathways import-as-pathway id="{}"'.format(df["id"][0])
p4c.commands.commands_get(cmd)

[]

To open the pathway as a network, run the following chunk.



In [27]:
# Place the pathway ID directly inside the quotes, with no padding spaces
cmd = 'wikipathways import-as-network id="{}"'.format(df["id"][0])
p4c.commands.commands_get(cmd)

[]

# Local and Remote Files
Cytoscape can load locally and remotely stored network data files in a variety of file formats:

- SIF: Simple interaction format
- NNF: Nested network format
- GML and XGMML formats
- CYS: Cytoscape session file
- Delimited text and Excel format

## Loading SIF files
SIF is a simple interaction format consisting of three columns of data: source, interaction and target. To learn more about the SIF format, see the Cytoscape manual.
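
To make the three-column layout concrete, here is a minimal sketch of a SIF reader. It assumes each line holds a source, an interaction type, and one or more targets, separated by tabs if the line contains any tab and by whitespace otherwise, and that a line with a single token is an isolated node. The function name `parse_sif` and the sample text are hypothetical, and the sketch does not cover every detail of the format; the Cytoscape manual remains the authoritative description.

```python
def parse_sif(text):
    """Parse SIF text into (edges, isolated_nodes).

    Each edge is a (source, interaction, target) tuple.
    """
    edges = []
    isolated = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Use tabs as the delimiter when present, otherwise any whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) == 1:
            isolated.append(fields[0])
            continue
        if len(fields) == 2:
            raise ValueError("expected source, interaction and target: " + line)
        source, interaction = fields[0], fields[1]
        # A line may list several targets for the same source and interaction.
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges, isolated


sample = "nodeA pp nodeB\nnodeC pd nodeD nodeE\nnodeF\n"
edges, isolated = parse_sif(sample)
print(edges)
print(isolated)
```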

Download galFiltered.sif and load the network via

In [28]:
sif_url = "https://cytoscape.github.io/cytoscape-tutorials/protocols/data/galFiltered.sif"
file_name = wget.download(sif_url)
file_name

'galFiltered.sif'

In [29]:
p4c.sandbox.sandbox_send_to(file_name)

{'filePath': 'C:\\Users\\YihangXin\\CytoscapeConfiguration\\filetransfer\\default_sandbox\\galFiltered.sif'}

In [30]:
p4c.networks.import_network_from_file(file_name)

{'networks': [15709], 'views': [16412]}

- To see the whole network, run

In [31]:
p4c.network_views.fit_content()

{}

## Loading XGMML files
XGMML is an XML format and can include node and edge attributes as well as visual style properties. To learn more about the XGMML format, see the Cytoscape manual.
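
As a rough illustration of how attributes sit alongside nodes and edges in XGMML, the sketch below reads a tiny hand-written document with the standard library. The sample graph, its attribute names, and the helper `summarize_xgmml` are all hypothetical; real files exported by Cytoscape carry many more attributes and graphics elements.

```python
import xml.etree.ElementTree as ET

XGMML_NS = {"x": "http://www.cs.rpi.edu/XGMML"}

# Hand-written sample; not taken from BasicDataVizDemo.xgmml.
SAMPLE = """<graph label="demo" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="1" label="A">
    <att name="score" type="real" value="0.5"/>
  </node>
  <node id="2" label="B">
    <att name="score" type="real" value="1.5"/>
  </node>
  <edge source="1" target="2" label="A (pp) B">
    <att name="interaction" type="string" value="pp"/>
  </edge>
</graph>"""


def summarize_xgmml(xml_text):
    """Return ({node label: {attribute name: value}}, [(source id, target id)])."""
    root = ET.fromstring(xml_text)
    nodes = {}
    for node in root.findall("x:node", XGMML_NS):
        attributes = {
            att.get("name"): att.get("value")
            for att in node.findall("x:att", XGMML_NS)
        }
        nodes[node.get("label")] = attributes
    edges = [
        (edge.get("source"), edge.get("target"))
        for edge in root.findall("x:edge", XGMML_NS)
    ]
    return nodes, edges


nodes, edges = summarize_xgmml(SAMPLE)
print(nodes)
print(edges)
```

Attribute values are returned as strings here; converting them according to the `type` attribute is left out to keep the sketch short.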

Download https://raw.githubusercontent.com/cytoscape/cytoscape-tutorials/gh-pages/protocols/data/BasicDataVizDemo.xgmml and load the network via

In [32]:
xgmml_url = "https://raw.githubusercontent.com/cytoscape/cytoscape-tutorials/gh-pages/protocols/data/BasicDataVizDemo.xgmml"
file_name = wget.download(xgmml_url)
file_name

'BasicDataVizDemo.xgmml'

In [33]:
p4c.sandbox.sandbox_send_to(file_name)

{'filePath': 'C:\\Users\\YihangXin\\CytoscapeConfiguration\\filetransfer\\default_sandbox\\BasicDataVizDemo.xgmml'}

In [34]:
p4c.networks.import_network_from_file(file_name)

{'networks': [17106], 'views': [17478]}