<a href="https://colab.research.google.com/github/cytoscape/cytoscape-automation/blob/master/for-scripters/Python/wikiPathways-and-py4cytoscape.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# WikiPathways and py4cytoscape
## Yihang Xin and Alex Pico
## 2020-11-10

WikiPathways is a well-known repository for biological pathways that provides unique tools to the research community for content creation, editing and utilization [@Pico2008].

Python is an interpreted, high-level and general-purpose programming language.

This notebook uses the WikiPathways web service API to communicate between Python and WikiPathways, allowing any pathway to be queried, interrogated and downloaded in both data and image formats. Queries are typically performed based on “Xrefs”, standardized identifiers for genes, proteins and metabolites. Once you have identified a pathway, you can use its WPID (WikiPathways identifier) to make additional queries.

py4cytoscape leverages the CyREST API to provide a number of functions related to network visualization and analysis. 


# Installation
The following chunk of code installs the `py4cytoscape` module.

In [None]:
%%capture
!python3 -m pip install python-igraph requests pandas networkx lxml
!python3 -m pip install py4cytoscape

If you are using a remote notebook environment such as Google Colab, run the cell below. (If you are running the notebook locally, you can skip it.)



In [None]:
import requests
exec(requests.get("https://raw.githubusercontent.com/cytoscape/jupyter-bridge/master/client/p4c_init.py").text)
IPython.display.Javascript(_PY4CYTOSCAPE_BROWSER_CLIENT_JS) # Start browser client

# Prerequisites
## In addition to this package (py4cytoscape latest version 0.0.9), you will need:

* Latest version of Cytoscape, which can be downloaded from https://cytoscape.org/download.html. Simply follow the installation instructions on screen.
* Complete installation wizard
* Launch Cytoscape

For this vignette, you’ll also need the WikiPathways app to access the WikiPathways database from within Cytoscape. 

Install the WikiPathways app from http://apps.cytoscape.org/apps/wikipathways

Install the filetransfer app from https://apps.cytoscape.org/apps/filetransfer

You can also install an app from within a Python notebook by running `py4cytoscape.install_app('Your App')`.

# Import the required packages

In [None]:
import os
import sys
import requests
import pandas as pd
from lxml import etree as ET
from collections import OrderedDict
import py4cytoscape as p4c

In [None]:
# Check Version
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.8.2',
 'automationAPIVersion': '1.0.0',
 'py4cytoscapeVersion': '0.0.7'}
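
Because everything that follows depends on a running Cytoscape instance, it can help to check the reported version before continuing. The following is a minimal sketch: `MIN_CYTOSCAPE` and `version_tuple` are hypothetical names, and the 3.8 threshold is an assumption chosen for illustration, not a documented requirement. The dictionary is copied from the output above so that the example runs without Cytoscape.

```python
# Dictionary copied from the output above; in a live session it would come
# from p4c.cytoscape_version_info().
version_info = {
    'apiVersion': 'v1',
    'cytoscapeVersion': '3.8.2',
    'automationAPIVersion': '1.0.0',
    'py4cytoscapeVersion': '0.0.7',
}

MIN_CYTOSCAPE = (3, 8)  # assumed threshold, for illustration only


def version_tuple(version):
    """Turn a dotted version string such as '3.8.2' into (3, 8, 2)."""
    return tuple(int(part) for part in version.split('.'))


# Tuples compare element by element, so (3, 8, 2) >= (3, 8) holds.
cytoscape_ok = version_tuple(version_info['cytoscapeVersion']) >= MIN_CYTOSCAPE
print('Cytoscape version is recent enough:', cytoscape_ok)
```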

# Working together
OK, with all of these components loaded and launched, you can now chain them together. For example, search for a pathway by keyword and then load it into Cytoscape.

In [None]:
def find_pathways_by_text(query, species):
    base_iri = 'http://webservice.wikipathways.org/'
    request_params = {'query':query, 'species':species}
    response = requests.get(base_iri + 'findPathwaysByText', params=request_params)
    return response
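
To see what `find_pathways_by_text` asks the web service for, the request URL can be assembled by hand without sending anything. This is a sketch that uses only the standard library; it assumes that `requests` encodes the `params` dictionary in the same way as `urlencode`, which should hold for simple string values like these.

```python
from urllib.parse import urlencode

base_iri = 'http://webservice.wikipathways.org/'
request_params = {'query': 'colon cancer', 'species': 'Homo sapiens'}

# urlencode escapes each space as '+' and joins the pairs with '&'.
url = base_iri + 'findPathwaysByText?' + urlencode(request_params)
print(url)
```

Pasting the printed URL into a browser is a quick way to inspect the raw XML that the next function parses.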

In [None]:
response = find_pathways_by_text("colon cancer", "Homo sapiens")

In [None]:
def find_pathway_dataframe(response):
    # Parse the XML search response into a DataFrame with one row per pathway.
    dom = ET.fromstring(response.text)
    NAMESPACES = {'ns1':'http://www.wso2.org/php/xsd','ns2':'http://www.wikipathways.org/webservice/'}
    columns = ['id', 'score', 'url', 'name', 'species', 'revision']
    pathways = []
    for node in dom.findall('ns1:result', NAMESPACES):
        # Collect the fields of one result, keyed by tag name without its namespace.
        pathway_using_api_terms = {}
        for child in node:
            pathway_using_api_terms[ET.QName(child).localname] = child.text
        # Append once per result, after all of its fields have been read.
        # (Appending inside the inner loop would add one duplicate row per field.)
        pathways.append(pathway_using_api_terms)
    return pd.DataFrame(pathways, columns=columns)

In [None]:
df = find_pathway_dataframe(response)
df.head(10)

Unnamed: 0,id,score,url,name,species,revision
0,WP4290,3.2125466,https://www.wikipathways.org/index.php/Pathway...,Metabolic reprogramming in colon cancer,Homo sapiens,113958
1,WP4239,2.9802983,https://www.wikipathways.org/index.php/Pathway...,Epithelial to mesenchymal transition in colore...,Homo sapiens,111457
...
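
The parsing step can be tried offline on a small hand-written document. The sketch below uses the standard library's `xml.etree.ElementTree` instead of `lxml`. `SAMPLE_XML` and `parse_results` are hypothetical names, and the element layout of the sample is inferred from the parsing code in this notebook rather than copied from a real response; only the IDs, species and revision numbers are taken from the output above.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the layout is an assumption based on the parser above.
SAMPLE_XML = """
<ns1:findPathwaysByTextResponse xmlns:ns1="http://www.wso2.org/php/xsd"
                                xmlns:ns2="http://www.wikipathways.org/webservice/">
  <ns1:result>
    <ns2:id>WP4290</ns2:id>
    <ns2:species>Homo sapiens</ns2:species>
    <ns2:revision>113958</ns2:revision>
  </ns1:result>
  <ns1:result>
    <ns2:id>WP4239</ns2:id>
    <ns2:species>Homo sapiens</ns2:species>
    <ns2:revision>111457</ns2:revision>
  </ns1:result>
</ns1:findPathwaysByTextResponse>
""".strip()

NAMESPACES = {'ns1': 'http://www.wso2.org/php/xsd'}


def parse_results(xml_text):
    """Return one dictionary per <ns1:result> element."""
    root = ET.fromstring(xml_text)
    records = []
    for result in root.findall('ns1:result', NAMESPACES):
        # Tags look like '{namespace}id'; keep only the part after '}'.
        record = {child.tag.split('}', 1)[-1]: child.text for child in result}
        records.append(record)
    return records


records = parse_results(SAMPLE_XML)
print([record['id'] for record in records])
```

Each result yields exactly one dictionary, so a DataFrame built from `records` would have one row per pathway.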


We have a list of human pathways that mention “Colon Cancer”. The results include lots of information, so let’s get a unique list of just the WPIDs.

In [None]:
unique_id = list(OrderedDict.fromkeys(df["id"]))
unique_id[0]

'WP4290'
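
`OrderedDict.fromkeys` removes repeated WPIDs while keeping the order of the search results, so the first element is still the top-scoring hit. The short sketch below compares it with two alternatives on a hypothetical list of IDs.

```python
from collections import OrderedDict

# Hypothetical ID column with repeats, in search-result order.
ids = ['WP4290', 'WP4290', 'WP4239', 'WP4290', 'WP4239']

unique_ordered = list(OrderedDict.fromkeys(ids))  # keeps first-seen order
unique_dict = list(dict.fromkeys(ids))            # same result on Python 3.7+
unique_set = set(ids)                             # unique, but order is not guaranteed

print(unique_ordered)
```

A plain `set` would also drop the repeats, but indexing the result with `[0]` would then no longer be guaranteed to return the best match.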

Let’s import the first one of these into Cytoscape!

In [None]:
# Build the command string; the WPID sits inside the quotes without padding spaces
cmd = f'wikipathways import-as-pathway id="{unique_id[0]}"'
p4c.commands.commands_get(cmd) 

[]
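
If you import pathways in several places, the command string can be built by a small helper. This is a sketch only: `build_import_command` is a hypothetical function, and the check assumes that every WPID consists of `WP` followed by digits, as the IDs in this notebook do.

```python
import re


def build_import_command(wpid):
    """Return the Cytoscape command string that imports a pathway by WPID."""
    # Reject anything that does not look like 'WP' followed by digits.
    if not re.fullmatch(r'WP\d+', wpid):
        raise ValueError(f'Not a WikiPathways identifier: {wpid!r}')
    return f'wikipathways import-as-pathway id="{wpid}"'


cmd = build_import_command('WP4290')
print(cmd)
```

The resulting string can then be passed to `p4c.commands.commands_get` as in the cell above.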

Once in Cytoscape, you can load data, apply visual style mappings, perform analyses, and export images and data formats. See the py4cytoscape documentation for details.

# From networks to pathways
If you are already working with networks and data in Cytoscape, you may end up focusing on one or a few particular genes, proteins or metabolites, and want to query WikiPathways for them.

For example, let’s open a sample network from Cytoscape and identify the gene with the largest number of connections, i.e., node degree.

Note: this next chunk will overwrite your current session. Save if you want to keep anything.

In [None]:
p4c.session.open_session()

Opening sampleData/sessions/Yeast Perturbation.cys...


{}

In [None]:
net_data = p4c.tables.get_table_columns(columns=['name','degree.layout','COMMON'])

In [None]:
max_gene = net_data[net_data["degree.layout"] == net_data["degree.layout"].max()]
max_gene

Unnamed: 0,name,degree.layout,COMMON
684,YMR043W,18,MCM1
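
The boolean filter above returns every row that ties for the maximum degree. When a single row is enough, `idxmax` and `nlargest` are common alternatives. The sketch below runs on a hypothetical three-row table; only the `YMR043W` row comes from the output above, and the other rows are made up for illustration.

```python
import pandas as pd

# Hypothetical node table; only the first row is taken from the real output.
net_data = pd.DataFrame({
    'name': ['YMR043W', 'NODE_A', 'NODE_B'],
    'degree.layout': [18, 7, 3],
    'COMMON': ['MCM1', 'GENE_A', 'GENE_B'],
})

# idxmax gives the index label of the first row with the highest degree.
top = net_data.loc[net_data['degree.layout'].idxmax()]
top_name = top['name']
top_common = top['COMMON']

# nlargest returns the n highest-degree rows, sorted in descending order.
top2 = net_data.nlargest(2, 'degree.layout')

print(top_name, top_common)
```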


Great. It looks like MCM1 has the largest number of connections (18) in this network. Let’s use its identifier (YMR043W) to query WikiPathways to learn more about the gene and its biological role, and load the resulting pathway into Cytoscape.

Pro-tip: We need to know the datasource that provides a given identifier. In this case, it’s sort of tricky: Ensembl provides these Yeast ORF identifiers for this organism rather than the typical Ensembl format. So, we’ll include the ‘En’ system code. See other vignettes for more details.
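
An Xref query therefore needs two pieces: the identifier and the system code of its datasource. The sketch below pairs them and rejects unknown codes before any request is made. `SYSTEM_CODES` and `xref_params` are hypothetical names, and the table is a short, partial list of commonly used BridgeDb-style system codes included for illustration only.

```python
# Partial lookup table, for illustration; the service accepts many more codes.
SYSTEM_CODES = {
    'En': 'Ensembl',
    'L': 'Entrez Gene',
    'S': 'UniProt-TrEMBL',
    'Ce': 'ChEBI',
    'Ch': 'HMDB',
}


def xref_params(identifier, code):
    """Return the query parameters for an Xref lookup."""
    if code not in SYSTEM_CODES:
        raise ValueError(f'Unknown system code: {code!r}')
    # Keys match the 'ids' and 'codes' parameters used in this notebook.
    return {'ids': identifier, 'codes': code}


params = xref_params('YMR043W', 'En')
print(params, '->', SYSTEM_CODES['En'])
```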

In [None]:
def find_pathways_by_xref(ids, codes):
    base_iri = 'http://webservice.wikipathways.org/'
    request_params = {'ids':ids, 'codes':codes}
    response = requests.get(base_iri + 'findPathwaysByXref', params=request_params)
    return response

In [None]:
response = find_pathways_by_xref('YMR043W','En')
mcm1_pathways = find_pathway_dataframe(response)

In [None]:
unique_id = list(OrderedDict.fromkeys(mcm1_pathways["id"]))
unique_id = unique_id[0]  # take the first (here, the only) pathway ID
unique_id

'WP510'

In [None]:
# Build the command string; the WPID sits inside the quotes without padding spaces
cmd = f'wikipathways import-as-pathway id="{unique_id}"'
p4c.commands.commands_get(cmd) 

[]

And we can easily select the MCM1 node by name in the newly imported pathway to help see where exactly it plays its role.



In [None]:
p4c.network_selection.select_nodes(['Mcm1'], by_col='name')

{'nodes': [2017, 2020], 'edges': []}