# Cancer networks and data

This notebook will demonstrate network retrieval from the STRING database, basic analysis, TCGA data loading and visualization in Cytoscape from Python using the py4cytoscape package.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cytoscape/py4cytoscape/blob/0.0.11/doc/tutorials/Cancer-networks-and-data.ipynb)

**by Kozo Nishida, Alexander Pico, Barry Demchak**

**py4cytoscape 0.0.11**

## Prerequisites
In addition to this package (py4cytoscape), you will need:

- Cytoscape 3.8 or greater, which can be downloaded from https://cytoscape.org/download.html. Simply follow the installation instructions on screen:
  - Complete the installation wizard
  - Launch Cytoscape
- If you are running Cytoscape 3.8.2 or earlier, install the **FileTransfer App** (follow [these instructions](https://py4cytoscape.readthedocs.io/en/0.0.10/tutorials/index.html) to do it).

**NOTE: To run this notebook, you must manually start Cytoscape first – don’t proceed until you have started Cytoscape.**

### Setup required only in a remote notebook environment

If you’re using a remote Jupyter Notebook environment such as Google Colab, run the cell below. (If you’re running a local Jupyter Notebook server on the same desktop machine as Cytoscape, you can skip this step.)


In [None]:
_PY4CYTOSCAPE = 'git+https://github.com/cytoscape/py4cytoscape@0.0.11'
import requests
exec(requests.get("https://raw.githubusercontent.com/cytoscape/jupyter-bridge/master/client/p4c_init.py").text)
IPython.display.Javascript(_PY4CYTOSCAPE_BROWSER_CLIENT_JS) # Start browser client

Note that to use the current py4cytoscape release (instead of v0.0.11), remove the `_PY4CYTOSCAPE=` line in the snippet above.


### Sanity test to verify Cytoscape connection
By now, the connection to Cytoscape should be up and available. To verify this, try a simple operation that doesn't alter the state of Cytoscape, but verifies that you have everything installed.

In [1]:
import py4cytoscape as p4c

In [2]:
p4c.cytoscape_ping()

You are connected to Cytoscape!


'You are connected to Cytoscape!'

In [3]:
p4c.install_app('STRINGapp')

CyError: In commands_post(): java.lang.NullPointerException

## Getting Disease Networks

Use Cytoscape to query the STRING database for networks of genes associated with breast cancer and ovarian cancer.

**If the STRING app is not installed, no error is reported, but your network will be empty**

### Query STRING database by disease to generate networks
#### Breast cancer

In [None]:
string_cmd = 'string disease query disease="breast cancer" cutoff=0.9 species="Homo sapiens" limit=150'
p4c.commands_run(string_cmd)

In [None]:
p4c.notebook_export_show_image()

Here we are using Cytoscape’s command line syntax, which can be used for any core or app automation function, and then making a GET request. Use *p4c.commands_help* to interrogate the functions and parameters available in your active Cytoscape session, including the apps you’ve installed!
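Because the command is just a string, it can help to assemble it from parameters so that the breast and ovarian cancer queries share one template. A minimal sketch, assuming the `string disease query` argument names used in the cells above; `build_string_disease_query` is a hypothetical helper, not part of py4cytoscape or STRINGapp:

```python
def build_string_disease_query(disease, cutoff=0.9, species="Homo sapiens", limit=150):
    """Hypothetical helper: assemble a STRINGapp 'disease query' command string.

    Values that may contain spaces (disease, species) are wrapped in double
    quotes, matching the command syntax used in the notebook cells.
    """
    return (
        f'string disease query disease="{disease}" cutoff={cutoff} '
        f'species="{species}" limit={limit}'
    )


breast_cmd = build_string_disease_query("breast cancer")
ovarian_cmd = build_string_disease_query("ovarian cancer")
print(breast_cmd)

# With Cytoscape running and STRINGapp installed, pass the string on as before:
# p4c.commands_run(breast_cmd)
```

Keeping the command text in one place makes it easy to rerun the same query with a different `cutoff` or `limit` and compare the resulting networks.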

In [None]:
p4c.commands_help('string')

In [None]:
p4c.commands_help('string disease query')

#### Ovarian cancer

In [None]:
string_cmd = 'string disease query disease="ovarian cancer" cutoff=0.9 species="Homo sapiens" limit=150'
p4c.commands_run(string_cmd)

In [None]:
p4c.notebook_export_show_image()

## Interacting with Cytoscape
Now that we’ve got a couple of networks into Cytoscape, let’s see what we can do with them from Python…

### Get list of networks

In [None]:
p4c.get_network_list()

### Layout network

In [None]:
p4c.layout_network(layout_name='circular')

In [None]:
p4c.notebook_export_show_image()

#### List of layout algorithms available

In [None]:
p4c.get_layout_names()

#### Layout with parameters!

In [None]:
p4c.get_layout_property_names(layout_name='force-directed')

In [None]:
p4c.layout_network('force-directed defaultSpringCoefficient=0.0000008 defaultSpringLength=70')

In [None]:
p4c.notebook_export_show_image()
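The parameterized layout above is a single string: the layout name followed by `key=value` pairs whose names come from `get_layout_property_names()`. A small sketch that builds such a string from keyword arguments; `layout_command` is a hypothetical helper, and the parameter values are simply the ones used in the cell above:

```python
def layout_command(layout_name, **params):
    """Hypothetical helper: join a layout name and key=value parameters into
    one space-separated string, in the same form as the force-directed cell."""
    parts = [layout_name] + [f"{key}={value}" for key, value in params.items()]
    return " ".join(parts)


cmd = layout_command(
    "force-directed",
    defaultSpringCoefficient="0.0000008",  # a string keeps the exact decimal form (a float would print as 8e-07)
    defaultSpringLength=70,
)
print(cmd)

# p4c.layout_network(cmd)  # requires a running Cytoscape session
```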

### Get table data from network

Now, let’s look at the tabular data associated with our STRING networks…

In [None]:
p4c.get_table_column_names('node')

One of the great things about the STRING database is all the node and edge attributes it provides. Let’s pull some of them into Python to play with…

#### Retrieve disease scores
We can retrieve any set of columns from Cytoscape and store them as a Python pandas.DataFrame keyed by SUID. In this case, let’s retrieve the disease score column from the node table. The two parameters are the table (`'node'`) and the column name:

In [None]:
disease_score_table = p4c.get_table_columns('node','stringdb::disease score')

In [None]:
disease_score_table

In [None]:
disease_score = disease_score_table['stringdb::disease score'].astype('float')
node_suid = disease_score_table.index.values.astype(str)

In [None]:
disease_score

In [None]:
node_suid
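To see what these two conversions produce without a Cytoscape session, here is a sketch on a small hand-made DataFrame shaped like the one `get_table_columns()` returns (indexed by SUID, one column per requested attribute). The SUIDs and scores are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for p4c.get_table_columns('node', 'stringdb::disease score'):
# one row per node, indexed by integer SUID. All values are made up.
disease_score_table = pd.DataFrame(
    {"stringdb::disease score": [0.61, 0.95, 0.72, 0.95]},
    index=[1001, 1002, 1003, 1004],
)

# Same conversions as the cells above: scores as floats, SUIDs as strings.
disease_score = disease_score_table["stringdb::disease score"].astype("float")
node_suid = disease_score_table.index.values.astype(str)

# Rank nodes by score; the SUID index travels with each value.
ranked = disease_score.sort_values(ascending=False)
print(ranked.head(3))
```

Because the SUID stays in the index, any filtering or sorting of the Series still tells you which Cytoscape nodes the values belong to.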

#### Plot distribution and pick threshold
Now you can use Python as you normally would to explore the data.

In [None]:
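import matplotlib.pyplot as plt
plt.figure(figsize=(25.6,19.2))
plt.xticks(rotation=270)  # SUID labels are long; rotate them to keep them readable
plt.scatter(node_suid, disease_score)
plt.xlabel('node SUID')
plt.ylabel('stringdb::disease score')
plt.show()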

In [None]:
disease_score.describe()

### Generate subnetworks
In order to reflect your exploration back onto the network, let’s generate subnetworks…

…from the top quartile of ‘disease score’

In [None]:
top_quart = disease_score.quantile(q=0.75)

In [None]:
top_quart

In [None]:
top_nodes = disease_score[disease_score > top_quart].index.values.astype(str)

In [None]:
top_nodes.tolist()
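The quartile filter can be checked on toy numbers. This sketch uses eight hypothetical scores with made-up SUIDs; note that `Series.quantile()` interpolates linearly by default, so the threshold need not be one of the observed scores:

```python
import pandas as pd

# Hypothetical scores for eight nodes, indexed by made-up SUIDs 101..108.
scores = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=range(101, 109), dtype="float")

# Linear interpolation between the 6th and 7th sorted values gives 6.25.
threshold = scores.quantile(q=0.75)

# Strictly greater than the threshold, as in the cell above; SUIDs as strings.
top = scores[scores > threshold].index.values.astype(str).tolist()
print(threshold, top)  # 6.25 ['107', '108']
```

With real data, several nodes can share the same score; if one of them sits exactly at the threshold, `>` drops it while `>=` would keep it.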

In [None]:
p4c.create_subnetwork(top_nodes.tolist(), subnetwork_name='top disease quartile')
# returns the SUID of the new network

In [None]:
p4c.notebook_export_show_image()

…of connected nodes only

In [None]:
p4c.create_subnetwork(edges='all',subnetwork_name='top disease quartile connected')  #handy way to exclude unconnected nodes!

In [None]:
p4c.notebook_export_show_image()
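Building a subnetwork from all edges works as a filter because a node can only appear if it is an endpoint of one of those edges. A plain-Python illustration of that effect on a hypothetical toy graph; `connected_nodes` is an illustrative helper, not how Cytoscape implements it:

```python
def connected_nodes(nodes, edges):
    """Keep only nodes that are an endpoint of at least one edge.

    Illustrates the effect of building a subnetwork from all edges:
    isolated nodes drop out. Not Cytoscape's implementation.
    """
    touched = {endpoint for edge in edges for endpoint in edge}
    return [node for node in nodes if node in touched]


# Hypothetical toy subnetwork: node D has no edges.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C")]
print(connected_nodes(nodes, edges))  # ['A', 'B', 'C']
```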

…from first neighbors of top disease score genes, using the network connectivity together with the data to direct discovery.

In [None]:
p4c.set_current_network(network="STRING network - ovarian cancer")

In [None]:
max(disease_score)

In [None]:
top_nodes = disease_score[disease_score==max(disease_score)].index.values.astype(str).tolist()

In [None]:
top_nodes

In [None]:
p4c.select_nodes(nodes=top_nodes)

In [None]:
p4c.select_first_neighbors()

In [None]:
p4c.create_subnetwork('selected', subnetwork_name='top disease neighbors') # selected nodes, all connecting edges (default)

In [None]:
p4c.notebook_export_show_image()
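Conceptually, selecting first neighbors grows the selection by one hop: the seed nodes stay selected and every node directly linked to one of them is added. A sketch on a hypothetical undirected adjacency dict; `select_first_neighbors` here is an illustrative stand-in, not py4cytoscape’s function of the same name:

```python
def select_first_neighbors(adjacency, seeds):
    """Return the seed nodes plus every node directly linked to one of them.

    Illustrative first-neighbor expansion on an undirected adjacency dict;
    a hypothetical helper, not py4cytoscape code.
    """
    selected = set(seeds)
    for seed in seeds:
        selected.update(adjacency.get(seed, ()))
    return selected


# Hypothetical undirected toy network.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}
print(sorted(select_first_neighbors(adjacency, ["A"])))  # ['A', 'B', 'C']
```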

…from diffusion algorithm starting with top disease score genes, using the network connectivity in a more subtle way than just first-degree neighbors.

In [None]:
p4c.set_current_network(network="STRING network - ovarian cancer")

In [None]:
p4c.select_nodes(nodes=top_nodes)

In [None]:
p4c.commands_post('diffusion diffuse') # run network diffusion (Diffusion app) starting from the selected nodes

In [None]:
p4c.create_subnetwork('selected', subnetwork_name='top disease diffusion')

In [None]:
p4c.notebook_export_show_image()

In [None]:
p4c.layout_network('force-directed')

In [None]:
p4c.notebook_export_show_image()
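To build intuition for why diffusion is "more subtle" than first neighbors, here is a sketch of graph heat diffusion, h(t) = exp(-tL) h0, where L = D - A is the graph Laplacian and h0 puts heat on the seed nodes. Heat spreads along all paths and fades with distance, so nodes reachable by many short paths score higher. This is the general heat-kernel technique on a hypothetical toy graph; the `time` value and the helper are assumptions, not the Diffusion app’s actual implementation or defaults:

```python
import numpy as np


def heat_diffusion(adjacency, seeds, time=0.1):
    """Sketch of graph heat diffusion: h(t) = expm(-time * L) @ h0.

    L = D - A is the graph Laplacian. Because L is symmetric, the matrix
    exponential is computed from its eigendecomposition.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    kernel = eigenvectors @ np.diag(np.exp(-time * eigenvalues)) @ eigenvectors.T
    h0 = np.zeros(len(A))
    h0[list(seeds)] = 1.0  # unit heat on each seed node
    return kernel @ h0


# Hypothetical path graph 0 - 1 - 2 - 3, with heat placed on node 0.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
heat = heat_diffusion(path, seeds=[0], time=1.0)
print(heat.round(3))
```

Total heat is conserved (the Laplacian’s columns sum to zero), so the scores can be read as how the seed’s heat is redistributed across the network; ranking nodes by heat and keeping the top ones gives a diffusion-based subnetwork.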

Pro-tip: don’t forget to call **p4c.set_current_network()** on the correct parent network before getting table column data and making selections.