<a href="https://colab.research.google.com/github/bdemchak/cytoscape-jupyter/blob/main/gangsu/basic%20protocol%201.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a work-in-progress reproduction of the [Biological Network Exploration with Cytoscape 3](https://pubmed.ncbi.nlm.nih.gov/25199793/) Basic Protocol 1, which loads an *S. cerevisiae* network, filters out unneeded nodes, lays out the resulting network, and then creates a dendrogram display.

While much of it works, there are compromises, mainly due to Cytoscape features that aren't at full strength yet.

---
#Setup data files, py4cytoscape and Cytoscape connection
---
**NOTE: To run this notebook, you must manually start Cytoscape first -- don't proceed until you have started Cytoscape.**

---
##Setup: Notebook data files

Remove any 'output' directory left over from a previous run and define `OUTPUT_DIR`, the directory that will be used to store files uploaded from Cytoscape.

This is a good place to prepare any other system resources that might be needed by downstream Notebook cells.





In [1]:
!rm -r output/
!ls -l 
OUTPUT_DIR = 'output/'

rm: cannot remove 'output/': No such file or directory
total 4
drwxr-xr-x 1 root root 4096 Feb 24 17:49 sample_data


---
##Setup: Fetch latest py4cytoscape




**Note that you can fetch from a specific GitHub branch by appending `@<branch>` to the "py4cytoscape" at the end of the GitHub URL.**

For example, to get branch 0.0.5: `git+https://github.com/cytoscape/py4cytoscape@0.0.5`

In [2]:
!pip uninstall -y py4cytoscape

#!pip install py4cytoscape
!pip install git+https://github.com/cytoscape/py4cytoscape@0.0.8
#!pip install git+https://github.com/cytoscape/py4cytoscape

import py4cytoscape as p4c

Collecting git+https://github.com/cytoscape/py4cytoscape@0.0.8
  Cloning https://github.com/cytoscape/py4cytoscape (to revision 0.0.8) to /tmp/pip-req-build-5dn3lxx_
  Running command git clone -q https://github.com/cytoscape/py4cytoscape /tmp/pip-req-build-5dn3lxx_
  Running command git checkout -b 0.0.8 --track origin/0.0.8
  Switched to a new branch '0.0.8'
  Branch '0.0.8' set up to track remote branch '0.0.8' from 'origin'.
Collecting python-igraph
  Downloading https://files.pythonhosted.org/packages/ed/dd/debbb217cc6e128a7f9e788db36fcdce2c313611bb98cae1a73b65fab8a6/python_igraph-0.9.0-cp37-cp37m-manylinux2010_x86_64.whl (3.1MB)
     |████████████████████████████████| 3.1MB 4.9MB/s
Collecting texttable>=1.6.2
  Downloading https://files.pythonhosted.org/packages/06/f5/46201c428aebe0eecfa83df66bf3e6caa29659dbac5a56ddfd83cae0d4a4/texttable-1.6.3-py2.py3-none-any.whl
Building wheels for collected packages: py4cytoscape
  Building wheel for py4cytoscape (setup.py) ...

---
##Setup: Set up Cytoscape connection


Set up a "browser client", which is the glue that allows the server-based Notebook to communicate with Cytoscape.

*Note that the IPython.display.Javascript() call must be the last line in this cell.*

In [3]:
import IPython

print(f'Loading Javascript client ... {p4c.get_browser_client_channel()} on {p4c.get_jupyter_bridge_url()}')
browser_client_js = p4c.get_browser_client_js(True)
IPython.display.Javascript(browser_client_js) # Start browser client


Loading Javascript client ... cfe25c91-1fae-495c-b46a-39648aca86ad on https://jupyter-bridge.cytoscape.org


<IPython.core.display.Javascript object>

---
#Sanity test to verify Cytoscape connection


By now, the connection to Cytoscape should be up and available. To verify this, try a simple operation that doesn't alter the state of Cytoscape.

In [4]:
p4c.cytoscape_version_info()


{'apiVersion': 'v1',
 'automationAPIVersion': '1.0.0',
 'cytoscapeVersion': '3.8.2',
 'jupyterBridgeVersion': '0.0.2',
 'py4cytoscapeVersion': '0.0.8'}
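
The dictionary returned by the sanity test can also drive later decisions in the notebook, such as choosing a code path by Cytoscape version. The following is a minimal sketch, not part of py4cytoscape: the helper names `version_tuple` and `cytoscape_at_least` are hypothetical, and the sample dictionary is copied from the output above rather than fetched from a live Cytoscape.

```python
def version_tuple(version):
    """Convert a dotted version string such as '3.8.2' into a tuple of ints.

    Assumes purely numeric components; a suffix such as '-SNAPSHOT'
    would need to be stripped first.
    """
    return tuple(int(part) for part in version.split('.'))


def cytoscape_at_least(version_info, minimum):
    """Return True if the reported Cytoscape version is at least `minimum`."""
    return version_tuple(version_info['cytoscapeVersion']) >= version_tuple(minimum)


# Sample copied from the cell output; in the notebook this would be the
# value returned by p4c.cytoscape_version_info().
sample_info = {
    'apiVersion': 'v1',
    'automationAPIVersion': '1.0.0',
    'cytoscapeVersion': '3.8.2',
    'jupyterBridgeVersion': '0.0.2',
    'py4cytoscapeVersion': '0.0.8',
}

print(cytoscape_at_least(sample_info, '3.8'))
print(cytoscape_at_least(sample_info, '3.9'))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically (for example, '3.10' sorting before '3.9').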

---
##Setup: Import source data files

The network and annotation files are in a Dropbox folder, and this cell downloads them into the default Sandbox from where Cytoscape will access them.

The files could just as well have been on any cloud resource, including Google Drive, GitHub, Microsoft OneDrive or a private web site. Note that in this case, the network file was too large to be saved on GitHub, so Dropbox was a handy alternative.

*An alternative would be to load the files into this Notebook file system (or create them there) and then download those files to the Sandbox. Loading them into the Notebook file system would require the use of Notebook "!" commands (e.g., !wget).*

In this cell, we show how the network file can be directly downloaded from Dropbox. The cell that demonstrates gene expression merging (below) shows the use of the "!" command.

**Sandboxing is explained in https://py4cytoscape.readthedocs.io/en/latest/concepts.html#sandboxing**

In [5]:
p4c.sandbox_set(None) # Revert to default sandbox in case some other workflow selected a different one

res_mitab = p4c.sandbox_url_to("https://www.dropbox.com/s/8wc8o897tsxewt1/BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab?dl=0", "BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab")
print(f'Network file BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab has {res_mitab["fileByteCount"]} bytes')

Network file BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab has 166981992 bytes
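
The download cell spells out the long file name three times (in the URL, as the sandbox destination, and in the message). As a small sketch of one way to avoid that repetition, the name can be derived from the URL itself. The helper `file_name_from_url` is hypothetical and uses only the Python standard library; the commented-out line shows where the `p4c.sandbox_url_to` call from the cell above would use it.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def file_name_from_url(url):
    """Return the last path component of a URL, ignoring any query string."""
    path = urlparse(url).path
    return PurePosixPath(unquote(path)).name


MITAB_URL = ("https://www.dropbox.com/s/8wc8o897tsxewt1/"
             "BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab?dl=0")

mitab_name = file_name_from_url(MITAB_URL)
print(mitab_name)

# In the notebook, the download would then read:
# res_mitab = p4c.sandbox_url_to(MITAB_URL, mitab_name)
```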


# Load the S. cerevisiae MITAB network into Cytoscape

Note that the `import_network_from_file` function (incorrectly) throws an exception even though the network loads, so we ignore the exception and instead confirm that exactly one network is present afterwards.

**Note**: Once CYTOSCAPE-12772 is fixed, we can remove the try-block in this cell.

In [6]:
from requests import HTTPError
p4c.close_session(False)

try:
  p4c.import_network_from_file('BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab')
except Exception:
  pass  # Spurious error (see CYTOSCAPE-12772); success is verified below
if p4c.get_network_count() != 1:
  raise Exception('Failed to load network')
net_suid = p4c.get_network_suid()
net_suid



In commands_post(): {'status': 500, 'type': 'urn:cytoscape:ci:cyrest-core:v1:handle-json-command:errors:3', 'message': 'Task returned invalid json.', 'link': 'file:/C:/Users/CyDeveloper/CytoscapeConfiguration/3/framework-cytoscape.log'}


1364923
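
The pattern used in the cell above (ignore an error that is known to be spurious, then check a postcondition) can be written as a small reusable helper. This is only an illustrative sketch: `run_and_verify` is a hypothetical name, and the stand-in functions below simulate the behavior so that the example runs without a Cytoscape connection.

```python
def run_and_verify(action, postcondition, description):
    """Run `action`, report (but tolerate) any error, then require `postcondition`."""
    try:
        action()
    except Exception as exc:
        # The error may be spurious, so report it and let the check below decide.
        print(f'Ignoring error from {description}: {exc}')
    if not postcondition():
        raise RuntimeError(f'{description} failed')


# Stand-ins for the Cytoscape calls: the "import" does its work and then
# raises, mimicking the 'Task returned invalid json.' response.
loaded = []


def fake_import():
    loaded.append('network')
    raise ValueError('Task returned invalid json.')


run_and_verify(fake_import, lambda: len(loaded) == 1, 'network import')
print(loaded)
```

In the notebook, `action` would wrap the `p4c.import_network_from_file` call and `postcondition` would compare `p4c.get_network_count()` to 1. Catching `Exception` rather than using a bare `except:` keeps keyboard interrupts working during the long import.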

# Merge the gene expression data into the node table

For Cytoscape 3.9.0 and later, call Cytoscape to merge the gene expression data into the node attribute table. 

For pre-Cytoscape 3.9.0, do most of the work in Pandas and then import the dataframe into the node attribute table. Explicitly set the Gene ID as a string even though it's originally parsed as a number. To Cytoscape, the string will be compatible with the 'name' column already in the BIOGRID network. The Gene ID column in the dataframe is matched to the network's name column.



**Note:** ... add a table import function in Cytoscape Automation so the sandbox can be used.


In [7]:
if p4c.check_supported_versions(cytoscape='3.9') is None:
  # Load file directly into Sandbox so Cytoscape can import it
  res_soft = p4c.sandbox_url_to("https://www.dropbox.com/s/r15azh0xb53smu1/GDS112_full.soft?dl=0", "GDS112_full.soft")
  print(f'Annotation file GDS112_full.soft has {res_soft["fileByteCount"]} bytes')

  # soft_file_path = p4c.sandbox_get_file_info('GDS112_full.soft')['filePath']
  soft_file_path = res_soft["filePath"]
  res = p4c.commands_post(f'table import file startLoadRow="83" keyColumnIndex="10" file="{soft_file_path}"')
  print(res)
else:
  # Load file into Notebook file system so Python can import it, tweak it, and download to Cytoscape
  !wget -q --no-check-certificate -O GDS112_full.soft "https://www.dropbox.com/s/r15azh0xb53smu1/GDS112_full.soft?dl=0"

  import pandas as pd
  # Skip the first 82 lines of the file so parsing starts at the table's header row
  GDS112_full = pd.read_csv('GDS112_full.soft', skiprows=82, sep='\t')
  # Drop rows without a Gene ID; they cannot be matched to a network node
  GDS112_full.dropna(subset=['Gene ID'], inplace=True)
  # Gene ID parses as a float; convert to integer and then to string to match the 'name' column
  GDS112_full['Gene ID'] = pd.to_numeric(GDS112_full['Gene ID'], downcast='integer')
  GDS112_full = GDS112_full.astype({'Gene ID': 'string'})
  print(GDS112_full.dtypes)
  print(GDS112_full)
  p4c.load_table_data(GDS112_full, data_key_column='Gene ID')

  import os
  os.remove('GDS112_full.soft')


ID_REF                    object
IDENTIFIER                object
GSM1029                  float64
GSM1030                  float64
GSM1032                  float64
GSM1033                  float64
GSM1034                  float64
Gene title                object
Gene symbol               object
Gene ID                   string
UniGene title            float64
UniGene symbol           float64
UniGene ID               float64
Nucleotide Title         float64
GI                       float64
GenBank Accession        float64
Platform_CLONEID         float64
Platform_ORF              object
Platform_SPOTID           object
Chromosome location      float64
Chromosome annotation     object
GO:Function               object
GO:Process                object
GO:Component              object
GO:Function ID            object
GO:Process ID             object
GO:Component ID           object
dtype: object
     ID_REF  ...                                    GO:Component ID
24       25  ...  GO:000573

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_subset[col] = col_val
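
The Gene ID conversion in the Pandas branch is easy to get wrong, so here is a small, self-contained sketch of the same steps on an in-memory table. The three rows and the `GENE_*` identifiers are hypothetical sample data (the two Gene IDs are borrowed from node names that appear elsewhere in this notebook); only the sequence of Pandas calls mirrors the cell above.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the middle row has no Gene ID.
SAMPLE_SOFT = (
    "ID_REF\tIDENTIFIER\tGSM1029\tGene ID\n"
    "1\tGENE_A\t0.12\t850455\n"
    "2\tGENE_B\t0.50\t\n"
    "3\tGENE_C\t-0.30\t850599\n"
)

table = pd.read_csv(io.StringIO(SAMPLE_SOFT), sep='\t')

# The missing value forces the column to be parsed as floating point.
print(table['Gene ID'].dtype)

# Drop unmatched rows, then go float -> integer -> string so that the
# text form is '850455' rather than '850455.0'.
table = table.dropna(subset=['Gene ID']).copy()
table['Gene ID'] = pd.to_numeric(table['Gene ID'], downcast='integer')
table = table.astype({'Gene ID': 'string'})

print(list(table['Gene ID']))
```

Converting directly from float to string would keep the trailing `.0`, which would not match the values in the network's 'name' column.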


# Create a filter to remove nodes having no Gene Symbol

In [8]:
# This cell is troubleshooting for the filter concurrency problem. It's not part of this workflow.

#     AUTO_APPLY_THRESHOLD = 100000
#     cmd = 'commands/filter/create'
#     cmd_body = {"name":"SymbolOK","json":"{\"id\": \"ColumnFilter\", \"parameters\": {\"criterion\": \"[A-Z0-9]*\", \"columnName\": \"Gene symbol\", \"predicate\": \"REGEX\", \"caseSensitive\": false, \"anyMatch\": true, \"type\": \"nodes\"}}","apply":"True"}
#     base_url = 'http://127.0.0.1:1234/v1'
#     network = "current"
#     # Before Cytoscape 3.9, the filter was automatically applied when it was created unless
#     # the total of nodes and edges was 100,000 or more. So, we create the filter and then
#     # apply it if it wasn't automatically applied already.
#     res = p4c.cyrest_post(cmd, body=cmd_body, base_url=base_url)
#     if p4c.get_node_count(network=network, base_url=base_url) \
#         + p4c.get_edge_count(network=network, base_url=base_url) > AUTO_APPLY_THRESHOLD:
#         print('Warning -- Cytoscape version pre-3.9 in use ... explicitly applying filter')
# #        res = p4c.commands_post(
# #            f'filter apply container="filter" name="{cmd_body["name"]}" network="{network}"',
#             base_url=base_url)
#     print("done")
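
The request body in the troubleshooting cell embeds a hand-escaped JSON string, which is hard to read and easy to break. As a sketch of an alternative, the same body can be built from a plain dictionary with `json.dumps`. The helper name `column_filter_body` is hypothetical; the field names and values are taken from the commented-out code above, and nothing here contacts Cytoscape.

```python
import json


def column_filter_body(filter_name, column, criterion,
                       predicate='REGEX', element_type='nodes', apply=True):
    """Build the body for the 'commands/filter/create' request as a dict."""
    filter_spec = {
        'id': 'ColumnFilter',
        'parameters': {
            'criterion': criterion,
            'columnName': column,
            'predicate': predicate,
            'caseSensitive': False,
            'anyMatch': True,
            'type': element_type,
        },
    }
    # The filter definition travels as a JSON string inside the outer body,
    # and 'apply' is sent as the string "True" as in the original cell.
    return {'name': filter_name, 'json': json.dumps(filter_spec), 'apply': str(apply)}


body = column_filter_body('SymbolOK', 'Gene symbol', '[A-Z0-9]*')
print(body['json'])
```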

In [9]:
# p4c.create_column_filter('SymbolOK', 'Gene symbol', '[A-Z0-9]*', 'REGEX')
input( "There is a concurrency problem in Filters ... \nplease create and execute this filter by hand:\n SymbolOK filter where Gene Symbol matches-regex [A-Z0-9]*\n then hit [Enter] ")


There is a concurrency problem in Filters ... 
please create and execute this filter by hand:
 SymbolOK filter where Gene Symbol regex-contains [A-Z0-9]*
 then hit [Enter] 


''

# Create a subnetwork containing only named nodes

This could take several minutes.

In [10]:
new_suid = p4c.create_subnetwork()
new_suid

1710449

# Get rid of the original network, which isn't needed anymore

In [11]:
p4c.delete_network(net_suid)
net_suid = new_suid

# Lay out the subnetwork in case it wasn't already

**Note:** Why does this have to be run twice?? The first time seems to drop a lot of lines in the log, but doesn't lay out the network.

Actually, does it even need to be run at all? The CreateNetwork seems to do the job.

In [13]:
p4c.layout_network('force-directed')


{}

# Install clusterMaker2 if it hasn't already been installed

In [14]:
p4c.install_app('clusterMaker2')

{}


{}
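
The cell above calls `install_app` unconditionally. A sketch of an explicit "install only if missing" check follows. It assumes that py4cytoscape's `get_installed_apps()` returns a list of dictionaries carrying an `appName` key; that shape is an assumption here and should be confirmed against the py4cytoscape version in use. The helper `is_app_installed` and the sample list are hypothetical.

```python
def is_app_installed(installed_apps, app_name):
    """Return True if `app_name` appears in a list of app-description dicts.

    Assumes each dict has an 'appName' key (an assumption about the
    structure returned by p4c.get_installed_apps()).
    """
    wanted = app_name.lower()
    return any(app.get('appName', '').lower() == wanted for app in installed_apps)


# Hypothetical sample standing in for p4c.get_installed_apps().
sample_apps = [{'appName': 'clusterMaker2'}, {'appName': 'BiNGO'}]

print(is_app_installed(sample_apps, 'clusterMaker2'))
print(is_app_installed(sample_apps, 'stringApp'))

# In the notebook, the install step could then become:
# if not is_app_installed(p4c.get_installed_apps(), 'clusterMaker2'):
#     p4c.install_app('clusterMaker2')
```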

# Create the hierarchical clustering and dendrogram

This returns a large data structure that describes the dendrogram.

It also creates a dendrogram window that's designed for GUI manipulation. It's unclear whether this window can be controlled or used by automation calls.

**Note:** Having the dendrogram is important, and so is having the data that created it. When CSD-420 is addressed, it will be possible to snapshot the dendrogram and perform other operations with it.

In [15]:
p4c.commands_post('cluster hierarchical showUI=true clusterAttributes=false nodeAttributeList="GSM1029,GSM1030,GSM1032,GSM1033,GSM1034"')

[{'nodeOrder': [{'nodeName': '850455', 'suid': 1366820},
   {'nodeName': '850599', 'suid': 1408916},
   {'nodeName': '851320', 'suid': 1384803},
   {'nodeName': '851440', 'suid': 1374535},
   {'nodeName': '853982', 'suid': 1411028},
   {'nodeName': '854120', 'suid': 1378747},
   {'nodeName': '852485', 'suid': 1374429},
   {'nodeName': '854733', 'suid': 1368437},
   {'nodeName': '854245', 'suid': 1369330},
   {'nodeName': '853283', 'suid': 1372107},
   {'nodeName': '854099', 'suid': 1408522},
   {'nodeName': '854134', 'suid': 1483036},
   {'nodeName': '854878', 'suid': 1365588},
   {'nodeName': '856364', 'suid': 1379054},
   {'nodeName': '853997', 'suid': 1367995},
   {'nodeName': '854932', 'suid': 1407664},
   {'nodeName': '850318', 'suid': 1371762},
   {'nodeName': '852552', 'suid': 1379052},
   {'nodeName': '854212', 'suid': 1378533},
   {'nodeName': '854214', 'suid': 1408239},
   {'nodeName': '851321', 'suid': 1377938},
   {'nodeName': '850627', 'suid': 1379480},
   {'nodeName': '85
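
Although the dendrogram window itself is GUI-only, the returned data structure can be processed in Python. The sketch below pulls the leaf ordering out of a result shaped like the output above (a list of dictionaries, each with a `nodeOrder` list of `nodeName`/`suid` entries). The helper `node_order_names` is hypothetical, and the sample holds only the first three entries copied from that output.

```python
def node_order_names(cluster_result):
    """Collect node names, in dendrogram order, from a clustering result."""
    names = []
    for entry in cluster_result:
        for node in entry.get('nodeOrder', []):
            names.append(node['nodeName'])
    return names


# First three entries copied from the cell output; in the notebook the
# argument would be the value returned by the p4c.commands_post call.
sample_result = [{'nodeOrder': [
    {'nodeName': '850455', 'suid': 1366820},
    {'nodeName': '850599', 'suid': 1408916},
    {'nodeName': '851320', 'suid': 1384803},
]}]

print(node_order_names(sample_result))
```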

#Use BiNGO for enrichment analysis

The BiNGO app doesn't have automation entrypoints, so this analysis isn't possible right now. Is there a different app that can do this?

**NOTE:** Enrichment analysis is a very important step that this workflow currently lacks, so we need CSD-421 fixed.