This is a work-in-progress reproduction of the [Biological Network Exploration with Cytoscape 3](https://pubmed.ncbi.nlm.nih.gov/25199793/) Basic Protocol 1, which loads an S. cerevisiae network, filters out unneeded nodes, lays out the resulting network, and then creates a dendrogram display.

While much of it works, there are compromises, mainly due to Cytoscape features that aren't at full strength yet.

# Set up data files, py4cytoscape and Cytoscape connection
---
**NOTE: To run this notebook, you must manually start Cytoscape first -- don't proceed until you have started Cytoscape.**

---
## Setup: Fetch latest py4cytoscape




In [12]:
import py4cytoscape as p4c

---
## Setup: Sanity test to verify Cytoscape connection


By now, the connection to Cytoscape should be up and available. To verify this, try a simple operation that doesn't alter the state of Cytoscape.

In [23]:
p4c.cytoscape_version_info()


{'apiVersion': 'v1',
 'cytoscapeVersion': '3.9.1',
 'automationAPIVersion': '1.4.0',
 'py4cytoscapeVersion': '1.3.0'}
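
The dictionary returned by this call can also be used to gate later cells on a minimum Cytoscape version. The following is a minimal sketch that works on a plain dictionary shaped like the output above. `parse_version` and `meets_minimum` are hypothetical helpers, not part of py4cytoscape, and the sketch assumes version strings are dot-separated integers.

```python
def parse_version(text):
    """Turn '3.9.1' into (3, 9, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split('.'))


def meets_minimum(version_info, minimum):
    """Return True if the reported Cytoscape version is at least `minimum`."""
    return parse_version(version_info['cytoscapeVersion']) >= parse_version(minimum)


# Sample copied from the output above; in the notebook this would come from
# p4c.cytoscape_version_info().
sample = {'apiVersion': 'v1',
          'cytoscapeVersion': '3.9.1',
          'automationAPIVersion': '1.4.0',
          'py4cytoscapeVersion': '1.3.0'}

print(meets_minimum(sample, '3.9'))
print(meets_minimum(sample, '3.10'))
```

Comparing tuples of integers avoids the string-comparison trap where '3.10' would sort before '3.9'.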

---
## Setup: Notebook data files

Create the 'output' directory, which will be used to store files uploaded from Cytoscape.

This is a good place to prepare any other system resources that might be needed by downstream Notebook cells.

Pro Tip: The "!" commands in this cell are passed to the host operating system. In this example, they're correct for a Windows host. Different commands would be appropriate for a Linux or Mac host.


In [24]:
!del /s/q/f output
!rmdir output
!mkdir output
!dir output
OUTPUT_DIR = 'output/'

 Volume in drive C has no label.
 Volume Serial Number is 50EF-8726

 Directory of C:\Users\CyDeveloper\PycharmProjects\py4cytoscape\tests\Notebooks\output

05/03/2022  05:59 PM    <DIR>          .
05/03/2022  05:59 PM    <DIR>          ..
               0 File(s)              0 bytes
               2 Dir(s)   2,308,538,368 bytes free
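
For a host-independent alternative to the "!" commands, the same reset can be sketched with the Python standard library. `reset_output_dir` is a hypothetical helper name introduced here for illustration; it is not part of py4cytoscape.

```python
import shutil
from pathlib import Path


def reset_output_dir(path='output'):
    """Delete `path` if it exists, recreate it empty, and return it with a trailing slash."""
    target = Path(path)
    shutil.rmtree(target, ignore_errors=True)  # no error if the directory is missing
    target.mkdir(parents=True)
    return target.as_posix() + '/'
```

Calling `OUTPUT_DIR = reset_output_dir('output')` would then behave the same way on Windows, Linux, and Mac hosts.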


---
## Setup: Import source data files

The network and annotation files are in a Dropbox folder, and this cell downloads them into the current Cytoscape directory, which also holds this Jupyter Notebook.

The files could just as well have been on any cloud resource, including Google Drive, GitHub, Microsoft OneDrive or a private web site. Note that in this case, the network file was so large that it could not be saved on GitHub, so Dropbox was a handy alternative.

*An alternative would be to load them into the Cytoscape workstation file system using the Notebook "!" commands (e.g., !wget). That's out of the scope of this tutorial, though there's an example in the Merging cell below.*

Generally, the py4cytoscape Sandboxing functions are not necessary for executing a Notebook on the local Cytoscape workstation. However, the sandbox_url_to() function is *very* useful for loading cloud resources. Sandboxing is out of the scope of this tutorial, but if you're interested or are running a workflow in the cloud, **you can see Sandboxing explained in https://py4cytoscape.readthedocs.io/en/latest/concepts.html#sandboxing**

In [25]:
p4c.sandbox_set(None) # Revert to default sandbox in case some other workflow selected a different one

res_mitab = p4c.sandbox_url_to("https://www.dropbox.com/s/8wc8o897tsxewt1/BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab?dl=0", "BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab")
print(f'Network file BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab has {res_mitab["fileByteCount"]} bytes')

Network file BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab has 166981992 bytes


# Load the s. cerevisiae MITAB network into Cytoscape

Note that the import_network_from_file function (incorrectly) throws an exception, so we explicitly ignore the exception.

**Note**: Once CYTOSCAPE-12772 is fixed, we can remove the try-block in this cell.

In [26]:
from requests import HTTPError
p4c.close_session(False)

try:
  p4c.import_network_from_file('BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab')
except Exception:  # spurious error (CYTOSCAPE-12772); the network count check below confirms the load
  pass
if p4c.get_network_count() != 1:
  raise Exception('Failed to load network')
net_suid = p4c.get_network_suid()
print(f'Network identifier: {net_suid}')



Network identifier: 6948671


In commands_post(): {'status': 500, 'type': 'urn:cytoscape:ci:cyrest-core:v1:handle-json-command:errors:3', 'message': 'Task returned invalid json.', 'link': 'file:/C:/Users/CyDeveloper/CytoscapeConfiguration/3/framework-cytoscape.log'}
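
The approach in this cell (ignore an error that may be spurious, then confirm the outcome independently) can be sketched without Cytoscape. Everything below is a stand-in written for illustration: `run_then_verify` and `flaky_import` are hypothetical names, and the list `networks` only imitates Cytoscape's network collection.

```python
def run_then_verify(action, postcondition, failure_message):
    """Run `action`, ignore any exception it raises, then insist on `postcondition`."""
    try:
        action()
    except Exception as error:  # deliberately broad: the error may be spurious
        print(f'Ignoring error from action: {error}')
    if not postcondition():
        raise RuntimeError(failure_message)


# Stand-in for Cytoscape: the "import" does its work but still raises an error.
networks = []


def flaky_import():
    networks.append('BIOGRID')
    raise ValueError('Task returned invalid json.')


run_then_verify(flaky_import, lambda: len(networks) == 1, 'Failed to load network')
print(f'Networks loaded: {len(networks)}')
```

The postcondition check is what keeps the broad `except` safe: a genuine failure still stops the workflow, just one step later.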


# Merge the gene expression data into the node table

For Cytoscape 3.9.0 and later, call Cytoscape to merge the gene expression data into the node attribute table. 

For pre-Cytoscape 3.9.0, do most of the work in Pandas and then import the dataframe into the node attribute table. Explicitly set the Gene ID as a string even though it's originally parsed as a number. To Cytoscape, the string will be comparable to the 'name' column already in the BIOGRID network. The Gene ID column in the dataframe is matched to the network's name column.

Pro Tip: The wget and mv commands work on a Jupyter system on a Linux host. You may have to choose different commands for a Windows host.

In [27]:
if p4c.check_supported_versions(cytoscape='3.9') is None:
  # Load file directly into Sandbox so Cytoscape can import it
  res_soft = p4c.sandbox_url_to("https://www.dropbox.com/s/r15azh0xb53smu1/GDS112_full.soft?dl=0", "GDS112_full.soft")
  print(f'Annotation file GDS112_full.soft has {res_soft["fileByteCount"]} bytes')

  res = p4c.load_table_data_from_file('GDS112_full.soft', start_load_row=83, data_key_column_index=10, delimiters='\t')
  print(f'Load result contains table identifiers: {res["mappedTables"]}')
else:
  # Load file into Notebook file system so Python can import it, tweak it, and download to Cytoscape
  !wget -q --no-check-certificate https://www.dropbox.com/s/r15azh0xb53smu1/GDS112_full.soft?dl=0
  !mv GDS112_full.soft?dl=0 GDS112_full.soft

  import pandas as pd  # conventional alias
  GDS112_full = pd.read_csv('GDS112_full.soft', skiprows=82, sep='\t')
  GDS112_full.dropna(subset=['Gene ID'], inplace=True)
  GDS112_full['Gene ID'] = pd.to_numeric(GDS112_full['Gene ID'], downcast='integer')
  GDS112_full = GDS112_full.astype({'Gene ID': 'string'})
  print(GDS112_full.dtypes)
  print(GDS112_full)
  p4c.load_table_data(GDS112_full, data_key_column='Gene ID')

  import os
  os.remove('GDS112_full.soft')


Annotation file GDS112_full.soft has 5536880 bytes
Load result contains table identifiers: [6948642, 6948680]
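
To see why the pre-3.9.0 path converts the Gene ID to an integer before making it a string, consider this small sketch. The three-row table is made up for illustration (only the column names and the two gene identifiers come from this notebook), and it assumes pandas is installed.

```python
import io
import pandas as pd

# Made-up three-row stand-in for the SOFT table; the real file has many more columns.
raw = io.StringIO('ID_REF\tGene ID\tGSM1029\n'
                  'a\t850532\t0.5\n'
                  'b\t\t0.1\n'
                  'c\t851822\t1.2\n')
table = pd.read_csv(raw, sep='\t')
print(table['Gene ID'].dtype)  # the blank entry forces a floating-point column

table = table.dropna(subset=['Gene ID'])
naive_keys = table['Gene ID'].astype(str).tolist()  # float formatting leaks into the key
clean_keys = table['Gene ID'].astype('int64').astype(str).tolist()
print(naive_keys)
print(clean_keys)
```

Keys converted straight from floating point carry a decimal part and so would not equal the plain-digit strings in the network's name column, which is why the integer step comes first.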


# Create a filter to remove nodes having no Gene Symbol

In [28]:
res = p4c.create_column_filter('SymbolOK', 'Gene symbol', '[A-Z0-9]*', 'REGEX')
print(f'Nodes selected: {len(res["nodes"])}')

No edges selected.
Nodes selected: 5512
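
The pattern `[A-Z0-9]*` can be explored outside Cytoscape. The sketch below assumes that the REGEX predicate must match the whole column value, which is not confirmed here, and uses Python's `re.fullmatch` to imitate that assumption. `symbol_ok` is a hypothetical helper and the candidate strings are illustrative.

```python
import re

PATTERN = re.compile(r'[A-Z0-9]*')


def symbol_ok(value):
    """Return True if `value` is a string made only of A-Z and 0-9 (full match)."""
    return isinstance(value, str) and PATTERN.fullmatch(value) is not None


for candidate in ['ACT1', 'YAL001C', 'CUP1-1', 'act1', '', None]:
    print(repr(candidate), symbol_ok(candidate))
```

Because `*` allows zero characters, an empty string also passes in this sketch; a pattern ending in `+` would require at least one character. How Cytoscape itself treats a missing value is not something this sketch can show.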


# Create a subnetwork containing only named nodes

This could take several minutes.

At the end, you should see a view containing all nodes laid out. 

If you see only a single rectangle, it could be that your Cytoscape is set to operate with a small stack size. To increase the stack:

1. terminate Cytoscape

2. a) upgrade Cytoscape to 3.9.0 or later 

  ... or b) use a text editor to add -Xss5M to the cytoscape.vmoptions file in your Cytoscape program directory

3. restart Cytoscape

4. re-run this workflow
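
Option 2b above can be sketched as a small script instead of a manual edit. `ensure_vm_option` is a hypothetical helper; the location of cytoscape.vmoptions depends on your installation, so the path must be supplied by you. Run it only while Cytoscape is stopped.

```python
from pathlib import Path


def ensure_vm_option(vmoptions_path, option='-Xss5M'):
    """Append `option` on its own line unless the file already contains it.

    Returns True if the file was changed.
    """
    path = Path(vmoptions_path)
    lines = path.read_text().splitlines() if path.exists() else []
    if option in (line.strip() for line in lines):
        return False
    lines.append(option)
    path.write_text('\n'.join(lines) + '\n')
    return True
```

This sketch only checks for an exact match, so a file that already sets a different `-Xss` value would end up with two stack-size lines and should be edited by hand instead.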

In [29]:
new_suid = p4c.create_subnetwork()
print(f'New network identifier: {new_suid}')

New network identifier: 8336569


# Get rid of the original network, which isn't needed anymore

In [30]:
p4c.delete_network(net_suid)
net_suid = new_suid

# Install clusterMaker2 if it hasn't already been installed

In [31]:
p4c.install_app('clusterMaker2')

{}


{}

# Create the hierarchical clustering and dendrogram

This returns a large data structure that describes the dendrogram.

It also creates a dendrogram window that's designed for GUI manipulation. It's unclear whether this window can be controlled or used by automation calls.

**Note:** Having the dendrogram is important, and so is having the data that created it. When CSD-420 is addressed, it will be possible to snapshot the dendrogram and perform other operations with it.

In [32]:
res = p4c.commands_post('cluster hierarchical showUI=true clusterAttributes=false nodeAttributeList="GSM1029,GSM1030,GSM1032,GSM1033,GSM1034"')
print(f'Dendogram tree: {res}')

Dendogram tree: [{'nodeOrder': [{'nodeName': '850532', 'suid': 6971169}, {'nodeName': '851822', 'suid': 6989613}, {'nodeName': '851616', 'suid': 7045791}, {'nodeName': '853875', 'suid': 6959724}, {'nodeName': '851314', 'suid': 6990915}, {'nodeName': '852893', 'suid': 6981396}, {'nodeName': '854009', 'suid': 7381164}, {'nodeName': '854225', 'suid': 6967122}, {'nodeName': '852744', 'suid': 6950514}, {'nodeName': '853466', 'suid': 7069725}, {'nodeName': '854007', 'suid': 7463997}, {'nodeName': '854242', 'suid': 7006152}, {'nodeName': '854740', 'suid': 6955050}, {'nodeName': '854117', 'suid': 6996294}, {'nodeName': '854129', 'suid': 7106313}, {'nodeName': '851379', 'suid': 6951824}, {'nodeName': '852236', 'suid': 6966129}, {'nodeName': '855818', 'suid': 7268970}, {'nodeName': '855787', 'suid': 6951051}, {'nodeName': '850618', 'suid': 6989328}, {'nodeName': '855796', 'suid': 7030461}, {'nodeName': '850743', 'suid': 7388757}, {'nodeName': '851449', 'suid': 7356519}, {'nodeName': '851577', 's
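
The returned structure can be walked with ordinary Python. This sketch relies only on the keys visible in the truncated output above (`nodeOrder`, `nodeName`, `suid`); any other keys in the full result are unknown here. `node_order` is a hypothetical helper.

```python
def node_order(result):
    """Return (nodeName, suid) pairs in the order listed by the first result entry."""
    return [(entry['nodeName'], entry['suid']) for entry in result[0]['nodeOrder']]


# First three entries copied from the output above; the real list is much longer.
sample = [{'nodeOrder': [{'nodeName': '850532', 'suid': 6971169},
                         {'nodeName': '851822', 'suid': 6989613},
                         {'nodeName': '851616', 'suid': 7045791}]}]

for name, suid in node_order(sample):
    print(f'{name}\t{suid}')
```

In the notebook, `node_order(res)` would give the node ordering along the dendrogram, which could then be saved to the output directory alongside other results.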

# Use BiNGO for enrichment analysis

The BiNGO app doesn't have automation entrypoints, so this analysis isn't possible right now. Is there a different app that can do this?

**NOTE:** Enrichment analysis is an important part of this protocol, and this workflow has none until CSD-421 is fixed.