This is a reproduction of Basic Protocol 1 from [Biological Network Exploration with Cytoscape 3](https://pubmed.ncbi.nlm.nih.gov/25199793/). It loads an *S. cerevisiae* network, filters out unneeded nodes, lays out the resulting network, creates clusters of similar nodes, and then performs an enrichment calculation on one cluster.


# Setup data files, py4cytoscape and Cytoscape connection
---
**NOTE: To run this notebook, you must start Cytoscape manually first. Don't proceed until Cytoscape is running.**

## Setup: Fetch latest py4cytoscape




In [None]:
!pip install py4cytoscape
import py4cytoscape as p4c

## Setup: Sanity test to verify Cytoscape connection


By now, the connection to Cytoscape should be up and available. To verify this, try a simple operation that doesn't alter the state of Cytoscape.

In [None]:
p4c.cytoscape_version_info()


## Setup: Notebook data files

Create the 'output' directory, which could be used to store files created by or needed by Cytoscape.

In this example, there are no such files.

Pro Tip: The "!" commands in this cell are passed to the host operating system. In this example, they're correct for a Windows host. Different commands would be appropriate for a Linux or Mac host.


In [None]:
!del /s/q/f output
!rmdir output
!mkdir output
!dir output
OUTPUT_DIR = 'output/'
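Because the `!` commands are OS-specific, one portable alternative is to do the same work in Python. This is a minimal sketch that uses only the standard library (`shutil.rmtree`, `os.makedirs`). `reset_output_dir` is a hypothetical helper name and is not part of py4cytoscape.

```python
import os
import shutil

OUTPUT_DIR = 'output/'

def reset_output_dir(path: str) -> None:
    """Delete `path` (if present) along with all its contents, then recreate it empty."""
    shutil.rmtree(path, ignore_errors=True)  # same effect as del /s/q/f + rmdir
    os.makedirs(path, exist_ok=True)         # same effect as mkdir

reset_output_dir(OUTPUT_DIR)
print(os.listdir(OUTPUT_DIR))  # plays the role of dir: lists the (now empty) directory
```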

---
## Setup: Import source data files

The network and annotation files are in a Dropbox folder. This cell downloads the network file into the current Cytoscape directory, which also holds this Jupyter Notebook. The annotation file is downloaded later, in the gene expression import cell.

The files could just as well have been on any cloud resource, including Google Drive, GitHub, Microsoft OneDrive, or a private website. In this case, the network file was too large to store on GitHub, so Dropbox was a handy alternative.

*An alternative would be to load them into the Cytoscape workstation file system using Notebook "!" commands (e.g., !wget). That's out of the scope of this tutorial, though the gene expression import cell below contains an example.*

---
Note that this cell uses the import_file_from_url() function to load resources from cloud storage.
This function is appropriate for Notebooks running on the same workstation as Cytoscape, but
not for Notebooks running on a remote server. On a remote server, the Notebook's file system is not the same as the Cytoscape workstation's file system, so Sandbox functions (https://py4cytoscape.readthedocs.io/en/latest/concepts.html#sandboxing) should be used instead.

In [None]:
res_mitab = p4c.import_file_from_url("https://www.dropbox.com/s/8wc8o897tsxewt1/BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab?dl=0", "BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab")
print(f'Network file BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab has {res_mitab["fileByteCount"]} bytes')

---
# Load the Protein-protein Interaction Network into Cytoscape

The network is contained in the *S. cerevisiae* MITAB file.

Note that in this cell, the import_network_from_file function (incorrectly) throws an exception. To ignore the exception, we wrap the call in a try/except block and then check that exactly one network was actually loaded.

**Note:** Once the CYTOSCAPE-12772 issue is solved, we can remove the try/except block.

In [None]:
p4c.close_session(False)  # Start from an empty session without saving the current one

try:
  p4c.import_network_from_file('BIOGRID-ORGANISM-Saccharomyces_cerevisiae-3.2.105.mitab')
except Exception as e:  # Spurious exception (CYTOSCAPE-12772); the check below confirms the load
  print(f'Ignoring import exception: {e}')
if p4c.get_network_count() != 1:
  raise Exception('Failed to load network')
net_suid = p4c.get_network_suid()
print(f'Network identifier: {net_suid}')



---
# Import the gene expression data

The expression data is downloaded and merged into the network's node attribute table.

---
*Tip:* This cell shows how to create code that works around changes in Cytoscape capabilities. 

In this case, starting with Cytoscape 3.9.0, the load_table_data_from_file() function works as expected, so the gene expression data is merged into the node attribute table. 

Prior to 3.9.0, load_table_data_from_file() didn't work. As a workaround, we do most of the work in Pandas and then import the dataframe into the node attribute table. After Pandas reads the file, we match the dataframe's *Gene ID* column to the *name* column in the Cytoscape node attribute table. To do this, we must explicitly convert *Gene ID* to a string (Pandas parses it as a number), because Cytoscape's *name* column is already a string.
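The intermediate integer conversion in the Pandas workaround matters. Here is a toy example with made-up values (not rows from GDS112). Because some rows lack a *Gene ID*, Pandas stores the whole column as floats. Converting those floats straight to strings yields keys like `'851303.0'`, which would never match a node *name* of `'851303'`.

```python
import pandas as pd

# Toy stand-in for the parsed SOFT file: one row lacks a Gene ID, so Pandas
# stores the whole column as float64 (NaN can't live in a plain int column)
toy = pd.DataFrame({'Gene ID': [851303, None, 851304], 'GSM1029': [0.5, 1.2, -0.3]})
toy = toy.dropna(subset=['Gene ID']).copy()

naive = toy['Gene ID'].astype(str).tolist()  # floats stringify with a trailing '.0'

toy['Gene ID'] = pd.to_numeric(toy['Gene ID'], downcast='integer')  # 851303.0 -> 851303
toy = toy.astype({'Gene ID': 'string'})
keys = toy['Gene ID'].tolist()

print(naive)  # ['851303.0', '851304.0']
print(keys)   # ['851303', '851304']
```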

*Pro Tip:* The wget and mv commands work on a Jupyter system on a Linux host. You may have to choose different commands for a Windows host. A Windows wget command can be found [here](https://eternallybored.org/misc/wget/).

In [None]:
if p4c.check_supported_versions(cytoscape='3.9') is None:
  # Cytoscape 3.9+: load file directly into Sandbox so Cytoscape can import it
  res_soft = p4c.import_file_from_url("https://www.dropbox.com/s/r15azh0xb53smu1/GDS112_full.soft?dl=0", "GDS112_full.soft")
  print(f'Annotation file GDS112_full.soft has {res_soft["fileByteCount"]} bytes')

  res = p4c.load_table_data_from_file('GDS112_full.soft', start_load_row=83, data_key_column_index=10, delimiters='\t')
  print(f'Load result contains table identifiers: {res["mappedTables"]}')
else:
  # Pre-3.9: load file into Notebook file system so Python can import it, tweak it, and send it to Cytoscape
  !wget -q --no-check-certificate https://www.dropbox.com/s/r15azh0xb53smu1/GDS112_full.soft?dl=0
  !mv GDS112_full.soft?dl=0 GDS112_full.soft

  import os
  import pandas as pd

  # Skip the 82 SOFT metadata lines so line 83 supplies the column headers
  GDS112_full = pd.read_csv('GDS112_full.soft', skiprows=82, sep='\t')
  GDS112_full.dropna(subset=['Gene ID'], inplace=True)  # Keep only rows that have a Gene ID
  # Gene ID was parsed as float; convert to integer, then to string so it matches node names
  GDS112_full['Gene ID'] = pd.to_numeric(GDS112_full['Gene ID'], downcast='integer')
  GDS112_full = GDS112_full.astype({'Gene ID': 'string'})
  print(GDS112_full.dtypes)
  print(GDS112_full)
  p4c.load_table_data(GDS112_full, data_key_column='Gene ID')

  os.remove('GDS112_full.soft')
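The `wget`/`mv` pair in the pre-3.9 branch is Linux-specific. This portable sketch assumes only the Python standard library and downloads the file straight to a clean filename, so no `mv` is needed. `dropbox_direct_url` and `download` are hypothetical helpers. The `dl=1` rewrite relies on Dropbox's convention that `dl=1` requests the raw file rather than a preview page.

```python
import shutil
import urllib.request
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def dropbox_direct_url(url: str) -> str:
    """Rewrite a Dropbox share link to request the raw file (dl=1); leave other URLs alone."""
    parts = urlparse(url)
    if not parts.netloc.endswith('dropbox.com'):
        return url
    query = parse_qs(parts.query)
    query['dl'] = ['1']
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

def download(url: str, dest: str) -> str:
    """Download `url` to the local file `dest` (works on Windows, macOS and Linux); return `dest`."""
    with urllib.request.urlopen(dropbox_direct_url(url)) as response, open(dest, 'wb') as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download('https://www.dropbox.com/s/r15azh0xb53smu1/GDS112_full.soft?dl=0', 'GDS112_full.soft')` would stand in for the two `!` lines.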


---
# Filter the Network with the Genes that have Expression Data

For this, we assume that if a node has no *Gene symbol*, it also has no expression data. 

The filter compares each node's *Gene symbol* attribute to a regular expression. Nodes whose value matches are selected; nodes without a match are not.
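To make the selection rule concrete, here is an approximation of the filter in Pandas on a toy node table. It assumes that a value must match the whole pattern and that nodes with no *Gene symbol* never match. Cytoscape's exact REGEX semantics may differ, so treat this as an illustration rather than a reimplementation.

```python
import pandas as pd

# Hypothetical node table: node names and their (possibly missing) gene symbols
nodes = pd.DataFrame({
    'name': ['n1', 'n2', 'n3'],
    'Gene symbol': ['HBT1', None, 'GENE2'],
})

# Assumption: full-string match; missing values (None/NaN) count as "no match"
selected = nodes['Gene symbol'].str.fullmatch('[A-Z0-9]*', na=False)
print(nodes.loc[selected, 'name'].tolist())  # ['n1', 'n3']
```

On a live session, the same check could be run against a table fetched with `p4c.get_table_columns(columns=['name', 'Gene symbol'])`.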

In [None]:
res = p4c.create_column_filter('SymbolOK', 'Gene symbol', '[A-Z0-9]*', 'REGEX')
print(f'Nodes selected: {len(res["nodes"])}')

---
# Create a New Network with the Selected Subset

Create a subnetwork containing only nodes selected by the filter (i.e., having a *Gene symbol* value, which implies that expression data is present for that node).

This could take several minutes.

At the end, you should see a view containing all nodes laid out. 

If you see only a single rectangle, it could be that your Cytoscape is set to operate with a small stack size. To increase the stack:

1. Terminate Cytoscape.

2. Either (a) upgrade Cytoscape to 3.9.0 or later, or (b) use a text editor to add -Xss5M to the cytoscape.vmoptions file in your Cytoscape program directory.

3. Restart Cytoscape.

4. Re-run this workflow.

In [None]:
new_suid = p4c.create_subnetwork()
print(f'New network identifier: {new_suid}')

## Get rid of the original network, which isn't needed anymore

In [None]:
p4c.delete_network(net_suid)
net_suid = new_suid

---
# Identify Network Modules

The overall strategy is to find clusters of nodes that share some common attribute. In this case, we use expression data values. Specifically:

* Load Cytoscape's clusterMaker2 app
* Use clusterMaker2 to create a dendrogram showing a hierarchy of similar network modules


## Install clusterMaker2 if it hasn't already been installed

In [None]:
p4c.install_app('clusterMaker2')

## Identify network modules

Create a hierarchical clustering of similar nodes based on the expression data columns. Cytoscape renders the hierarchy as a dendrogram.

*Tip:* Cytoscape's dendrogram window can be used to manually explore module similarity. 

In [None]:
dendo_clustering = p4c.commands_post('cluster hierarchical showUI=true clusterAttributes=false nodeAttributeList="GSM1029,GSM1030,GSM1032,GSM1033,GSM1034"')

# dendo_clustering is a list whose first element is a dictionary:
#   {nodeOrder: [{nodeName: xxx, suid: sss}, ...],
#    nodeTree: [{name: ggg, left: lll, right: rrr}, ...]}
# where nodeOrder maps each leaf node name xxx to the SUID sss of a network node,
# and nodeTree is a tree where the left node lll and right node rrr can be leaf nodes xxx or
# internal nodes ggg.

---
# Perform an enrichment analysis using the gprofiler package

Use a package commonly available in PyPI to calculate functional enrichment for nodes similar to a
node in which we may be interested. In this case, we choose HBT1 (entrez-gene ID 851303).

1. Find HBT1 in the similarity tree contained in the dendrogram

1. Find a set of nodes similar to HBT1 by collecting nodes nearby in the tree

1. Use each node's SUID to look up its entrez-gene ID

1. Pass the set of entrez-gene IDs to gprofiler as an enrichment query


## Find HBT1 in the similarity tree

In [None]:
node_order = dendo_clustering[0]['nodeOrder']
node_tree = dendo_clustering[0]['nodeTree']
node_suid = p4c.node_name_to_node_suid('851303')[0] # Use entrez-gene ID to get SUID for HBT1


## Collect a set of SUIDs representing 85 similar nodes

In [None]:
import parse_dendogram as pde # Use custom functions to decode dendrogram tree

node_bag = pde.create_node_bag(node_order, node_tree)
similar_nodes = list(pde.find_node_set(node_suid, 85, node_order, node_bag))
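The custom `parse_dendogram` module isn't included in this notebook. The following is a hypothetical stand-in that shows one way to collect nearby nodes from the tree. It is not necessarily what `pde.find_node_set` does. It assumes the structure described in the clustering cell's comment: leaf names in `nodeTree` equal the `nodeName` values in `nodeOrder`, and the root is the one node that has no parent. It climbs from the target leaf until the enclosing subtree holds at least `min_count` leaves, so it can return more than `min_count` nodes.

```python
from typing import Dict, List, Tuple

def build_children(node_tree: List[dict]) -> Dict[str, Tuple[str, str]]:
    """Map each internal node name to its (left, right) children."""
    return {entry['name']: (entry['left'], entry['right']) for entry in node_tree}

def leaves_under(name: str, children: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return leaf names below `name`, left to right (iterative, so deep trees are safe)."""
    leaves, stack = [], [name]
    while stack:
        current = stack.pop()
        if current in children:
            left, right = children[current]
            stack.extend([right, left])  # push right first so left is visited first
        else:
            leaves.append(current)
    return leaves

def similar_suids(target_suid: int, min_count: int,
                  node_order: List[dict], node_tree: List[dict]) -> List[int]:
    """Climb from the target leaf until the enclosing subtree has >= min_count leaves."""
    children = build_children(node_tree)
    parent = {child: name for name, pair in children.items() for child in pair}
    name_by_suid = {entry['suid']: entry['nodeName'] for entry in node_order}
    suid_by_name = {entry['nodeName']: entry['suid'] for entry in node_order}

    current = name_by_suid[target_suid]
    group = [current]
    while len(group) < min_count and current in parent:
        current = parent[current]
        group = leaves_under(current, children)
    return [suid_by_name[leaf] for leaf in group]

# Toy dendrogram shaped like (((a, b), c), d)
toy_order = [{'nodeName': n, 'suid': s} for n, s in zip('abcd', [1, 2, 3, 4])]
toy_tree = [{'name': 'g1', 'left': 'a', 'right': 'b'},
            {'name': 'g2', 'left': 'g1', 'right': 'c'},
            {'name': 'g3', 'left': 'g2', 'right': 'd'}]
print(similar_suids(1, 3, toy_order, toy_tree))  # [1, 2, 3]
```

With the real data, the call would be `similar_suids(node_suid, 85, node_order, node_tree)`. Trimming the result to exactly 85 nodes would need an extra tie-breaking rule, which this sketch leaves out.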

## Using SUIDs, query Cytoscape for each node's entrez-gene ID

In [None]:
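# get_table_columns returns a DataFrame indexed by node SUID; 'name' holds each node's entrez-gene ID
suid_to_entrez_gene = p4c.get_table_columns(columns='name')['name']
entrez_gene_query = [int(suid_to_entrez_gene[suid]) for suid in similar_nodes]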

print(entrez_gene_query)
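The list comprehension above looks up one SUID at a time. An equivalent vectorized lookup passes the whole list to `.loc`, which preserves the order of `similar_nodes`. The Series below is a toy stand-in with made-up SUIDs, shaped like the `name` column returned by `get_table_columns`.

```python
import pandas as pd

# Toy stand-in for p4c.get_table_columns(columns='name')['name']:
# node names (entrez-gene IDs as strings) indexed by SUID
suid_to_entrez_gene = pd.Series({101: '851303', 102: '851304', 103: '851305'})
similar_nodes = [103, 101]

# .loc with a list keeps the order of similar_nodes; astype(int) converts every name at once
entrez_gene_query = suid_to_entrez_gene.loc[similar_nodes].astype(int).tolist()
print(entrez_gene_query)  # [851305, 851303]
```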


## Install gprofiler package if it's not already installed

In [None]:
!pip install gprofiler-official
from gprofiler import GProfiler

## Use entrez-gene IDs to query gprofiler for GO functional enrichment

In [None]:
gp = GProfiler(user_agent='py4cytoscape', return_dataframe=True)
gp.profile(organism='scerevisiae', query=entrez_gene_query)


| | source | native | name | p_value | significant | description | term_size | query_size | intersection_size | effective_domain_size | precision | recall | query | parents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GO:CC | GO:0005838 | proteasome regulatory particle | 4.1461999999999996e-20 | True | "A multisubunit complex, which caps one or bot... | 22 | 83 | 14 | 6569 | 0.168675 | 0.636364 | query_1 | [GO:0022624, GO:0032991] |
| 1 | GO:CC | GO:0022624 | proteasome accessory complex | 4.1461999999999996e-20 | True | "A protein complex, that caps one or both ends... | 22 | 83 | 14 | 6569 | 0.168675 | 0.636364 | query_1 | [GO:0000502, GO:0005622, GO:0032991] |
| 2 | GO:CC | GO:0000502 | proteasome complex | 4.835471e-14 | True | "A large multisubunit complex which catalyzes ... | 48 | 83 | 14 | 6569 | 0.168675 | 0.291667 | query_1 | [GO:0140535, GO:1905369] |
| 3 | KEGG | KEGG:03050 | Proteasome | 5.645811e-14 | True | Proteasome | 36 | 36 | 13 | 2085 | 0.361111 | 0.361111 | query_1 | [KEGG:00000] |
| 4 | GO:CC | GO:1905369 | endopeptidase complex | 2.293605e-13 | True | "A protein complex which is capable of endopep... | 53 | 83 | 14 | 6569 | 0.168675 | 0.264151 | query_1 | [GO:1905368] |
| 5 | WP | WP:WP158 | Proteasome degradation | 1.030702e-10 | True | Proteasome degradation | 36 | 23 | 12 | 834 | 0.521739 | 0.333333 | query_1 | [WP:000000] |
| 6 | GO:CC | GO:1905368 | peptidase complex | 1.573096e-10 | True | "A protein complex which is capable of peptida... | 82 | 83 | 14 | 6569 | 0.168675 | 0.170732 | query_1 | [GO:1902494] |
| 7 | GO:CC | GO:0008541 | proteasome regulatory particle, lid subcomplex | 7.740747e-10 | True | "The subcomplex of the proteasome regulatory p... | 10 | 83 | 7 | 6569 | 0.084337 | 0.7 | query_1 | [GO:0005838, GO:0032991] |
| 8 | GO:CC | GO:0034515 | proteasome storage granule | 2.312965e-09 | True | "An aggregation of proteasome core protease (C... | 26 | 83 | 9 | 6569 | 0.108434 | 0.346154 | query_1 | [GO:0005737, GO:0043232] |
| 9 | GO:BP | GO:0043248 | proteasome assembly | 1.188348e-07 | True | "The aggregation, arrangement and bonding toge... | 34 | 81 | 9 | 6548 | 0.111111 | 0.264706 | query_1 | [GO:0065003] |
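The returned DataFrame mixes GO, KEGG and WikiPathways terms. To focus on GO terms only, the results can be filtered and re-sorted with ordinary Pandas operations. This is a sketch under assumptions: `top_terms` is a hypothetical helper, and the toy frame reuses a subset of the columns shown in the results above. The GO:MF row is invented to show a non-significant term being dropped.

```python
import pandas as pd

# Toy frame with a subset of the columns returned by gp.profile(..., return_dataframe=True)
results = pd.DataFrame({
    'source': ['GO:CC', 'KEGG', 'GO:BP', 'GO:MF'],
    'name': ['proteasome complex', 'Proteasome', 'proteasome assembly', 'hypothetical term'],
    'p_value': [4.835471e-14, 5.645811e-14, 1.188348e-07, 0.2],
    'significant': [True, True, True, False],
    'intersection_size': [14, 13, 9, 2],
})

def top_terms(df: pd.DataFrame, sources=('GO:BP', 'GO:CC', 'GO:MF'), n: int = 10) -> pd.DataFrame:
    """Keep significant terms from the given sources, most significant (smallest p_value) first."""
    mask = df['significant'] & df['source'].isin(sources)
    columns = ['source', 'name', 'p_value', 'intersection_size']
    return df.loc[mask].sort_values('p_value').head(n)[columns]

print(top_terms(results)['name'].tolist())  # ['proteasome complex', 'proteasome assembly']
```

On the real query, this would be `top_terms(gp.profile(organism='scerevisiae', query=entrez_gene_query))`.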
