**This is a tutorial on visualizing hierarchical protein network modules, using a script that interfaces the DDOT Python package (v1.0.1) with the HiView web browser (v2.6).**

**Author: Fan Zheng**

**Date: Aug. 2020**

# Get started

Please check that the DDOT package is installed and that all of its dependencies are satisfied. To complete this tutorial, you only need the upload script `tohiview.py` and a few input files for that script. We will walk through the creation of hierarchical models and their visualization in HiView.

In [20]:
username = 'fzheng' # replace with your username

In [16]:
import getpass
passwd = getpass.getpass("Password here: ")

Passwd here:  ········


The options of the upload script are listed below. Many are available, but only `--ont`, `--hier_name`, and `--ndex_account` are required.

`--ont` should be a 3-column tab-separated file in the format defined by DDOT; the columns give the parent, the child, and the type of the relationship.  
`--hier_name` is a string used to label the files.  
`--ndex_account` takes 3 strings: the server name (http://test.ndexbio.org), a username, and a password.

**Note:** for now we require the NDEx test server, because this pipeline can create a large number of networks in one's NDEx account.

In [33]:
%%bash 

python ../../ddot/tohiview.py -h

usage: tohiview.py [-h] --ont ONT --hier_name HIER_NAME
                   [--ndex_account NDEX_ACCOUNT NDEX_ACCOUNT NDEX_ACCOUNT]
                   [--score SCORE] [--subnet_size SUBNET_SIZE SUBNET_SIZE]
                   [--node_attr NODE_ATTR] [--evinet_links EVINET_LINKS]
                   [--evinet_size EVINET_SIZE] [--gene_attr GENE_ATTR]
                   [--term_2_uuid TERM_2_UUID]
                   [--visible_cols [VISIBLE_COLS [VISIBLE_COLS ...]]]
                   [--max_num_edges MAX_NUM_EDGES] [--col_color COL_COLOR]
                   [--col_label COL_LABEL] [--rename RENAME] [--skip_main]

optional arguments:
  -h, --help            show this help message and exit
  --ont ONT             ontology file, 3 col table
  --hier_name HIER_NAME
                        name of the hierarchy
  --ndex_account NDEX_ACCOUNT NDEX_ACCOUNT NDEX_ACCOUNT
  --score SCORE         integrated edge score
  --subnet_size SUBNET_SIZE SUBNET_SIZE
                        minimum and maximum

# 1. A simple hierarchy

We will first create and upload a toy hierarchy.

In [69]:
import pandas as pd

d = './data'  # directory holding the tutorial input files
df = pd.read_csv(d + '/test1.ont', sep='\t', header=None)
df

Unnamed: 0,0,1,2
0,ROOT,Coarse-1,default
1,ROOT,Coarse-2,default
2,Coarse-1,Fine-1,default
3,Coarse-1,Fine-2,default
4,Coarse-1,Fine-3,default
5,Coarse-2,Fine-3,default
6,Coarse-2,Fine-4,default
7,Fine-1,geneA,gene
8,Fine-1,geneB,gene
9,Coarse-1,geneC,gene


Note that this hierarchy is a DAG (directed acyclic graph), not a tree: the node "Fine-3" has two parents, "Coarse-1" and "Coarse-2". In HiView, a "Fine-3" circle appears nested inside the circles of both "Coarse-1" and "Coarse-2".

**Warning**: node names must not contain "." or "_".  
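
The naming rule above is easy to check before uploading. The following is a minimal sketch using only the Python standard library; the function name `find_invalid_names` and the sample rows are made up for illustration and are not part of DDOT. It checks both the parent and the child column, on the assumption that the rule applies to every name in the file.

```python
import csv
import io

# Hypothetical ontology rows (parent, child, type), tab-separated.
# Two names deliberately break the rule: "Fine_1" and "gene.A".
ONT_TEXT = (
    "ROOT\tCoarse-1\tdefault\n"
    "Coarse-1\tFine_1\tdefault\n"
    "Fine_1\tgene.A\tgene\n"
)

def find_invalid_names(ont_text, forbidden=(".", "_")):
    """Return the sorted names that contain any forbidden character."""
    bad = set()
    for row in csv.reader(io.StringIO(ont_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        for name in row[:2]:  # parent and child columns
            if any(ch in name for ch in forbidden):
                bad.add(name)
    return sorted(bad)

invalid_names = find_invalid_names(ONT_TEXT)
print(invalid_names)
```

An empty list means that every name in the file passes the check.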

In [71]:
%%bash -s "$username" "$passwd"

python ../../ddot/tohiview.py --ont ./data/test1.ont --hier_name test1 --ndex_account http://test.ndexbio.org $1 $2


http://hiview.ucsd.edu/17ea0c7b-d763-11ea-9101-0660b7976219?type=test&server=http://test.ndexbio.org


Paste the link above into a browser to launch HiView and visualize this hierarchy (the UUID will be different each time).

<img src="fig/fig1.png" width="600" />

In HiView, a hierarchy is represented by a "circle-packing" layout. The biggest circle represents the root (a node in the DAG with only outgoing edges), so the displayed hierarchy must have exactly one root. If the input contains multiple roots, the script adds a single root above them. Double-clicking a circle expands the deeper structure, one level at a time.
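
As a quick sanity check before uploading, the roots and the multi-parent nodes of a hierarchy can be listed from the parent-child pairs alone. The sketch below uses the toy hierarchy from this section; the helper names are hypothetical, and this is not necessarily how `tohiview.py` itself detects roots.

```python
from collections import defaultdict

# (parent, child) pairs of the toy hierarchy in test1.ont
EDGES = [
    ("ROOT", "Coarse-1"), ("ROOT", "Coarse-2"),
    ("Coarse-1", "Fine-1"), ("Coarse-1", "Fine-2"), ("Coarse-1", "Fine-3"),
    ("Coarse-2", "Fine-3"), ("Coarse-2", "Fine-4"),
    ("Fine-1", "geneA"), ("Fine-1", "geneB"), ("Coarse-1", "geneC"),
]

def find_roots(edges):
    """Nodes that appear as a parent but never as a child."""
    parents = {p for p, _ in edges}
    children = {c for _, c in edges}
    return sorted(parents - children)

def find_multi_parent_nodes(edges):
    """Map each node with more than one parent to its sorted parents."""
    parents_of = defaultdict(set)
    for p, c in edges:
        parents_of[c].add(p)
    return {c: sorted(ps) for c, ps in parents_of.items() if len(ps) > 1}

roots = find_roots(EDGES)
multi_parent = find_multi_parent_nodes(EDGES)
print(roots)         # ['ROOT']
print(multi_parent)  # {'Fine-3': ['Coarse-1', 'Coarse-2']}
```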

# 2. Adding integrated networks to communities

HiView is a powerful platform to display nested communities (of multiple scales) in a network. It is often of interest to visualize edges in the source network that support a community.   
Formally, for a source network $G = (V, E)$, the subnetwork of a community $s$ is defined as $G_s = (V_s, E_s)$, where $V_s \subseteq V$, $E_s \subseteq E$, and $\forall e = (u,v) \in E_s$, $u,v \in V_s$.  


This is achieved with the `--score` argument, which takes a tab-separated file with three columns: `geneA`, `geneB`, and `score`. We recommend keeping the scores between 0 and 1.

In this example, we will use a small sub-hierarchy with some gene-gene association scores. Let's see their format:

In [42]:
df_ont = pd.read_csv(d + '/test2.ont', sep='\t', header=None)
df_ont.head(3)

Unnamed: 0,0,1,2
0,22133,21875,default
1,22435,22133,default
2,22451,21851,default


In [43]:
df_ont.tail(3)

Unnamed: 0,0,1,2
37,23161,SUPT5H,gene
38,23248,CSNK2A2,gene
39,23248,HIST1H3A,gene


In [44]:
df_score = pd.read_csv(d + '/test2_score.txt', sep='\t', header=None)
df_score.head(3)

Unnamed: 0,0,1,2
0,CDC73,CTR9,0.798
1,CDC73,DNMT3A,0.521
2,CDC73,HIST1H3A,0.47
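
To make the subnetwork definition above concrete, the sketch below keeps only the scored edges whose two endpoints both lie in a community, i.e. the largest edge set allowed by the definition. The community membership used here is hypothetical (the real gene lists are in `test2.ont`), and the function illustrates the definition rather than the internals of `tohiview.py`.

```python
def community_subnetwork(scored_edges, members):
    """Return the edges (u, v, score) with both u and v in the community."""
    members = set(members)
    return [(u, v, s) for u, v, s in scored_edges
            if u in members and v in members]

# First three rows of test2_score.txt, as shown above
scored_edges = [
    ("CDC73", "CTR9", 0.798),
    ("CDC73", "DNMT3A", 0.521),
    ("CDC73", "HIST1H3A", 0.47),
]

# Hypothetical community; DNMT3A is deliberately left out
community = {"CDC73", "CTR9", "HIST1H3A"}

subnetwork = community_subnetwork(scored_edges, community)
print(subnetwork)
```

The edge between CDC73 and DNMT3A is dropped because DNMT3A is not a member of the community.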


Now do the upload:

In [72]:
%%bash -s "$username" "$passwd"

python ../../ddot/tohiview.py --ont ./data/test2.ont --hier_name test2 --ndex_account http://test.ndexbio.org $1 $2 --score ./data/test2_score.txt


http://hiview.ucsd.edu/8ce3e160-d763-11ea-9101-0660b7976219?type=test&server=http://test.ndexbio.org


The communities in this hierarchy (shown in the "model view") are now each associated with a network shown in the "data view".

<img src="fig/fig2.png" width="600" />

## "score" of a community

This concept is specific to certain community detection algorithms, e.g. CliXO, which takes a weighted graph as input and iterates community detection at different thresholds. Thus, each community in CliXO is associated with a "score".

By default, edges in a subnetwork have a uniform color in HiView. However, if communities are associated with scores, the edges are shown with a discrete color map (which often visually highlights the community structure), determined by the score of the community itself and the scores of its child communities. This is enabled by adding a fourth column to the file given to the `--ont` argument, as in the following example:

In [54]:
df_ont = pd.read_csv(d + '/test2_ww.ont', sep='\t', header=None)
df_ont.head(3)

Unnamed: 0,0,1,2,3
0,S22133,S21875,default,0.72
1,S22435,S22133,default,0.65
2,S22451,S21851,default,0.58


The values in the column "3" (e.g. 0.72, 0.65) indicate the "score" of the community in the column "0". The score of a parent community is required to be smaller than the scores of its children. In this example, "S22435" is the parent of "S22133", and thus 0.65 < 0.72. 
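
The ordering requirement can be verified before uploading. Below is a minimal sketch, assuming every row carries four fields and that, as described above, the fourth field is the score of the community in the first field. The function name is hypothetical. Children that never appear as a parent (for example genes) have no score of their own and are skipped.

```python
def find_score_violations(rows):
    """Return (parent, child) pairs where the parent score is not smaller."""
    score = {parent: s for parent, _child, _type, s in rows}
    violations = []
    for parent, child, _type, _s in rows:
        if child in score and not score[parent] < score[child]:
            violations.append((parent, child))
    return violations

# First three rows of test2_ww.ont, as shown above
rows = [
    ("S22133", "S21875", "default", 0.72),
    ("S22435", "S22133", "default", 0.65),
    ("S22451", "S21851", "default", 0.58),
]
violations = find_score_violations(rows)
print(violations)  # []

# A made-up hierarchy that breaks the rule: parent P scores higher than child C
bad_rows = [("P", "C", "default", 0.9), ("C", "G", "gene", 0.5)]
bad_violations = find_score_violations(bad_rows)
print(bad_violations)  # [('P', 'C')]
```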

In [73]:
%%bash -s "$username" "$passwd"

python ../../ddot/tohiview.py --ont ./data/test2_ww.ont --hier_name test2.1 --ndex_account http://test.ndexbio.org $1 $2 --score ./data/test2_score.txt


http://hiview.ucsd.edu/45b03af5-d764-11ea-9101-0660b7976219?type=test&server=http://test.ndexbio.org


After the upload, the edge colors in the data view change accordingly.

<img src="fig/fig3.png" width="600" />

# 3. Adding multiple evidence networks to communities

In addition to a single master network, it is also possible to overlay more networks supporting a community and visualize them in HiView. For example, if the master network is the result of integrating multiple datasets, it is often of interest to visualize the interactions in these datasets (jointly or separately).

This can be achieved by passing a file to the `--evinet_links` argument. It is a two-column file giving the name of each dataset and the path to the file containing its interactions:

In [60]:
%%bash

cat ./data/net_links.txt

Physical	./data/test3_ppisample.txt
Co_protein_expr	./data/test3_coxsample.txt
CCMI	./data/test3_binarysample.txt


Each source file is a 3-column tab-separated file, which can contain either binary interactions or weighted interactions.

In [63]:
%%bash

cat ./data/test3_ppisample.txt |head -5

CTR9	LEO1	5.05
SSRP1	SUPT16H	4.42
LEO1	PAF1	5.35
CTR9	PAF1	5.11
CSNK2A1	CSNK2B	5.13


In [64]:
%%bash

cat ./data/test3_binarysample.txt |head -5

MTDH	SUPT16H	True
SUPT16H	TSPYL5	True
MTDH	SSRP1	True
SSRP1	TSPYL5	True


Now we do the upload:

In [74]:
%%bash -s "$username" "$passwd"

python ../../ddot/tohiview.py --ont ./data/test2_ww.ont --hier_name test3 --ndex_account http://test.ndexbio.org $1 $2 --score ./data/test2_score.txt --evinet_links ./data/net_links.txt


http://hiview.ucsd.edu/0f7f704a-d765-11ea-9101-0660b7976219?type=test&server=http://test.ndexbio.org


<img src="fig/fig4.png" width="600" />

For evidence networks with real-valued weights, users can adjust a threshold to control how many edges from that particular network are shown in the data view.
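
The effect of such a threshold can be previewed offline. The sketch below counts how many of the five sample edges from `test3_ppisample.txt` survive at a few cutoffs. The cutoff values are arbitrary, and whether HiView keeps edges at exactly the threshold is an assumption made here (`>=`), not something documented in this tutorial.

```python
def edges_at_or_above(edges, threshold):
    """Keep the edges (u, v, weight) whose weight is at least the threshold."""
    return [(u, v, w) for u, v, w in edges if w >= threshold]

# The five sample rows of test3_ppisample.txt shown above
ppi_edges = [
    ("CTR9", "LEO1", 5.05),
    ("SSRP1", "SUPT16H", 4.42),
    ("LEO1", "PAF1", 5.35),
    ("CTR9", "PAF1", 5.11),
    ("CSNK2A1", "CSNK2B", 5.13),
]

# Number of edges kept at each (arbitrary) cutoff
counts = {t: len(edges_at_or_above(ppi_edges, t)) for t in (4.0, 5.1, 5.3)}
print(counts)
```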

## Large networks

Large networks often slow down both the upload and the HiView visualization (we are working on improving this). To reduce the overhead, subnetwork upload can be disabled for large communities while remaining enabled for smaller ones.

This is controlled by the `--subnet_size` argument, which takes two integers specifying the lower and upper bounds of the community sizes for which the integrated subnetworks are uploaded.

Similarly, `--evinet_size` takes one integer; for communities larger than this threshold, evidence networks are not uploaded.

We require `subnet_size[0] < evinet_size <= subnet_size[1]`.
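
The constraint can be checked before launching a long upload. This is a small sketch with a hypothetical helper name and made-up size values; it only encodes the inequality stated above.

```python
def sizes_are_consistent(subnet_size, evinet_size):
    """Check subnet_size[0] < evinet_size <= subnet_size[1]."""
    lower, upper = subnet_size
    return lower < evinet_size <= upper

# Made-up values for illustration
ok = sizes_are_consistent((4, 500), 100)       # inside the allowed range
too_big = sizes_are_consistent((4, 500), 600)  # above the upper bound
at_lower = sizes_are_consistent((4, 500), 4)   # the lower bound is exclusive
print(ok, too_big, at_lower)
```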

## Reuse uploaded subnetworks

After uploading a hierarchy with subnetworks, you will notice a file whose name starts with `term_2_uuid` in the working directory. This file describes the mapping between community names and community subnetworks.

This file can later be passed to the `--term_2_uuid` argument, so that subnetworks can be shared across different hierarchical models.

# 4. Control the information displayed in HiView

## Update the metadata of communities

Users can show metadata associated with each community in the bottom-right area of HiView ("Subsystem details"). This is achieved with the `--node_attr` argument. The input is a table with communities as rows and metadata fields as columns.

For example, assume we can assign a robustness score to each community. We have created a file with some made-up values:

In [81]:
%%bash

cat ./data/test4_nodeattr.txt

	robustness
S21851	0.462515
S21875	0.781374
S22133	0.686939
S22435	0.848153
S22451	0.471456
S22573	0.247138
S22871	0.201151
S23161	0.55682
S23248	0.277122
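
A file in this layout can be generated programmatically. The sketch below writes a tab-separated table whose header starts with an empty cell, matching the listing above. It uses the standard `csv` module and writes to an in-memory buffer; the helper name is hypothetical, and only the first two made-up values are reproduced.

```python
import csv
import io

def write_node_attr(rows, columns, handle):
    """Write community metadata as a tab-separated table with a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(columns))  # empty cell above the community names
    for community, values in rows.items():
        writer.writerow([community] + [values[c] for c in columns])

robustness = {
    "S21851": {"robustness": 0.462515},
    "S21875": {"robustness": 0.781374},
}

buffer = io.StringIO()
write_node_attr(robustness, ["robustness"], buffer)
node_attr_text = buffer.getvalue()
print(node_attr_text)
```

To write an actual file, pass a handle opened with `open(path, "w", newline="")` instead of the in-memory buffer.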


Now upload:

In [79]:
%%bash -s "$username" "$passwd"

python ../../ddot/tohiview.py --ont ./data/test2_ww.ont --hier_name test4 --ndex_account http://test.ndexbio.org $1 $2 --score ./data/test2_score.txt --term_2_uuid term_2_uuid.test3 --node_attr ./data/test4_nodeattr.txt


http://hiview.ucsd.edu/71ac490b-d76b-11ea-9101-0660b7976219?type=test&server=http://test.ndexbio.org


Note that we also reused uploaded subnetworks with the `--term_2_uuid` argument here. 

## Update display labels in the model view

It is possible to update the labels displayed on communities in the model view without a new upload. This can be done with the NDEx Python client, which should have been installed as a prerequisite of DDOT.

In [82]:
import ndex.client as nc
from ndex.networkn import NdexGraph

In [84]:
my_ndex = nc.Ndex("http://test.ndexbio.org", username, passwd)

We now change the label of `S22573` in the above model to `helloworld`.

In [94]:
def change_hiview_label(uuid, dict_rename):
    """Relabel communities of an uploaded hierarchy; dict_rename maps old names to new labels."""
    Gcx = my_ndex.get_network_as_cx_stream(uuid).json()
    # locate the CX aspects that hold the nodes and the node attributes
    k1 = [i for i in range(len(Gcx)) if 'nodes' in Gcx[i].keys()][0]
    k2 = [i for i in range(len(Gcx)) if 'nodeAttributes' in Gcx[i].keys()][0]

    # map node ids to their new labels (a node name is matched up to its first ".")
    dict_nid_label = {}
    for node in Gcx[k1]['nodes']:
        name = node['n'].split('.')[0]
        if name in dict_rename:
            dict_nid_label[node['@id']] = dict_rename[name]

    # overwrite the 'Label' attribute of the selected nodes
    for attr in Gcx[k2]['nodeAttributes']:
        if (attr['po'] in dict_nid_label) and (attr['n'] == 'Label'):
            attr['v'] = dict_nid_label[attr['po']]

    # rebuild the network and push it back to NDEx under the same UUID
    G = NdexGraph(Gcx)
    Gcx_new_stream = G.to_cx_stream()
    my_ndex.update_cx_network(Gcx_new_stream, uuid)

In [95]:
uuid = '71ac490b-d76b-11ea-9101-0660b7976219'
change_hiview_label(uuid, {'S22573':'helloworld'})


consistency group max: 2


<img src="fig/fig5.png" width="600" />

## Update node layout in the data view

By default, DDOT calls the `spring_layout` function of the `NetworkX` package to create a layout for nodes in the data view. This is not necessarily the most informative layout, especially for large networks. With the NDEx Python client, users can provide their own node positions from alternative algorithms and update the layout in the HiView data view.

To make an alteration, we need to know the UUID of a subnetwork. It can be found in the `term_2_uuid` file created after an upload (here `term_2_uuid.test3`).

In [97]:
%%bash

cat term_2_uuid.test3 |head -n 2

S21851	S21851	0e2e3818-d765-11ea-9101-0660b7976219
S21875	S21875	0e4828ba-d765-11ea-9101-0660b7976219
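
For scripted use, the mapping can be loaded into a dictionary. This is a minimal sketch based only on the two lines shown above: it assumes the file is tab-separated with the community name in the first column and the UUID in the last one. The meaning of the middle column is not described in this tutorial, so it is ignored here, and the function name is hypothetical.

```python
import csv
import io

# The two lines of term_2_uuid.test3 shown above
TERM_2_UUID_TEXT = (
    "S21851\tS21851\t0e2e3818-d765-11ea-9101-0660b7976219\n"
    "S21875\tS21875\t0e4828ba-d765-11ea-9101-0660b7976219\n"
)

def load_term_to_uuid(text):
    """Map each community name (first column) to its UUID (last column)."""
    mapping = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row:  # skip empty lines
            mapping[row[0]] = row[-1]
    return mapping

term_to_uuid = load_term_to_uuid(TERM_2_UUID_TEXT)
print(term_to_uuid["S21875"])
```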


We now choose the second subnetwork listed above (S21875), and divide its x and y coordinates by 5 (to bring the nodes closer to each other).

In [108]:
uuid = '0e4828ba-d765-11ea-9101-0660b7976219'
Gcx = my_ndex.get_network_as_cx_stream(uuid).json()
G = NdexGraph(Gcx)




In [109]:
Gpos_new = {k:[v[0]/5, v[1]/5] for k,v in G.pos.items()} # alter node position
G.pos = Gpos_new
Gcx_new_stream = G.to_cx_stream()
my_ndex.update_cx_network(Gcx_new_stream, uuid) # update the network on NDEx

consistency group max: 2


''
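
Dividing the coordinates by a constant also moves the whole network toward the origin. If that is not desired, one alternative is to shrink the positions around their centroid. The helper below is a sketch in plain Python with made-up coordinates; it assumes positions are stored as a dictionary from node id to `[x, y]`, as the `G.pos` usage above suggests, and its name is hypothetical.

```python
def scale_about_centroid(pos, factor):
    """Scale node positions relative to their centroid (factor < 1 shrinks)."""
    if not pos:
        return {}
    cx = sum(x for x, _ in pos.values()) / len(pos)
    cy = sum(y for _, y in pos.values()) / len(pos)
    return {
        node: [cx + (x - cx) * factor, cy + (y - cy) * factor]
        for node, (x, y) in pos.items()
    }

# Made-up positions for three nodes; their centroid is (5, 5)
pos = {1: [0.0, 0.0], 2: [10.0, 0.0], 3: [5.0, 15.0]}
new_pos = scale_about_centroid(pos, 0.2)  # same shrink factor as dividing by 5
print(new_pos)
```

The result could then be assigned to `G.pos` and uploaded in the same way as in the cell above.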

# 5. Delete a HiView session from NDEx account

Once the upload has finished, the script has created a folder (network set) in the NDEx account containing one network for the hierarchical model (used in the model view) and many subnetworks (used in the data view). With the `Delete Network Set` button, the set and all networks in it can be deleted.

**Warning**: the name of the network set is equal to the value of `--hier_name`. NDEx cannot have two sets with identical names in the same account, so be sure to use a new value of `--hier_name` for every upload.

**Warning**: if subnetworks are shared across models (see Section 3 above), deleting them may also affect the other models that use them.

# Conclusion

Congratulations! You have finished the tutorial, and you should now be able to create your own hierarchical network visualizations with DDOT and HiView. We have not discussed every option of the upload script, but we believe we have covered the most common use cases. If you encounter issues, feel free to contact `fanzheng1101 at gmail dot com`, or leave comments in the "issues" tab on GitHub.

# Acknowledgements

Keiichiro Ono (development of HiView)  
Michael Ku Yu (conception of frameworks and initial development of DDOT)  
Anton Kratz (constructive feedback)