# Huntington consensus network automated workflow
This notebook provides a workflow for building a consensus protein-protein interaction network with the *py4cytoscape* library. The library uses the *CyREST* framework to send commands to Cytoscape, where they are executed, so the **Cytoscape desktop app must be running** when this notebook is executed. Also note the number of proteins imported from STRING: 2000 proteins are imported, plus an additional 2000 for the expansion, so running the whole workflow can take up to thirty minutes.

## Setup

In [4]:
pip install py4cytoscape

Note: you may need to restart the kernel to use updated packages.


In [5]:
## Imports
import py4cytoscape as p4c
import pandas as pd

In [6]:
## Execute to check whether contact with the Cytoscape app via CyREST works 
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.10.2',
 'automationAPIVersion': '1.9.0',
 'py4cytoscapeVersion': '1.9.0'}

## Load networks
Three networks are loaded: first from KEGG, then from Wikipathways and finally from STRING. Set the desired STRING confidence cutoff before loading, because it determines the number of edges in the networks.

### Load KEGG

In [7]:
## Set STRING confidence in a variable with 0 < confidence <= 1.0 ##
string_confidence = 0.4
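
Because the same cutoff is reused in every STRING command further down, it can help to validate it once. The following is a minimal sketch, assuming the cutoff must lie between 0 (exclusive) and 1.0 (inclusive); `validate_confidence` is a hypothetical helper and not part of py4cytoscape.

```python
def validate_confidence(confidence: float) -> float:
    """Return the cutoff unchanged if it lies in (0, 1], else raise ValueError."""
    # Assumed valid range: 0 < confidence <= 1.0
    if not 0 < confidence <= 1.0:
        raise ValueError(f"STRING confidence must be in (0, 1], got {confidence}")
    return confidence


string_confidence = validate_confidence(0.4)
print(string_confidence)
```

Failing early here avoids waiting for a long STRING query before discovering an invalid value.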

In [8]:
## Load KEGG network ##
p4c.networks.import_network_from_file("HD_hsa05016.gml")
p4c.layout_network('force-directed', network="current") # the gml file does not contain any layout information so layout has to be set
p4c.rename_network("KEGG hsa05016 network")

{'network': 28191, 'title': 'KEGG hsa05016 network'}

In [9]:
## Create a Stringified version of KEGG network ##
cmd_list = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
            'compoundQuery="false"', 'includeNotMapped="false"', 
            'networkNoGui="KEGG hsa05016 network"', 'species="Homo sapiens"']
cmd = " ".join(cmd_list)
p4c.commands.commands_run(cmd) 
p4c.rename_network("KEGG hsa05016 network stringified")

{'network': 42845, 'title': 'KEGG hsa05016 network stringified'}

In [10]:
## Set cutoff ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="KEGG hsa05016 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']
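
The cells above build each command by joining a list of `key=value` fragments. As a sketch of how that pattern could be wrapped, the hypothetical helper below (`build_command` is not a py4cytoscape function) quotes string values and leaves numbers bare, producing the same string as the cutoff cell above.

```python
def build_command(command: str, **params) -> str:
    """Join a command name and key=value arguments into one command string."""
    parts = [command]
    for key, value in params.items():
        if isinstance(value, str):
            parts.append(f'{key}="{value}"')  # string values are quoted
        else:
            parts.append(f"{key}={value}")    # numbers are passed bare
    return " ".join(parts)


cmd = build_command(
    "string change confidence",
    confidence=0.4,
    network="KEGG hsa05016 network stringified",
)
print(cmd)
```

The resulting string could then be passed to `p4c.commands.commands_run(cmd)` as in the cells above. Flags written as quoted strings in this notebook, such as `compoundQuery="false"`, would need to be passed as the string `"false"` rather than a Python boolean.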

### Load Wikipathways 

In [25]:
## Load Wikipathways network ##
cmd_list = ['wikipathways','import-as-network','id="WP3853"']
cmd = " ".join(cmd_list)
p4c.commands.commands_get(cmd) 
p4c.rename_network("Wikipathways WP3853 network")

{'network': 82921, 'title': 'Wikipathways WP3853 network'}

In [26]:
## Clone and rename Wikipathways network ##
p4c.clone_network("Wikipathways WP3853 network")
p4c.rename_network(title="Wikipathways WP3853 network uniprot", network="Wikipathways WP3853 network")

{'network': 82921, 'title': 'Wikipathways WP3853 network uniprot'}

In [32]:
## Add uniprot columns to Wikipathways network ##
p4c.load_table_data_from_file("wikipathway_to_uniprot_HD.xlsx", first_row_as_column_names=True,
                              data_key_column_index=1, network="Wikipathways WP3853 network uniprot")

{'mappedTables': [82892, 82930]}

In [33]:
## Remove nodes without uniprot ID ##
p4c.create_column_filter(filter_name="Columns without uniprot", 
                         column="uniprot", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without uniprot", network="Wikipathways WP3853 network uniprot")
p4c.delete_selected_nodes("Wikipathways WP3853 network uniprot")

No edges selected.
No edges selected.


{'nodes': [83070, 83096, 83103, 83043, 83089, 83082, 83034, 83028],
 'edges': [83152,
  83122,
  83091,
  83093,
  83128,
  83098,
  83100,
  83131,
  83105,
  83137,
  83140,
  83107,
  83110,
  83143,
  83146,
  83113,
  83084,
  83116,
  83086]}

In [34]:
## Create a Stringified version of Wikipathways network ##
cmd_list_stringify_wp = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
                        'compoundQuery="false"', 'includeNotMapped="false"', 
                        'networkNoGui="Wikipathways WP3853 network uniprot"', 'species="Homo sapiens"']
cmd_stringify_wp = " ".join(cmd_list_stringify_wp)
p4c.commands.commands_run(cmd_stringify_wp)
p4c.rename_network("Wikipathways WP3853 network stringified")

{'network': 83918, 'title': 'Wikipathways WP3853 network stringified'}

In [35]:
## Set cutoff to string confidence ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="Wikipathways WP3853 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']

### Load STRING

In [13]:
## Load STRING network ##
string_cmd_list = ['string disease query','disease="doid:12858"', 'species="Homo sapiens"', 'limit=5000', f'cutoff={string_confidence}'] # 2000 proteins takes quite long
string_cmd = " ".join(string_cmd_list)
p4c.commands.commands_run(string_cmd)
p4c.rename_network(f"STRING 12858 network C{string_confidence}")

In commands_post(): Task Cancelled. Could not validate Tunable inputs: Duplicated network name.


CyError: In commands_post(): Task Cancelled. Could not validate Tunable inputs: Duplicated network name.
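
The error above appears to be raised because a network with the requested title already exists in the session, for example when the cell is run a second time. One way to avoid it, sketched here under that assumption, is to pick an unused title before renaming. `unique_title` is a hypothetical helper, and the `_N` suffix simply mirrors the `_1` that Cytoscape appends to cloned networks.

```python
from typing import Iterable


def unique_title(base: str, existing: Iterable[str]) -> str:
    """Return base if it is unused, otherwise base with the first free _N suffix."""
    taken = set(existing)
    if base not in taken:
        return base
    suffix = 1
    while f"{base}_{suffix}" in taken:
        suffix += 1
    return f"{base}_{suffix}"


# Illustrative list; in the notebook this would come from p4c.get_network_list()
existing_titles = ["KEGG hsa05016 network", "STRING 12858 network C0.4"]
print(unique_title("STRING 12858 network C0.4", existing_titles))
```

Alternatively, the existing network can be deleted or reused instead of importing it again.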

In [17]:
## Create a Stringified version of the STRING network (test) ##
cmd_list_stringify_wp = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
                        'compoundQuery="false"', 'includeNotMapped="false"', 
                        'networkNoGui="STRING 12858 network C0.4"', 'species="Homo sapiens"']
cmd_stringify_wp = " ".join(cmd_list_stringify_wp)
p4c.commands.commands_run(cmd_stringify_wp)
p4c.rename_network("STRING 12858 network C0.4 test")

In commands_get(): Failed: Cannot invoke "org.cytoscape.model.CyColumn.getValues(java.lang.Class)" because "col" is null


CyError: In commands_get(): Failed: Cannot invoke "org.cytoscape.model.CyColumn.getValues(java.lang.Class)" because "col" is null

In [14]:
## Check whether networks are imported and store suids ##
network_names = p4c.get_network_list()
kegg_suid = p4c.get_network_suid("KEGG hsa05016 network stringified")
wiki_suid = p4c.get_network_suid("Wikipathways WP3853 network stringified")
string_suid = p4c.get_network_suid(f"STRING 12858 network C{string_confidence}")

print(network_names)
print(kegg_suid, wiki_suid, string_suid)

['KEGG hsa05016 network stringified', 'Wikipathways WP3853 network stringified', 'Wikipathways WP3853 network_1', 'Wikipathways WP3853 network uniprot', 'KEGG hsa05016 network', 'STRING 12858 network C0.4']
128 15360 21598


## Add database columns to tables
This step adds an extra column to every node table that indicates which database the data comes from. In the merged network, this makes it easier to see where the databases overlap.

In [15]:
## Create dataframe with new column that indicates which database the data originates from ##
kegg_nodes = p4c.get_all_nodes(kegg_suid)
kegg_label_df = pd.DataFrame(data=kegg_nodes, columns=["shared name"])
kegg_label_df["kegg_db"] = 1 # Column value will be 1 for all data in kegg network

wiki_nodes = p4c.get_all_nodes(wiki_suid)
wiki_label_df = pd.DataFrame(data=wiki_nodes, columns=["shared name"])
wiki_label_df["wiki_db"] = 1 # Column value will be 1 for all data in wikipathways network

string_04_nodes = p4c.get_all_nodes(string_suid)
string_04_label_df = pd.DataFrame(data=string_04_nodes, columns=["shared name"])
string_04_label_df["string_db"] = 1 # Column value will be 1 for all data in string network

In [16]:
## Merge dataframes with tables in networks ##
p4c.set_current_network(kegg_suid)
p4c.load_table_data(kegg_label_df, data_key_column="shared name", network=kegg_suid)
p4c.delete_table_column("row.names") 

p4c.set_current_network(wiki_suid)
p4c.load_table_data(wiki_label_df, data_key_column="shared name", network=wiki_suid)
p4c.delete_table_column("row.names")

p4c.set_current_network(string_suid)
p4c.load_table_data(string_04_label_df, data_key_column="shared name", network=string_suid)
p4c.delete_table_column("row.names")

''

## Remove rows without canonical name
To prevent conflicting merge results caused by empty column values, all rows without a UniProt identifier (canonical name) are removed.

In [11]:
## Create filter for entries that don't have a stringdb canonical name (uniprot ID). Delete selected rows in each network. ##
p4c.create_column_filter(filter_name="Columns without canonical name", 
                         column="stringdb::canonical name", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for KEGG
p4c.apply_filter("Columns without canonical name", network=kegg_suid)
p4c.delete_selected_nodes(kegg_suid)

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without canonical name", network=wiki_suid)
p4c.delete_selected_nodes(wiki_suid)

# Apply filter and delete nodes for STRING networks
p4c.apply_filter("Columns without canonical name", network=string_suid)
p4c.delete_selected_nodes(string_suid)

In create_column_filter(): Column "stringdb::canonical name" does not exist in the "node" table


CyError: In create_column_filter(): Column "stringdb::canonical name" does not exist in the "node" table
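
The filter above fails because the node table of the network it was created against has no `stringdb::canonical name` column. A small guard can report this before the filter is created. This is only a sketch: `missing_columns` is a hypothetical helper, and the column list shown is illustrative rather than taken from a real session.

```python
def missing_columns(required, available):
    """Return the required column names that are absent from available."""
    present = set(available)
    return [column for column in required if column not in present]


# Illustrative node-table columns of a network that has not been stringified
node_columns = ["name", "shared name", "uniprot"]
missing = missing_columns(["stringdb::canonical name"], node_columns)
if missing:
    print(f"Cannot create filter, missing columns: {missing}")
```

In the notebook, the available columns could be read with `p4c.get_table_column_names('node', network=...)` for each network before the filter is built.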

## Network merging

In [12]:
## Merge the three networks ##
network_names = ['KEGG hsa05016 network stringified', f'STRING 12858 network C{string_confidence}', 'Wikipathways WP3853 network stringified']
node_keys_list = ["stringdb::canonical name", "stringdb::canonical name", "stringdb::canonical name"]
merged_network_name = f"Merged STRING KEGG Wiki network cutoff {string_confidence}"

## This try/except catches a TypeError raised by the py4cytoscape library
try: 
    p4c.merge_networks(sources=network_names, 
                       title=merged_network_name, 
                       node_keys=node_keys_list)
except TypeError:
    print(p4c.get_network_list())

merged_network_suid = p4c.get_network_suid(merged_network_name)

## Filtering database columns
To check the overlap between the databases, the three indicator columns are combined into a single `database` column that can be used for filtering.

In [64]:
## Create dataframe with the column indicators ##
database_filter_df = p4c.get_table_columns('node', columns=["name", "kegg_db", "wiki_db", "string_db"], network=merged_network_suid)
database_filter_df["database"] = "" # Create new column with empty string values

## Give the database column a value based on which databases the nodes are from
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg"
database_filter_df.loc[((database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] != 1)) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "string"

print(database_filter_df.groupby("database")["database"].count()) # Give the distribution of nodes in the merged network

p4c.set_current_network(merged_network_suid)
p4c.load_table_data(database_filter_df, data_key_column="name") # Add the database filter column to the network
p4c.delete_table_column("row.names")

database
kegg                298
kegg;wiki             2
kegg;wiki;string      1
string                1
wiki                 11
Name: database, dtype: int64


''
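
The seven `.loc` assignments above enumerate every combination of the three indicator columns by hand. As an alternative sketch, the same labels can be derived with one small function that joins the names of the databases whose flag equals 1, in the order kegg, wiki, string. `combine_labels` is a hypothetical helper; missing values (`None` or `NaN`) are treated as "not present", which matches the `!= 1` tests in the cell above.

```python
def combine_labels(kegg_db, wiki_db, string_db) -> str:
    """Join the names of all databases whose indicator equals 1."""
    flags = (("kegg", kegg_db), ("wiki", wiki_db), ("string", string_db))
    # NaN == 1 and None == 1 are both False, so missing flags are skipped
    return ";".join(name for name, flag in flags if flag == 1)


nan = float("nan")
print(combine_labels(1, 1, 1))
print(combine_labels(1, nan, 1))
print(combine_labels(nan, 1, nan))
```

Applied to the dataframe, this could look like `database_filter_df.apply(lambda row: combine_labels(row["kegg_db"], row["wiki_db"], row["string_db"]), axis=1)`, which yields the same seven label values as the explicit assignments.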

## Network expansion
Expand the merged network with an additional 2000 nodes.

In [65]:
## Expand Merged STRING_0.4 KEGG Wiki network ##
p4c.clone_network(merged_network_suid) # Creates a clone of the merged network, with "_1" added to the name
expansion_cmd_list = ['string expand', 'additionalNodes=2000', f'network="{merged_network_name}_1"', 
                      'nodeTypes="Homo sapiens"', 'selectivityAlpha=0.5']
expansion_cmd = " ".join(expansion_cmd_list)
p4c.commands.commands_run(expansion_cmd)
p4c.rename_network(f"{merged_network_name} expanded")
merged_network_expanded_suid = p4c.get_network_suid(f"{merged_network_name} expanded")

## Visualization
Further visualization can be done in Cytoscape itself; this section only visualizes the database distribution and the nervous tissue confidence.

In [66]:
## Set color of nodes according to a gradient for the confidence that the protein is active in the nervous system
if "ADCN style" not in p4c.get_visual_style_names():
    p4c.create_visual_style("ADCN style")

p4c.set_current_network(merged_network_expanded_suid)
p4c.set_node_color_mapping(**p4c.gen_node_color_map('tissue::nervous system', p4c.palette_color_brewer_s_YlOrBr(), style_name="ADCN style"))

''

In [67]:
## Set shape of nodes according to database values
database_values = ["kegg", "string", "wiki", "kegg;string", "kegg;wiki", "wiki;string", "kegg;wiki;string"]
mapping_shapes = ["HEXAGON", "ROUND_RECTANGLE", "VEE", "PARALLELOGRAM", "DIAMOND", "ELLIPSE", "TRIANGLE"]

p4c.set_node_shape_mapping(table_column="database", table_column_values=database_values,
                            shapes=mapping_shapes, default_shape="RECTANGLE", style_name="ADCN style")

''

In [68]:
## Set ADCN style for the two networks
p4c.set_visual_style("ADCN style", merged_network_suid)
p4c.set_visual_style("ADCN style", merged_network_expanded_suid)

{'message': 'Visual Style applied.'}

## Analyse the networks
Analyse the merged and expanded networks with Cytoscape's network analysis tool.

In [69]:
## Analyse network
p4c.set_current_network(merged_network_suid)
analysis = p4c.analyze_network()
analysis_df = pd.DataFrame.from_dict(analysis, 'index')
print(analysis_df)

                                                                0
networkTitle    Merged STRING KEGG Wiki network cutoff 0.4 (un...
nodeCount                                                     313
edgeCount                                                    7266
avNeighbors                                     46.34504792332268
diameter                                                        4
radius                                                          2
avSpl                                           2.216965675432129
cc                                             0.7474970609227527
density                                       0.14854182026705987
heterogeneity                                  0.7452214164559688
centralization                                0.24726894220463352
ncc                                                             1
time                                                         0.19


In [70]:
## Analyse expanded network
p4c.set_current_network(merged_network_expanded_suid)
analysis_expanded = p4c.analyze_network()
analysis_expanded_df = pd.DataFrame.from_dict(analysis_expanded, 'index')
print(analysis_expanded_df)

                                                                0
networkTitle    Merged STRING KEGG Wiki network cutoff 0.4 exp...
nodeCount                                                    2313
edgeCount                                                  148866
avNeighbors                                    128.70990056204064
diameter                                                        4
radius                                                          3
avSpl                                          2.2440355176174385
cc                                            0.49465858071111674
density                                        0.0556703722154155
heterogeneity                                  0.7629084926933136
centralization                                 0.3092163775174845
ncc                                                             1
time                                                        8.755
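
As a quick sanity check on the two tables above, the reported `density` values are consistent with dividing `avNeighbors` by `nodeCount - 1`, which is the usual density of a simple undirected graph. The sketch below recomputes both values from the printed numbers; the function name is hypothetical, and the relation is inferred from these outputs rather than taken from the Cytoscape documentation.

```python
def density_from_avg_neighbors(node_count: int, av_neighbors: float) -> float:
    """Density of a simple undirected graph from its average number of neighbours."""
    return av_neighbors / (node_count - 1)


# Values copied from the analysis output of the merged and expanded networks
merged_density = density_from_avg_neighbors(313, 46.34504792332268)
expanded_density = density_from_avg_neighbors(2313, 128.70990056204064)

print(merged_density, expanded_density)
```

Both results agree with the reported densities to well within rounding. The expansion adds 2000 nodes and lowers the density from about 0.149 to about 0.056, while the diameter stays at 4.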
