# Huntington consensus network automated workflow
This notebook provides a workflow for building a consensus protein-protein interaction network with the *py4cytoscape* library. py4cytoscape sends commands through the *CyREST* framework to Cytoscape, where they are executed, so the **Cytoscape application must be open and running** when this notebook is executed. Also note the number of proteins imported from STRING: 2000 proteins are imported, plus another 2000 for the expansion, so running the whole workflow can take up to thirty minutes.

## Setup

In [11]:
pip install py4cytoscape




In [12]:
## Imports
import py4cytoscape as p4c
import pandas as pd

In [13]:
## Execute to check whether contact with the Cytoscape app via CyREST works 
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.10.1',
 'automationAPIVersion': '1.9.0',
 'py4cytoscapeVersion': '1.9.0'}

## Load networks
Three networks are loaded: first from KEGG, then from WikiPathways, and finally from STRING. Set the desired STRING confidence cutoff at the start, because it determines the number of edges in the networks.

### Load KEGG

In [14]:
## Set string confidence in a variable with 0 < confidence <= 1.0 ##
string_confidence = 0.4
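
Because the cutoff is reused in every later STRING command, it can help to check it once before running the rest of the notebook. A minimal sketch, assuming a hypothetical helper `validate_confidence` (not part of py4cytoscape):

```python
def validate_confidence(value):
    """Return value as a float if 0 < value <= 1.0, otherwise raise ValueError.

    Hypothetical helper for illustration; not part of py4cytoscape.
    """
    confidence = float(value)
    if not 0.0 < confidence <= 1.0:
        raise ValueError(
            f"STRING confidence must satisfy 0 < confidence <= 1.0, got {confidence}"
        )
    return confidence


string_confidence = validate_confidence(0.4)
```

A value such as `0` or `1.5` would raise a `ValueError` here instead of failing later inside a STRING command.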

In [15]:
## Load KEGG network ##
p4c.networks.import_network_from_file("HD_hsa05016.gml")
p4c.layout_network('force-directed', network="current") # the gml file does not contain any layout information so layout has to be set
p4c.rename_network("KEGG hsa05016 network")

{'network': 67355, 'title': 'KEGG hsa05016 network'}

In [16]:
## Create a Stringified version of KEGG network ##
cmd_list = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
            'compoundQuery="false"', 'includeNotMapped="false"', 
            'networkNoGui="KEGG hsa05016 network"', 'species="Homo sapiens"']
cmd = " ".join(cmd_list)
p4c.commands.commands_run(cmd) 
p4c.rename_network("KEGG hsa05016 network stringified")

{'network': 81309, 'title': 'KEGG hsa05016 network stringified'}

In [17]:
## Set cutoff ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="KEGG hsa05016 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']
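
The command strings in the cells above are all assembled the same way: a command prefix followed by `key=value` arguments joined with spaces. A minimal sketch of that pattern, assuming a hypothetical helper `build_command` (not part of py4cytoscape) that quotes string values and leaves numbers unquoted:

```python
def build_command(prefix, **arguments):
    """Join a command prefix and key=value arguments into one command string.

    Hypothetical helper for illustration; argument names are passed through unchanged.
    String values are wrapped in double quotes, other values are written as-is.
    """
    parts = [prefix]
    for key, value in arguments.items():
        if isinstance(value, str):
            parts.append(f'{key}="{value}"')
        else:
            parts.append(f"{key}={value}")
    return " ".join(parts)


string_confidence = 0.4
change_cutoff_cmd = build_command(
    "string change confidence",
    confidence=string_confidence,
    network="KEGG hsa05016 network stringified",
)
print(change_cutoff_cmd)
```

The resulting string has the same form as the one built by hand above and could be passed to `p4c.commands.commands_run` in the same way.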

### Load Wikipathways 

In [18]:
## Load Wikipathways network ##
cmd_list = ['wikipathways','import-as-network','id="WP3853"']
cmd = " ".join(cmd_list)
p4c.commands.commands_get(cmd) 
p4c.rename_network("Wikipathways WP3853 network")

{'network': 119711, 'title': 'Wikipathways WP3853 network'}

In [19]:
## Clone the Wikipathways network (the clone gets the suffix "_1") and rename the original ##
p4c.clone_network("Wikipathways WP3853 network")
p4c.rename_network(title="Wikipathways WP3853 network uniprot", network="Wikipathways WP3853 network")

{'network': 119711, 'title': 'Wikipathways WP3853 network uniprot'}

In [25]:
## Add uniprot columns to Wikipathways network ##
p4c.load_table_data_from_file("wikipathway_to_uniprot_test.xlsx", first_row_as_column_names=True,
                              data_key_column_index=2, network="Wikipathways WP3853 network uniprot")

{'mappedTables': [119682, 119720]}

In [26]:
## Remove nodes without uniprot ID ##
p4c.create_column_filter(filter_name="Columns without uniprot", 
                         column="uniprot", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without uniprot", network="Wikipathways WP3853 network uniprot")
p4c.delete_selected_nodes("Wikipathways WP3853 network uniprot")

No edges selected.
No edges selected.


{'nodes': [119824, 119860, 119886, 119833, 119893, 119818, 119872, 119879],
 'edges': [119888,
  119890,
  119921,
  119927,
  119895,
  119930,
  119897,
  119900,
  119933,
  119936,
  119903,
  119906,
  119874,
  119876,
  119942,
  119912,
  119881,
  119883,
  119918]}

In [27]:
## Create a Stringified version of Wikipathways network ##
cmd_list_stringify_wp = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
                        'compoundQuery="false"', 'includeNotMapped="false"', 
                        'networkNoGui="Wikipathways WP3853 network uniprot"', 'species="Homo sapiens"']
cmd_stringify_wp = " ".join(cmd_list_stringify_wp)
p4c.commands.commands_run(cmd_stringify_wp)
p4c.rename_network("Wikipathways WP3853 network stringified")

{'network': 121377, 'title': 'Wikipathways WP3853 network stringified'}

In [28]:
## Set cutoff to string confidence ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="Wikipathways WP3853 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']

### Load STRING

In [29]:
## Load STRING network ##
string_cmd_list = ['string disease query','disease="doid:12858"', 'species="Homo sapiens"', 'limit=2000', f'cutoff={string_confidence}'] # 2000 proteins takes quite long
string_cmd = " ".join(string_cmd_list)
p4c.commands.commands_run(string_cmd)
p4c.rename_network(f"STRING 12858 network C{string_confidence}")

{'network': 122141, 'title': 'STRING 12858 network C0.4'}

In [36]:
## Check whether networks are imported and store suids ##
network_names = p4c.get_network_list()
kegg_suid = p4c.get_network_suid("KEGG hsa05016 network stringified")
wiki_suid = p4c.get_network_suid("Wikipathways WP3853 network stringified")
string_suid = p4c.get_network_suid(f"STRING 12858 network C{string_confidence}")

print(network_names)
print(kegg_suid, wiki_suid, string_suid)

['Wikipathways WP3853 network stringified', 'KEGG hsa05016 network', 'KEGG hsa05016 network stringified', 'Wikipathways WP3853 network_1', 'STRING 12858 network C0.4', 'Wikipathways WP3853 network uniprot']
81309 121377 122141


## Add database columns to tables
This step adds an extra column to every node table that indicates which database the node comes from. In the merged network, these columns make it easier to see where the databases overlap.

In [32]:
## Create dataframe with new column that indicates which database the data originates from ##
kegg_nodes = p4c.get_all_nodes(kegg_suid)
kegg_label_df = pd.DataFrame(data=kegg_nodes, columns=["shared name"])
kegg_label_df["kegg_db"] = 1 # Column value will be 1 for all data in kegg network

wiki_nodes = p4c.get_all_nodes(wiki_suid)
wiki_label_df = pd.DataFrame(data=wiki_nodes, columns=["shared name"])
wiki_label_df["wiki_db"] = 1 # Column value will be 1 for all data in wikipathways network

string_04_nodes = p4c.get_all_nodes(string_suid)
string_04_label_df = pd.DataFrame(data=string_04_nodes, columns=["shared name"])
string_04_label_df["string_db"] = 1 # Column value will be 1 for all data in string network

In [33]:
## Merge dataframes with tables in networks ##
p4c.set_current_network(kegg_suid)
p4c.load_table_data(kegg_label_df, data_key_column="shared name", network=kegg_suid)
p4c.delete_table_column("row.names") 

p4c.set_current_network(wiki_suid)
p4c.load_table_data(wiki_label_df, data_key_column="shared name", network=wiki_suid)
p4c.delete_table_column("row.names")

p4c.set_current_network(string_suid)
p4c.load_table_data(string_04_label_df, data_key_column="shared name", network=string_suid)
p4c.delete_table_column("row.names")

''

## Remove rows without canonical name
Empty values in the key column can lead to conflicting results during the merge, so all rows without a UniProt identifier (the STRING canonical name) are removed first.
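
The effect of this step can be previewed offline on a small table. The sketch below uses pandas on a toy node table (the node names and identifiers are placeholders, not real data) and treats both missing values and empty strings as "no canonical name". It only illustrates the intended selection and does not reproduce how the Cytoscape filter is evaluated.

```python
import pandas as pd

# Toy node table; names and identifiers are made up for illustration.
node_table = pd.DataFrame(
    {
        "name": ["node_a", "node_b", "node_c", "node_d"],
        "stringdb::canonical name": ["ID_1", None, "", "ID_4"],
    }
)

canonical = node_table["stringdb::canonical name"]
# Treat both missing values and empty strings as "no canonical name".
has_canonical = canonical.fillna("").str.strip().ne("")

kept = node_table[has_canonical]
removed = node_table[~has_canonical]
print(removed["name"].tolist())
```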

In [34]:
## Create filter for entries that don't have a stringdb canonical name (uniprot ID). Delete selected rows in each network. ##
p4c.create_column_filter(filter_name="Columns without canonical name", 
                         column="stringdb::canonical name", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for KEGG
p4c.apply_filter("Columns without canonical name", network=kegg_suid)
p4c.delete_selected_nodes(kegg_suid)

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without canonical name", network=wiki_suid)
p4c.delete_selected_nodes(wiki_suid)

# Apply filter and delete nodes for STRING networks
p4c.apply_filter("Columns without canonical name", network=string_suid)
p4c.delete_selected_nodes(string_suid)

No edges selected.
No nodes selected.
No edges selected.
No nodes selected.
No edges selected.
No edges selected.


{'nodes': [124201,
  127516,
  127114,
  122377,
  125026,
  124387,
  122608,
  126532,
  125140,
  124234,
  122563,
  125059,
  125257,
  123862,
  123355,
  127357,
  125410,
  126658,
  124447,
  124933,
  127315,
  128194,
  126898,
  123400,
  124462,
  127555,
  126484,
  126718,
  123898,
  126271,
  127786,
  128317,
  126160,
  125515,
  126319,
  126829,
  127501,
  126742,
  125923,
  124309,
  125485,
  124888,
  128038,
  126838,
  125008,
  127099,
  126886,
  124420,
  127321,
  123802,
  126724,
  126634,
  124438,
  127225,
  124060,
  124354,
  124258,
  123352,
  122515,
  125908,
  124291,
  126940,
  123052,
  123100,
  122659,
  124336,
  125581,
  123907,
  122734,
  127300,
  124945,
  125947,
  122800,
  122338,
  126346,
  127831,
  122656,
  126286,
  127807,
  126712,
  123580,
  126805,
  127717,
  125125,
  125770,
  126925,
  127810,
  125197,
  127963,
  124222,
  122503,
  125443,
  128095,
  127336,
  124396,
  123175,
  125038,
  126055,
  123184],


## Network merging

In [37]:
## Merge the three networks ##
network_names = ['KEGG hsa05016 network stringified', f'STRING 12858 network C{string_confidence}', 'Wikipathways WP3853 network stringified']
node_keys_list = ["stringdb::canonical name", "stringdb::canonical name", "stringdb::canonical name"]
merged_network_name = f"Merged STRING KEGG Wiki network cutoff {string_confidence}"

## The try/except catches a TypeError raised inside py4cytoscape; the network list is then printed so the merge result can be checked
try: 
    p4c.merge_networks(sources=network_names, 
                       title=merged_network_name, 
                       node_keys=node_keys_list)
except TypeError:
    print(p4c.get_network_list())

merged_network_suid = p4c.get_network_suid(merged_network_name)

## Filtering database columns
To check the overlap between the databases, each node gets a `database` label that lists the databases in which it occurs.

In [38]:
## Create dataframe with the column indicators ##
database_filter_df = p4c.get_table_columns('node', columns=["name", "kegg_db", "wiki_db", "string_db"], network=merged_network_suid)
database_filter_df["database"] = "" # Create new column with empty string values

## Give the database column a value based on which databases the nodes are from
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg"
database_filter_df.loc[((database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] != 1)) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "string"

print(database_filter_df.groupby("database")["database"].count()) # Give the distribution of nodes in the merged network

p4c.set_current_network(merged_network_suid)
p4c.load_table_data(database_filter_df, data_key_column="name") # Add the database filter column to the network
p4c.delete_table_column("row.names")

database
kegg                 130
kegg;string          169
kegg;wiki;string       3
string              1718
wiki                   3
wiki;string            8
Name: database, dtype: int64


''
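
The seven label assignments above can also be derived in one pass by joining the names of all indicator columns that equal 1. A minimal sketch on a toy table (node names are placeholders), using the same column names and the same label order (kegg, wiki, string) as the cell above:

```python
import pandas as pd

# Toy indicator table; node names are placeholders, not real proteins.
toy_df = pd.DataFrame(
    {
        "name": ["n1", "n2", "n3", "n4"],
        "kegg_db": [1, 1, None, None],
        "wiki_db": [None, 1, 1, None],
        "string_db": [1, 1, None, 1],
    }
)

# Indicator column -> label, in the order used for the combined labels.
indicator_columns = {"kegg_db": "kegg", "wiki_db": "wiki", "string_db": "string"}


def database_label(row):
    """Join the labels of all indicator columns that equal 1 with ';'."""
    return ";".join(
        label for column, label in indicator_columns.items() if row[column] == 1
    )


toy_df["database"] = toy_df.apply(database_label, axis=1)
print(toy_df[["name", "database"]])
```

Missing indicator values compare unequal to 1, so a node that is absent from a database simply does not get that label, and a node with no indicators keeps an empty string.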

## Network expansion
Expand the merged network with another 2000 nodes.

In [39]:
## Expand Merged STRING_0.4 KEGG Wiki network ##
p4c.clone_network(merged_network_suid) # Creates a clone of the merged network, with "_1" added to the name
expansion_cmd_list = ['string expand', 'additionalNodes=2000', f'network="{merged_network_name}_1"', 
                      'nodeTypes="Homo sapiens"', 'selectivityAlpha=0.5']
expansion_cmd = " ".join(expansion_cmd_list)
p4c.commands.commands_run(expansion_cmd)
p4c.rename_network(f"{merged_network_name} expanded")
merged_network_expanded_suid = p4c.get_network_suid(f"{merged_network_name} expanded")

## Visualization
Further visualization can be done in Cytoscape itself; this section only visualizes the database distribution and the nervous tissue confidence.

In [40]:
## Set color of nodes according to a gradient for the confidence that the protein is active in the nervous system
if "ADCN style" not in p4c.get_visual_style_names():
    p4c.create_visual_style("ADCN style")

p4c.set_current_network(merged_network_expanded_suid)
p4c.set_node_color_mapping(**p4c.gen_node_color_map('tissue::nervous system', p4c.palette_color_brewer_s_YlOrBr(), style_name="ADCN style"))

''

In [41]:
## Set shape of nodes according to database values
database_values = ["kegg", "string", "wiki", "kegg;string", "kegg;wiki", "wiki;string", "kegg;wiki;string"]
mapping_shapes = ["HEXAGON", "ROUND_RECTANGLE", "VEE", "PARALLELOGRAM", "DIAMOND", "ELLIPSE", "TRIANGLE"]

p4c.set_node_shape_mapping(table_column="database", table_column_values=database_values,
                            shapes=mapping_shapes, default_shape="RECTANGLE", style_name="ADCN style")

''

In [42]:
## Set ADCN style for the two networks
p4c.set_visual_style("ADCN style", merged_network_suid)
p4c.set_visual_style("ADCN style", merged_network_expanded_suid)

{'message': 'Visual Style applied.'}

## Analyse the networks
Analyse the merged and the expanded network with Cytoscape's network analysis tool.

In [44]:
## Analyse network
p4c.set_current_network(merged_network_suid)
analysis = p4c.analyze_network()
analysis_df = pd.DataFrame.from_dict(analysis, 'index')
print(analysis_df)

                                                                0
networkTitle    Merged STRING KEGG Wiki network cutoff 0.4 (un...
nodeCount                                                    2031
edgeCount                                                  105642
avNeighbors                                    103.27846229669788
diameter                                                        5
radius                                                          3
avSpl                                           2.229900175269247
cc                                             0.4319631282057894
density                                        0.0509262634599102
heterogeneity                                   0.944561566460152
centralization                                 0.3937419297083067
ncc                                                             3
time                                                        4.299


In [45]:
## Analyse expanded network
p4c.set_current_network(merged_network_expanded_suid)
analysis_expanded = p4c.analyze_network()
analysis_expanded_df = pd.DataFrame.from_dict(analysis_expanded, 'index')
print(analysis_expanded_df)

                                                                0
networkTitle    Merged STRING KEGG Wiki network cutoff 0.4 exp...
nodeCount                                                    4031
edgeCount                                                  317355
avNeighbors                                     157.1054852320675
diameter                                                        5
radius                                                          3
avSpl                                           2.211373204643692
cc                                             0.3826036381717503
density                                       0.03900334787290653
heterogeneity                                  0.8403900351616043
centralization                                0.35466824111034034
ncc                                                             3
time                                                       30.872
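
To compare the two analyses side by side, the results can be placed in one dataframe. The sketch below copies a subset of the values printed above into plain dictionaries so that it runs without Cytoscape; the `ratio` column is an addition for illustration.

```python
import pandas as pd

# Values copied from the two analysis outputs above.
merged_stats = {
    "nodeCount": 2031,
    "edgeCount": 105642,
    "avNeighbors": 103.27846229669788,
    "cc": 0.4319631282057894,
    "density": 0.0509262634599102,
}
expanded_stats = {
    "nodeCount": 4031,
    "edgeCount": 317355,
    "avNeighbors": 157.1054852320675,
    "cc": 0.3826036381717503,
    "density": 0.03900334787290653,
}

comparison_df = pd.DataFrame({"merged": merged_stats, "expanded": expanded_stats})
# Ratio of expanded to merged value for each statistic.
comparison_df["ratio"] = comparison_df["expanded"] / comparison_df["merged"]
print(comparison_df.round(3))
```

In the notebook itself, the `analysis` and `analysis_expanded` dictionaries could presumably be used in place of the copied values, provided the selected entries are numeric.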
