# Alzheimer consensus network automated workflow
This notebook gives a workflow for creating a consensus protein-protein interaction network with the *py4cytoscape* library. The library uses the *CyREST* framework to send commands to Cytoscape, where they are executed, so the **Cytoscape application must be running** while this notebook is executed. Also note the number of proteins imported from STRING: 2000 proteins are imported, plus an additional 2000 for the expansion, so running the whole workflow can take up to thirty minutes.

## Setup

In [1]:
pip install py4cytoscape

Note: you may need to restart the kernel to use updated packages.


In [2]:
## Imports
import py4cytoscape as p4c
import pandas as pd

In [3]:
## Execute to check whether contact with the Cytoscape app via CyREST works 
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.10.1',
 'automationAPIVersion': '1.9.0',
 'py4cytoscapeVersion': '1.9.0'}

## Load networks
Three networks are loaded: first from KEGG, then from Wikipathways, and finally from STRING. The desired STRING confidence cutoff must be set at the start, because it determines the number of edges in each network.

### Load KEGG

In [5]:
## Set STRING confidence in a variable with 0 < confidence <= 1.0 ##
string_confidence = 0.4
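
Assuming the intended range is 0 < confidence <= 1.0, a small guard could catch an out-of-range value before any network is loaded. The helper `validate_confidence` below is hypothetical and not part of py4cytoscape; it is only a sketch of such a check.

```python
def validate_confidence(value: float) -> float:
    """Return value unchanged if 0 < value <= 1.0, otherwise raise ValueError."""
    if not 0 < value <= 1.0:
        raise ValueError(
            f"STRING confidence must satisfy 0 < confidence <= 1.0, got {value}"
        )
    return value


# Same cutoff as used in this notebook
string_confidence = validate_confidence(0.4)
```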

In [6]:
## Load KEGG network ##
p4c.networks.import_network_from_file("hsa05010_mod_with_uniprot.gml")
p4c.layout_network('force-directed', network="current") # the gml file does not contain any layout information so layout has to be set
p4c.rename_network("KEGG hsa05010 network")

{'network': 128, 'title': 'KEGG hsa05010 network'}

In [7]:
## Create a Stringified version of KEGG network ##
cmd_list = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
            'compoundQuery="false"', 'includeNotMapped="false"', 
            'networkNoGui="KEGG hsa05010 network"', 'species="Homo sapiens"']
cmd = " ".join(cmd_list)
p4c.commands.commands_run(cmd) 
p4c.rename_network("KEGG hsa05010 network stringified")

{'network': 16602, 'title': 'KEGG hsa05010 network stringified'}

In [8]:
## Set cutoff ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="KEGG hsa05010 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']
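
The command strings in this notebook are all built the same way: a command name followed by `key=value` pairs joined with spaces. As a possible simplification, the sketch below wraps that pattern in a hypothetical helper, `build_command`, which is not part of py4cytoscape. It assumes string values should be wrapped in double quotes and numbers passed bare, as in the cells above.

```python
def build_command(command: str, **arguments) -> str:
    """Join a command name and keyword arguments into one command string."""
    parts = [command]
    for key, value in arguments.items():
        if isinstance(value, str):
            parts.append(f'{key}="{value}"')  # string values are quoted
        else:
            parts.append(f"{key}={value}")    # numeric values are passed bare
    return " ".join(parts)


cmd = build_command(
    "string change confidence",
    confidence=0.4,
    network="KEGG hsa05010 network stringified",
)
print(cmd)
```

The resulting string could then be passed to `p4c.commands.commands_run` in the same way as the hand-built commands.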

### Load Wikipathways 

In [9]:
## Load Wikipathways network ##
cmd_list = ['wikipathways','import-as-network','id="WP5124"']
cmd = " ".join(cmd_list)
p4c.commands.commands_get(cmd) 
p4c.rename_network("Wikipathways WP5124 network")

{'network': 69274, 'title': 'Wikipathways WP5124 network'}

In [12]:
## Clone and rename Wikipathways network ##
clone_suid = p4c.clone_network("Wikipathways WP5124 network") # clone_network returns the SUID of the clone
p4c.rename_network(title="Wikipathways WP5124 network uniprot", network=clone_suid) # avoids relying on the "_1"/"_2" suffix

{'network': 79319, 'title': 'Wikipathways WP5124 network uniprot'}

In [13]:
## Add uniprot columns to Wikipathways network ##
p4c.load_table_data_from_file("wikipathway_to_uniprot.xlsx", first_row_as_column_names=True,
                              data_key_column_index=1, network="Wikipathways WP5124 network uniprot")

{'mappedTables': [79290, 79328]}

In [14]:
## Remove nodes without uniprot ID ##
p4c.create_column_filter(filter_name="Columns without uniprot", 
                         column="uniprot", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without uniprot", network="Wikipathways WP5124 network uniprot")
p4c.delete_selected_nodes("Wikipathways WP5124 network uniprot")

No edges selected.
No edges selected.


{'nodes': [80859, 79571, 80787, 79679, 79675, ...

In [15]:
## Create a Stringified version of Wikipathways network ##
cmd_list_stringify_wp = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
                        'compoundQuery="false"', 'includeNotMapped="false"', 
                        'networkNoGui="Wikipathways WP5124 network uniprot"', 'species="Homo sapiens"']
cmd_stringify_wp = " ".join(cmd_list_stringify_wp)
p4c.commands.commands_run(cmd_stringify_wp)
p4c.rename_network("Wikipathways WP5124 network stringified")

{'network': 86185, 'title': 'Wikipathways WP5124 network stringified'}

In [16]:
## Set cutoff to string confidence ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="Wikipathways WP5124 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']

### Load STRING

In [17]:
## Load STRING network ##
string_cmd_list = ['string disease query','disease="doid:10652"', 'species="Homo sapiens"', 'limit=2000', f'cutoff={string_confidence}'] # 2000 proteins takes quite long
string_cmd = " ".join(string_cmd_list)
p4c.commands.commands_run(string_cmd)
p4c.rename_network(f"STRING 10652 network C{string_confidence}")

{'network': 112194, 'title': 'STRING 10652 network C0.4'}

In [18]:
## Check whether networks are imported and store suids ##
network_names = p4c.get_network_list()
kegg_suid = p4c.get_network_suid("KEGG hsa05010 network stringified")
wiki_suid = p4c.get_network_suid("Wikipathways WP5124 network stringified")
string_suid = p4c.get_network_suid(f"STRING 10652 network C{string_confidence}")

print(network_names)
print(kegg_suid, wiki_suid, string_suid)

['KEGG hsa05010 network', 'STRING 10652 network C0.4', 'Wikipathways WP5124 network uniprot', 'Wikipathways WP5124 network stringified', 'KEGG hsa05010 network stringified', 'Wikipathways WP5124 network', 'Wikipathways WP5124 network_1']
16602 86185 112194
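
Rather than checking the printed list by eye, the presence of the three required networks could be verified in code. This is a minimal sketch: `missing_networks` is a hypothetical helper, and the `available` list is copied from the output above, whereas in the notebook it would come from `p4c.get_network_list()`.

```python
def missing_networks(expected, available):
    """Return the expected network names that are absent from available, in order."""
    available_set = set(available)
    return [name for name in expected if name not in available_set]


# Copied from the printed network list; in the notebook use p4c.get_network_list()
available = [
    "KEGG hsa05010 network",
    "STRING 10652 network C0.4",
    "Wikipathways WP5124 network uniprot",
    "Wikipathways WP5124 network stringified",
    "KEGG hsa05010 network stringified",
    "Wikipathways WP5124 network",
    "Wikipathways WP5124 network_1",
]
expected = [
    "KEGG hsa05010 network stringified",
    "Wikipathways WP5124 network stringified",
    "STRING 10652 network C0.4",
]

missing = missing_networks(expected, available)
if missing:
    raise RuntimeError(f"Networks not loaded: {missing}")
```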


## Add database columns to tables
This step adds an extra column to every node table that indicates which database the node comes from. This makes it easier to find the overlap between the databases in the merged network.

In [20]:
## Create dataframe with new column that indicates which database the data originates from ##
kegg_nodes = p4c.get_all_nodes(kegg_suid)
kegg_label_df = pd.DataFrame(data=kegg_nodes, columns=["shared name"])
kegg_label_df["kegg_db"] = 1 # Column value will be 1 for all data in kegg network

wiki_nodes = p4c.get_all_nodes(wiki_suid)
wiki_label_df = pd.DataFrame(data=wiki_nodes, columns=["shared name"])
wiki_label_df["wiki_db"] = 1 # Column value will be 1 for all data in wikipathways network

string_04_nodes = p4c.get_all_nodes(string_suid)
string_04_label_df = pd.DataFrame(data=string_04_nodes, columns=["shared name"])
string_04_label_df["string_db"] = 1 # Column value will be 1 for all data in string network

In [21]:
## Merge dataframes with tables in networks ##
p4c.set_current_network(kegg_suid)
p4c.load_table_data(kegg_label_df, data_key_column="shared name", network=kegg_suid)
p4c.delete_table_column("row.names") 

p4c.set_current_network(wiki_suid)
p4c.load_table_data(wiki_label_df, data_key_column="shared name", network=wiki_suid)
p4c.delete_table_column("row.names")

p4c.set_current_network(string_suid)
p4c.load_table_data(string_04_label_df, data_key_column="shared name", network=string_suid)
p4c.delete_table_column("row.names")

''

## Remove rows without canonical name
To prevent conflicting merge results caused by empty column values, all rows without a Uniprot identifier (the STRING canonical name) are removed.

In [22]:
## Create filter for entries that don't have a stringdb canonical name (uniprot ID). Delete selected rows in each network. ##
p4c.create_column_filter(filter_name="Columns without canonical name", 
                         column="stringdb::canonical name", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for KEGG
p4c.apply_filter("Columns without canonical name", network=kegg_suid)
p4c.delete_selected_nodes(kegg_suid)

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without canonical name", network=wiki_suid)
p4c.delete_selected_nodes(wiki_suid)

# Apply filter and delete nodes for STRING networks
p4c.apply_filter("Columns without canonical name", network=string_suid)
p4c.delete_selected_nodes(string_suid)

No edges selected.
No nodes selected.
No edges selected.
No edges selected.
No edges selected.


{'nodes': [117263, 113075, 117338, 112775, 117401, ...],
 'edges': [309233, 340988, 391163, 266234, 319487, ...

## Network merging

In [23]:
## Merge the three networks ##
network_names = ['KEGG hsa05010 network stringified', f'STRING 10652 network C{string_confidence}', 'Wikipathways WP5124 network stringified']
node_keys_list = ["stringdb::canonical name", "stringdb::canonical name", "stringdb::canonical name"]
merged_network_name = f"Merged STRING KEGG Wiki network cutoff {string_confidence}"

## This try except structure is to catch a type error in the py4cytoscape library
try: 
    p4c.merge_networks(sources=network_names, 
                       title=merged_network_name, 
                       node_keys=node_keys_list)
except TypeError:
    print(p4c.get_network_list())

merged_network_suid = p4c.get_network_suid(merged_network_name)

## Filtering database columns
To check the overlap between the databases, a `database` column is added that records which databases each node comes from.

In [24]:
## Create dataframe with the column indicators ##
database_filter_df = p4c.get_table_columns('node', columns=["name", "kegg_db", "wiki_db", "string_db"], network=merged_network_suid)
database_filter_df["database"] = "" # Create new column with empty string values

## Give the database column a value based on which databases the nodes are from
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg"
database_filter_df.loc[((database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] != 1)) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "string"

print(database_filter_df.groupby("database")["database"].count()) # Give the distribution of nodes in the merged network

p4c.set_current_network(merged_network_suid)
p4c.load_table_data(database_filter_df, data_key_column="name") # Add the database filter column to the network
p4c.delete_table_column("row.names")

database
kegg                  82
kegg;string           43
kegg;wiki            116
kegg;wiki;string     137
string              1733
wiki                   1
Name: database, dtype: int64


''
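
The seven `.loc` assignments above all follow one rule: join the names of the databases whose indicator equals 1, in the order kegg, wiki, string. A compact sketch of that rule is shown below. The function `database_label` is a hypothetical helper, not the author's method, and the counts are copied from the printed distribution.

```python
def database_label(kegg: bool, wiki: bool, string: bool) -> str:
    """Build a label such as "kegg;wiki" from three membership flags."""
    flags = (("kegg", kegg), ("wiki", wiki), ("string", string))
    return ";".join(name for name, present in flags if present)


# In the notebook this could be applied row by row, for example:
# database_filter_df["database"] = [
#     database_label(k == 1, w == 1, s == 1)
#     for k, w, s in zip(database_filter_df["kegg_db"],
#                        database_filter_df["wiki_db"],
#                        database_filter_df["string_db"])
# ]

# Distribution copied from the printed output
counts = {
    "kegg": 82,
    "kegg;string": 43,
    "kegg;wiki": 116,
    "kegg;wiki;string": 137,
    "string": 1733,
    "wiki": 1,
}
total_nodes = sum(counts.values())
print(total_nodes)
```

The total equals the `nodeCount` that the network analysis reports for the merged network, so every node received a label.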

## Network expansion
Expand a clone of the merged network with an additional 2000 nodes from STRING.

In [25]:
## Expand Merged STRING_0.4 KEGG Wiki network ##
p4c.clone_network(merged_network_suid) # Creates a clone of the merged network, with "_1" added to the name
expansion_cmd_list = ['string expand', 'additionalNodes=2000', f'network="{merged_network_name}_1"', 
                      'nodeTypes="Homo sapiens"', 'selectivityAlpha=0.5']
expansion_cmd = " ".join(expansion_cmd_list)
p4c.commands.commands_run(expansion_cmd)
p4c.rename_network(f"{merged_network_name} expanded")
merged_network_expanded_suid = p4c.get_network_suid(f"{merged_network_name} expanded")

## Visualization
Further visualization can be done in Cytoscape itself; this section only visualizes the database distribution and the nervous tissue confidence.

In [26]:
## Set color of nodes according to a gradient for the confidence that the protein is active in the nervous system
if "ADCN style" not in p4c.get_visual_style_names():
    p4c.create_visual_style("ADCN style")

p4c.set_current_network(merged_network_expanded_suid)
p4c.set_node_color_mapping(**p4c.gen_node_color_map('tissue::nervous system', p4c.palette_color_brewer_s_YlOrBr(), style_name="ADCN style"))

''

In [27]:
## Set shape of nodes according to database values
database_values = ["kegg", "string", "wiki", "kegg;string", "kegg;wiki", "wiki;string", "kegg;wiki;string"]
mapping_shapes = ["HEXAGON", "ROUND_RECTANGLE", "VEE", "PARALLELOGRAM", "DIAMOND", "ELLIPSE", "TRIANGLE"]

p4c.set_node_shape_mapping(table_column="database", table_column_values=database_values,
                            shapes=mapping_shapes, default_shape="RECTANGLE", style_name="ADCN style")

''
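
Because the shape mapping pairs two parallel lists by position, a quick consistency check may be useful before sending it to Cytoscape. The sketch below, which is only an illustration, confirms that the lists have equal length and that every possible combination of the three databases has a shape.

```python
from itertools import combinations

databases = ["kegg", "wiki", "string"]
# Every non-empty combination of databases, joined in the order kegg, wiki, string
all_labels = {
    ";".join(combo)
    for size in range(1, len(databases) + 1)
    for combo in combinations(databases, size)
}

database_values = ["kegg", "string", "wiki", "kegg;string", "kegg;wiki",
                   "wiki;string", "kegg;wiki;string"]
mapping_shapes = ["HEXAGON", "ROUND_RECTANGLE", "VEE", "PARALLELOGRAM",
                  "DIAMOND", "ELLIPSE", "TRIANGLE"]

# The two lists are paired by position, so they must have the same length
assert len(database_values) == len(mapping_shapes)

shape_by_label = dict(zip(database_values, mapping_shapes))
unmapped = all_labels - set(shape_by_label)  # labels that would get the default shape
print(unmapped)
```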

In [28]:
## Set ADCN style for the two networks
p4c.set_visual_style("ADCN style", merged_network_suid)
p4c.set_visual_style("ADCN style", merged_network_expanded_suid)

{'message': 'Visual Style applied.'}

## Analyse the networks
Analyse the merged and the expanded networks with Cytoscape's built-in network analysis.

In [29]:
## Analyse network
p4c.set_current_network(merged_network_suid)
analysis = p4c.analyze_network()
analysis_df = pd.DataFrame.from_dict(analysis, 'index')
print(analysis_df)

                                                                0
networkTitle    Merged STRING KEGG Wiki network cutoff 0.4 (un...
nodeCount                                                    2112
edgeCount                                                  118820
avNeighbors                                    111.57176693510185
diameter                                                        5
radius                                                          3
avSpl                                          2.1918230168761688
cc                                            0.44368924426849643
density                                      0.052877614661185714
heterogeneity                                  0.9800463134393858
centralization                                0.43805873721064537
ncc                                                             2
time                                                       16.667


In [30]:
## Analyse expanded network
p4c.set_current_network(merged_network_expanded_suid)
analysis_expanded = p4c.analyze_network()
analysis_expanded_df = pd.DataFrame.from_dict(analysis_expanded, 'index')
print(analysis_expanded_df)

                                                                0
networkTitle    Merged STRING KEGG Wiki network cutoff 0.4 exp...
nodeCount                                                    4112
edgeCount                                                  304017
avNeighbors                                    147.39041595718803
diameter                                                        4
radius                                                          3
avSpl                                          2.2220682626458834
cc                                            0.37497471018937084
density                                       0.03586141507474161
heterogeneity                                  0.8935457576140715
centralization                                 0.3355913877258335
ncc                                                             2
time                                                       60.312
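
As a sanity check on the two tables, the density of a simple undirected graph equals the mean degree divided by n - 1. The reported values are consistent with this relation when n is taken as `nodeCount` minus one. That choice of n is an assumption inferred from the numbers (it would fit a single node lying outside the main component, given `ncc` is 2), not documented analyzer behaviour. The helper `density_from_degree` is hypothetical.

```python
import math


def density_from_degree(av_neighbors: float, n_nodes: int) -> float:
    """Density of a simple undirected graph: mean degree divided by (n - 1)."""
    return av_neighbors / (n_nodes - 1)


# Values copied from the printed tables; n = nodeCount - 1 is an assumption
merged_density = density_from_degree(111.57176693510185, 2112 - 1)
expanded_density = density_from_degree(147.39041595718803, 4112 - 1)

print(math.isclose(merged_density, 0.052877614661185714, rel_tol=1e-9))
print(math.isclose(expanded_density, 0.03586141507474161, rel_tol=1e-9))
```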
