# Alzheimer consensus network automated workflow
This notebook provides a workflow for creating a consensus protein-protein interaction network with the *py4cytoscape* library. The library uses the *CyREST* framework to send commands to Cytoscape, where they are executed, so the **Cytoscape desktop application must be running** while this notebook executes. Also note the number of proteins imported from STRING: 2000 proteins are imported initially and another 2000 are added during the expansion, so running the whole workflow can take up to thirty minutes.

## Setup

In [1]:
pip install py4cytoscape

Note: you may need to restart the kernel to use updated packages.


In [2]:
## Imports
import py4cytoscape as p4c
import pandas as pd

In [3]:
## Execute to check whether contact with the Cytoscape app via CyREST works 
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.10.1',
 'automationAPIVersion': '1.9.0',
 'py4cytoscapeVersion': '1.9.0'}
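
Every later cell depends on this connection, so it can help to wait until Cytoscape has finished starting before continuing. The sketch below is one possible approach: `wait_until_ready` is a hypothetical helper (not part of py4cytoscape) that retries any check function passed to it, and the retry count and delay are arbitrary defaults.

```python
import time


def wait_until_ready(check, retries=5, delay=2.0):
    """Call `check()` until it stops raising; return True on success.

    `check` is any zero-argument callable that raises while the service
    is unavailable. Hypothetical helper, not part of py4cytoscape.
    """
    for attempt in range(1, retries + 1):
        try:
            check()
            return True
        except Exception as err:  # broad on purpose: connection errors vary
            print(f"Attempt {attempt}/{retries} failed: {err}")
            if attempt < retries:
                time.sleep(delay)
    return False
```

In this notebook the check could be `p4c.cytoscape_ping`, as in `wait_until_ready(p4c.cytoscape_ping)`, assuming that call raises an exception while Cytoscape is not reachable.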

## Load networks
Three networks are loaded: first from KEGG, then from Wikipathways and finally from STRING. Set the desired STRING confidence cutoff at the start, because it determines the number of edges in the networks.

### Load KEGG

In [42]:
## Set string confidence in a variable with 0 < confidence <= 1.0 ##
string_confidence = 0.4
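
Since the cutoff is reused in several commands later on, a small range check can catch a typo early. This is a minimal sketch; `validate_confidence` is a hypothetical helper and the accepted interval (0, 1] is taken from the comment in the cell above.

```python
def validate_confidence(value):
    """Return `value` as a float if it lies in the interval (0, 1]."""
    score = float(value)
    if not 0.0 < score <= 1.0:
        raise ValueError(f"STRING confidence must be in (0, 1], got {score}")
    return score


string_confidence = validate_confidence(0.4)
```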

In [5]:
## Load KEGG network ##
p4c.networks.import_network_from_file("AD_hsa05010.gml")
p4c.layout_network('force-directed', network="current") # the gml file does not contain any layout information so layout has to be set
p4c.rename_network("KEGG AD hsa05010 network")

{'network': 128, 'title': 'KEGG AD hsa05010 network'}

In [6]:
## Create a Stringified version of KEGG network ##
cmd_list = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
            'compoundQuery="false"', 'includeNotMapped="false"', 
            'networkNoGui="KEGG AD hsa05010 network"', 'species="Homo sapiens"']
cmd = " ".join(cmd_list)
p4c.commands.commands_run(cmd) 
p4c.rename_network("KEGG AD hsa05010 network stringified")

{'network': 17172, 'title': 'KEGG AD hsa05010 network stringified'}

In [7]:
## Set cutoff ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="KEGG AD hsa05010 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']
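
The command strings in this notebook are all assembled the same way: a command name followed by `key=value` pairs, joined with spaces. The sketch below wraps that pattern in a hypothetical helper, `build_command`, which is not part of py4cytoscape; the quoting rules are an assumption that mirrors the hand-written lists in the cells above.

```python
def build_command(command, **arguments):
    """Join a command name and its arguments into one command string.

    String values are wrapped in double quotes, booleans become
    "true"/"false", and other values (numbers) are written as-is.
    """
    parts = [command]
    for name, value in arguments.items():
        if isinstance(value, bool):
            parts.append(f'{name}="{str(value).lower()}"')
        elif isinstance(value, str):
            parts.append(f'{name}="{value}"')
        else:
            parts.append(f"{name}={value}")
    return " ".join(parts)


cmd = build_command(
    "string change confidence",
    confidence=0.4,
    network="KEGG AD hsa05010 network stringified",
)
print(cmd)
```

The resulting string can be passed to `p4c.commands.commands_run` in the same way as the joined lists.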

### Load Wikipathways 

In [9]:
## Load Wikipathways network ##
cmd_list = ['wikipathways','import-as-network','id="WP5124"']
cmd = " ".join(cmd_list)
p4c.commands.commands_get(cmd) 
p4c.rename_network("Wikipathways AD WP5124 network")

{'network': 69854, 'title': 'Wikipathways AD WP5124 network'}

In [10]:
## Clone and rename Wikipathways network ##
p4c.clone_network("Wikipathways AD WP5124 network")
p4c.rename_network(title="Wikipathways AD WP5124 network uniprot", network="Wikipathways AD WP5124 network")

{'network': 69854, 'title': 'Wikipathways AD WP5124 network uniprot'}

In [17]:
## Add uniprot columns to Wikipathways network ##
p4c.load_table_data_from_file("wikipathway_to_uniprot_AD.xlsx", first_row_as_column_names=True,
                              data_key_column_index=1, network="Wikipathways AD WP5124 network uniprot")

{'mappedTables': [69825, 69863]}

In [18]:
## Remove nodes without uniprot ID ##
p4c.create_column_filter(filter_name="Columns without uniprot", 
                         column="uniprot", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without uniprot", network="Wikipathways AD WP5124 network uniprot")
p4c.delete_selected_nodes("Wikipathways AD WP5124 network uniprot")

No edges selected.
No edges selected.


{'nodes': [71603, 70285, 70042, 70102, 70994, 71240, 70967, 71573, 69982, 71233,
  71129, 71552, 70276, 71354, 71074, 71537, 71394, 70958, 71579, 71594,
  70099, 71528, 69994, 71387, 70855, 71418, 71585, 71267, 71432, 71034,
  70928, 70024, 71041, 70015, 71618, 69976, 71122, 70908, 71525, 71112,
  71119, 69985, 70282, 71332, 71558, 70915, 71249, 71543, 69973, 70309,
  69979, 70033, 71935, 70105, 71452, 71567, 71048, 71597, 71020, 70941,
  72065, 70039, 71600, 71561, 71621, 71570, 70901, 71335, 70018, 72060,
  71546, 70279, 70012, 69991, 71027, 70339, 71540, 70171, 71852, 71296,
  71303, 71230, 71083, 71564, 71156, 71474, 70093, 70096, 71555, 71258,
  71509, 71615, 71576, 71325, 71606, 71310, 70027, 71097, 71227, 71588,
  71378, 70030, 69970, 70000, 70852, 71512, 71534, 71549, 69967, 71609,
  ...

In [19]:
## Create a Stringified version of Wikipathways network ##
cmd_list_stringify_wp = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
                        'compoundQuery="false"', 'includeNotMapped="false"', 
                        'networkNoGui="Wikipathways AD WP5124 network uniprot"', 'species="Homo sapiens"']
cmd_stringify_wp = " ".join(cmd_list_stringify_wp)
p4c.commands.commands_run(cmd_stringify_wp)
p4c.rename_network("Wikipathways AD WP5124 network stringified")

{'network': 82283, 'title': 'Wikipathways AD WP5124 network stringified'}

In [20]:
## Set cutoff to string confidence ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="Wikipathways AD WP5124 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']

### Load STRING

In [64]:
## Load STRING network ##
string_cmd_list = ['string disease query','disease="doid:10652"', 'species="Homo sapiens"', 'limit=2000', f'cutoff={string_confidence}'] # 2000 proteins takes quite long
string_cmd = " ".join(string_cmd_list)
p4c.commands.commands_run(string_cmd)
p4c.rename_network(f"STRING AD 10652 network C{string_confidence}")


In [54]:
## Check whether networks are imported and store suids ##
network_names = p4c.get_network_list()
kegg_suid = p4c.get_network_suid("KEGG AD hsa05010 network stringified")
wiki_suid = p4c.get_network_suid("Wikipathways AD WP5124 network stringified")
string_suid = p4c.get_network_suid(f"STRING AD 10652 network C{string_confidence}")

print(network_names)
print(kegg_suid, wiki_suid, string_suid)

['KEGG AD hsa05010 network', 'Wikipathways AD WP5124 network_1', 'KEGG AD hsa05010 network stringified', 'Wikipathways AD WP5124 network stringified', 'STRING AD 10652 network C0.4', 'Wikipathways AD WP5124 network uniprot']
17172 82283 1212479


## Add database columns to tables
This step adds an extra column to every node table that indicates which database the node comes from. In the merged network, these columns make it easier to see where the databases overlap.

In [55]:
## Create dataframe with new column that indicates which database the data originates from ##
kegg_nodes = p4c.get_all_nodes(kegg_suid)
kegg_label_df = pd.DataFrame(data=kegg_nodes, columns=["shared name"])
kegg_label_df["kegg_db"] = 1 # Column value will be 1 for all data in kegg network

wiki_nodes = p4c.get_all_nodes(wiki_suid)
wiki_label_df = pd.DataFrame(data=wiki_nodes, columns=["shared name"])
wiki_label_df["wiki_db"] = 1 # Column value will be 1 for all data in wikipathways network

string_04_nodes = p4c.get_all_nodes(string_suid)
string_04_label_df = pd.DataFrame(data=string_04_nodes, columns=["shared name"])
string_04_label_df["string_db"] = 1 # Column value will be 1 for all data in string network

In [56]:
## Merge dataframes with tables in networks ##
p4c.set_current_network(kegg_suid)
p4c.load_table_data(kegg_label_df, data_key_column="shared name", network=kegg_suid)
p4c.delete_table_column("row.names") 

p4c.set_current_network(wiki_suid)
p4c.load_table_data(wiki_label_df, data_key_column="shared name", network=wiki_suid)
p4c.delete_table_column("row.names")

p4c.set_current_network(string_suid)
p4c.load_table_data(string_04_label_df, data_key_column="shared name", network=string_suid)
p4c.delete_table_column("row.names")

''
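
To illustrate why one indicator column per database is useful, the sketch below models a union merge on plain dictionaries. It is a toy model under stated assumptions, not Cytoscape's merge implementation: `union_merge` is a hypothetical helper and the node keys `P1` to `P3` are made up.

```python
def union_merge(networks):
    """Merge node tables by key, keeping every attribute that appears.

    `networks` is a list of dicts mapping a node key to its attributes.
    """
    merged = {}
    for nodes in networks:
        for key, attributes in nodes.items():
            merged.setdefault(key, {}).update(attributes)
    return merged


# Made-up node keys, for illustration only
kegg_table = {"P1": {"kegg_db": 1}, "P2": {"kegg_db": 1}}
wiki_table = {"P2": {"wiki_db": 1}, "P3": {"wiki_db": 1}}
string_table = {"P2": {"string_db": 1}}

merged = union_merge([kegg_table, wiki_table, string_table])
print(merged["P2"])  # node present in all three sources
print(merged["P1"])  # node present in KEGG only
```

A node shared by several sources ends up with several indicator columns set, while a node from a single source carries only one.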

## Remove rows without canonical name
To prevent conflicts during the merge caused by empty column values, all nodes without a STRING canonical name (the Uniprot identifier) are removed.

In [57]:
## Create filter for entries that don't have a stringdb canonical name (uniprot ID). Delete selected rows in each network. ##
p4c.create_column_filter(filter_name="Columns without canonical name", 
                         column="stringdb::canonical name", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for KEGG
p4c.apply_filter("Columns without canonical name", network=kegg_suid)
p4c.delete_selected_nodes(kegg_suid)

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without canonical name", network=wiki_suid)
p4c.delete_selected_nodes(wiki_suid)

# Apply filter and delete nodes for STRING networks
p4c.apply_filter("Columns without canonical name", network=string_suid)
p4c.delete_selected_nodes(string_suid)

No edges selected.
No nodes selected.
No edges selected.
No nodes selected.
No edges selected.
No edges selected.


{'nodes': [1213489, 1213408, 1212769], 'edges': []}

## Network merging

In [58]:
## Merge the three networks ##
network_names = ['KEGG AD hsa05010 network stringified', f'STRING AD 10652 network C{string_confidence}', 'Wikipathways AD WP5124 network stringified']
node_keys_list = ["stringdb::canonical name", "stringdb::canonical name", "stringdb::canonical name"]
merged_network_name = f"Merged AD STRING KEGG Wiki network cutoff {string_confidence}"

## This try except structure is to catch a type error in the py4cytoscape library
try: 
    p4c.merge_networks(sources=network_names, 
                       title=merged_network_name, 
                       node_keys=node_keys_list)
except TypeError:
    print(p4c.get_network_list())

merged_network_suid = p4c.get_network_suid(merged_network_name)

## Filtering database columns
To check the overlap between the databases, a `database` column is added to the merged network that lists the source databases of each node.

In [59]:
## Create dataframe with the column indicators ##
database_filter_df = p4c.get_table_columns('node', columns=["name", "kegg_db", "wiki_db", "string_db"], network=merged_network_suid)
database_filter_df["database"] = "" # Create new column with empty string values

## Give the database column a value based on which databases the nodes are from
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg"
database_filter_df.loc[((database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] != 1)) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "string"

print(database_filter_df.groupby("database")["database"].count()) # Give the distribution of nodes in the merged network

p4c.set_current_network(merged_network_suid)
p4c.load_table_data(database_filter_df, data_key_column="name") # Add the database filter column to the network
p4c.delete_table_column("row.names")

database
kegg                121
kegg;string           5
kegg;wiki           241
kegg;wiki;string     12
string              308
wiki                  1
Name: database, dtype: int64


''
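
The seven `.loc` assignments above enumerate every combination of the three indicator columns. As a hedged alternative, the same labels can be derived by one function; `database_label` is a hypothetical helper, and it assumes, like the cell above, that a database counts as present only when its indicator equals 1.

```python
def database_label(kegg_db, wiki_db, string_db):
    """Build a label such as "kegg;wiki" from three indicator values.

    A database counts as present only when its indicator equals 1, so
    missing values (None or NaN) are treated as absent.
    """
    flags = [("kegg", kegg_db), ("wiki", wiki_db), ("string", string_db)]
    return ";".join(name for name, flag in flags if flag == 1)


print(database_label(1, 1, 1))
print(database_label(1, None, 1))
print(database_label(float("nan"), 1, None))
```

With pandas, the function could be applied row by row, for example `database_filter_df.apply(lambda r: database_label(r["kegg_db"], r["wiki_db"], r["string_db"]), axis=1)`.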

## Network expansion
Expand a clone of the merged network with another 2000 nodes from STRING.

In [60]:
## Expand Merged STRING_0.4 KEGG Wiki network ##
p4c.clone_network(merged_network_suid) # Creates a clone of the merged network, with "_1" added to the name
expansion_cmd_list = ['string expand', 'additionalNodes=2000', f'network="{merged_network_name}_1"', 
                      'nodeTypes="Homo sapiens"', 'selectivityAlpha=0.5']
expansion_cmd = " ".join(expansion_cmd_list)
p4c.commands.commands_run(expansion_cmd)
p4c.rename_network(f"{merged_network_name} expanded")
merged_network_expanded_suid = p4c.get_network_suid(f"{merged_network_name} expanded")

## Visualization
Further visualization can be done in Cytoscape itself. This section only visualizes the database distribution and the nervous tissue confidence.

In [67]:
## Set color of nodes according to a gradient for the confidence that the protein is active in the nervous system
if "ADCN style" not in p4c.get_visual_style_names():
    p4c.create_visual_style("ADCN style")

p4c.set_current_network(merged_network_expanded_suid)
p4c.set_node_color_mapping(**p4c.gen_node_color_map('tissue::nervous system', p4c.palette_color_brewer_s_YlOrBr(), style_name="ADCN style"))

''

In [69]:
## Set shape of nodes according to database values
database_values = ["kegg", "string", "wiki", "kegg;string", "kegg;wiki", "wiki;string", "kegg;wiki;string"]
mapping_shapes = ["HEXAGON", "ROUND_RECTANGLE", "VEE", "PARALLELOGRAM", "DIAMOND", "ELLIPSE", "TRIANGLE"]

p4c.set_node_shape_mapping(table_column="database", table_column_values=database_values,
                            shapes=mapping_shapes, default_shape="RECTANGLE", style_name="ADCN style")

''

In [70]:
## Set ADCN style for the two networks
p4c.set_visual_style("ADCN style", merged_network_suid)
p4c.set_visual_style("ADCN style", merged_network_expanded_suid)

{'message': 'Visual Style applied.'}

## Analyse the networks
Analyse the merged network and its expanded version with Cytoscape's built-in network analysis.

In [71]:
## Analyse network
p4c.set_current_network(merged_network_suid)
analysis = p4c.analyze_network()
analysis_df = pd.DataFrame.from_dict(analysis, 'index')
print(analysis_df)

                                                                0
networkTitle    Merged AD STRING KEGG Wiki network cutoff 0.4 ...
nodeCount                                                     688
edgeCount                                                   12617
avNeighbors                                    37.036334913112164
diameter                                                        9
radius                                                          5
avSpl                                           3.054312396264523
cc                                             0.6021593099716364
density                                        0.0586017957485952
heterogeneity                                  0.9214585711397336
centralization                                 0.2269253144496379
ncc                                                            49
time                                                        0.722


In [72]:
## Analyse expanded network
p4c.set_current_network(merged_network_expanded_suid)
analysis_expanded = p4c.analyze_network()
analysis_expanded_df = pd.DataFrame.from_dict(analysis_expanded, 'index')
print(analysis_expanded_df)

                                                                0
networkTitle    Merged AD STRING KEGG Wiki network cutoff 0.4 ...
nodeCount                                                    2688
edgeCount                                                  170610
avNeighbors                                    126.61171204774338
diameter                                                        6
radius                                                          3
avSpl                                          2.3028857020381124
cc                                            0.44976685101361424
density                                       0.04724317613721768
heterogeneity                                  0.8711902339774741
centralization                                 0.3141852328503062
ncc                                                             8
time                                                       10.565
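
To compare the two analyses side by side, a few metrics can be put next to each other. The sketch below uses values copied from the two printed outputs and assumes they are available as numbers; `compare_metrics` is a hypothetical helper, not a py4cytoscape function.

```python
def compare_metrics(before, after, keys):
    """Return {key: (before, after, after / before)} for the given keys."""
    return {k: (before[k], after[k], after[k] / before[k]) for k in keys}


# Values copied from the two analysis outputs in this notebook
merged_stats = {"nodeCount": 688, "edgeCount": 12617, "ncc": 49}
expanded_stats = {"nodeCount": 2688, "edgeCount": 170610, "ncc": 8}

comparison = compare_metrics(
    merged_stats, expanded_stats, ["nodeCount", "edgeCount", "ncc"]
)
for key, (before, after, ratio) in comparison.items():
    print(f"{key}: {before} -> {after} (x{ratio:.2f})")
```

The node count grows by exactly the 2000 nodes requested in the expansion, while the number of connected components (`ncc`) drops from 49 to 8.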
