# Spinocerebellar ataxia consensus network automated workflow
This notebook provides a workflow for creating a consensus protein-protein interaction network with the *py4cytoscape* library. The library uses the *CyREST* framework to communicate with Cytoscape, where the commands are executed, so the **Cytoscape application must be open and running** when executing this notebook. Also note the number of proteins imported from STRING: 2000 proteins are imported, plus an additional 2000 for the expansion, so running the whole workflow can take up to thirty minutes.

## Setup

In [1]:
pip install py4cytoscape

Note: you may need to restart the kernel to use updated packages.


In [2]:
## Imports
import py4cytoscape as p4c
import pandas as pd

In [3]:
## Execute to check whether contact with the Cytoscape app via CyREST works 
p4c.cytoscape_version_info()

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.10.1',
 'automationAPIVersion': '1.9.0',
 'py4cytoscapeVersion': '1.9.0'}

## Load networks
Three networks are loaded: first from KEGG, then from Wikipathways, and lastly from STRING. The desired STRING confidence cutoff must be set at the start, because it determines the number of edges in the networks.

### Load KEGG

In [5]:
## Set string confidence in a variable with 0 < confidence <= 1.0 ##
string_confidence = 0.4
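
Because the cutoff is interpolated into several commands later on, it can help to validate it once up front. The following is a minimal sketch, assuming the confidence is meant to lie in the half-open interval (0, 1]; `validate_confidence` is a hypothetical helper, not part of py4cytoscape.

```python
def validate_confidence(confidence: float) -> float:
    """Return the confidence unchanged if it lies in (0, 1], otherwise raise."""
    if not 0.0 < confidence <= 1.0:
        raise ValueError(f"STRING confidence must be in (0, 1], got {confidence}")
    return confidence


# Same value as used in this notebook
string_confidence = validate_confidence(0.4)
print(string_confidence)
```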

In [6]:
## Load KEGG network ##
p4c.networks.import_network_from_file("sca_hsa05017.gml")
p4c.layout_network('force-directed', network="current") # the gml file does not contain any layout information so layout has to be set
p4c.rename_network("KEGG hsa05017 network")

{'network': 21981, 'title': 'KEGG hsa05017 network'}

In [7]:
## Create a Stringified version of KEGG network ##
cmd_list = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
            'compoundQuery="false"', 'includeNotMapped="false"', 
            'networkNoGui="KEGG hsa05017 network"', 'species="Homo sapiens"']
cmd = " ".join(cmd_list)
p4c.commands.commands_run(cmd) 
p4c.rename_network("KEGG hsa05017 network stringified")

{'network': 26235, 'title': 'KEGG hsa05017 network stringified'}

In [8]:
## Set cutoff ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="KEGG hsa05017 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']
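
The STRING commands in this notebook are all assembled by joining a list of `key=value` fragments by hand. A small helper could build the same strings from keyword arguments. This is only a sketch: `build_command` is a hypothetical name, and the quoting rule (quote strings, leave numbers bare) is an assumption that mirrors how the cells above are written.

```python
def build_command(command: str, **arguments) -> str:
    """Join a command name and its arguments into one Cytoscape command string."""
    parts = [command]
    for key, value in arguments.items():
        if isinstance(value, str):
            parts.append(f'{key}="{value}"')  # quote string values, as in the cells above
        else:
            parts.append(f"{key}={value}")    # leave numeric values unquoted
    return " ".join(parts)


# Rebuild the cutoff command used for the stringified KEGG network
change_cutoff_cmd = build_command(
    "string change confidence",
    confidence=0.4,
    network="KEGG hsa05017 network stringified",
)
print(change_cutoff_cmd)
```

The resulting string could then be passed to `p4c.commands.commands_run` exactly as in the cell above.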

### Load Wikipathways 

In [42]:
## Load Wikipathways network ##
cmd_list = ['wikipathways','import-as-network','id="WP4760"']
cmd = " ".join(cmd_list)
p4c.commands.commands_get(cmd) 
p4c.rename_network("Wikipathways WP4760 network")

{'network': 40096, 'title': 'Wikipathways WP4760 network'}

In [43]:
## Clone and rename Wikipathways network ##
p4c.clone_network("Wikipathways WP4760 network")
p4c.rename_network(title="Wikipathways WP4760 network uniprot", network="Wikipathways WP4760 network")

{'network': 40096, 'title': 'Wikipathways WP4760 network uniprot'}

In [48]:
## Add uniprot columns to Wikipathways network ##
p4c.load_table_data_from_file("wikipathway_to_uniprot_sca.xlsx", first_row_as_column_names=True,
                              data_key_column_index=1, network="Wikipathways WP4760 network uniprot")

{'mappedTables': [40067, 40105]}

In [49]:
## Remove nodes without uniprot ID ##
p4c.create_column_filter(filter_name="Columns without uniprot", 
                         column="uniprot", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without uniprot", network="Wikipathways WP4760 network uniprot")
p4c.delete_selected_nodes("Wikipathways WP4760 network uniprot")

No edges selected.
No edges selected.


{'nodes': [40323,
  40224,
  40341,
  40206,
  40239,
  40312,
  40338,
  40347,
  40332,
  40326,
  40215,
  40301,
  40287,
  40329,
  40344,
  40242,
  40218,
  40298,
  40335,
  40254],
 'edges': [40431,
  40303,
  40368,
  40305,
  40434,
  40371,
  40307,
  40437,
  40309,
  40374,
  40440,
  40377,
  40314,
  40443,
  40380,
  40316,
  40446,
  40318,
  40383,
  40320,
  40386,
  40389,
  40392,
  40395,
  40398,
  40401,
  40404,
  40407,
  40410,
  40413,
  40350,
  40416,
  40353,
  40289,
  40419,
  40291,
  40293,
  40422,
  40295,
  40425,
  40362,
  40428,
  40365]}

In [50]:
## Create a Stringified version of Wikipathways network ##
cmd_list_stringify_wp = ['string', 'stringify', 'colDisplayName="name"', 'column="uniprot"',
                        'compoundQuery="false"', 'includeNotMapped="false"', 
                        'networkNoGui="Wikipathways WP4760 network uniprot"', 'species="Homo sapiens"']
cmd_stringify_wp = " ".join(cmd_list_stringify_wp)
p4c.commands.commands_run(cmd_stringify_wp)
p4c.rename_network("Wikipathways WP4760 network stringified")

{'network': 41686, 'title': 'Wikipathways WP4760 network stringified'}

In [51]:
## Set cutoff to string confidence ##
change_cutoff_cmd_list = ['string change confidence', f'confidence={string_confidence}', 'network="Wikipathways WP4760 network stringified"']
change_cutoff_cmd = " ".join(change_cutoff_cmd_list)
p4c.commands.commands_run(change_cutoff_cmd)

['']

### Load STRING

In [52]:
## Load STRING network ##
string_cmd_list = ['string disease query','disease="doid:0050954"', 'species="Homo sapiens"', 'limit=2000', f'cutoff={string_confidence}'] # 2000 proteins takes quite long
string_cmd = " ".join(string_cmd_list)
p4c.commands.commands_run(string_cmd)
p4c.rename_network(f"STRING 0050954 network C{string_confidence}")

{'network': 42582, 'title': 'STRING 0050954 network C0.4'}
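
The STRING network title depends on the cutoff and is rebuilt with an f-string in several later cells. One way to keep those titles in sync is to generate the name in a single place. A minimal sketch, where `string_network_name` is a hypothetical helper:

```python
def string_network_name(disease_id: str, confidence: float) -> str:
    """Build the STRING network title used in this notebook."""
    return f"STRING {disease_id} network C{confidence}"


string_network_title = string_network_name("0050954", 0.4)
print(string_network_title)  # STRING 0050954 network C0.4
```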

In [54]:
## Check whether networks are imported and store suids ##
network_names = p4c.get_network_list()
kegg_suid = p4c.get_network_suid("KEGG hsa05017 network stringified")
wiki_suid = p4c.get_network_suid("Wikipathways WP4760 network stringified")
string_suid = p4c.get_network_suid(f"STRING 0050954 network C{string_confidence}")

print(network_names)
print(kegg_suid, wiki_suid, string_suid)

['Wikipathways WP4760 network uniprot', 'Wikipathways WP4760 network stringified', 'STRING 0050954 network C0.4', 'KEGG hsa05017 network stringified', 'Wikipathways WP4760 network_1', 'KEGG hsa05017 network']
26235 41686 42582


## Add database columns to tables
This step adds an extra column to every node table that indicates which database the data comes from. This makes it easier to find the overlap between the databases in the merged network.

In [55]:
## Create dataframe with new column that indicates which database the data originates from ##
kegg_nodes = p4c.get_all_nodes(kegg_suid)
kegg_label_df = pd.DataFrame(data=kegg_nodes, columns=["shared name"])
kegg_label_df["kegg_db"] = 1 # Column value will be 1 for all data in kegg network

wiki_nodes = p4c.get_all_nodes(wiki_suid)
wiki_label_df = pd.DataFrame(data=wiki_nodes, columns=["shared name"])
wiki_label_df["wiki_db"] = 1 # Column value will be 1 for all data in wikipathways network

string_04_nodes = p4c.get_all_nodes(string_suid)
string_04_label_df = pd.DataFrame(data=string_04_nodes, columns=["shared name"])
string_04_label_df["string_db"] = 1 # Column value will be 1 for all data in string network
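
The three blocks above follow the same pattern: one row per node, plus a constant indicator column. The pattern can be sketched as a single function. `label_records` is a hypothetical helper and the node names are placeholders; in the notebook they would come from `p4c.get_all_nodes(...)`.

```python
def label_records(node_names, label_column):
    """Build one record per node with a constant indicator value of 1."""
    return [{"shared name": name, label_column: 1} for name in node_names]


# Placeholder node names, for illustration only
example_nodes = ["node_a", "node_b"]
kegg_records = label_records(example_nodes, "kegg_db")
print(kegg_records)
```

Passing such a list of records to `pd.DataFrame(...)` should give the same two-column layout as the frames built above.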

In [56]:
## Merge dataframes with tables in networks ##
p4c.set_current_network(kegg_suid)
p4c.load_table_data(kegg_label_df, data_key_column="shared name", network=kegg_suid)
p4c.delete_table_column("row.names") 

p4c.set_current_network(wiki_suid)
p4c.load_table_data(wiki_label_df, data_key_column="shared name", network=wiki_suid)
p4c.delete_table_column("row.names")

p4c.set_current_network(string_suid)
p4c.load_table_data(string_04_label_df, data_key_column="shared name", network=string_suid)
p4c.delete_table_column("row.names")

''

## Remove rows without canonical name
To ensure that the merge will not give conflicting results because some of the column values are empty, all rows that do not have a Uniprot identifier (canonical name) will be removed.

In [57]:
## Create filter for entries that don't have a stringdb canonical name (uniprot ID). Delete selected rows in each network. ##
p4c.create_column_filter(filter_name="Columns without canonical name", 
                         column="stringdb::canonical name", 
                         criterion="", predicate="DOES_NOT_CONTAIN") 

# Apply filter and delete nodes for KEGG
p4c.apply_filter("Columns without canonical name", network=kegg_suid)
p4c.delete_selected_nodes(kegg_suid)

# Apply filter and delete nodes for Wikipathways
p4c.apply_filter("Columns without canonical name", network=wiki_suid)
p4c.delete_selected_nodes(wiki_suid)

# Apply filter and delete nodes for STRING networks
p4c.apply_filter("Columns without canonical name", network=string_suid)
p4c.delete_selected_nodes(string_suid)

No edges selected.
No nodes selected.
No edges selected.
No nodes selected.
No edges selected.
No edges selected.


{'nodes': [42866,
  43460,
  43529,
  42824,
  44228,
  44612,
  43520,
  43913,
  42767,
  43871,
  44696,
  43103,
  42878,
  43988,
  43685,
  44936,
  43670,
  44090,
  42965,
  44891,
  44420,
  42884,
  44486,
  42911,
  43421,
  44441,
  44060,
  44021,
  44549,
  44378,
  44075,
  42905,
  43433,
  44153,
  44132,
  44003,
  42875,
  43571,
  43535,
  44165,
  43349,
  43442,
  44141,
  44453,
  43892],
 'edges': [73976,
  91127,
  65786,
  82685,
  89090,
  45569,
  52739,
  87299,
  88328,
  79367,
  48398,
  73487,
  89105,
  89108,
  87830,
  71957,
  70421,
  51734,
  72983,
  89114,
  77081,
  68894,
  49694,
  89117,
  89120,
  64799,
  88097,
  88100,
  89123,
  88103,
  89129,
  63275,
  89132,
  47915,
  89135,
  89138,
  49202,
  84788,
  88118,
  54581,
  89141,
  88124,
  78140,
  88127,
  88130,
  49217,
  82499,
  88133,
  59462,
  77384,
  90695,
  88139,
  63056,
  78674,
  53843,
  92243,
  87635,
  83030,
  92246,
  51287,
  88919,
  59480,
  54617,
  92249,


## Network merging

In [59]:
## Merge the three networks ##
network_names = ['KEGG hsa05017 network stringified', f'STRING 0050954 network C{string_confidence}', 'Wikipathways WP4760 network stringified']
node_keys_list = ["stringdb::canonical name", "stringdb::canonical name", "stringdb::canonical name"]
merged_network_name = f"Merged STRING KEGG Wiki network cutoff {string_confidence}"

## This try except structure is to catch a type error in the py4cytoscape library
try: 
    p4c.merge_networks(sources=network_names, 
                       title=merged_network_name, 
                       node_keys=node_keys_list)
except TypeError:
    print(p4c.get_network_list())

merged_network_suid = p4c.get_network_suid(merged_network_name)
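
Since the `try`/`except` above deliberately swallows the `TypeError`, a failed merge could go unnoticed until a later cell breaks. A hedged sketch of an explicit check on the network list follows; `require_network` is a hypothetical helper and the example names are placeholders.

```python
def require_network(name, available_names):
    """Return the name if the network exists, otherwise raise a descriptive error."""
    if name not in available_names:
        raise RuntimeError(
            f"Network {name!r} was not created; available: {sorted(available_names)}"
        )
    return name


# Placeholder list; in the notebook this would be p4c.get_network_list()
available = ["Merged STRING KEGG Wiki network cutoff 0.4", "KEGG hsa05017 network"]
checked_name = require_network("Merged STRING KEGG Wiki network cutoff 0.4", available)
print(checked_name)
```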

## Filtering database columns
To check the overlap between the databases, each node in the merged network is given a `database` label that lists the databases it originates from.

In [61]:
## Create dataframe with the column indicators ##
database_filter_df = p4c.get_table_columns('node', columns=["name", "kegg_db", "wiki_db", "string_db"], network=merged_network_suid)
database_filter_df["database"] = "" # Create new column with empty string values

## Give the database column a value based on which databases the nodes are from
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "kegg;wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki;string"
database_filter_df.loc[(database_filter_df['kegg_db'] == 1) & (database_filter_df["string_db"] != 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "kegg"
database_filter_df.loc[((database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] != 1)) &
                    (database_filter_df["wiki_db"] == 1), "database"] = "wiki"
database_filter_df.loc[(database_filter_df['kegg_db'] != 1) & (database_filter_df["string_db"] == 1) &
                    (database_filter_df["wiki_db"] != 1), "database"] = "string"

print(database_filter_df.groupby("database")["database"].count()) # Give the distribution of nodes in the merged network

p4c.set_current_network(merged_network_suid)
p4c.load_table_data(database_filter_df, data_key_column="name") # Add the database filter column to the network
p4c.delete_table_column("row.names")

database
kegg                 80
kegg;string          49
kegg;wiki             4
kegg;wiki;string      8
string              630
wiki                  7
wiki;string           2
Name: database, dtype: int64


''
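
The seven `.loc` assignments above enumerate every combination of the three indicator columns. The same labels can be derived by joining the names of the databases whose indicator equals 1. This is a sketch of that idea, not the code that was run; `database_label` is a hypothetical helper.

```python
def database_label(kegg_db, wiki_db, string_db) -> str:
    """Join the names of all databases whose indicator equals 1."""
    flags = (("kegg", kegg_db), ("wiki", wiki_db), ("string", string_db))
    # Missing values (None or NaN) never compare equal to 1, so they are skipped
    return ";".join(name for name, flag in flags if flag == 1)


print(database_label(1, 1, 1))             # kegg;wiki;string
print(database_label(1, float("nan"), 1))  # kegg;string
print(database_label(None, 1, None))       # wiki
```

Applied row by row, for example with `database_filter_df.apply(..., axis=1)`, this should produce the same `database` column as the cell above.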

## Network expansion
Expand a clone of the merged network with another 2000 nodes from STRING.

In [62]:
## Expand Merged STRING_0.4 KEGG Wiki network ##
p4c.clone_network(merged_network_suid) # Creates a clone of the merged network, with "_1" added to the name
expansion_cmd_list = ['string expand', 'additionalNodes=2000', f'network="{merged_network_name}_1"', 
                      'nodeTypes="Homo sapiens"', 'selectivityAlpha=0.5']
expansion_cmd = " ".join(expansion_cmd_list)
p4c.commands.commands_run(expansion_cmd)
p4c.rename_network(f"{merged_network_name} expanded")
merged_network_expanded_suid = p4c.get_network_suid(f"{merged_network_name} expanded")
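
The expansion cell relies on the clone being named with a `_1` suffix. A more defensive option is to compare the network list before and after cloning and pick up whatever name was added. A minimal sketch, with `new_network_names` as a hypothetical helper and placeholder network names:

```python
def new_network_names(before, after):
    """Return names present after an operation but not before, in listed order."""
    seen = set(before)
    return [name for name in after if name not in seen]


# Placeholder lists; in the notebook both would come from p4c.get_network_list(),
# taken immediately before and after p4c.clone_network(...)
before = ["Merged network"]
after = ["Merged network", "Merged network_1"]
print(new_network_names(before, after))  # ['Merged network_1']
```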

## Visualization
Further visualization can be done in Cytoscape itself; this section only visualizes the database distribution and the nervous tissue confidence.

In [63]:
## Set color of nodes according to a gradient for the confidence that the protein is active in the nervous system
if "ADCN style" not in p4c.get_visual_style_names():
    p4c.create_visual_style("ADCN style")

p4c.set_current_network(merged_network_expanded_suid)
p4c.set_node_color_mapping(**p4c.gen_node_color_map('tissue::nervous system', p4c.palette_color_brewer_s_YlOrBr(), style_name="ADCN style"))

''

In [64]:
## Set shape of nodes according to database values
database_values = ["kegg", "string", "wiki", "kegg;string", "kegg;wiki", "wiki;string", "kegg;wiki;string"]
mapping_shapes = ["HEXAGON", "ROUND_RECTANGLE", "VEE", "PARALLELOGRAM", "DIAMOND", "ELLIPSE", "TRIANGLE"]

p4c.set_node_shape_mapping(table_column="database", table_column_values=database_values,
                            shapes=mapping_shapes, default_shape="RECTANGLE", style_name="ADCN style")

''
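
The shape mapping pairs the two lists by position, so they need to stay the same length and in the same order. As a sketch, the pairing can be checked and printed before it is sent to Cytoscape; `shape_by_database` is a name introduced here for illustration.

```python
database_values = ["kegg", "string", "wiki", "kegg;string", "kegg;wiki",
                   "wiki;string", "kegg;wiki;string"]
mapping_shapes = ["HEXAGON", "ROUND_RECTANGLE", "VEE", "PARALLELOGRAM", "DIAMOND",
                  "ELLIPSE", "TRIANGLE"]

# A length mismatch would silently misalign labels and shapes
if len(database_values) != len(mapping_shapes):
    raise ValueError("database_values and mapping_shapes must have the same length")

shape_by_database = dict(zip(database_values, mapping_shapes))
for value, shape in shape_by_database.items():
    print(f"{value:18s} -> {shape}")
```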

In [65]:
## Set ADCN style for the two networks
p4c.set_visual_style("ADCN style", merged_network_suid)
p4c.set_visual_style("ADCN style", merged_network_expanded_suid)

{'message': 'Visual Style applied.'}

## Analyse the networks
Analyse the merged and expanded networks with Cytoscape's network analysis tool.

In [66]:
## Analyse network
p4c.set_current_network(merged_network_suid)
analysis = p4c.analyze_network()
analysis_df = pd.DataFrame.from_dict(analysis, 'index')
print(analysis_df)

                                                                0
networkTitle    Merged STRING KEGG Wiki network cutoff 0.4 (un...
nodeCount                                                     780
edgeCount                                                   17121
avNeighbors                                     45.05151915455746
diameter                                                        6
radius                                                          3
avSpl                                          2.4476560916455234
cc                                             0.4826698757444982
density                                      0.059591956553647435
heterogeneity                                  1.0104566554841483
centralization                                0.30099162549493674
ncc                                                            22
time                                                        0.598
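
As a quick consistency check, the label counts printed in the database filtering step should add up to the `nodeCount` reported here. The sketch below copies the numbers from the two outputs and also totals the nodes per database; the variable names are introduced for illustration.

```python
# Counts copied from the printed label distribution of the merged network
label_counts = {
    "kegg": 80,
    "kegg;string": 49,
    "kegg;wiki": 4,
    "kegg;wiki;string": 8,
    "string": 630,
    "wiki": 7,
    "wiki;string": 2,
}
node_count = 780  # nodeCount from the analysis output

total = sum(label_counts.values())
print(total, total == node_count)  # 780 True

# A node labelled "kegg;string" counts once for kegg and once for string
per_database = {}
for label, count in label_counts.items():
    for database in label.split(";"):
        per_database[database] = per_database.get(database, 0) + count
print(per_database)  # {'kegg': 141, 'string': 689, 'wiki': 21}
```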


In [67]:
## Analyse expanded network
p4c.set_current_network(merged_network_expanded_suid)
analysis_expanded = p4c.analyze_network()
analysis_expanded_df = pd.DataFrame.from_dict(analysis_expanded, 'index')
print(analysis_expanded_df)

                                                                0
networkTitle    Merged STRING KEGG Wiki network cutoff 0.4 exp...
nodeCount                                                    2780
edgeCount                                                  208707
avNeighbors                                    150.64259927797835
diameter                                                        6
radius                                                          3
avSpl                                           2.152426881943331
cc                                            0.40181811772066695
density                                       0.05440325001010413
heterogeneity                                  0.8164134752615942
centralization                                 0.4244152852493649
ncc                                                            11
time                                                       17.448
