# Data gathering

In this part, we gather as much proteomics data as possible from the published literature and public databases. The main sources are:

1. CyanoMapDB: a database of cyanobacterial protein-protein interactions (PPIs) with experimental evidence, comprising 52,304 PPIs among 6,789 proteins from 23 cyanobacterial species. It collects the available data from UniProt, STRING, and IntAct, and mines additional PPIs from co-fractionation MS (CF-MS) data in cyanobacteria.
2. "Native Protein Complexes in Synechocystis sp. PCC 6803" and "Comparative Network Biology Discovers Protein Complexes That Underline Cellular Differentiation in Anabaena sp.": these two papers describe how to construct protein complexes from interactome data with a clustering method. We will apply the same clustering method to the CyanoMapDB interactome data for S. elongatus PCC 7942.
3. Co-fractionation proteomics data for S. elongatus PCC 7942, reported in the paper "Monitoring light/dark association dynamics of multi-protein complexes in cyanobacteria using size exclusion chromatography-based proteomics". CyanoMapDB includes this dataset and uses it to predict the interactome.

In [1]:
import pandas as pd
import os, sys, re
from pathlib import Path
home = str(Path.home())

In [2]:
work_dir = home + "/Dropbox/PNNL/PredPheno/SystemModeling/Modeling/S_elongatus"
# work_dir

In [3]:
# Both sheets live in the same CyanoMapDB workbook
dataset_file = work_dir + "/data/interactome/Synechococcus_PCC_7942_Dataset.xlsx"
proteome = pd.read_excel(dataset_file, sheet_name="Protein")
interactome = pd.read_excel(dataset_file, sheet_name="PPI")


In [4]:
interactome

Unnamed: 0,Protein A,Protein B,Taxon,UniProt evidence,STRING score,IntAct score,IntAct method,GS complex evidence,CF-MS score,CF-MS ID A,CF-MS ID B,PPI index
0,O05161,P16954,Synechococcus elongatus (strain PCC 7942 / FAC...,,,,,,0.651,1G6TR,1G1IW,0.651
1,O05161,Q31KN7,Synechococcus elongatus (strain PCC 7942 / FAC...,,,,,,0.595,1G6TR,1G152,0.595
2,O05161,Q31LJ5,Synechococcus elongatus (strain PCC 7942 / FAC...,,,,,,0.694,1G6TR,1G2EH,0.694
3,O05161,Q31LM9,Synechococcus elongatus (strain PCC 7942 / FAC...,,,,,,0.639,1G6TR,1G4ZP,0.639
4,O05161,Q31N38,Synechococcus elongatus (strain PCC 7942 / FAC...,,,,,,0.647,1G6TR,1GCIT,0.647
...,...,...,...,...,...,...,...,...,...,...,...,...
4529,Q8GMR7,Q99QJ5,Synechococcus elongatus (strain PCC 7942 / FAC...,,,,,,0.598,1G0PI,1G030,0.598
4530,Q8GMT0,Q935Z3,Synechococcus elongatus (strain PCC 7942 / FAC...,,,,,,0.849,1G6TX,1G1IA,0.849
4531,Q8KPQ0,Q9L4P3,Synechococcus elongatus (strain PCC 7942 / FAC...,,,,,,0.675,1G0ND,1G126,0.675
4532,Q8KPU9,Q8L1E5,Synechococcus elongatus (strain PCC 7942 / FAC...,,,,,,0.696,1G15V,1G1X2,0.696


In [5]:
# ClusterONE edge list: protein A, protein B, edge weight (no header, no index column)
cl1_input = interactome[["Protein A", "Protein B", "PPI index"]]
input_file = work_dir + "/data/interactome/ClusterONE_input.tsv"
cl1_input.to_csv(input_file, sep='\t', index=False, header=False)

In [6]:
import subprocess
cl1_jar = home + "/Dropbox/PNNL/PredPheno/SystemModeling/tools/cluster_one-1.2.jar"
output_file = work_dir + "/data/interactome/ClusterONE_output.tsv"
output_file_csv = work_dir + "/data/interactome/ClusterONE_output.csv"
# Run ClusterONE twice on the same edge list: once for plain output, once for CSV
for out_path, out_format in [(output_file, "plain"), (output_file_csv, "csv")]:
    with open(out_path, 'w') as output:
        subprocess.run(["java", "-jar", cl1_jar, "-f", "edge_list", "-F", out_format, input_file], stdout=output)

Loaded graph with 679 nodes and 4534 edges
Detected 129 complexes
Loaded graph with 679 nodes and 4534 edges
Detected 129 complexes
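
As a quick cross-check of the graph size reported in the log, the node and edge counts can be recomputed directly from the edge list. This is a minimal sketch, assuming the three-column, tab-separated layout written in the earlier cell; `graph_size` is a hypothetical helper and not part of ClusterONE, and it treats edges as undirected.

```python
import csv


def graph_size(edge_list_path):
    """Return (number of distinct proteins, number of distinct undirected edges)."""
    nodes, edges = set(), set()
    with open(edge_list_path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if not row:  # skip blank lines
                continue
            prot_a, prot_b = row[0], row[1]
            nodes.update((prot_a, prot_b))
            # frozenset makes (A, B) and (B, A) count as the same edge
            edges.add(frozenset((prot_a, prot_b)))
    return len(nodes), len(edges)
```

Calling `graph_size(input_file)` should give numbers comparable to the "Loaded graph with ..." line, provided the edge list has no duplicate pairs.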


In [7]:
sys.path.append("./data/interactome")
import utils3 as util3

In [8]:
import GoldStandard as GS
# Evaluating predicted clusters
pred_clusters = GS.Clusters(False)
pred_clusters.read_file(output_file)


Average size of predicted complexes is: 8.410852713178295
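
The average size can also be recomputed without the `GoldStandard` module. The sketch below is not the `GS.Clusters` implementation; it assumes that the plain-format output holds one complex per line with whitespace-separated member IDs, and both function names are hypothetical.

```python
from pathlib import Path
from statistics import mean
from typing import List


def read_plain_clusters(path: str) -> List[List[str]]:
    """Read one complex per line; members are assumed to be whitespace-separated."""
    clusters = []
    for line in Path(path).read_text().splitlines():
        members = line.split()
        if members:  # ignore blank lines
            clusters.append(members)
    return clusters


def average_complex_size(clusters: List[List[str]]) -> float:
    """Mean number of proteins per complex."""
    return mean(len(members) for members in clusters)
```

For example, `average_complex_size(read_plain_clusters(output_file))` can be compared with the value printed above.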


In [29]:
network_df = pd.read_csv(input_file, sep='\t', header=None)
network_df.shape

(4534, 3)

In [30]:
network_df.columns = ["ProtA", "ProtB", "Score"]

In [36]:
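edge_idx = []
clusters = pred_clusters
for complex_id in clusters.complexes:  # avoid shadowing the built-in `complex`
    prots = set(clusters.complexes[complex_id])
    # Keep edges whose two endpoints both belong to this complex
    in_complex = network_df["ProtA"].isin(prots) & network_df["ProtB"].isin(prots)
    edge_idx += network_df.index[in_complex].to_list()
# An edge can fall inside several overlapping complexes; keep each one once
edge_idx = sorted(set(edge_idx))
complexes_df = network_df.iloc[edge_idx, :].copy()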

In [37]:
complexes_df

Unnamed: 0,ProtA,ProtB,Score
3,O05161,Q31LM9,0.639
7,O05161,Q935Z3,0.675
11,O06865,P43087,0.811
13,O06865,Q31KE9,0.716
19,O06865,Q31N85,0.693
...,...,...,...
4528,Q8GMR4,Q8GMR7,0.751
4529,Q8GMR7,Q99QJ5,0.598
4530,Q8GMT0,Q935Z3,0.849
4531,Q8KPQ0,Q9L4P3,0.675


In [10]:
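# Select columns by position so this works whether or not the columns have been renamed
network_df["network"] = network_df.iloc[:, 0].astype(str) + '\t' + network_df.iloc[:, 1].astype(str) + '\t' + network_df.iloc[:, 2].astype(str)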

In [11]:
network = network_df["network"].to_list()

In [12]:
clust_json, clust_edges, clust_nodes = util3.clusters_to_json(pred_clusters, network)

In [13]:
clust_js = util3.json_to_cy_js("clust_cy", clust_json)
# Container element that the generated Cytoscape.js code renders into
clust_cy_div = """<div id='clust_cy' style="width: 100%; height: 500px; background: #f0f0f0;"></div>"""
# clust_js

In [18]:
from IPython.display import HTML, display, Javascript
display(HTML(clust_cy_div))
Javascript(clust_js)

<IPython.core.display.Javascript object>

In [21]:
import json
import ipycytoscape
import ipywidgets as widgets
cytoscapeobj = ipycytoscape.CytoscapeWidget()
net_json = json.loads(clust_json)

In [23]:
net_json

[{'group': 'edges',
  'data': {'source': 'Q31R70_0', 'target': 'Q55357_0', 'score': 0.5}},
 {'group': 'edges',
  'data': {'source': 'Q31R70_0', 'target': 'Q31LU5_0', 'score': 0.666}},
 {'group': 'edges',
  'data': {'source': 'Q31R70_0', 'target': 'Q31RZ4_0', 'score': 0.782}},
 {'group': 'edges',
  'data': {'source': 'Q31R70_0', 'target': 'Q31NZ1_0', 'score': 0.5}},
 {'group': 'edges',
  'data': {'source': 'Q31R70_0', 'target': 'Q31RZ5_0', 'score': 0.5}},
 {'group': 'edges',
  'data': {'source': 'Q31R70_0', 'target': 'Q31PC1_0', 'score': 0.5}},
 {'group': 'edges',
  'data': {'source': 'Q55357_0', 'target': 'Q31LU5_0', 'score': 0.5}},
 {'group': 'edges',
  'data': {'source': 'Q55357_0', 'target': 'Q31RZ4_0', 'score': 0.5}},
 {'group': 'edges',
  'data': {'source': 'Q55357_0', 'target': 'Q31NZ1_0', 'score': 0.5}},
 {'group': 'edges',
  'data': {'source': 'Q55357_0', 'target': 'Q31RZ5_0', 'score': 0.758}},
 {'group': 'edges',
  'data': {'source': 'Q55357_0', 'target': 'Q31PC1_0', 'score': 

In [24]:
cytoscapeobj.graph.add_graph_from_json(net_json)

TypeError: list indices must be integers or slices, not str

In [28]:
net_json['nodes']

TypeError: list indices must be integers or slices, not str
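
Both errors come from the same cause: `net_json` is a flat list of Cytoscape.js elements, while the code indexes it by the string key `'nodes'`. One possible fix is to regroup the list into a dictionary with `'nodes'` and `'edges'` keys before passing it on. The sketch below assumes that every element carries a `'data'` dict, that edge elements are marked with `'group': 'edges'`, and that node elements, if present, carry an `'id'`; `group_elements` is a hypothetical helper, not part of `utils3` or ipycytoscape.

```python
from typing import Any, Dict, List


def group_elements(elements: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Split a flat Cytoscape.js element list into {'nodes': [...], 'edges': [...]}."""
    grouped: Dict[str, List[Dict[str, Any]]] = {"nodes": [], "edges": []}
    seen_nodes = set()
    for element in elements:
        data = element["data"]
        if element.get("group") == "edges":
            grouped["edges"].append({"data": data})
        else:
            grouped["nodes"].append({"data": data})
            seen_nodes.add(data.get("id"))
    # Create a bare node for any edge endpoint without an explicit node entry
    for edge in grouped["edges"]:
        for key in ("source", "target"):
            node_id = edge["data"][key]
            if node_id not in seen_nodes:
                grouped["nodes"].append({"data": {"id": node_id}})
                seen_nodes.add(node_id)
    return grouped
```

With this in place, `cytoscapeobj.graph.add_graph_from_json(group_elements(net_json))` would receive the dictionary shape that the failing `['nodes']` lookup suggests it expects.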