
# Setup

## Library import
We import all the required Python libraries.

The non-default libraries are NetworkX (https://networkx.org/) and py4cytoscape (https://py4cytoscape.readthedocs.io/, only needed if you wish to view the results in Cytoscape).

You will also need the skm-tools package provided in the same repository as this notebook. 

In [1]:
import sys
from pathlib import Path
from datetime import datetime
import networkx as nx

In [2]:
from collections import defaultdict

The following allows us to import functions from the skm-tools package. Note the relative path to the folder containing the "skm_tools" directory.


In [3]:
sys.path.append("../")
from skm_tools import load_networks, pss_utils

In [4]:
today = datetime.today().strftime('%Y.%m.%d'); today

'2024.07.30'

## Path and parameter definitions

In [5]:
base_dir = Path("./")
data_dir = base_dir / "data"
output_dir = base_dir / "output"

## Load PSS

To obtain the exact results of the article, download PSS-v1.0.0 from [skm.nib.si/downloads](https://skm.nib.si/downloads), and adjust the paths below accordingly.
Otherwise, this code will use the latest live PSS instance.


In [6]:
pss_edge_path = data_dir / f"rxn-edges-public-{today}.tsv"
pss_node_path = data_dir / f"rxn-nodes-public-{today}.tsv"

In [7]:
g = load_networks.pss_dinar_to_networkx(
    edge_path=pss_edge_path, 
    node_path=pss_node_path
)

print(f"\nNumber of nodes: {g.number_of_nodes()}\nNumber of edges: {g.number_of_edges()}")


Number of nodes: 895
Number of edges: 3236


## Remove deadend complexes

"Deadend" complexes are those created automatically when a binding/oligomerisation reaction is entered into PSS. They only make sense to include here if the complex takes part in some downstream interaction, since this projection already includes the substrate-substrate interaction.

The DiNAR projection does not include a complex if the binding/oligomerisation reaction is formulated as inhibition of the substrate(s) (rather than activation of the complex). This removes the majority of the problematic complexes. However, since (1) the inhibition or activation annotation is not always correctly included in the reaction formulation, and (2) the downstream interactions may not have been added yet, we check for deadend complexes and remove them anyway.

In [8]:
complexes = [n for n, data in g.nodes(data=True) if data["node_type"] == "Complex"]; complexes

['CML|HC-Pro',
 'EDS1|PAD4',
 'EDS1|MPK3|PAD4',
 'CRT|ETR',
 'HSP90|RAR1|SGT1',
 'EDS1|PAD4|SAG',
 'TORC1',
 'NDR1|RIN4',
 'ribosome',
 'NPR1|TGA2,5,6',
 'RANGAP|Rx',
 'GPAphid2|RANGAP',
 'OBE1|WRKY17',
 'OBE1|WRKY11',
 'CO|OBE1',
 'GST|SA',
 'BAK1|FLS2|flg22',
 'MKS1|WRKY33',
 'GA|GID1',
 'WRKY30|WRKY53',
 'SCL14|TGA2,5,6',
 'SCF',
 'GID|SCF',
 'D14|MAX2|SCF',
 'R-gene|potyvirus',
 'EIN3(like)|JAZ',
 'COI1|JA-Ile|SCF',
 'NPR1|NPR1',
 'EBF|SCF',
 'DELLA|GA|GID1',
 'RISC - vsiRNA',
 'CTR|ETR',
 'ETP|SCF',
 'CML|Ca2+',
 'WD/bHLH/MYB',
 'BAK1|BRI1',
 'SARD1|TCP8',
 'TCP8|WRKY28',
 'NAC019|TCP8',
 'NPR1|WRKY18',
 'CDK|NPR1',
 'CDK|WRKY18',
 'CDK|WRKY6',
 'CDK|TGA2,5,6',
 'PEPR1|PEP1',
 'BIK1|PEPR1|PEP1',
 'ET|ETR',
 'RISC - miR390',
 'GAI|GI',
 'GI|PIF4',
 'EIN3|NPR1',
 'CDK|RAP2-6',
 'RAP2-6|SNRK2']

In [9]:
# g is a directed graph, so g.neighbors(c) returns only successors:
# an empty result means the complex has no outgoing (downstream) edges
to_remove = []
for c in complexes:
    if len(list(g.neighbors(c))) == 0:
        print(c)
        to_remove.append(c)

CML|HC-Pro
EDS1|MPK3|PAD4
CRT|ETR
HSP90|RAR1|SGT1
NDR1|RIN4
RANGAP|Rx
GPAphid2|RANGAP
OBE1|WRKY17
OBE1|WRKY11
CO|OBE1
GST|SA
MKS1|WRKY33
WRKY30|WRKY53
R-gene|potyvirus
EIN3(like)|JAZ
DELLA|GA|GID1
SARD1|TCP8
TCP8|WRKY28
NAC019|TCP8
NPR1|WRKY18
CDK|NPR1
CDK|WRKY18
CDK|WRKY6
CDK|TGA2,5,6
BIK1|PEPR1|PEP1
ET|ETR
RISC - miR390
GAI|GI
GI|PIF4
CDK|RAP2-6
RAP2-6|SNRK2


In [10]:
g.remove_nodes_from(to_remove)
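
The same check can be tried on a small toy graph. This is only a sketch: the node names and the "PlantCoding" node type are invented for illustration, and `out_degree` is used in place of `neighbors` as an equivalent way of asking whether a node has outgoing edges.

```python
import networkx as nx

# Toy directed multigraph; names and node types are invented for illustration.
toy = nx.MultiDiGraph()
toy.add_node("A", node_type="PlantCoding")
toy.add_node("B", node_type="PlantCoding")
toy.add_node("C", node_type="PlantCoding")
toy.add_node("A|B", node_type="Complex")  # complex with a downstream edge
toy.add_node("A|C", node_type="Complex")  # complex with no downstream edge
toy.add_edge("A", "A|B")
toy.add_edge("B", "A|B")
toy.add_edge("A|B", "C")
toy.add_edge("A", "A|C")
toy.add_edge("C", "A|C")

# out_degree counts outgoing edges only, so 0 means "nothing downstream".
dead_ends = [
    n for n, data in toy.nodes(data=True)
    if data.get("node_type") == "Complex" and toy.out_degree(n) == 0
]
toy.remove_nodes_from(dead_ends)
print(dead_ends)
```

Removing a node also removes all edges attached to it, so no separate edge clean-up is needed.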

## Remove "uninteresting" metabolites, and connect all their neighbours

In [11]:
metabolites = [n for n, data in g.nodes(data=True) if data["node_type"] == "Metabolite"]; metabolites

['CA',
 'CA-CoA',
 'Ca2+',
 'IsoChor-9-Glu',
 'N-pyruvoyl-L-Glu',
 'SA',
 'GA',
 'DMAPP',
 'tRNA-adenine',
 'prenyl-tRNA',
 'cZ-riboside',
 'cZ-ribotide',
 'cZ',
 'UDP-Glc',
 'cZ-glucosides',
 'DZ-riboside',
 'DZ-ribotide',
 'tZ',
 'DZ',
 'iP',
 'L-Met',
 'SAMe',
 'ACC',
 'ET',
 'Thr',
 'Ile',
 'ALA',
 '13-HPOT',
 '12,13-EOT',
 'OPDA',
 'OPC8',
 'OPC8-CoA',
 'OPC6-CoA',
 'OPC4-CoA',
 'JA-CoA',
 'JA',
 'MeJA',
 'JA-Ile',
 '12-OH-JA-Ile',
 'MeSA',
 'Chor',
 'IsoChor',
 'Prep',
 'Phe',
 'BA',
 'SAG',
 'ROS',
 'O2',
 'e-',
 'Geranylgeranyl-PP',
 'ent-Copalyl-PP',
 'ent-Kaurene',
 'ent-Kaurenoic acid',
 'GA12',
 'GA9',
 'GA20',
 'GA8',
 'GA34',
 'GA-methylester',
 '&beta;-Carotene',
 '9-cis-&beta;-carotene',
 'CL',
 'CLA',
 'HMBDP',
 'iP-ribotide',
 'phospholipids',
 'DZ-glucosides',
 'tZ-riboside',
 'tZ-ribotide',
 'tZ-glucoside',
 'p-Coumaric acid',
 'iP-riboside',
 'iP-glucosides',
 'Cu2+',
 'Violaxanthin',
 '9-cis-Violaxanthin',
 'Abscisic aldehyde',
 'ABA',
 'Xanthoxin',
 'Antheraxanth

In [12]:
# List here the metabolites to keep as is
metabolites_to_keep = [
    'Ca2+',
    'SA',
    'GA',
    'cZ',
    'ET',
    'JA',
    'JA-Ile',
    'ROS',
    'O2',
    'Cu2+',
    'ABA',
    'SL',
    'IAA',
]


In [13]:
reaction_types_to_collapse = [
    "catalysis", 
    "translocation"
]

In [14]:
for metabolite in metabolites:
    if metabolite not in metabolites_to_keep:
        
        # find all neighbours that are: (not Metabolite) or (in metabolites_to_keep)
        targets = []
        edges = {}
        sources = []
        
        for target in g.successors(metabolite):
            e = g[metabolite][target][0]
            
            # right type of reaction
            if e['reaction_type'] in reaction_types_to_collapse:

                # right type of target node
                target_type = g.nodes()[target]['node_type']
                if target_type == "Metabolite":
                    if not (target in metabolites_to_keep):
                        continue

                edges[target] = {'interaction_type': 'activation', 
                                  'directed': 'True', 
                                  'reaction_type': 'catalysis', 
                                  'reaction_effect': 'activation', 
                                  'reaction_id': e['reaction_id'], 
                                  'source_edge_type': 'PRODUCT/SUBSTRATE', 
                                  'target_edge_type': 'PRODUCT/SUBSTRATE', 
                                  'note': f'metabolite collapse {metabolite}'
                                }
            
                targets.append(g.nodes()[target])

        for source in g.predecessors(metabolite):
            e = g[source][metabolite][0]
            
            # right type of reaction
            if e['reaction_type'] in reaction_types_to_collapse:

                # right type of source node
                source_type = g.nodes()[source]['node_type']
                if source_type == "Metabolite":
                    if not (source in metabolites_to_keep):
                        continue
            
                sources.append(g.nodes()[source])

        for source in sources:
            for target in targets:
                if source != target:
                    g.add_edge(source['name'], target['name'], **edges[target['name']])
        
        g.remove_node(metabolite)
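
The core of the collapse (connect every upstream neighbour to every downstream neighbour, then delete the intermediate node) can be seen in isolation on a toy graph. The sketch below is a simplification under stated assumptions: `collapse_node` is a hypothetical helper, the node names are invented, and it skips the reaction-type and node-type filtering performed in the cell above.

```python
import networkx as nx

def collapse_node(graph, node, edge_attrs=None):
    """Connect each predecessor of `node` to each successor, then drop `node`."""
    sources = [s for s in graph.predecessors(node) if s != node]
    targets = [t for t in graph.successors(node) if t != node]
    for s in sources:
        for t in targets:
            if s != t:  # do not create self-loops
                graph.add_edge(s, t, **(edge_attrs or {}))
    graph.remove_node(node)

# Hypothetical example: X is an intermediate we do not want to keep.
toy = nx.MultiDiGraph()
toy.add_edges_from([("E1", "X"), ("E2", "X"), ("X", "E3"), ("X", "E1")])
collapse_node(toy, "X", {"note": "metabolite collapse X"})
print(sorted(toy.edges()))
```

Because every source is paired with every target, the number of edges can grow after a collapse, which is consistent with the edge count rising while the node count falls in the next cell.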

In [15]:
print(f"\nNumber of nodes: {g.number_of_nodes()}\nNumber of edges: {g.number_of_edges()}")


Number of nodes: 749
Number of edges: 3546


In [16]:
# Exercise - remove self loops and duplicated edges?
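
One possible approach to this exercise, shown on a toy graph rather than on `g`. It assumes that two parallel edges count as duplicates when they share source, target and `reaction_id`; a different definition of "duplicate" may suit your analysis better.

```python
import networkx as nx

toy = nx.MultiDiGraph()
toy.add_edge("A", "B", reaction_id="rx1")
toy.add_edge("A", "B", reaction_id="rx1")  # exact duplicate
toy.add_edge("A", "B", reaction_id="rx2")  # parallel edge, different reaction
toy.add_edge("B", "B", reaction_id="rx3")  # self-loop

# 1. Drop self-loops. The list() call materialises the edges before removal,
#    so the graph is not modified while it is being iterated.
toy.remove_edges_from(list(nx.selfloop_edges(toy, keys=True)))

# 2. Drop parallel edges repeating the same (source, target, reaction_id).
seen = set()
duplicates = []
for u, v, key, data in toy.edges(keys=True, data=True):
    signature = (u, v, data.get("reaction_id"))
    if signature in seen:
        duplicates.append((u, v, key))
    else:
        seen.add(signature)
toy.remove_edges_from(duplicates)

print(toy.number_of_edges())
```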

# Cytoscape 

First open the Cytoscape application. Then the following cells will load the required library and make sure you can connect to the Cytoscape application.

More py4cytoscape documentation is here: https://py4cytoscape.readthedocs.io/

In [17]:
from skm_tools import cytoscape_utils

In [18]:
import py4cytoscape as p4c
p4c.cytoscape_ping()
p4c.cytoscape_version_info()

You are connected to Cytoscape!


{'apiVersion': 'v1',
 'cytoscapeVersion': '3.10.2',
 'automationAPIVersion': '1.9.0',
 'py4cytoscapeVersion': '1.9.0'}

We set the Cytoscape collection name for this notebook. 

In [19]:
COLLECTION = f"PSS DiNAR projection ({today})"
COLLECTION

'PSS DiNAR projection (2024.07.30)'

## Load the PSS network into Cytoscape

We load the network, set a visual style, and apply the CoSE layout.

With skm-tools, we provide a default style for PSS, colouring the nodes by pathway.

The call returns the Cytoscape ID (SUID) of the new network.

In [20]:
pss_network_suid = p4c.networks.create_network_from_networkx(g, title="Complete PSS", collection=COLLECTION)

Applying default style...
Applying preferred layout


In [21]:
# Apply a style
cytoscape_utils.apply_builtin_style(pss_network_suid, 'pss')
pss_network_suid

Applied PSS-default to 26914


26914

# Example path extraction

Before running this section, depending on the problem at hand, you may want to remove some nodes, such as those related to viruses or perhaps to abiotic stress.


First we identify the nodes of interest.

In [22]:
JA = [x for x,y in g.nodes(data=True) if y['name']=="JA"][0]
SA = [x for x,y in g.nodes(data=True) if y['name']=="SA"][0]
ROS = [x for x,y in g.nodes(data=True) if y['name']=="ROS"][0]

print(JA)
print(SA)
print(ROS)

JA
SA
ROS


## JA --> SA + JA --> ROS
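
The cells in this and the following two sections use `nx.all_shortest_paths`, which returns every path of minimal length rather than a single one. When several interchangeable nodes exist at more than one step, the number of returned paths multiplies, which is why the outputs below are long. A minimal sketch on an invented toy graph:

```python
import networkx as nx

toy = nx.DiGraph()
toy.add_edges_from([
    # two interchangeable first steps and two interchangeable second steps
    ("S", "a1"), ("S", "a2"),
    ("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"),
    ("b1", "T"), ("b2", "T"),
    # a longer detour, which is never returned
    ("S", "x"), ("x", "y"), ("y", "z"), ("z", "T"),
])

paths = list(nx.all_shortest_paths(toy, source="S", target="T"))
print(len(paths))  # 2 choices x 2 choices = 4 paths

# The graph is directed, so a path may exist in one direction only.
try:
    list(nx.all_shortest_paths(toy, source="T", target="S"))
    reverse_reachable = True
except nx.NetworkXNoPath:
    reverse_reachable = False
print(reverse_reachable)
```

If a search on the real network raises `NetworkXNoPath`, the target is simply not reachable from the source in that direction.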

In [23]:
JA_paths = [p for p in nx.all_shortest_paths(g, source=JA, target=SA)]
JA_paths += [p for p in nx.all_shortest_paths(g, source=JA, target=ROS)]
JA_paths

[['JA',
  'AT4G03400',
  'JA-Ile',
  'AT2G39940',
  'AT1G67090',
  'VPg',
  'AT1G02920',
  'SA'],
 ['JA',
  'AT2G46370',
  'JA-Ile',
  'AT2G39940',
  'AT1G67090',
  'VPg',
  'AT1G02920',
  'SA'],
 ['JA',
  'AT4G03400',
  'JA-Ile',
  'AT2G39940',
  'AT5G38430',
  'VPg',
  'AT1G02920',
  'SA'],
 ['JA',
  'AT2G46370',
  'JA-Ile',
  'AT2G39940',
  'AT5G38430',
  'VPg',
  'AT1G02920',
  'SA'],
 ['JA',
  'AT4G03400',
  'JA-Ile',
  'AT2G39940',
  'ATCG00490',
  'VPg',
  'AT1G02920',
  'SA'],
 ['JA',
  'AT2G46370',
  'JA-Ile',
  'AT2G39940',
  'ATCG00490',
  'VPg',
  'AT1G02920',
  'SA'],
 ['JA',
  'AT4G03400',
  'JA-Ile',
  'AT2G39940',
  'AT1G67090',
  'VPg',
  'AT1G02930',
  'SA'],
 ['JA',
  'AT2G46370',
  'JA-Ile',
  'AT2G39940',
  'AT1G67090',
  'VPg',
  'AT1G02930',
  'SA'],
 ['JA',
  'AT4G03400',
  'JA-Ile',
  'AT2G39940',
  'AT5G38430',
  'VPg',
  'AT1G02930',
  'SA'],
 ['JA',
  'AT2G46370',
  'JA-Ile',
  'AT2G39940',
  'AT5G38430',
  'VPg',
  'AT1G02930',
  'SA'],
 ['JA',
  'AT4G03400

## SA --> JA + SA --> ROS

In [24]:
SA_paths = [p for p in nx.all_shortest_paths(g, source=SA, target=JA)]
SA_paths += [p for p in nx.all_shortest_paths(g, source=SA, target=ROS)]
SA_paths

[['SA', 'AT1G20620', 'AT1G06290', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT1G20630', 'AT1G06290', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT4G35090', 'AT1G06290', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT1G20620', 'AT2G35690', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT1G20630', 'AT2G35690', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT4G35090', 'AT2G35690', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT1G20620', 'AT4G16760', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT1G20630', 'AT4G16760', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT4G35090', 'AT4G16760', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT1G20620', 'AT5G65110', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT1G20630', 'AT5G65110', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT4G35090', 'AT5G65110', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA'],
 ['SA', 'AT1G20620', 'AT1G06290', 'AT3G1

## ROS --> JA + ROS --> SA

In [25]:
ROS_paths = [p for p in nx.all_shortest_paths(g, source=ROS, target=JA)]
ROS_paths += [p for p in nx.all_shortest_paths(g, source=ROS, target=SA)]
ROS_paths

[['ROS',
  'AT1G02920',
  'SA',
  'AT1G20620',
  'AT1G06290',
  'AT3G06860',
  'AT1G04710',
  'AT2G30720',
  'JA'],
 ['ROS',
  'AT1G02930',
  'SA',
  'AT1G20620',
  'AT1G06290',
  'AT3G06860',
  'AT1G04710',
  'AT2G30720',
  'JA'],
 ['ROS',
  'AT1G17170',
  'SA',
  'AT1G20620',
  'AT1G06290',
  'AT3G06860',
  'AT1G04710',
  'AT2G30720',
  'JA'],
 ['ROS',
  'AT2G02930',
  'SA',
  'AT1G20620',
  'AT1G06290',
  'AT3G06860',
  'AT1G04710',
  'AT2G30720',
  'JA'],
 ['ROS',
  'AT2G47730',
  'SA',
  'AT1G20620',
  'AT1G06290',
  'AT3G06860',
  'AT1G04710',
  'AT2G30720',
  'JA'],
 ['ROS',
  'AT4G02520',
  'SA',
  'AT1G20620',
  'AT1G06290',
  'AT3G06860',
  'AT1G04710',
  'AT2G30720',
  'JA'],
 ['ROS',
  'AT1G02920',
  'SA',
  'AT1G20630',
  'AT1G06290',
  'AT3G06860',
  'AT1G04710',
  'AT2G30720',
  'JA'],
 ['ROS',
  'AT1G02930',
  'SA',
  'AT1G20630',
  'AT1G06290',
  'AT3G06860',
  'AT1G04710',
  'AT2G30720',
  'JA'],
 ['ROS',
  'AT1G17170',
  'SA',
  'AT1G20630',
  'AT1G06290',
  'AT3G068

# Visualise paths in Cytoscape

Now we're going to highlight the paths we identified in the network by applying style bypasses.



We set the colours here:

In [26]:
JA_COLOUR = "#66a61e"
SA_COLOUR = "#34858d"
ROS_COLOUR = "#dc1c1c"

We don't want to recolour already highlighted path elements, so we keep track of them here:

In [27]:
done_nodes, done_edges = [], []
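
The effect of this bookkeeping can be illustrated without Cytoscape. The sketch below is not the `cytoscape_utils.highlight_path` implementation: `assign_colours` is a hypothetical helper and the two paths are shortened, invented examples. It only shows the "first colour wins" idea, which makes the order of the following cells matter.

```python
def assign_colours(paths_by_colour):
    """Give each node and edge the colour of the first path that reaches it."""
    node_colour, edge_colour = {}, {}
    for colour, paths in paths_by_colour:
        for path in paths:
            for node in path:
                node_colour.setdefault(node, colour)  # keep an earlier colour
            for edge in zip(path, path[1:]):
                edge_colour.setdefault(edge, colour)
    return node_colour, edge_colour

# Hypothetical, shortened paths; the ROS group is processed first.
node_colour, edge_colour = assign_colours([
    ("#dc1c1c", [["ROS", "AT1G02920", "SA"]]),
    ("#66a61e", [["JA", "VPg", "AT1G02920", "SA"]]),
])
print(node_colour["SA"], node_colour["JA"])
```

Here the shared nodes keep the colour of the group processed first, and only the elements that were not yet coloured receive the second colour.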

In [28]:
for p in ROS_paths:
    done_nodes_now, done_edges_now = cytoscape_utils.highlight_path(p, ROS_COLOUR, skip_nodes=done_nodes, skip_edges=done_edges)
    done_nodes += done_nodes_now
    done_edges += done_edges_now
# Note - need to improve edge colouring

ROS AT1G02920
AT1G02920 SA
SA AT1G20620
AT1G20620 AT1G06290
AT1G06290 AT3G06860
AT3G06860 AT1G04710
AT1G04710 AT2G30720
AT2G30720 JA
ROS AT1G02930
AT1G02930 SA
SA AT1G20620
AT1G20620 AT1G06290
AT1G06290 AT3G06860
AT3G06860 AT1G04710
AT1G04710 AT2G30720
AT2G30720 JA
ROS AT1G17170
AT1G17170 SA
SA AT1G20620
AT1G20620 AT1G06290
AT1G06290 AT3G06860
AT3G06860 AT1G04710
AT1G04710 AT2G30720
AT2G30720 JA
ROS AT2G02930
AT2G02930 SA
SA AT1G20620
AT1G20620 AT1G06290
AT1G06290 AT3G06860
AT3G06860 AT1G04710
AT1G04710 AT2G30720
AT2G30720 JA
ROS AT2G47730
AT2G47730 SA
SA AT1G20620
AT1G20620 AT1G06290
AT1G06290 AT3G06860
AT3G06860 AT1G04710
AT1G04710 AT2G30720
AT2G30720 JA
ROS AT4G02520
AT4G02520 SA
SA AT1G20620
AT1G20620 AT1G06290
AT1G06290 AT3G06860
AT3G06860 AT1G04710
AT1G04710 AT2G30720
AT2G30720 JA
ROS AT1G02920
AT1G02920 SA
SA AT1G20630
AT1G20630 AT1G06290
AT1G06290 AT3G06860
AT3G06860 AT1G04710
AT1G04710 AT2G30720
AT2G30720 JA
No more nodes to colour
No more nodes to colour
No more nodes to colo

In [29]:
for p in JA_paths:
    done_nodes_now, done_edges_now = cytoscape_utils.highlight_path(p, JA_COLOUR, skip_nodes=done_nodes, skip_edges=done_edges)
    done_nodes += done_nodes_now
    done_edges += done_edges_now

JA AT4G03400
AT4G03400 JA-Ile
JA-Ile AT2G39940
AT2G39940 AT1G67090
AT1G67090 VPg
VPg AT1G02920
AT1G02920 SA
JA AT2G46370
AT2G46370 JA-Ile
JA-Ile AT2G39940
AT2G39940 AT1G67090
AT1G67090 VPg
VPg AT1G02920
AT1G02920 SA
JA AT4G03400
AT4G03400 JA-Ile
JA-Ile AT2G39940
AT2G39940 AT5G38430
AT5G38430 VPg
VPg AT1G02920
AT1G02920 SA
No more nodes to colour
JA AT4G03400
AT4G03400 JA-Ile
JA-Ile AT2G39940
AT2G39940 ATCG00490
ATCG00490 VPg
VPg AT1G02920
AT1G02920 SA
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to colour
No more nodes to

In [30]:
for p in SA_paths:
    print(p)
    done_nodes_now, done_edges_now = cytoscape_utils.highlight_path(p, SA_COLOUR, skip_nodes=done_nodes, skip_edges=done_edges)
    done_nodes += done_nodes_now
    done_edges += done_edges_now


['SA', 'AT1G20620', 'AT1G06290', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No more nodes to colour
['SA', 'AT1G20630', 'AT1G06290', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No more nodes to colour
['SA', 'AT4G35090', 'AT1G06290', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No more nodes to colour
['SA', 'AT1G20620', 'AT2G35690', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No more nodes to colour
['SA', 'AT1G20630', 'AT2G35690', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No more nodes to colour
['SA', 'AT4G35090', 'AT2G35690', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No more nodes to colour
['SA', 'AT1G20620', 'AT4G16760', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No more nodes to colour
['SA', 'AT1G20630', 'AT4G16760', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No more nodes to colour
['SA', 'AT4G35090', 'AT4G16760', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No more nodes to colour
['SA', 'AT1G20620', 'AT5G65110', 'AT3G06860', 'AT1G04710', 'AT2G30720', 'JA']
No m

At this point, the Cytoscape session has a network view of the filtered PSS, and highlighting of the paths we extracted from our targeted searches. 

## Extract subnetworks in Cytoscape

Inspecting the identified paths within the complete network is difficult, so here we extract subnetworks consisting of the paths and their immediate surroundings.


### The edge induced network
The first and smallest subnetwork is created by extracting only the edges that are present on the paths.

In [31]:
network_edge_induced_suid = cytoscape_utils.subnetwork_edge_induced_from_paths(
    paths=JA_paths + SA_paths + ROS_paths,
    g=g,
    parent_suid=pss_network_suid,
    name="identified paths (edge induced)",
)

We apply a new layout to this subnetwork

In [32]:
_ = p4c.layouts.layout_network('cose', network=network_edge_induced_suid)

### The node induced network

Now we extract the network based on the nodes along the paths, meaning any edges between those nodes that are not on the paths are also extracted. 

In [33]:
nodes = list(set([y for x in JA_paths + SA_paths + ROS_paths for y in x]))
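
The difference between the two kinds of subnetwork can also be seen directly in NetworkX, independently of the Cytoscape helpers. A small sketch on an invented toy graph (for a multigraph such as `g`, `edge_subgraph` would additionally need the edge keys):

```python
import networkx as nx

toy = nx.DiGraph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
path = ["A", "B", "C"]

# Edge induced: only the edges walked along the path.
edge_induced = toy.edge_subgraph(list(zip(path, path[1:])))

# Node induced: every edge of the parent graph between the path's nodes,
# including the A -> C shortcut that is not on the path.
node_induced = nx.induced_subgraph(toy, path)

print(sorted(edge_induced.edges()))
print(sorted(node_induced.edges()))
```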

In [34]:
network_node_induced_suid = cytoscape_utils.subnetwork_node_induced(
    nodes=nodes,
    parent_suid=pss_network_suid,
    name="identified paths (node induced)",
)

Instead of applying a network layout algorithm, we can copy the layout from the previous subnetwork. 

In [35]:
_ = p4c.layouts.layout_copycat(
    network_edge_induced_suid, 
    network_node_induced_suid
)

### Neighbours

For more context around our paths, we can include the first neighbours in the view. We can use the Cytoscape first neighbour selection functionality. 

In [36]:
network_neighbours_suid = cytoscape_utils.subnetwork_neighbours(
    nodes=nodes,
    parent_suid=pss_network_suid,
    name="identified paths + 1st neighbours",
)

In [37]:
_ = p4c.layouts.layout_network('cose', network=network_neighbours_suid)

### Additional filtering of the neighbours

Many neighbours are now displayed. We may only be interested in those connected to at least two of the original path nodes, so we build a filter using NetworkX neighbour functions.

In [38]:
undirected = nx.MultiGraph(g)  # build the undirected copy once instead of once per node
filtered_neighbours = []
for n in g.nodes():
    if (len([x for x in undirected.neighbors(n) if (x in done_nodes)]) > 1) and (n not in done_nodes):
        filtered_neighbours.append(n)
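
The filter is easier to follow on a toy example. In this sketch the node names are invented, and `to_undirected()` is used as an assumed equivalent way of ignoring edge direction when counting neighbours.

```python
import networkx as nx

toy = nx.DiGraph([
    ("P1", "P2"), ("P2", "P3"),  # the path
    ("N1", "P1"), ("P3", "N1"),  # N1 touches two path nodes
    ("N2", "P2"),                # N2 touches only one
    ("N3", "N2"),                # N3 touches none
])
path_nodes = {"P1", "P2", "P3"}
undirected = toy.to_undirected()  # ignore direction when counting neighbours

shared = [
    n for n in toy.nodes()
    if n not in path_nodes
    and sum(1 for x in undirected.neighbors(n) if x in path_nodes) >= 2
]
print(shared)
```

`neighbors` yields each adjacent node once, so parallel edges to the same path node are not counted twice.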

In [39]:
network_neighbours_filtered_suid = cytoscape_utils.subnetwork_node_induced(
    nodes=nodes+filtered_neighbours,
    parent_suid=pss_network_suid,
    name="identified paths + 1st neighbours (filtered)",
)

In [40]:
p4c.layouts.layout_copycat(
    network_neighbours_suid, 
    network_neighbours_filtered_suid
)

{'mappedNodeCount': 52, 'unmappedNodeCount': 0}

## Saving the session

Save the Cytoscape session:

In [41]:
p4c.session.save_session(str(output_dir / f"PSS-DiNAR-JA-SA-ROS-{today}.cys"))

This file has been overwritten.


{}

## Exporting graph to file

NetworkX's built-in export formats are limited, so converting the graph to a pandas DataFrame and saving that seems the simplest option.

In [42]:
df = nx.to_pandas_edgelist(g)
df.head()

Unnamed: 0,source,target,effect,interaction_type,reaction_id,reaction_type,note,target_edge_type,reaction_effect,source_edge_type,directed
0,VPg,AT2G34430,inhibition,,rx00669,binding/oligomerisation,,SUBSTRATE,,SUBSTRATE,False
1,VPg,AT1G71500,inhibition,,rx00666,binding/oligomerisation,,SUBSTRATE,,SUBSTRATE,False
2,VPg,AT1G47710,inhibition,,rx00664,binding/oligomerisation,,SUBSTRATE,,SUBSTRATE,False
3,VPg,ATCG00670,inhibition,,rx00665,binding/oligomerisation,,SUBSTRATE,,SUBSTRATE,False
4,VPg,AT5G54500,inhibition,,rx00668,binding/oligomerisation,,SUBSTRATE,,SUBSTRATE,False


In [43]:
df.to_csv(output_dir / f"pss-dinar-refined-edgelist-{today}.tsv", sep="\t", index=None)
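
An edge list keeps edge attributes but not node attributes or isolated nodes, which is worth keeping in mind when reloading the file later. The sketch below illustrates this on an invented toy graph using an in-memory round trip; it assumes pandas is installed, as it is required by `nx.to_pandas_edgelist` above.

```python
import networkx as nx

toy = nx.MultiDiGraph()
toy.add_node("lonely")                     # isolated node, has no edges
toy.add_edge("A", "B", reaction_id="rx1")
toy.add_edge("A", "B", reaction_id="rx2")  # parallel edge

df = nx.to_pandas_edgelist(toy)            # one row per edge

# Rebuild a multigraph so that parallel edges are kept as separate edges.
restored = nx.from_pandas_edgelist(
    df, edge_attr=True, create_using=nx.MultiDiGraph
)
print(restored.number_of_edges(), "lonely" in restored)
```

If node attributes such as `node_type` are needed downstream, they would have to be exported separately.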

### Exporting the subnetwork

In [44]:
g_subgraph_node_induced = nx.induced_subgraph(g, nodes)
print(f"\nNumber of nodes: {g_subgraph_node_induced.number_of_nodes()}\nNumber of edges: {g_subgraph_node_induced.number_of_edges()}")


Number of nodes: 32
Number of edges: 95


In [45]:
nx.to_pandas_edgelist(g_subgraph_node_induced)\
    .to_csv(output_dir / f"pss-dinar-subgraph-refined-edgelist-{today}.tsv", sep="\t", index=None)

## Exporting the Cytoscape networks to PDF

In [46]:
collection_suid = p4c.get_collection_suid(network_edge_induced_suid)

In [47]:
from skm_tools import cytoscape_pdf_utils

In [48]:
cytoscape_pdf_utils.export_collection_to_pdfs(collection_suid, output_dir / "figures")

26914 Complete PSS
48481 identified paths (edge induced)
49006 identified paths (node induced)
49498 identified paths + 1st neighbours
63896 identified paths + 1st neighbours (filtered)
Collection saved to output/figures


In [49]:
cytoscape_pdf_utils.export_collection_to_single_pdf(collection_suid, output_dir / "figures" / "single_pdf", caption=True)

26914 Complete PSS
48481 identified paths (edge induced)
49006 identified paths (node induced)
49498 identified paths + 1st neighbours
63896 identified paths + 1st neighbours (filtered)
Collection save to output/figures/single_pdf


In [50]:
# END