# Figure 5 analysis

This notebook contains the network and node sampling procedures used to generate inputs to the integration methods evaluated in Fig. 5.

In [1]:
import random
from functools import reduce
from pathlib import Path

import numpy as np
import pandas as pd
import networkx as nx

Network sampling is straightforward: given a set of yeast co-expression networks, we randomly sample subsets of these networks for integration. All methods are provided with the same sampled networks, so that differences in integration performance between methods are not influenced by the sampling procedure.

In [2]:
in_path = Path("../data/methods")

# import coex network names
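# read one network name per line; names can contain commas
# (e.g. "Mendes-Ferreira A 2007, 2010"), so split on newlines rather than parsing as CSV
with open(in_path / "yeast-coex-network-names.txt") as f:
    coex_names = [line.strip() for line in f if line.strip()]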

# sampling networks for increasing sample sizes
final_samples = {}
for sample_size in [2, 3, 7, 15, 29]:

    sampled_nets = {}

    # perform sampling for 10 trials
    for trial in range(10):
        sampled = random.sample(coex_names, sample_size)  # samples without replacement
        sampled_nets[trial] = sampled

    final_samples[sample_size] = sampled_nets

# show samples from trial 5 of sample size 7 (as an example)
final_samples[7][5]

['Kaplan T 2008',
 'Mendes-Ferreira A 2007, 2010',
 'Carter GW 2007',
 'Guan Q 2006',
 'Aragon AD 2006 (rep 1)',
 'Knijnenburg TA 2009',
 'Hu Z 2007']
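
The cell above draws from Python's global random state, which this notebook does not seed, so rerunning it will generally produce different samples. As a minimal sketch of how the same procedure could be made repeatable, the example below uses a local `random.Random` generator. The helper `sample_networks`, the seed value, and the placeholder network names are all hypothetical and are not part of the original analysis.

```python
import random

# Hypothetical placeholder names; the notebook reads the real ones from
# yeast-coex-network-names.txt.
names = [f"network_{i}" for i in range(30)]


def sample_networks(names, sample_sizes, n_trials, seed):
    """Draw `n_trials` samples (without replacement) for each sample size."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    return {
        size: {trial: rng.sample(names, size) for trial in range(n_trials)}
        for size in sample_sizes
    }


# The seed below is arbitrary and chosen only for illustration.
samples_a = sample_networks(names, [2, 3, 7, 15, 29], n_trials=10, seed=0)
samples_b = sample_networks(names, [2, 3, 7, 15, 29], n_trials=10, seed=0)

# Same seed gives the same samples, and no network repeats within a trial.
same_samples = samples_a == samples_b
no_repeats = all(
    len(set(sample)) == len(sample)
    for trials in samples_a.values()
    for sample in trials.values()
)
print(same_samples, no_repeats)
```

Passing the same seed to every method's input pipeline is one way to guarantee that all methods see identical samples; storing the sampled names to disk, as a dictionary like `final_samples` allows, achieves the same thing without relying on a seed.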

Node sampling is slightly more involved: we sample a set of nodes from the union of nodes across the four input networks and, for each network, return the subgraph induced on the sampled nodes that network contains. As with network sampling, all integration methods are provided with the same set of subsampled networks.

In [3]:
# import the human PPI networks
names = ["Huttlin-2015.txt", "Huttlin-2017.txt", "Hein-2015.txt", "Rolland-2014.txt"]
nets = [nx.read_weighted_edgelist(in_path / name) for name in names]

# add self loops
for net in nets:
    net.add_edges_from([(node, node) for node in net.nodes()])

# get all nodes present in the networks
node_union = list(reduce(np.union1d, [list(net.nodes()) for net in nets]))
print(f"{len(node_union)} nodes in union of networks")

# subsample networks for increasing node sizes
final_nets = {}
for n_nodes in [2000, 4000, 6000, 8000, 10000]:

    subsampled_nets = {}

    # perform sampling for 10 trials
    for trial in range(10):

        sampled = []

        # randomly sample nodes from `node_union`
        node_sample = random.sample(node_union, n_nodes)  # without replacement

        for net in nets:
            common_nodes = np.intersect1d(node_sample, list(net.nodes()))
            subsampled_net = net.subgraph(common_nodes)
            sampled.append(subsampled_net)
        
        subsampled_nets[trial] = sampled
    final_nets[n_nodes] = subsampled_nets

# show samples from trial 5 of node sample size 6000 (for example)
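# note: nx.info was removed in NetworkX 3.0; on newer versions, print(net) gives a one-line summary
for net in final_nets[6000][5]:
    print(nx.info(net), "\n")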

# each net contains fewer than 6000 nodes, but the union of nodes across these nets is 6000
print("Total nodes:", len(reduce(np.union1d, [list(net.nodes()) for net in final_nets[6000][5]])))

12938 nodes in union of networks
Name: 
Type: Graph
Number of nodes: 3581
Number of edges: 8934
Average degree:   4.9897 

Name: 
Type: Graph
Number of nodes: 5083
Number of edges: 17163
Average degree:   6.7531 

Name: 
Type: Graph
Number of nodes: 2487
Number of edges: 8041
Average degree:   6.4664 

Name: 
Type: Graph
Number of nodes: 1976
Number of edges: 4704
Average degree:   4.7611 

Total nodes: 6000
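
To make the node sampling step concrete, here is a small sketch on toy data. It uses plain Python edge lists in place of `net.subgraph` so that the definition of an induced subgraph is explicit. The two toy networks, the helpers `nodes_of` and `induced`, and the seed are hypothetical and only illustrate the idea; they are not the code used for the figure.

```python
import random

# Two small hypothetical networks as edge lists, standing in for the PPI networks.
toy_nets = [
    [("a", "b"), ("b", "c"), ("c", "d")],
    [("c", "d"), ("d", "e"), ("e", "f")],
]


def nodes_of(edges):
    """Return the set of nodes that appear in an edge list."""
    return {node for edge in edges for node in edge}


def induced(edges, keep):
    """Return the nodes and edges of the subgraph induced on `keep`."""
    nodes = nodes_of(edges) & keep
    # An edge survives only if both of its endpoints were sampled.
    sub_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return nodes, sub_edges


# Sample nodes from the union of all networks (sorted for a stable order).
toy_union = sorted(set().union(*(nodes_of(edges) for edges in toy_nets)))
rng = random.Random(0)  # arbitrary seed, for illustration only
toy_sample = set(rng.sample(toy_union, 4))

# Induce every network on the same node sample.
toy_subnets = [induced(edges, toy_sample) for edges in toy_nets]

# Each subnetwork may hold fewer than 4 nodes, but together they cover the sample.
covered = set().union(*(nodes for nodes, _ in toy_subnets))
print(covered == toy_sample)
```

Because every sampled node comes from the union, it belongs to at least one network, which is why the per-network node counts in the output above are each below 6000 while their union is exactly 6000.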
