# **Bowtie Static Analysis**

## 0 - Prerequisites

### Install requirements

In [1]:
# Olivia Finder requirements
%pip install networkx pandas matplotlib scipy intbitset

Collecting intbitset
  Downloading intbitset-3.0.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (286 kB)
Installing collected packages: intbitset
Successfully installed intbitset-3.0.2
Note: you may need to restart the kernel to use updated packages.


In [None]:
from time import time
import os
from typing import Any
import psutil
import pandas as pd
import networkx as nx
from matplotlib import pyplot as plt
import gc

In [3]:
# Append the path to the olivia_finder package
import sys
sys.path.append('/kaggle/input/olivia-finder-repo/olivia/')

from olivia.model import OliviaNetwork
from olivia.networkmetrics import attack_vulnerability, failure_vulnerability

## Bowtie Structure

The `bowtie_structure` function takes a directed network as input and returns a decomposition of its nodes according to the bow-tie structure of that network. It follows the algorithm described in "Bow-tie decomposition in directed graphs" by R. Yang, L. Zhuhadar and O. Nasraoui (2011).

The function starts by finding the largest strongly connected component (SCC), combining `nx.strongly_connected_components` with Python's built-in `max` keyed by `len`. It then picks an arbitrary node of that SCC and computes the nodes reachable forward and backward from it with the `dfs_tree` function of the `networkx` library; the backward search runs on a reversed copy of the network.
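One detail of this step is worth illustrating: `max` with `key=len` returns the first maximal item when several components have the same size. In a network without cycles every SCC is a single node, so the "largest" SCC is simply whichever singleton the generator yields first. Several runs further down report a largest SCC of size 1, so this case does occur. A minimal sketch, with hypothetical component lists standing in for the generator output:

```python
# Hypothetical SCCs, in the order an SCC generator might yield them
components = [{"a"}, {"b", "c"}, {"d"}]
largest = max(components, key=len)
print(largest)

# With ties, max() returns the first maximal item it encounters,
# so the anchor of the decomposition depends on iteration order.
acyclic = [{"x"}, {"y"}, {"z"}]
anchor = max(acyclic, key=len)
print(anchor)
```

When the largest SCC has size 1, the IN, OUT and tendril counts describe the neighbourhood of that one node rather than a global core, and should probably be read with that in mind.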

Next, the IN and OUT components of the bow-tie structure are derived from the two reachability sets. Nodes that are reachable forward from the largest SCC but lie outside it make up the OUT component, while nodes that can reach the largest SCC but lie outside it make up the IN component.
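The reachability step can be sketched without any graph library. The example below is a dependency-free illustration, not the notebook's implementation: the helper names `reachable` and `reverse` and the toy graph are assumptions made for the example, and the core is recovered as the intersection of the two reachability sets instead of being looked up first.

```python
from collections import deque

def reachable(adj, start):
    """Return the set of nodes reachable from `start` (itself included) by BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def reverse(adj):
    """Return a copy of the adjacency dict with every edge direction flipped."""
    rev = {u: set() for u in adj}
    for u, targets in adj.items():
        for v in targets:
            rev.setdefault(v, set()).add(u)
    return rev

# Hypothetical toy graph: a -> b <-> c -> d
toy = {"a": {"b"}, "b": {"c"}, "c": {"b", "d"}, "d": set()}

forward = reachable(toy, "b")             # nodes the core can reach
backward = reachable(reverse(toy), "b")   # nodes that can reach the core
core = forward & backward                 # SCC containing "b"
out_component = forward - core
in_component = backward - core
print(core, in_component, out_component)
```

Here `b` and `c` form the core, `a` feeds into it, and `d` is only reachable from it.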

After calculating the IN and OUT components, the remaining nodes are classified as tubes, tendrils, or disconnected. A node that is reachable from the IN component and can also reach the OUT component is a tube; a node that is reachable from IN but cannot reach OUT is an in-tendril; a node that can reach OUT but is not reachable from IN is an out-tendril; and a node with neither property is considered disconnected.
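The four-way classification can be illustrated on a small hand-made graph. This is a sketch under stated assumptions: the node names, the `classify_rest` helper and the pre-computed IN and OUT sets are hypothetical, and plain BFS replaces `dfs_tree`.

```python
from collections import deque

def reachable(adj, start):
    """Return the set of nodes reachable from `start` (itself included) by BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def classify_rest(adj, rev, rest, in_component, out_component):
    """Split `rest` into tubes, in-tendrils, out-tendrils and disconnected nodes."""
    tubes, in_tendrils, out_tendrils, disconnected = set(), set(), set(), set()
    for v in rest:
        from_in = in_component & reachable(rev, v)   # IN nodes that reach v
        to_out = out_component & reachable(adj, v)   # OUT nodes reached from v
        if from_in and to_out:
            tubes.add(v)
        elif from_in:
            in_tendrils.add(v)
        elif to_out:
            out_tendrils.add(v)
        else:
            disconnected.add(v)
    return tubes, in_tendrils, out_tendrils, disconnected

# Hypothetical graph: core {s1, s2}, IN {i}, OUT {o}
adj = {
    "i": {"s1", "t", "it"}, "s1": {"s2"}, "s2": {"s1", "o"},
    "t": {"o"}, "it": set(), "ot": {"o"}, "o": set(), "x": set(),
}
rev = {n: set() for n in adj}
for u, targets in adj.items():
    for v in targets:
        rev[v].add(u)

result = classify_rest(adj, rev, {"t", "it", "ot", "x"}, {"i"}, {"o"})
print(result)
```

Node `t` bypasses the core on its way from IN to OUT (a tube), `it` hangs off IN, `ot` feeds OUT, and `x` touches neither.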

Finally, the function returns the node sets for each component of the bow-tie structure: the largest SCC, the IN and OUT components, the tubes, the in-tendrils, the out-tendrils, and the disconnected nodes.

In short, `bowtie_structure` decomposes a directed network into its bow-tie components following "Bow-tie decomposition in directed graphs". The returned node sets are useful for analyzing the structure of the network and understanding its behavior.

In [5]:
def bowtie_structure(network):
    """ 
    Return node set decomposition according to the bowtie structure of the input network.
    Algorithm from 
    R. Yang, L. Zhuhadar and O. Nasraoui, "Bow-tie decomposition in directed graphs",2011
    """
    
    largest_scc = max(nx.strongly_connected_components(network), key=len)
    
    # Arbitrary node from the largest SCC
    node = next(iter(largest_scc))
    
    # Reachable nodes (forward) from the largest SCC
    dfs = set(nx.dfs_tree(network,node).nodes())
    
    # Reachable nodes (backwards) from the largest SCC
    reversed_network = nx.reverse(network, copy=True)
    dfs_t = set(nx.dfs_tree(reversed_network,node).nodes())
    
    out_component = dfs - largest_scc
    in_component = dfs_t - largest_scc
    
    # Tendrils, tubes and disconnected components
    rest = set(network.nodes()) -  largest_scc - out_component - in_component

    tubes, in_tendrils, out_tendrils, disconnected  = set(), set(), set(), set()

    for v in rest:
        # in_component nodes backwards reachable from v
        irv = in_component & set(nx.dfs_tree(reversed_network, v).nodes())
        # out_component nodes reachable from v
        vro = out_component & set(nx.dfs_tree(network, v).nodes())
        
        if irv and vro:
            tubes.add(v)
        elif irv and not vro:
            in_tendrils.add(v)
        elif not irv and vro:
            out_tendrils.add(v)
        else:
            disconnected.add(v)
            
    return  largest_scc, in_component, out_component, tubes, in_tendrils, out_tendrils, disconnected

In [6]:
def add_chunk(
    df, G, dependent_field, dependency_field,
    filter_field=None,
    filter_value=None
):
    """ Utility method for build_dependency_network"""

    filtered = df[df[filter_field] == filter_value] if filter_field else df
    links = list(zip(filtered[dependency_field], filtered[dependent_field]))
    G.add_edges_from(links)
    return G

def build_dependency_network(
    input_file,
    output_file,
    chunk_size,
    dependent_field: str = 'Project Name',
    dependency_field: str = 'Dependency Name',
    filter_field = None,
    filter_value = None,
    verbose: bool = True
) -> None:

    """
    Builds a dependency network from a file with package dependencies information

    Reads from a CSV file and writes to a txt file with adjacency lists
    corresponding to network model. Compression methods are inferred from file
    extension (.gz and .bz2 are supported from NetworkX IO methods)

    Parameters
    ----------
    input_file : str
        Path to csv file with dependencies information
    output_file : str
        Path to write resulting network file
    chunk_size : int
        Amount of lines to be read at once from input_file in batch  processing.
    dependent_field : str
        DataFrame column Id for the dependent package
    dependency_field : str
        Dataframe column Id for the dependency package
    filter_field : str, optional
        If not None, only add records where filter_field equals filter_value
    filter_value : str, optional
        If not None, only add records where filter_field equals filter_value
    verbose: bool, optional
        If True, processing information is written to standard output.
    Returns
    -------
        None
    """

    # Print only if verbose
    vprint = print if verbose else lambda *a, **k: None
    process = psutil.Process(os.getpid())
    vprint("Using process ", process)
    t = time()
    try:
        vprint(f'Opening "{input_file}"... ', end='')
        # Obtain reader iterator
        reader = pd.read_csv(input_file, chunksize=chunk_size)
        vprint('OK')
        vprint('Initializing graph... ', end='')
        # New NetworkX directed Graph
        G = nx.DiGraph()
        vprint('OK')
        for i, chunk in enumerate(reader):
            # Add dependencies from chunk to G
            add_chunk(
                chunk, 
                G,
                dependent_field=dependent_field,
                dependency_field=dependency_field,
                filter_field=filter_field,
                filter_value=filter_value
            )
            vprint(f'{round(i*chunk_size/1e6,1)}M lines | {len(G)} nodes,{len(G.edges)} deps. ({int(time()-t)}s) {round(process.memory_info().rss/1e6,1)}Mb ')
        vprint('Done reading file')
        vprint(f'Saving network as "{output_file}"... ', end='')
        nx.write_adjlist(G, output_file)
        vprint('OK')
    except Exception as e:
        print('\n', e)
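The batch-reading idea behind `build_dependency_network` can be sketched with the standard library alone. The example below is an illustration under assumptions: the in-memory CSV text and the `iter_chunks` helper are hypothetical, while the column names are the defaults of `build_dependency_network`. As in `add_chunk`, each edge points from the dependency to the dependent package.

```python
import csv
import io
from itertools import islice

def iter_chunks(rows, chunk_size):
    """Yield lists of at most `chunk_size` rows from an iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk

# Hypothetical CSV content with the default column names
csv_text = "Project Name,Dependency Name\nA,B\nA,C\nB,C\nD,A\n"

edges = set()
n_chunks = 0
reader = csv.DictReader(io.StringIO(csv_text))
for chunk in iter_chunks(reader, chunk_size=2):
    n_chunks += 1
    for row in chunk:
        # Edge direction: dependency -> dependent
        edges.add((row["Dependency Name"], row["Project Name"]))

print(n_chunks, sorted(edges))
```

Because duplicate rows collapse into a single edge, the number of edges can grow more slowly than the number of lines read, which is consistent with the progress logs in this notebook.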

## Analysis

Set up in/out files

In [8]:
'''
CRAN
------------------------------------------------------------------------------------------------------------------------
'''
# Libraries.io all versions
cran_csv_1 = "/kaggle/input/dependency-networks/cran/cran_adjlist_librariesio.csv"
cran_adjlist_1 = "/kaggle/working/cran_adjlist_librariesio.bz2"

# Libraries.io last version filtered (imports and depends)
cran_csv_2 = "/kaggle/input/dependency-networks/cran/cran_adjlist_librariesio_filtered_(imports_depends).csv"
cran_adjlist_2 = "/kaggle/working/cran_adjlist_librariesio_filtered_(imports_depends).bz2"

# Libraries.io last version filtered (imports, depends, suggests, enhances)
cran_csv_3 = "/kaggle/input/dependency-networks/cran/cran_adjlist_librariesio_filtered_(imports_depends_suggests_enhances).csv"
cran_adjlist_3 = "/kaggle/working/cran_adjlist_librariesio_filtered_(imports_depends_suggests_enhances).bz2"

# Scraped (imports and depends)
cran_csv_4 = "/kaggle/input/dependency-networks/cran/cran_adjlist_scraping.csv"
cran_adjlist_4 = "/kaggle/working/cran_adjlist_scraping.bz2"

'''
Bioconductor
------------------------------------------------------------------------------------------------------------------------
'''
# Scraped (imports and depends)
bioconductor_csv = "/kaggle/input/dependency-networks/bioconductor/bioconductor_adjlist_scraping.csv"
bioconductor_adjlist = "/kaggle/working/bioconductor_adjlist_scraping.bz2"

'''
PyPI
------------------------------------------------------------------------------------------------------------------------
'''
# Libraries.io all versions
pypi_csv_1 = "/kaggle/input/dependency-networks/pypi/pypi_adjlist_librariesio.csv"
pypi_adjlist_1 = "/kaggle/working/pypi_adjlist_librariesio.bz2"

# Libraries.io last version filtered
pypi_csv_2 = "/kaggle/input/dependency-networks/pypi/pypi_adjlist_librariesio_filtered.csv"
pypi_adjlist_2 = "/kaggle/working/pypi_adjlist_librariesio_filtered.bz2"

# Scraped PyPI dataset
pypi_csv_3 = "/kaggle/input/dependency-networks/pypi/pypi_adjlist_scraping.csv"
pypi_adjlist_3 = "/kaggle/working/pypi_adjlist_scraping.bz2"

'''
NPM
------------------------------------------------------------------------------------------------------------------------
'''
# Full librariesio dataset 
npm_csv_1 = "/kaggle/input/dependency-networks/npm/librariesio_npm.csv"
npm_adjlist_1 = "/kaggle/working/librariesio_npm.bz2"

# Libraries.io last version filtered
npm_csv_2 = "/kaggle/input/dependency-networks/npm/npm_adjlist_librariesio_filtered.csv"
npm_adjlist_2 = "/kaggle/working/npm_adjlist_librariesio_filtered.bz2"

# Scraped npm dataset (only runtime dependencies)
npm_csv_3 = "/kaggle/input/dependency-networks/npm/npm_adjlist_scraping_runtime.csv"
npm_adjlist_3 = "/kaggle/working/npm_adjlist_scraping_runtime.bz2"

# Scraped npm dataset (all dependencies)
npm_csv_4 = "/kaggle/input/dependency-networks/npm/npm_adjlist_scraping.csv"
npm_adjlist_4 = "/kaggle/working/npm_adjlist_scraping.bz2"

### Bioconductor bowtie

Build network graph

In [9]:
build_dependency_network(
    input_file=bioconductor_csv,
    output_file=bioconductor_adjlist,
    chunk_size=1e4,
    dependent_field='name',
    dependency_field='dependency',
    verbose=True
)
bioconductor_G = nx.read_adjlist(bioconductor_adjlist, create_using=nx.DiGraph)
bioconductor_model = OliviaNetwork()
bioconductor_model.build_model(bioconductor_G)
bioconductor_sccs = bioconductor_model.sorted_clusters()
bioconductor_attack = attack_vulnerability(bioconductor_model, normalize=False)
bioconductor_failure = failure_vulnerability(bioconductor_model, normalize=False)
bioconductor_attack_N = attack_vulnerability(bioconductor_model, normalize=True)
bioconductor_failure_N = failure_vulnerability(bioconductor_model, normalize=True)
del bioconductor_model, bioconductor_adjlist
gc.collect()

Using process  psutil.Process(pid=14, name='python', status='running', started='09:28:04')
Opening "/kaggle/input/dependency-networks/bioconductor/bioconductor_adjlist_scraping.csv"... OK
Initializing graph... OK
0.0M lines | 1783 nodes,9999 deps. (0s) 2778.8Mb 
0.0M lines | 2787 nodes,19994 deps. (0s) 2788.0Mb 
0.0M lines | 3509 nodes,28320 deps. (0s) 2793.7Mb 
Done reading file
Saving network as "/kaggle/working/bioconductor_adjlist_scraping.bz2"... OK
Building Olivia Model
     Finding strongly connected components (SCCs)...
     Building condensation network...
     Adding structural meta-data...
     Done
Computing Reach
     Processing node: 3K      
Reach retrieved from metrics cache
Reach retrieved from metrics cache
Reach retrieved from metrics cache


11758

Compute metrics for the scraped network

In [10]:
bio_largest_scc, bio_in_component, bio_out_component, bio_tubes, bio_in_tendrils, bio_out_tendrils, bio_disconnected = bowtie_structure(bioconductor_G)

checks = len(bio_largest_scc)+\
len(bio_in_component)+\
len(bio_out_component)+\
len(bio_tubes)+\
len(bio_in_tendrils)+\
len(bio_out_tendrils)+\
len(bio_disconnected) == len(bioconductor_G.nodes())

if checks:
    print("Sum of all components equals total number of nodes")


bio_first_scc = len(bioconductor_sccs[0]) if len(bioconductor_sccs[0]) > 0 else None
bio_second_scc = len(bioconductor_sccs[1]) if len(bioconductor_sccs[1]) > 0 else None
bioconductor_nodes = len(bioconductor_G.nodes)
bioconductor_edges = len(bioconductor_G.edges)

print(f"Bioconductor first SCC: {bio_first_scc}")
print(f"Bioconductor second SCC: {bio_second_scc}")

print(f"Bioconductor bowtie structure:\n\
      Largest SCC: {len(bio_largest_scc)}\n\
      In component: {len(bio_in_component)}\n\
      Out component: {len(bio_out_component)}\n\
      Tubes: {len(bio_tubes)}\n\
      In tendrils: {len(bio_in_tendrils)}\n\
      Out tendrils: {len(bio_out_tendrils)}\n\
      Disconnected: {len(bio_disconnected)}")

del bioconductor_G
gc.collect()


Sum of all components equals total number of nodes
Bioconductor first SCC: 1
Bioconductor second SCC: 1
Bioconductor bowtie structure:
      Largest SCC: 1
      In component: 124
      Out component: 0
      Tubes: 0
      In tendrils: 2161
      Out tendrils: 0
      Disconnected: 1223


4128
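The check in the cell above compares sizes only, so overlapping sets could in principle still add up to the node count. A stricter variant would also verify that the sets are pairwise disjoint. The helper below is a hypothetical sketch of such a check, not part of the original notebook:

```python
def is_partition(parts, universe):
    """True if the sets in `parts` are pairwise disjoint and together cover `universe`."""
    seen = set()
    for part in parts:
        if seen & part:          # overlap with an earlier part
            return False
        seen |= part
    return seen == set(universe)

nodes = {1, 2, 3}
good = is_partition([{1, 2}, {3}], nodes)
# Sizes sum to 3 here as well, but node 2 is counted twice and node 3 is missing
bad = is_partition([{1, 2}, {2}], nodes)
print(good, bad)
```

It could be called with the seven sets returned by `bowtie_structure` and the node set of the graph.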

### CRAN bowtie comparison

In [11]:
# Cran libraries.io all versions
build_dependency_network(
    input_file=cran_csv_1,
    output_file=cran_adjlist_1,
    chunk_size=1e5,
    dependent_field='Project Name',
    dependency_field='Dependency Name',
    verbose=True
)
cran_1_G = nx.read_adjlist(cran_adjlist_1, create_using=nx.DiGraph)
cran_1_model = OliviaNetwork()
cran_1_model.build_model(cran_1_G)
cran_1_sccs = cran_1_model.sorted_clusters()
cran_1_attack = attack_vulnerability(cran_1_model, normalize=False)
cran_1_failure = failure_vulnerability(cran_1_model, normalize=False)
cran_1_attack_N = attack_vulnerability(cran_1_model, normalize=True)
cran_1_failure_N = failure_vulnerability(cran_1_model, normalize=True)
del cran_1_model, cran_adjlist_1
gc.collect()

# Cran libraries.io last version filtered (imports and depends)
build_dependency_network(
    input_file=cran_csv_2,
    output_file=cran_adjlist_2,
    chunk_size=1e5,
    dependent_field='Project Name',
    dependency_field='Dependency Name',
    verbose=True
)
cran_2_G = nx.read_adjlist(cran_adjlist_2, create_using=nx.DiGraph)
cran_2_model = OliviaNetwork()
cran_2_model.build_model(cran_2_G)
cran_2_sccs = cran_2_model.sorted_clusters()
cran_2_attack = attack_vulnerability(cran_2_model, normalize=False)
cran_2_failure = failure_vulnerability(cran_2_model, normalize=False)
cran_2_attack_N = attack_vulnerability(cran_2_model, normalize=True)
cran_2_failure_N = failure_vulnerability(cran_2_model, normalize=True)
del cran_2_model, cran_adjlist_2
gc.collect()

# Cran libraries.io last version filtered (imports, depends, suggests, enhances)
build_dependency_network(
    input_file=cran_csv_3,
    output_file=cran_adjlist_3,
    chunk_size=1e5,
    dependent_field='Project Name',
    dependency_field='Dependency Name',
    verbose=True
)
cran_3_G = nx.read_adjlist(cran_adjlist_3, create_using=nx.DiGraph)
cran_3_model = OliviaNetwork()
cran_3_model.build_model(cran_3_G)
cran_3_sccs = cran_3_model.sorted_clusters()
cran_3_attack = attack_vulnerability(cran_3_model, normalize=False)
cran_3_failure = failure_vulnerability(cran_3_model, normalize=False)
cran_3_attack_N = attack_vulnerability(cran_3_model, normalize=True)
cran_3_failure_N = failure_vulnerability(cran_3_model, normalize=True)
del cran_3_model, cran_adjlist_3
gc.collect()

# Scraped (imports and depends)
build_dependency_network(
    input_file=cran_csv_4,
    output_file=cran_adjlist_4,
    chunk_size=1e5,
    dependent_field='name',
    dependency_field='dependency',
    verbose=True
)
cran_4_G = nx.read_adjlist(cran_adjlist_4, create_using=nx.DiGraph)
cran_4_model = OliviaNetwork()
cran_4_model.build_model(cran_4_G)
cran_4_sccs = cran_4_model.sorted_clusters()
cran_4_attack = attack_vulnerability(cran_4_model, normalize=False)
cran_4_failure = failure_vulnerability(cran_4_model, normalize=False)
cran_4_attack_N = attack_vulnerability(cran_4_model, normalize=True)
cran_4_failure_N = failure_vulnerability(cran_4_model, normalize=True)
del cran_4_model, cran_adjlist_4
gc.collect()

Using process  psutil.Process(pid=14, name='python', status='running', started='09:28:04')
Opening "/kaggle/input/dependency-networks/cran/cran_adjlist_librariesio.csv"... OK
Initializing graph... OK
0.0M lines | 3479 nodes,15397 deps. (0s) 2836.9Mb 
0.1M lines | 5761 nodes,31039 deps. (0s) 2848.0Mb 
0.2M lines | 9095 nodes,51619 deps. (1s) 2859.2Mb 
0.3M lines | 11658 nodes,74324 deps. (1s) 2867.4Mb 
0.4M lines | 14910 nodes,105841 deps. (2s) 2878.4Mb 
0.5M lines | 16174 nodes,117724 deps. (2s) 2878.6Mb 
Done reading file
Saving network as "/kaggle/working/cran_adjlist_librariesio.bz2"... OK
Building Olivia Model
     Finding strongly connected components (SCCs)...
     Building condensation network...
     Adding structural meta-data...
     Done
Computing Reach
     Processing node: 14K      
Reach retrieved from metrics cache
Reach retrieved from metrics cache
Reach retrieved from metrics cache
Using process  psutil.Process(pid=14, name='python', status='running', started='09:28:0

65102

In [13]:
cran_largest_scc, cran_in_component_1, cran_out_component_1, \
cran_tubes_1, cran_in_tendrils_1, cran_out_tendrils_1, cran_disconnected_1 = bowtie_structure(cran_1_G)

checks = len(cran_largest_scc)+\
len(cran_in_component_1)+\
len(cran_out_component_1)+\
len(cran_tubes_1)+\
len(cran_in_tendrils_1)+\
len(cran_out_tendrils_1)+\
len(cran_disconnected_1) == len(cran_1_G)

if checks:
    print("Sum of all components equals total number of nodes")

cran_1_first_scc = len(cran_1_sccs[0]) if len(cran_1_sccs[0]) > 0 else None
cran_1_second_scc = len(cran_1_sccs[1]) if len(cran_1_sccs[1]) > 0 else None

cran_1_nodes = len(cran_1_G.nodes)
cran_1_edges = len(cran_1_G.edges)

print(f"Cran libraries.io all versions first SCC: {cran_1_first_scc}")
print(f"Cran libraries.io all versions second SCC: {cran_1_second_scc}")


print(f"Cran bowtie structure:\n\
        Largest SCC: {len(cran_largest_scc)}\n\
        In component: {len(cran_in_component_1)}\n\
        Out component: {len(cran_out_component_1)}\n\
        Tubes: {len(cran_tubes_1)}\n\
        In tendrils: {len(cran_in_tendrils_1)}\n\
        Out tendrils: {len(cran_out_tendrils_1)}\n\
        Disconnected: {len(cran_disconnected_1)}")

del cran_1_G
gc.collect()

Sum of all components equals total number of nodes
Cran libraries.io all versions first SCC: 1405
Cran libraries.io all versions second SCC: 6
Cran bowtie structure:
        Largest SCC: 1405
        In component: 381
        Out component: 11746
        Tubes: 444
        In tendrils: 1680
        Out tendrils: 481
        Disconnected: 37


21036
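To compare datasets of different sizes it may help to express each component as a share of all nodes. The sketch below reuses the sizes printed above for the CRAN Libraries.io all-versions network; the dictionary keys are just labels chosen for this illustration.

```python
# Component sizes copied from the output above (CRAN, Libraries.io all versions)
sizes = {
    "largest_scc": 1405,
    "in_component": 381,
    "out_component": 11746,
    "tubes": 444,
    "in_tendrils": 1680,
    "out_tendrils": 481,
    "disconnected": 37,
}

total = sum(sizes.values())
shares = {name: size / total for name, size in sizes.items()}

for name, share in shares.items():
    print(f"{name:>13}: {sizes[name]:6d} ({100 * share:5.1f}%)")
print(f"{'total':>13}: {total:6d}")
```

The total agrees with the 16174 nodes reported at the end of the build log for this dataset, and the OUT component accounts for most of the network.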

In [14]:
cran_largest_scc_2, cran_in_component_2, cran_out_component_2, \
cran_tubes_2, cran_in_tendrils_2, cran_out_tendrils_2, cran_disconnected_2 = bowtie_structure(cran_2_G)

checks = len(cran_largest_scc_2)+\
len(cran_in_component_2)+\
len(cran_out_component_2)+\
len(cran_tubes_2)+\
len(cran_in_tendrils_2)+\
len(cran_out_tendrils_2)+\
len(cran_disconnected_2) == len(cran_2_G)

if checks:
    print("Sum of all components equals total number of nodes")

cran_2_first_scc = len(cran_2_sccs[0]) if len(cran_2_sccs[0]) > 0 else None
cran_2_second_scc = len(cran_2_sccs[1]) if len(cran_2_sccs[1]) > 0 else None

cran_2_nodes = len(cran_2_G.nodes)
cran_2_edges = len(cran_2_G.edges)

print(f"Cran libraries.io last version first SCC: {cran_2_first_scc}")
print(f"Cran libraries.io last version second SCC: {cran_2_second_scc}")

print(f"Cran bowtie structure:\n\
        Largest SCC: {len(cran_largest_scc_2)}\n\
        In component: {len(cran_in_component_2)}\n\
        Out component: {len(cran_out_component_2)}\n\
        Tubes: {len(cran_tubes_2)}\n\
        In tendrils: {len(cran_in_tendrils_2)}\n\
        Out tendrils: {len(cran_out_tendrils_2)}\n\
        Disconnected: {len(cran_disconnected_2)}")

del cran_2_G
gc.collect()

Sum of all components equals total number of nodes
Cran libraries.io last version first SCC: 1
Cran libraries.io last version second SCC: 1
Cran bowtie structure:
        Largest SCC: 1
        In component: 79
        Out component: 0
        Tubes: 0
        In tendrils: 14980
        Out tendrils: 0
        Disconnected: 587


19202

In [15]:
cran_largest_scc_3, cran_in_component_3, cran_out_component_3, \
cran_tubes_3, cran_in_tendrils_3, cran_out_tendrils_3, cran_disconnected_3 = bowtie_structure(cran_3_G)

checks = len(cran_largest_scc_3)+\
len(cran_in_component_3)+\
len(cran_out_component_3)+\
len(cran_tubes_3)+\
len(cran_in_tendrils_3)+\
len(cran_out_tendrils_3)+\
len(cran_disconnected_3) == len(cran_3_G)

if checks:
    print("Sum of all components equals total number of nodes")

cran_3_first_scc = len(cran_3_sccs[0]) if len(cran_3_sccs[0]) > 0 else None
cran_3_second_scc = len(cran_3_sccs[1]) if len(cran_3_sccs[1]) > 0 else None

cran_3_nodes = len(cran_3_G.nodes)
cran_3_edges = len(cran_3_G.edges)


print(f"Cran libraries.io last version filtered first SCC: {cran_3_first_scc}")
print(f"Cran libraries.io last version filtered second SCC: {cran_3_second_scc}")

print(f"Cran bowtie structure:\n\
        Largest SCC: {len(cran_largest_scc_3)}\n\
        In component: {len(cran_in_component_3)}\n\
        Out component: {len(cran_out_component_3)}\n\
        Tubes: {len(cran_tubes_3)}\n\
        In tendrils: {len(cran_in_tendrils_3)}\n\
        Out tendrils: {len(cran_out_tendrils_3)}\n\
        Disconnected: {len(cran_disconnected_3)}")

del cran_3_G
gc.collect()

Sum of all components equals total number of nodes
Cran libraries.io last version filtered first SCC: 923
Cran libraries.io last version filtered second SCC: 13
Cran bowtie structure:
        Largest SCC: 923
        In component: 333
        Out component: 11269
        Tubes: 666
        In tendrils: 2373
        Out tendrils: 442
        Disconnected: 49


20623

In [16]:
cran_largest_scc_4, cran_in_component_4, cran_out_component_4, \
cran_tubes_4, cran_in_tendrils_4, cran_out_tendrils_4, cran_disconnected_4 = bowtie_structure(cran_4_G)

checks = len(cran_largest_scc_4)+\
len(cran_in_component_4)+\
len(cran_out_component_4)+\
len(cran_tubes_4)+\
len(cran_in_tendrils_4)+\
len(cran_out_tendrils_4)+\
len(cran_disconnected_4) == len(cran_4_G)

if checks:
    print("Sum of all components equals total number of nodes")

cran_4_first_scc = len(cran_4_sccs[0]) if len(cran_4_sccs[0]) > 0 else None
cran_4_second_scc = len(cran_4_sccs[1]) if len(cran_4_sccs[1]) > 0 else None
cran_4_nodes = len(cran_4_G.nodes)
cran_4_edges = len(cran_4_G.edges)

print(f"Cran scraped first SCC: {cran_4_first_scc}")
print(f"Cran scraped second SCC: {cran_4_second_scc}")

print(f"Cran bowtie structure:\n\
        Largest SCC: {len(cran_largest_scc_4)}\n\
        In component: {len(cran_in_component_4)}\n\
        Out component: {len(cran_out_component_4)}\n\
        Tubes: {len(cran_tubes_4)}\n\
        In tendrils: {len(cran_in_tendrils_4)}\n\
        Out tendrils: {len(cran_out_tendrils_4)}\n\
        Disconnected: {len(cran_disconnected_4)}")

del cran_4_G
gc.collect()

Sum of all components equals total number of nodes
Cran scraped first SCC: 1
Cran scraped second SCC: 1
Cran bowtie structure:
        Largest SCC: 1
        In component: 6
        Out component: 0
        Tubes: 0
        In tendrils: 17984
        Out tendrils: 0
        Disconnected: 680


23219

### PyPI bowtie comparison

Build network graphs

In [17]:
build_dependency_network(
    input_file=pypi_csv_1,
    output_file=pypi_adjlist_1,
    chunk_size=1e5,
    dependent_field='Project Name',
    dependency_field='Dependency Name',
    verbose=True
)
pypi_1_G = nx.read_adjlist(pypi_adjlist_1, create_using=nx.DiGraph)
pypi_1_model = OliviaNetwork()
pypi_1_model.build_model(pypi_1_G)
pypi_1_sccs =  pypi_1_model.sorted_clusters()
pypi_1_attack = attack_vulnerability(pypi_1_model, normalize=False)
pypi_1_failure = failure_vulnerability(pypi_1_model, normalize=False)
pypi_1_attack_N = attack_vulnerability(pypi_1_model, normalize=True)
pypi_1_failure_N = failure_vulnerability(pypi_1_model, normalize=True)
del pypi_1_model, pypi_adjlist_1
gc.collect()

build_dependency_network(
    input_file=pypi_csv_2,
    output_file=pypi_adjlist_2,
    chunk_size=1e5,
    dependent_field='Project Name',
    dependency_field='Dependency Name',
    verbose=True
)
pypi_2_G = nx.read_adjlist(pypi_adjlist_2, create_using=nx.DiGraph)
pypi_2_model = OliviaNetwork()
pypi_2_model.build_model(pypi_2_G)
pypi_2_sccs =  pypi_2_model.sorted_clusters()
pypi_2_attack = attack_vulnerability(pypi_2_model, normalize=False)
pypi_2_failure = failure_vulnerability(pypi_2_model, normalize=False)
pypi_2_attack_N = attack_vulnerability(pypi_2_model, normalize=True)
pypi_2_failure_N = failure_vulnerability(pypi_2_model, normalize=True)
del pypi_2_model, pypi_adjlist_2
gc.collect()

build_dependency_network(
    input_file=pypi_csv_3,
    output_file=pypi_adjlist_3,
    chunk_size=1e5,
    dependent_field='name',
    dependency_field='dependency',
    verbose=True
)
pypi_3_G = nx.read_adjlist(pypi_adjlist_3, create_using=nx.DiGraph)
pypi_3_model = OliviaNetwork()
pypi_3_model.build_model(pypi_3_G)
pypi_3_sccs =  pypi_3_model.sorted_clusters()
pypi_3_attack = attack_vulnerability(pypi_3_model, normalize=False)
pypi_3_failure = failure_vulnerability(pypi_3_model, normalize=False)
pypi_3_attack_N = attack_vulnerability(pypi_3_model, normalize=True)
pypi_3_failure_N = failure_vulnerability(pypi_3_model, normalize=True)
del pypi_3_model, pypi_adjlist_3
gc.collect()

Using process  psutil.Process(pid=14, name='python', status='running', started='09:28:04')
Opening "/kaggle/input/dependency-networks/pypi/pypi_adjlist_librariesio.csv"... OK
Initializing graph... OK
0.0M lines | 3660 nodes,6614 deps. (0s) 3000.7Mb 
0.1M lines | 5261 nodes,11295 deps. (0s) 3002.9Mb 
0.2M lines | 6533 nodes,15596 deps. (1s) 3002.9Mb 
0.3M lines | 7776 nodes,20175 deps. (1s) 3002.9Mb 
0.4M lines | 9376 nodes,25597 deps. (2s) 3002.9Mb 
0.5M lines | 10512 nodes,29550 deps. (2s) 3002.9Mb 
0.6M lines | 11469 nodes,32699 deps. (3s) 3002.9Mb 
0.7M lines | 13362 nodes,38950 deps. (3s) 3002.9Mb 
0.8M lines | 14976 nodes,44653 deps. (4s) 3002.9Mb 
0.9M lines | 16815 nodes,50982 deps. (4s) 3002.9Mb 
1.0M lines | 18460 nodes,56729 deps. (5s) 3002.9Mb 
1.1M lines | 20257 nodes,63126 deps. (5s) 3002.9Mb 
1.2M lines | 22597 nodes,71022 deps. (5s) 3002.9Mb 
1.3M lines | 23429 nodes,73699 deps. (6s) 3002.9Mb 
1.4M lines | 25709 nodes,81204 deps. (6s) 3004.0Mb 
1.5M lines | 28637 nodes,9

443107

Compute metrics for the Libraries.io network

In [19]:
pypi_largest_scc, pypi_in_component, pypi_out_component, pypi_tubes, pypi_in_tendrils, pypi_out_tendrils, pypi_disconnected = bowtie_structure(pypi_1_G)

# Check that the sum of all bowtie components is equal to the total number of nodes in the graph
checks = len(pypi_largest_scc)+\
len(pypi_in_component)+\
len(pypi_out_component)+\
len(pypi_tubes)+\
len(pypi_in_tendrils)+\
len(pypi_out_tendrils)+\
len(pypi_disconnected) == len(pypi_1_G)

if checks:
    print("Sum of all bowtie components is equal to the total number of nodes in the graph")

pypi_1_first_scc = len(pypi_1_sccs[0]) if len(pypi_1_sccs[0]) > 0 else None
pypi_1_second_scc = len(pypi_1_sccs[1]) if len(pypi_1_sccs[1]) > 0 else None
pypi_1_nodes= len(pypi_1_G.nodes)
pypi_1_edges = len(pypi_1_G.edges)
print(f"PyPI libraries.io all versions first SCC: {pypi_1_first_scc}")
print(f"PyPI libraries.io all versions second SCC: {pypi_1_second_scc}")

print(f"Librariesio PyPI bowtie structure:\n\
      Largest SCC: {len(pypi_largest_scc)}\n\
      In component: {len(pypi_in_component)}\n\
      Out component: {len(pypi_out_component)}\n\
      Tubes: {len(pypi_tubes)}\n\
      In tendrils: {len(pypi_in_tendrils)}\n\
      Out tendrils: {len(pypi_out_tendrils)}\n\
      Disconnected: {len(pypi_disconnected)}")

del pypi_1_G
gc.collect()

Sum of all bowtie components is equal to the total number of nodes in the graph
PyPI libraries.io all versions first SCC: 7
PyPI libraries.io all versions second SCC: 4
Librariesio PyPI bowtie structure:
      Largest SCC: 7
      In component: 39
      Out component: 62
      Tubes: 13
      In tendrils: 23815
      Out tendrils: 13
      Disconnected: 26817


57532

In [20]:
pypi_largest_scc_2, pypi_in_component_2, pypi_out_component_2, \
pypi_tubes_2, pypi_in_tendrils_2, pypi_out_tendrils_2, pypi_disconnected_2 = bowtie_structure(pypi_2_G)

# Check that the sum of all bowtie components is equal to the total number of nodes in the graph
checks = len(pypi_largest_scc_2)+\
len(pypi_in_component_2)+\
len(pypi_out_component_2)+\
len(pypi_tubes_2)+\
len(pypi_in_tendrils_2)+\
len(pypi_out_tendrils_2)+\
len(pypi_disconnected_2) == len(pypi_2_G)

if checks:
    print("Sum of all bowtie components is equal to the total number of nodes in the graph")

pypi_2_first_scc = len(pypi_2_sccs[0]) if len(pypi_2_sccs[0]) > 0 else None
pypi_2_second_scc = len(pypi_2_sccs[1]) if len(pypi_2_sccs[1]) > 0 else None
pypi_2_nodes = len(pypi_2_G.nodes)
pypi_2_edges = len(pypi_2_G.edges)

print(f"PyPI libraries.io last version filtered first SCC: {pypi_2_first_scc}")
print(f"PyPI libraries.io last version filtered second SCC: {pypi_2_second_scc}")

print(f"Librariesio filtered PyPI bowtie structure:\n\
        Largest SCC: {len(pypi_largest_scc_2)}\n\
        In component: {len(pypi_in_component_2)}\n\
        Out component: {len(pypi_out_component_2)}\n\
        Tubes: {len(pypi_tubes_2)}\n\
        In tendrils: {len(pypi_in_tendrils_2)}\n\
        Out tendrils: {len(pypi_out_tendrils_2)}\n\
        Disconnected: {len(pypi_disconnected_2)}")

del pypi_2_G
gc.collect()


Sum of all bowtie components is equal to the total number of nodes in the graph
PyPI libraries.io last version filtered first SCC: 4
PyPI libraries.io last version filtered second SCC: 4
Librariesio filtered PyPI bowtie structure:
        Largest SCC: 4
        In component: 21
        Out component: 5
        Tubes: 1
        In tendrils: 27742
        Out tendrils: 11
        Disconnected: 21522


55622

In [21]:
pypi_largest_scc_3, pypi_in_component_3, pypi_out_component_3, \
pypi_tubes_3, pypi_in_tendrils_3, pypi_out_tendrils_3, pypi_disconnected_3 = bowtie_structure(pypi_3_G)

# Check that the sum of all bowtie components is equal to the total number of nodes in the graph
checks = len(pypi_largest_scc_3)+\
len(pypi_in_component_3)+\
len(pypi_out_component_3)+\
len(pypi_tubes_3)+\
len(pypi_in_tendrils_3)+\
len(pypi_out_tendrils_3)+\
len(pypi_disconnected_3) == len(pypi_3_G)

if checks:
    print("Sum of all bowtie components is equal to the total number of nodes in the graph")

pypi_3_first_scc = len(pypi_3_sccs[0]) if len(pypi_3_sccs[0]) > 0 else None
pypi_3_second_scc = len(pypi_3_sccs[1]) if len(pypi_3_sccs[1]) > 0 else None

pypi_3_nodes = len(pypi_3_G.nodes)
pypi_3_edges = len(pypi_3_G.edges)

print(f"PyPI scraped first SCC: {pypi_3_first_scc}")
print(f"PyPI scraped second SCC: {pypi_3_second_scc}")

print(f"Scraped PyPI bowtie structure:\n\
        Largest SCC: {len(pypi_largest_scc_3)}\n\
        In component: {len(pypi_in_component_3)}\n\
        Out component: {len(pypi_out_component_3)}\n\
        Tubes: {len(pypi_tubes_3)}\n\
        In tendrils: {len(pypi_in_tendrils_3)}\n\
        Out tendrils: {len(pypi_out_tendrils_3)}\n\
        Disconnected: {len(pypi_disconnected_3)}")

del pypi_3_G
gc.collect()

Sum of all bowtie components is equal to the total number of nodes in the graph
PyPI scraped first SCC: 283
PyPI scraped second SCC: 19
Scraped PyPI bowtie structure:
        Largest SCC: 283
        In component: 449
        Out component: 138219
        Tubes: 2446
        In tendrils: 30261
        Out tendrils: 14941
        Disconnected: 27870


230785

### NPM Bowtie comparison

In [22]:
build_dependency_network(
    input_file=npm_csv_1,
    output_file=npm_adjlist_1,
    chunk_size=1e6,
    dependent_field='Project Name',
    dependency_field='Dependency Name',
    verbose=True
)
npm_1_G = nx.read_adjlist(npm_adjlist_1, create_using=nx.DiGraph)
npm_1_model = OliviaNetwork()
npm_1_model.build_model(npm_1_G)
npm_1_sccs = npm_1_model.sorted_clusters()
npm_1_attack = attack_vulnerability(npm_1_model, normalize=False)
npm_1_failure = failure_vulnerability(npm_1_model, normalize=False)
npm_1_attack_N = attack_vulnerability(npm_1_model, normalize=True)
npm_1_failure_N = failure_vulnerability(npm_1_model, normalize=True)
del npm_1_model, npm_adjlist_1
gc.collect()

build_dependency_network(
    input_file=npm_csv_2,
    output_file=npm_adjlist_2,
    chunk_size=1e6,
    dependent_field='Project Name',
    dependency_field='Dependency Name',
    verbose=True
)
npm_2_G = nx.read_adjlist(npm_adjlist_2, create_using=nx.DiGraph)
npm_2_model = OliviaNetwork()
npm_2_model.build_model(npm_2_G)
npm_2_sccs = npm_2_model.sorted_clusters()
npm_2_attack = attack_vulnerability(npm_2_model, normalize=False)
npm_2_failure = failure_vulnerability(npm_2_model, normalize=False)
npm_2_attack_N = attack_vulnerability(npm_2_model, normalize=True)
npm_2_failure_N = failure_vulnerability(npm_2_model, normalize=True)
del npm_2_model, npm_adjlist_2
gc.collect()

build_dependency_network(
    input_file=npm_csv_3,
    output_file=npm_adjlist_3,
    chunk_size=1e6,
    dependent_field='name',
    dependency_field='dependency',
    verbose=True
)
npm_3_G = nx.read_adjlist(npm_adjlist_3, create_using=nx.DiGraph)
npm_3_model = OliviaNetwork()
npm_3_model.build_model(npm_3_G)
npm_3_sccs = npm_3_model.sorted_clusters()
npm_3_attack = attack_vulnerability(npm_3_model, normalize=False)
npm_3_failure = failure_vulnerability(npm_3_model, normalize=False)
npm_3_attack_N = attack_vulnerability(npm_3_model, normalize=True)
npm_3_failure_N = failure_vulnerability(npm_3_model, normalize=True)
del npm_3_model, npm_adjlist_3
gc.collect()

build_dependency_network(
    input_file=npm_csv_4,
    output_file=npm_adjlist_4,
    chunk_size=1e6,
    dependent_field='name',
    dependency_field='dependency',
    verbose=True
)
npm_4_G = nx.read_adjlist(npm_adjlist_4, create_using=nx.DiGraph)
npm_4_model = OliviaNetwork()
npm_4_model.build_model(npm_4_G)
npm_4_sccs = npm_4_model.sorted_clusters()
npm_4_attack = attack_vulnerability(npm_4_model, normalize=False)
npm_4_failure = failure_vulnerability(npm_4_model, normalize=False)
npm_4_attack_N = attack_vulnerability(npm_4_model, normalize=True)
npm_4_failure_N = failure_vulnerability(npm_4_model, normalize=True)
del npm_4_model, npm_adjlist_4
gc.collect()


Using process  psutil.Process(pid=14, name='python', status='running', started='09:28:04')
Opening "/kaggle/input/dependency-networks/npm/librariesio_npm.csv"... OK
Initializing graph... OK
0.0M lines | 20638 nodes,98907 deps. (3s) 3841.6Mb 
1.0M lines | 32004 nodes,174025 deps. (6s) 3873.5Mb 
2.0M lines | 43933 nodes,260232 deps. (10s) 3841.6Mb 
3.0M lines | 53379 nodes,338390 deps. (13s) 3852.7Mb 
4.0M lines | 65547 nodes,446460 deps. (16s) 3852.7Mb 
5.0M lines | 81139 nodes,546543 deps. (20s) 3852.7Mb 
6.0M lines | 91689 nodes,628933 deps. (23s) 3852.7Mb 
7.0M lines | 103318 nodes,710271 deps. (27s) 3852.7Mb 
8.0M lines | 112171 nodes,793637 deps. (30s) 3853.5Mb 
9.0M lines | 118418 nodes,851342 deps. (34s) 3872.4Mb 
10.0M lines | 129611 nodes,957747 deps. (37s) 3906.5Mb 
11.0M lines | 135802 nodes,1023165 deps. (40s) 3926.9Mb 
12.0M lines | 144469 nodes,1104639 deps. (44s) 3953.3Mb 
13.0M lines | 152475 nodes,1176768 deps. (47s) 3976.5Mb 
14.0M lines | 161473 nodes,1259825 deps. (5

3980723

In [23]:
npm_largest_scc, npm_in_component, npm_out_component, npm_tubes, npm_in_tendrils, npm_out_tendrils, npm_disconnected = bowtie_structure(npm_1_G)

checks=len(npm_largest_scc)+\
len(npm_in_component)+\
len(npm_out_component)+\
len(npm_tubes)+\
len(npm_in_tendrils)+\
len(npm_out_tendrils)+\
len(npm_disconnected) == len(npm_1_G)

if checks:
    print("Sum of all bowtie components is equal to the total number of nodes in the graph")

# Guard on the number of SCCs; indexing a missing component would raise IndexError
npm_first_scc_1 = len(npm_1_sccs[0]) if len(npm_1_sccs) > 0 else None
npm_second_scc_1 = len(npm_1_sccs[1]) if len(npm_1_sccs) > 1 else None
npm_1_nodes = len(npm_1_G.nodes)
npm_1_edges = len(npm_1_G.edges)
print(f"Librariesio NPM first version first SCC: {npm_first_scc_1}")
print(f"Librariesio NPM first version second SCC: {npm_second_scc_1}")

print(f"Librariesio NPM bowtie structure:\n\
    Largest SCC: {len(npm_largest_scc)}\n\
    In component: {len(npm_in_component)}\n\
    Out component: {len(npm_out_component)}\n\
    Tubes: {len(npm_tubes)}\n\
    In tendrils: {len(npm_in_tendrils)}\n\
    Out tendrils: {len(npm_out_tendrils)}\n\
    Disconnected: {len(npm_disconnected)}")

del npm_1_G
gc.collect()

Sum of all bowtie components is equal to the total number of nodes in the graph
Librariesio NPM first version first SCC: 26486
Librariesio NPM first version second SCC: 175
Librariesio NPM bowtie structure:
    Largest SCC: 26486
    In component: 3849
    Out component: 936295
    Tubes: 3745
    In tendrils: 17891
    Out tendrils: 69604
    Disconnected: 16638


1417880

In [24]:
npm_largest_scc_2, npm_in_component_2, npm_out_component_2, \
npm_tubes_2, npm_in_tendrils_2, npm_out_tendrils_2, npm_disconnected_2 = bowtie_structure(npm_2_G)

checks=len(npm_largest_scc_2)+\
len(npm_in_component_2)+\
len(npm_out_component_2)+\
len(npm_tubes_2)+\
len(npm_in_tendrils_2)+\
len(npm_out_tendrils_2)+\
len(npm_disconnected_2) == len(npm_2_G)

if checks:
    print("Sum of all bowtie components is equal to the total number of nodes in the graph")

# Guard on the number of SCCs; indexing a missing component would raise IndexError
npm_first_scc_2 = len(npm_2_sccs[0]) if len(npm_2_sccs) > 0 else None
npm_second_scc_2 = len(npm_2_sccs[1]) if len(npm_2_sccs) > 1 else None
npm_2_edges = len(npm_2_G.edges)
npm_2_nodes = len(npm_2_G.nodes)
print(f"Librariesio NPM second version first SCC: {npm_first_scc_2}")
print(f"Librariesio NPM second version second SCC: {npm_second_scc_2}")

print(f"Scraped NPM bowtie structure:\n\
      Largest SCC: {len(npm_largest_scc_2)}\n\
      In component: {len(npm_in_component_2)}\n\
      Out component: {len(npm_out_component_2)}\n\
      Tubes: {len(npm_tubes_2)}\n\
      In tendrils: {len(npm_in_tendrils_2)}\n\
      Out tendrils: {len(npm_out_tendrils_2)}\n\
      Disconnected: {len(npm_disconnected_2)}")

del npm_2_G
gc.collect()

Sum of all bowtie components is equal to the total number of nodes in the graph
Librariesio NPM second version first SCC: 13378
Librariesio NPM second version second SCC: 157
Scraped NPM bowtie structure:
      Largest SCC: 13378
      In component: 1827
      Out component: 940266
      Tubes: 4260
      In tendrils: 19759
      Out tendrils: 60947
      Disconnected: 24094


1382645

In [25]:
npm_largest_scc_3, npm_in_component_3, npm_out_component_3, \
npm_tubes_3, npm_in_tendrils_3, npm_out_tendrils_3, npm_disconnected_3 = bowtie_structure(npm_3_G)

checks=len(npm_largest_scc_3)+\
len(npm_in_component_3)+\
len(npm_out_component_3)+\
len(npm_tubes_3)+\
len(npm_in_tendrils_3)+\
len(npm_out_tendrils_3)+\
len(npm_disconnected_3) == len(npm_3_G)

if checks:
    print("Sum of all bowtie components is equal to the total number of nodes in the graph")

# Guard on the number of SCCs; indexing a missing component would raise IndexError
npm_first_scc_3 = len(npm_3_sccs[0]) if len(npm_3_sccs) > 0 else None
npm_second_scc_3 = len(npm_3_sccs[1]) if len(npm_3_sccs) > 1 else None
npm_3_nodes = len(npm_3_G.nodes)
npm_3_edges = len(npm_3_G.edges)
print(f"Librariesio NPM third version first SCC: {npm_first_scc_3}")
print(f"Librariesio NPM third version second SCC: {npm_second_scc_3}")

print(f"Scraped NPM bowtie structure:\n\
        Largest SCC: {len(npm_largest_scc_3)}\n\
        In component: {len(npm_in_component_3)}\n\
        Out component: {len(npm_out_component_3)}\n\
        Tubes: {len(npm_tubes_3)}\n\
        In tendrils: {len(npm_in_tendrils_3)}\n\
        Out tendrils: {len(npm_out_tendrils_3)}\n\
        Disconnected: {len(npm_disconnected_3)}")

del npm_3_G
gc.collect()


Sum of all bowtie components is equal to the total number of nodes in the graph
Librariesio NPM third version first SCC: 26
Librariesio NPM third version second SCC: 17
Scraped NPM bowtie structure:
        Largest SCC: 26
        In component: 0
        Out component: 1
        Tubes: 0
        In tendrils: 0
        Out tendrils: 0
        Disconnected: 1059731


1209579

In [26]:
npm_largest_scc_4, npm_in_component_4, npm_out_component_4, \
npm_tubes_4, npm_in_tendrils_4, npm_out_tendrils_4, npm_disconnected_4 = bowtie_structure(npm_4_G)

checks=len(npm_largest_scc_4)+\
len(npm_in_component_4)+\
len(npm_out_component_4)+\
len(npm_tubes_4)+\
len(npm_in_tendrils_4)+\
len(npm_out_tendrils_4)+\
len(npm_disconnected_4) == len(npm_4_G)

if checks:
    print("Sum of all bowtie components is equal to the total number of nodes in the graph")

# Guard on the number of SCCs; indexing a missing component would raise IndexError
npm_first_scc_4 = len(npm_4_sccs[0]) if len(npm_4_sccs) > 0 else None
npm_second_scc_4 = len(npm_4_sccs[1]) if len(npm_4_sccs) > 1 else None
npm_4_nodes = len(npm_4_G.nodes)
npm_4_edges = len(npm_4_G.edges)
print(f"Librariesio NPM fourth version first SCC: {npm_first_scc_4}")
print(f"Librariesio NPM fourth version second SCC: {npm_second_scc_4}")

print(f"Scraped NPM bowtie structure:\n\
        Largest SCC: {len(npm_largest_scc_4)}\n\
        In component: {len(npm_in_component_4)}\n\
        Out component: {len(npm_out_component_4)}\n\
        Tubes: {len(npm_tubes_4)}\n\
        In tendrils: {len(npm_in_tendrils_4)}\n\
        Out tendrils: {len(npm_out_tendrils_4)}\n\
        Disconnected: {len(npm_disconnected_4)}")

del npm_4_G
gc.collect()


Sum of all bowtie components is equal to the total number of nodes in the graph
Librariesio NPM fourth version first SCC: 19579
Librariesio NPM fourth version second SCC: 451
Scraped NPM bowtie structure:
        Largest SCC: 19579
        In component: 3718
        Out component: 1626207
        Tubes: 7599
        In tendrils: 50120
        Out tendrils: 77588
        Disconnected: 48132


2259907
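
The cells above validate each decomposition by comparing the sum of the component sizes with the node count. That test can pass even when the components are not a true partition, because an overlap between two components can cancel out a missing node. A stricter check is sketched below. It is a minimal illustration, assuming that `bowtie_structure` returns collections of node identifiers; the helper name `check_partition` and the toy data are hypothetical and not part of the notebook.

```python
from itertools import combinations


def check_partition(components, all_nodes):
    """Return a list of problems; an empty list means `components` partitions `all_nodes`.

    components: dict mapping a component name to an iterable of node ids.
    all_nodes:  iterable with every node id of the graph.
    """
    problems = []
    all_nodes = set(all_nodes)

    # Every pair of components must be disjoint.
    for (name_a, a), (name_b, b) in combinations(components.items(), 2):
        overlap = set(a) & set(b)
        if overlap:
            problems.append(f"{name_a} and {name_b} share {len(overlap)} node(s)")

    # Together, the components must cover exactly the node set.
    covered = set().union(*components.values())
    missing = all_nodes - covered
    extra = covered - all_nodes
    if missing:
        problems.append(f"{len(missing)} node(s) not assigned to any component")
    if extra:
        problems.append(f"{len(extra)} node(s) not present in the graph")

    return problems


# Toy example with made-up node ids.
toy_nodes = {"a", "b", "c", "d", "e"}
toy_components = {
    "largest_scc": {"a", "b"},
    "in_component": {"c"},
    "out_component": {"d"},
    "disconnected": {"e"},
}
print(check_partition(toy_components, toy_nodes) or "Components form a partition")
```

In the notebook this could be called with a dict built from the seven sets returned for a network and the nodes of the corresponding graph, before the graph is deleted.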

## Results

In [27]:
# Build a table with the bowtie structure of each package manager
bowtie_table = pd.DataFrame(columns=[
    'Nº nodes',
    'Nº edges',
    '1st SCC',
    '2nd SCC',
    'In component', 
    'Out component', 
    'Tubes', 
    'In tendrils', 
    'Out tendrils', 
    'Disconnected',
    'Attack vulnerability',
    'Attack vulnerability (normalized)',
    'Failure vulnerability',
    'Failure vulnerability (normalized)' 
])
bowtie_table.loc['Bioconductor Scraped (imports and depends)'] = [
    bioconductor_nodes, bioconductor_edges, 
    bio_first_scc, bio_second_scc,
    len(bio_in_component), len(bio_out_component), 
    len(bio_tubes), len(bio_in_tendrils), len(bio_out_tendrils), len(bio_disconnected),
    bioconductor_attack, bioconductor_attack_N,
    bioconductor_failure, bioconductor_failure_N
]
bowtie_table.loc['CRAN Libraries.io all versions'] = [
    cran_1_nodes, cran_1_edges,
    cran_1_first_scc, cran_1_second_scc,
    len(cran_in_component_1), len(cran_out_component_1),
    len(cran_tubes_1), len(cran_in_tendrils_1), len(cran_out_tendrils_1), len(cran_disconnected_1),
    cran_1_attack, cran_1_attack_N,
    cran_1_failure, cran_1_failure_N
]

bowtie_table.loc['CRAN Libraries.io last version filtered (imports and depends)'] = [
    cran_2_nodes, cran_2_edges,
    cran_2_first_scc, cran_2_second_scc,
    len(cran_in_component_2), len(cran_out_component_2),
    len(cran_tubes_2), len(cran_in_tendrils_2), len(cran_out_tendrils_2), len(cran_disconnected_2),
    cran_2_attack, cran_2_attack_N,
    cran_2_failure, cran_2_failure_N
]

bowtie_table.loc['CRAN Libraries.io last version filtered (imports, depends, suggests, enhances)'] = [
    cran_3_nodes, cran_3_edges,
    cran_3_first_scc, cran_3_second_scc,
    len(cran_in_component_3), len(cran_out_component_3),
    len(cran_tubes_3), len(cran_in_tendrils_3), len(cran_out_tendrils_3), len(cran_disconnected_3),
    cran_3_attack, cran_3_attack_N,
    cran_3_failure, cran_3_failure_N
]

bowtie_table.loc['CRAN Scraped (imports and depends)'] = [
    cran_4_nodes, cran_4_edges,
    cran_4_first_scc, cran_4_second_scc,
    len(cran_in_component_4), len(cran_out_component_4),
    len(cran_tubes_4), len(cran_in_tendrils_4), len(cran_out_tendrils_4), len(cran_disconnected_4),
    cran_4_attack, cran_4_attack_N,
    cran_4_failure, cran_4_failure_N
]


bowtie_table.loc['PyPI Libraries.io all versions'] = [
    pypi_1_nodes, pypi_1_edges,
    pypi_1_first_scc, pypi_1_second_scc,
    len(pypi_in_component), len(pypi_out_component),
    len(pypi_tubes), len(pypi_in_tendrils), len(pypi_out_tendrils), len(pypi_disconnected),
    pypi_1_attack, pypi_1_attack_N,
    pypi_1_failure, pypi_1_failure_N
]

bowtie_table.loc['PyPI Libraries.io last version filtered'] = [
    pypi_2_nodes, pypi_2_edges,
    pypi_2_first_scc, pypi_2_second_scc,
    len(pypi_in_component_2), len(pypi_out_component_2),
    len(pypi_tubes_2), len(pypi_in_tendrils_2), len(pypi_out_tendrils_2), len(pypi_disconnected_2),
    pypi_2_attack, pypi_2_attack_N,
    pypi_2_failure, pypi_2_failure_N
]

bowtie_table.loc['PyPI Scraped'] = [
    pypi_3_nodes, pypi_3_edges,
    pypi_3_first_scc, pypi_3_second_scc,
    len(pypi_in_component_3), len(pypi_out_component_3),
    len(pypi_tubes_3), len(pypi_in_tendrils_3), len(pypi_out_tendrils_3), len(pypi_disconnected_3),
    pypi_3_attack, pypi_3_attack_N,
    pypi_3_failure, pypi_3_failure_N
]

bowtie_table.loc['NPM Libraries.io all versions'] = [
    npm_1_nodes, npm_1_edges,
    npm_first_scc_1, npm_second_scc_1,
    len(npm_in_component), len(npm_out_component),
    len(npm_tubes), len(npm_in_tendrils), len(npm_out_tendrils), len(npm_disconnected),
    npm_1_attack, npm_1_attack_N,
    npm_1_failure, npm_1_failure_N
]

bowtie_table.loc['NPM Libraries.io last version filtered'] = [
    npm_2_nodes, npm_2_edges,
    npm_first_scc_2, npm_second_scc_2,
    len(npm_in_component_2), len(npm_out_component_2),
    len(npm_tubes_2), len(npm_in_tendrils_2), len(npm_out_tendrils_2), len(npm_disconnected_2),
    npm_2_attack, npm_2_attack_N,
    npm_2_failure, npm_2_failure_N
]

bowtie_table.loc['NPM Scraped (only runtime dependencies)'] = [
    npm_3_nodes, npm_3_edges,
    npm_first_scc_3, npm_second_scc_3,
    len(npm_in_component_3), len(npm_out_component_3),
    len(npm_tubes_3), len(npm_in_tendrils_3), len(npm_out_tendrils_3), len(npm_disconnected_3),
    npm_3_attack, npm_3_attack_N,
    npm_3_failure, npm_3_failure_N
]

bowtie_table.loc['NPM Scraped (all dependencies)'] = [
    npm_4_nodes, npm_4_edges,
    npm_first_scc_4, npm_second_scc_4,
    len(npm_in_component_4), len(npm_out_component_4),
    len(npm_tubes_4), len(npm_in_tendrils_4), len(npm_out_tendrils_4), len(npm_disconnected_4),
    npm_4_attack, npm_4_attack_N,
    npm_4_failure, npm_4_failure_N
]


In [28]:
bowtie_table

| | Nº nodes | Nº edges | 1st SCC | 2nd SCC | In component | Out component | Tubes | In tendrils | Out tendrils | Disconnected | Attack vulnerability | Attack vulnerability (normalized) | Failure vulnerability | Failure vulnerability (normalized) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bioconductor Scraped (imports and depends) | 3509.0 | 28320.0 | 1.0 | 1.0 | 124.0 | 0.0 | 0.0 | 2161.0 | 0.0 | 1223.0 | 2109.0 | 0.601026 | 24.817327 | 0.007072 |
| CRAN Libraries.io all versions | 16174.0 | 117724.0 | 1405.0 | 6.0 | 381.0 | 11746.0 | 444.0 | 1680.0 | 481.0 | 37.0 | 15123.0 | 0.935019 | 1454.525535 | 0.08993 |
| CRAN Libraries.io last version filtered (imports and depends) | 15647.0 | 76207.0 | 1.0 | 1.0 | 79.0 | 0.0 | 0.0 | 14980.0 | 0.0 | 587.0 | 14395.0 | 0.919985 | 24.59104 | 0.001572 |
| CRAN Libraries.io last version filtered (imports, depends, suggests, enhances) | 16055.0 | 107370.0 | 923.0 | 13.0 | 333.0 | 11269.0 | 666.0 | 2373.0 | 442.0 | 49.0 | 15056.0 | 0.937776 | 957.213454 | 0.059621 |
| CRAN Scraped (imports and depends) | 18671.0 | 113273.0 | 1.0 | 1.0 | 6.0 | 0.0 | 0.0 | 17984.0 | 0.0 | 680.0 | 17223.0 | 0.922447 | 33.543999 | 0.001797 |
| PyPI Libraries.io all versions | 50766.0 | 155369.0 | 7.0 | 4.0 | 39.0 | 62.0 | 13.0 | 23815.0 | 13.0 | 26817.0 | 22315.0 | 0.439566 | 15.730115 | 0.00031 |
| PyPI Libraries.io last version filtered | 49306.0 | 134575.0 | 4.0 | 4.0 | 21.0 | 5.0 | 1.0 | 27742.0 | 11.0 | 21522.0 | 19212.0 | 0.389648 | 8.573297 | 0.000174 |
| PyPI Scraped | 214469.0 | 933955.0 | 283.0 | 19.0 | 449.0 | 138219.0 | 2446.0 | 30261.0 | 14941.0 | 27870.0 | 145000.0 | 0.676088 | 489.552667 | 0.002283 |
| NPM Libraries.io all versions | 1074508.0 | 13052831.0 | 26486.0 | 175.0 | 3849.0 | 936295.0 | 3745.0 | 17891.0 | 69604.0 | 16638.0 | 975555.0 | 0.907909 | 27193.825099 | 0.025308 |
| NPM Libraries.io last version filtered | 1064531.0 | 11405275.0 | 13378.0 | 157.0 | 1827.0 | 940266.0 | 4260.0 | 19759.0 | 60947.0 | 24094.0 | 968059.0 | 0.909376 | 13633.877879 | 0.012807 |
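
Absolute counts are hard to compare across networks whose sizes differ by orders of magnitude, so it can help to express each bowtie component as a share of the node count. The sketch below does this for one row, using values copied from the "NPM Libraries.io all versions" row of the table; the dictionary layout and the helper name `component_shares` are illustrative assumptions, not code from the notebook. It also checks an observation about the printed numbers: in this row the normalized attack vulnerability coincides with the raw value divided by the node count. That is a property of the figures shown, not a statement about how `attack_vulnerability` is implemented.

```python
# Bowtie columns that, in the rows shown above, add up to the node count.
COMPONENTS = (
    "1st SCC", "In component", "Out component", "Tubes",
    "In tendrils", "Out tendrils", "Disconnected",
)

# Values copied from the "NPM Libraries.io all versions" row of the table.
npm_all = {
    "Nº nodes": 1074508,
    "1st SCC": 26486,
    "In component": 3849,
    "Out component": 936295,
    "Tubes": 3745,
    "In tendrils": 17891,
    "Out tendrils": 69604,
    "Disconnected": 16638,
    "attack": 975555.0,
    "attack_normalized": 0.907909,
}


def component_shares(row):
    """Return each bowtie component as a fraction of the node count."""
    nodes = row["Nº nodes"]
    return {name: row[name] / nodes for name in COMPONENTS}


shares = component_shares(npm_all)
for name, share in shares.items():
    print(f"{name:>14}: {share:7.2%}")

# Compare the raw attack vulnerability per node with the normalized column.
ratio = npm_all["attack"] / npm_all["Nº nodes"]
print(f"attack / nodes = {ratio:.6f} (table: {npm_all['attack_normalized']})")
```

Read this way, the out component dominates this network, while the largest SCC accounts for only a few percent of the nodes.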
