# **Bowtie metrics**

## 0 - Prerequisites

### Set up venv and install requirements

In [1]:
# Olivia Finder requirements
%pip install -r ../olivia_finder/requirements.txt

Collecting pandas
  Downloading pandas-2.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Collecting tqdm
  Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting requests
  Downloading requests-2.31.0-py3-none-any.whl (62 kB)
Collecting BeautifulSoup4
  Using cached beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
Collecting selenium
  Downloading selenium-4.9.1-py3-none-any.whl (6.6 MB)
Collecting networkx
  Downloading networkx-3.1-py3-none-any.whl (2.1 MB)

### Set up library path

In [1]:
# Append the path to the olivia_finder package
import sys
sys.path.append('../../olivia_finder/')

### Set up configuration

In [2]:
# Add the environment variable OLIVIA_FINDER_CONFIG_FILE_PATH
import os
os.environ['OLIVIA_FINDER_CONFIG_FILE_PATH'] = "../../olivia_finder/olivia_finder/config.ini"

## Bowtie Structure

The `bowtie_structure` function defined below takes a directed network as input and returns a decomposition of its node set according to the bow-tie structure of the network. It follows the algorithm described in the article "Bow-tie decomposition in directed graphs" by R. Yang, L. Zhuhadar and O. Nasraoui.

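The function starts by finding the largest strongly connected component (SCC) of the input network, applying Python's built-in `max` (with `key=len`) to the components yielded by `nx.strongly_connected_components`. It then picks an arbitrary node of that SCC and computes the nodes reachable from it (forward) and the nodes that can reach it (backward). Both sets are obtained with the `dfs_tree` function of the `networkx` library, run on the network and on a reversed copy of it respectively. Since every node of an SCC reaches every other node of it, the choice of node does not change the result.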

Next, the in- and out-components of the bow-tie structure are derived from these two sets. Nodes that are forward reachable from the SCC but do not belong to it make up the out-component, while nodes that are backward reachable from the SCC but do not belong to it make up the in-component.

After calculating the in- and out-components, the remaining nodes (those that neither reach the SCC nor are reachable from it) are classified as tubes, tendrils or disconnected nodes. For each remaining node `v`, the function checks whether `v` is reachable from some in-component node and whether `v` reaches some out-component node. If both hold, `v` belongs to a tube; if only the first holds, `v` is an in-tendril; if only the second holds, `v` is an out-tendril; if neither holds, `v` is considered disconnected.

Finally, the function returns seven node sets, in this order: the largest strongly connected component, the in-component, the out-component, the tubes, the in-tendrils, the out-tendrils and the disconnected nodes.

In short, `bowtie_structure` splits a directed network into the seven disjoint node sets of its bow-tie structure, which is useful for analyzing how the network is organized around its core.

In [3]:
import networkx as nx

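def bowtie_structure(network):
    """
    Return the node set decomposition given by the bow-tie structure of a directed network.

    Returns a tuple of seven node sets, in this order:
    (largest_scc, in_component, out_component, tubes, in_tendrils, out_tendrils, disconnected).

    Algorithm from
    R. Yang, L. Zhuhadar and O. Nasraoui, "Bow-tie decomposition in directed graphs", 2011
    """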
    
    largest_scc = max(nx.strongly_connected_components(network), key=len)
    
    # Arbitrary node from the largest SCC
    node = next(iter(largest_scc))
    
    # Reachable nodes (forward) from the largest SCC
    dfs = set(nx.dfs_tree(network,node).nodes())
    
    # Reachable nodes (backwards) from the largest SCC
    reversed_network = nx.reverse(network, copy=True)
    dfs_t = set(nx.dfs_tree(reversed_network,node).nodes())
    
    out_component = dfs - largest_scc
    in_component = dfs_t - largest_scc
    
    # Tendrils, tubes and disconnected components
    rest = set(network.nodes()) -  largest_scc - out_component - in_component

    tubes, in_tendrils, out_tendrils, disconnected  = set(), set(), set(), set()

    for v in rest:
        # in_component nodes backwards reachable from v
        irv = in_component & set(nx.dfs_tree(reversed_network, v).nodes())
        # out_component nodes reachable from v
        vro = out_component & set(nx.dfs_tree(network, v).nodes())
        
        if irv and vro:
            tubes.add(v)
        elif irv and not vro:
            in_tendrils.add(v)
        elif not irv and vro:
            out_tendrils.add(v)
        else:
            disconnected.add(v)
            
    return  largest_scc, in_component, out_component, tubes, in_tendrils, out_tendrils, disconnected

## Analysis

In [4]:
from olivia_finder.package_manager import PackageManager

### Bioconductor

In [11]:
# Build bioconductor graph
bioconductor = PackageManager.load_from_persistence('../olivia_finder/resources/bioconductor_scraper.olvpm')
bioconductor_G = bioconductor.get_network()

In [12]:
largest_scc, in_component, out_component, tubes, in_tendrils, out_tendrils, disconnected = bowtie_structure(bioconductor_G)

In [13]:
assert (len(largest_scc)
        + len(in_component)
        + len(out_component)
        + len(tubes)
        + len(in_tendrils)
        + len(out_tendrils)
        + len(disconnected)) == len(bioconductor_G)

In [14]:
print(len(largest_scc), 
      len(in_component), 
      len(out_component), 
      len(tubes), 
      len(in_tendrils), 
      len(out_tendrils), 
      len(disconnected))

1 124 0 0 2161 0 1223
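The seven numbers follow the return order of `bowtie_structure`. A largest SCC of size 1 indicates that no SCC has more than one node, so here the decomposition is taken relative to a single package that `max` happens to return first. To make such output easier to read, a small helper can attach a label and a share of the total to each size. This is only a sketch: `BOWTIE_LABELS` and `bowtie_summary` are hypothetical names, and the demo sets are made up.

```python
# Labels in the order returned by bowtie_structure (hypothetical constant)
BOWTIE_LABELS = ("largest_scc", "in_component", "out_component", "tubes",
                 "in_tendrils", "out_tendrils", "disconnected")


def bowtie_summary(components):
    """Map each bow-tie label to (size, fraction of all nodes)."""
    sizes = [len(c) for c in components]
    total = sum(sizes)
    return {
        label: (size, size / total if total else 0.0)
        for label, size in zip(BOWTIE_LABELS, sizes)
    }


# Made-up node sets, only to show the shape of the result
demo_components = ({1, 2, 3}, {0}, {4}, set(), {6}, set(), set())
demo_summary = bowtie_summary(demo_components)

for label, (size, share) in demo_summary.items():
    print(f"{label:>14}: {size:>3} ({share:.1%})")
```

In the notebook, the same helper could be called as `bowtie_summary(bowtie_structure(bioconductor_G))`.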


### NPM

In [5]:
npm = PackageManager.load_from_persistence('../olivia_finder/results/package_managers/npm_scraper.olvpm')
npm_G = npm.get_network()

In [6]:
largest_scc, in_component, out_component, tubes, in_tendrils, out_tendrils, disconnected = bowtie_structure(npm_G)

KeyboardInterrupt: 

In [28]:
print(len(largest_scc), 
      len(in_component), 
      len(out_component), 
      len(tubes), 
      len(in_tendrils), 
      len(out_tendrils), 
      len(disconnected))

26486 3849 936295 3745 17891 69604 16638


In [29]:
# Compare against the graph's node count, as in the Bioconductor check
assert (len(largest_scc)
        + len(in_component)
        + len(out_component)
        + len(tubes)
        + len(in_tendrils)
        + len(out_tendrils)
        + len(disconnected)) == len(npm_G)
