<h1 style="font-size: 80px; color: blue"> 
Olivia Finder - Usage
</h1>


This notebook shows how to use the **Olivia** library with the data obtained through **Olivia Finder**.

# <span style="color: red">**0 - Prerequisites**</span>

## Setup venv and install requirements

In [1]:
# Olivia requirements
%pip install -r ../olivia/requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Olivia Finder requirements
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


If you use a virtual environment, make sure it is selected as the kernel in Jupyter.
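
A quick way to check which interpreter the kernel is actually running is to look at `sys.executable` and compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch using only the standard library; the helper name `in_virtualenv` is ours and is not part of Olivia or Olivia Finder.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

The check covers environments created with `venv`. Conda environments are not detected this way, so for those the interpreter path printed above is the more useful hint.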


## Setup library path

Run this cell so that the **olivia** and **olivia-finder** libraries are on the Python import path (`sys.path`).

In [1]:
# Append the path to the olivia_finder package
import sys
sys.path.append('../../olivia/')
sys.path.append('../../olivia_finder/')
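
Re-running the cell above appends the same entries again, and a wrong relative path fails silently until the first import. The following sketch, with a hypothetical helper `add_to_sys_path`, adds a directory only when it exists and is not already listed.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: str) -> bool:
    """Append `directory` to sys.path if it exists and is not already listed.

    Returns True when the path was added.
    """
    resolved = str(Path(directory).resolve())
    if not Path(resolved).is_dir():
        print(f"Skipping missing directory: {resolved}")
        return False
    if resolved in sys.path:
        return False
    sys.path.append(resolved)
    return True


# Same relative paths as in the cell above.
for candidate in ("../../olivia/", "../../olivia_finder/"):
    add_to_sys_path(candidate)
```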


## Setup configuration

The configuration must be initialized before the library is used. The most convenient way to do this globally is through an environment variable.

In [2]:
# Add the environment variable OLIVIA_FINDER_CONFIG_FILE_PATH

import os
os.environ['OLIVIA_FINDER_CONFIG_FILE_PATH'] = "../../olivia_finder/olivia_finder/config.ini"
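
Before importing the library it can help to confirm that the variable points at a readable file. The sketch below assumes that `config.ini` uses standard INI syntax that `configparser` can parse; `load_config_from_env` is a hypothetical helper for this check and says nothing about how Olivia Finder reads the file internally.

```python
import configparser
import os
from pathlib import Path


def load_config_from_env(var_name: str = "OLIVIA_FINDER_CONFIG_FILE_PATH") -> configparser.ConfigParser:
    """Read the INI file named by the environment variable `var_name`."""
    raw_path = os.environ.get(var_name)
    if raw_path is None:
        raise RuntimeError(f"{var_name} is not set")
    path = Path(raw_path)
    if not path.is_file():
        raise FileNotFoundError(f"{var_name} points to a missing file: {path}")
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")
    return parser
```

Calling `load_config_from_env().sections()` after setting the variable lists the sections found in the file, and a wrong path raises immediately instead of failing later during an import.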

# **1 - Build the dataset**

In this section we use Olivia Finder to build the dataset for the network that we want to analyze.

In [3]:
from olivia_finder.package_manager import PackageManager
import gc


## Using data from persistence

### Olivia Finder persistence objects

In [7]:
bioconductor_pm_loaded = PackageManager.load_from_persistence("results/package_managers/bioconductor_scraper.olvpm")
bioconductor_G_loaded = bioconductor_pm_loaded.get_network()
del bioconductor_pm_loaded
print(f'Nodes: {len(bioconductor_G_loaded.nodes)}')
print(f'Edges: {len(bioconductor_G_loaded.edges)}')

2023-05-25 22:18:37,174 [olivia_finder.packagemanager(INFO)] -> package_manager.py:111
Loading package manager from results/package_managers/bioconductor_scraper.olvpm
2023-05-25 22:18:37,385 [olivia_finder.packagemanager(INFO)] -> package_manager.py:114
Package manager loaded


Nodes: 3509
Edges: 28320


In [8]:
cran_pm_loaded = PackageManager.load_from_persistence("results/package_managers/cran_scraper.olvpm")
cran_G_loaded = cran_pm_loaded.get_network()
del cran_pm_loaded
print(f'Nodes: {len(cran_G_loaded.nodes)}')
print(f'Edges: {len(cran_G_loaded.edges)}')

2023-05-25 22:18:37,485 [olivia_finder.packagemanager(INFO)] -> package_manager.py:111
Loading package manager from results/package_managers/cran_scraper.olvpm
2023-05-25 22:18:38,408 [olivia_finder.packagemanager(INFO)] -> package_manager.py:114
Package manager loaded


Nodes: 18867
Edges: 114642


In [9]:
pypi_pm_loaded = PackageManager.load_from_persistence("results/package_managers/pypi_scraper.olvpm")
pypi_G_loaded = pypi_pm_loaded.get_network()
del pypi_pm_loaded
print(f'Nodes: {len(pypi_G_loaded.nodes)}')
print(f'Edges: {len(pypi_G_loaded.edges)}')

2023-05-25 22:18:38,935 [olivia_finder.packagemanager(INFO)] -> package_manager.py:111
Loading package manager from results/package_managers/pypi_scraper.olvpm
2023-05-25 22:18:46,264 [olivia_finder.packagemanager(INFO)] -> package_manager.py:114
Package manager loaded


Nodes: 214470
Edges: 933955


In [10]:
npm_pm_loaded = PackageManager.load_from_persistence("results/package_managers/npm_scraper.olvpm")
npm_G_loaded = npm_pm_loaded.get_network()
del npm_pm_loaded
print(f'Nodes: {len(npm_G_loaded.nodes)}')
print(f'Edges: {len(npm_G_loaded.edges)}')

2023-05-25 22:18:52,740 [olivia_finder.packagemanager(INFO)] -> package_manager.py:111
Loading package manager from results/package_managers/npm_scraper.olvpm
2023-05-25 22:19:30,041 [olivia_finder.packagemanager(INFO)] -> package_manager.py:114
Package manager loaded


Nodes: 1059782
Edges: 4855094
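
The node and edge counts printed above give a rough idea of how dense each network is. The sketch below only does arithmetic on those printed figures (edges divided by nodes); it does not touch the graphs themselves, and `edges_per_node` is a helper introduced here for illustration.

```python
# Node and edge counts copied from the cell outputs above.
network_sizes = {
    "Bioconductor": (3509, 28320),
    "CRAN": (18867, 114642),
    "PyPI": (214470, 933955),
    "npm": (1059782, 4855094),
}


def edges_per_node(sizes: dict) -> dict:
    """Return the mean number of dependency edges per package for each network."""
    return {name: edges / nodes for name, (nodes, edges) in sizes.items()}


for name, ratio in edges_per_node(network_sizes).items():
    print(f"{name:<13} {ratio:.2f} edges per node")
```

With these figures Bioconductor is the densest network at roughly 8 edges per node, CRAN follows at about 6, and PyPI and npm sit between 4 and 5.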


### CSV files

Load the Bioconductor network from a CSV file into a package manager object

In [11]:
bioconductor_pm_csv = PackageManager.load_from_csv(
    "results/csv_datasets/bioconductor/bioconductor_adjlist_scraping.csv",
    default_format=True
)
bioconductor_G_csv = bioconductor_pm_csv.get_network()
del bioconductor_pm_csv
print(f'Nodes: {len(bioconductor_G_csv.nodes)}')
print(f'Edges: {len(bioconductor_G_csv.edges)}')

2023-05-25 22:19:59,914 [olivia_finder.packagemanager(INFO)] -> package_manager.py:187
Loading csv file from results/csv_datasets/bioconductor/bioconductor_adjlist_scraping.csv


Nodes: 3509
Edges: 28320


Load the npm network in the same way

In [12]:
npm_pm_csv = PackageManager.load_from_csv(
    "results/csv_datasets/npm/npm_adjlist_scraping.csv",
    default_format=True
)
npm_G_csv = npm_pm_csv.get_network()
del npm_pm_csv
print(f'Nodes: {len(npm_G_csv.nodes)}')
print(f'Edges: {len(npm_G_csv.edges)}')

Nodes: 1059780
Edges: 4851183
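
The npm network loaded from CSV (1059780 nodes, 4851183 edges) is slightly smaller than the one loaded from the persistence file (1059782 nodes, 4855094 edges). To see where two loads of the same network diverge, one option is to compare their node and edge sets. This is a minimal sketch: `diff_networks` is a hypothetical helper, and it only assumes that both objects expose iterable `nodes` and `edges` attributes, as NetworkX graphs do.

```python
from types import SimpleNamespace


def diff_networks(a, b):
    """Return nodes and edges that appear in only one of the two networks."""
    nodes_a, nodes_b = set(a.nodes), set(b.nodes)
    edges_a, edges_b = set(a.edges), set(b.edges)
    return {
        "nodes_only_in_a": nodes_a - nodes_b,
        "nodes_only_in_b": nodes_b - nodes_a,
        "edges_only_in_a": edges_a - edges_b,
        "edges_only_in_b": edges_b - edges_a,
    }


# Toy stand-ins for two loads of the same network.
from_persistence = SimpleNamespace(nodes=["a", "b", "c"], edges=[("a", "b"), ("b", "c")])
from_csv = SimpleNamespace(nodes=["a", "b"], edges=[("a", "b")])

report = diff_networks(from_persistence, from_csv)
print(report["nodes_only_in_a"])  # nodes missing from the CSV load
print(report["edges_only_in_a"])  # edges missing from the CSV load
```

On the real data this would be called as `diff_networks(npm_G_loaded, npm_G_csv)`. With several million edges the intermediate sets take a noticeable amount of memory.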


## Build custom network

**Instantiate the `PackageManager` object according to our needs**

#### Scraping data

Build the network for the "retire" package

In [13]:
from olivia_finder.data_source.repository_scrapers.npm import NpmScraper

npm_pm = PackageManager(
    data_sources=[NpmScraper()]
)
retire_G = npm_pm.get_dependency_network(
    package_name="retire",
    deep_level=3,
    generate=True
)
del npm_pm
print(f'Nodes: {len(retire_G.nodes)}')
print(f'Edges: {len(retire_G.edges)}')

Nodes: 19
Edges: 30
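
The `deep_level` argument limits how far the crawl goes from the starting package. The sketch below illustrates one plausible reading of that idea, a breadth-first expansion that stops after a fixed number of levels, on a made-up dependency table. It is not Olivia Finder's implementation: `DEPENDENCIES`, `dependency_edges` and the package names are all hypothetical.

```python
from collections import deque

# Hypothetical dependency table: package -> direct dependencies.
DEPENDENCIES = {
    "app": ["lib-a", "lib-b"],
    "lib-a": ["lib-c"],
    "lib-b": ["lib-c"],
    "lib-c": ["lib-d"],
    "lib-d": [],
}


def dependency_edges(root, max_depth, table):
    """Collect (package, dependency) edges up to `max_depth` levels below `root`."""
    edges = set()
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        package, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand packages that sit at the depth limit
        for dependency in table.get(package, []):
            edges.add((package, dependency))
            if dependency not in seen:
                seen.add(dependency)
                queue.append((dependency, depth + 1))
    return edges


print(sorted(dependency_edges("app", 2, DEPENDENCIES)))
```

With a limit of 2 the edge from `lib-c` to `lib-d` is left out, because `lib-c` is discovered at the second level and is not expanded further.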


Build the network for "GOstats" by combining two different data sources

In [14]:
from olivia_finder.data_source.repository_scrapers.cran import CranScraper
from olivia_finder.data_source.repository_scrapers.bioconductor import BioconductorScraper

bioconductor_pm_multiple = PackageManager(
    data_sources=[
        BioconductorScraper(),
        CranScraper(),
    ]
)
GOstats_G = bioconductor_pm_multiple.get_dependency_network(
    package_name="GOstats",
    deep_level=3,
    generate=True
)
del bioconductor_pm_multiple
print(f'Nodes: {len(GOstats_G.nodes)}')
print(f'Edges: {len(GOstats_G.edges)}')

Worker 0: Error doing request job: <Response [404]>
Request for R: https://www.bioconductor.org/packages/release/bioc/html/R.html failed: response is None
Worker 0: Error doing request job: <Response [404]>
Request for R: https://cran.r-project.org/package=R failed: response is None
Worker 0: Error doing request job: <Response [404]>
Request for utils: https://www.bioconductor.org/packages/release/bioc/html/utils.html failed: response is None
Worker 0: Error doing request job: <Response [404]>
Request for utils: https://cran.r-project.org/package=utils failed: response is None
Worker 0: Error doing request job: <Response [404]>
Request for methods: https://www.bioconductor.org/packages/release/bioc/html/methods.html failed: response is None
Worker 0: Error doing request job: <Response [404]>
Request for methods: https://cran.r-project.org/package=methods failed: response is None
Worker 0: Error doing request job: <Response [404]>
Request for stats4: https://www.bioconductor.org/package

Nodes: 39
Edges: 131
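
When several data sources are configured, a package missing from one source can still be found in another. The 404 responses above are for names such as `R`, `utils`, `methods` and `stats4`, which ship with base R and therefore have no package page on either site. The sketch below shows the general fallback idea with toy sources. The callable-based `Source` interface and `first_available` are assumptions made for illustration, not the Olivia Finder data source API.

```python
from typing import Callable, Optional, Sequence

# A "source" here is any callable that returns a dependency list, or None
# when it does not know the package (hypothetical interface).
Source = Callable[[str], Optional[list]]


def first_available(package: str, sources: Sequence[Source]) -> Optional[list]:
    """Query each source in order and return the first non-None answer."""
    for source in sources:
        result = source(package)
        if result is not None:
            return result
    return None  # no source knows this package


# Toy sources backed by dictionaries (made-up contents).
bioconductor_like = {"GOstats": ["Category", "graph"]}.get
cran_like = {"Matrix": ["lattice"]}.get

sources = [bioconductor_like, cran_like]
print(first_available("GOstats", sources))
print(first_available("Matrix", sources))
print(first_available("utils", sources))  # base R package: found in neither
```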


In [None]:
gc.collect()

# **2 - Build olivia model**

In [4]:
from olivia.model import OliviaNetwork

### Using full network data from scraping

In [16]:
bioconductor_model = OliviaNetwork()
bioconductor_model.build_model(bioconductor_G_loaded)

Building Olivia Model
     Finding strongly connected components (SCCs)...
     Building condensation network...
     Adding structural meta-data...
     Done
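
The build log mentions two steps: finding strongly connected components (SCCs) and building the condensation network. An SCC is a set of nodes that can all reach one another, which in a dependency graph means a dependency cycle. The condensation replaces each SCC with a single node, and the result is always a directed acyclic graph. The sketch below shows both steps on a toy graph using Kosaraju's algorithm in plain Python. It illustrates the concepts only; how Olivia computes them internally is not shown in this notebook.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm.

    `graph` maps each node to a list of successors; every node must be a key.
    """
    # First pass: record nodes in order of DFS completion.
    visited, order = set(), []
    for start in graph:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Second pass: explore the reversed graph in reverse completion order.
    reverse = {node: [] for node in graph}
    for node, successors in graph.items():
        for nxt in successors:
            reverse[nxt].append(node)

    component_of, components = {}, []
    for start in reversed(order):
        if start in component_of:
            continue
        index = len(components)
        component_of[start] = index
        members, stack = [start], [start]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if prev not in component_of:
                    component_of[prev] = index
                    members.append(prev)
                    stack.append(prev)
        components.append(sorted(members))
    return components, component_of


def condensation(graph):
    """Collapse each SCC into one node and return the resulting DAG edges."""
    components, component_of = strongly_connected_components(graph)
    dag_edges = {
        (component_of[u], component_of[v])
        for u, successors in graph.items()
        for v in successors
        if component_of[u] != component_of[v]
    }
    return components, dag_edges


# Toy graph: a -> b -> c -> a form a cycle, and c -> d leaves it.
toy = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
components, dag_edges = condensation(toy)
print(components)
print(dag_edges)
```

The cycle `a`, `b`, `c` collapses into one component and `d` stays on its own, so the condensed graph has two nodes and a single edge between them.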


In [17]:
cran_model = OliviaNetwork()
cran_model.build_model(cran_G_loaded)

Building Olivia Model
     Finding strongly connected components (SCCs)...
     Building condensation network...
     Adding structural meta-data...
     Done


In [18]:
pypi_model = OliviaNetwork()
pypi_model.build_model(pypi_G_loaded)

Building Olivia Model
     Finding strongly connected components (SCCs)...
     Building condensation network...
     Adding structural meta-data...
     Done


We can also load a prebuilt model

In [19]:
# npm_model = OliviaNetwork()
# npm_model.build_model(npm_G_loaded)

Building Olivia Model
     Finding strongly connected components (SCCs)...
     Building condensation network...
     Adding structural meta-data...
     Done


In [5]:
# load prebuilt models
npm_model = OliviaNetwork()
npm_model.load("results/olivia_prebuilts/npm.olv")

### Using single package networks

In [20]:
retire_network_model = OliviaNetwork()
retire_network_model.build_model(retire_G)

Building Olivia Model
     Finding strongly connected components (SCCs)...
     Building condensation network...
     Adding structural meta-data...
     Done


In [21]:
gostats_network_model = OliviaNetwork()
gostats_network_model.build_model(GOstats_G)

Building Olivia Model
     Finding strongly connected components (SCCs)...
     Building condensation network...
     Adding structural meta-data...
     Done


# **3 - Olivia metrics**

In [7]:
from olivia.networkmetrics import attack_vulnerability, failure_vulnerability

In [23]:
bioconductor_atack_vulnerability = attack_vulnerability(bioconductor_model)
bioconductor_failure_vulnerability = failure_vulnerability(bioconductor_model)

Bioconductor
Computing Reach
     Processing node: 3K      
Attack vulnerability: 2109
Reach retrieved from metrics cache
Failure vulnerability: 24.817326873753206
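
The output reports three quantities: Reach, attack vulnerability and failure vulnerability. As we understand the Olivia metrics, the Reach of a package is the number of packages affected if it fails (the package itself plus everything that depends on it, directly or transitively), attack vulnerability is the largest Reach in the network, and failure vulnerability is the mean Reach over all packages. The sketch below computes these on a toy graph under that reading. It is an illustration built on those assumptions, not the `olivia.networkmetrics` code, and the package names are made up.

```python
from statistics import mean

# Toy network: package -> packages that depend on it directly.
DEPENDENTS = {
    "core": ["parser", "net"],
    "parser": ["app"],
    "net": ["app"],
    "app": [],
}


def reach(package, dependents):
    """Count `package` plus every package that depends on it, directly or not."""
    seen = {package}
    stack = [package]
    while stack:
        current = stack.pop()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return len(seen)


reaches = {package: reach(package, DEPENDENTS) for package in DEPENDENTS}
attack = max(reaches.values())    # worst case: the most depended-upon package fails
failure = mean(reaches.values())  # expected impact of a uniformly random failure

print(reaches)
print(f"Attack vulnerability: {attack}")
print(f"Failure vulnerability: {failure}")
```

Under this reading an attack vulnerability equal to the node count means that at least one package is a transitive dependency of every other package in the network.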


In [24]:
cran_atack_vulnerability = attack_vulnerability(cran_model)
cran_failure_vulnerability = failure_vulnerability(cran_model)

CRAN
Computing Reach
     Processing node: 18K      
Attack vulnerability: 17415
Reach retrieved from metrics cache
Failure vulnerability: 33.597498277415596


In [25]:
pypi_atack_vulnerability = attack_vulnerability(pypi_model)
pypi_failure_vulnerability = failure_vulnerability(pypi_model)

PyPI
Computing Reach
     Processing node: 213K      
Attack vulnerability: 145000
Reach retrieved from metrics cache
Failure vulnerability: 489.5503893318413


In [8]:
npm_atack_vulnerability = attack_vulnerability(npm_model)
npm_failure_vulnerability = failure_vulnerability(npm_model)

Computing Reach
     Processing node: 902K      

In [None]:
retire_atack_vulnerability = attack_vulnerability(retire_network_model)
retire_failure_vulnerability = failure_vulnerability(retire_network_model)

In [None]:
gostats_atack_vulnerability = attack_vulnerability(gostats_network_model)
gostats_failure_vulnerability = failure_vulnerability(gostats_network_model)

GOstats
Computing Reach
     Processing node: 0K      
Attack vulnerability: 39
Reach retrieved from metrics cache
Failure vulnerability: 7.205128205128205
