<a href="https://colab.research.google.com/github/AnacletoLAB/grape/blob/main/tutorials/Ensmallen_Automatic_graph_retrieval_utilities.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Automatic graph retrieval utilities
In this tutorial we will explore the utilities available for automatic graph retrieval.

## Installing the library
First of all, we install the [Ensmallen](https://github.com/AnacletoLAB/ensmallen) library:

In [1]:
!pip install -q ensmallen



## A few of the available methods
We briefly present the methods that let you programmatically retrieve the available datasets.

Retrieving the list of available graph repositories, that is, the sources from which graphs can currently be obtained

In [3]:
from ensmallen.datasets import get_available_repositories
get_available_repositories()

['kghub',
 'yue',
 'pheknowlatorkg',
 'jax',
 'monarchinitiative',
 'string',
 'linqs',
 'networkrepository',
 'zenodo',
 'kgobo']

Retrieving the list of graphs available from a given repository

In [4]:
from ensmallen.datasets import get_available_graphs_from_repository
get_available_graphs_from_repository("kghub")

['KGCOVID19', 'KGMicrobe']
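Combining `get_available_repositories` and `get_available_graphs_from_repository`, one could build a small in-memory catalog mapping every repository to its graphs. The sketch below uses a hypothetical helper, `build_catalog`, which is not part of Ensmallen. The two listing functions are passed in as arguments, so the sketch can be tried with stand-in functions when Ensmallen is not installed.

```python
from typing import Callable, Dict, List


def build_catalog(
    list_repositories: Callable[[], List[str]],
    list_graphs: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Map every repository name to the sorted list of graphs it provides.

    Hypothetical helper: `list_repositories` and `list_graphs` stand in for
    ensmallen's get_available_repositories and
    get_available_graphs_from_repository.
    """
    catalog: Dict[str, List[str]] = {}
    for repository in list_repositories():
        # Sort so the catalog does not depend on the listing order.
        catalog[repository] = sorted(list_graphs(repository))
    return catalog
```

With Ensmallen installed, the call would be `build_catalog(get_available_repositories, get_available_graphs_from_repository)`, which issues one listing call per repository.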

Retrieving the list of versions available for a given graph in a given repository

In [7]:
from ensmallen.datasets import get_available_versions_from_graph_and_repository
get_available_versions_from_graph_and_repository(
    graph_name="KGCOVID19",
    repository="kghub"
)

['20200925',
 '20200927',
 '20200929',
 '20201001',
 '20201012',
 '20201101',
 '20201202',
 '20210101',
 '20210128',
 '20210201',
 '20210218',
 '20210301',
 '20210412',
 '20210725',
 '20210726',
 '20210727',
 '20210823',
 '20210902',
 '20211002',
 'current']
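The dated versions of `KGCOVID19` above follow a `YYYYMMDD` pattern, so they sort chronologically as plain strings, while `current` is a label rather than a date. A minimal sketch for picking the most recent dated snapshot (for example, to pin an analysis to a fixed version), assuming every dated entry is exactly eight digits; `latest_dated_version` is a hypothetical helper, not an Ensmallen function.

```python
import re
from typing import Iterable, Optional

# Assumption: dated versions look like "20211002" (YYYYMMDD). Other
# repositories use other formats (e.g. "2021-09-01"), which are skipped here.
_DATE_VERSION = re.compile(r"\d{8}")


def latest_dated_version(versions: Iterable[str]) -> Optional[str]:
    """Return the most recent YYYYMMDD version, ignoring labels like 'current'."""
    dated = [v for v in versions if _DATE_VERSION.fullmatch(v)]
    # Zero-padded YYYYMMDD strings compare in chronological order.
    return max(dated) if dated else None
```

Applied to the `KGCOVID19` listing above, this returns `'20211002'`.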

Getting a dataframe listing all the available datasets

In [8]:
from ensmallen.datasets import get_all_available_graphs_dataframe

available_graphs = get_all_available_graphs_dataframe()

Displaying the available graphs

In [9]:
available_graphs

|       | repository | graph_name | version    |
|-------|------------|------------|------------|
| 0     | kghub      | KGCOVID19  | 20200925   |
| 1     | kghub      | KGCOVID19  | 20200927   |
| 2     | kghub      | KGCOVID19  | 20200929   |
| 3     | kghub      | KGCOVID19  | 20201001   |
| 4     | kghub      | KGCOVID19  | 20201012   |
| ...   | ...        | ...        | ...        |
| 58159 | kgobo      | EMAPA      | 2021-09-01 |
| 58160 | kgobo      | PDRO       | 2021-06-08 |
| 58161 | kgobo      | HSAPDV     | 2020-03-10 |
| 58162 | kgobo      | SWO        | swo.owl    |
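With more than 58,000 rows, this table is impractical to scan by eye. One possible approach is a case-insensitive substring search over the graph names. The sketch below assumes the dataframe has the `repository` and `graph_name` columns shown above; `find_graphs` is a hypothetical helper that accepts any two parallel iterables, such as the dataframe's columns.

```python
from typing import Iterable, List, Tuple


def find_graphs(
    repositories: Iterable[str],
    graph_names: Iterable[str],
    query: str,
) -> List[Tuple[str, str]]:
    """Return sorted (repository, graph_name) pairs whose graph name
    contains `query`, ignoring case."""
    needle = query.lower()
    # A set collapses the one-row-per-version duplicates into one match per graph.
    matches = {
        (repository, graph_name)
        for repository, graph_name in zip(repositories, graph_names)
        if needle in str(graph_name).lower()
    }
    return sorted(matches)
```

For example, `find_graphs(available_graphs["repository"], available_graphs["graph_name"], "covid")` would search every repository at once.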


Getting the number of entries (graph versions) per repository

In [11]:
from collections import Counter

Counter(available_graphs["repository"])

Counter({'jax': 1,
         'kghub': 28,
         'kgobo': 133,
         'linqs': 3,
         'monarchinitiative': 2,
         'networkrepository': 1194,
         'pheknowlatorkg': 104,
         'string': 56691,
         'yue': 7,
         'zenodo': 1})
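Because every row of the dataframe is a (graph, version) pair, the counts above measure available graph versions rather than distinct graphs. For example, `kghub` reports 28 entries even though it exposes only `KGCOVID19` and `KGMicrobe`. A minimal sketch for counting distinct graph names instead, using only the standard library like the `Counter` call above; `distinct_graphs_per_repository` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def distinct_graphs_per_repository(
    repositories: Iterable[str],
    graph_names: Iterable[str],
) -> Dict[str, int]:
    """Count distinct graph names per repository, collapsing versions."""
    graphs: Dict[str, Set[str]] = defaultdict(set)
    for repository, graph_name in zip(repositories, graph_names):
        graphs[repository].add(graph_name)
    return {repository: len(names) for repository, names in graphs.items()}
```

Called as `distinct_graphs_per_repository(available_graphs["repository"], available_graphs["graph_name"])`, this should report 2 for `kghub`, in agreement with `get_available_graphs_from_repository("kghub")`.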

Programmatically retrieving a graph

In [12]:
from ensmallen.datasets import get_dataset

# get_dataset returns a callable; invoking it retrieves the requested graph.
dataset_generator = get_dataset(
    graph_name="KGMicrobe",
    repository="kghub"
)

In [13]:
dataset_generator()

Downloading to graphs/kghub/KGM...kg-microbe.tar.gz:   0%|          | 0.00/30.9M [00:00<?, ?iB/s]
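Before triggering a potentially large download (the KGMicrobe archive above is about 30.9 MB), it can help to check that the requested graph is actually listed in the repository. The sketch below assumes the call pattern shown above: `get_dataset(graph_name=..., repository=...)` returns a callable that performs the retrieval. `retrieve_graph` is a hypothetical wrapper, and the Ensmallen functions are passed in as arguments so the sketch can be exercised with stand-ins.

```python
from typing import Any, Callable, List


def retrieve_graph(
    graph_name: str,
    repository: str,
    list_graphs: Callable[[str], List[str]],
    get_dataset: Callable[..., Callable[[], Any]],
) -> Any:
    """Check that `graph_name` is listed in `repository`, then retrieve it.

    Hypothetical wrapper: `list_graphs` and `get_dataset` stand in for
    ensmallen's get_available_graphs_from_repository and get_dataset.
    """
    available = list_graphs(repository)
    if graph_name not in available:
        # Fail before any download starts, with a hint of valid names.
        raise ValueError(
            f"{graph_name!r} is not available from {repository!r}; "
            f"some known graphs: {sorted(available)[:5]}"
        )
    # Build the retrieval callable, then invoke it to get the graph.
    return get_dataset(graph_name=graph_name, repository=repository)()
```

With Ensmallen installed, the call would be `retrieve_graph("KGMicrobe", "kghub", get_available_graphs_from_repository, get_dataset)`.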