<a href="https://colab.research.google.com/github/AnacletoLAB/grape/blob/main/tutorials/Ensmallen_Automatic_graph_retrieval_utilities.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Automatic graph retrieval utilities
In this tutorial, we explore the utilities that GRAPE provides for automatically retrieving graphs.

## Installing the library
First of all, we install the GRAPE library:

In [1]:
!pip install -qU grape

## A few of the available methods
Below, we briefly present the methods that let you programmatically list and retrieve the available datasets.

Retrieving the list of the available graph repositories, that is, the sources from which graphs can currently be downloaded:

In [2]:
from grape.datasets import get_available_repositories
get_available_repositories()

['freebase',
 'kghub',
 'linqs',
 'pheknowlatorkg',
 'wikidata',
 'zenodo',
 'jax',
 'monarchinitiative',
 'string',
 'yue',
 'kgobo',
 'networkrepository',
 'wikipedia']
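Repository names are plain lowercase strings, so a small guard can catch typos before they reach the retrieval functions. A minimal sketch, assuming you keep a copy of the list above (or the live result of `get_available_repositories()`); `check_repository` and `suggest_repositories` are hypothetical helpers built on the standard library's `difflib`, not part of GRAPE:

```python
from difflib import get_close_matches

# Copy of the repository names listed in the output above.
AVAILABLE_REPOSITORIES = [
    "freebase", "kghub", "linqs", "pheknowlatorkg", "wikidata", "zenodo", "jax",
    "monarchinitiative", "string", "yue", "kgobo", "networkrepository", "wikipedia",
]


def suggest_repositories(repository: str, available: list[str]) -> list[str]:
    """Return up to three available names that closely match `repository`."""
    return get_close_matches(repository, available, n=3)


def check_repository(repository: str, available: list[str]) -> str:
    """Hypothetical guard: return `repository` if available, else raise with suggestions."""
    if repository in available:
        return repository
    raise ValueError(
        f"Unknown repository {repository!r}; "
        f"did you mean one of {suggest_repositories(repository, available)}?"
    )


print(check_repository("kghub", AVAILABLE_REPOSITORIES))
print(suggest_repositories("kg_hub", AVAILABLE_REPOSITORIES))
```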

Retrieving the list of graphs available from a given repository

In [3]:
from grape.datasets import get_available_graphs_from_repository
get_available_graphs_from_repository("kghub")

dict_keys(['SLDB', 'KGMicrobe', 'KGIDG', 'KGPhenio', 'KGCOVID19', 'EcoKG'])
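As its repr shows, the result is a `dict_keys` view over a dictionary's keys. Views support `in` and iteration but not indexing, so convert them before picking an element by position. The sketch below uses a plain dictionary as an offline stand-in for the real return value:

```python
# Stand-in for get_available_graphs_from_repository("kghub"): a dict_keys view
# holding the graph names shown in the output above.
graphs = dict.fromkeys(
    ["SLDB", "KGMicrobe", "KGIDG", "KGPhenio", "KGCOVID19", "EcoKG"]
).keys()

# Views support membership tests and iteration...
print("KGCOVID19" in graphs)  # True

# ...but not indexing: graphs[0] would raise TypeError.
# Convert to a list (here sorted) before picking elements by position.
graph_names = sorted(graphs)
first_graph = graph_names[0]
print(first_graph)  # EcoKG
```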

Retrieving the list of versions available for a given graph in a given repository

In [4]:
from grape.datasets import get_available_versions_from_graph_and_repository
get_available_versions_from_graph_and_repository(
    name="KGCOVID19",
    repository="kghub"
)

['20200925',
 '20200927',
 '20200929',
 '20201001',
 '20201012',
 '20201101',
 '20201202',
 '20210101',
 '20210128',
 '20210201',
 '20210218',
 '20210301',
 '20210412',
 '20210725',
 '20210726',
 '20210727',
 '20210823',
 '20210902',
 '20211002',
 '20211102',
 '20211202',
 '20220102',
 '20220202',
 '20220217',
 '20220223',
 '20220225',
 '20220228',
 '20220328',
 '20220330',
 '20220402',
 '20220502',
 '20220610',
 '20220702',
 'current']
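Apart from `current`, every version label above is a date in `YYYYMMDD` form. Because these strings are zero-padded, comparing them as strings orders them chronologically, so to pin an experiment to a fixed snapshot you can pick the newest dated label rather than `current`. A minimal sketch, assuming all dated labels follow this eight-digit pattern; `latest_dated_version` is a hypothetical helper:

```python
# A few of the version labels listed above for KGCOVID19 in kghub.
versions = ["20200925", "20211202", "20220702", "20220610", "current"]


def latest_dated_version(versions: list[str]) -> str:
    """Return the most recent YYYYMMDD label, ignoring non-date labels such as 'current'."""
    dated = [v for v in versions if len(v) == 8 and v.isdigit()]
    if not dated:
        raise ValueError("no date-like version labels found")
    # Zero-padded YYYYMMDD strings sort lexicographically in chronological order.
    return max(dated)


print(latest_dated_version(versions))  # 20220702
```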

Getting a dataframe listing all the available graphs

In [5]:
from grape.datasets import get_all_available_graphs_dataframe

available_graphs = get_all_available_graphs_dataframe(verbose=False)
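As the display in the next cell shows, this dataframe has one row per (repository, name, version) triple, i.e. one leaf of the repository, graph and version hierarchy explored with the three functions above. The sketch below shows how such a table could be assembled from those functions; it is written against injected callables so it runs offline, and we do not claim this is how `get_all_available_graphs_dataframe` is implemented. `build_catalog` and the stand-in data (a subset of the labels shown in this tutorial) are hypothetical:

```python
from typing import Callable, Iterable


def build_catalog(
    repositories: Iterable[str],
    list_graphs: Callable[[str], Iterable[str]],
    list_versions: Callable[..., Iterable[str]],
) -> list[tuple[str, str, str]]:
    """Enumerate (repository, name, version) triples by walking the three-level hierarchy."""
    rows = []
    for repository in repositories:
        for name in list_graphs(repository):
            for version in list_versions(name=name, repository=repository):
                rows.append((repository, name, version))
    return rows


# Offline stand-ins using a few labels from the outputs in this tutorial.
fake_graphs = {"freebase": ["FreeBase", "FreeBase2WikiData"], "kghub": ["SLDB"]}
fake_versions = {"FreeBase": ["latest"], "FreeBase2WikiData": ["latest"], "SLDB": ["20220522"]}

catalog = build_catalog(
    list(fake_graphs),
    lambda repository: fake_graphs[repository],
    lambda name, repository: fake_versions[name],
)
for row in catalog:
    print(row)
```

With GRAPE installed, you would pass `get_available_repositories()`, `get_available_graphs_from_repository` and `get_available_versions_from_graph_and_repository` in place of the stand-ins. This issues one call per repository and per graph, so the single dataframe call above is the more convenient route.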

Display the available graphs

In [6]:
available_graphs

|       | repository | name              | version  |
|-------|------------|-------------------|----------|
| 0     | freebase   | FreeBase          | latest   |
| 1     | freebase   | FreeBase2WikiData | latest   |
| 2     | kghub      | SLDB              | 20220522 |
| 3     | kghub      | KGMicrobe         | 20210422 |
| 4     | kghub      | KGMicrobe         | 20210517 |
| ...   | ...        | ...               | ...      |
| 83639 | wikipedia  | WikiSourceHR      | 20220601 |
| 83640 | wikipedia  | WikiSourceHR      | 20220620 |
| 83641 | wikipedia  | WikiSourceHR      | 20220701 |
| 83642 | wikipedia  | WikiSourceHR      | 20220720 |
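The displayed dataframe is truncated, so to inspect a specific graph it is easier to filter it. A minimal pandas sketch, assuming `available_graphs` is a pandas DataFrame (as its display suggests) with the `repository`, `name` and `version` columns shown above; here a small stand-in frame is built from the visible rows, and every listed version of KGMicrobe in kghub is selected with boolean masks:

```python
import pandas as pd

# Stand-in built from rows visible in the output above; the real frame has many more.
available_graphs = pd.DataFrame(
    [
        ("freebase", "FreeBase", "latest"),
        ("freebase", "FreeBase2WikiData", "latest"),
        ("kghub", "SLDB", "20220522"),
        ("kghub", "KGMicrobe", "20210422"),
        ("kghub", "KGMicrobe", "20210517"),
        ("wikipedia", "WikiSourceHR", "20220601"),
        ("wikipedia", "WikiSourceHR", "20220720"),
    ],
    columns=["repository", "name", "version"],
)

# Combine boolean masks with & (each condition in parentheses).
kgmicrobe = available_graphs[
    (available_graphs["repository"] == "kghub")
    & (available_graphs["name"] == "KGMicrobe")
]
print(kgmicrobe["version"].tolist())  # ['20210422', '20210517']
```

`available_graphs.query('repository == "kghub" and name == "KGMicrobe"')` is an equivalent, more compact alternative.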


Counting the graph versions available in each repository (each row of the dataframe is one version of one graph)

In [7]:
from collections import Counter

Counter(available_graphs["repository"])

Counter({'freebase': 2,
         'kghub': 92,
         'linqs': 3,
         'pheknowlatorkg': 116,
         'wikidata': 22,
         'zenodo': 163,
         'jax': 1,
         'monarchinitiative': 23,
         'string': 75590,
         'yue': 7,
         'kgobo': 462,
         'networkrepository': 1194,
         'wikipedia': 5969})
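`Counter` counts rows, and each row is one graph version, so these numbers are not the number of distinct graphs: `kghub`, for instance, lists six graphs in the output of `get_available_graphs_from_repository("kghub")` but has 92 rows here. To count distinct graphs, deduplicate the (repository, name) pairs first. A standard-library sketch on a few rows copied from the table above:

```python
from collections import Counter

# (repository, name, version) rows; a small subset of the table shown above.
rows = [
    ("freebase", "FreeBase", "latest"),
    ("freebase", "FreeBase2WikiData", "latest"),
    ("kghub", "SLDB", "20220522"),
    ("kghub", "KGMicrobe", "20210422"),
    ("kghub", "KGMicrobe", "20210517"),
]

# One count per row: graph versions per repository.
versions_per_repository = Counter(repository for repository, _, _ in rows)

# Deduplicate (repository, name) pairs first to count distinct graphs instead.
unique_graphs = {(repository, name) for repository, name, _ in rows}
graphs_per_repository = Counter(repository for repository, _ in unique_graphs)

print(versions_per_repository)  # Counter({'kghub': 3, 'freebase': 2})
print(dict(graphs_per_repository))
```

On the real dataframe, `available_graphs.groupby("repository")["name"].nunique()` gives the number of distinct graphs per repository directly.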

Programmatically retrieving a graph

In [8]:
from grape.datasets import get_dataset
dataset_generator = get_dataset(
    name="KGMicrobe",
    repository="kghub"
)

In [None]:
dataset_generator()
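In the cells above, `get_dataset` hands back a function rather than the graph itself, and the graph is obtained by calling that function. The sketch below illustrates this deferred-loading pattern in plain Python; `make_loader` and its placeholder load step are hypothetical and are not GRAPE's implementation. As a design choice of the sketch, the result is also cached with `functools.lru_cache`, so repeated calls reuse the first load:

```python
from functools import lru_cache
from typing import Callable

# Records every time the placeholder load step actually runs.
calls = []


def make_loader(name: str, repository: str) -> Callable[[], dict]:
    """Return a zero-argument function that 'loads' the graph on first call and caches it."""

    @lru_cache(maxsize=None)
    def load() -> dict:
        # Placeholder for the expensive download-and-parse step.
        calls.append((repository, name))
        return {"name": name, "repository": repository}

    return load


loader = make_loader(name="KGMicrobe", repository="kghub")
print(len(calls))  # 0: nothing has been loaded yet
graph = loader()
same_graph = loader()
print(len(calls), graph is same_graph)  # 1 True
```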