# Graph Algebra with `kglab`

## intro
`kglab` provides tools to access graph data from multiple sources and build a `KnowledgeGraph` that data scientists can easily work with. For a thorough explanation of how to use data stored as triples and how to load it into `kglab`, see the examples in the `examples/` directory. The examples in this directory (`examples/graph_algebra/`) introduce the graph algebra capabilities that can be applied to the graphs the user has loaded.

## basic load and querying
Everything starts from data loaded into a `KnowledgeGraph`, for example:

1. Instantiate a graph from a dataset:

```python
# for use in tutorial and development; do not include this `sys.path` change in production:
import sys
sys.path.insert(0, "../../")
import warnings
warnings.filterwarnings("ignore")

import os
from os.path import dirname
import kglab

namespaces = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gorm": "http://example.org/sagas#",
    "rel":  "http://purl.org/vocab/relationship/",
}

kg = kglab.KnowledgeGraph(
    name="Happy Vikings KG example for SKOS/OWL inference",
    namespaces=namespaces,
)

kg.load_rdf(dirname(dirname(os.getcwd())) + "/dat/gorm.ttl")
```

```
<kglab.kglab.KnowledgeGraph at 0x7f81f001ddf0>
```


2. Create a subgraph by providing a SPARQL query that binds a `?subject` and an `?object` for each edge:


```python
query = """SELECT ?subject ?object
WHERE {
    ?subject rdf:type gorm:Viking .
    ?subject gorm:childOf ?object .
}
"""
```


## define a subgraph
In this case we are looking for the network of parent-child relations among the members of a Viking family.

With this query we can define a *subgraph*, which gives us access to *graph algebra* capabilities:

```python
from kglab.subg import SubgraphMatrix

subgraph = SubgraphMatrix(kg=kg, sparql=query)
```


## compute the adjacency matrix
Let's compute the most basic matrix, the adjacency matrix (usually denoted `A`):

```python
adj_matrix = subgraph.to_adjacency()
adj_matrix
```

```
array([[0., 1., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0.]])
```

What happened here is that all subjects and objects have been mapped to integer indices from 0 to `n - 1`, where `n` is the number of nodes. Entry `A[i, j]` is 1 when there is a directed edge from node `i` to node `j`, so the entity with index 0 is adjacent to (has a directed edge to) the entities with indices 1 and 2. The graph is directed because the relation `gorm:childOf` goes from child to parent; further on we will turn it into an undirected graph to see the relation symmetrically (both child-to-parent and parent-to-child).
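To make the index mapping concrete, here is a minimal plain-Python sketch (no kglab or NumPy needed) that builds the same directed adjacency matrix from a list of `(subject, object)` pairs. The four pairs are read off the matrix and labels printed in this notebook, with the `gorm:` prefix dropped. The helpers `build_index` and `to_adjacency` are hypothetical, and assigning ids in order of first appearance is an assumption that happens to reproduce the output above, not a description of how `SubgraphMatrix` works internally.

```python
# (child, parent) pairs for gorm:childOf, reconstructed from the notebook's output.
edges = [
    ("Astrid", "Leif"),
    ("Astrid", "Bodil"),
    ("Leif", "Bjorn"),
    ("Bjorn", "Gorm"),
]


def build_index(pairs):
    """Assign integer ids to nodes in order of first appearance (an assumption)."""
    node_to_id = {}
    for subj, obj in pairs:
        for node in (subj, obj):
            if node not in node_to_id:
                node_to_id[node] = len(node_to_id)
    # Inverse mapping, playing the role of an index -> label lookup.
    id_to_node = {i: node for node, i in node_to_id.items()}
    return node_to_id, id_to_node


def to_adjacency(pairs, node_to_id):
    """Directed adjacency matrix: adj[i][j] = 1.0 when there is an edge i -> j."""
    n = len(node_to_id)
    adj = [[0.0] * n for _ in range(n)]
    for subj, obj in pairs:
        adj[node_to_id[subj]][node_to_id[obj]] = 1.0
    return adj


node_to_id, id_to_node = build_index(edges)
adj = to_adjacency(edges, node_to_id)
for row in adj:
    print(row)
```

Row `i` lists the outgoing edges of node `i`, which is why the child (Astrid, index 0) has its two parents marked in row 0.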

We can check the labels attached to the matrix's indices with:

```python
for i in range(adj_matrix.shape[0]):
    print(
        subgraph.inverse_transform(i)  # returns a label from an index
    )
```

```
http://example.org/sagas#Astrid
http://example.org/sagas#Leif
http://example.org/sagas#Bodil
http://example.org/sagas#Bjorn
http://example.org/sagas#Gorm
```


Reading the matrix with these labels, we can see for example that Astrid is a child of Leif and Bodil (row 0 has edges to columns 1 and 2), that Leif is a child of Bjorn, and that Bjorn is a child of Gorm.
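The same reading can be done programmatically. The sketch below, in plain Python and assuming the matrix and labels printed above (prefix shortened), walks every nonzero entry `A[i][j]` and turns it back into a `subject childOf object` statement. `decode_edges` is a hypothetical helper, not part of kglab.

```python
# Adjacency matrix and index labels copied from the outputs above.
labels = ["Astrid", "Leif", "Bodil", "Bjorn", "Gorm"]
adj = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]


def decode_edges(matrix, names, predicate="childOf"):
    """Turn every nonzero matrix[i][j] back into a (subject, predicate, object) triple."""
    triples = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value:
                triples.append((names[i], predicate, names[j]))
    return triples


for s, p, o in decode_edges(adj, labels):
    print(s, p, o)
# Astrid childOf Leif
# Astrid childOf Bodil
# Leif childOf Bjorn
# Bjorn childOf Gorm
```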

This is one of the main benefits of the semantic layer (data represented with the W3C Linked Data standards): relationships are expressed in a way that is both human-readable and machine-readable.

## other relevant matrices for a graph

To compute the *vertex degree matrix* we first convert our directed graph (semantic data graphs are always directed, since by design triples are `subject -> predicate -> object`) into an undirected one. This preserves the existence of each relationship but not its direction.
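One way to obtain the undirected version is to keep an edge between `i` and `j` whenever it exists in either direction, i.e. to take the element-wise maximum of `A` and its transpose. The plain-Python sketch below does that on the adjacency matrix printed earlier; it illustrates the idea only, and `kglab`'s own `to_undirected()` may be implemented differently.

```python
# Directed adjacency matrix from to_adjacency() above.
adj = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]


def to_undirected(matrix):
    """Keep an edge i - j if it exists in either direction: max(A, A^T) element-wise."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


undirected = to_undirected(adj)
for row in undirected:
    print(row)
```

The result is symmetric, so each parent-child link now appears twice, once in each row involved.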

```python
undirected_adj_mtx = subgraph.to_undirected()
undirected_adj_mtx
```

```
array([[0., 1., 1., 0., 0.],
       [1., 0., 0., 1., 0.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 1.],
       [0., 0., 0., 1., 0.]])
```

Now the relationship is a generic, symmetric "parenthood" relation rather than a directed child-to-parent one. We can still say that Astrid is first-degree kin (parent-child) of both Leif and Bodil.

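In the same way we can compute the *Laplacian matrix* `L = D - A`, where `D` is the vertex degree matrix (each node's degree on the diagonal) and `A` is the undirected adjacency matrix: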

```python
laplacian = subgraph.to_laplacian()
laplacian
```

```
array([[ 2, -1, -1,  0,  0],
       [-1,  2,  0, -1,  0],
       [-1,  0,  1,  0,  0],
       [ 0, -1,  0,  2, -1],
       [ 0,  0,  0, -1,  1]])
```
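As a sanity check, the Laplacian can be rebuilt by hand: the degree matrix `D` holds each node's number of neighbours on its diagonal, and `L = D - A`. The sketch below, with the hypothetical plain-Python helpers `degree_matrix` and `laplacian_matrix`, starts from the undirected adjacency matrix printed above and reproduces the values printed by `to_laplacian()`.

```python
# Undirected adjacency matrix from to_undirected() above (as integers).
undirected = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]


def degree_matrix(matrix):
    """Diagonal matrix D with D[i][i] = number of neighbours of node i."""
    n = len(matrix)
    degrees = [sum(row) for row in matrix]
    return [[degrees[i] if i == j else 0 for j in range(n)] for i in range(n)]


def laplacian_matrix(matrix):
    """Combinatorial graph Laplacian L = D - A."""
    n = len(matrix)
    d = degree_matrix(matrix)
    return [[d[i][j] - matrix[i][j] for j in range(n)] for i in range(n)]


lap = laplacian_matrix(undirected)
for row in lap:
    print(row)
```

Because each diagonal entry equals the number of `-1` entries in its row, every row of `L` sums to zero.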

Finally, `describe()` summarizes the subgraph:

```python
subgraph.describe()
```

```
{'n_nodes': 5,
 'n_edges': 4,
 'center_msg': 'Found infinite path length because the digraph is not strongly connected',
 'diameter_msg': 'Found infinite path length because the digraph is not strongly connected',
 'eccentricity_msg': 'Found infinite path length because the digraph is not strongly connected',
 'center': None,
 'diameter': None,
 'eccentricity': None}
```
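The `None` values come from the direction of the edges: in a directed graph every node must be able to reach every other node (strong connectivity) for all distances to be finite, and here, for instance, Bodil has no outgoing `childOf` edge. The plain-Python sketch below, with hypothetical helpers built on breadth-first search, checks strong connectivity on the directed matrix and then computes eccentricities, diameter and center on the undirected version. Those undirected figures are computed here for illustration only; they are not part of what `describe()` reports above.

```python
from collections import deque

# Directed adjacency matrix and labels copied from the outputs above.
labels = ["Astrid", "Leif", "Bodil", "Bjorn", "Gorm"]
adj = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]


def bfs_distances(matrix, source):
    """Shortest hop counts from `source` to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, weight in enumerate(matrix[u]):
            if weight and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_strongly_connected(matrix):
    """True if every node can reach every other node following edge directions."""
    n = len(matrix)
    return all(len(bfs_distances(matrix, s)) == n for s in range(n))


def eccentricities(matrix):
    """Longest shortest-path distance from each node (None if some node is unreachable)."""
    n = len(matrix)
    result = []
    for s in range(n):
        dist = bfs_distances(matrix, s)
        result.append(max(dist.values()) if len(dist) == n else None)
    return result


n = len(adj)
n_edges = sum(sum(row) for row in adj)
undirected = [[max(adj[i][j], adj[j][i]) for j in range(n)] for i in range(n)]

print(n_edges)                     # 4
print(is_strongly_connected(adj))  # False: e.g. Bodil has no outgoing edge
ecc = eccentricities(undirected)
print(ecc)                         # [3, 2, 4, 3, 4]
print(max(ecc))                    # diameter of the undirected graph: 4
center = [labels[i] for i, e in enumerate(ecc) if e == min(ecc)]
print(center)                      # ['Leif']
```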