# Leveraging the `kglab` abstraction layer

Let's try the previous examples – i.e., building a simple recipe KG – again, but this time using the [`kglab`](https://github.com/DerwenAI/kglab) library to make things a wee bit easier...

In [1]:
import kglab

namespaces = {
    "wtm": "http://purl.org/heals/food/",
    "ind": "http://purl.org/heals/ingredient/",
    }

kg = kglab.KnowledgeGraph(
    name = "A recipe KG example based on Food.com",
    base_uri = "https://www.food.com/recipe/",
    language = "en",
    namespaces = namespaces,
    )

Once the `kg` object is instantiated, we can use its namespace shortcuts for the food-related vocabularies to construct the graph from a sequence of triples:

In [2]:
import rdflib as rdf
from rdflib.namespace import RDF, XSD

node = rdf.URIRef("https://www.food.com/recipe/327593")

kg.add(node, RDF.type, kg.get_ns("wtm").Recipe)
kg.add(node, kg.get_ns("wtm").hasCookTime, rdf.Literal("8", datatype=XSD.integer))
kg.add(node, kg.get_ns("wtm").hasIngredient, kg.get_ns("ind").ChickenEgg)
kg.add(node, kg.get_ns("wtm").hasIngredient, kg.get_ns("ind").CowMilk)
kg.add(node, kg.get_ns("wtm").hasIngredient, kg.get_ns("ind").WholeWheatFlour)

Iterating through these triples shows their full URIs:

In [3]:
for s, p, o in kg._g:
    print(s, p, o)

https://www.food.com/recipe/327593 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.org/heals/food/Recipe
https://www.food.com/recipe/327593 http://purl.org/heals/food/hasIngredient http://purl.org/heals/ingredient/ChickenEgg
https://www.food.com/recipe/327593 http://purl.org/heals/food/hasCookTime 8
https://www.food.com/recipe/327593 http://purl.org/heals/food/hasIngredient http://purl.org/heals/ingredient/WholeWheatFlour
https://www.food.com/recipe/327593 http://purl.org/heals/food/hasIngredient http://purl.org/heals/ingredient/CowMilk


We can serialize the graph in N3 format to a string, as another way to examine the triples:

In [4]:
s = kg._g.serialize(format="n3")
# rdflib < 6 returns bytes here, while rdflib >= 6 returns str
print(s.decode("utf8") if isinstance(s, bytes) else s)

@prefix ind: <http://purl.org/heals/ingredient/> .
@prefix wtm: <http://purl.org/heals/food/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://www.food.com/recipe/327593> a wtm:Recipe ;
    wtm:hasCookTime 8 ;
    wtm:hasIngredient ind:ChickenEgg,
        ind:CowMilk,
        ind:WholeWheatFlour .




With the `kglab` library, serializing a KG to multiple formats (Turtle, XML, JSON-LD, etc.) becomes much simpler:

In [5]:
kg.save_ttl("tmp.ttl")

In [6]:
kg.save_ttl("tmp.xml", format="xml")

In [7]:
kg.save_jsonld("tmp.jsonld")

Try opening these files to confirm their serialized contents.

Next, we'll use the [Parquet](https://parquet.apache.org/) format for *columnar storage*.
Parquet is widely used in Big Data frameworks because it supports efficient data management and analytics.
A dataset is simple to *partition* into multiple files (e.g., for distributed processing per partition), and the columns can be selectively decompressed on file reads (e.g., for [*predicate pushdown*](https://medium.com/microsoftazure/data-at-scale-learn-how-predicate-pushdown-will-save-you-money-7063b80878d7) optimizations).

In [8]:
kg.save_parquet("tmp.parquet")

Let's compare the relative file sizes for these formats:

In [9]:
import pandas as pd
import os

file_paths = ["tmp.jsonld", "tmp.ttl", "tmp.xml", "tmp.parquet"]
file_sizes = [os.path.getsize(file_path) for file_path in file_paths]

df = pd.DataFrame({"file_path": file_paths, "file_size": file_sizes})
df

|   | file_path | file_size |
|---|-------------|-----------|
| 0 | tmp.jsonld | 779 |
| 1 | tmp.ttl | 314 |
| 2 | tmp.xml | 667 |
| 3 | tmp.parquet | 3632 |


Parquet uses compression based on a "dictionary" approach, which adds overhead for small files such as this KG.
We'll revisit this comparison across file formats with a larger KG.

---

## Exercises

**Exercise 1:**
    
Using the `kglab` library, extend the graph by adding another recipe, such as *German Egg Pancakes* <https://www.food.com/recipe/406738>, then serialize to the same file formats again.
How do the relative file sizes compare as the size of the graph grows?