# Semantic Vocabularies

## What is the semantic web?

Everything and nothing... many things to many people. For our purposes, the semantic web is:

* A set of file formats (Turtle: `.ttl`, RDF/XML: `.xml`)
* A way to describe data which is both *flexible* and *strictly-defined*
* A way to have data linked across databases and the web (decentralized knowledge)

The semantic web describes data in graphs. Graphs have nodes (aka vertices) and edges. You can think of the nodes as nouns and the edges as verbs, though this doesn't always align perfectly with real data. Here is an example:

The unit `kilogram` (noun) belongs to (verb) the quantity kind `Mass` (noun).

The semantic web describes data in *directed* graphs - the edges only go one way. This means that `Mass` doesn't belong to `kilogram`. To indicate the direction, we refer to the noun / verb / noun as a *triple* with a *subject*, *predicate*, and *object*.

* Subject: `kilogram`
* Predicate: `belongs to`
* Object: `Mass`
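
As a rough illustration, a directed graph can be held in plain Python as a set of (subject, predicate, object) tuples. The strings below are informal placeholders, not real identifiers, and this is only a sketch of the idea rather than how any RDF library stores data.

```python
# A toy directed graph: each triple is (subject, predicate, object).
# The strings are informal placeholders, not real identifiers.
triples = {
    ("kilogram", "belongs to", "Mass"),
}

# The edge exists in the direction in which it was stated...
forward = ("kilogram", "belongs to", "Mass") in triples

# ...but not in the reverse direction, because the graph is directed.
backward = ("Mass", "belongs to", "kilogram") in triples

print(forward, backward)
```

Swapping subject and object gives a different triple, which is why the order matters.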

## URIs and IRIs

We want to have *single, unambiguous, and resolvable* identifiers for nodes in our graph. We do that with an [IRI](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) - a fancy way of saying a URL (strictly speaking, an IRI is a URI that may also contain non-ASCII characters). So instead of `kilogram` or `mass`, we would have:

* Subject: `https://vocab.sentier.dev/qudt/unit/KiloGM`
* Predicate: `belongs to`
* Object: `https://vocab.sentier.dev/qudt/quantity-kind/Mass`

## Ontologies

What about our verbs? And what types of nouns are allowed to be related to other nouns? An ontology defines *what kinds* of relationships and nouns are allowed in the way that we are organizing knowledge.

Ontologies can get philosophical, and rather quickly become difficult for non-experts to understand; see e.g. [EMMO](https://github.com/emmo-repo/EMMO).

We are using SKOS - Simple Knowledge Organization System - as our main ontology in the Sentier.dev vocabulary. Here is the [SKOS Core Guide](https://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/).

SKOS is organized around [concepts](https://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secconcept). Let's take the example they give for `love` and translate it to `.ttl`:

In [1]:
from rdflib import Graph
from io import StringIO

In [2]:
love_in_xml = StringIO("""<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <skos:Concept rdf:about="http://www.example.com/concepts#love"/>
  
</rdf:RDF>""")

graph = Graph().parse(love_in_xml, format="xml")
print(graph.serialize())

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://www.example.com/concepts#love> a skos:Concept .




This is exactly the form we expect - subject (`<http://www.example.com/concepts#love>`) is `a` (predicate) `http://www.w3.org/2004/02/skos/core#Concept` (object).

Notice that the serialization created a `@prefix` named `skos`. This allows us to write `skos:Concept` instead of the full `http://www.w3.org/2004/02/skos/core#Concept`.

In `.ttl`, and in the query language [SPARQL](https://en.wikipedia.org/wiki/SPARQL), full IRIs are enclosed in `<` and `>`, like `<http://www.example.com/concepts#love>`.

Turtle (`.ttl`) sometimes uses abbreviations for terms which are spelled out in full elsewhere. For example, in Turtle we can write `a`, but this actually means `http://www.w3.org/1999/02/22-rdf-syntax-ns#type`. In the XML above, that triple is implied by the `<skos:Concept>` element itself; the `rdf:about` attribute only names the subject. `http://www.w3.org/1999/02/22-rdf-syntax-ns#type` isn't part of the *SKOS* ontology, it is part of the foundational `RDF` vocabulary. This **is important** - we can combine ontologies to create the best way to describe our systems.

Here is how we would write the same triple about love using `rdflib` in Python:

In [3]:
from rdflib import URIRef
from rdflib.namespace import RDF, SKOS

In [4]:
feelings = Graph()
feelings.add((
    URIRef("http://www.example.com/concepts#love"),  # Subject
    RDF.type,  # Predicate (what Turtle abbreviates to `a`)
    SKOS.Concept,  # Object
))
print(feelings.serialize())

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://www.example.com/concepts#love> a skos:Concept .




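*(Why is it `RDF.type` and not `RDF.about`? In RDF/XML, `rdf:about` is only the attribute that names the subject; the predicate of the triple is `rdf:type`, which Turtle abbreviates to `a`.)*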

## SKOS nodes and edges

The SKOS Core Guide does a pretty good job of describing the following, so just look them up there:

* Concept
* ConceptScheme
* prefLabel
* altLabel
* notation
* note and its child properties
* broader
* narrower
* related

## Creating graphs

1. Use a tool like `rdflib` (but note that you can write `.ttl` manually - it's not that hard, really! See e.g. https://github.com/sentier-dev/sentier_vocab/blob/main/sentier_vocab/data/simapro.ttl, which was written by hand).
2. Create a `Graph` object.
3. Use namespaces to make your code easier to read and quicker to type.
4. Add triples with `.add((subject, predicate, object))`.
5. The subject and predicate should be `URIRef` objects; the object can be a `URIRef` or a `Literal`:

In [5]:
from rdflib import Literal
from rdflib.namespace import XSD

Literal("Hello world"), Literal("Hoi Zäme", lang="de-ch"), Literal(39, datatype=XSD.integer)

(rdflib.term.Literal('Hello world'),
 rdflib.term.Literal('Hoi Zäme', lang='de-ch'),
 rdflib.term.Literal('39', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

Here is part of the graph for kilogram from [vocab.sentier.dev](https://vocab.sentier.dev/). Let's see how much we can understand:

```ttl
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ns0: <http://qudt.org/schema/qudt/> .
@prefix qudt: <https://vocab.sentier.dev/qudt/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

qudt:
  skos:prefLabel "QUDT Schema - Version 2.1.42"@en ;
  a skos:ConceptScheme .

<https://vocab.sentier.dev/qudt/unit/LunarMass>
  skos:notation "M☾"^^ns0:symbol ;
  skos:prefLabel "Lunar mass"@en ;
  a skos:Concept ;
  skos:broaderTransitive <https://vocab.sentier.dev/qudt/unit/KiloGM> ;
  skos:broader <https://vocab.sentier.dev/qudt/unit/KiloGM> .

<https://vocab.sentier.dev/qudt/unit/KiloGM>
  ns0:hasQuantityKind <https://vocab.sentier.dev/qudt/quantity-kind/Mass> ;
  skos:narrower <https://vocab.sentier.dev/qudt/unit/LunarMass> ;
  ns0:conversionMultiplierSN 1.000000e+0 ;
  ns0:conversionMultiplier 1.0 ;
  skos:prefLabel "كيلوغرام"@ar, "キログラム"@ja, "quilograma"@pt, "kilogramo"@es, "کیلوگرم"@fa, "χιλιόγραμμο"@el, "kilogram"@tr, "kilogram"@sl, "kilogram"@ro, "kilogram"@pl, "kilogram"@ms, "kilogram"@en, "kilogram"@cs, "किलोग्राम"@hi, "килограмм"@ru, "קילוגרם"@he, "kilogramm*"@hu, "公斤"@zh, "килограм"@bg, "kilogramme"@fr, "chilogrammo"@it, "Kilogramm"@de, "chiliogramma"@la ;
  skos:notation "KGM"^^ns0:uneceCommonCode, "kg"^^ns0:symbol, "0112/2///62720#UAD720"^^ns0:iec61360Code, "kg"^^ns0:ucumCode, "0112/2///62720#UAA594"^^ns0:iec61360Code ;
  skos:note "The kilogram or kilogramme (SI symbol: kg), also known as the kilo, is the base unit of mass in the International System of Units and is defined as being equal to the mass of the International Prototype Kilogram (IPK), which is almost exactly equal to the mass of one liter of water. The avoirdupois (or international) pound, used in both the Imperial system and U.S. customary units, is defined as exactly 0.45359237 kg, making one kilogram approximately equal to 2.2046 avoirdupois pounds." ;
  skos:exactMatch <https://glossary.ecoinvent.org/ids/487df68b-4994-4027-8fdc-a4dc298257b7>, <https://vocab.sentier.dev/simapro/unit/kg>, <http://qudt.org/vocab/unit/KiloGM>, <https://si-digital-framework.org/SI/units/kilogram> ;
  skos:related "http://dbpedia.org/resource/Kilogram"^^xsd:anyURI ;
  skos:broaderTransitive <https://vocab.sentier.dev/qudt/quantity-kind/Mass> ;
  a skos:Concept ;
  ns0:informativeReference "http://en.wikipedia.org/wiki/Kilogram?oldid=493633626"^^xsd:anyURI ;
  skos:inScheme qudt: ;
  skos:definition "The kilogram or kilogramme (SI symbol: kg), also known as the kilo, is the base unit of mass in the International System of Units and is defined as being equal to the mass of the International Prototype Kilogram (IPK), which is almost exactly equal to the mass of one liter of water. The avoirdupois (or international) pound, used in both the Imperial system and U.S. customary units, is defined as exactly 0.45359237 kg, making one kilogram approximately equal to 2.2046 avoirdupois pounds."^^rdf:HTML ;
  ns0:hasDimensionVector <http://qudt.org/vocab/dimensionvector/A0E0L0I0M1H0T0D0> ;
  ns0:applicableSystem <http://qudt.org/vocab/sou/CGS>, <http://qudt.org/vocab/sou/SI>, <http://qudt.org/vocab/sou/CGS-GAUSS>, <http://qudt.org/vocab/sou/CGS-EMU> ;
  skos:broader <https://vocab.sentier.dev/qudt/quantity-kind/Mass> .

<https://vocab.sentier.dev/qudt/quantity-kind/Mass>
  skos:prefLabel "kütle"@tr, "tömeg"@hu, "جرم"@fa, "masă"@ro, "Μάζα"@el, "भार"@hi, "质量"@zh, "masa"@sl, "massa"@pt, "masa"@pl, "massa"@la, "質量"@ja, "מסה"@he, "Маса"@bg, "masa"@es, "كتلة"@ar, "massa"@it, "Masse"@de, "Mass"@en, "Масса"@ru, "Jisim"@ms, "Hmotnost"@cs, "masse"@fr ;
  skos:narrower <https://vocab.sentier.dev/qudt/unit/KiloGM> ;
  a skos:Concept ;
  skos:narrowerTransitive <https://vocab.sentier.dev/qudt/unit/KiloGM> .
```

**Exercise**: Build and serialize a graph for hydrogen.

* Create the concept hydrogen (not label!) as a URI
* Give it a preferred label in at least two languages
* Give it an alternative label
* Find hydrogen in https://vocab.sentier.dev/products/en/ and create a link to that URI
* Find hydrogen in ChEBI and create a link to that URI
* Find hydrogen in Wikipedia and create a link to that URI
* Find hydrogen in the Open Energy Ontology and create a link to that URI

Then do the same thing with one of green/blue/grey/pink/yellow hydrogen, and create a narrower or broader link between your colored hydrogen and the base hydrogen.

Bonus points: Use `skosify.infer.skos_hierarchical` to get reciprocal relationships.

## Filtering the graph

The graph you have created is filled with subject/predicate/object triples. You can [iterate over the graph](https://rdflib.readthedocs.io/en/stable/intro_to_graphs.html#graphs-as-iterators) and return data with the `.triples()` method. You would almost always use [some type of filter](https://rdflib.readthedocs.io/en/stable/intro_to_graphs.html#basic-triple-matching); the simplest is to return all data:

In [6]:
for s, p, o in feelings.triples((None, None, None)):
    print(s, p, o)

http://www.example.com/concepts#love http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2004/02/skos/core#Concept


Filters are just the value you are looking for; to do something more complicated (like "subject starts with the letter l") you need to filter the results client-side, or learn to write SPARQL and use that to query the graph.

In [7]:
for s, p, o in feelings.triples((None, None, SKOS.Concept)):
    print(s)

http://www.example.com/concepts#love


**Exercise**: Query the hydrogen graph you created to print all the links of `SKOS.related` objects for hydrogen.

## Using our existing vocabulary

We can retrieve triples and the graph from our existing vocab using `sentier_data_tools`:

In [8]:
from sentier_data_tools import ProductIRI, UnitIRI

In [10]:
UnitIRI("https://vocab.sentier.dev/qudt/unit/KiloGM").triples(limit=5)

06:27:21 [info     ] Retrieved 5 triples from https://fuseki.d-d-s.ch/skosmos/query


[(rdflib.term.URIRef('https://vocab.sentier.dev/qudt/unit/KiloGM'),
  rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
  rdflib.term.URIRef('http://www.w3.org/2004/02/skos/core#Concept')),
 (rdflib.term.URIRef('https://vocab.sentier.dev/qudt/unit/KiloGM'),
  rdflib.term.URIRef('http://www.w3.org/2004/02/skos/core#prefLabel'),
  rdflib.term.Literal('كيلوغرام', lang='ar')),
 (rdflib.term.URIRef('https://vocab.sentier.dev/qudt/unit/KiloGM'),
  rdflib.term.URIRef('http://www.w3.org/2004/02/skos/core#prefLabel'),
  rdflib.term.Literal('килограм', lang='bg')),
 (rdflib.term.URIRef('https://vocab.sentier.dev/qudt/unit/KiloGM'),
  rdflib.term.URIRef('http://www.w3.org/2004/02/skos/core#prefLabel'),
  rdflib.term.Literal('kilogram', lang='cs')),
 (rdflib.term.URIRef('https://vocab.sentier.dev/qudt/unit/KiloGM'),
  rdflib.term.URIRef('http://www.w3.org/2004/02/skos/core#prefLabel'),
  rdflib.term.Literal('Kilogramm', lang='de'))]

In [11]:
ProductIRI("http://data.europa.eu/xsp/cn2024/160557000080").graph()

06:28:02 [info     ] Retrieved 73 triples from https://fuseki.d-d-s.ch/skosmos/query


<Graph identifier=N843a51f1777044ee97209e4ffad02310 (<class 'rdflib.graph.Graph'>)>

## Conclusions

* The semantic web tech stack is still relatively difficult to use, and writing SPARQL queries can twist your mind in uncomfortable ways
* Large institutions are committing to publishing semantic web vocabularies (see e.g. https://showvoc.op.europa.eu/#/datasets and https://www.fao.org/statistics/caliper/about/en)
* The semantic web allows us to build on existing vocabularies and taxonomies, to supplement them, or to link them to our concepts and data
* The semantic web's combination of strictness (with formally defined ontologies) and flexibility (with what you can add and express) is a powerful way to solve some fundamental LCA data problems.