# Using Building Data in RDF & SPARQL 


A quick hands-on introduction to Linked Data using the RDFLib ecosystem.


This notebook was created for the LDAC 2022 summer school and leans heavily on the work of many people.
Most importantly, it follows the tutorials by Jörg Schad (https://github.com/joerg84); thank you for sharing your code.



# Set up the environment

![RDFLib](img/rdflib-packages.png)

This tutorial is organized around the excellent [RDFLib ecosystem](https://rdflib.dev/), which contains many useful tools for getting started with Linked Data. Note, however, that performance and scalability might not always meet real-world requirements.


In [None]:
import rdflib
from rdflib import Graph
from rdflib.namespace import DC, RDF, FOAF, RDFS
from rdflib import URIRef, BNode, Literal
import networkx as nx
import io
import pydotplus
from IPython.display import display, Image
from rdflib.tools.rdf2dot import rdf2dot
from pprint import pprint

In [None]:
# Helper function for visualizing an RDF graph
def visualize(g):
    stream = io.StringIO()
    # rdf2dot writes a Graphviz DOT description of the graph into the stream
    rdf2dot(g, stream)
    dg = pydotplus.graph_from_dot_data(stream.getvalue())
    png = dg.create_png()
    display(Image(png))

# A simple building graph

In [None]:
g = Graph()


# Graph using TURTLE syntax
turtle = """
@prefix : <http://www.ldac.org/ns/building1#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> . 

:Building :hasStorey :GroundFloor .
:GroundFloor :hasElement :Wall1 ;
      :height    "3.5"^^xsd:float .
      
      
:Wall1 :hasMaterial :Brick ;
      :isExternal    "false"^^xsd:boolean ;
      :hasOpening :Window1 .
:Window1 :hasMaterial :Wood ;
      :hasHost    :Wall1 .

"""
g.parse(data=turtle, format="turtle")

Let us print all triples:

In [None]:
#print all triples
for s, p, o in g:
   pprint((s, p, o))

As this is hard to read, let us visualize the RDF graph:

In [None]:
visualize(g)



Let's get everything we know about Wall1:

In [None]:
from pprint import pprint
# Look up Wall1 by its global identifier
wall1 = URIRef('http://www.ldac.org/ns/building1#Wall1')
pprint([o for o in g.predicate_objects(subject=wall1)])

# Explicit construction of a graph

Instead of parsing data serialized in long strings, let's create a graph explicitly. This means constructing explicit nodes and edges (predicates).

Nodes can be one of three types:
- URI,
- B(lank)Node, or
- Literal.

We will recreate the BOT example 5 from the [documentation](https://w3c-lbd-cg.github.io/bot/#example-5) 

![grafik.png](img/bot-image-doc.png)


In [None]:
from rdflib import Namespace
bot = Namespace("https://w3id.org/bot#")
ldac = Namespace("https://ldac.org/building2/")
g = Graph()
# override=False keeps an existing binding for the prefix if there is one
g.bind("bot", bot, override=False)
g.bind("ldac", ldac, override=False)

In [None]:
SiteA = URIRef("https://ldac.org/building2/SiteA")

BuildingA = URIRef("https://ldac.org/building2/BuildingA")
Storey00 = URIRef("https://ldac.org/building2/Storey00")
Storey01 = URIRef("https://ldac.org/building2/Storey01")

SpaceA = URIRef("https://ldac.org/building2/SpaceA")
SpaceB = URIRef("https://ldac.org/building2/SpaceB")
SpaceC = URIRef("https://ldac.org/building2/SpaceC")
SpaceD = URIRef("https://ldac.org/building2/SpaceD")




g.add( (SiteA, RDF.type, bot.Site) )
g.add( (SiteA, bot.hasBuilding, BuildingA) )
g.add( (BuildingA, bot.hasStorey, Storey00) )
g.add( (BuildingA, bot.hasStorey, Storey01) )

g.add( (BuildingA, RDF.type, bot.Building) )
g.add( (Storey00, bot.hasSpace, SpaceA) )
g.add( (Storey00, bot.hasSpace, SpaceB) )


g.add( (Storey01, bot.hasSpace, SpaceC) )
g.add( (Storey01, bot.hasSpace, SpaceD) )

g.add( (SpaceA, RDF.type, bot.Space) )
#g.add( (SpaceB, RDF.type, bot.Space) )
#g.add( (SpaceC, RDF.type, bot.Space) )
#g.add( (SpaceD, RDF.type, bot.Space) )

#print all triples
for s, p, o in g:
   pprint((s, p, o))

# Visualize the graph for easy interpretation
visualize(g)

In [None]:
g.serialize("mini_bot.ttl", format="turtle")

# Query the BOT graph with SPARQL

## List all facts (s,p,o triples)

In [None]:

result = g.query(
    """SELECT *
  WHERE
  {?s ?p ?o}
  LIMIT 10
""")

# Output result
for row in result:
    pprint(row)


## Retrieve dedicated relations
We can leverage URIs, variables, and predicates to specify the patterns we are looking for.

In this case we want to retrieve the spaces associated with the storeys.

Note how we make the BOT namespace available to the query through the `initNs` argument.

In [None]:
result = g.query(
    """SELECT DISTINCT ?a ?b
       WHERE {
          ?a bot:hasSpace ?b .
          
       }""",  initNs={ 'bot': bot })

# Output result
for row in result:
    print("%s has Space %s" % row)
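Conceptually, a triple pattern such as `?a bot:hasSpace ?b` is compared against every triple in the graph: fixed terms must be equal, and variables bind to whatever value they meet. The following is a minimal pure-Python sketch of that idea, not a description of how RDFLib evaluates queries. The function `match_pattern` and the shortened string identifiers are assumptions made for illustration.

```python
# Toy triples using shortened names instead of full IRIs (illustrative only).
triples = [
    ("Storey00", "bot:hasSpace", "SpaceA"),
    ("Storey00", "bot:hasSpace", "SpaceB"),
    ("Storey01", "bot:hasSpace", "SpaceC"),
    ("BuildingA", "bot:hasStorey", "Storey00"),
]


def match_pattern(pattern, data):
    """Yield one dict of variable bindings per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in data:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No mismatch found: the triple matches the pattern.
            yield bindings


rows = list(match_pattern(("?a", "bot:hasSpace", "?b"), triples))
for row in rows:
    print(f"{row['?a']} has space {row['?b']}")
```

Each yielded dictionary plays the role of one result row of the SPARQL query above.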

# Import external Data

There are a large number of RDF data sources available on the web, which we can leverage:

In [None]:
from rdflib import Namespace
bot = Namespace ("https://w3id.org/bot#")
duplex = Namespace ("https://ldac.org/duplex/")

g1 = rdflib.Graph()
g1.parse("https://raw.githubusercontent.com/TechnicalBuildingSystems/OpenSmartHomeData/master/00_OpenSmartHomeData.ttl", format="turtle")

print("Graph has %s statements." % len(g1))

# print all triples
for s, p, o in g1:
   print((s, p, o))

In [None]:
g1

# Turn IFC spaces into a BOT graph

In [None]:
import ifcopenshell
model = ifcopenshell.open("data/Duplex_A.ifc")

Let us look at what we are dealing with. Only execute the cell below if you can wait about a minute for the rendering.

In [None]:
from utils.JupyterIFCRenderer import JupyterIFCRenderer
viewer = JupyterIFCRenderer(model, size=(400,300))
viewer

In [None]:
# hide all elements except for spaces

for element in model.by_type("IfcProduct"):
    if element.is_a() != "IfcSpace":
        viewer.setVisible(element, False)

### Spaces and storeys in IFC

Let us take a quick glance at how spaces and storeys are related, as documented in the [IfcSpace documentation](https://standards.buildingsmart.org/IFC/RELEASE/IFC4_1/FINAL/HTML/schema/ifcproductextension/lexical/ifcspace.htm).

![ifcbuildingstory](img/ifcbuildingstorey-spatialstructure.png)

In [None]:
dg = Graph() #our duplex-graph
from rdflib import Namespace
bot = Namespace ("https://w3id.org/bot#")
duplex = Namespace ("https://ldac.org/duplex/")
dg.namespace_manager.bind("bot", bot, False)
dg.namespace_manager.bind("duplex", duplex, False)

In [None]:
storeys = model.by_type("IfcBuildingStorey")
#pprint(storeys[0].get_info())
for s in storeys:
    dg.add((URIRef(duplex+f"{s.Name.replace(' ', '_')}"), RDF.type, bot.Storey))

In [None]:
s = storeys[0]


In [None]:
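spaces = model.by_type("IfcSpace")
#pprint(spaces[0].get_info())
for space in spaces:
    spacenode = URIRef(duplex + f"{space.LongName.replace(' ', '_')}_{space.Name}")
    # Parent object that aggregates this space (expected to be a storey)
    storeynode = URIRef(duplex + f"{space.Decomposes[0].RelatingObject.Name.replace(' ', '_')}")
    dg.add((spacenode, RDF.type, bot.Space))
    # bot:hasSpace points from the zone (here: the storey) to the space it contains
    dg.add((storeynode, bot.hasSpace, spacenode))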
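The cells above build identifiers by replacing spaces with underscores. Names coming from an IFC file may contain other characters that should not appear unescaped in an IRI (for example `/` or `#`). A slightly more defensive sketch, using only the standard library, could look as follows. The helper `make_iri` and the sample names are hypothetical and not taken from the Duplex model.

```python
from urllib.parse import quote


def make_iri(base, *name_parts):
    """Build an IRI string from a namespace base and free-text name parts.

    Hypothetical helper: spaces become underscores (as in the cells above)
    and all remaining reserved characters are percent-encoded.
    """
    local_name = "_".join(str(part).strip().replace(" ", "_") for part in name_parts)
    return base + quote(local_name, safe="")


DUPLEX = "https://ldac.org/duplex/"
print(make_iri(DUPLEX, "Level 1"))
print(make_iri(DUPLEX, "Living Room", "A101"))
print(make_iri(DUPLEX, "Bath/WC #2"))
```

The returned string can be wrapped in `URIRef(...)` in the same way as the f-strings used above.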
    

In [None]:
dg.serialize("duplex.n3")

In [None]:
spac = spaces[0]
spac.Decomposes[0].RelatingObject

In [None]:
visualize(dg)

RDF Schema allows us to specify classes and hierarchies. These hierarchies can be leveraged for reasoning/inference.
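As a minimal sketch of such an inference, consider the RDFS rule "if x has type C and C is a subclass of D, then x also has type D". BOT declares `bot:Space` and `bot:Storey` as subclasses of `bot:Zone`, so every space and storey is also a zone. The plain-tuple representation and the function `infer_types` below are assumptions made for illustration; they are not part of RDFLib.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Shortened names instead of full IRIs (illustrative only).
facts = {
    ("bot:Space", SUBCLASS, "bot:Zone"),
    ("bot:Storey", SUBCLASS, "bot:Zone"),
    ("ldac:SpaceA", TYPE, "bot:Space"),
    ("ldac:Storey00", TYPE, "bot:Storey"),
}


def infer_types(triples):
    """Return the triples plus all rdf:type statements implied by rdfs:subClassOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple is produced (fixpoint)
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for sub, sup in subclass_pairs:
                new_triple = (s, TYPE, sup)
                if o == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


closure = infer_types(facts)
zones = sorted(s for s, p, o in closure if p == TYPE and o == "bot:Zone")
print(zones)
```

Neither `ldac:SpaceA` nor `ldac:Storey00` was stated to be a `bot:Zone`; both memberships follow from the class hierarchy alone.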

# Use SPARQL to query DBpedia

[DBpedia](https://wiki.dbpedia.org/) is a semantic version of Wikipedia. 

Let us query DBpedia to identify the birthdays of architects and their buildings (adapted from https://open.hpi.de/courses/knowledgegraphs2020).

In [None]:
from datetime import datetime
# Import only what is needed: SPARQLWrapper's RDF constant would shadow rdflib's RDF namespace
from SPARQLWrapper import SPARQLWrapper, JSON

In [None]:
sparql = SPARQLWrapper("http://dbpedia.org/sparql") #determine SPARQL endpoint

In [None]:
# retrieve buildings with at least 100 floors, together with their English names and floor counts

sparql.setQuery("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>

SELECT DISTINCT ?building ?name ?floors
WHERE
{
    ?building a dbo:Building .
    ?building dbo:floorCount ?floors .
    ?building rdfs:label ?name
    FILTER(LANGMATCHES(LANG(?name),'en'))
    FILTER (?floors >= 100)
}
ORDER BY ?floors
LIMIT 100 
""")

sparql.setReturnFormat(JSON)   # Return format is JSON
results = sparql.query().convert()   # execute SPARQL query and write result to "results"

In [None]:
# retrieve architects born today (bif:curdate) and their buildings

sparql.setQuery("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>

Select distinct ?birthdate ?architect ?name ?building ?location WHERE {
?architect rdf:type dbo:Architect ;
        dbo:birthDate ?birthdate ;
        rdfs:label ?name
     OPTIONAL {?building dbp:architect ?architect .}
 FILTER ((lang(?name)="en")&&(STRLEN(STR(?birthdate))>6)&&(SUBSTR(STR(?birthdate),6)=SUBSTR(STR(bif:curdate('')),6))) .
} ORDER BY ?birthdate
""")

sparql.setReturnFormat(JSON)   # Return format is JSON
results = sparql.query().convert()   # execute SPARQL query and write result to "results"
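The FILTER in the query above keeps an architect only if the month-and-day part of the birth date equals that of the current date, which the Virtuoso-specific function `bif:curdate` provides on the DBpedia endpoint. The same comparison can be sketched in plain Python. The function `born_on_this_day` and the sample dates are illustrative assumptions.

```python
from datetime import date


def born_on_this_day(birthdate, today):
    """Mirror the FILTER: compare the 'MM-DD' tail of two ISO date strings."""
    birth_str = str(birthdate)
    # STRLEN(...) > 6 skips values too short to contain a month and a day.
    if len(birth_str) <= 6:
        return False
    # SPARQL's SUBSTR is 1-based, so SUBSTR(x, 6) corresponds to x[5:] in Python.
    return birth_str[5:] == today.isoformat()[5:]


today = date(2022, 6, 21)  # fixed date so the example is reproducible
print(born_on_this_day("1931-06-21", today))  # same month and day
print(born_on_this_day("1931-06-22", today))  # different day
print(born_on_this_day("1931", today))        # year only, too short to compare
```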

In [None]:
from pprint import pprint
pprint(results)
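The converted result follows the SPARQL JSON results layout: the variable names are listed under `head.vars`, and each row under `results.bindings` maps a variable to a dictionary with a `type` and a `value`. Variables from an OPTIONAL pattern are simply missing from a row when nothing matched. A minimal sketch for flattening this structure is shown below. The function `bindings_to_rows` is hypothetical, and `sample_results` is hand-written placeholder data, not actual DBpedia output.

```python
# Hand-written sample in the SPARQL JSON results layout (placeholder values).
sample_results = {
    "head": {"vars": ["birthdate", "architect", "name", "building"]},
    "results": {
        "bindings": [
            {
                "birthdate": {"type": "literal", "value": "1900-01-01"},
                "architect": {"type": "uri", "value": "http://example.org/ArchitectA"},
                "name": {"type": "literal", "xml:lang": "en", "value": "Architect A"},
                "building": {"type": "uri", "value": "http://example.org/BuildingX"},
            },
            {
                # No "building" key: the OPTIONAL pattern did not match.
                "birthdate": {"type": "literal", "value": "1910-01-01"},
                "architect": {"type": "uri", "value": "http://example.org/ArchitectB"},
                "name": {"type": "literal", "xml:lang": "en", "value": "Architect B"},
            },
        ]
    },
}


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dictionaries."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are represented as None.
        rows.append({var: (binding[var]["value"] if var in binding else None)
                     for var in variables})
    return rows


rows = bindings_to_rows(sample_results)
for row in rows:
    print(row["birthdate"], row["name"], row["building"])
```

Applied to the notebook's own data, the call would be `bindings_to_rows(results)`.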