# Mapping done three ways

This notebook shows three ways to map tabular data to RDF with maplib:
- OTTR templates using the stOTTR syntax
- Programmatic OTTR templates
- The PVALUES construction

First, some common dependencies and a bit of formatting config. 

In [1]:
from maplib import Mapping
import polars as pl
# Show strings up to 100 characters and at most 5 rows per table.
pl.Config.set_fmt_str_lengths(100).set_tbl_rows(5)
None  # Suppress the cell output.

In [2]:
MY_PREFIX = "https://data-treehouse.com/examples#"

All three approaches start by reading and preprocessing the Iris dataset.

In [3]:
df = pl.read_csv("iris.csv")
df

sepal_length,sepal_width,petal_length,petal_width,variety
f64,f64,f64,f64,str
5.1,3.5,1.4,0.2,"""Setosa"""
4.9,3.0,1.4,0.2,"""Setosa"""
4.7,3.2,1.3,0.2,"""Setosa"""
…,…,…,…,…
6.2,3.4,5.4,2.3,"""Virginica"""
5.9,3.0,5.1,1.8,"""Virginica"""


Next, we create IRIs for the instances and for their types.

In [4]:
df = df.with_columns(
    # Instance IRI: the prefix, the variety, and the running count within that variety
    (pl.lit(MY_PREFIX) + pl.col("variety") + pl.cum_count("variety").over("variety").cast(pl.String)).alias("id"),
    # Type IRI: the prefix and the variety
    (pl.lit(MY_PREFIX) + pl.col("variety")).alias("type")
)
df

sepal_length,sepal_width,petal_length,petal_width,variety,id,type
f64,f64,f64,f64,str,str,str
5.1,3.5,1.4,0.2,"""Setosa""","""https://data-treehouse.com/examples#Setosa1""","""https://data-treehouse.com/examples#Setosa"""
4.9,3.0,1.4,0.2,"""Setosa""","""https://data-treehouse.com/examples#Setosa2""","""https://data-treehouse.com/examples#Setosa"""
4.7,3.2,1.3,0.2,"""Setosa""","""https://data-treehouse.com/examples#Setosa3""","""https://data-treehouse.com/examples#Setosa"""
…,…,…,…,…,…,…
6.2,3.4,5.4,2.3,"""Virginica""","""https://data-treehouse.com/examples#Virginica49""","""https://data-treehouse.com/examples#Virginica"""
5.9,3.0,5.1,1.8,"""Virginica""","""https://data-treehouse.com/examples#Virginica50""","""https://data-treehouse.com/examples#Virginica"""
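
To make the IRI construction concrete, the following plain-Python sketch reproduces the same numbering without Polars. It is only an illustration: `mint_iris` is a hypothetical helper, not part of the notebook or of maplib, and it assumes (as the output above shows) that the running count starts at 1.

```python
from collections import defaultdict

MY_PREFIX = "https://data-treehouse.com/examples#"


def mint_iris(varieties, prefix=MY_PREFIX):
    """Return one (instance IRI, type IRI) pair per row.

    Hypothetical helper: rows are numbered within each variety,
    starting at 1, mirroring the cumulative count over "variety".
    """
    seen = defaultdict(int)  # running count per variety
    pairs = []
    for variety in varieties:
        seen[variety] += 1
        pairs.append((f"{prefix}{variety}{seen[variety]}", f"{prefix}{variety}"))
    return pairs


sample = ["Setosa", "Setosa", "Virginica", "Setosa", "Virginica"]
for instance_iri, type_iri in mint_iris(sample):
    print(instance_iri, type_iri)
```

Because the count is kept per variety, the order of the rows within a variety decides which row gets which number.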


Now, we create a DataFrame containing instance information.
We can drop "variety", since that is a property of the types. 

In [5]:
df_instances = df.drop("variety")
df_instances

sepal_length,sepal_width,petal_length,petal_width,id,type
f64,f64,f64,f64,str,str
5.1,3.5,1.4,0.2,"""https://data-treehouse.com/examples#Setosa1""","""https://data-treehouse.com/examples#Setosa"""
4.9,3.0,1.4,0.2,"""https://data-treehouse.com/examples#Setosa2""","""https://data-treehouse.com/examples#Setosa"""
4.7,3.2,1.3,0.2,"""https://data-treehouse.com/examples#Setosa3""","""https://data-treehouse.com/examples#Setosa"""
…,…,…,…,…,…
6.2,3.4,5.4,2.3,"""https://data-treehouse.com/examples#Virginica49""","""https://data-treehouse.com/examples#Virginica"""
5.9,3.0,5.1,1.8,"""https://data-treehouse.com/examples#Virginica50""","""https://data-treehouse.com/examples#Virginica"""

The type information goes into its own DataFrame, with one row per distinct variety.


In [6]:
df_types = df.select("type", "variety").unique()
df_types

type,variety
str,str
"""https://data-treehouse.com/examples#Versicolor""","""Versicolor"""
"""https://data-treehouse.com/examples#Setosa""","""Setosa"""
"""https://data-treehouse.com/examples#Virginica""","""Virginica"""


## Mapping using stOTTR syntax

We first use the `expand_default` method to get a starting point for our template. 

In [7]:
m_def = Mapping()
tpl_def_inst = m_def.expand_default(df_instances, "id")
print(tpl_def_inst)

<urn:maplib_default:default_template_0> [
     <http://www.w3.org/2001/XMLSchema#double> ?sepal_length, 
     <http://www.w3.org/2001/XMLSchema#double> ?sepal_width, 
     <http://www.w3.org/2001/XMLSchema#double> ?petal_length, 
     <http://www.w3.org/2001/XMLSchema#double> ?petal_width, 
     <http://ns.ottr.xyz/0.4/IRI> ?id, 
     <http://ns.ottr.xyz/0.4/IRI> ?type ] :: {
  ottr:Triple(?id, <urn:maplib_default:sepal_length>, ?sepal_length) ,
  ottr:Triple(?id, <urn:maplib_default:sepal_width>, ?sepal_width) ,
  ottr:Triple(?id, <urn:maplib_default:petal_length>, ?petal_length) ,
  ottr:Triple(?id, <urn:maplib_default:petal_width>, ?petal_width) ,
  ottr:Triple(?id, <urn:maplib_default:type>, ?type)
} . 



In [8]:
tpl_def_type = m_def.expand_default(df_types, "type")
print(tpl_def_type)

<urn:maplib_default:default_template_1> [
     <http://ns.ottr.xyz/0.4/IRI> ?type, 
     <http://www.w3.org/2001/XMLSchema#string> ?variety ] :: {
  ottr:Triple(?type, <urn:maplib_default:variety>, ?variety)
} . 
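
As a rough mental model of what `expand_default` produces, the sketch below derives parameters and triple patterns from column contents in plain Python. The inference rule in `infer_type` and both helper names are assumptions made for illustration; maplib's actual inference logic is not shown in this notebook and may differ.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
OTTR_IRI = "http://ns.ottr.xyz/0.4/IRI"
DEFAULT_NS = "urn:maplib_default:"


def infer_type(values):
    """Guess an RDF type for a column from its values (illustrative rule only)."""
    if all(isinstance(v, float) for v in values):
        return XSD + "double"
    if all(
        isinstance(v, str) and v.startswith(("http://", "https://", "urn:"))
        for v in values
    ):
        return OTTR_IRI
    return XSD + "string"


def default_template(columns, key):
    """Build (parameters, triple patterns) for a dict of column name -> values.

    Every column becomes a parameter; every column except the key becomes
    one triple whose predicate is named after the column.
    """
    params = [(infer_type(values), f"?{name}") for name, values in columns.items()]
    patterns = [
        (f"?{key}", f"<{DEFAULT_NS}{name}>", f"?{name}")
        for name in columns
        if name != key
    ]
    return params, patterns


columns = {
    "type": ["https://data-treehouse.com/examples#Setosa"],
    "variety": ["Setosa"],
}
params, patterns = default_template(columns, key="type")
print(params)
print(patterns)
```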



We run a query to check what was just added. The triples are there, but the default predicate names (`urn:maplib_default:...`) can be improved.

Note that the data types have been inferred from the column types and contents. 

In [9]:
m_def.query("""
SELECT ?a ?b ?c WHERE {
    ?a ?b ?c .
}
""")

a,b,c
str,str,str
"""<https://data-treehouse.com/examples#Setosa1>""","""<urn:maplib_default:petal_width>""","""""0.2""^^<http://www.w3.org/2001/XMLSchema#double>"""
"""<https://data-treehouse.com/examples#Setosa10>""","""<urn:maplib_default:petal_width>""","""""0.1""^^<http://www.w3.org/2001/XMLSchema#double>"""
"""<https://data-treehouse.com/examples#Setosa11>""","""<urn:maplib_default:petal_width>""","""""0.2""^^<http://www.w3.org/2001/XMLSchema#double>"""
…,…,…
"""<https://data-treehouse.com/examples#Virginica8>""","""<urn:maplib_default:sepal_length>""","""""7.3""^^<http://www.w3.org/2001/XMLSchema#double>"""
"""<https://data-treehouse.com/examples#Virginica9>""","""<urn:maplib_default:sepal_length>""","""""6.7""^^<http://www.w3.org/2001/XMLSchema#double>"""


Using these defaults as a starting point, we create the following templates.
The stOTTR syntax is specified [here](https://spec.ottr.xyz/stOTTR/0.1.4/).

In [10]:
tpls = """
@prefix ex:   <https://data-treehouse.com/examples#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:InstanceTemplate [
     <http://www.w3.org/2001/XMLSchema#double> ?sepal_length, 
     <http://www.w3.org/2001/XMLSchema#double> ?sepal_width, 
     <http://www.w3.org/2001/XMLSchema#double> ?petal_length, 
     <http://www.w3.org/2001/XMLSchema#double> ?petal_width, 
     <http://ns.ottr.xyz/0.4/IRI> ?id, 
     <http://ns.ottr.xyz/0.4/IRI> ?type ] :: {
  ottr:Triple(?id, ex:sepalLength, ?sepal_length) ,
  ottr:Triple(?id, ex:sepalWidth, ?sepal_width) ,
  ottr:Triple(?id, ex:petalLength, ?petal_length) ,
  ottr:Triple(?id, ex:petalWidth, ?petal_width) ,
  ottr:Triple(?id, a, ?type)
} . 

ex:TypeTemplate [
     <http://ns.ottr.xyz/0.4/IRI> ?type, 
     <http://www.w3.org/2001/XMLSchema#string> ?variety ] :: {
  ottr:Triple(?type, rdfs:label, ?variety),
  ottr:Triple(?type, a, rdfs:Class)
} . 

"""

In [11]:
m_tpl = Mapping()
m_tpl.add_template(tpls)
m_tpl.expand("https://data-treehouse.com/examples#InstanceTemplate", df_instances)
m_tpl.expand("https://data-treehouse.com/examples#TypeTemplate", df_types)
m_tpl.query("""
PREFIX ex: <https://data-treehouse.com/examples#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?id ?sepal_length ?sepal_width ?type_label WHERE {
    ?id a ?type .
    ?id ex:sepalLength ?sepal_length .
    ?id ex:sepalWidth ?sepal_width .
    ?type rdfs:label ?type_label .    
}
""")

id,sepal_length,sepal_width,type_label
str,f64,f64,str
"""<https://data-treehouse.com/examples#Setosa1>""",5.1,3.5,"""Setosa"""
"""<https://data-treehouse.com/examples#Setosa10>""",4.9,3.1,"""Setosa"""
"""<https://data-treehouse.com/examples#Setosa11>""",5.4,3.7,"""Setosa"""
…,…,…,…
"""<https://data-treehouse.com/examples#Virginica8>""",7.3,2.9,"""Virginica"""
"""<https://data-treehouse.com/examples#Virginica9>""",6.7,2.5,"""Virginica"""

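Conceptually, expanding a template instantiates its triple patterns once per DataFrame row. The plain-Python sketch below illustrates that idea with a hypothetical `expand` function and a shortened pattern list. It matches variables to columns by name, which is an assumption for this sketch and not a description of maplib's internals.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://data-treehouse.com/examples#"

# Triple patterns: terms starting with "?" are variables, all others are constants.
INSTANCE_PATTERNS = [
    ("?id", EX + "sepalLength", "?sepal_length"),
    ("?id", EX + "sepalWidth", "?sepal_width"),
    ("?id", RDF_TYPE, "?type"),
]


def expand(patterns, rows):
    """Instantiate every pattern once per row, filling variables from the row."""

    def resolve(term, row):
        if term.startswith("?"):
            return row[term[1:]]  # variable: look up the column of the same name
        return term  # constant: keep as-is

    return [
        tuple(resolve(term, row) for term in pattern)
        for row in rows
        for pattern in patterns
    ]


rows = [
    {"id": EX + "Setosa1", "sepal_length": 5.1, "sepal_width": 3.5, "type": EX + "Setosa"},
    {"id": EX + "Setosa2", "sepal_length": 4.9, "sepal_width": 3.0, "type": EX + "Setosa"},
]
triples = expand(INSTANCE_PATTERNS, rows)
for triple in triples:
    print(triple)
```

Two rows and three patterns yield six triples, which is why a 150-row DataFrame expanded against a five-pattern template produces 750 triples.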

## Mapping with programmatic OTTR

To make use of programmatic OTTR, we need some more imports.

In [12]:
from maplib import Template, Triple, XSD, Parameter, Variable, RDFType, Prefix, a

In [13]:
xsd = XSD()
id_var = Variable("id")
sepal_length_var = Variable("sepal_length")
sepal_width_var = Variable("sepal_width")
petal_length_var = Variable("petal_length")
petal_width_var = Variable("petal_width")
type_var = Variable("type")
my_prefix = Prefix(MY_PREFIX)

instance_template = Template(
    iri=my_prefix.suf("InstanceTemplate"),
    parameters= [
        Parameter(id_var, rdf_type=RDFType.IRI()),
        Parameter(sepal_length_var, rdf_type=RDFType.Literal(xsd.double)),
        Parameter(sepal_width_var, rdf_type=RDFType.Literal(xsd.double)),
        Parameter(petal_length_var, rdf_type=RDFType.Literal(xsd.double)),
        Parameter(petal_width_var, rdf_type=RDFType.Literal(xsd.double)),
        Parameter(type_var, rdf_type=RDFType.IRI())
    ],
    instances=[
        Triple(id_var, my_prefix.suf("sepalLength"), sepal_length_var),
        Triple(id_var, my_prefix.suf("sepalWidth"), sepal_width_var),
        Triple(id_var, my_prefix.suf("petalLength"), petal_length_var),
        Triple(id_var, my_prefix.suf("petalWidth"), petal_width_var),
        Triple(id_var, a(), type_var),
    ]
)
instance_template

<https://data-treehouse.com/examples#InstanceTemplate> [
     <http://ns.ottr.xyz/0.4/IRI> ?id, 
     <http://www.w3.org/2001/XMLSchema#double> ?sepal_length, 
     <http://www.w3.org/2001/XMLSchema#double> ?sepal_width, 
     <http://www.w3.org/2001/XMLSchema#double> ?petal_length, 
     <http://www.w3.org/2001/XMLSchema#double> ?petal_width, 
     <http://ns.ottr.xyz/0.4/IRI> ?type ] :: {
  <http://ns.ottr.xyz/0.4/Triple>(?id, <https://data-treehouse.com/examples#sepalLength>, ?sepal_length) ,
  <http://ns.ottr.xyz/0.4/Triple>(?id, <https://data-treehouse.com/examples#sepalWidth>, ?sepal_width) ,
  <http://ns.ottr.xyz/0.4/Triple>(?id, <https://data-treehouse.com/examples#petalLength>, ?petal_length) ,
  <http://ns.ottr.xyz/0.4/Triple>(?id, <https://data-treehouse.com/examples#petalWidth>, ?petal_width) ,
  <http://ns.ottr.xyz/0.4/Triple>(?id, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, ?type)
} . 

The above template looks pretty similar to what we created earlier in text form.
Now we create the types template. 

In [14]:
variety_var = Variable("variety")
rdfs = Prefix("http://www.w3.org/2000/01/rdf-schema#")

type_template = Template(
    iri=my_prefix.suf("TypeTemplate"),
    parameters= [
        Parameter(type_var, rdf_type=RDFType.IRI()),
        Parameter(variety_var, rdf_type=RDFType.Literal(xsd.string)),
    ],
    instances=[
        Triple(type_var, a(), rdfs.suf("Class")),
        Triple(type_var, rdfs.suf("label"), variety_var),
        
    ]
)
type_template

<https://data-treehouse.com/examples#TypeTemplate> [
     <http://ns.ottr.xyz/0.4/IRI> ?type, 
     <http://www.w3.org/2001/XMLSchema#string> ?variety ] :: {
  <http://ns.ottr.xyz/0.4/Triple>(?type, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://www.w3.org/2000/01/rdf-schema#Class>) ,
  <http://ns.ottr.xyz/0.4/Triple>(?type, <http://www.w3.org/2000/01/rdf-schema#label>, ?variety)
} . 
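
To see why the printed template resembles the stOTTR text, here is a small sketch that renders a template object to stOTTR-like text. The `Sketch*` dataclasses are hypothetical stand-ins written for this example. They are not maplib's `Template`, `Parameter` or `Triple` classes, and the exact whitespace of maplib's output is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class SketchParameter:
    name: str
    rdf_type: str  # full IRI of the parameter type


@dataclass
class SketchTriple:
    subject: str
    predicate: str
    obj: str


@dataclass
class SketchTemplate:
    iri: str
    parameters: list
    instances: list

    def render(self) -> str:
        """Serialize the signature and body in a stOTTR-like layout."""
        params = ",\n".join(f"     <{p.rdf_type}> ?{p.name}" for p in self.parameters)
        body = " ,\n".join(
            f"  ottr:Triple({t.subject}, {t.predicate}, {t.obj})" for t in self.instances
        )
        return f"<{self.iri}> [\n{params} ] :: {{\n{body}\n}} ."


RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

type_template_sketch = SketchTemplate(
    iri="https://data-treehouse.com/examples#TypeTemplate",
    parameters=[
        SketchParameter("type", "http://ns.ottr.xyz/0.4/IRI"),
        SketchParameter("variety", "http://www.w3.org/2001/XMLSchema#string"),
    ],
    instances=[
        SketchTriple("?type", RDF_TYPE, f"<{RDFS}Class>"),
        SketchTriple("?type", f"<{RDFS}label>", "?variety"),
    ],
)
print(type_template_sketch.render())
```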

In [15]:
m_prg = Mapping()
m_prg.expand(instance_template, df_instances)
m_prg.expand(type_template, df_types)

m_prg.query("""
PREFIX ex: <https://data-treehouse.com/examples#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?id ?sepal_length ?sepal_width ?type_label WHERE {
    ?id a ?type .
    ?id ex:sepalLength ?sepal_length .
    ?id ex:sepalWidth ?sepal_width .
    ?type rdfs:label ?type_label .    
}
""")

id,sepal_length,sepal_width,type_label
str,f64,f64,str
"""<https://data-treehouse.com/examples#Setosa1>""",5.1,3.5,"""Setosa"""
"""<https://data-treehouse.com/examples#Setosa10>""",4.9,3.1,"""Setosa"""
"""<https://data-treehouse.com/examples#Setosa11>""",5.4,3.7,"""Setosa"""
…,…,…,…
"""<https://data-treehouse.com/examples#Virginica8>""",7.3,2.9,"""Virginica"""
"""<https://data-treehouse.com/examples#Virginica9>""",6.7,2.5,"""Virginica"""


## Mapping with PVALUES

The PVALUES construction lets a SPARQL query use the rows of a DataFrame as solution mappings.

In [16]:
m_pv = Mapping()

df_instances_coltypes = {
    "id": RDFType.IRI(),
    "sepal_length": RDFType.Literal(xsd.double),
    "sepal_width": RDFType.Literal(xsd.double),
    "petal_length": RDFType.Literal(xsd.double),
    "petal_width": RDFType.Literal(xsd.double),
    "type": RDFType.IRI()
}

m_pv.query("""
    SELECT * WHERE {
    PVALUES (?id ?sepal_length ?sepal_width ?petal_length ?petal_width ?type) df_instances
    }
    """,
    parameters={"df_instances":(df_instances, df_instances_coltypes)}
)

id,petal_length,petal_width,sepal_length,sepal_width,type
str,f64,f64,f64,f64,str
"""<https://data-treehouse.com/examples#Setosa1>""",1.4,0.2,5.1,3.5,"""<https://data-treehouse.com/examples#Setosa>"""
"""<https://data-treehouse.com/examples#Setosa2>""",1.4,0.2,4.9,3.0,"""<https://data-treehouse.com/examples#Setosa>"""
"""<https://data-treehouse.com/examples#Setosa3>""",1.3,0.2,4.7,3.2,"""<https://data-treehouse.com/examples#Setosa>"""
…,…,…,…,…,…
"""<https://data-treehouse.com/examples#Virginica49>""",5.4,2.3,6.2,3.4,"""<https://data-treehouse.com/examples#Virginica>"""
"""<https://data-treehouse.com/examples#Virginica50>""",5.1,1.8,5.9,3.0,"""<https://data-treehouse.com/examples#Virginica>"""


We can combine `insert` with PVALUES to do the mapping: the triples built by the CONSTRUCT template are added to the mapping.

In [17]:
# This is a bit of a dirty trick, we are fixing it :-)
m_pv.insert("""
    PREFIX ex: <https://data-treehouse.com/examples#> 
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    CONSTRUCT {
        ?id ex:sepalLength ?sepal_length .
        ?id ex:sepalWidth ?sepal_width .
        ?id ex:petalLength ?petal_length .
        ?id ex:petalWidth ?petal_width .
        ?id a ?type .
    } WHERE {
    PVALUES (?id ?sepal_length ?sepal_width ?petal_length ?petal_width ?type) df_instances
    }
    """,
    parameters={"df_instances":(df_instances, df_instances_coltypes)}
)

{}

In [18]:
df_types_coltypes = {
    "type": RDFType.IRI(),
    "variety": RDFType.Literal(xsd.string),
}

m_pv.insert("""
    PREFIX ex: <https://data-treehouse.com/examples#> 
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    CONSTRUCT {
        ?type a rdfs:Class .
        ?type rdfs:label ?variety .
    } WHERE {
    PVALUES (?type ?variety) df_types
    }
    """,
    parameters={"df_types":(df_types, df_types_coltypes)}
)

{}
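
One way to picture the two `insert` calls: PVALUES turns each DataFrame row into a solution mapping, and CONSTRUCT instantiates its template once per solution. The following plain-Python sketch mimics this for the types table. `pvalues` and `construct` are hypothetical helpers, and the sketch ignores everything else a SPARQL engine does.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "https://data-treehouse.com/examples#"


def pvalues(variables, rows):
    """Turn table rows into solution mappings over the listed variables."""
    return [{var: row[var] for var in variables} for row in rows]


def construct(template, solutions):
    """Instantiate each template triple per solution; a graph is a set of triples."""

    def resolve(term, solution):
        return solution[term[1:]] if term.startswith("?") else term

    return {
        tuple(resolve(term, solution) for term in triple)
        for solution in solutions
        for triple in template
    }


type_rows = [
    {"type": EX + "Setosa", "variety": "Setosa"},
    {"type": EX + "Virginica", "variety": "Virginica"},
]
type_template = [
    ("?type", RDF_TYPE, RDFS + "Class"),
    ("?type", RDFS + "label", "?variety"),
]
graph = construct(type_template, pvalues(["type", "variety"], type_rows))
for triple in sorted(graph):
    print(triple)
```

Since the result is a set, running the same construction twice adds nothing new, which matches the usual behaviour of an RDF graph.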

The results are the same as with the two template-based approaches.

In [19]:
m_pv.query("""
PREFIX ex: <https://data-treehouse.com/examples#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?id ?sepal_length ?sepal_width ?type_label WHERE {
    ?id a ?type .
    ?id ex:sepalLength ?sepal_length .
    ?id ex:sepalWidth ?sepal_width .
    ?type rdfs:label ?type_label .    
}
""")

id,sepal_length,sepal_width,type_label
str,f64,f64,str
"""<https://data-treehouse.com/examples#Setosa1>""",5.1,3.5,"""Setosa"""
"""<https://data-treehouse.com/examples#Setosa10>""",4.9,3.1,"""Setosa"""
"""<https://data-treehouse.com/examples#Setosa11>""",5.4,3.7,"""Setosa"""
…,…,…,…
"""<https://data-treehouse.com/examples#Virginica8>""",7.3,2.9,"""Virginica"""
"""<https://data-treehouse.com/examples#Virginica9>""",6.7,2.5,"""Virginica"""
