# Configure database connection

We have prepared a hosted database (triplestore / SPARQL endpoint) for you. Let’s connect to it:

In this Notebook, go to "Secrets" in the left sidebar, and add two secrets:

* Name: `sparql_user`, Value: your username (from the note)
* Name: `sparql_password`, Value: your password (from the note)

Enable "Notebook access" for both.

Then execute the following cell to store the database connection details as environment variables, so that the shell commands in later cells (the lines starting with `!`) can read them:

In [41]:
import os
from google.colab import userdata

# Read the credentials from the Colab "Secrets" panel
user = userdata.get('sparql_user')
os.environ['SPARQL_USER'] = user
os.environ['SPARQL_PASSWORD'] = userdata.get('sparql_password')

# All endpoints share one base URL whose last path segment is the username
base_url = "https://sd-84c02130.stardog.cloud:5820/" + user
os.environ['SPARQL_ENDPOINT_STORE'] = base_url              # Graph Store Protocol (uploads)
os.environ['SPARQL_ENDPOINT_QUERY'] = base_url + "/query"   # SPARQL queries
os.environ['SPARQL_ENDPOINT_UPDATE'] = base_url + "/update" # SPARQL updates


## Test connection

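To test whether the database connection works, execute the following SPARQL query, which lists every named graph that contains at least one triple. On a fresh database, it outputs just the CSV header `graph`, which means that no named graphs exist yet. If the knowledge graph has already been uploaded (see "Upload the knowledge graph"), the graph IRIs are listed below the header, as in the saved output.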

In [38]:
# -G puts the url-encoded query into the URL of a GET request (SPARQL 1.1 Protocol)
!curl -G \
      -H "Accept:text/csv" \
      --user "${SPARQL_USER}:${SPARQL_PASSWORD}" \
      "${SPARQL_ENDPOINT_QUERY}" \
      --data-urlencode "query=SELECT DISTINCT ?graph WHERE { GRAPH ?graph { [] ?p [] . } }"

graph
https://graphs.brox.de/dwt24/data/
https://graphs.brox.de/dwt24/ontologies/
https://graphs.brox.de/dwt24/shacl-shapes/
https://graphs.brox.de/dwt24/shacl-reports/
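The same query can also be sent from Python with only the standard library, which is handy when the result should be processed further. This is a minimal sketch following the SPARQL 1.1 Protocol (query in the URL of a `GET` request, `Accept: text/csv`, HTTP Basic authentication); the helper names `build_query_request` and `run_query` are hypothetical, and the request is only sent when the environment variables from the configuration cell are set.

```python
import base64
import os
import urllib.parse
import urllib.request


def build_query_request(endpoint, query, user, password, accept="text/csv"):
    """Build (but do not send) a SPARQL Protocol GET request (hypothetical helper)."""
    # The query goes into the URL as the url-encoded `query` parameter
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, method="GET")
    request.add_header("Accept", accept)
    # HTTP Basic authentication, like curl's --user "user:password"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request


def run_query(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


LIST_GRAPHS = "SELECT DISTINCT ?graph WHERE { GRAPH ?graph { [] ?p [] . } }"

# Only contact the server when the connection details have been configured
if "SPARQL_ENDPOINT_QUERY" in os.environ:
    req = build_query_request(
        os.environ["SPARQL_ENDPOINT_QUERY"],
        LIST_GRAPHS,
        os.environ["SPARQL_USER"],
        os.environ["SPARQL_PASSWORD"],
    )
    print(run_query(req))
```

Separating "build" from "send" keeps the request inspectable (URL, headers) before anything goes over the network.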


# Install tools

## SPARQL Anything

Links: [Website](https://sparql-anything.cc/) · [Repo](https://github.com/SPARQL-Anything/sparql.anything) · [Documentation](https://sparql-anything.readthedocs.io/en/latest/)

In [None]:
!wget https://github.com/SPARQL-Anything/sparql.anything/releases/download/0.9.0/sparql-anything-0.9.0.jar -O sparql-anything.jar

## PySHACL

Links: [Repo](https://github.com/RDFLib/pySHACL) · [PyPI](https://pypi.org/project/pyshacl/)

In [None]:
%pip install pyshacl

# Build the knowledge graph

## 1) Understand the data (XML)

The input data is about

* persons,
* the management jobs they had, and
* the skills/competences they used in these jobs.

Each job can be assigned to a job category. Some skills have related skills.

All the data is in four XML files in the `kg/transformations_input` directory:

* …
* …
* …
* …


## 2) Create the ontology (RDF)


The ontology describes what the terms (e.g., classes and properties) *mean*.

The RDF file (in Turtle) is in the `kg/ontologies` directory.

## 3) Transform the data (XML to RDF)

There are many ways to convert data to RDF; one of them is SPARQL-based. SPARQL is the query language for RDF, but some tools let you query even *non-RDF* data as if it were already RDF. One of these tools is **SPARQL Anything**.
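To illustrate what "mapping XML to RDF" means, here is a plain-Python sketch using `xml.etree.ElementTree` instead of SPARQL Anything. The XML snippet, the `https://example.org/` namespace and the `skill`/`name` element names are hypothetical and do not reflect the actual input files. SPARQL Anything performs this kind of mapping declaratively inside a SPARQL query rather than in hand-written code.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real files in kg/transformations_input differ
XML = """<skills>
  <skill id="s1"><name>Negotiation</name></skill>
  <skill id="s2"><name>Budgeting</name></skill>
</skills>"""

# Hypothetical namespace used only for this illustration
EX = "https://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def xml_to_ntriples(xml_text):
    """Map each <skill> element to a typed resource with a name (N-Triples lines)."""
    root = ET.fromstring(xml_text)
    lines = []
    for skill in root.iter("skill"):
        subject = f"<{EX}skill/{skill.get('id')}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{EX}Skill> .")
        name = skill.findtext("name")
        if name is not None:
            # Minimal escaping for an N-Triples string literal
            literal = (name.replace("\\", "\\\\").replace('"', '\\"')
                       .replace("\n", "\\n").replace("\r", "\\r"))
            lines.append(f'{subject} <{EX}name> "{literal}" .')
    return lines


for line in xml_to_ntriples(XML):
    print(line)
```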

The four query files (one for each XML file) are in the `kg/transformations` directory.

The generated RDF files (in Turtle) get saved in the `kg/transformations_output` directory.

In [36]:
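# Converts distances.xml; the other three XML files are converted the same way with their own query files
!java -jar sparql-anything.jar \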
      --query "./kg/transformations/distances.rq" \
      --configuration "location=./kg/transformations_input/distances.xml" \
      --output "./kg/transformations_output/distances.ttl" \
      --format "TTL"
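The cell above converts only `distances.xml`. A minimal sketch for running all four transformations from Python follows. It assumes (not confirmed by this notebook) that every query file and input file shares its basename with the output files merged in the next cell: `distances`, `jobcategories-skills`, `jobcategories` and `skills`. The helper `sparql_anything_command` is hypothetical and reuses exactly the CLI options of the cell above.

```python
import subprocess
from pathlib import Path

# ASSUMPTION: query (kg/transformations/<name>.rq) and input
# (kg/transformations_input/<name>.xml) use the same basename as the output file
NAMES = ["distances", "jobcategories-skills", "jobcategories", "skills"]


def sparql_anything_command(name, jar="sparql-anything.jar"):
    """Build the same command line as the cell above for one query/input pair."""
    return [
        "java", "-jar", jar,
        "--query", f"./kg/transformations/{name}.rq",
        "--configuration", f"location=./kg/transformations_input/{name}.xml",
        "--output", f"./kg/transformations_output/{name}.ttl",
        "--format", "TTL",
    ]


# Only run when the jar has been downloaded; check=True stops at the first failure
if Path("sparql-anything.jar").exists():
    for name in NAMES:
        subprocess.run(sparql_anything_command(name), check=True)
```

Passing the command as a list avoids shell quoting issues with the `location=...` option.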

Here we merge our four Turtle files into one, which makes it easier to validate and upload the RDF in the next steps:

In [21]:
!cat \
  ./kg/transformations_output/distances.ttl \
  ./kg/transformations_output/jobcategories-skills.ttl \
  ./kg/transformations_output/jobcategories.ttl \
  ./kg/transformations_output/skills.ttl \
  > ./kg/transformations_output/data.ttl
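Plain `cat` works for Turtle because prefix declarations may appear anywhere in a document, but it has two pitfalls. If a file does not end with a newline (for example, it ends in a `#` comment), its last line runs into the first line of the next file. And blank node labels such as `_:b0` are scoped to a single file, so the same label in two files would wrongly become one node after concatenation. The sketch below (helper name hypothetical, standard library only) adds missing newlines and warns about shared labels. It does not relabel them; for a full fix, parse each file with an RDF library such as rdflib (installed as a pySHACL dependency) and serialize the combined graph.

```python
import re
from pathlib import Path

# Simplified pattern for labeled blank nodes; may also match text inside literals
BNODE_LABEL = re.compile(r"_:[A-Za-z0-9_-]+")


def merge_turtle_texts(named_texts):
    """named_texts: list of (name, turtle_text).
    Returns (merged_text, labels that appear in more than one input)."""
    seen = {}
    shared = []
    parts = []
    for name, text in named_texts:
        for label in sorted(set(BNODE_LABEL.findall(text))):
            if label in seen:
                shared.append(label)
            else:
                seen[label] = name
        # Unlike plain `cat`, make sure every file ends with a newline
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts), shared


OUT_DIR = Path("./kg/transformations_output")
files = [OUT_DIR / f"{n}.ttl" for n in ["distances", "jobcategories-skills", "jobcategories", "skills"]]
if all(f.exists() for f in files):
    merged, shared = merge_turtle_texts([(str(f), f.read_text(encoding="utf-8")) for f in files])
    (OUT_DIR / "data.ttl").write_text(merged, encoding="utf-8")
    if shared:
        print("warning: blank node labels used in more than one file:", shared)
```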

## 4) Create SHACL shapes (RDF)

SHACL shapes describe what our RDF data must look like to be considered valid, i.e., to *conform* to the shapes.

The RDF file (in Turtle) is in the `kg/shapes` directory.
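To make the idea concrete, the toy check below (plain Python, not pySHACL) mimics one common kind of constraint: a node shape with `sh:targetClass` and a property shape with `sh:path` and `sh:minCount`. The shape, the `ex:` nodes and the requirement that every `foaf:Person` has a `foaf:name` are hypothetical and not taken from `shapes.ttl`. Real SHACL validation, as done with pySHACL below, also considers subclasses, supports many more constraint types, and produces a full validation report.

```python
# Roughly corresponds to this (hypothetical) shape:
#   ex:PersonShape a sh:NodeShape ;
#       sh:targetClass foaf:Person ;
#       sh:property [ sh:path foaf:name ; sh:minCount 1 ] .

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical data: two persons, only one of them has a name
DATA = {
    ("ex:alice", RDF_TYPE, FOAF + "Person"),
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:bob", RDF_TYPE, FOAF + "Person"),
}


def check_min_count(triples, target_class, path, min_count):
    """Return focus nodes (direct instances of target_class) with fewer than
    min_count distinct values for path. Subclasses are ignored in this toy."""
    focus_nodes = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    violations = []
    for node in sorted(focus_nodes):
        values = {o for s, p, o in triples if s == node and p == path}
        if len(values) < min_count:
            violations.append(node)
    return violations


print(check_min_count(DATA, FOAF + "Person", FOAF + "name", 1))
```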



### Validate the RDF

The generated report gets saved in the `kg/shapes_output` directory.

In [39]:
!pyshacl \
  --shacl ./kg/shapes/shapes.ttl \
  --format turtle \
  --output ./kg/shapes_output/report.ttl \
  ./kg/transformations_output/data.ttl




# Upload the knowledge graph

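Thanks to the [SPARQL 1.1 Graph Store HTTP Protocol](https://www.w3.org/TR/sparql11-http-rdf-update/), we can upload RDF files to our triplestore using standard HTTP requests: a `PUT` request with a `graph` parameter replaces the content of that named graph with the uploaded file.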

In [30]:
# Each PUT replaces the content of one named graph with a Turtle file.
# SPARQL_ENDPOINT_STORE is set in the configuration cell at the top.

# Ontology
!curl -X "PUT" \
      -H "Content-Type:text/turtle" \
      --user "${SPARQL_USER}:${SPARQL_PASSWORD}" \
      --data-binary "@./kg/ontologies/o.ttl" \
      "${SPARQL_ENDPOINT_STORE}/?graph=https://graphs.brox.de/dwt24/ontologies/"

# SHACL shapes
!curl -X "PUT" \
      -H "Content-Type:text/turtle" \
      --user "${SPARQL_USER}:${SPARQL_PASSWORD}" \
      --data-binary "@./kg/shapes/shapes.ttl" \
      "${SPARQL_ENDPOINT_STORE}/?graph=https://graphs.brox.de/dwt24/shacl-shapes/"

# SHACL validation report
!curl -X "PUT" \
      -H "Content-Type:text/turtle" \
      --user "${SPARQL_USER}:${SPARQL_PASSWORD}" \
      --data-binary "@./kg/shapes_output/report.ttl" \
      "${SPARQL_ENDPOINT_STORE}/?graph=https://graphs.brox.de/dwt24/shacl-reports/"

# Data
!curl -X "PUT" \
      -H "Content-Type:text/turtle" \
      --user "${SPARQL_USER}:${SPARQL_PASSWORD}" \
      --data-binary "@./kg/transformations_output/data.ttl" \
      "${SPARQL_ENDPOINT_STORE}/?graph=https://graphs.brox.de/dwt24/data/"
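The four uploads can also be scripted in Python. Below is a minimal, standard-library-only sketch that sends one Graph Store Protocol `PUT` per file. The file-to-graph mapping is copied from the `curl` commands above, and the URL shape (`<store endpoint>/?graph=<IRI>`) mirrors those commands. The helper name `build_put_request` is hypothetical. Unlike the shell version, the sketch percent-encodes the graph IRI, and it only runs when `SPARQL_ENDPOINT_STORE` is set.

```python
import base64
import os
import urllib.parse
import urllib.request
from pathlib import Path

# File-to-graph mapping taken from the curl commands above
UPLOADS = {
    "./kg/ontologies/o.ttl": "https://graphs.brox.de/dwt24/ontologies/",
    "./kg/shapes/shapes.ttl": "https://graphs.brox.de/dwt24/shacl-shapes/",
    "./kg/shapes_output/report.ttl": "https://graphs.brox.de/dwt24/shacl-reports/",
    "./kg/transformations_output/data.ttl": "https://graphs.brox.de/dwt24/data/",
}


def build_put_request(store_endpoint, graph_iri, turtle_bytes, user, password):
    """Build (but do not send) a PUT that replaces one named graph."""
    # Same URL shape as the curl commands; the graph IRI is percent-encoded
    url = store_endpoint + "/?" + urllib.parse.urlencode({"graph": graph_iri})
    request = urllib.request.Request(url, data=turtle_bytes, method="PUT")
    request.add_header("Content-Type", "text/turtle")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request


if "SPARQL_ENDPOINT_STORE" in os.environ:
    for path, graph in UPLOADS.items():
        req = build_put_request(
            os.environ["SPARQL_ENDPOINT_STORE"], graph, Path(path).read_bytes(),
            os.environ["SPARQL_USER"], os.environ["SPARQL_PASSWORD"],
        )
        with urllib.request.urlopen(req) as response:
            print(graph, response.status)
```

`urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, so a failed upload stops the loop instead of passing silently.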

# Query the knowledge graph

## Named Graphs + count of instances (per class)

In [37]:
# -G puts the url-encoded query into the URL of a GET request (SPARQL 1.1 Protocol)
!curl -G \
      -H "Accept:text/csv" \
      --user "${SPARQL_USER}:${SPARQL_PASSWORD}" \
      "${SPARQL_ENDPOINT_QUERY}" \
      --data-urlencode "query=SELECT DISTINCT ?g ?class (COUNT(DISTINCT ?s) AS ?count) WHERE { GRAPH ?g { ?s a ?class . } } GROUP BY ?g ?class ORDER BY ?g DESC(?count)"

g,class,count
https://graphs.brox.de/dwt24/data/,https://ontologies.brox.de/dwt24/Job,20913
https://graphs.brox.de/dwt24/data/,https://ontologies.brox.de/dwt24/Skill,7549
https://graphs.brox.de/dwt24/data/,https://ontologies.brox.de/dwt24/SkillDistance,4631
https://graphs.brox.de/dwt24/data/,http://xmlns.com/foaf/0.1/Person,2930
https://graphs.brox.de/dwt24/data/,https://ontologies.brox.de/dwt24/JobCategory,355
https://graphs.brox.de/dwt24/ontologies/,http://www.w3.org/2002/07/owl#DatatypeProperty,6
https://graphs.brox.de/dwt24/ontologies/,http://www.w3.org/2002/07/owl#Class,4
https://graphs.brox.de/dwt24/ontologies/,http://www.w3.org/2002/07/owl#ObjectProperty,4
https://graphs.brox.de/dwt24/ontologies/,http://www.w3.org/2002/07/owl#Ontology,1
https://graphs.brox.de/dwt24/shacl-reports/,http://www.w3.org/ns/shacl#ValidationReport,1
https://graphs.brox.de/dwt24/shacl-shapes/,http://www.w3.org/ns/shacl#NodeShape,10
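To work with this result in Python, the CSV can be grouped per named graph. The sketch below parses an excerpt of the saved output above with the standard `csv` module (the function name `counts_per_graph` is hypothetical). In practice, the text would come from a query response rather than a string literal.

```python
import csv
import io
from collections import defaultdict

# Excerpt of the saved output above (data graph and SHACL shapes graph)
CSV_TEXT = """g,class,count
https://graphs.brox.de/dwt24/data/,https://ontologies.brox.de/dwt24/Job,20913
https://graphs.brox.de/dwt24/data/,https://ontologies.brox.de/dwt24/Skill,7549
https://graphs.brox.de/dwt24/data/,https://ontologies.brox.de/dwt24/SkillDistance,4631
https://graphs.brox.de/dwt24/data/,http://xmlns.com/foaf/0.1/Person,2930
https://graphs.brox.de/dwt24/data/,https://ontologies.brox.de/dwt24/JobCategory,355
https://graphs.brox.de/dwt24/shacl-shapes/,http://www.w3.org/ns/shacl#NodeShape,10
"""


def counts_per_graph(csv_text):
    """Group SPARQL CSV results into {graph IRI: {class IRI: instance count}}."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["g"]][row["class"]] = int(row["count"])
    return dict(grouped)


for graph, classes in counts_per_graph(CSV_TEXT).items():
    # Sum of per-class counts: a resource with two classes is counted twice
    print(graph, len(classes), "classes,", sum(classes.values()), "class memberships")
```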
