# User guide for the ARIADNEplus Knowledge Base (GraphDB)

## Version 1.1
Changes from 1.0:

- [The endpoints](#The-endpoints): Staging GraphDB is accessible only from trusted networks
- [Organisation of the data](#Organisation-of-the-data):
  - Fixed and shortened the URIs in the table of this section
  - Added the table as an image to avoid bad rendering in PDF
- [Sample queries](#Sample-queries): deleted trailing code cell
- The whole notebook was re-run on 23/10/2023

### Authors: Alessia Bardi (CNR-ISTI), Enrico Ottonello (CNR-ISTI)

## Table of contents

- [The endpoints](#The-endpoints)
- [Using GraphDB workbench](#Using-GraphDB-workbench)
- [Organisation of the data](#Organisation-of-the-data)
- [Querying GraphDB programmatically](#Querying-GraphDB-programmatically)
 - [Java](#Java)
 - [Python](#Python)
- [Sample queries](#Sample-queries)

## The endpoints

* Production endpoint: https://graphdb.ariadne.d4science.org/ (unauthenticated, read-only access). Content from the production endpoint feeds the public portal.
* Staging endpoint: https://graphdb-test.ariadne.d4science.org/ (unauthenticated, read-only access). Its content is not stable and may include test data: it is mainly meant for consortium members to run quality checks before data is published on the public endpoint and portal. Content from the staging GraphDB feeds the staging portal. Access is allowed only from authorised networks.

Both endpoints run on [GraphDB Free](https://www.ontotext.com/products/graphdb/graphdb-free/) version 9.0.0.

| Endpoint      | Name of the repository | Type of content      | Suggested for |  Software |
| ----------- | ----------- | ----------- | ----------- | ----------- | 
| staging      | ariadneplus-ts01       | Non-stable, test | Technical tests, query syntax checks | GraphDB Free version 9.0.0 |
| production   | ariadneplus-pr01        | Stable, approved by providers | Data access | GraphDB Free version 9.0.0 |
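Both repositories are served under the standard RDF4J path `/repositories/<repository name>`; the Python cells below use `https://graphdb.ariadne.d4science.org/repositories/ariadneplus-pr01` for production. A minimal sketch that derives the repository URL from the table above, assuming the staging endpoint follows the same pattern (`ENDPOINTS` and `repository_url` are illustrative names, not part of any ARIADNE library):

```python
# Hypothetical helper mapping the endpoints in the table to repository URLs.
# Assumes both servers use the standard RDF4J/GraphDB path /repositories/<name>.
ENDPOINTS = {
    "production": ("https://graphdb.ariadne.d4science.org/", "ariadneplus-pr01"),
    "staging": ("https://graphdb-test.ariadne.d4science.org/", "ariadneplus-ts01"),
}


def repository_url(endpoint: str) -> str:
    """Return the SPARQL query URL of the repository hosted on the given endpoint."""
    try:
        base, repo = ENDPOINTS[endpoint]
    except KeyError:
        raise ValueError(
            f"unknown endpoint {endpoint!r}; expected one of {sorted(ENDPOINTS)}"
        ) from None
    return f"{base.rstrip('/')}/repositories/{repo}"


print(repository_url("production"))
```

Keep in mind that the staging URL is reachable only from authorised networks.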


## Using GraphDB workbench

Open the URL of one of the endpoints in your browser to access the workbench (fig. 1).

1. If no repository is shown as active, choose one from the top-right menu.
2. The left menu lists all functions; you can use any read (non-modifying) function.
3. The full GraphDB documentation is available under the “Help” menu on the left.
4. “View resource”: type the URI of a known resource to retrieve it, then click “Visual” for a visual representation or “Text” for a textual/tabular representation.
5. Explore the data by selecting an option from the left menu item “Explore”.
6. Run SPARQL queries by selecting the left menu item “SPARQL”. This opens an advanced SPARQL query editor, which you can also reach directly at `<endpoint_url>/sparql`.

| ![fig1](workbench.jpg) |
|:--:|
| <b>Figure 1 Home page of GraphDB Workbench</b>|


## Organisation of the data

Data in GraphDB is organised so that its content can be updated incrementally: named graphs are used in a specific way to enable continuous updates and enrichments of the aggregated data. For details, please consult [D12.2 Mid-term report on data integration](https://doi.org/10.5281/zenodo.4922902) (Section 3.1). In brief:

* Table 1 below summarises how the data in GraphDB is organised (for readability we omit the first part of each URI, which is always https://ariadne-infrastructure.eu/).

| Named graph | Template URI[^1] | Example URI | What is it | How many |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| Provenance | N/A | `datasourceApis` | A special graph to keep provenance information. It records which endpoints and which datasets have been added to GraphDB and when | One |
| Core data | `api_________::ariadne_plus::<providerAcronym>::<datasetId>` | `api_________::ariadne_plus::hnm::hnmad` | One named graph per dataset[^2] | One per dataset[^3] |
| AAT matching rules | `api_________::ariadne_plus::<providerAcronym>::aat` | `api_________::ariadne_plus::hnm::aat` | One named graph with the matches between local subjects and Getty AAT terms[^4] | One per provider (optional[^5]) |
| AAT enrichments | `ariadneplus::<providerAcronym>::aatplus` | `ariadneplus::hnm::aatplus` | One named graph containing the triples inferred by intersecting the aggregated data and Getty AAT based on the provided matching | One per provider (optional) |
| PeriodO provider’s terms | `ariadneplus::<providerAcronym>::periodo` | `ariadneplus::hnm::periodo` | One named graph with PeriodO terms covered by the provider | One per provider (optional[^6]) |
| ARIADNE’s PeriodO terms | N/A | `ariadne/periodo` | PeriodO collection created during the previous ARIADNE project[^7] | One |
| PeriodO enrichments | `ariadneplus::<providerAcronym>::periodoplus` | `ariadneplus::hnm::periodoplus` | One named graph containing the triples inferred by intersecting the aggregated data and the PeriodO collection relevant for the provider | One per provider (optional) |
    
[^1]: All URIs should be considered mere identifiers local to GraphDB: although they start with ‘https’, they do not currently resolve to any content served over HTTPS.

[^2]: Here “dataset” means a set of metadata records from a provider.

[^3]: One provider can have many “datasets” (i.e. many named graphs); one “dataset” (i.e. one named graph) has only one provider.

[^4]: As generated with the Vocabulary Matching Tool.

[^5]: This graph is optional, as the transformed RDF triples may already contain Getty AAT terms.

[^6]: This graph is optional, as the transformed RDF triples may already contain PeriodO terms and dates, or the provider might not have a dedicated PeriodO collection to import.

[^7]: http://n2t.net/ark:/99152/p0qhb66

* Figure 2 shows, as an example, the set of named graphs available for the HNM provider.
* Figure 3 shows part of the content of the provenance graph. It tells us that the data collected from the dataset with internal identifier api_________::ariadne_plus::ads::1 belongs to the Archaeology Data Service and was inserted into GraphDB on 2020-10-07. Note that the subject of these triples (https://ariadne-infrastructure.eu/api_________::ariadne_plus::ads::1) has the same URI as the named graph that groups all RDF triples generated from the data collected from this API.

| ![fig2](named-graphs.png) |
|:--:|
| <b>Figure 2 The named graphs associated to data provided by HNM</b>|    

| ![fig3](triples.png) |
|:--:|
| <b>Figure 3 Sample triples in the “provenance graph”</b>|    
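Because every named graph follows one of the templates in Table 1, a graph URI can be classified without querying its content. A minimal sketch written only from those templates (the function and category names are illustrative). The templates alone cannot distinguish a core-data graph whose dataset identifier happens to be `aat` from an AAT matching-rules graph; this sketch assumes the latter:

```python
BASE = "https://ariadne-infrastructure.eu/"

# Graphs named ariadneplus::<providerAcronym>::<suffix> in Table 1.
_ARIADNEPLUS_SUFFIXES = {
    "aatplus": "AAT enrichments",
    "periodo": "PeriodO provider's terms",
    "periodoplus": "PeriodO enrichments",
}


def classify_graph(uri: str):
    """Return (kind, provider acronym, dataset id) for a named-graph URI."""
    local = uri[len(BASE):] if uri.startswith(BASE) else uri
    if local == "datasourceApis":
        return ("Provenance", None, None)
    if local == "ariadne/periodo":
        return ("ARIADNE's PeriodO terms", None, None)
    parts = local.split("::")
    if len(parts) == 4 and parts[:2] == ["api_________", "ariadne_plus"]:
        provider, last = parts[2], parts[3]
        if last == "aat":  # ambiguous with a dataset literally called "aat"
            return ("AAT matching rules", provider, None)
        return ("Core data", provider, last)
    if len(parts) == 3 and parts[0] == "ariadneplus" and parts[2] in _ARIADNEPLUS_SUFFIXES:
        return (_ARIADNEPLUS_SUFFIXES[parts[2]], parts[1], None)
    return ("Unknown", None, None)


print(classify_graph(BASE + "api_________::ariadne_plus::hnm::hnmad"))
```

To obtain the graph URIs themselves, the RDF4J protocol should expose the list of named graphs at `<repository URL>/contexts`; a query such as `SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }` also returns them, although it may be slow on a repository of this size.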
    
## Querying GraphDB programmatically

GraphDB provides:

* the Workbench REST API, documented at https://graphdb.ariadne.d4science.org/webapi (fig. 4);
* the RDF4J API, documented at https://graphdb.ontotext.com/documentation/9.0/free/using-graphdb-with-the-rdf4j-api.html.
    
| ![fig4](rest.png) |
|:--:|
| <b>Figure 4 Documentation on the GraphDB REST RDF4J API</b>|
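Besides dedicated libraries, the repository URL accepts requests following the standard SPARQL 1.1 Protocol, so it can be queried with the Python standard library alone. A minimal sketch, assuming JSON results (`application/sparql-results+json`) and the production repository used elsewhere in this guide; `sparql_select` and `rows` are illustrative names. The query is sent as a form-encoded POST, which, like `sparql.setMethod('POST')` in the Python cells, avoids URL-length limits for long queries:

```python
import json
import urllib.parse
import urllib.request

REPOSITORY = "https://graphdb.ariadne.d4science.org/repositories/ariadneplus-pr01"


def rows(results: dict) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def sparql_select(query: str, repository: str = REPOSITORY, timeout: float = 60) -> list:
    """POST a SELECT query (SPARQL 1.1 Protocol) and return its rows."""
    request = urllib.request.Request(
        repository,
        data=urllib.parse.urlencode({"query": query}).encode("utf-8"),
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rows(json.load(response))
```

For example, `sparql_select('SELECT * WHERE { ?s ?p ?o } LIMIT 5')` should return up to five dictionaries keyed by `s`, `p` and `o`. Variables left unbound in a result row are simply absent from its dictionary.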

### Java
    
You can query the endpoints with any RDF4J-compliant client, for example the [Eclipse RDF4J API](https://rdf4j.org/) for Java.
Sample code for connecting to and querying the staging endpoint with the Java RDF4J API is available at https://github.com/ARIADNE-Infrastructure/sample-code.
The code includes a generic GraphDBReader class with an example of a TupleQuery on the provenance graph and a CONSTRUCT query for one of the collection records provided by ADS.

In addition, some simple but useful SPARQL queries are available in a dedicated folder of the same repository.
The following interactive paragraphs show some examples; additional queries may be added to the GitHub repository based on users’ requests for support.
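The Java sample runs a TupleQuery on the provenance graph. Because the subject of each provenance triple is the URI of a dataset's named graph (see Figure 3), the same kind of lookup can be narrowed to a single dataset. A minimal sketch in Python, the language of the rest of this notebook; it only builds the query text (the function name is illustrative and this is not the query used in the Java sample), which you can then send from the SPARQL editor or any client:

```python
BASE = "https://ariadne-infrastructure.eu/"
PROVENANCE_GRAPH = BASE + "datasourceApis"


def provenance_query(dataset_graph: str) -> str:
    """SELECT the provenance triples whose subject is one dataset's named graph.

    dataset_graph may be the full URI or the local part listed in Table 1,
    e.g. "api_________::ariadne_plus::ads::1".
    """
    uri = dataset_graph if dataset_graph.startswith(BASE) else BASE + dataset_graph
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  GRAPH <{PROVENANCE_GRAPH}> {{ <{uri}> ?p ?o }}\n"
        "}"
    )


print(provenance_query("api_________::ariadne_plus::ads::1"))
```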

    
### Python

Check out the following interactive paragraphs, which use the [SPARQLWrapper library](https://github.com/RDFLib/sparqlwrapper).












In [1]:
# Install the required libraries into the environment of the running kernel
%pip install rdflib
%pip install SPARQLWrapper
%pip install prettytable

Collecting prettytable
  Using cached prettytable-3.4.1-py3-none-any.whl (26 kB)
Installing collected packages: prettytable
Successfully installed prettytable-3.4.1


In [2]:
#imports
from rdflib import Graph
from SPARQLWrapper import SPARQLWrapper, JSON, N3
from pprint import pprint

In [3]:
sparql = SPARQLWrapper('https://graphdb.ariadne.d4science.org/repositories/ariadneplus-pr01')
sparql.setQuery('''
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX aocat: <https://www.ariadne-infrastructure.eu/resource/ao/cat/1.1/>
    SELECT *
    WHERE {
        ?c rdf:type aocat:AO_Collection .
        ?c ?p ?o .
    }
    LIMIT 10
''')
sparql.setReturnFormat(JSON)
# Short SPARQL queries can be sent with GET; for long queries, POST is safer: uncomment the line below
# sparql.setMethod('POST')
qres = sparql.query().convert()

# To view the raw output, uncomment the line below
# pprint(qres)
# For more readable triples, keep only the last path segment of each predicate URI
for result in qres['results']['bindings']:
    s, p, o = result['c']['value'], result['p']['value'].rsplit('/', 1)[1], result['o']['value']
    print(f'{s} - {p} - {o}')


https://ariadne-infrastructure.eu/aocat/Collection/ADS/AC6671C7-FD6D-311D-98D0-F635D5EFAA4F - 22-rdf-syntax-ns#type - https://www.ariadne-infrastructure.eu/resource/ao/cat/1.1/AO_Collection
https://ariadne-infrastructure.eu/aocat/Collection/ADS/AC6671C7-FD6D-311D-98D0-F635D5EFAA4F - rdf-schema#label - Collection  10.5284
https://ariadne-infrastructure.eu/aocat/Collection/ADS/AC6671C7-FD6D-311D-98D0-F635D5EFAA4F - has_ARIADNE_subject - https://ariadne-infrastructure.eu/aocat/Concept/AO_Subject/Artefact
https://ariadne-infrastructure.eu/aocat/Collection/ADS/AC6671C7-FD6D-311D-98D0-F635D5EFAA4F - has_ARIADNE_subject - https://ariadne-infrastructure.eu/aocat/Concept/AO_Subject/Inscription
https://ariadne-infrastructure.eu/aocat/Collection/ADS/AC6671C7-FD6D-311D-98D0-F635D5EFAA4F - has_ARIADNE_subject - https://ariadne-infrastructure.eu/aocat/Concept/AO_Subject/Fieldwork%20archive
https://ariadne-infrastructure.eu/aocat/Collection/ADS/AC6671C7-FD6D-311D-98D0-F635D5EFAA4F - has_ARIADNE_subje

In [4]:
from IPython.display import Code
from SPARQLWrapper import SPARQLWrapper, XML

sparql = SPARQLWrapper('https://graphdb.ariadne.d4science.org/repositories/ariadneplus-pr01')
sparql.setQuery('''
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX aocat: <https://www.ariadne-infrastructure.eu/resource/ao/cat/1.1/>
    SELECT ?c
    WHERE {
        ?c rdf:type aocat:AO_Collection .
    }
    LIMIT 10
''')
sparql.setReturnFormat(XML)
# With the XML format, convert() returns an xml.dom.minidom Document
qres = sparql.query().convert()

# For a readable, syntax-highlighted rendering of the XML, use IPython.display.Code
Code(qres.toprettyxml(), language='xml')





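`toprettyxml()` only reformats the response. To extract the values from the XML results, the standard library is enough. A minimal sketch that reads the W3C SPARQL Query Results XML Format (namespace `http://www.w3.org/2005/sparql-results#`); `xml_rows` and the one-row `SAMPLE` document are illustrative, the sample URI being taken from the output of the JSON cell above:

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C "SPARQL Query Results XML Format"
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def xml_rows(document: str) -> list:
    """Return one {variable: value} dict per <result> element."""
    root = ET.fromstring(document)
    rows = []
    for result in root.iterfind("sr:results/sr:result", NS):
        row = {}
        for binding in result.iterfind("sr:binding", NS):
            # A binding holds exactly one <uri>, <literal> or <bnode> child
            value = next(iter(binding))
            row[binding.get("name")] = value.text or ""
        rows.append(row)
    return rows


SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="c"/></head>
  <results>
    <result><binding name="c"><uri>https://ariadne-infrastructure.eu/aocat/Collection/ADS/AC6671C7-FD6D-311D-98D0-F635D5EFAA4F</uri></binding></result>
  </results>
</sparql>"""

print(xml_rows(SAMPLE))
```

Applied to the cell above, `xml_rows(qres.toxml())` should return up to ten dictionaries keyed by `c`.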
## Sample queries

Let's try some sample queries. You can find more in the [sample-queries folder on GitHub](https://github.com/ARIADNE-Infrastructure/sample-code/tree/master/src/main/resources/ariadneplus/sparql-queries). Below are some examples; additional queries may be added to the GitHub repository based on users’ requests for support.

### Get titles of 10 resources whose contributor is the Archaeology Data Service 



In [5]:
from rdflib import Graph
from SPARQLWrapper import SPARQLWrapper, JSON, N3
from pprint import pprint

sparql = SPARQLWrapper('https://graphdb.ariadne.d4science.org/repositories/ariadneplus-pr01')
sparql.setQuery('''
    PREFIX aocat: <https://www.ariadne-infrastructure.eu/resource/ao/cat/1.1/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?title  WHERE {
        ?resourceIRI aocat:has_contributor ?contributor .
        ?contributor rdfs:label "Archaeology Data Service" .
        ?resourceIRI aocat:has_title ?title
    }
    LIMIT 10
''')
sparql.setReturnFormat(JSON)
qres = sparql.query().convert()

# To view the raw output, uncomment the line below
# pprint(qres)
# Print only the title of each result
print('Titles of resources with ADS as contributor')
print('-------------------------------------------')
for result in qres['results']['bindings']:
    t = result['title']['value']
    print(t)




Titles of resources with ADS as contributor
-------------------------------------------
Day of Archaeology Archive
Gwynedd Regional HER
Glamorgan-Gwent HER
York Archive Gazetteer
Englands Rock Art
Northern Ireland Sites and Monuments Record
Parks and Gardens Data Service
Greater London Sites and Monuments Record
Defence of Britain Archive
Greater London Sites and Monuments Record


### Count number of Collections by provider

In [6]:
from prettytable import PrettyTable
from rdflib import Graph
from SPARQLWrapper import SPARQLWrapper, JSON, N3
from pprint import pprint


sparql = SPARQLWrapper('https://graphdb.ariadne.d4science.org/repositories/ariadneplus-pr01')
sparql.setQuery('''
    PREFIX aocat: <https://www.ariadne-infrastructure.eu/resource/ao/cat/1.1/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?name (COUNT(?c) AS ?num_collections) WHERE {
        ?c rdf:type aocat:AO_Collection .
        ?c aocat:has_publisher ?publisher .
        ?publisher aocat:has_name ?name .
    } GROUP BY ?name
''')

sparql.setReturnFormat(JSON)
qres = sparql.query().convert()

t = PrettyTable(['Provider', 'Num Collections'])
total = 0
for result in qres['results']['bindings']:
    contrib,count = result['name']['value'],result['num_collections']['value']
    t.add_row([contrib, count])
    total += int(count)

print(f'Total number of collections: {total}')
print(t)



Total number of collections: 60302
+------------------------------------------------------------------------------+-----------------+
|                                   Provider                                   | Num Collections |
+------------------------------------------------------------------------------+-----------------+
|                           Archaeology Data Service                           |        32       |
| National Institute of Archaeology with Museum, Bulgarian Academy of Sciences |        1        |
|                             University of Patras                             |        3        |
|                        Swedish National Data Service                         |       484       |
|                      Swedish Rock Art Research Archives                      |        1        |
|                                   ZRC SAZU                                   |        2        |
|                                     HNM                                 
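In the SPARQL JSON results, aggregate values such as `?num_collections` arrive as strings (with the `xsd:integer` type in the `datatype` field), which is why the cell above converts them with `int()` before summing. A minimal sketch that also sorts providers by decreasing count, assuming the same JSON shape as `qres`; `counts_by_provider` is an illustrative name and the two-row sample reuses values from the table above:

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def counts_by_provider(results: dict, name_var: str = "name", count_var: str = "num_collections"):
    """Return (provider, count) pairs sorted by decreasing count, plus the total."""
    pairs = [
        (b[name_var]["value"], int(b[count_var]["value"]))
        for b in results["results"]["bindings"]
    ]
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return pairs, sum(count for _, count in pairs)


sample = {"results": {"bindings": [
    {"name": {"type": "literal", "value": "Archaeology Data Service"},
     "num_collections": {"type": "literal", "datatype": XSD_INTEGER, "value": "32"}},
    {"name": {"type": "literal", "value": "Swedish National Data Service"},
     "num_collections": {"type": "literal", "datatype": XSD_INTEGER, "value": "484"}},
]}}

pairs, total = counts_by_provider(sample)
print(pairs, total)
```

Alternatively, the ordering can be pushed to the server by appending `ORDER BY DESC(?num_collections)` to the query, as the last sample query does with `ORDER BY desc(?cnt)`.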

### Get number of resources by provider and ARIADNE subjects

In [7]:
from prettytable import PrettyTable
from SPARQLWrapper import SPARQLWrapper, JSON


sparql = SPARQLWrapper('https://graphdb.ariadne.d4science.org/repositories/ariadneplus-pr01')
sparql.setQuery('''
PREFIX aocat: <https://www.ariadne-infrastructure.eu/resource/ao/cat/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT (count(?resource) AS ?cnt) ?publisherName ?asl  WHERE {
    ?resource aocat:has_publisher ?publisher . 
    ?publisher aocat:has_name ?publisherName .
    ?resource aocat:has_ARIADNE_subject ?as .
    ?as rdfs:label ?asl
}
GROUP BY ?publisherName ?asl
''')

sparql.setReturnFormat(JSON)
qres = sparql.query().convert()

t = PrettyTable(['Provider', 'ARIADNE subjects', 'Count'])

for result in qres['results']['bindings']:
    t.add_row([result['publisherName']['value'], result['asl']['value'], result['cnt']['value']])

print(t)



+------------------------------------------------------------------------------+------------------+--------+
|                                   Provider                                   | ARIADNE subjects | Count  |
+------------------------------------------------------------------------------+------------------+--------+
|                           Archaeology Data Service                           |     Artefact     |  4706  |
|                           Archaeology Data Service                           |   Inscription    | 26315  |
|                           Archaeology Data Service                           |       Date       |  9026  |
|                           Archaeology Data Service                           | Fieldwork report | 63596  |
|                           Archaeology Data Service                           |  Site/monument   | 804458 |
|                           Archaeology Data Service                           |     Maritime     |   81   |
|                  
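The flat table above repeats each provider once per subject. A minimal sketch that pivots the same bindings into one dictionary per provider, assuming the `qres` structure of the cell above (variables `publisherName`, `asl`, `cnt`); `pivot` and `row` are illustrative names and the sample rows reuse values from the printed table:

```python
from collections import defaultdict


def pivot(results: dict) -> dict:
    """Map each provider to a {ARIADNE subject: count} dictionary."""
    table = defaultdict(dict)
    for b in results["results"]["bindings"]:
        table[b["publisherName"]["value"]][b["asl"]["value"]] = int(b["cnt"]["value"])
    return dict(table)


def row(provider, subject, count):
    # Build one binding in the SPARQL JSON results shape
    return {"publisherName": {"value": provider}, "asl": {"value": subject}, "cnt": {"value": str(count)}}


sample = {"results": {"bindings": [
    row("Archaeology Data Service", "Artefact", 4706),
    row("Archaeology Data Service", "Maritime", 81),
]}}
print(pivot(sample))
```

Each provider's dictionary can then become a single PrettyTable row, using `.get(subject, 0)` for subjects the provider does not use.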

### Records from ADS, Aarhus or DANS with ARIADNE_subject = “Artefact” and derived_subject = “Brooch” (http://vocab.getty.edu/aat/300045995)

In [8]:
from rdflib import Graph
from SPARQLWrapper import SPARQLWrapper, JSON, N3
from pprint import pprint


sparql = SPARQLWrapper('https://graphdb.ariadne.d4science.org/repositories/ariadneplus-pr01')
sparql.setQuery('''
PREFIX aocat: <https://www.ariadne-infrastructure.eu/resource/ao/cat/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX periodo: <http://n2t.net/ark:/99152/p0v#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?provider ?country (count(distinct ?resource) AS ?cnt)   WHERE {
    ?resource aocat:has_publisher ?publisher . 
    ?publisher rdfs:label ?provider .
    ?resource aocat:has_derived_subject <http://vocab.getty.edu/aat/300045995> .
    ?resource aocat:has_temporal_coverage ?tc .
    ?tc aocat:has_period ?periodo .
    ?periodo dcterms:spatial ?spatial .
    ?spatial skos:prefLabel ?country
    FILTER(?provider="Archaeology Data Service" || ?provider="Data Archiving and Networked Services (DANS)" || ?provider="Aarhus University")
}
GROUP BY ?country ?provider
''')

sparql.setReturnFormat(JSON)
qres = sparql.query().convert()

print('Records from ADS, Aarhus or DANS that are with ARIADNE_subject = “Artefact” and derived_subject = “Brooch” by country')
t = PrettyTable(['Provider', 'Country', 'Count'])

for result in qres['results']['bindings']:
    t.add_row([result['provider']['value'], result['country']['value'], result['cnt']['value']])

print(t)

Records from ADS, Aarhus or DANS that are with ARIADNE_subject = “Artefact” and derived_subject = “Brooch” by country
+----------------------------------------------+----------------+-------+
|                   Provider                   |    Country     | Count |
+----------------------------------------------+----------------+-------+
|           Archaeology Data Service           |    Hungary     |  1971 |
|           Archaeology Data Service           | United Kingdom |  2452 |
|           Archaeology Data Service           |      Asia      |  343  |
|           Archaeology Data Service           |     Europe     |  343  |
|           Archaeology Data Service           |     Earth      |   18  |
|           Archaeology Data Service           |    Scotland    |  2394 |
| Data Archiving and Networked Services (DANS) |  Netherlands   |  9740 |
|              Aarhus University               |    Denmark     |   13  |
+----------------------------------------------+----------------+---
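The query above selects resources by the derived subject “Brooch” only; it does not test `aocat:has_ARIADNE_subject`, although the heading also mentions “Artefact”. If you need both conditions, two extra triple patterns are enough. A sketch, assuming the ARIADNE subject label is stored as `'Artefact'@en` like the `'Fieldwork report'@en` label matched in the next query; its counts may differ from the table above:

```python
# Hypothetical variant of the brooch query: same patterns, plus an explicit
# ARIADNE subject constraint (label assumed to be tagged @en).
BROOCH_ARTEFACT_QUERY = """
PREFIX aocat: <https://www.ariadne-infrastructure.eu/resource/ao/cat/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?provider ?country (COUNT(DISTINCT ?resource) AS ?cnt) WHERE {
    ?resource aocat:has_publisher ?publisher .
    ?publisher rdfs:label ?provider .
    ?resource aocat:has_ARIADNE_subject ?as .
    ?as rdfs:label 'Artefact'@en .
    ?resource aocat:has_derived_subject <http://vocab.getty.edu/aat/300045995> .
    ?resource aocat:has_temporal_coverage ?tc .
    ?tc aocat:has_period ?periodo .
    ?periodo dcterms:spatial ?spatial .
    ?spatial skos:prefLabel ?country
    FILTER(?provider IN ("Archaeology Data Service",
                         "Data Archiving and Networked Services (DANS)",
                         "Aarhus University"))
}
GROUP BY ?country ?provider
"""
```

To run it, pass it to `sparql.setQuery(BROOCH_ARTEFACT_QUERY)` in the cell above; `IN (...)` is the SPARQL 1.1 shorthand for the chain of `||` comparisons used there.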

### Derived subjects by provider for Fieldwork and Fieldwork report resources

In [9]:
from rdflib import Graph
from SPARQLWrapper import SPARQLWrapper, JSON, N3
from pprint import pprint


sparql = SPARQLWrapper('https://graphdb.ariadne.d4science.org/repositories/ariadneplus-pr01')
sparql.setQuery('''
PREFIX aocat: <https://www.ariadne-infrastructure.eu/resource/ao/cat/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX periodo: <http://n2t.net/ark:/99152/p0v#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?provider ?dsl (count(distinct ?resource) AS ?cnt)   WHERE {
    ?resource aocat:has_publisher ?publisher . 
    ?publisher rdfs:label ?provider .
    ?resource aocat:has_ARIADNE_subject ?as .
    ?as rdfs:label ?asl .
    ?resource aocat:has_derived_subject ?ds .
    ?ds skos:prefLabel ?dsl .
    FILTER(?asl = 'Fieldwork report'@en || ?asl = 'Fieldwork'@en)
}
GROUP BY ?dsl ?provider
ORDER BY desc(?cnt)
''')

sparql.setReturnFormat(JSON)
qres = sparql.query().convert()

print('Derived subjects by provider for Fieldwork and Fieldwork report resources')
  
t = PrettyTable(['Provider', 'Derived subject label', 'Count'])

for result in qres['results']['bindings']:
    t.add_row([result['provider']['value'], result['dsl']['value'], result['cnt']['value']])

print(t)

Derived subjects by provider for Fieldwork and Fieldwork report resources
+--------------------------------------------------------------------+---------------------------------------------------------------------+-------+
|                              Provider                              |                        Derived subject label                        | Count |
+--------------------------------------------------------------------+---------------------------------------------------------------------+-------+
|                               AIS CR                               |                               trenches                              | 60227 |
|                               AIS CR                               |                        museums (institutions)                       | 54978 |
|                                HNM                                 |                         settlement patterns                         | 38277 |
|                               
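Because the query orders rows by `?cnt` across all providers, a single large provider can fill the top of the table. A minimal sketch that keeps the `n` most frequent derived subjects per provider on the client side, assuming the `qres` structure of the cell above (variables `provider`, `dsl`, `cnt`); `top_subjects` and `binding` are illustrative names and the sample values come from the printed table:

```python
from collections import defaultdict


def top_subjects(results: dict, n: int = 3) -> dict:
    """Return {provider: [(derived subject, count), ...]} with at most n entries each."""
    grouped = defaultdict(list)
    for b in results["results"]["bindings"]:
        grouped[b["provider"]["value"]].append((b["dsl"]["value"], int(b["cnt"]["value"])))
    return {
        provider: sorted(items, key=lambda item: -item[1])[:n]
        for provider, items in grouped.items()
    }


def binding(provider, label, count):
    # One row in the SPARQL JSON results shape
    return {"provider": {"value": provider}, "dsl": {"value": label}, "cnt": {"value": str(count)}}


sample = {"results": {"bindings": [
    binding("AIS CR", "trenches", 60227),
    binding("AIS CR", "museums (institutions)", 54978),
    binding("HNM", "settlement patterns", 38277),
]}}
print(top_subjects(sample, n=1))
```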