# SPARQLing to local Triple Store

This Jupyter notebook creates a local triple store with the Owlready2 Python package and then queries it with SPARQL.
To that end, the required libraries are imported and a few helper functions are defined.
The SPARQL queries are read from dedicated files that contain only the query text. Such files can be created as needed and then loaded by the notebook.

The queries follow the general SPARQL pattern:

```SPARQL
PREFIX ex: <https://example.org/my/namespace/>

SELECT ?s ?p ?o
WHERE {
    ?s ?p ?o
}
```
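
As a minimal sketch, such a query file could be created and read back from Python as shown here. The file name `my_query.sparql` is a hypothetical choice, and the `LIMIT 10` clause is added only to keep the example result small.

```python
from pathlib import Path

# Query text following the general pattern shown above.
query_text = """PREFIX ex: <https://example.org/my/namespace/>

SELECT ?s ?p ?o
WHERE {
    ?s ?p ?o
}
LIMIT 10
"""

# Hypothetical file name; any path can be used instead.
query_path = Path("my_query.sparql")

# Write the query to disk, then read it back as plain text.
query_path.write_text(query_text, encoding="utf-8")
query = query_path.read_text(encoding="utf-8")
```

Note that writing to the working directory overwrites an existing file of the same name.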


## Import of relevant packages | Definition of helper functions

In [4]:
%%capture
# Import relevant and useful packages
import numpy as np
import pandas as pd
import owlready2 as or2

In [2]:
# Definition of helper functions
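# Function to transform inputs to IRIs: entities are replaced by their IRI,
# all other values (e.g. literals) are returned unchanged.
def to_iri(value):
    try:
        return value.iri
    except AttributeError:
        return value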

# Function to write the result of a SPARQL query into a (pandas) data frame.
def sparql_result_to_df(res):
    rows = []
    for row in res:
        rows.append([to_iri(item) for item in row])
    return pd.DataFrame(rows)
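
To illustrate what the IRI conversion does to a result row, here is a small sketch that runs without a triple store. `FakeEntity` is a hypothetical stand-in for an ontology entity (anything with an `.iri` attribute), and `to_iri` is rewritten compactly with `getattr`, which is assumed to behave like the helper above for ordinary objects.

```python
class FakeEntity:
    """Hypothetical stand-in for an ontology entity exposing an `iri`."""

    def __init__(self, iri):
        self.iri = iri


def to_iri(value):
    # Return the IRI if the value has one, otherwise the value itself.
    return getattr(value, "iri", value)


# One hand-made result row mixing an entity with plain literals.
rows = [[FakeEntity("https://example.org/my/namespace/sample_1"), 42, "steel"]]

converted = [[to_iri(item) for item in row] for row in rows]
print(converted)
# [['https://example.org/my/namespace/sample_1', 42, 'steel']]
```

Entities are replaced by their IRI strings, while numbers and strings pass through unchanged, so the resulting table contains only plain values.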

## Definition of Sources

In the following cell, specify the ontologies to be loaded (parsed) as well as any additional source, such as an A-Box, a further ontology, or a data file.

In [None]:
# Define links to ontologies, files, etc. to be loaded in the local triple store
link_ontology_1 = "https://materialdigital.github.io/core-ontology/ontology.rdf" # Example: PMD Core Ontology (PMDco) hosted on corresponding GitHub repository
link_data = "your_data.rdf"

triple_store = or2.World()
triple_store.get_ontology(link_ontology_1).load() # Example: https://w3id.org/pmd/co
triple_store.get_ontology(link_data).load()
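
For local files, the Owlready2 documentation shows ontologies being loaded through `file://` IRIs. The following sketch builds such an IRI from a file name with the standard library. It assumes the file lives in the current working directory; whether a plain relative path would also work may depend on the Owlready2 version.

```python
from pathlib import Path

# Placeholder file name taken from the cell above; assumed to be located
# in the current working directory.
local_file = Path.cwd() / "your_data.rdf"

# as_uri() requires an absolute path and returns a file:// IRI.
link_data = local_file.as_uri()
print(link_data)
```

The resulting string can then be passed to `triple_store.get_ontology(link_data).load()` in place of the bare file name.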

## Specification of SPARQL Query

In the following cell, select the file that contains the SPARQL query. The query in this file is executed in the subsequent cell.

In [6]:
# Define the location of the file containing a SPARQL query that is supposed to be run
link_SPARQL_query = 'example_SPARQL_query.sparql'

In [7]:
# Open the file and read the SPARQL query
with open(link_SPARQL_query, 'r') as file:
    query = file.read()

# Execute the SPARQL query
res = triple_store.sparql(query)

# Convert the result to a DataFrame
data = sparql_result_to_df(res)

# Display the first few rows of the DataFrame
# data.head()

## Depiction of Results 

For a more readable tabular display of the results, the tabulate module can be used.
Furthermore, because the SPARQL query is defined in a dedicated query file (link_SPARQL_query), the headers of the result table can be read from the SELECT clause of that query. This allows a manual consistency check of the result: did the SELECT clause really request the information that was intended? Hence, the following code also extracts the variables named in the SELECT clause and uses them as column headers.

In [None]:
# Import necessary packages (such as tabulate)
import re
from tabulate import tabulate

# Step 1: Read the SPARQL file content
with open(link_SPARQL_query, 'r') as file:
    sparql_query = file.read()

# Step 2: Extract the terms from the SELECT clause
# This regular expression looks for the SELECT or SELECT DISTINCT clause and captures the terms.
# SPARQL keywords are case-insensitive, hence re.IGNORECASE.
select_clause_match = re.search(r'SELECT\s+(DISTINCT\s+)?(.*?)\s+WHERE', sparql_query, re.DOTALL | re.IGNORECASE)

if select_clause_match:
    select_clause = select_clause_match.group(2)  # Use group(2) to capture the variables
    # Split the clause on whitespace and keep only the variables (terms starting with '?')
    headers = [term.lstrip('?') for term in select_clause.split() if term.startswith('?')]
else:
    # Fall back to an empty header list so that the table can still be printed
    headers = []
    print("No headers were found. Please check the select clause within the SPARQL query.")

# Step 3: Use the headers in the tabulate print statement
# Print the data with tabulate
print(tabulate(data, headers=headers, tablefmt='psql', showindex=True))
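
The header extraction can also be wrapped in a small reusable function, sketched here with the standard library only. The name `extract_select_variables` is hypothetical. The sketch assumes a single SELECT clause with plain variables; expressions such as `(COUNT(?s) AS ?n)` would yield both `s` and `n`, and `SELECT *` yields an empty list.

```python
import re


def extract_select_variables(query: str) -> list[str]:
    """Return the variable names listed in the first SELECT clause."""
    # Capture everything between SELECT [DISTINCT|REDUCED] and WHERE or '{'.
    match = re.search(
        r"\bSELECT\s+(?:DISTINCT\s+|REDUCED\s+)?(.*?)\s*(?:\bWHERE\b|\{)",
        query,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        return []
    # Variables start with '?'; return their names without the prefix.
    return re.findall(r"\?(\w+)", match.group(1))


example_query = """PREFIX ex: <https://example.org/my/namespace/>

select distinct ?s ?p ?o
where { ?s ?p ?o }
"""

print(extract_select_variables(example_query))  # ['s', 'p', 'o']
```

Returning an empty list instead of printing a message leaves it to the caller to decide how to handle a query without recognizable headers.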