# SPARQLing a Local Triple Store

In this Jupyter Notebook, a local (emulated) triple store is created with the Owlready2 Python package. SPARQL queries can then be run against the provided example dataset, for example for quick data checks.
First, the necessary libraries are imported and the helper functions are defined. Afterwards, several example SPARQL queries are performed.
The SPARQL queries are read from separate files that contain only the query body (the text of the SPARQL query). These files can be found in the dedicated [SPARQL folder]() of this repository.

They follow the general pattern of a SPARQL SELECT query:

```SPARQL
PREFIX ex: <https://example.org/my/namespace/>

SELECT ?s ?p ?o
WHERE {
    ?s ?p ?o
}
```
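The pattern above has three parts: the prefix declarations, the variables projected by the SELECT clause, and the triple patterns in the WHERE block. The following minimal sketch makes these parts explicit by assembling such a query string from them. The helper `build_select_query` is hypothetical and not part of this repository. The notebook itself reads complete query bodies from files instead.

```python
def build_select_query(prefixes, variables, patterns):
    """Assemble a SPARQL SELECT query from prefix declarations, projected variables and triple patterns."""
    # One PREFIX line per (prefix name, namespace IRI) pair
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    # Projected variables, each written with a leading '?'
    select_line = "SELECT " + " ".join(f"?{var}" for var in variables)
    # Each triple pattern is terminated with ' .' inside the WHERE block
    where_lines = ["WHERE {"] + [f"    {pattern} ." for pattern in patterns] + ["}"]
    return "\n".join(prefix_lines + [""] + [select_line] + where_lines)


# Rebuild the general pattern shown above
query = build_select_query(
    prefixes={"ex": "https://example.org/my/namespace/"},
    variables=["s", "p", "o"],
    patterns=["?s ?p ?o"],
)
print(query)
```

Keeping the three parts separate shows which part of a query file determines the columns of the result: only the SELECT line does.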


## Import of relevant packages | Definition of helper function(s)

```python
%%capture
# Import relevant and useful packages
import numpy as np
import pandas as pd
import owlready2 as or2
```

```python
# Definition of helper functions

def to_iri(item):
    """Return the IRI of an Owlready2 entity; return any other value (e.g. a literal) unchanged."""
    return getattr(item, "iri", item)


def sparql_result_to_df(res):
    """Write the rows of a SPARQL query result into a (pandas) data frame, replacing entities by their IRIs."""
    return pd.DataFrame([[to_iri(item) for item in row] for row in res])
```

## Definition of Sources

The following cell specifies the sources of the ontologies to be parsed and of the A-Box (the instance data) to be queried.

```python
# Define links to the ontologies and data files to be loaded into the local triple store
link_PMDco = "https://materialdigital.github.io/core-ontology/ontology.rdf"  # PMD Core Ontology (PMDco) hosted on its GitHub repository
link_ontoFNCT = "https://MarkusSchilling.github.io/ontoFNCT/ontology.rdf"  # FNCT Ontology (ontoFNCT) hosted on its GitHub repository
link_data = "https://raw.githubusercontent.com/MarkusSchilling/ontoFNCT/main/analysis/ontoFNCT_exemplary_data_PE-HD.rdf"  # Example dataset hosted on the ontoFNCT GitHub repository

# Create a separate Owlready2 world that serves as the local triple store and load all sources into it
triple_store = or2.World()
for link in (link_PMDco, link_ontoFNCT, link_data):
    triple_store.get_ontology(link).load()
```

## Specification of SPARQL Queries

The following cells specify the files that contain the example SPARQL queries. The queries stored in these files are executed in the subsequent cell.
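The queries below are referenced one file at a time. An alternative is to collect every query body in the SPARQL folder at once, keyed by file name. The following is a minimal sketch of that idea. The helper `load_sparql_queries` is hypothetical and not part of this repository. A temporary folder stands in for the repository's SPARQL folder so that the example runs on its own.

```python
import tempfile
from pathlib import Path


def load_sparql_queries(folder):
    """Read every *.sparql file in `folder` into a dict mapping the file name (without extension) to the query text."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.sparql"))
    }


# Usage example with a temporary folder standing in for the SPARQL folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example_query.sparql").write_text(
        "SELECT ?s ?p ?o\nWHERE {\n    ?s ?p ?o\n}\n", encoding="utf-8"
    )
    queries = load_sparql_queries(tmp)

print(sorted(queries))  # ['example_query']
```

Because `pathlib` builds the paths, the same code works on Windows, Linux and macOS without hard-coded path separators. In this notebook it would be called as `load_sparql_queries("SPARQL")`.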

```python
# Define the locations of the files containing the SPARQL queries to be run
# (forward slashes work on Windows as well as on Linux and macOS and avoid invalid escape sequences such as '\c')
link_SPARQL_query_count_all_entities = 'SPARQL/count_all_entities_in_triple_store.sparql'  # Count all triples that can be found in the dataset
link_SPARQL_query_count_number_FNCT_tests = 'SPARQL/count_number_of_FNCT_tests.sparql'  # Query for the number of instances of type "FNCT" in the dataset (number of FNCT tests included)
link_SPARQL_query_select_all_materials_tested = 'SPARQL/select_all_materials_tested.sparql'  # Query to obtain the names of the materials tested
link_SPARQL_query_FNCT_results = 'SPARQL/FNCT_results.sparql'  # Query to obtain the material names, the test media and the FNCT results (time to failure and measured stress)
```

```python
# Perform SPARQL queries

def run_sparql_file(path):
    """Read a SPARQL query from `path`, execute it on the local triple store and return the result as a data frame."""
    with open(path, 'r', encoding='utf-8') as file:
        query = file.read()
    return sparql_result_to_df(triple_store.sparql(query))


data_count_all_entities = run_sparql_file(link_SPARQL_query_count_all_entities)  # First example query
data_count_number_of_FNCT_tests = run_sparql_file(link_SPARQL_query_count_number_FNCT_tests)  # Second example query
data_select_all_materials_tested = run_sparql_file(link_SPARQL_query_select_all_materials_tested)  # Third example query
data_FNCT_results = run_sparql_file(link_SPARQL_query_FNCT_results)  # Fourth example query
```

## Results 

The results are displayed as tables using the `tabulate` package.
Since the SPARQL queries are defined in dedicated query files, the headers of each result table can be read from the SELECT clause of the corresponding query. This allows the result to be double-checked manually and keeps the output consistent: did the SELECT clause really address the information I wanted to obtain? The following code therefore reads the variables projected by the SELECT clause, including aliases defined via `AS`, and uses them as table headers.
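Part of this consistency check can be automated. Each result row should contain exactly one value per header read from the SELECT clause. The following minimal sketch flags rows where this is not the case. The helper `find_inconsistent_rows` and the placeholder values are hypothetical and are not taken from the example dataset.

```python
def find_inconsistent_rows(headers, rows):
    """Return the indices of result rows whose number of values differs from the number of headers."""
    return [index for index, row in enumerate(rows) if len(row) != len(headers)]


# Placeholder values only (not taken from the example dataset)
headers = ["material", "medium"]
rows = [["material_A", "medium_A"], ["material_B"]]
print(find_inconsistent_rows(headers, rows))  # [1]
```

pandas pads shorter rows with missing values. For a result that is already a data frame `df`, the corresponding check is therefore `len(headers) == df.shape[1]`.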

```python
# Import necessary packages (regular expressions and tabulate)
import re
from tabulate import tabulate


def extract_select_headers(sparql_query):
    """Return the variable names projected by the first SELECT clause of a SPARQL query.

    Aliases defined via "(expression AS ?alias)" are returned instead of the variables used
    inside the expression. An empty list is returned if no SELECT clause is found.
    """
    # Remove comments (a '#' at the start of a line or after whitespace; a '#' inside IRIs is kept)
    sparql_query = re.sub(r'(^|\s)#[^\n]*', r'\1', sparql_query)
    # Capture everything between SELECT [DISTINCT|REDUCED] and WHERE (or the opening brace)
    match = re.search(r'\bSELECT\s+(?:DISTINCT\s+|REDUCED\s+)?(.*?)(?:\bWHERE\b|\{)',
                      sparql_query, re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    clause = match.group(1)

    def collapse(expression):
        # Keep only the alias of "(... AS ?alias)"; drop any other parenthesised expression
        alias = re.search(r'\bAS\s+\?(\w+)\s*$', expression.group(1), re.IGNORECASE)
        return f' ?{alias.group(1)} ' if alias else ' '

    # Collapse parenthesised expressions from the innermost to the outermost level
    previous = None
    while previous != clause:
        previous = clause
        clause = re.sub(r'\(([^()]*)\)', collapse, clause)
    return re.findall(r'\?(\w+)', clause)


def print_query_result(title, data, query_path):
    """Print a query result as a table, using the SELECT variables of the query file as headers."""
    with open(query_path, 'r', encoding='utf-8') as file:
        headers = extract_select_headers(file.read())
    if not headers:
        print("No headers were found. Please check the SELECT clause within the SPARQL query.")
        headers = 'keys'  # Fall back to the column labels of the data frame
    print(title)
    print(tabulate(data, headers=headers, tablefmt='psql', showindex=True))


print_query_result("Number of all instances found in the triple store:",
                   data_count_all_entities, link_SPARQL_query_count_all_entities)
print("\n")  # Print a blank line between tables to enhance output readability
print_query_result("Number of all instances of type FNCT in the dataset (number of FNCT tests included):",
                   data_count_number_of_FNCT_tests, link_SPARQL_query_count_number_FNCT_tests)
print("\n")
print_query_result("Names of the tested materials given in the triple store:",
                   data_select_all_materials_tested, link_SPARQL_query_select_all_materials_tested)
print("\n")
print_query_result("FNCT results (material names, test media, time to failure values and measured stress values):",
                   data_FNCT_results, link_SPARQL_query_FNCT_results)
```