# SHACL-ACL: Access Control with SHACL

---

## What is SHACL-ACL?
_Knowledge graphs_ (KGs) are commonly used to share data on the Web. They are typically represented in the _Resource Description Framework_ (RDF) [\[1\]](#1), a W3C recommendation. The W3C-recommended language for querying RDF data is the _SPARQL Protocol And RDF Query Language_ (SPARQL) [\[2\]](#2), and the _Shapes Constraint Language_ (SHACL) [\[3\]](#3) is the W3C recommendation for defining integrity constraints over RDF data.
As soon as a KG contains private data, access to it needs to be controlled. SHACL-ACL is an approach that controls the execution of SPARQL queries through access control policies expressed in SHACL.

## Why SHACL-ACL?

Private data on the Web needs to be protected against unauthorized access. The _Open Digital Rights Language_ (ODRL) [\[4\]](#4) is designed for that purpose. However, the specification does not explain how ODRL policies are evaluated; hence, different implementations may not share the same semantics. Additionally, only data from the policy or data known to the evaluation system can be considered when deciding whether access should be granted or denied. SHACL-ACL relies exclusively on concepts and techniques that are well known in the Semantic Web community. Specifying the access control policies in SHACL gives the policy evaluation clear semantics. Furthermore, by utilizing the _RDF Mapping Language_ (RML) [\[5\]](#5), SHACL-ACL can consider external data during the decision-making process.

## How does SHACL-ACL work?

When receiving a SPARQL query, SHACL-ACL creates a virtual KG with the data necessary for the policy validation. RML mappings are used to create this virtual KG. The access control policies specified in SHACL are then validated against the virtual KG. If the validation result contains at least one violation, access is denied and an error is returned. If the virtual KG conforms to the access control policies, i.e., no violations were detected, access is granted. In that case, the SPARQL query is executed and the query result is returned.
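
The decision flow described above can be sketched in a few lines of plain Python. This is only an illustration and not the SHACL-ACL implementation: `guarded_query` and its three callable parameters are hypothetical names, and the toy checker stands in for SHACL validation. Only the error message is taken from the demo output further down.

```python
from typing import Callable, Dict, List

# Error payload as it appears in the demo output of this notebook
DENIED = {"error": "Access denied! Query not executed!"}


def guarded_query(
    build_virtual_kg: Callable[[], object],
    validate: Callable[[object], List[str]],
    execute_query: Callable[[], Dict],
) -> Dict:
    """Run the query only if policy validation reports no violations."""
    virtual_kg = build_virtual_kg()    # 1. collect the data for the policies
    violations = validate(virtual_kg)  # 2. validate the policies over that data
    if violations:                     # 3. a single violation denies access
        return dict(DENIED)
    return execute_query()             # 4. access granted, run the query


# Toy stand-ins (assumptions): a dict plays the virtual KG,
# a lambda plays the SHACL validator.
check_cpu = lambda kg: [] if kg["cpu_usage"] < 30 else ["cpu_usage"]

print(guarded_query(lambda: {"cpu_usage": 20.5}, check_cpu, lambda: {"rows": 3}))
print(guarded_query(lambda: {"cpu_usage": 95.0}, check_cpu, lambda: {"rows": 3}))
```

Passing the three steps in as callables keeps the sketch independent of any particular mapping engine, validator, or query engine.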

---

## Demo: Preparation

First, the requirements are installed, the code of SHACL-ACL is imported, and some recurring variables are set.
SHACL-ACL uses the following tools:
  - a modified version of the SDM-RDFizer [\[6\]](#6) for collecting JSON data from Web APIs and returning an RDFLib graph (i.e., a virtual KG) instead of generating a file with the RDF triples
  - Trav-SHACL [\[7\]](#7) for validating the SHACL shape schema
  - DeTrusty [\[8\]](#8) for executing the SPARQL query if the access is granted

In [1]:
%%capture
pip install --no-cache-dir -r requirements.txt

In [2]:
import shaclacl
SCHEMA_DIR = 'shapes/'
SOURCE_DESCRIPTION = 'config/rdfmts.json'
QUERY = 'data/query.rq'

---

## Demo: Use Case Description

For this demonstration of SHACL-ACL, a SPARQL query returning the life expectancy in Germany for the last three years is to be executed over data from the World Bank. The endpoint containing the World Bank data comprises 250,097,191 RDF triples. However, the following conditions need to be met for access to be granted and the query to be executed:

- The local time, i.e., where the script is executed, needs to be between 7 pm and 6 am.
- The local CPU usage must be below 30%.
- At least 80% of the RAM on the local machine must be available.
- The current temperature in Hannover (Germany) has to be below 25°C.
- The current humidity in Hannover (Germany) has to be at least 75%.
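
To make the five conditions concrete, they can be written as ordinary Python checks. This is a sketch for illustration only; in SHACL-ACL the conditions are encoded as SHACL shapes, not as Python. The function name, the parameter names, and the treatment of the boundary values (for example, whether exactly 6 am still counts as inside the window) are assumptions.

```python
from datetime import time


def policy_violations(local_time, cpu_usage, ram_free, temperature, humidity):
    """Return the names of the demo conditions that the readings violate."""
    violations = []
    # The window 7 pm to 6 am spans midnight, so it is a union of two ranges.
    if not (local_time >= time(19, 0) or local_time <= time(6, 0)):
        violations.append("time")
    if not cpu_usage < 30:        # CPU usage below 30 %
        violations.append("cpu_usage")
    if not ram_free >= 80:        # at least 80 % RAM available
        violations.append("ram_free")
    if not temperature < 25:      # below 25 degrees Celsius
        violations.append("temperature")
    if not humidity >= 75:        # at least 75 % humidity
        violations.append("humidity")
    return violations


# The two sample rows used in the demo tables of this notebook
print(policy_violations(time(9, 9, 9), 0.4, 50.50, 0.6, 99))
print(policy_violations(time(20, 15, 36), 20.5, 86.21, 9.1, 87))
```

For the first row the sketch reports the time and the free RAM as violations, matching the cells marked in red in the invalid-data table; the second row yields no violations.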

---

## Demo: Current Data for Policy Validation

The following code collects the data necessary for the policy validation. The gathered data is semantified, i.e., transformed into RDF, using the SDM-RDFizer. The resulting virtual KG, which contains the information for the policies, is then validated with Trav-SHACL against a SHACL shape schema encoding the restrictions above. If all requirements are met, DeTrusty executes the query.

__Note:__ It is very likely that the current conditions violate at least one of the constraints.
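
As a rough intuition for what semantification means here, the sketch below turns one record of readings into N-Triples lines. It is a toy stand-in and not how the SDM-RDFizer or RML mappings work internally. The namespace `http://example.org/acl#`, the property names, and the function `semantify_row` are hypothetical.

```python
EX = "http://example.org/acl#"              # hypothetical namespace
XSD = "http://www.w3.org/2001/XMLSchema#"   # standard XSD datatype namespace


def semantify_row(subject_id, row):
    """Map one record to N-Triples lines, one typed literal per field."""
    subject = f"<{EX}{subject_id}>"
    triples = []
    for field, (value, datatype) in row.items():
        triples.append(
            f'{subject} <{EX}{field}> "{value}"^^<{XSD}{datatype}> .'
        )
    return triples


# Hypothetical readings; field names are made up for this sketch
row = {
    "cpuUsage": (20.5, "decimal"),
    "ramFree": (86.21, "decimal"),
    "localTime": ("20:15:36", "time"),
}
for line in semantify_row("machine", row):
    print(line)
```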

In [3]:
# Get the current conditions of the machine and store them in a CSV file
shaclacl.get_machine_condition()

# Semantify the data necessary for policy validation, generate virtual KG
virtual_kg = shaclacl.semantify_data('config/config_rdfizer.ini')

# Validate the policies over the virtual KG, check access, execute query
result_current = shaclacl.query_with_access_control(virtual_kg, QUERY, SOURCE_DESCRIPTION, SCHEMA_DIR)
if 'error' not in result_current:
    result_current = shaclacl.query_result_to_dataframe(result_current)
result_current

Beginning Knowledge Graph Creation Process.
The Process has ended.
Total execution time:  7  ms


{'error': 'Access denied! Query not executed!'}

---

## Demo: Invalid Data for Policy Validation
For this part of the demonstration we provide a KG for the policy validation that violates the access control policies. The table below shows the data used for the policy validation; violations are marked in red.<br><br>

<table>
  <tr>
    <th>Time (24h)</th>
    <th>CPU Usage (%)</th>
    <th>RAM Free (%)</th>
    <th>Temperature (°C)</th>
    <th>Humidity (%)</th>
  </tr>
  <tr>
    <td style="text-align: center; color: red;">09:09:09</td>
    <td style="text-align: center;">0.4</td>
    <td style="text-align: center; color: red;">50.50</td>
    <td style="text-align: center;">0.6</td>
    <td style="text-align: center;">99</td>
  </tr>
</table>


In [4]:
# Load the invalid KG
kg_invalid = shaclacl.graph_from_file('data/kg_invalid.ttl')

# Validate the policies over the invalid KG, check access, deny and return without executing the query
query_result = shaclacl.query_with_access_control(kg_invalid, QUERY, SOURCE_DESCRIPTION, SCHEMA_DIR)
query_result

Total execution time:  8  ms


{'error': 'Access denied! Query not executed!'}

As can be seen, the query was not executed since the access control policies were not fulfilled.

---

## Demo: Valid Data for Policy Validation

Finally, we want to demonstrate the execution of the SPARQL query in the case that all the requirements are met. For this purpose, we provide a KG for the policy validation that is valid. The table below shows the data used for the policy validation in this case.<br><br>

<table>
  <tr>
    <th>Time (24h)</th>
    <th>CPU Usage (%)</th>
    <th>RAM Free (%)</th>
    <th>Temperature (°C)</th>
    <th>Humidity (%)</th>
  </tr>
  <tr>
    <td style="text-align: center;">20:15:36</td>
    <td style="text-align: center;">20.5</td>
    <td style="text-align: center;">86.21</td>
    <td style="text-align: center;">9.1</td>
    <td style="text-align: center;">87</td>
  </tr>
</table>

In [5]:
# Load the valid KG
kg_valid = shaclacl.graph_from_file('data/kg_valid.ttl')

# Validate the policies over the valid KG, check access, execute the query
query_result = shaclacl.query_with_access_control(kg_valid, QUERY, SOURCE_DESCRIPTION, SCHEMA_DIR)
shaclacl.query_result_to_dataframe(query_result)

Total execution time:  7  ms


2023-03-02 17:37:42,161 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/worldbank_endpoint/sparql


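<table>
  <tr>
    <th></th>
    <th>year</th>
    <th>life_exp</th>
  </tr>
  <tr>
    <td style="text-align: center;">0</td>
    <td style="text-align: center;">2020</td>
    <td style="text-align: center;">80.9415</td>
  </tr>
  <tr>
    <td style="text-align: center;">1</td>
    <td style="text-align: center;">2019</td>
    <td style="text-align: center;">81.2927</td>
  </tr>
  <tr>
    <td style="text-align: center;">2</td>
    <td style="text-align: center;">2018</td>
    <td style="text-align: center;">80.8927</td>
  </tr>
</table>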


As can be seen, access is granted and the query is executed. DeTrusty returns the query result, which is shown above.
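
The step from a query result to a table can be sketched as follows, assuming the result follows the W3C SPARQL 1.1 Query Results JSON Format (`head.vars` and `results.bindings`). Whether `shaclacl.query_result_to_dataframe` works this way is an assumption; `bindings_to_rows` is a hypothetical helper, and the sample merely mirrors the values shown above.

```python
def bindings_to_rows(sparql_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = sparql_json["head"]["vars"]
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        # A variable may be unbound in a solution; represent that as None.
        rows.append({
            var: (binding[var]["value"] if var in binding else None)
            for var in variables
        })
    return rows


# Sample in the standard result format, using the values from the output above
sample = {
    "head": {"vars": ["year", "life_exp"]},
    "results": {"bindings": [
        {"year": {"type": "literal", "value": "2020"},
         "life_exp": {"type": "literal", "value": "80.9415"}},
        {"year": {"type": "literal", "value": "2019"},
         "life_exp": {"type": "literal", "value": "81.2927"}},
        {"year": {"type": "literal", "value": "2018"},
         "life_exp": {"type": "literal", "value": "80.8927"}},
    ]},
}

for row in bindings_to_rows(sample):
    print(row["year"], row["life_exp"])
```

A list of dictionaries like this can be handed directly to `pandas.DataFrame` to obtain a tabular view.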

---

## Conclusion

The demonstration of SHACL-ACL shows that access control policies can be implemented in SHACL. Additionally, the virtual KG generated by the SDM-RDFizer makes it possible to gather external data and consider it during the policy evaluation. A benefit of SHACL-ACL is that it relies only on widespread concepts that are well known within the Semantic Web community, i.e., RDF, RML, SPARQL, and SHACL.

---

## References

<a name="1">\[1\]</a> RDF Specification. 2004. URL: [https://www.w3.org/TR/2004/REC-rdf-primer-20040210/](https://www.w3.org/TR/2004/REC-rdf-primer-20040210/).

<a name="2">\[2\]</a> SPARQL Specification. 2008. URL: [https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/](https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/).

<a name="3">\[3\]</a> SHACL Specification. 2017. URL: [https://www.w3.org/TR/2017/REC-shacl-20170720/](https://www.w3.org/TR/2017/REC-shacl-20170720/).

<a name="4">\[4\]</a> ODRL Specification. 2018. URL: [https://www.w3.org/TR/2018/REC-odrl-model-20180215/](https://www.w3.org/TR/2018/REC-odrl-model-20180215/).

<a name="5">\[5\]</a> A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In: Proceedings of the Workshop on Linked Data on the Web co-located with WWW, CEUR-WS, Aachen, Germany, 2014. URL: [https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf](https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf).

<a name="6">\[6\]</a> E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal. SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. In: CIKM ’20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, ACM, New York, NY, USA, 2020. DOI: [10.1145/3340531.3412881](https://doi.org/10.1145/3340531.3412881).

<a name="7">\[7\]</a> M. Figuera, P.D. Rohde, M.-E. Vidal. Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. In: The Web Conference, ACM, New York, NY, USA, 2021. DOI: [10.1145/3442381.3449877](https://doi.org/10.1145/3442381.3449877).

<a name="8">\[8\]</a> P.D. Rohde. DeTrusty v0.11.2. 2023. DOI: [10.5281/zenodo.7670670](https://doi.org/10.5281/zenodo.7670670).