# Demo: InterpretME over the French Royalty KG

<center><img src="Image/InterpretME-Train_Deduce_Explain.png" alt="InterpretME" />

<font size=3> **InterpretME** is an analytical tool that integrates data from KGs or from datasets in CSV or JSON format. It exploits semantic information to support prediction, interpretation, and the traversal of an entity's neighborhood, and it generates the InterpretME KG, which holds all traced metadata about the trained model, to give users more meaningful and trustworthy interpretations.

`Overview`: This is an example of how **InterpretME** can be used to interpret a prediction and trace back a particular target entity. The KG of the *French Royalty Benchmark* is a fully curated subset of DBpedia; for each person we added the class `dbo:Person` as well as properties such as the number of children or predecessors, and further triple-related counts. The predictive task is a binary classification: predict whether a person has a spouse. The statistics of the *French Royalty KG* are as follows:

| #triples | #entities | #predicates | #objects | #triples / #entities |
| :-: | :-: | :-: | :-: | :-: |
| 31,599 | 3,439 | 133 | 4,390 | 9.18 |

Importing the required modules from the **InterpretME** library:

* `pipeline()`: Runs the predictive tasks and interpretation tools (e.g., LIME).
* `plots.sampling()`: Generates a plot of the target class distribution.
* `plots.feature_importance()`: Creates a bar plot of the important features.
* `plots.decision_trees()`: Generates trees of the predictions made by the predictive model.
* `plots.constraints_decision_trees()`: Generates decision trees annotated with the SHACL validation results.
* `federated()`: Queries the *InterpretME KG* and the input KG to trace back all properties of a target entity.


In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
from IPython.display import SVG, display
import os
def show_svg(rel_path):
    """Display one SVG file, or each file in a list, relative to the working directory."""
    if isinstance(rel_path, list):
        for path in rel_path:
            show_svg(path)
    else:
        display(SVG(url='file://' + os.getcwd() + '/' + rel_path))

from InterpretME import pipeline, plots

**InterpretME** takes a JSON file as input (i.e., *URL of the input KG or path to the dataset, feature definitions, target class definition, SHACL constraints, sampling strategy, class definition*). A `SPARQL query` is generated from the feature definitions given by the user and is used to retrieve the application domain data from the input KG.

The input **KG** integrates the feature and target class definitions for French Royalty together with their SHACL constraints. The feature definitions are split into independent and dependent variables, which are later used in the predictive modeling pipeline. The features can be defined in the following format:

```JSON
{
  "Endpoint": "https://labs.tib.eu/sdm/InterpretME-og/sparql",
  "Index_var": "Person",
  "Independent_variable": {
    "x": "?x a <http://dbpedia.org/ontology/Person>.\n ",
    "gender": "Optional { ?x <http://dbpedia.org/ontology/gender> ?gender }"},
  "Dependent_variable": {
    "HasSpouse": "{ SELECT ?x, ((?partners > 0) AS ?HasSpouse) WHERE { ?x <http://dbpedia.org/ontology/numSpouses> ?partners . }} \n"},
  "Constraints": [{
    "name": "C1",
    "inverted": false,
    "shape_schema_dir": "shapes/french_royalty/spouse/rule1",
    "target_shape": "Spouse"
  }],
  "classes": {
    "NoSpouse": "0",
    "HasSpouse": "1"
  },
  "sampling_strategy": "undersampling",
  "number_important_features": 5,
  "cross_validation_folds": 5,
  "test_split": 0.3,
  "model": "Random Forest"
}
```
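
To make the role of the feature definitions concrete, the following is a minimal sketch of how a `SELECT` query could be assembled from the `Independent_variable` and `Dependent_variable` entries. The helper `build_select_query` is hypothetical and is not InterpretME's actual query builder; the `HasSpouse` pattern is also simplified to a `BIND` for readability.

```python
# Hypothetical sketch: NOT InterpretME's internal query builder.
config = {
    "Independent_variable": {
        "x": "?x a <http://dbpedia.org/ontology/Person> .",
        "gender": "OPTIONAL { ?x <http://dbpedia.org/ontology/gender> ?gender }",
    },
    "Dependent_variable": {
        # Simplified stand-in (assumption) for the sub-select in the configuration.
        "HasSpouse": "?x <http://dbpedia.org/ontology/numSpouses> ?partners . "
                     "BIND((?partners > 0) AS ?HasSpouse)",
    },
}


def build_select_query(cfg):
    """Join all graph patterns and project one variable per configured name."""
    patterns = {**cfg["Independent_variable"], **cfg["Dependent_variable"]}
    variables = " ".join(f"?{name}" for name in patterns)
    body = "\n    ".join(pattern.strip() for pattern in patterns.values())
    return f"SELECT DISTINCT {variables}\nWHERE {{\n    {body}\n}}"


query = build_select_query(config)
print(query)
```

Each key becomes a projected variable and each value contributes one graph pattern, which is why the keys in the configuration must match the variable names used inside the patterns.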

<div class="alert alert-info"><b>Note:</b> As of v1.2.0, InterpretME is also able to work with CSV and JSON datasets. See `example_csv_french_royalty.json` for an example configuration for datasets.</div>

The purpose of `pipeline()` is to assemble the components of **InterpretME** [1] so that they can be evaluated together under different parameter settings. It starts by evaluating the SHACL constraints over the nodes of the input KG and generates a validation report per target entity. The *data preprocessing* step transforms the data extracted from the input KG into a form that can be used to train the predictive pipeline. To counter class imbalance, the sampling strategy defined by the user is applied.

The *predictive model building* step follows user preferences. Given the preprocessed French Royalty data, automated tools are used for models (e.g., *Ensemble Learning*) and for hyperparameter optimization (e.g., *AutoML*). Here, the automated predictive model can perform stratified shuffle split cross-validation with *Random Forest*, *AdaBoost Classifier*, or *Gradient Boosting Classifier* and identify the relevant features; these features are then used to train a *Decision Tree* classifier to predict and visualize the outcomes. The current version of InterpretME uses *LIME* [2] to obtain local interpretations of the target entities. *LIME* also identifies the top-10 relevant features for the target entity and assigns weights to them.

The metadata traced from the trained predictive model is later used to create the **InterpretME KG**. To incorporate this metadata, RML mappings are defined, and SDM-RDFizer [3], an efficient RML engine for creating knowledge graphs, semantifies the metadata using these mappings. InterpretME extends the ML Schema vocabulary; the extension is available on [VoCol](http://ontology.tib.eu/InterpretME/).
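
As an illustration of the sampling step, here is a minimal sketch of random undersampling in plain Python. It assumes rows are dictionaries carrying a class label; the function `undersample` and the toy data are hypothetical and do not reflect how InterpretME implements its `undersampling` strategy internally.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, label_key, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Toy data (illustrative only): 8 persons with a spouse, 3 without.
rows = [{"id": i, "HasSpouse": 1} for i in range(8)]
rows += [{"id": i, "HasSpouse": 0} for i in range(8, 11)]

balanced = undersample(rows, "HasSpouse")
counts = Counter(row["HasSpouse"] for row in balanced)
print(counts)
```

Undersampling trades data volume for balance: the majority class loses rows, so it tends to suit cases where the minority class is still large enough to train on.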

In [2]:
results = pipeline(path_config='./example_jupyter.json', lime_results='./output/LIME')

InterpretME Pipeline:   0%|          | 0/5 [00:00<?, ?task/s]

## Demonstration of Use Cases

InterpretME supports use cases that help attendees understand the characteristics of a target entity both in the predictive model and in the input KGs. We demonstrate the following use cases:

<div class="alert-success">
    <font size="5"> Unveiling Important Features </font>
</div>

**InterpretME** uses feature significance analysis to give users insights into the most important characteristics of an ML model. This use case helps users determine which input features have the most impact on the model's decisions. By selecting the most relevant features and their contributions, users gain insight into what drives the model's decision-making process.

Executing the following *SPARQL* query over the **InterpretME KG** retrieves the most important features of a target entity (e.g., `Louis_XIV`), as interpreted by LIME, together with each feature's contribution to the ML model's prediction. Currently, InterpretME is customized for LIME; future releases will extend it with other interpretability tools.

In [3]:
from InterpretME.federated_query_engine import configuration, federated

In [4]:
input_query = """
SELECT DISTINCT ?LIMEEntity ?InterpretableTool ?feature ?value ?targetClass 
WHERE {
    SERVICE <https://labs.tib.eu/sdm/InterpretME-wog/sparql>{
        FILTER( ?LIMEEntity=<http://interpretme.org/entity/Louis_XIV> )
        ?entity a <http://interpretme.org/vocab/TargetEntity> .
        ?entity <http://www.w3.org/2002/07/owl#sameAs> ?sourceEntity .
        ?entity <http://interpretme.org/vocab/hasEntity> ?LIMEEntity .
        ?entity <http://interpretme.org/vocab/hasInterpretedFeature> ?interpretedFeature .
        ?interpretedFeature <http://www.w3.org/ns/prov#hasGeneratedBy> ?InterpretableTool .
        ?interpretedFeature <http://interpretme.org/vocab/hasFeatureWeight> ?featureWeight .
        ?entity <http://interpretme.org/vocab/hasEntityClassProbability> ?classProb .
        ?classProb <http://interpretme.org/vocab/hasClass> ?targetClass .
        FILTER(?targetClass = <http://interpretme.org/entity/1>)
        ?featureWeight <http://interpretme.org/vocab/hasFeature> ?feature .
        ?featureWeight <http://interpretme.org/vocab/hasWeight> ?value .
    }
} order by desc (?value)
"""

In [5]:
interpretme_endpoint = 'https://labs.tib.eu/sdm/InterpretME-wog/sparql'
input_endpoint = 'https://labs.tib.eu/sdm/InterpretME-og/sparql'

In [6]:
config = configuration(interpretme_endpoint, input_endpoint)
query_answer = federated(input_query, config)
query_answer

2023-06-29 15:33:15,923 - rdfmts - INFO - https://labs.tib.eu/sdm/InterpretME-og/sparql: ['http://dbpedia.org/ontology/Person']
2023-06-29 15:33:15,925 - rdfmts - INFO - http://dbpedia.org/ontology/Person
2023-06-29 15:33:16,691 - rdfmts - INFO - ----- DONE in 0.8822085857391357 seconds!-----
2023-06-29 15:33:16,914 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-wog/sparql
2023-06-29 15:33:16,914 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-wog/sparql


| | targetEntity | feature | value | targetClass |
| :-: | :-: | :-: | :-: | :-: |
| 0 | http://interpretme.org/entity/Louis_XIV | http://interpretme.org/entity/objects_1%20%3C%... | 0.749627 | http://interpretme.org/entity/1 |
| 1 | http://interpretme.org/entity/Louis_XIV | http://interpretme.org/entity/subjects_1%20%3C... | 0.207415 | http://interpretme.org/entity/1 |
| 2 | http://interpretme.org/entity/Louis_XIV | http://interpretme.org/entity/childs_0%20%3C%3... | 0.0 | http://interpretme.org/entity/1 |
| 3 | http://interpretme.org/entity/Louis_XIV | http://interpretme.org/entity/gender_female%20... | -0.0292482 | http://interpretme.org/entity/1 |
| 4 | http://interpretme.org/entity/Louis_XIV | http://interpretme.org/entity/preds_1%20%3C%3D... | -0.000241988 | http://interpretme.org/entity/1 |
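
The feature IRIs in this result are percent-encoded, and the weights are signed. The sketch below decodes a feature name and ranks the features by the magnitude of their LIME weight. The full example IRI and the short feature labels are assumptions, since the notebook output truncates the real IRIs; reading a positive weight as support for the explained class follows the usual LIME convention.

```python
from urllib.parse import unquote

# Hypothetical full IRI; the notebook output truncates the real ones.
feature_iri = "http://interpretme.org/entity/gender_female%20%3C%3D%200.00"
label = unquote(feature_iri.rsplit("/", 1)[-1])
print(label)  # gender_female <= 0.00

# Weights copied from the result; keys are shortened labels, not the full IRIs.
weights = {
    "objects": 0.749627,
    "subjects": 0.207415,
    "childs": 0.0,
    "gender_female": -0.0292482,
    "preds": -0.000241988,
}

# Rank by absolute weight so strong negative contributions are not buried.
ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
for name, weight in ranked:
    direction = "supports" if weight > 0 else "opposes" if weight < 0 else "is neutral for"
    print(f"{name:>14}  {weight:+.6f}  {direction} class 1")
```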


<div class="alert-success">
    <font size="5"> Insights Beyond Numbers </font>
</div>

**InterpretME** seeks to give users a more in-depth understanding of how the ML model makes predictions and how a target entity is contextually connected in the input KG, going beyond simple numerical outputs. Interpretable tools such as LIME [2] offer interpretations for a particular target entity, but they do not show how that entity is connected in the input KG. This use case presents a *SPARQL* query over the InterpretME KG and the input KG, where LIME provides the interpretations for two entities (e.g., `Louis_XIV` and `Philipp_III_of_Spain`), although both entities represent the same concept in the input KG. Thus, InterpretME highlights the significance of considering semantics alongside the ML model and interpretable tools such as LIME.

In [None]:
input_query = """
SELECT DISTINCT ?predicate ?object ?InterpretableTool ?feature ?value ?targetClass ?probability
WHERE {
    SERVICE <https://labs.tib.eu/sdm/InterpretME-wog/sparql> {
        FILTER( ?LIMEentity=<http://interpretme.org/entity/Louis_XIV> )
        ?entity a <http://interpretme.org/vocab/TargetEntity> .
        ?entity <http://www.w3.org/2002/07/owl#sameAs> ?sourceEntity .
        ?entity <http://interpretme.org/vocab/hasEntity> ?LIMEentity.
        ?entity <http://interpretme.org/vocab/hasInterpretedFeature> ?interpretedFeature .
        ?interpretedFeature <http://interpretme.org/vocab/hasFeatureWeight> ?featureWeight .
        ?interpretedFeature <http://www.w3.org/ns/prov#hasGeneratedBy> ?InterpretableTool .
        ?entity <http://interpretme.org/vocab/hasEntityClassProbability> ?classProb .
        ?classProb <http://interpretme.org/vocab/hasPredictionProbability> ?probability .
        ?classProb <http://interpretme.org/vocab/hasClass> ?targetClass .
        ?featureWeight <http://interpretme.org/vocab/hasFeature> ?feature .
        ?featureWeight <http://interpretme.org/vocab/hasWeight> ?value .
    }
}
"""

In [None]:
interpretme_endpoint = 'https://labs.tib.eu/sdm/InterpretME-wog/sparql'
input_endpoint = 'https://labs.tib.eu/sdm/InterpretME-og/sparql'

In [None]:
config = configuration(interpretme_endpoint, input_endpoint)
query_answer = federated(input_query, config)
query_answer
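
Since this use case compares two entities, it can help to generate the query text from the entity name instead of editing the IRI by hand. The helper `query_for` below is a hypothetical convenience and not part of InterpretME; it only fills the `FILTER` of a shortened version of the query, and its result would then be passed to `federated()` as in the other cells.

```python
from string import Template

# Shortened query pattern; endpoint and vocabulary IRIs are those used in this notebook.
QUERY_TEMPLATE = Template("""
SELECT DISTINCT ?feature ?value ?targetClass ?probability
WHERE {
    SERVICE <https://labs.tib.eu/sdm/InterpretME-wog/sparql> {
        FILTER( ?LIMEentity=<http://interpretme.org/entity/$entity> )
        ?entity <http://interpretme.org/vocab/hasEntity> ?LIMEentity .
        ?entity <http://interpretme.org/vocab/hasInterpretedFeature> ?interpretedFeature .
        ?interpretedFeature <http://interpretme.org/vocab/hasFeatureWeight> ?featureWeight .
        ?entity <http://interpretme.org/vocab/hasEntityClassProbability> ?classProb .
        ?classProb <http://interpretme.org/vocab/hasPredictionProbability> ?probability .
        ?classProb <http://interpretme.org/vocab/hasClass> ?targetClass .
        ?featureWeight <http://interpretme.org/vocab/hasFeature> ?feature .
        ?featureWeight <http://interpretme.org/vocab/hasWeight> ?value .
    }
}
""")


def query_for(entity_name):
    """Return the query text for one target entity (hypothetical helper)."""
    return QUERY_TEMPLATE.substitute(entity=entity_name)


queries = {name: query_for(name) for name in ("Louis_XIV", "Philipp_III_of_Spain")}
print(queries["Philipp_III_of_Spain"])
```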

<div class="alert-success">
    <font size="5"> Validity of a Target Entity </font>
</div>

InterpretME relies on [Trav-SHACL](https://github.com/SDM-TIB/Trav-SHACL) [5] to check the validity of a target entity. SHACL constraints are defined to check whether a target entity satisfies the domain constraints, for instance: ***If two people have a child, they are likely to have a spouse***. This gives the user a broader perspective when interpreting ML model decisions for entities that violate domain constraints.

In [7]:
input_query ="""
SELECT DISTINCT ?sourceEntity ?SHACLSchema ?SHACLShape ?SHACLConstraint  ?SHACLValidationResult
WHERE {
    SERVICE <https://labs.tib.eu/sdm/InterpretME-wog/sparql> {
        FILTER( ?LIMEentity=<http://interpretme.org/entity/Louis_XIV> )
        ?entity a <http://interpretme.org/vocab/TargetEntity> .
        ?entity <http://www.w3.org/2002/07/owl#sameAs> ?sourceEntity .
        ?entity <http://interpretme.org/vocab/hasEntity> ?LIMEentity .
        ?entity <http://interpretme.org/vocab/hasSHACLIC> ?SHACLIC .
        ?SHACLIC <http://interpretme.org/vocab/hasSHACLSchema> ?SHACLSchema .
        ?SHACLIC <http://interpretme.org/vocab/hasSHACLShape> ?SHACLShape .
        ?SHACLIC <http://interpretme.org/vocab/hasSHACLConstraint> ?SHACLConstraint .
        ?SHACLIC <http://interpretme.org/vocab/hasSHACLResult> ?SHACLValidationResult .
    }
}
"""

In [8]:
interpretme_endpoint = 'https://labs.tib.eu/sdm/InterpretME-wog/sparql'
input_endpoint = 'https://labs.tib.eu/sdm/InterpretME-og/sparql'

In [9]:
config = configuration(interpretme_endpoint, input_endpoint)
query_answer = federated(input_query, config)
query_answer

2023-06-29 15:36:02,820 - rdfmts - INFO - https://labs.tib.eu/sdm/InterpretME-og/sparql: ['http://dbpedia.org/ontology/Person']
2023-06-29 15:36:02,823 - rdfmts - INFO - http://dbpedia.org/ontology/Person
2023-06-29 15:36:03,714 - rdfmts - INFO - ----- DONE in 1.0069730281829834 seconds!-----
2023-06-29 15:36:03,763 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-wog/sparql
2023-06-29 15:36:03,763 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-wog/sparql
2023-06-29 15:36:03,763 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-wog/sparql


| | sourceEntity | SHACLSchema | SHACLShape | SHACLConstraint | SHACLValidationResult |
| :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C10 | http://interpretme.org/entity/True |
| 1 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C2 | http://interpretme.org/entity/True |
| 2 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C5 | http://interpretme.org/entity/True |
| 3 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C1 | http://interpretme.org/entity/True |
| 4 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C9 | http://interpretme.org/entity/True |
| 5 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C6 | http://interpretme.org/entity/True |
| 6 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C7 | http://interpretme.org/entity/True |
| 7 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C4 | http://interpretme.org/entity/True |
| 8 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C3 | http://interpretme.org/entity/True |
| 9 | http://dbpedia.org/resource/Louis_XIV | http://interpretme.org/entity/../example/shape... | http://interpretme.org/entity/Spouse | http://interpretme.org/entity/C8 | http://interpretme.org/entity/True |
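
A result like this can be condensed into a per-constraint summary. The sketch below assumes the rows have already been reduced to `(SHACLConstraint, SHACLValidationResult)` pairs (only three of the ten rows are repeated here) and that the local name after the last `/` carries the value; the helper `local_name` is hypothetical.

```python
def local_name(iri):
    """Return the part of an IRI after the last '/'."""
    return iri.rsplit("/", 1)[-1]


# Three (SHACLConstraint, SHACLValidationResult) pairs copied from the result.
rows = [
    ("http://interpretme.org/entity/C10", "http://interpretme.org/entity/True"),
    ("http://interpretme.org/entity/C2", "http://interpretme.org/entity/True"),
    ("http://interpretme.org/entity/C5", "http://interpretme.org/entity/True"),
]

# Map each constraint name to whether the entity satisfied it.
summary = {local_name(c): local_name(r) == "True" for c, r in rows}
violated = sorted(name for name, valid in summary.items() if not valid)

print(summary)
print("violated constraints:", violated or "none")
```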


<div class="alert-success">
    <font size="5"> From Opacity to Clarity </font>
</div>

InterpretME supports KG federation: [DeTrusty](https://github.com/SDM-TIB/DeTrusty) [4], a federated query engine, answers the user's questions via SPARQL queries over **both** the input KG and the *InterpretME KG*. The following query retrieves the main characteristics of a target entity, for instance `Louis_XIV`, from the input KG and the InterpretME KG. In this way, InterpretME provides more comprehensive interpretations and fosters trust, helping users better understand the ML model's decisions.

In [10]:
input_query = """
SELECT DISTINCT ?predicate ?object ?InterpretableTool ?feature ?value ?targetClass ?probability
WHERE {
    SERVICE <https://labs.tib.eu/sdm/InterpretME-wog/sparql> {
        FILTER( ?LIMEentity=<http://interpretme.org/entity/Louis_XIV> )
        ?entity a <http://interpretme.org/vocab/TargetEntity> .
        ?entity <http://www.w3.org/2002/07/owl#sameAs> ?sourceEntity .
        ?entity <http://interpretme.org/vocab/hasEntity> ?LIMEentity.
        ?entity <http://interpretme.org/vocab/hasInterpretedFeature> ?interpretedFeature .
        ?interpretedFeature <http://interpretme.org/vocab/hasFeatureWeight> ?featureWeight .
        ?interpretedFeature <http://www.w3.org/ns/prov#hasGeneratedBy> ?InterpretableTool .
        ?entity <http://interpretme.org/vocab/hasEntityClassProbability> ?classProb .
        ?classProb <http://interpretme.org/vocab/hasPredictionProbability> ?probability .
        ?classProb <http://interpretme.org/vocab/hasClass> ?targetClass .
        ?featureWeight <http://interpretme.org/vocab/hasFeature> ?feature .
        ?featureWeight <http://interpretme.org/vocab/hasWeight> ?value .
    }
    SERVICE <https://labs.tib.eu/sdm/InterpretME-og/sparql> {
        ?sourceEntity ?predicate ?object
    }
}
"""

In [11]:
interpretme_endpoint = 'https://labs.tib.eu/sdm/InterpretME-wog/sparql'
input_endpoint = 'https://labs.tib.eu/sdm/InterpretME-og/sparql'

In [12]:
config = configuration(interpretme_endpoint, input_endpoint)
query_answer = federated(input_query, config)
query_answer

2023-06-29 15:36:57,273 - rdfmts - INFO - https://labs.tib.eu/sdm/InterpretME-og/sparql: ['http://dbpedia.org/ontology/Person']
2023-06-29 15:36:57,276 - rdfmts - INFO - http://dbpedia.org/ontology/Person
2023-06-29 15:36:58,039 - rdfmts - INFO - ----- DONE in 0.8792650699615479 seconds!-----
2023-06-29 15:36:58,094 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-wog/sparql
2023-06-29 15:36:58,094 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-wog/sparql
2023-06-29 15:36:58,094 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-wog/sparql
2023-06-29 15:36:58,094 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-wog/sparql
2023-06-29 15:36:58,186 - DeTrusty.Wrapper.RDFWrapper - INFO - Contacting endpoint: https://labs.tib.eu/sdm/InterpretME-og/sparql
2023-06-29 15:36:58,186 - DeTrusty.Wrapper.RDFWrappe

| | predicate | object | InterpretableTool | feature | value | targetClass | probability |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://dbpedia.org/ontology/Person | http://interpretme.org/entity/LIME | http://interpretme.org/entity/childs_0%20%3C%3... | 0.0 | http://interpretme.org/entity/0 | 0.0653595 |
| 1 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://dbpedia.org/ontology/Person | http://interpretme.org/entity/LIME | http://interpretme.org/entity/childs_0%20%3C%3... | 0.0 | http://interpretme.org/entity/1 | 0.934641 |
| 2 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://dbpedia.org/ontology/Person | http://interpretme.org/entity/LIME | http://interpretme.org/entity/gender_female%20... | -0.0292482 | http://interpretme.org/entity/1 | 0.934641 |
| 3 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://dbpedia.org/ontology/Person | http://interpretme.org/entity/LIME | http://interpretme.org/entity/gender_female%20... | -0.0292482 | http://interpretme.org/entity/0 | 0.0653595 |
| 4 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://dbpedia.org/ontology/Person | http://interpretme.org/entity/LIME | http://interpretme.org/entity/objects_1%20%3C%... | 0.749627 | http://interpretme.org/entity/0 | 0.0653595 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 305 | http://dbpedia.org/ontology/numSubjects | 9 | http://interpretme.org/entity/LIME | http://interpretme.org/entity/objects_1%20%3C%... | 0.749627 | http://interpretme.org/entity/1 | 0.934641 |
| 306 | http://dbpedia.org/ontology/numSubjects | 9 | http://interpretme.org/entity/LIME | http://interpretme.org/entity/preds_1%20%3C%3D... | -0.000241988 | http://interpretme.org/entity/0 | 0.0653595 |
| 307 | http://dbpedia.org/ontology/numSubjects | 9 | http://interpretme.org/entity/LIME | http://interpretme.org/entity/preds_1%20%3C%3D... | -0.000241988 | http://interpretme.org/entity/1 | 0.934641 |
| 308 | http://dbpedia.org/ontology/numSubjects | 9 | http://interpretme.org/entity/LIME | http://interpretme.org/entity/subjects_1%20%3C... | 0.207415 | http://interpretme.org/entity/0 | 0.0653595 |
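
Because the federated query joins every input-KG triple of the entity with every (feature, class) combination from the InterpretME KG, the result repeats the same facts many times. A minimal sketch, using a few illustrative rows with shortened prefixes and feature labels (assumptions, not the full result), separates the two views again:

```python
# Illustrative rows: (predicate, object, feature, weight, class, probability).
rows = [
    ("rdf:type", "dbo:Person", "childs", 0.0, "0", 0.0653595),
    ("rdf:type", "dbo:Person", "childs", 0.0, "1", 0.934641),
    ("rdf:type", "dbo:Person", "objects", 0.749627, "0", 0.0653595),
    ("dbo:numSubjects", "9", "objects", 0.749627, "1", 0.934641),
    ("dbo:numSubjects", "9", "subjects", 0.207415, "0", 0.0653595),
]

# Facts from the input KG: distinct (predicate, object) pairs.
kg_facts = sorted({(p, o) for p, o, *_ in rows})
# Interpretation from the InterpretME KG: one weight per feature.
feature_weights = {feature: weight for _, _, feature, weight, _, _ in rows}
# Predicted probability per class.
class_probability = {cls: prob for *_, cls, prob in rows}

print(kg_facts)
print(feature_weights)
print(class_probability)
```

Splitting the result this way makes it easier to read the entity's properties, the LIME weights, and the class probabilities independently of the join's row multiplication.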


------------
**References**

[1] Y. Chudasama, D. Purohit, P.D. Rohde, J. Gercke, M.E. Vidal. InterpretME v1.3.1, June 2023. DOI: [10.5281/zenodo.8109278](https://zenodo.org/record/8109278).

[2] Marco Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM. 2016. DOI: [10.1145/2939672.2939778](https://dl.acm.org/doi/10.1145/2939672.2939778).

[3] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana and M.-E. Vidal. SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. In: CIKM ’20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, ACM, New York, NY, USA, 2020. DOI: [10.1145/3340531.3412881](https://dl.acm.org/doi/pdf/10.1145/3340531.3412881).

[4] P.D. Rohde, M. Bechara and Avellino. DeTrusty v0.12.2, February 2023. DOI: [10.5281/zenodo.8063472](https://zenodo.org/record/8063472).

[5] M. Figuera, P.D. Rohde, M.E. Vidal. Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. In: The Web Conference. ACM, NY, USA, 2021. DOI: [10.1145/3442381.3449877](https://arxiv.org/abs/2101.07136).