# Demo: How to explain predictions for Rule-based Temporal Knowledge Graph Forecasting using CountTRuCoLa

## 0. Prerequisites


* **Install** the required packages:
  ```bash
  pip install -r requirements.txt
  ```
  *Note:* The `requirements.txt` file is located one folder above this notebook. 

* **Input files** 
  Place all inputs in the folder:
  ```
  ../files/explanations/yourexperimentname/input
  ```

    * **Rules**:
        * Filename must follow the pattern:  
          ```
          datasetname-whateveryouwant-ids.txt
          ```
          Example: `tkgl-icews14-example-ids.txt`
        * Rules can be generated, e.g., with **CountTRuCoLa** (stored in `../files/rules/`).
        * Alternatively, custom rules can be written one per line, in the format:  
          ```
          lmbda alpha phi rho kappa gamma F head_id(X,Y,T) <= body_id(X,Y,U)
          ```
          Example:  
          ` 0   0.014492753623188404   0   0   0   1   F   89(X,Y,T) <= 6(X,Y,U) `

    * **Quadruples**: (optional)
        * Plain text file `quadruples.txt` whose first line is the header `subject rel object timestep`
        * Followed by one quadruple per line: `subject relation object timestamp`
        * Quadruple formats supported:
            * **IDs**: `1 273 710 313` (example for `tkgl-icews14`)  
            * **Wildcards (`x`)**:  
                - `1 x 710 x` → all quadruples with subject=1 and object=710  
                - `1 x x x`, `x x 710 x`, etc.  
            * **Strings**:  
                - `women x police x` → matches any quadruple with *women* in the subject and *police* in the object (case-insensitive substring match)
                - Example match: `'Women_(Australia)' Bring_lawsuit_against 'Police_(Australia)' 317`  
        * If no quadruple file is provided, the user is prompted to enter quadruples interactively.


* **Output** 
  After running the Explainer, results are written to:  
  ```
  ../files/explanations/yourexperimentname/output
  ```
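For custom rules, the single-body rule line shown above can be split into its parts. The following is a minimal sketch, assuming whitespace-separated fields, exactly one body atom, and relation IDs as atom names. `parse_rule_line` and `RuleLine` are hypothetical helpers, not part of CountTRuCoLa, and the sketch does not interpret the six numeric parameters; it only reads them by position.

```python
import re
from dataclasses import dataclass

# Hypothetical helper: splits one custom rule line of the form
#   lmbda alpha phi rho kappa gamma F head_id(X,Y,T) <= body_id(X,Y,U)
# Assumes exactly one body atom; the numeric parameters are only read by position.
PARAM_NAMES = ["lmbda", "alpha", "phi", "rho", "kappa", "gamma"]
ATOM_RE = re.compile(r"^(\d+)\((\w+),(\w+),(\w+)\)$")  # e.g. 89(X,Y,T)


@dataclass
class RuleLine:
    params: dict    # the six numeric parameters, keyed by name
    rule_type: str  # the letter after the parameters, e.g. "F"
    head_rel: int   # relation ID of the head atom
    body_rel: int   # relation ID of the body atom


def parse_rule_line(line: str) -> RuleLine:
    tokens = line.split()
    # expected: 6 parameters, type, head atom, "<=", body atom
    if len(tokens) != 10 or tokens[8] != "<=":
        raise ValueError(f"unexpected rule format: {line!r}")
    params = {name: float(tok) for name, tok in zip(PARAM_NAMES, tokens[:6])}
    head = ATOM_RE.match(tokens[7])
    body = ATOM_RE.match(tokens[9])
    if head is None or body is None:
        raise ValueError(f"could not parse atoms in: {line!r}")
    return RuleLine(params, tokens[6], int(head.group(1)), int(body.group(1)))


rule = parse_rule_line(" 0   0.014492753623188404   0   0   0   1   F   89(X,Y,T) <= 6(X,Y,U) ")
print(rule.rule_type, rule.head_rel, rule.body_rel, rule.params["alpha"])
```

A line that does not split into exactly these ten fields raises a `ValueError`, which makes malformed lines easy to spot before handing the file to the explainer.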


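The three quadruple formats can be illustrated with a small matcher. This is a sketch under assumptions, not the explainer's own logic: `x` is a wildcard, an all-digit token is compared with the ID, and any other token is matched as a case-insensitive substring of the entity or relation label. The function names and the example IDs are hypothetical.

```python
def token_matches(pattern: str, value_id: int, value_label: str) -> bool:
    """Match one pattern token against one element of a quadruple (assumed rules)."""
    if pattern == "x":          # wildcard: matches anything
        return True
    if pattern.isdigit():       # numeric token: exact ID comparison
        return int(pattern) == value_id
    # string token: case-insensitive substring of the label
    return pattern.lower() in value_label.lower()


def quad_matches(pattern_line: str, quad: tuple, labels: tuple) -> bool:
    """
    pattern_line: e.g. "1 x 710 x" or "women x police x"
    quad:         (subject_id, relation_id, object_id, timestamp)
    labels:       (subject_label, relation_label, object_label)
    """
    tokens = pattern_line.split()
    if len(tokens) != 4:
        raise ValueError(f"expected 4 tokens, got {len(tokens)}: {pattern_line!r}")
    s, r, o, t = quad
    # the timestamp has no label, so its string form is used for string tokens
    values = [(s, labels[0]), (r, labels[1]), (o, labels[2]), (t, str(t))]
    return all(token_matches(p, vid, vlab) for p, (vid, vlab) in zip(tokens, values))


# hypothetical IDs paired with the labels from the example above
quad = (11, 22, 33, 317)
labels = ("Women_(Australia)", "Bring_lawsuit_against", "Police_(Australia)")
print(quad_matches("11 x 33 x", quad, labels))         # True
print(quad_matches("women x police x", quad, labels))  # True
print(quad_matches("x x 34 x", quad, labels))          # False
```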

In [1]:
# imports
import os
import sys
import webbrowser

from IPython.display import display, HTML

import explainer_utils

# add the rule_based folder and the parent folder to the import path
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "rule_based")))
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))
import rule_based.eval as eval  # note: shadows Python's built-in eval in this notebook
import rule_based.utils as utils

## 1. User defined configurations
Change these settings as required.

### 1.1 Configurations for the explainer

In [None]:

exp_name = 'example'  # we added some example rules for illustration; replace this with the name of your own experiment folder
dataset_split = 'test'  # which part of the dataset are the quadruples from? 'test' or 'val'

plot_figures_flag = True  # plot figures for the explanations
recreate_figures = False  # recreate the figures? If True, all old figures are deleted; if False, the old figures for exp_name are reused

explain_all_quads_flag = False  # explain all quads in the val or test set instead of the quads from quadruples.txt; this might be slow for large datasets
max_rules_per_pred = 30  # how many rules to show per prediction; if there are more, the top ones by score are shown
num_cpus = 1  # number of CPUs to use for parallel processing
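`max_rules_per_pred` caps how many rules are listed per prediction. Selecting the highest-scoring ones could look like the sketch below; `top_rules` is a hypothetical helper and the rule identifiers and scores are made up. The explainer's own ranking may differ in details such as tie-breaking.

```python
import heapq


def top_rules(rule_scores: dict, max_rules_per_pred: int) -> list:
    """Return at most max_rules_per_pred (rule, score) pairs, highest score first."""
    return heapq.nlargest(max_rules_per_pred, rule_scores.items(), key=lambda kv: kv[1])


# made-up rule identifiers and scores
scores = {"rule_a": 0.2, "rule_b": 0.9, "rule_c": 0.5}
print(top_rules(scores, 2))  # [('rule_b', 0.9), ('rule_c', 0.5)]
```

`heapq.nlargest` avoids sorting all rules when only a few are shown, which matters little for 30 rules but keeps the selection cheap for predictions backed by many rules.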

### 1.2 Configurations for the rule application
If you are unsure, leave them as `None`; in that case the default parameters are used.
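The "leave it as `None`" convention amounts to overriding the defaults only with values that were actually set. A minimal sketch of that idea, assuming a plain dictionary of defaults; `merge_with_defaults` and the default values below are hypothetical and are not the actual CountTRuCoLa defaults, which `explainer_utils.set_options` presumably reads from the configuration in the `rule_based` folder.

```python
def merge_with_defaults(user_options: dict, defaults: dict) -> dict:
    """Override defaults only with options that are not None."""
    overrides = {key: value for key, value in user_options.items() if value is not None}
    return {**defaults, **overrides}


# hypothetical defaults, for illustration only
defaults = {"AGGREGATION_FUNCTION": "noisyor", "NUM_TOP_RULES": 5, "AGGREGATION_DECAY": 1.0}
user_options = {"AGGREGATION_FUNCTION": None, "NUM_TOP_RULES": 10, "AGGREGATION_DECAY": None}
print(merge_with_defaults(user_options, defaults))
# {'AGGREGATION_FUNCTION': 'noisyor', 'NUM_TOP_RULES': 10, 'AGGREGATION_DECAY': 1.0}
```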

In [3]:
## If you are unsure, leave them as None; then the default parameters are used.
AGGREGATION_FUNCTION, NUM_TOP_RULES, AGGREGATION_DECAY, F_UNSEEN_NEGATIVES, Z_RULES_FACTOR, APPLY_WINDOW_SIZE, RULE_TYPE_Z_FLAG, RULE_TYPE_F_FLAG = None, None, None, None, None, None, None, None

## Otherwise, change the values here:

# ## a) rule aggregation
# AGGREGATION_FUNCTION = "noisyor"  # select from "maxplus" / "noisyor" / "max"
# NUM_TOP_RULES = 5  # stops adding predicting rules to a candidate of a query if num_top_rules rules already
# # predicted the candidate; if all candidates are predicted by num_top_rules rules, rule
# # application is stopped; can be used in conjunction with "noisyor" to achieve noisy-or top-h; -1 means no limit
# AGGREGATION_DECAY = 0.8  # decay factor for the aggregation function; only used for "noisyor"; the second score is multiplied by decay, the third by decay^2, and so on; if set to 1, no decay is applied

# ## b) f- and z-rules
# RULE_TYPE_Z_FLAG = True  # use z-rules?
# RULE_TYPE_F_FLAG = True  # use f-rules?
# F_UNSEEN_NEGATIVES = 30  # constant added to the denominator when computing confidences for f-rules
# Z_RULES_FACTOR = 0.1  # scaling factor Z ∈ [0, 1] applied to the scores predicted by z-rules

# ## c) window size for rule application
# APPLY_WINDOW_SIZE = -1  # how many previous interactions are taken into account when applying the rules; recommended: -1 to use all time steps, or as large as possible
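To make the aggregation and f-rule options more concrete, here is a sketch of one reading of the comments above: noisy-or over the top `NUM_TOP_RULES` scores, with each further score damped by `AGGREGATION_DECAY`, an f-rule confidence whose denominator is enlarged by `F_UNSEEN_NEGATIVES`, and the `Z_RULES_FACTOR` scaling. The functions are hypothetical; in particular, which counts enter the f-rule confidence is an assumption, and `"maxplus"` is not covered.

```python
def noisy_or(scores, num_top_rules=-1, decay=1.0):
    """Noisy-or of rule scores; the i-th highest score (0-based) is multiplied by decay**i."""
    ranked = sorted(scores, reverse=True)
    if num_top_rules != -1:            # -1 means no limit
        ranked = ranked[:num_top_rules]
    prob_none = 1.0                    # probability that no rule "fires"
    for i, score in enumerate(ranked):
        prob_none *= 1.0 - score * decay ** i
    return 1.0 - prob_none


def f_rule_confidence(num_correct, num_body_matches, unseen_negatives):
    """Assumed form: correct predictions / (body matches + unseen_negatives)."""
    return num_correct / (num_body_matches + unseen_negatives)


def scale_z_rule_score(score, z_rules_factor):
    """Scale a z-rule score by a factor Z in [0, 1]."""
    return score * z_rules_factor


print(noisy_or([0.5, 0.5]))                    # 0.75
print(noisy_or([0.5, 0.5], decay=0.8))         # approximately 0.7
print(noisy_or([0.5, 0.5], num_top_rules=1))   # 0.5
print(f_rule_confidence(3, 7, 30))             # equals 3 / 37
```

A larger `F_UNSEEN_NEGATIVES` in this reading pulls the confidence of rarely observed f-rules towards zero, while frequently observed ones are barely affected.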





## 2. Internal explainer setup steps

This cell prepares the paths, loads the dataset, the rules, and the quadruples, and collects all options.
Do not change it unless you know what you are doing.

In [4]:
# prepare paths
in_folder, out_folder = explainer_utils.prepare_paths(exp_name, recreate_figures, jupyter_flag=True)

# get the data from user input: what quadruples to explain? what dataset to use?
dataset, dataset_name, testset_dict, path_rules, quadruples = explainer_utils.get_data_from_user_input(in_folder, dataset_split, explain_all_quads_flag)

# set all options
user_options = {
    "AGGREGATION_FUNCTION": AGGREGATION_FUNCTION,
    "NUM_TOP_RULES": NUM_TOP_RULES,
    "AGGREGATION_DECAY": AGGREGATION_DECAY,
    "F_UNSEEN_NEGATIVES": F_UNSEEN_NEGATIVES,
    "Z_RULES_FACTOR": Z_RULES_FACTOR,
    "APPLY_WINDOW_SIZE": APPLY_WINDOW_SIZE,
    "RULE_TYPE_Z": RULE_TYPE_Z_FLAG,
    "RULE_TYPE_F": RULE_TYPE_F_FLAG
}

options_explain = explainer_utils.set_options(user_options, num_cpus, dataset_name,
                                              config_path=os.path.join(os.path.dirname(os.getcwd()), "rule_based"))
print("Options for the explainer:")
print(options_explain)

Successfully copied ..\files\explanations\styles.css to ..\files\explanations\example\output\styles.css.
you decided to reuse the old figures which are stored in:  ..\files\explanations\example\output\figures
I will use the rules specified in the following file:
..\files\explanations\example\input\tkgl-icews14-example-ruleset-ids.txt
Operating on dataset:  tkgl-icews14
raw file found, skipping download
Dataset directory is  c:\Users\jgasting\PythonScripts\counttrucola\tgb/datasets\tkgl_icews14
loading processed file
num_rels:  230
>>> loading and indexing of dataset 2.711 seconds
>>> average number of time steps for a triple: 1.804
>>> checked order of time steps, everything is fine
I will use rules and params from file tkgl-icews14-example-ruleset-ids.txt
I will explain the quadruples specified in the following file:
..\files\explanations\example\input\quadruples.txt
I will explain in total 320 quadruples.
Namespace(params=None)
---------------------
parsed_options_dict:  {}
Using dat

## 3. Explaining

In [5]:
num_rules, rule_triple_dict = explainer_utils.explain(dataset, out_folder, dataset_split, path_rules=path_rules,
                                                      options_explain=options_explain,
                                                      max_rules_per_pred=max_rules_per_pred,
                                                      plot_figures_flag=plot_figures_flag)

# path to the generated HTML file (explanations_fancy.html in the output folder)
html_file = os.path.join(out_folder, "explanations_fancy.html")
print("You can find the explanations here:")
print(html_file)

# display a clickable link in the notebook
display(HTML(f'<a href="{html_file}" target="_blank">Open Explanations in Browser</a>'))

# open the explanations in a new browser tab
webbrowser.open_new_tab(html_file)


read rules and params from file ..\files\explanations\example\input\tkgl-icews14-example-ruleset-ids.txt
read 6 rules from file ..\files\explanations\example\input\tkgl-icews14-example-ruleset-ids.txt
MEM at beginning of apply 575
gathering z-rule statistics ...
... done with gathering z-rule statistics
gather f-rule statistics ...

MEM at beginning of f-rule aquisition: 576


100%|██████████| 452/452 [00:01<00:00, 329.14it/s]


... done with gathering f-rule statistics, found 100582 f-rules.
MEM after f-rule aquisition: 892
apply rules to the test set


100%|██████████| 120/120 [00:02<00:00, 51.61it/s]


MEM at end of apply after deleting stuff 715


  normalized_score = np.log10(score) / np.log10(max_score + 1)


HTML file has been created with new rows: ..\files\explanations\example\output\explanations_fancy.html
You can find the explanations here:
..\files\explanations\example\output\explanations_fancy.html


True
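The link and the `webbrowser` call above use a relative path. Browsers and `webbrowser` generally expect a URL, so converting the path to an absolute `file://` URI may be more robust. A minimal sketch using only the standard library; `html_file_uri` is a hypothetical helper, and whether a `file://` link is clickable from inside Jupyter still depends on the browser's security settings.

```python
import os
from pathlib import Path


def html_file_uri(html_file: str) -> str:
    """Turn a (possibly relative) file path into an absolute file:// URI."""
    # os.path.abspath resolves ".." and makes the path absolute, as as_uri() requires
    return Path(os.path.abspath(html_file)).as_uri()


# hypothetical relative path, mirroring the output folder printed above
uri = html_file_uri("../files/explanations/example/output/explanations_fancy.html")
print(uri)
# in the notebook one could then use:
# display(HTML(f'<a href="{uri}" target="_blank">Open Explanations in Browser</a>'))
# webbrowser.open_new_tab(uri)
```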

## 4. Evaluation (optional)

In [6]:
path_rankings = os.path.join(out_folder,'ranks.txt')
mrr, hits10, hits1, hits100, mrrperrel,hits1perrel, _, _ = eval.evaluate(dataset, path_rankings, 0.01, evaluation_mode=dataset_split, eval_type='random', special_evalquads=quadruples)
utils.write_ranksperrel(mrrperrel, hits1perrel, out_folder, dataset.dataset.name, dataset_split)

print('Evaluation results on the requested quadruples with the given rules:')
print('mrr: ', mrr)
print(f'hits@1, hits@10, hits@100: {hits1}, {hits10}, {hits100}')


loading negative test samples
>>> starting evaluation for every triple, in the  test set


324it [00:00, 5452.37it/s]             


eval mode: test
mean mrr: 0.5973109131654383
mean hits@1: 0.525
mean hits@10: 0.725
mean hits@100: 0.825
time to evaluate: 0.6070351600646973
Evaluation results on the requested quadruples with the given rules:
mrr:  0.5973109131654383
hits@1, hits@10, hits@100: 0.525, 0.725, 0.825
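MRR and hits@k are computed from the rank of the correct answer for each query. Given a list of 1-based ranks, the standard definitions can be sketched as below. This does not reproduce `eval.evaluate`: its tie handling, its `0.01` argument, and `eval_type='random'` are not described in this notebook, and the log line "loading negative test samples" suggests the ranks are computed against pre-sampled negatives rather than all entities. `mrr_and_hits` is a hypothetical helper.

```python
def mrr_and_hits(ranks, ks=(1, 10, 100)):
    """Mean reciprocal rank and hits@k for a list of 1-based ranks."""
    if not ranks:
        raise ValueError("need at least one rank")
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    # hits@k: fraction of queries whose correct answer is ranked k or better
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mrr, hits


mrr, hits = mrr_and_hits([1, 2, 10, 200])
print(mrr)   # approximately 0.40125
print(hits)  # {1: 0.25, 10: 0.75, 100: 0.75}
```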
