# Supplementary material: Towards Pattern-based Complex Ontology Matching using SPARQL and LLM
*Ondřej Zamazal*


This Jupyter notebook is supplementary material for the poster paper submission "**Towards Pattern-based Complex Ontology Matching using SPARQL and LLM**" by Ondřej Zamazal to **SEMANTiCS 2024**.

The Jupyter notebook has been tested in the Colab environment.

Input files for the preliminary experiment are available online:
- ontologies:
  - https://oaei.ontologymatching.org/2024/conference/data/cmt.owl
  - https://oaei.ontologymatching.org/2024/conference/data/ekaw.owl
- simple alignment (correspondences):
  - https://oaei.ontologymatching.org/2024/conference/data/subset1-cmt-ekaw.rdf
- further simple alignments:
  - https://oaei.ontologymatching.org/2024/conference/data/LogMap-cmt-ekaw.rdf  
  - https://oaei.ontologymatching.org/2024/conference/data/somm1-cmt-ekaw.rdf
  - https://oaei.ontologymatching.org/2024/conference/data/reference-cmt-ekaw.rdf



## Approach

We describe the pattern-based pipeline along with an example targeting the alignment pattern *Class by Attribute Type* (CAT) [Scharffe09](https://scholar.google.com/scholar_url?url=https://www.academia.edu/download/30806287/manuscript.pdf&hl=en&sa=T&oi=gsb-gga&ct=res&cd=0&d=621239557845776854&ei=9Ut1Zvj6B6aty9YP54e9kAc&scisig=AFWwaeZPfF1N_zmYFPxB4qOSuqY8). This pattern specifies equivalence between a class in O1 and a class in O2 whose scope is narrowed by an existential restriction; in Manchester OWL syntax: *O1:Class1 EquivalentTo O2:Class1 and (O2:property some O2:Class2)*.
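
As a rough illustration of the CAT pattern, the sketch below assembles one candidate correspondence in Manchester OWL syntax from its four parts. The `CatCandidate` class and its field names are hypothetical helpers introduced here, not part of the notebook; the IRIs are taken from the notebook output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatCandidate:
    """One Class by Attribute Type candidate (hypothetical helper)."""
    named_class: str       # class on the unrestricted side
    restricted_class: str  # class whose scope is narrowed
    prop: str              # property used in the existential restriction
    filler: str            # class the property value must belong to

    def to_manchester(self) -> str:
        # Class1 EquivalentTo Class1' and (property some Class2)
        return (f"{self.named_class} EquivalentTo {self.restricted_class} "
                f"and ({self.prop} some {self.filler})")


candidate = CatCandidate(
    named_class="<http://cmt#PaperAbstract>",
    restricted_class="<http://ekaw#Paper>",
    prop="<http://ekaw#hasReview>",
    filler="<http://ekaw#Positive_Review>",
)
print(candidate.to_manchester())
```

Keeping the four parts separate until the last moment makes it easy to render the same candidate in other forms, such as an English sentence.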

The pipeline overview:
- **Step 1**: Detection is based on a structural aspect (a pair of SPARQL queries).
- **Step 2**: Results from detecting both ontologies are joined according to the alignment pattern, separately for each input correspondence.
- **Step 3**: Pattern-based, template-driven verbalization to natural language (English) is applied to complex correspondence candidates to enable their validation using an LLM.
- **Step 4**: Finally, an LLM is used to validate whether verbalized complex correspondence candidates are (probably) positives or negatives.

While the Jupyter notebook covers the first three steps, the fourth step, which deals with prompting the LLM, should be straightforward and will be implemented after further experimentation with other LLMs.
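
Step 4 is not implemented in the notebook. Purely as a sketch of what the input to an LLM could look like, the snippet below wraps each verbalized candidate in a yes/no question. The prompt wording and the `build_validation_prompt` helper are assumptions made for illustration; no LLM is called.

```python
def build_validation_prompt(sentence: str) -> str:
    """Wrap one verbalized candidate in a yes/no question (hypothetical wording)."""
    return (
        "Consider the domain of conference organization.\n"
        f'Statement: "{sentence}"\n'
        "Is this statement true? Answer with yes or no."
    )


# Sentences copied from the Step 3 output of the notebook.
candidates = [
    "Paper abstract is the same as paper which has review positive review.",
    "Paper full version is the same as paper which has reviewer pc member.",
]

prompts = [build_validation_prompt(s) for s in candidates]
print(prompts[0])
```

Each prompt carries exactly one candidate, so the answers can be mapped back to the candidates by position.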

For SPARQL querying we use the [rdflib](https://rdflib.readthedocs.io/en/stable/) library.

In [20]:
!pip install rdflib



In [21]:
# name of alignment file
mapping = "subset1-cmt-ekaw"

# if you need to swap the order of mapping ontologies; originally False
# reverse_mapping = False
reverse_mapping = True

# ontologies names
onto1 = "cmt"
onto2 = "ekaw"

if (reverse_mapping):
  onto1, onto2 = onto2, onto1

In [22]:
# printing indexed entities
def print_indexed_entities(entity1, entityNext1):
  for ent in entityNext1:
    print(entity1[ent])

Getting simple correspondences from the input alignment:

In [23]:
from rdflib import Graph
import re
from urllib.parse import urlparse

# function for getting better textual names for entities, also simple tokenization
def get_local_name(url):
  local_name = urlparse(url).fragment.rstrip('>')
  if("_" in local_name):
    # separator underscore
    split_local_name = local_name.split("_")
    sentence = ' '.join(split_local_name).lower()
    print(sentence)
    return(sentence)
  else:
    #de-camel-casing
    split_local_name = re.sub(r'([a-z])([A-Z])', r'\1 \2', local_name).lower()
    print(split_local_name)
    return split_local_name

# Load the input alignment into an RDF graph
g = Graph()

g.parse(f"https://oaei.ontologymatching.org/2024/conference/data/{mapping}.rdf")

# SPARQL query
# ?bnode a  align:Cell.
#
query1 = """
PREFIX align: <http://knowledgeweb.semanticweb.org/heterogeneity/alignment#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX cmt: <http://cmt#>

SELECT DISTINCT ?entity1 ?entity2 ?measure

WHERE {

    ?bnode a  align:Cell .
  	?bnode align:entity1 ?entity1 .
  	?bnode align:entity2 ?entity2 .
  	?bnode align:measure ?measure .

}

"""

# Execute the SPARQL query
results = g.query(query1)
entity1 = []
entity2 = []

entitiesIndex = []

# Collect the results
for row in results:
    #print(row)
    # Extract string representations of the RDF terms
    s1_str = row.entity1.n3()
    s2_str = row.entity2.n3()

    # Collect the extracted strings
    entity1.append(s1_str)
    entity2.append(s2_str)

if (reverse_mapping):
  entity1, entity2 = entity2, entity1

print(entity1)

print(entity2)

['<http://ekaw#Paper>', '<http://ekaw#Person>']
['<http://cmt#Paper>', '<http://cmt#Person>']
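
To make the shape of the input concrete, the sketch below reads correspondence cells from a small hand-written fragment using only the standard library. The fragment is an assumption about how a cell is laid out in the alignment file (its namespace is the one declared as `align:` in the query above); the notebook itself uses rdflib and SPARQL rather than `xml.etree`.

```python
import xml.etree.ElementTree as ET

ALIGN = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-written sample, assumed to resemble one cell of the alignment file.
SAMPLE = f"""<rdf:RDF xmlns="{ALIGN}" xmlns:rdf="{RDF}">
  <Alignment>
    <map>
      <Cell>
        <entity1 rdf:resource="http://cmt#Paper"/>
        <entity2 rdf:resource="http://ekaw#Paper"/>
        <measure>1.0</measure>
        <relation>=</relation>
      </Cell>
    </map>
  </Alignment>
</rdf:RDF>"""


def read_cells(xml_text):
    """Return (entity1, entity2, measure) for every Cell element."""
    root = ET.fromstring(xml_text)
    cells = []
    for cell in root.iter(f"{{{ALIGN}}}Cell"):
        e1 = cell.find(f"{{{ALIGN}}}entity1").get(f"{{{RDF}}}resource")
        e2 = cell.find(f"{{{ALIGN}}}entity2").get(f"{{{RDF}}}resource")
        measure = float(cell.find(f"{{{ALIGN}}}measure").text)
        cells.append((e1, e2, measure))
    return cells


print(read_cells(SAMPLE))
```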


Limiting the input correspondences to those whose onto2 entity has at least one subclass.
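
The filter can be pictured without SPARQL as a membership test on the superclass side of the subclass axioms. The following is a minimal sketch on hand-picked data: the pairs are illustrative (whether each one is a direct subclass axiom in the ontology is an assumption), and `has_subclass` is a hypothetical helper. The notebook performs the check with one `ASK` query per entity, which additionally requires the entity to be an `owl:Class`.

```python
# (subclass, superclass) pairs; a tiny illustrative excerpt, not the full ontology.
subclass_of = {
    ("<http://cmt#PaperAbstract>", "<http://cmt#Paper>"),
    ("<http://cmt#PaperFullVersion>", "<http://cmt#Paper>"),
    ("<http://cmt#Author>", "<http://cmt#Person>"),
}

# The last entry is a made-up class without subclasses.
entity2 = [
    "<http://cmt#Paper>",
    "<http://cmt#Person>",
    "<http://example.org#NoSubclasses>",
]


def has_subclass(entity, pairs):
    """True if at least one class is declared a subclass of `entity`."""
    return any(sup == entity for _, sup in pairs)


# Keep only the positions of correspondences whose entity has a subclass.
entities_index = [i for i, ent in enumerate(entity2) if has_subclass(ent, subclass_of)]
print(entities_index)  # [0, 1]
```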

In [24]:
# keep only correspondences whose onto2 entity has at least one subclass
# indices into the simple alignment
entitiesIndex = []

# Load your OWL ontology into an RDF graph
g = Graph()
g.parse(f"https://oaei.ontologymatching.org/2024/conference/data/{onto2}.owl", format="xml")

for i, ent in enumerate(entity2):

  query1 = """
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  PREFIX ekaw: <http://ekaw#>

  ASK

  {
      ?ent1 a owl:Class .
      ?class1 rdfs:subClassOf ?ent1 .
  }

  """

  query1 = query1.replace("?ent1", ent)

  # Execute the SPARQL query
  results = g.query(query1)

  # the entity has at least one subclass, so keep its index
  if (bool(results)):
    entitiesIndex.append(i)

#TODO delete
#print(entity2)
#print(entitiesIndex)
#print(entity2[entitiesIndex[0]])
#print(entity1[entitiesIndex[0]])

### Step 1
Detection is based on a structural aspect (a pair of SPARQL queries).
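
The structural pattern behind the first query can be read as a three-way join: a property whose domain is the matched class, the range of that property, and a direct subclass of that range. The sketch below replays the join in plain Python over a handful of triples; the triples are a hand-made excerpt consistent with the notebook output, and `detect_candidates` is a hypothetical name.

```python
DOMAIN, RANGE, SUBCLASS_OF = "rdfs:domain", "rdfs:range", "rdfs:subClassOf"

# Hand-made excerpt; prefixed names are used for brevity.
triples = [
    ("ekaw:hasReview", DOMAIN, "ekaw:Paper"),
    ("ekaw:hasReview", RANGE, "ekaw:Review"),
    ("ekaw:Positive_Review", SUBCLASS_OF, "ekaw:Review"),
    ("ekaw:Negative_Review", SUBCLASS_OF, "ekaw:Review"),
    ("ekaw:Review", SUBCLASS_OF, "ekaw:Document"),
]


def detect_candidates(entity, triples):
    """Yield (property, range class, subclass of the range) for `entity`."""
    for prop, p1, dom in triples:
        if p1 != DOMAIN or dom != entity:
            continue
        for prop2, p2, rng in triples:
            if p2 != RANGE or prop2 != prop:
                continue
            for sub, p3, sup in triples:
                if p3 == SUBCLASS_OF and sup == rng:
                    yield prop, rng, sub


for row in detect_candidates("ekaw:Paper", triples):
    print(row)
```

Each yielded row corresponds to one restricted-class candidate of the form `Paper and (hasReview some Positive_Review)`.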

In [25]:
#resulted_sentences from query
all_resulted_sentences_onto1 = []
all_resulted_sentences_OWL_onto1 = []
#21-06-24
all_resulted_raw_onto1 = []
#updated indices
entitiesIndexUpdate = []

# Load your OWL ontology into an RDF graph
g = Graph()
g.parse(f"https://oaei.ontologymatching.org/2024/conference/data/{onto1}.owl", format="xml")

for i, ent in enumerate(entitiesIndex):
  # SPARQL query for each entity from simple correspondences
  query1 = """
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  PREFIX cmt: <http://cmt#>

  SELECT DISTINCT ?class1 ?property1 ?class2 ?class3

  WHERE {

        ?ent1 a owl:Class .
        ?property1 rdfs:domain ?ent1 .
        ?property1 rdfs:range ?class2 .
        ?class3 rdfs:subClassOf ?class2

  }

  """
  # resulted candidates/sentences for one entity from simple correspondences
  resulted_sentences = []
  #in Manchester OWL syntax
  resulted_sentences_OWL = []
  #21-06-24, in raw
  resulted_raw = []

  # SPARQL query for each entity from simple correspondences (e.g.,?ent1=Person)
  query1 = query1.replace("?ent1", entity1[ent])
  print(i)

  # Execute the SPARQL query
  results = g.query(query1)

  # Processing the results
  for row in results:
      # Extract string representations of the RDF terms
      #s1_str = row.class1.n3()
      s2_str = row.class2.n3()
      s3_str = row.property1.n3()
      s4_str = row.class3.n3()

      print("?ent1="+entity1[ent]+";?class1="+s2_str+";?class2="+s4_str+";?property1="+s3_str)

      #21-06-24
      resulted_raw.append("O1:ent1="+entity1[ent]+";O1:property1="+s3_str+";O1:class2="+s4_str)

      # textual output improvement
      if ("has" in s3_str):
        sentence = " is the same as "+get_local_name(entity1[ent])+" which "+get_local_name(s3_str)+" "+get_local_name(s4_str)
      else:
        sentence = " is the same as "+get_local_name(entity1[ent])+" which is "+get_local_name(s3_str)+" "+get_local_name(s4_str)
      print(sentence)

      resulted_sentences.append(sentence.lower())

      #in Manchester OWL syntax
      sentence = " EquivalentTo "+entity1[ent]+" and ("+s3_str+" some "+s4_str+")"
      resulted_sentences_OWL.append(sentence)

  if(resulted_sentences != []):
    entitiesIndexUpdate.append(entitiesIndex[i])
  else:
    print("empty")
    continue
  all_resulted_sentences_onto1.append(resulted_sentences)
  all_resulted_sentences_OWL_onto1.append(resulted_sentences_OWL)
  #21-06-24
  all_resulted_raw_onto1.append(resulted_raw)

# control printing
print_indexed_entities(entity1, entitiesIndexUpdate)
print_indexed_entities(entity2, entitiesIndexUpdate)


0
?ent1=<http://ekaw#Paper>;?class1=<http://ekaw#Review>;?class2=<http://ekaw#Positive_Review>;?property1=<http://ekaw#hasReview>
paper
has review
positive review
 is the same as paper which has review positive review
?ent1=<http://ekaw#Paper>;?class1=<http://ekaw#Review>;?class2=<http://ekaw#Neutral_Review>;?property1=<http://ekaw#hasReview>
paper
has review
neutral review
 is the same as paper which has review neutral review
?ent1=<http://ekaw#Paper>;?class1=<http://ekaw#Review>;?class2=<http://ekaw#Negative_Review>;?property1=<http://ekaw#hasReview>
paper
has review
negative review
 is the same as paper which has review negative review
?ent1=<http://ekaw#Paper>;?class1=<http://ekaw#Possible_Reviewer>;?class2=<http://ekaw#PC_Member>;?property1=<http://ekaw#hasReviewer>
paper
has reviewer
pc member
 is the same as paper which has reviewer pc member
1
?ent1=<http://ekaw#Person>;?class1=<http://ekaw#Document>;?class2=<http://ekaw#Abstract>;?property1=<http://ekaw#authorOf>
person
author

SPARQL querying in the second ontology.
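
The second query uses the property path `rdfs:subClassOf*`, which follows zero or more subclass links and therefore also returns the class itself; the cell filters that self-match out afterwards. A minimal sketch of the same idea with a breadth-first traversal follows (the hierarchy is a made-up toy, and `descendants` is a hypothetical helper):

```python
from collections import deque

# superclass -> direct subclasses (toy hierarchy, not taken from the ontology)
direct_subclasses = {
    "Person": ["Author", "Reviewer"],
    "Author": ["Co-author"],
    "Reviewer": [],
    "Co-author": [],
}


def descendants(cls, include_self=False):
    """All classes reachable from `cls` via subclass links, breadth-first."""
    found = []
    queue = deque([cls])
    visited = {cls}
    while queue:
        current = queue.popleft()
        if current != cls or include_self:
            found.append(current)
        for sub in direct_subclasses.get(current, []):
            if sub not in visited:
                visited.add(sub)
                queue.append(sub)
    return found


print(descendants("Person", include_self=True))  # mirrors subClassOf*
print(descendants("Person"))                     # after dropping the self-match
```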

In [26]:
# resulted_sentences (candidates) from query
all_resulted_sentences_onto2 = []
all_resulted_sentences_OWL_onto2 = []
#21-06-24
all_resulted_sentences_raw_onto2 = []

# Load your OWL ontology into an RDF graph
g = Graph()
g.parse(f"https://oaei.ontologymatching.org/2024/conference/data/{onto2}.owl", format="xml")

for ent in entitiesIndexUpdate:
  # SPARQL query
  # the property path rdfs:subClassOf* also follows indirect subclass links, but it matches the class itself as well
  query1 = """
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  PREFIX cmt: <http://cmt#>

  SELECT DISTINCT ?class1

  WHERE {

        ?ent1 a owl:Class .
        ?class1 rdfs:subClassOf* ?ent1

  }

  """
  #resulted sentences for one entity from simple input correspondences, e.g., Person
  resulted_sentences = []
  #in Manchester OWL syntax
  resulted_sentences_OWL = []
  #21-06-24
  resulted_sentences_raw = []

  # SPARQL query for each entity from simple correspondences (e.g.,?ent1=Person)
  query1 = query1.replace("?ent1", entity2[ent])
  #print(query1)

  # Execute the SPARQL query
  results = g.query(query1)

  # Print out the results
  for row in results:
      # Extract string representations of the RDF terms
      s1_str = row.class1.n3()
      # omitting the same entities such as Person = Person and ...
      if (entity2[ent]==s1_str):
        continue

      print("?ent1="+entity2[ent]+";?class1="+s1_str)
      #21-06-24
      resulted_sentences_raw.append("O2:class1="+s1_str)

      sentence = get_local_name(s1_str)
      resulted_sentences.append(sentence.lower())

      #in Manchester OWL syntax
      resulted_sentences_OWL.append(s1_str)

  all_resulted_sentences_onto2.append(resulted_sentences)
  all_resulted_sentences_OWL_onto2.append(resulted_sentences_OWL)
  #21-06-24
  all_resulted_sentences_raw_onto2.append(resulted_sentences_raw)

# control printing
print_indexed_entities(entity1, entitiesIndexUpdate)
print_indexed_entities(entity2, entitiesIndexUpdate)

?ent1=<http://cmt#Paper>;?class1=<http://cmt#PaperAbstract>
paper abstract
?ent1=<http://cmt#Paper>;?class1=<http://cmt#PaperFullVersion>
paper full version
?ent1=<http://cmt#Person>;?class1=<http://cmt#ProgramCommitteeMember>
program committee member
?ent1=<http://cmt#Person>;?class1=<http://cmt#ProgramCommitteeChair>
program committee chair
?ent1=<http://cmt#Person>;?class1=<http://cmt#ConferenceMember>
conference member
?ent1=<http://cmt#Person>;?class1=<http://cmt#ConferenceChair>
conference chair
?ent1=<http://cmt#Person>;?class1=<http://cmt#Author>
author
?ent1=<http://cmt#Person>;?class1=<http://cmt#Co-author>
co-author
?ent1=<http://cmt#Person>;?class1=<http://cmt#AuthorNotReviewer>
author not reviewer
?ent1=<http://cmt#Person>;?class1=<http://cmt#Reviewer>
reviewer
?ent1=<http://cmt#Person>;?class1=<http://cmt#Meta-Reviewer>
meta-reviewer
?ent1=<http://cmt#Person>;?class1=<http://cmt#AssociatedChair>
associated chair
?ent1=<http://cmt#Person>;?class1=<http://cmt#ExternalReview

The code cell below (In [27]) covers the next two steps:

### Step 2
Results from detecting both ontologies are joined according to the alignment pattern, separately for each input correspondence.
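
For one input correspondence, the join amounts to pairing every subclass found on one side with every restricted-class candidate found on the other, i.e. a Cartesian product. A minimal sketch with `itertools.product`, using values from the notebook output (the variable names are illustrative):

```python
from itertools import product

# Results for the correspondence Paper = Paper, copied from the Step 1 output.
subclasses = ["<http://cmt#PaperAbstract>", "<http://cmt#PaperFullVersion>"]
restrictions = [
    ("<http://ekaw#Paper>", "<http://ekaw#hasReview>", "<http://ekaw#Positive_Review>"),
    ("<http://ekaw#Paper>", "<http://ekaw#hasReviewer>", "<http://ekaw#PC_Member>"),
]

# Every subclass is paired with every (class, property, filler) triple.
joined = [
    f"O2:class1={sub}==O1:ent1={cls};O1:property1={prop};O1:class2={filler}"
    for sub, (cls, prop, filler) in product(subclasses, restrictions)
]

for line in joined:
    print(line)
```

The number of candidates per correspondence is the product of the two result sizes, which is why the later validation step matters.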

### Step 3
Pattern-based, template-driven verbalization to natural language (English) is applied to complex correspondence candidates to enable their validation using an LLM. The same candidates are also rendered in Manchester OWL syntax.
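
The verbalization can be summarized as: turn each IRI into lower-case words, then fill a fixed sentence template. The sketch below condenses the logic of `get_local_name` and the sentence template from the Step 1 cell into two small functions. `local_words` and `verbalize` are illustrative names, and the underscore and camel-case handling is merged into one pass, which is a simplification of the notebook code.

```python
import re
from urllib.parse import urlparse


def local_words(iri):
    """'<http://ekaw#hasReview>' -> 'has review' (fragment, split, lower-cased)."""
    name = urlparse(iri.strip("<>")).fragment
    name = name.replace("_", " ")
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", name).lower()


def verbalize(sub, cls, prop, filler):
    """Fill the CAT sentence template used for the English output."""
    # Properties such as hasReview read naturally without an extra 'is'.
    link = "which" if "has" in prop else "which is"
    return (f"{local_words(sub).capitalize()} is the same as {local_words(cls)} "
            f"{link} {local_words(prop)} {local_words(filler)}.")


print(verbalize("<http://cmt#PaperAbstract>", "<http://ekaw#Paper>",
                "<http://ekaw#hasReview>", "<http://ekaw#Positive_Review>"))
```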

In [27]:
results_OWL = []
results_sentences = []
#21-06-24
results_raw = []

for i in range(len(entitiesIndexUpdate)):
  #21-06-24
  #for raw - step 2
  for ent1 in all_resulted_sentences_raw_onto2[i]:
    for j in range(len(all_resulted_raw_onto1[i])):
      #print(ent1 + all_resulted_sentences_OWL_onto1[i][j])
      #output.append()
      results_raw.append(ent1 +"=="+all_resulted_raw_onto1[i][j])
  #for Manchester OWL syntax - step 3
  for ent1 in all_resulted_sentences_OWL_onto2[i]:
    for j in range(len(all_resulted_sentences_OWL_onto1[i])):
      results_OWL.append(ent1 + all_resulted_sentences_OWL_onto1[i][j])
  #for natural language - step 3
  for ent1 in all_resulted_sentences_onto2[i]:
    for j in range(len(all_resulted_sentences_onto1[i])):
      results_sentences.append(ent1.capitalize() + all_resulted_sentences_onto1[i][j]+".")

In [28]:
print("Step 2")
print("Joined results according to the alignment pattern separately per each input correspondence:")
for sentence in results_raw:
  print(sentence)

Step 2
Joined results according to the alignment pattern separately per each input correspondence:
O2:class1=<http://cmt#PaperAbstract>==O1:ent1=<http://ekaw#Paper>;O1:property1=<http://ekaw#hasReview>;O1:class2=<http://ekaw#Positive_Review>
O2:class1=<http://cmt#PaperAbstract>==O1:ent1=<http://ekaw#Paper>;O1:property1=<http://ekaw#hasReview>;O1:class2=<http://ekaw#Neutral_Review>
O2:class1=<http://cmt#PaperAbstract>==O1:ent1=<http://ekaw#Paper>;O1:property1=<http://ekaw#hasReview>;O1:class2=<http://ekaw#Negative_Review>
O2:class1=<http://cmt#PaperAbstract>==O1:ent1=<http://ekaw#Paper>;O1:property1=<http://ekaw#hasReviewer>;O1:class2=<http://ekaw#PC_Member>
O2:class1=<http://cmt#PaperFullVersion>==O1:ent1=<http://ekaw#Paper>;O1:property1=<http://ekaw#hasReview>;O1:class2=<http://ekaw#Positive_Review>
O2:class1=<http://cmt#PaperFullVersion>==O1:ent1=<http://ekaw#Paper>;O1:property1=<http://ekaw#hasReview>;O1:class2=<http://ekaw#Neutral_Review>
O2:class1=<http://cmt#PaperFullVersion>==O1

In [29]:
print("Step 3")
print("Verbalized complex correspondence candidates in English:")
for sentence in results_sentences:
  print(sentence)

Step 3
Verbalized complex correspondence candidates in English:
Paper abstract is the same as paper which has review positive review.
Paper abstract is the same as paper which has review neutral review.
Paper abstract is the same as paper which has review negative review.
Paper abstract is the same as paper which has reviewer pc member.
Paper full version is the same as paper which has review positive review.
Paper full version is the same as paper which has review neutral review.
Paper full version is the same as paper which has review negative review.
Paper full version is the same as paper which has reviewer pc member.
Program committee member is the same as person which is author of abstract.
Program committee member is the same as person which is author of review.
Program committee member is the same as person which is author of multi-author volume.
Program committee member is the same as person which is author of web site.
Program committee member is the same as person which is a

In [30]:
print("Complex correspondence candidates in Manchester OWL syntax:")
for sentence in results_OWL:
  print(sentence)

Complex correspondence candidates in Manchester OWL syntax:
<http://cmt#PaperAbstract> EquivalentTo <http://ekaw#Paper> and (<http://ekaw#hasReview> some <http://ekaw#Positive_Review>)
<http://cmt#PaperAbstract> EquivalentTo <http://ekaw#Paper> and (<http://ekaw#hasReview> some <http://ekaw#Neutral_Review>)
<http://cmt#PaperAbstract> EquivalentTo <http://ekaw#Paper> and (<http://ekaw#hasReview> some <http://ekaw#Negative_Review>)
<http://cmt#PaperAbstract> EquivalentTo <http://ekaw#Paper> and (<http://ekaw#hasReviewer> some <http://ekaw#PC_Member>)
<http://cmt#PaperFullVersion> EquivalentTo <http://ekaw#Paper> and (<http://ekaw#hasReview> some <http://ekaw#Positive_Review>)
<http://cmt#PaperFullVersion> EquivalentTo <http://ekaw#Paper> and (<http://ekaw#hasReview> some <http://ekaw#Neutral_Review>)
<http://cmt#PaperFullVersion> EquivalentTo <http://ekaw#Paper> and (<http://ekaw#hasReview> some <http://ekaw#Negative_Review>)
<http://cmt#PaperFullVersion> EquivalentTo <http://ekaw#Paper>

## Work in Progress and Future Work

We are working on covering Step 4 in the code; it will be ready after further experimentation with other LLMs.

We are also working on automatically gathering context for contextual question prompting.
