**Project Knowledge Engineering**

*Group formed by: Diego Rodriguez de Roa & Fernando Gonzalez Sanz (Erasmus students)*

This project connects an ontology to a Large Language Model (LLM) so that questions asked in natural language can be answered from the content of the given ontology.

In [13]:
import rdflib
from rdflib import Graph, RDF, RDFS, OWL

Function that receives an .owl ontology file and returns the verbalised ontology as plain text.

In [14]:

def verbalize_ontology(file_path):
    # Load the ontology
    g = Graph()
    g.parse(file_path)

    # Mapping from prefix to namespace URI
    namespaces = {
        'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
        'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
        'owl': 'http://www.w3.org/2002/07/owl#',
        'xsd': 'http://www.w3.org/2001/XMLSchema#',
        '': 'http://www.example.org/musica#'
    }

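    # Function to abbreviate a URI as prefix:local_name
    def local_name(uri):
        if uri is None:
            return "(unknown)"
        for prefix, ns in namespaces.items():
            if uri.startswith(ns):
                return uri.replace(ns, f'{prefix}:', 1)
        # Fall back to the full value when no known namespace matches
        return str(uri)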

    verbalization = ""

    # Classes verbalization
    verbalization += "The ontology contains the following classes:\n"
    for class_uri in g.subjects(predicate=RDF.type, object=OWL.Class):
        verbalization += f"- {local_name(class_uri)}\n"

    # Properties verbalization
    verbalization += "\nThe ontology contains the following properties:\n"
    for prop_uri in g.subjects(predicate=RDF.type, object=OWL.ObjectProperty):
        domain = next(g.objects(subject=prop_uri, predicate=RDFS.domain), None)
        range_ = next(g.objects(subject=prop_uri, predicate=RDFS.range), None)
        verbalization += f"- {local_name(prop_uri)} with domain {local_name(domain)} and range {local_name(range_)}\n"

    # Instances verbalization: every subject in the graph is listed,
    # including the property declarations themselves
    verbalization += "\nThe ontology contains the following instances:\n"
    # dict.fromkeys drops repeated subjects while preserving their order
    for instance_uri in dict.fromkeys(g.subjects()):
        class_type = next(g.objects(subject=instance_uri, predicate=RDF.type), None)
        verbalization += f"- {local_name(instance_uri)} is an instance of {local_name(class_type)}"

        # Add relations; a list is needed because a generator is always truthy
        properties = list(dict.fromkeys(g.predicates(subject=instance_uri)))
        if properties:
            verbalization += " with the following relationships: "
            for prop in properties:
                # Only the first value of each property is reported
                prop_value = next(g.objects(subject=instance_uri, predicate=prop), None)
                verbalization += f"{local_name(prop)} {local_name(prop_value)}, "
            verbalization = verbalization.rstrip(", ")
        verbalization += "\n"

    return verbalization
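
The prefix abbreviation performed by `local_name` can be tried in isolation, without rdflib. The sketch below is only an illustration: `abbreviate` and `NAMESPACES` are hypothetical names, and the FOAF URI is just an example of a namespace that is not in the table. It also shows one possible way to handle missing values and URIs outside the listed namespaces.

```python
# Hypothetical stand-alone version of the prefix abbreviation step.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "": "http://www.example.org/musica#",
}


def abbreviate(uri, namespaces=NAMESPACES):
    """Return prefix:local for known namespaces, the full URI otherwise."""
    if uri is None:
        return "(none)"
    text = str(uri)
    for prefix, ns in namespaces.items():
        if text.startswith(ns):
            # Keep only the part after the namespace
            return f"{prefix}:{text[len(ns):]}"
    return text


print(abbreviate("http://www.example.org/musica#Piano"))  # :Piano
print(abbreviate("http://xmlns.com/foaf/0.1/name"))       # unchanged, unknown namespace
print(abbreviate(None))                                    # (none)
```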


In [15]:
verb=verbalize_ontology("music-ontology.owl")
print(verb)

The ontology contains the following classes:
- :MusicalWork
- :Composer
- :Instrument
- :MusicalGenre

The ontology contains the following properties:
- :composedBy with domain :MusicalWork and range :Composer
- :performedWith with domain :MusicalWork and range :Instrument
- :belongsGenre with domain :MusicalWork and range :MusicalGenre
- :composesMusicOf with domain :Composer and range :MusicalGenre

The ontology contains the following instances:
- :composesMusicOf is an instance of owl:ObjectProperty with the following relationships: rdf:type owl:ObjectProperty, rdfs:domain :Composer, rdfs:range :MusicalGenre
- :SymphonyNo5 is an instance of :MusicalWork with the following relationships: rdf:type :MusicalWork, :composedBy :LudwigVanBeethoven, :belongsGenre :Classical, :performedWith :Piano
- :BohemianRhapsody is an instance of :MusicalWork with the following relationships: rdf:type :MusicalWork, :composedBy :FreddieMercury, :belongsGenre :Rock, :performedWith :Guitar
- :Piano is an i
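
Each instance line above pairs a property with a single value. If an instance had several values for the same property, they could be grouped before building the sentence. The following is a minimal sketch on plain tuples rather than an rdflib graph; `verbalize_subject` is a hypothetical helper and the `:Violin` value is invented purely to show a multi-valued property.

```python
from collections import defaultdict

# Triples as plain (subject, predicate, object) tuples.
# The :Violin entry is invented for illustration.
triples = [
    (":SymphonyNo5", "rdf:type", ":MusicalWork"),
    (":SymphonyNo5", ":performedWith", ":Piano"),
    (":SymphonyNo5", ":performedWith", ":Violin"),
    (":SymphonyNo5", ":composedBy", ":LudwigVanBeethoven"),
]


def verbalize_subject(subject, triples):
    """Build one sentence listing every value of every property."""
    values = defaultdict(list)
    for s, p, o in triples:
        # rdf:type is skipped here because it is reported separately
        if s == subject and p != "rdf:type":
            values[p].append(o)
    parts = [f"{p} {' and '.join(objs)}" for p, objs in values.items()]
    return f"{subject} has the following relationships: " + ", ".join(parts)


print(verbalize_subject(":SymphonyNo5", triples))
```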

Add Git to the PATH before using the LLM (required by the model tooling)

In [None]:
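import os
git_bin = r"C:\Program Files\Git\bin"
# Prepend only once so that re-running the cell does not duplicate the entry
if git_bin not in os.environ['PATH'].split(os.pathsep):
    os.environ['PATH'] = git_bin + os.pathsep + os.environ['PATH']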


In [18]:

print(os.getenv('PATH'))


C:\Program Files\Git\bin;C:\Program Files\Git\bin;C:\Program Files\Git\bin;c:\Users\Fernando_Glez_Sanz\AppData\Local\Microsoft\WindowsApps;c:\Users\Fernando_Glez_Sanz\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\Scripts;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp;C:\xpressmp\bin;C:\xpressmp\workbench;C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\MATLAB\R2021b\bin;C:\Program Files\SASHome\SASFoundation\9.4\ets\sasexe;C:\Program Files\SASHome\Secure\ccme4;C:\Program Files\NVIDIA Corporation\Nsight Compute 2022.3.0\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Users\Fernando_Glez_Sanz\AppData\Local\Microsoft\WindowsApps;C:\Program Files\swipl\bin;C:\Users\Fernando_Glez_Sanz\App
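
The PATH printed above lists the Git directory three times, which is what happens when the same entry is prepended again on each re-run of a cell. A minimal sketch of a helper that prepends an entry only when it is absent is shown below; `prepend_once` is a hypothetical name, the demo paths are made up, and the comparison is an exact string match (it does not account for Windows case-insensitivity).

```python
import os


def prepend_once(path_value, entry, sep=os.pathsep):
    """Return path_value with entry at the front, unless it is already present."""
    parts = [p for p in path_value.split(sep) if p]
    if entry in parts:
        return sep.join(parts)
    return sep.join([entry] + parts)


# Made-up PATH value; ";" is passed explicitly so the demo behaves the same on any OS
demo = prepend_once("C:\\Windows;C:\\Windows\\system32", "C:\\Git\\bin", sep=";")
demo = prepend_once(demo, "C:\\Git\\bin", sep=";")  # second call changes nothing
print(demo)
```

In the notebook this would be used as `os.environ['PATH'] = prepend_once(os.environ['PATH'], new_path)`.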

Imports for the LLM

In [19]:
import torch
import transformers
from huggingface_hub import login

Log in with our Hugging Face access token

In [20]:
# Token read from the HF_TOKEN environment variable (assumed to be set) instead of being hard-coded
login(os.environ["HF_TOKEN"], add_to_git_credential=True)

Token is valid (permission: fineGrained).
Your token has been saved in your configured git credential helpers (manager,store).
Your token has been saved to C:\Users\Fernando_Glez_Sanz\.cache\huggingface\token
Login successful
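
An access token is a credential, so one common practice is to keep it out of the notebook and read it from the environment. The sketch below assumes the token is stored in an environment variable named `HF_TOKEN`; `read_token` is a hypothetical helper, not part of `huggingface_hub`.

```python
import os


def read_token(env=os.environ, name="HF_TOKEN"):
    """Return the token stored under `name`, or fail with a clear message."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable before logging in")
    return token


# Intended use in the notebook (not executed here):
# login(read_token(), add_to_git_credential=True)
```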


*Asking the model for competency questions*

In [9]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# Token read from the HF_TOKEN environment variable (assumed to be set)
login(os.environ["HF_TOKEN"])

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": verb},
    {"role": "user", "content": "tell me 3 competency questions about this ontology"},
]

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][-1])

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to C:\Users\Fernando_Glez_Sanz\.cache\huggingface\token
Login successful


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


{'role': 'assistant', 'content': 'Here are three competency questions about this ontology:\n\n1. Who composed the musical work "Symphony No. 5"?\n\nThis competency question is related to the property :composedBy, which connects a :MusicalWork to a :Composer. The answer to this question would be :LudwigVanBeethoven.\n\n2. What instrument is typically used to perform the musical work "Bohemian Rhapsody"?\n\nThis competency question is related to the property :performedWith, which connects a :MusicalWork to an :Instrument. The answer to this question would be :Guitar.\n\n3. What genre of music does the composer :FreddieMercury typically compose?\n\nThis competency question is related to the property :composesMusicOf, which connects a :Composer to a :MusicalGenre. The answer to this question would be :Rock.'}


In [10]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# Token read from the HF_TOKEN environment variable (assumed to be set)
login(os.environ["HF_TOKEN"])

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": verb},
    {"role": "user", "content": "Who composed the musical work Symphony No. 5?"},
]

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][-1])

{'role': 'assistant', 'content': 'According to the ontology, Ludwig Van Beethoven composed the musical work Symphony No. 5.'}
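
The pipeline is given a list of chat messages and the printed result is the last message, a dictionary with `role` and `content` keys. A small sketch of building that list and extracting only the answer text follows; `build_messages` and `answer_text` are hypothetical helpers, and the `reply` value is copied from the output above.

```python
def build_messages(ontology_text, question):
    """System message carries the verbalised ontology, user message the question."""
    return [
        {"role": "system", "content": ontology_text},
        {"role": "user", "content": question},
    ]


def answer_text(last_message):
    """Return just the text of an assistant message dictionary."""
    if last_message.get("role") != "assistant":
        raise ValueError("expected an assistant message")
    return last_message["content"].strip()


reply = {
    "role": "assistant",
    "content": "According to the ontology, Ludwig Van Beethoven composed the musical work Symphony No. 5.",
}
print(answer_text(reply))
```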


*Function to answer questions about the ontology*

In [21]:
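def questionAnswering(ontology, question):
    """Ask the LLM a question, with the verbalised ontology as system prompt."""
    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    # Token read from the HF_TOKEN environment variable (assumed to be set)
    login(os.environ["HF_TOKEN"])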

    pipeline = transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )

    messages = [
        {"role": "system", "content": ontology},
        {"role": "user", "content": question},
    ]

    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]

    outputs = pipeline(
        messages,
        max_new_tokens=256,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
    )
    return outputs[0]["generated_text"][-1]
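
`questionAnswering` builds a new pipeline on every call, so the checkpoint shards are loaded again for each question. One possible alternative, sketched below, is to cache the loaded pipeline per model id. `load_pipeline`, `cached` and `get_pipeline` are hypothetical names; the `transformers.pipeline` arguments are the ones already used in this notebook.

```python
from functools import lru_cache


def load_pipeline(model_id):
    """Build the text-generation pipeline with the notebook's settings."""
    # Heavy imports are deferred until the first real load
    import torch
    import transformers
    return transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )


def cached(loader):
    """Wrap loader so each distinct argument is loaded only once."""
    return lru_cache(maxsize=None)(loader)


# Repeated calls with the same model id return the same pipeline object
get_pipeline = cached(load_pipeline)
```

Inside `questionAnswering`, the pipeline would then be obtained with `get_pipeline(model_id)` instead of being rebuilt.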


*Competency Question 1*

In [26]:
with open('competency-questions/CQ1.txt','r',encoding='utf-8') as file:
    question1=file.read()
response=questionAnswering(verb,question1)
print(question1)
print(response)


*Competency Question 2*

In [24]:
with open('competency-questions/CQ2.txt','r',encoding='utf-8') as file:
    question2=file.read()
response=questionAnswering(verb,question2)
print(question2)
print(response)


KeyboardInterrupt: 

*Competency Question 3*

In [None]:
with open('competency-questions/CQ3.txt','r',encoding='utf-8') as file:
    question3=file.read()
response=questionAnswering(verb,question3)
print(question3)
print(response)
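
The three competency-question cells differ only in the file they read. A sketch of reading every question file in one pass is given below; `load_questions` is a hypothetical helper, and the directory name and `CQ*.txt` pattern are taken from the paths used above. Files are sorted by name, so `CQ10` would come before `CQ2` if there were that many.

```python
from pathlib import Path


def load_questions(directory="competency-questions", pattern="CQ*.txt"):
    """Return {file stem: question text} for every matching file."""
    questions = {}
    for path in sorted(Path(directory).glob(pattern)):
        questions[path.stem] = path.read_text(encoding="utf-8").strip()
    return questions


# Intended use in the notebook (not executed here):
# for name, question in load_questions().items():
#     print(name, question)
#     print(questionAnswering(verb, question))
```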