llm4ke

Repository for Large Language Models for Knowledge Engineering (LLM4KE).

Objectives

Original idea:

To what extent can an LLM co-contribute to the knowledge engineering process alongside our usual methodology (competency questions, ontology re-use, authoring tests, etc.)?

Set of questions we could investigate:

  1. Could an LLM reverse engineer an ontology and identify potential competency questions?
  2. Could an LLM take as input the CQs and generate parts of the ontology?
  3. Could an LLM take as input the CQs and extend an existing ontology?
  4. Could an LLM write an authoring test (a SPARQL query) given the ontology and the CQ? (See the sketch after this list.)
  5. Could an LLM take as input the CQs and generate abstract patterns?
  6. Given a dataset and an ontology, is an LLM able to generate an adequate set of RML rules for data ingestion?
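
To make question 4 concrete, here is a toy illustration (not taken from this repository) of what an authoring test looks like: a SPARQL query that verifies the ontology can answer a given CQ. The CQ, query, and file path below are all hypothetical.

    # Hypothetical CQ: "Which classes does the ontology define?"
    # The authoring test is a SPARQL query that must return a
    # non-empty result when run against the ontology.
    from rdflib import Graph

    AUTHORING_TEST = """
    SELECT ?class WHERE { ?class a <http://www.w3.org/2002/07/owl#Class> . }
    """

    g = Graph()
    g.parse("data/Odeuropa/dm/ontology.ttl")  # hypothetical file name
    results = list(g.query(AUTHORING_TEST))
    print(f"Test {'passed' if results else 'failed'}: {len(results)} classes found")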

The content of this code repository accompanies the research project explained in the following paper:

@inproceedings{llm4ke-2024,
  title     = {{Can LLMs Generate Competency Questions?}},
  author    = {Rebboud, Youssra and Tailhardat, Lionel and Lisena, Pasquale and Troncy, Rapha\"el},
  booktitle = {Semantic Web -- 21st International Conference (ESWC), LLMs for KE Track, Hersonissos, Crete, Greece, May 26--30, 2024},
  year      = {2024}
}

Usage

See the repository structure below for navigating this repository:

llm4ke
├───data <Reference data models with their related components>
│   └─[DataModelName]
│     ├─dm <data model implementation>
│     ├─rq <set of queries>
│     └─...
├───src <Processing pipeline code>
└───...

Generating Competency Questions

We will now address research question 1 from above: "Could an LLM reverse engineer an ontology and identify potential competency questions?"

The pipeline uses LangChain and, in particular, Ollama.

  • Install Ollama from its website.
  • Install requirements
    pip install -r requirements.txt
  • Download the desired LLM (see Ollama's full list of available models)
    ollama pull llama2
  • Run the pipeline to generate Competency Questions for a given ontology
    # Canonical form:
    # python src/main.py <task> --name <OntologyName> --input <OntologyFolder> --llm <ModelName>
    
    # Basic example for the Odeuropa ontology:
    python src/main.py all_classes --name Odeuropa --input ./data/Odeuropa/ --llm llama2
    Then browse the results in the out/Odeuropa/ directory. You can get the full list of available parameters with python src/main.py --help. A sketch of the underlying LLM call follows below.
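
Under the hood, the generation step boils down to prompting a local Ollama model through LangChain. The following is a minimal sketch, assuming langchain-community is installed and an Ollama server is running; the prompt and file path are illustrative, not the actual ones used in src/main.py.

    from langchain_community.llms import Ollama

    llm = Ollama(model="llama2")  # the model pulled with `ollama pull llama2`

    # Illustrative prompt; src/main.py builds its own prompts per task.
    with open("data/Odeuropa/dm/ontology.ttl") as f:  # hypothetical path
        ontology = f.read()
    prompt = (
        "Given the following ontology, propose five competency questions "
        "that it should be able to answer:\n\n" + ontology
    )
    print(llm.invoke(prompt))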

Evaluating the LLM's Competency Questions

Using the output data from the Generating Competency Questions step above:

  • Run the evaluation pipeline to compute similarity scores for all ontologies or a given ontology
    # Canonical form:
    # python src/eval.py <all|OntologyName>
    
    # Basic example for the Odeuropa ontology with a 0.8 similarity threshold and verbose logging:
    python3 ./src/eval.py Odeuropa -t 0.8 --log 10
    Then browse the results in the ./results_<all|OntologyName>.json file.
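
For reference, a common way to compute such similarity scores is cosine similarity over sentence embeddings, for example with sentence-transformers. The sketch below only illustrates the idea; the actual metric and model used by src/eval.py may differ.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

    generated = "What smells are mentioned in a historical text?"  # LLM-generated CQ
    reference = "Which odours does a given document describe?"     # reference CQ

    embeddings = model.encode([generated, reference])
    score = util.cos_sim(embeddings[0], embeddings[1]).item()
    print(score, "match" if score >= 0.8 else "no match")  # 0.8 mirrors -t above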

Copyright

Copyright (c) 2023, EURECOM. All rights reserved.

License

Apache License.

Maintainer