darrylteo/autoKE

Semi-Automated Knowledge Engineering and Process Mapping for Total Airport Management

This repository provides a framework for scaffolded symbolic fusion, bridging the gap between unstructured aviation documentation and machine-readable Knowledge Graphs (KGs). By leveraging expert-curated ontological structures to guide Large Language Model (LLM) extraction, this pipeline enables the automated synthesis of domain-grounded operational models with high semantic fidelity and verifiable data provenance.


Workflow Overview

The pipeline transitions from unstructured technical corpora to deterministic process models through three primary stages:

  1. Schema-Guided Extraction: Utilizing the LangExtract library to perform document-level inference constrained by formal class hierarchies and expert-curated KGs.
  2. Symbolic Synthesis: Mapping extracted semantic triples into an OWL-formatted KG.
  3. Process Visualization: Automated derivation of standardized swimlane diagrams from the synthesized KG for operational validation and stakeholder alignment.
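As an illustration of the third stage, the sketch below groups triples into one lane per acting party, which is the basic idea behind a swimlane view. It is a minimal sketch under stated assumptions, not the repository's implementation: the triples are made-up examples, and `group_into_lanes` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (actor, action, target) triples for illustration only;
# they are not taken from the repository's extracted data.
triples = [
    ("Aircraft Operator", "submits", "Flight Plan"),
    ("Ground Handler", "updates", "TOBT"),
    ("Aircraft Operator", "confirms", "TOBT"),
]


def group_into_lanes(triples):
    """Group (subject, predicate, object) triples into one lane per subject.

    Steps keep the order in which they appear in the input, so the same
    input always yields the same lanes.
    """
    lanes = defaultdict(list)
    for subject, predicate, obj in triples:
        lanes[subject].append(f"{predicate} {obj}")
    return dict(lanes)


lanes = group_into_lanes(triples)
for actor, steps in lanes.items():
    print(f"{actor}: {' -> '.join(steps)}")
```

Treating the triple subject as the lane owner is an assumption made here for simplicity; a real process model may need an explicit actor class from the ontology.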

Repository Structure

./data/

Contains the source documentation and text conversions used for extraction.

  • milestones.pdf: The original source document (Eurocontrol A-CDM manual).
  • milestones.txt: The full source document converted from PDF to text via PDF24.
  • milestone{i}.txt: Segmented text files used for modular processing.
  • extracted_triples_{i}.csv: Raw semantic triples generated by the LLM. Includes columns for data provenance (Source Text column) and expert-validated ground truth (Good column).
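The triples files can be filtered down to the expert-accepted rows before further processing. The following is a minimal sketch, assuming the file has `Subject`, `Predicate`, and `Object` columns and that accepted rows are marked with a value such as `1` in the Good column. Only the `Source Text` and `Good` column names come from this README; the other names, the sample rows, and the `load_validated` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for an extracted_triples_{i}.csv file.
# Only "Source Text" and "Good" are column names stated in the README.
SAMPLE = """Subject,Predicate,Object,Source Text,Good
Ground Handler,updates,TOBT,"The ground handler updates TOBT.",1
Pilot,paints,Runway,"(no supporting sentence)",0
"""


def load_validated(handle, truthy=("1", "true", "yes", "y")):
    """Return the rows whose Good column marks them as accepted.

    The set of accepted markers is an assumption; adjust it to match
    how the Good column is actually encoded.
    """
    reader = csv.DictReader(handle)
    return [row for row in reader if row["Good"].strip().lower() in truthy]


rows = load_validated(io.StringIO(SAMPLE))
for row in rows:
    # Keep the source sentence next to each triple to preserve provenance.
    print(row["Subject"], row["Predicate"], row["Object"], "|", row["Source Text"])
```

To read a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.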

./results/

Contains the output of the extraction process.

  • aviation_KG.owl: The constructed KG. Best viewed and edited using Protégé.
  • airport_swimlanes.drawio: The swimlane diagrams derived from the KG. Best viewed via draw.io.

./src/

  • get_triples_lang.py: The main extraction engine. It uses KG-guided prompts to constrain LLM extraction, supporting structural determinism and symbolic fusion. A page-level implementation is provided. Running it requires an LLM API key.
  • triples_to_KG_and_swimlanes.py: Python script that deterministically formalizes knowledge triples into a knowledge graph (.owl format) and generates swimlane diagrams (.drawio format).
  • utils/: Helper functions.
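To make the idea of deterministic formalization concrete, the sketch below serializes a few triples as OWL in RDF/XML using only the Python standard library. It is not the code in triples_to_KG_and_swimlanes.py: the namespace `http://example.org/aviation#`, the sample triples, and the helper names are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/aviation#"  # hypothetical ontology namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)
ET.register_namespace("av", BASE)


def local_name(label):
    """Turn a free-text label into an IRI-friendly local name."""
    return "_".join(label.split())


def triples_to_owl(triples):
    """Serialize (subject, predicate, object) triples as RDF/XML text."""
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": BASE.rstrip("#")})

    # Declare each predicate once, in sorted order, so output is stable.
    for pred in sorted({p for _, p, _ in triples}):
        ET.SubElement(
            root,
            f"{{{OWL}}}ObjectProperty",
            {f"{{{RDF}}}about": BASE + local_name(pred)},
        )

    # Create one individual per distinct subject or object, in input order,
    # and attach each property assertion to its subject.
    nodes = {}
    for subj, pred, obj in triples:
        for name in (subj, obj):
            if name not in nodes:
                nodes[name] = ET.SubElement(
                    root,
                    f"{{{OWL}}}NamedIndividual",
                    {f"{{{RDF}}}about": BASE + local_name(name)},
                )
        ET.SubElement(
            nodes[subj],
            f"{{{BASE}}}{local_name(pred)}",
            {f"{{{RDF}}}resource": BASE + local_name(obj)},
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical triples for illustration only.
sample = [("Ground Handler", "updates", "TOBT"), ("Aircraft Operator", "confirms", "TOBT")]
owl_xml = triples_to_owl(sample)
print(owl_xml)
```

Because no model call or random choice is involved, the same triples always produce the same text. Modelling every subject and object as a named individual is a simplification; an ontology with class hierarchies would also need class declarations and type assertions.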

requirements.txt: Contains a minimal installation list of required libraries.


Citation

If you use this code or data in your research, please cite:

@inproceedings{teo2026semi,
  title={Semi-Automated Knowledge Engineering and Process Mapping for Total Airport Management},
  author={Teo, Darryl and Sam, Adharsha and Koh, Chuan Shen Marcus and Nagi, Rakesh and Ribeiro, Nuno Antunes},
  booktitle={Proceedings of the 29th International Conference on Information Fusion (FUSION)},
  year={2026}
}

About

Supporting code for the paper: Semi-Automated Knowledge Engineering and Process Mapping for Total Airport Management
