This repository provides a framework for scaffolded symbolic fusion, bridging the gap between unstructured aviation documentation and machine-readable Knowledge Graphs (KGs). By leveraging expert-curated ontological structures to guide Large Language Model (LLM) extraction, this pipeline enables the automated synthesis of domain-grounded operational models with high semantic fidelity and verifiable data provenance.
The pipeline transitions from unstructured technical corpora to deterministic process models through three primary stages:
- Schema-Guided Extraction: Utilizing the LangExtract library to perform document-level inference constrained by formal class hierarchies and expert-curated KGs.
- Symbolic Synthesis: Mapping extracted semantic triples into an OWL-formatted KG.
- Process Visualization: Automated derivation of standardized swimlane diagrams from the synthesized KG for operational validation and stakeholder alignment.
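
As a rough illustration of the third stage, the sketch below groups activities into one lane per actor and orders them by precedence. It is not the repository's implementation: the predicate names `performs` and `precedes`, the sample triples, and the function names are all hypothetical, and the real KG vocabulary may differ.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object). The predicates and the
# activities are made up for illustration; they are not taken from the KG.
TRIPLES = [
    ("Ground Handler", "performs", "Start Boarding"),
    ("Aircraft Operator", "performs", "File Flight Plan"),
    ("Ground Handler", "performs", "Close Doors"),
    ("File Flight Plan", "precedes", "Start Boarding"),
    ("Start Boarding", "precedes", "Close Doors"),
]


def order_activities(triples):
    """Topologically order activities using the 'precedes' triples."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    activities = set()
    for s, p, o in triples:
        if p == "performs":
            activities.add(o)
        elif p == "precedes":
            successors[s].append(o)
            indegree[o] += 1
            activities.update((s, o))
    # Sorting the ready list keeps the result deterministic.
    ready = sorted(a for a in activities if indegree[a] == 0)
    ordered = []
    while ready:
        current = ready.pop(0)
        ordered.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
        ready.sort()
    return ordered


def build_lanes(triples):
    """Return {actor: [activities in precedence order]}, one entry per lane."""
    position = {a: i for i, a in enumerate(order_activities(triples))}
    lanes = defaultdict(list)
    for s, p, o in triples:
        if p == "performs":
            lanes[s].append(o)
    # Activities caught in a precedence cycle have no position and go last.
    last = len(position)
    return {
        actor: sorted(acts, key=lambda a: position.get(a, last))
        for actor, acts in sorted(lanes.items())
    }


lanes = build_lanes(TRIPLES)
for actor, acts in lanes.items():
    print(f"{actor}: {' -> '.join(acts)}")
```

Sorting both the lanes and the ready queue means the same set of triples always yields the same layout, which is the property a deterministic diagram generator needs.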
Contains the source documentation and text conversions used for extraction.
- milestones.pdf: The original source document (Eurocontrol A-CDM manual).
- milestones.txt: The full source document converted from PDF to text via PDF24.
- milestone{i}.txt: Segmented text files used for modular processing.
- extracted_triples_{i}.csv: Raw semantic triples generated by the LLM. Includes columns for data provenance (Source Text column) and expert-validated ground truth (Good column).
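
A minimal sketch of how the expert-validated rows of an extracted triples file might be selected while keeping their provenance. Only the `Source Text` and `Good` column names come from the description above; the `Subject`, `Predicate` and `Object` column names, the sample rows, and the set of values treated as "validated" are assumptions and may not match the actual files.

```python
import csv
import io

# Stand-in for one extracted_triples_{i}.csv file. Only "Source Text" and
# "Good" are documented column names; everything else here is assumed.
SAMPLE_CSV = """Subject,Predicate,Object,Source Text,Good
Aircraft Operator,files,Flight Plan,"The aircraft operator files the flight plan.",1
Flight Plan,owns,Runway,"The flight plan is filed before departure.",0
"""

# Assumed encodings of a validated row; adjust to the real Good column values.
TRUTHY = {"1", "true", "yes", "y"}


def load_validated(handle):
    """Return the rows whose Good column marks them as expert-validated."""
    reader = csv.DictReader(handle)
    return [
        row for row in reader
        if (row.get("Good") or "").strip().lower() in TRUTHY
    ]


validated = load_validated(io.StringIO(SAMPLE_CSV))
for row in validated:
    # Each kept triple still carries the sentence it was extracted from.
    print(row["Subject"], row["Predicate"], row["Object"], "<-", row["Source Text"])
```

For a file on disk, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`.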
Contains the output of the extraction process.
- aviation_KG.owl: The constructed KG. Best viewed and edited using Protégé.
- airport_swimlanes.drawio: The swimlane diagrams derived from the KG. Best viewed via draw.io.
- get_triples_lang.py: The main extraction engine. It implements KG-guided prompt extraction to ensure structural determinism and symbolic fusion. A page-level implementation is provided. Requires an LLM API key to run.
- triples_to_KG_and_swimlanes.py: Python script that deterministically formalizes knowledge triples into a knowledge graph (.owl format) and generates swimlane diagrams (.drawio format).
- utils/: Helper functions.
- requirements.txt: A minimal list of the libraries required for installation.
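
To make "deterministically formalize" concrete, here is a standard-library sketch that serializes triples as a small OWL document in Turtle, one of the RDF syntaxes in which OWL ontologies can be written. It does not reproduce triples_to_KG_and_swimlanes.py: the namespace, the function names, and the choice to model every subject and object as a named individual are assumptions made for illustration.

```python
import re

# Placeholder namespace, not the project's actual ontology IRI.
BASE = "http://example.org/aviation#"


def local_name(label):
    """Turn a free-text label into a Turtle-safe local name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")


def triples_to_turtle(triples, base=BASE):
    """Serialize (subject, predicate, object) triples as OWL in Turtle syntax."""
    individuals = sorted({local_name(x) for s, _, o in triples for x in (s, o)})
    properties = sorted({local_name(p) for _, p, _ in triples})
    statements = sorted(
        {(local_name(s), local_name(p), local_name(o)) for s, p, o in triples}
    )
    lines = [
        f"@prefix ex: <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
    ]
    # Declarations first, then assertions; everything is sorted so that the
    # same input set always produces byte-identical output.
    lines += [f"ex:{p} a owl:ObjectProperty ." for p in properties]
    lines += [f"ex:{i} a owl:NamedIndividual ." for i in individuals]
    lines += [f"ex:{s} ex:{p} ex:{o} ." for s, p, o in statements]
    return "\n".join(lines) + "\n"


# Hypothetical example triple, not taken from the repository's data.
print(triples_to_turtle([("Ground Handler", "performs", "Start Boarding")]))
```

Because declarations and assertions are deduplicated and sorted before writing, the output does not depend on the order in which the LLM emitted the triples.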
If you use this code or data in your research, please cite:
@inproceedings{teo2026semi,
  title={Semi-Automated Knowledge Engineering and Process Mapping for Total Airport Management},
  author={Teo, Darryl and Sam, Adharsha and Koh, Chuan Shen Marcus and Nagi, Rakesh and Ribeiro, Nuno Antunes},
  booktitle={Proceedings of the 29th International Conference on Information Fusion (FUSION)},
  year={2026}
}