This is the transformation module of the DemKG framework. Implemented as a Python package, it provides the functionality needed to transform your tabular research data into a Biolink-compliant graph in KGX format, following the design patterns and knowledge framework defined in DemKG.
KG-Transform is a low-code solution: it interprets a dataset descriptor that tells the transformer how to access the data values in your dataset and how to generate the KG entities and edges, using the mapping rules implemented under the design patterns. The descriptor provides information about the input data file, the columns containing the data values, and the mapping rules.
The supported research data artifacts are:
- Subject data
- Medical history
- Physical examination
- Cognitive screening
- Diagnosis
- Specimen assays (ELISA, proteomics, etc.)
- Imaging analyses (Freesurfer, ASHS, etc.)
Each section has an associated entity ID and date. If your dataset does not provide one, it is autogenerated. The main aim of the transformer is to map data values from your dataset to concrete ontological concepts describing phenotypes and/or diseases. To do this, you describe the possible values and their target entities in value_mappings subsections. These mappings may be direct (categorical values) or numeric (based on defined cutoffs or ranges).
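The two kinds of mapping can be illustrated with a minimal Python sketch. It is not the package's implementation: the CategoricalRule and RangeRule classes, the map_value helper, and the EX: identifiers are all hypothetical names introduced for illustration, and the real rule syntax is the one defined in the descriptor schema.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class CategoricalRule:
    """Maps one exact source value to a target term (hypothetical structure)."""
    value: str
    target: str


@dataclass
class RangeRule:
    """Maps numbers in the half-open interval [low, high) to a target term."""
    target: str
    low: float = float("-inf")
    high: float = float("inf")


def map_value(raw: Union[str, float], rules: list) -> Optional[str]:
    """Return the target of the first rule that matches raw, or None."""
    for rule in rules:
        if isinstance(rule, CategoricalRule):
            if str(raw) == rule.value:
                return rule.target
        elif isinstance(rule, RangeRule):
            try:
                number = float(raw)
            except (TypeError, ValueError):
                continue  # non-numeric values never match a range rule
            if rule.low <= number < rule.high:
                return rule.target
    return None


# Placeholder CURIEs: replace them with real ontology terms for your data.
diagnosis_rules = [
    CategoricalRule("yes", "EX:0000001"),
    CategoricalRule("no", "EX:0000002"),
]
# A single cutoff at 24 expressed as two adjacent ranges.
score_rules = [
    RangeRule("EX:0000003", high=24.0),
    RangeRule("EX:0000004", low=24.0),
]

print(map_value("yes", diagnosis_rules))
print(map_value(21, score_rules))
```

Using half-open intervals means a value that sits exactly on a cutoff matches only one rule, so the order of the range rules does not matter.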
To find suitable ontology terms for your data, note that the DemKG framework is primarily based on the OBO ontologies and their principles and extensions. The following ontologies are a good starting point:
- Human Phenotype Ontology (HP)
- Monarch Disease Ontology (MONDO)
- Phenotypic Quality Ontology (PATO)
- Uber Anatomy Ontology (UBERON)
- Foundational Model of Anatomy (FMA)
- Protein Ontology (PR)
- Gene Ontology (GO)
- Neuropsychological Test Ontology (NPT)
- Ontology for Biomedical Investigations (OBI)
- Mass Spectrometry Ontology (MS)
- Unit Ontology (UO)
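Terms from these ontologies are usually written as CURIEs of the form PREFIX:local_id. The sketch below checks whether a mapping target uses one of the prefixes listed above. The helper names are hypothetical, the regular expression is a simplified assumption about CURIE shape rather than a full grammar, and the check says nothing about whether the term actually exists in the ontology.

```python
import re

# Prefixes taken from the list of suggested ontologies above.
SUGGESTED_PREFIXES = {
    "HP", "MONDO", "PATO", "UBERON", "FMA", "PR", "GO", "NPT", "OBI", "MS", "UO",
}

# Simplified pattern: a prefix, a colon, and a non-empty local identifier.
CURIE_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):(\S+)$")


def split_curie(curie: str):
    """Split 'PREFIX:local_id' into (prefix, local_id); raise ValueError otherwise."""
    match = CURIE_PATTERN.match(curie.strip())
    if match is None:
        raise ValueError(f"not a CURIE: {curie!r}")
    return match.group(1), match.group(2)


def uses_suggested_ontology(curie: str) -> bool:
    """Return True if the CURIE prefix is one of the suggested ontologies."""
    prefix, _ = split_curie(curie)
    return prefix.upper() in SUGGESTED_PREFIXES


print(split_curie("HP:0000726"))
print(uses_suggested_ontology("MONDO:0004975"))
```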
To search for ontology terms, you can use the following tools:
- OLS: https://www.ebi.ac.uk/ols4/
- BioPortal: https://bioportal.bioontology.org/
- Ontobee: http://www.ontobee.org/
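If you prefer to look terms up from a script, OLS also offers a REST search. The sketch below only builds the request URL and does not call the service. It assumes the OLS4 search endpoint at /api/search with the q, ontology, and rows query parameters; check the OLS documentation before relying on them. The build_ols_search_url helper is a hypothetical name.

```python
from urllib.parse import urlencode

# Assumed OLS4 search endpoint; verify against the OLS API documentation.
OLS_SEARCH_ENDPOINT = "https://www.ebi.ac.uk/ols4/api/search"


def build_ols_search_url(query, ontology=None, rows=10):
    """Build a search URL for a free-text query, optionally limited to one ontology."""
    params = {"q": query, "rows": rows}
    if ontology:
        # Ontology identifiers are assumed to be lowercase in OLS (for example "hp").
        params["ontology"] = ontology.lower()
    return f"{OLS_SEARCH_ENDPOINT}?{urlencode(params)}"


print(build_ols_search_url("memory impairment", ontology="HP"))
```

You can paste the printed URL into a browser or fetch it with any HTTP client to inspect the JSON response.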
The descriptor file is a YAML file that specifies how to transform the data from your dataset file into a Knowledge Graph representation. It consists of a set of rules that define how to extract and transform the data. See descriptor-schema.yaml for details.
To install the package, you can use poetry:
poetry install
This will install the package and its dependencies. If you don't have poetry installed, you can follow the instructions on the official website to install it.
Alternatively, you can work inside the poetry shell and execute commands from there:
poetry shell
KG-Transform can be used through its Command Line Interface (CLI) or as a module in your code. In both cases, you have to provide the YAML descriptor file. The default is descriptor.yaml in your current working directory (CWD).
You can run a transformation directly from the CLI with poetry:
poetry shell
transform
To specify a non-default descriptor file:
transform -d /path/to/dataset-descriptor.yaml
To use the package in your code, import the DemKGTransformer class and create an instance with the path to the descriptor file (the example below passes no path and relies on the default):
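from demkgtransformer import DemKGTransformer
# Create the transformer
transformer = DemKGTransformer()
# Read the descriptor file and apply the transformations it specifies
transformer.transform()
# Write the resulting KGX graph to the output destination
transformer.save()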
The transform method reads the descriptor file and applies the transformations it specifies. The save method writes the KGX graph to the specified output destination.
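For orientation, KGX graphs are commonly serialized as two tab-separated files, one for nodes and one for edges. The sketch below shows that general shape using only the Python standard library. It is not the output of this package: the to_tsv helper, the EX: identifiers, and the choice of Biolink category and predicate are illustrative assumptions, and the files written by save may contain additional columns.

```python
import csv
import io

# Core columns of a KGX-style TSV serialization (real files may carry more).
NODE_COLUMNS = ["id", "category", "name"]
EDGE_COLUMNS = ["subject", "predicate", "object"]


def to_tsv(rows, columns):
    """Serialize a list of dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder identifiers and names, chosen only for illustration.
nodes = [
    {"id": "EX:subject-1", "category": "biolink:Case", "name": "subject 1"},
    {"id": "EX:0000001", "category": "biolink:PhenotypicFeature", "name": "placeholder phenotype"},
]
edges = [
    {"subject": "EX:subject-1", "predicate": "biolink:has_phenotype", "object": "EX:0000001"},
]

print(to_tsv(nodes, NODE_COLUMNS))
print(to_tsv(edges, EDGE_COLUMNS))
```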
This project is licensed under the MIT License - see the LICENSE file for details.