demkg-framework/kg-transform

KG-Transform

This is the transformation module of the DemKG framework. Implemented as a Python package, it provides the functionality needed to transform your tabular research data into a Biolink-compliant graph in KGX format, following the design patterns and knowledge framework defined in DemKG.

KG-Transform is a low-code solution: it interprets a dataset descriptor that tells the transformer how to access the data values in your dataset and how to generate the KG entities and edges, using the mapping rules implemented for the design patterns. The descriptor provides information about the input data file, the columns containing the data values, and the mapping rules.

The supported research data artifacts are:

  • Subject data
  • Medical history
  • Physical examination
  • Cognitive screening
  • Diagnosis
  • Specimen assays (ELISA, proteomics, etc.)
  • Imaging analyses (Freesurfer, ASHS, etc.)

Each section has an associated entity ID and date. If your dataset does not provide one, it is autogenerated. The main aim of the transformer is to map data values from your dataset to concrete ontological concepts describing phenotypes and/or diseases. To do this, you describe the possible values and their target entities in value_mappings subsections. Mappings can be categorical (a direct match on a value) or numeric (based on defined cutoffs or ranges).

To find suitable ontology terms for your data, note that the DemKG framework is primarily based on the OBO ontologies and their principles and extensions. The following ontologies are a good starting point:
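
As a rough illustration of the two mapping styles described above, the sketch below resolves a raw value against a list of categorical and range-based mappings. Everything in it is an assumption made for illustration: the class names, the `resolve` helper, the inclusive/exclusive bound convention, and the cutoff of 24 are hypothetical and are not taken from the KG-Transform code or its descriptor schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


# Hypothetical types: not part of the kg-transform package.
@dataclass(frozen=True)
class CategoricalMapping:
    value: str   # raw value as it appears in the dataset
    target: str  # ontology term CURIE to emit on a match


@dataclass(frozen=True)
class RangeMapping:
    low: Optional[float]   # inclusive lower bound, None means unbounded
    high: Optional[float]  # exclusive upper bound, None means unbounded
    target: str            # ontology term CURIE to emit on a match


Mapping = Union[CategoricalMapping, RangeMapping]


def resolve(value: object, mappings: Sequence[Mapping]) -> Optional[str]:
    """Return the target of the first mapping that matches, or None."""
    for mapping in mappings:
        if isinstance(mapping, CategoricalMapping):
            # Categorical match: case-insensitive comparison of the raw value.
            if str(value).strip().lower() == mapping.value.lower():
                return mapping.target
            continue
        # Numeric match: skip values that cannot be read as a number.
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue
        above_low = mapping.low is None or number >= mapping.low
        below_high = mapping.high is None or number < mapping.high
        if above_low and below_high:
            return mapping.target
    return None


# A "yes" in a hypertension column maps to HP:0000822 (Hypertension).
hypertension = [CategoricalMapping("yes", "HP:0000822")]
# A screening score below an illustrative cutoff of 24 maps to
# HP:0100543 (Cognitive impairment).
screening = [RangeMapping(None, 24, "HP:0100543")]

print(resolve("Yes", hypertension))
print(resolve("21", screening))
print(resolve(28, screening))
```

Returning the first match keeps the behaviour predictable when ranges overlap; a real implementation might instead reject overlapping ranges when the descriptor is loaded.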

  • Human Phenotype Ontology (HP)
  • Monarch Disease Ontology (MONDO)
  • Phenotypic Quality Ontology (PATO)
  • Uber Anatomy Ontology (UBERON)
  • Foundational Model of Anatomy (FMA)
  • Protein Ontology (PR)
  • Gene Ontology (GO)
  • Neuropsychological Test Ontology (NPT)
  • Ontology for Biomedical Investigations (OBI)
  • Mass Spectrometry Ontology (MS)
  • Unit Ontology (UO)

To search for ontology terms, you can use the following tools:

Descriptor File

The descriptor file is a YAML file that specifies how to transform the data in your dataset file into a knowledge graph representation. It consists of a set of rules that define how to extract and transform the data. See descriptor-schema.yaml for details.
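
To make the idea of a rule-driven descriptor more concrete, here is a minimal sketch of how a descriptor could steer the processing of tabular rows. The descriptor is written as the Python dict a YAML loader would return, and every key name in it (`input_file`, `sections`, `id_column`, `rules`, `column`) is hypothetical, as is the `apply_descriptor` helper; only `value_mappings` is a name used in this README. The authoritative structure is the one in descriptor-schema.yaml.

```python
import csv
import io

# Hypothetical descriptor. The key names are illustrative only;
# descriptor-schema.yaml defines the real structure.
descriptor = {
    "input_file": "subjects.csv",
    "sections": [
        {
            "name": "medical_history",
            "id_column": "subject_id",
            "rules": [
                {
                    "column": "hypertension",
                    "value_mappings": {"yes": "HP:0000822"},
                },
            ],
        }
    ],
}


def apply_descriptor(descriptor, rows):
    """Yield (subject_id, ontology_term) for every rule that matches a row."""
    for row in rows:
        for section in descriptor["sections"]:
            subject = row[section["id_column"]]
            for rule in section["rules"]:
                # Normalise the raw cell before looking it up in the mappings.
                raw = (row.get(rule["column"]) or "").strip().lower()
                term = rule["value_mappings"].get(raw)
                if term is not None:
                    yield subject, term


# In-memory stand-in for the dataset file named in the descriptor.
sample = io.StringIO("subject_id,hypertension\nS1,yes\nS2,no\n")
pairs = list(apply_descriptor(descriptor, csv.DictReader(sample)))
print(pairs)
```

The point of the sketch is the division of labour: the dataset stays untouched, and the descriptor alone decides which columns are read and which terms are emitted.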

Installation

To install the package, you can use poetry:

poetry install

This will install the package and its dependencies. If you don't have poetry installed, you can follow the instructions on the official website to install it.

Alternatively, you can activate the project's virtual environment with poetry shell and run commands from there:

poetry shell

Usage

KG-Transform can be used from the Command Line Interface (CLI) or as a module in your code. In both cases, you have to provide the YAML descriptor file. The default is descriptor.yaml in your current working directory.

CLI

You can run a transformation directly from the command line inside the Poetry shell:

poetry shell
transform

To specify a non-default descriptor file:

transform -d /path/to/dataset-descriptor.yaml
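
The fallback to descriptor.yaml can be pictured with a small argparse sketch. This is not the package's actual CLI code: the `parse_args` function and the long option name `--descriptor` are assumptions made for illustration; only the `-d` flag and the default file name come from this README.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse a -d option that falls back to descriptor.yaml in the CWD."""
    parser = argparse.ArgumentParser(prog="transform")
    parser.add_argument(
        "-d",
        "--descriptor",  # hypothetical long form, not confirmed by the README
        type=Path,
        default=Path("descriptor.yaml"),
        help="path to the YAML dataset descriptor",
    )
    return parser.parse_args(argv)


# No flag: the relative default is resolved against the current directory.
print(parse_args([]).descriptor)
# Explicit descriptor path.
print(parse_args(["-d", "/path/to/dataset-descriptor.yaml"]).descriptor)
```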

As a module

To use the package in your own code, import the DemKGTransformer class and create an instance. The example below passes no arguments, so it relies on the default descriptor file:

from demkgtransformer import DemKGTransformer

transformer = DemKGTransformer()
transformer.transform()
transformer.save()

The transform method reads the descriptor file and applies the transformations it specifies. The save method writes the KGX graph to the specified output destination.
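
For orientation, a KGX graph in its TSV serialization is typically a nodes table (with columns such as id, category, and name) plus an edges table (with columns such as subject, predicate, and object). The sketch below builds a tiny example of both with the standard library. The `EX:subject-1` identifier, the choice of Biolink categories and predicate, and the `to_tsv` helper are illustrative assumptions; the files KG-Transform actually writes may use different categories and additional columns.

```python
import csv
import io

# Illustrative records only; not output produced by KG-Transform.
nodes = [
    {"id": "EX:subject-1", "category": "biolink:Case", "name": "Subject 1"},
    {"id": "HP:0000822", "category": "biolink:PhenotypicFeature", "name": "Hypertension"},
]
edges = [
    {"subject": "EX:subject-1", "predicate": "biolink:has_phenotype", "object": "HP:0000822"},
]


def to_tsv(records, columns):
    """Serialize a list of dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


nodes_tsv = to_tsv(nodes, ["id", "category", "name"])
edges_tsv = to_tsv(edges, ["subject", "predicate", "object"])
print(nodes_tsv)
print(edges_tsv)
```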

License

This project is licensed under the MIT License - see the LICENSE file for details.
