A Python tool with a graphical user interface (GUI) for mapping datasets to the Common Data Elements (CDEs) metadata schema of a Medical Informatics Platform (MIP) federation. It is developed to support members of a MIP federation in mapping their datasets to the CDEs schema of that federation. This project is distributed under the Apache 2.0 open-source license (see LICENSE for more details).
- Create your installation directory, go to this directory, and create a new virtual Python 3.9 environment:
$ mkdir -p "/installation/directory"
$ cd "/installation/directory"
$ virtualenv venv -p python3.9
- Activate the environment and install the package, at a specific version, directly from GitHub with Pip:
$ source ./venv/bin/activate
(venv)$ pip install -r https://raw.githubusercontent.com/HBPMedical/mip-dmp/main/requirements.txt
(venv)$ pip install git+https://github.com/HBPMedical/mip-dmp.git@0.0.5
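To confirm that the installation succeeded, one option is to query the installed version from Python. The sketch below uses only the standard library. The distribution name mip_dmp is an assumption based on the package name used in this README, and installed_version is a hypothetical helper, not part of the package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is registered as "mip_dmp".
    # Adjust the name if `pip list` reports a different one.
    version = installed_version("mip_dmp")
    if version is None:
        print("mip_dmp is not installed in this environment")
    else:
        print(f"mip_dmp {version} is installed")
```

Run it inside the activated virtual environment so that it inspects the same site-packages directory that pip installed into.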
- Clone the Git repository in your preferred directory:
$ cd "/preferred/directory"
$ git clone git@github.com:HBPMedical/mip-datatools.git
- Go to the cloned repository and create a new virtual Python 3.9 environment:
$ cd mip-datatools
$ virtualenv venv -p python3.9
- Activate the environment and install the package with Pip:
$ source ./venv/bin/activate
(venv)$ pip install -r requirements.txt
(venv)$ pip install -e .
You can use the installed mip_dataset_mapper_ui script to start the MIP Dataset Mapper UI application.
Usage
In a terminal, you can launch it with the following command:
$ mip_dataset_mapper_ui
This displays the main window of the MIP Dataset Mapper UI application, which consists of four main components arranged in a grid layout, as shown in the screenshot below.
Mapping a dataset consists of the following steps:
- Load an input dataset in .csv format (top left)
- Load a CDEs schema in .xlsx format (bottom left)
- Edit the columns / CDEs mapping table (top right)
- Configure output directory / filename and create the output CSV dataset mapped to the CDEs schema (bottom right)
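The core idea behind these steps can be illustrated with a minimal sketch that renames source columns to CDE codes and drops unmapped columns. This is not the tool's implementation: the column names, the CDE codes, and the flat dictionary used as a mapping are all hypothetical, and the JSON mapping file produced by the application may have a different structure.

```python
import csv
import io


def map_columns(source_csv: str, mapping: dict) -> str:
    """Rename mapped source columns to their CDE codes and drop the others."""
    reader = csv.DictReader(io.StringIO(source_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(mapping.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Keep only the mapped columns, keyed by their CDE code.
        writer.writerow({cde: row[col] for col, cde in mapping.items()})
    return out.getvalue()


# Hypothetical input dataset and column-to-CDE mapping.
source = "Age,Sex,Notes\n71,F,none\n64,M,follow-up\n"
mapping = {"Age": "age_value", "Sex": "gender"}

mapped = map_columns(source, mapping)
print(mapped)
```

Here the Notes column has no CDE counterpart, so it does not appear in the output.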
You can use the installed mip_dataset_mapper script to start the command-line interface of the MIP Dataset Mapper.
Usage
usage: mip_dataset_mapper [-h] --source_dataset SOURCE_DATASET --mapping_file MAPPING_FILE --cdes_file CDES_FILE --target_dataset TARGET_DATASET

Map a source dataset to a target dataset given a mapping file in JSON format generated by the MIP Dataset Mapper UI application (mip_dataset_mapper_ui).

optional arguments:
  -h, --help            show this help message and exit
  --source_dataset SOURCE_DATASET
                        Source dataset file in CSV format.
  --mapping_file MAPPING_FILE
                        Source Dataset Columns / Common data elements (CDEs) mapping file in JSON format. The mapping file can be generated by the MIP Dataset Mapper UI application.
  --cdes_file CDES_FILE
                        Common data elements (CDEs) metadata schema file in EXCEL format.
  --target_dataset TARGET_DATASET
                        Path to the target / output dataset file in CSV format.
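When the mapping has to be applied to several datasets, the command can also be assembled and launched from a Python script. The sketch below only uses the four flags listed in the help text above. The helper names build_mapper_command and run_mapper and all file paths are hypothetical.

```python
import shlex
import subprocess


def build_mapper_command(source_dataset, mapping_file, cdes_file, target_dataset):
    """Assemble the mip_dataset_mapper call as an argument list."""
    return [
        "mip_dataset_mapper",
        "--source_dataset", source_dataset,
        "--mapping_file", mapping_file,
        "--cdes_file", cdes_file,
        "--target_dataset", target_dataset,
    ]


def run_mapper(command):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True)


# Hypothetical file paths; replace them with your own.
command = build_mapper_command(
    "data/source.csv", "data/mapping.json", "data/cdes.xlsx", "out/target.csv"
)
print(shlex.join(command))
# Call run_mapper(command) once the paths point to real files and the
# virtual environment containing mip_dataset_mapper is active.
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.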
If you are using the MIP Dataset Mapper (mip_dmp) in your work, please acknowledge this software with the following entry:
Tourbier, Sebastien, Schaffhauser, Birgit, & Ryvlin, Philippe. (2023). HBPMedical/mip-dmp: v0.0.7 (0.0.7). Zenodo. https://doi.org/10.5281/zenodo.8056371
This project received funding from the European Union's H2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 945539 (Human Brain Project SGA3, as part of the Medical Informatics Platform (MIP)).
Thanks goes to these wonderful people (emoji key):
- Sébastien Tourbier 🐛 💻 🎨 📖 💡 🤔 🚇 🚧 🧑🏫 👀
- BSchaffhauser 💵 🔍
This project follows the all-contributors specification. Contributions of any kind welcome!