This repo contains a Machine Learning-based methodology for the preliminary design of a risk calculator using medical tabular databases, combining the knowledge of different clinically validated cardiovascular risk calculators using Transfer Learning (TL).

antorguez95/CVD_risk_and_TL

Using Transfer Learning and Machine Learning to design a Cardiovascular Diseases risk calculator

What's in this repository?

This repository contains the code for our work presented at the 26th Euromicro Conference Series on Digital System Design (DSD) in Durres, Albania, in September 2023: "Novel Approach for AI-based Risk Calculator Development using Transfer Learning Suitable for Embedded Systems". This work presents a Machine Learning (ML)-based methodology for the preliminary design of a risk calculator using medical tabular databases, combining the knowledge of different clinically validated cardiovascular risk calculators through Transfer Learning (TL). The aim is a more personalized noncommunicable disease (NCD) risk estimation than current regression-based approaches provide. This work is part of the WARIFA European Project, whose main objective is to develop an AI-based application for the prevention and management of chronic conditions, such as Diabetes Mellitus or Cardiovascular Diseases (CVD), by providing personalized recommendations based on the subject and the variables collected from them. In addition, a preliminary high-level performance profiling was carried out to estimate the feasibility of implementing this ML-based calculator on a microcontroller.

The contents of the scripts are described below:

  • Framingham_utils.py and Steno_utils.py: data curation and preparation of the datasets.
  • exploratory_data_analysis.py: exploratory data analysis.
  • model_evaluation.py: model evaluation functions for the selected ML models.
  • train_utils.py: functions to train the models.
  • profiling.py: profiling functions extracted from this example.
  • constants.py: directory names, file names, dataset names, and the lists of numerical and categorical features. MUST BE CHANGED TO MATCH YOUR OWN PATHS, FILES, ETC.!
  • steno2fram.ipynb and fram2steno.ipynb: the Python notebooks that contain the framework itself. The former takes the Steno database as reference, and the latter takes the Framingham dataset.
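As a rough illustration of the kind of content constants.py holds, the sketch below shows one possible layout. Every name and value in it (DATA_DIR, FRAMINGHAM_FILE, STENO_FILE, the feature lists, and the two helper functions) is a hypothetical placeholder and is not taken from the actual file in this repository; check the real constants.py for the names the notebooks expect.

```python
from pathlib import Path

# NOTE: all names and values below are hypothetical placeholders,
# NOT the ones defined in the repository's constants.py.
DATA_DIR = Path("/path/to/your/data")      # directory holding both datasets
FRAMINGHAM_FILE = "framingham.csv"         # placeholder file name
STENO_FILE = "steno.csv"                   # placeholder file name
DATASET_NAMES = ("Framingham", "Steno")

# Placeholder feature lists; the real ones depend on the datasets you obtain.
NUMERICAL_FEATURES = ["age", "systolic_bp"]
CATEGORICAL_FEATURES = ["sex", "smoker"]


def dataset_path(file_name):
    """Return the full path of a dataset file inside DATA_DIR."""
    return DATA_DIR / file_name


def missing_files():
    """Return the configured dataset paths that do not exist on disk."""
    candidates = (dataset_path(FRAMINGHAM_FILE), dataset_path(STENO_FILE))
    return [path for path in candidates if not path.exists()]
```

Calling missing_files() before opening a notebook is a quick way to confirm that the paths were actually updated.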

Please cite our paper if this framework somehow helped you in your research and/or development work, or if you used this piece of code:

A. J. Rodríguez-Almeida, H. Fabelo, C. Soguero-Ruiz, R. M. Sanchez-Hernandez, A. M. Wägner and G. M. Callico, "Novel Approach for AI-Based Risk Calculator Development Using Transfer Learning Suitable for Embedded Systems," 2023 26th Euromicro Conference on Digital System Design (DSD), Golem, Albania, 2023, pp. 103-110, doi: 10.1109/DSD60849.2023.00024.

Datasets Availability

Both datasets are available upon request to their authors (see references [5] and [6] in the paper for the availability of the Steno and Framingham datasets, respectively).

Requirements to run this code

This code was developed with Python 3.8.13, with ipykernel installed to run the framework from Jupyter Notebooks, so your development environment must support running notebooks.
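A quick way to check these requirements is sketched below. The function name check_environment and the minimum version of (3, 8) are assumptions made for this illustration; the code was developed with Python 3.8.13, and other versions have not been stated as tested.

```python
import sys
from importlib.util import find_spec


def check_environment(min_version=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed.

    min_version is an assumption for this sketch, not an official requirement.
    """
    problems = []
    if sys.version_info < min_version:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_version)
        problems.append("Python " + found + " is older than " + wanted)
    # find_spec returns None when the package cannot be imported.
    if find_spec("ipykernel") is None:
        problems.append("ipykernel is not installed")
    return problems
```

Printing the result of check_environment() in the interpreter you plan to use for the notebooks shows whether anything is missing.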

How do I run these scripts?

After changing the paths, file names, etc. in constants.py to match your own setup, you just have to run one of the .ipynb files in the development environment you use.
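If you prefer to run a notebook without opening it in an editor, one option is the jupyter nbconvert command-line tool. The sketch below wraps it from Python; it assumes that nbconvert is installed alongside ipykernel, which is not listed in the requirements above, and the helper names build_command and run_notebook are hypothetical.

```python
import subprocess


def build_command(notebook):
    """Build the nbconvert command that executes a notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",   # keep the output as a notebook
        "--execute",          # run all cells
        "--inplace",          # overwrite the input file with the executed version
        notebook,
    ]


def run_notebook(notebook):
    """Execute the notebook; raises CalledProcessError if any cell fails."""
    subprocess.run(build_command(notebook), check=True)
```

For example, run_notebook("steno2fram.ipynb") would execute the notebook that takes the Steno database as reference.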

Generated results

The execution of each .ipynb file generates the EDA and results folders. The EDA folder mainly stores the histograms of the continuous variables of the datasets, which visually demonstrate the heterogeneity of both datasets. The results folder contains an Excel file with the pre-TL and post-TL results, including the classification confusion matrices. Please refer to our paper for a more detailed analysis of the obtained results.
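To confirm that a run produced its outputs, a small helper such as the one below can list what ended up in each folder. This is only a sketch: it assumes the EDA and results folders are created inside the directory you pass in, which may differ depending on how constants.py is configured, and the function name list_outputs is hypothetical.

```python
from pathlib import Path

# Folder names as described in this README; their location is an assumption.
OUTPUT_FOLDERS = ("EDA", "results")


def list_outputs(base_dir="."):
    """Map each output folder to the sorted names of the entries it contains.

    A folder that does not exist yet is reported as an empty list.
    """
    base = Path(base_dir)
    summary = {}
    for folder in OUTPUT_FOLDERS:
        path = base / folder
        if path.is_dir():
            summary[folder] = sorted(entry.name for entry in path.iterdir())
        else:
            summary[folder] = []
    return summary
```

Two empty lists after running a notebook usually mean the folders were written somewhere other than the directory you checked.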

Learn more

For any other questions related to the code or the proposed framework itself, you can open an issue on this repository or contact me via email.