The FusionGDA model uses a fusion module to enrich the gene and disease semantic representations encoded by pre-trained language models for gene-disease association (GDA) prediction.
# Download the latest Anaconda installer
wget https://repo.anaconda.com/archive/Anaconda3-latest-Linux-x86_64.sh
# Install Anaconda
bash Anaconda3-latest-Linux-x86_64.sh -b
# Clean up the installer to save space
rm Anaconda3-latest-Linux-x86_64.sh
# Add conda to PATH (batch mode installs to ~/anaconda3 by default, i.e. /root/anaconda3 for root)
export PATH="$HOME/anaconda3/bin:$PATH"
# Update Anaconda packages (-y skips the confirmation prompt)
conda update --all -y
# Install PyTorch and related libraries with CUDA support
# Note: replace 'cudatoolkit=11.3' with a version compatible with your NVIDIA driver
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
# Install other Python packages
pip install pytdc
pip install wandb
pip install lightgbm
pip install -U adapter-transformers
pip install pytorch-metric-learning
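After installation, a quick sanity check can save a failed run later. The sketch below is not part of the FusionGDA scripts. It defines two hypothetical helpers: `driver_cuda_version` reads `nvidia-smi` output and extracts the highest CUDA version the driver supports (the `cudatoolkit` version passed to conda should not exceed it), and `check_imports` tries to import each package. The module names are assumptions based on the pip packages above: `pytdc` imports as `tdc`, `adapter-transformers` as `transformers`, and `pytorch-metric-learning` as `pytorch_metric_learning`.

```bash
#!/usr/bin/env bash
# Post-install sanity check (a sketch; helper names are hypothetical).

# Extract the highest CUDA version supported by the NVIDIA driver from
# `nvidia-smi` output on stdin, e.g. "... CUDA Version: 11.4 ..." -> "11.4".
driver_cuda_version() {
  grep -oE 'CUDA Version: [0-9]+\.[0-9]+' | head -n 1 | awk '{print $3}'
}

# check_imports PYTHON MODULE...
# Prints "OK <module>" or "MISSING <module>" for each module and
# returns non-zero if any import fails.
check_imports() {
  local py="$1"; shift
  local status=0 mod
  for mod in "$@"; do
    if "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "OK $mod"
    else
      echo "MISSING $mod"
      status=1
    fi
  done
  return "$status"
}

# Example usage (requires the NVIDIA driver and the conda environment above):
#   nvidia-smi | driver_cuda_version
#   check_imports python torch tdc wandb lightgbm transformers pytorch_metric_learning
```

A `MISSING transformers` line usually means the `adapter-transformers` install failed, since that package provides the `transformers` module.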
Make sure you are in the ~/dpa_pretrain/scripts directory. Adjust the required parameters directly in the script, then run pre-training:
bash run_pretrain_gda_ml_adapter_infoNCE.sh
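Pre-training can run for a long time, so keeping a log is useful. The following is a minimal sketch of a hypothetical wrapper, `run_in_dir`, that checks the directory and script exist, runs the script from inside that directory, and copies its output to a log file with `tee`. It assumes nothing about the script's internals; `pipefail` makes the wrapper report the script's failure rather than `tee`'s success.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a script from its own directory and keep a log.
# Usage: run_in_dir DIR SCRIPT LOGFILE  (a relative LOGFILE is created inside DIR)
run_in_dir() {
  local dir="$1" script="$2" log="$3"
  if [ ! -d "$dir" ]; then
    echo "directory not found: $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/$script" ]; then
    echo "script not found: $dir/$script" >&2
    return 1
  fi
  # Subshell: the caller's working directory and shell options are unchanged.
  (
    set -o pipefail
    cd "$dir" && bash "$script" 2>&1 | tee "$log"
  )
}

# Example:
#   run_in_dir ~/dpa_pretrain/scripts run_pretrain_gda_ml_adapter_infoNCE.sh pretrain.log
```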
Fine-tuning on the TDC dataset:
bash run_finetune_gda_lightgbm_infoNCE_tdc.sh
Fine-tuning on the DisGeNET dataset:
bash run_finetune_gda_lightgbm_infoNCE.sh
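Since each dataset has its own fine-tuning script, the two runs can be scripted together. This is a sketch under stated assumptions: the dataset keys `tdc` and `disgenet` and the functions `finetune_script` and `run_finetune` are hypothetical, and the scripts are assumed to sit in the current directory. Runs are sequential and stop at the first failure.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from a dataset key to its fine-tuning script.
finetune_script() {
  case "$1" in
    tdc)      echo "run_finetune_gda_lightgbm_infoNCE_tdc.sh" ;;
    disgenet) echo "run_finetune_gda_lightgbm_infoNCE.sh" ;;
    *)        echo "unknown dataset: $1 (expected tdc or disgenet)" >&2; return 1 ;;
  esac
}

# Fine-tune on each requested dataset in order, stopping at the first failure.
run_finetune() {
  local ds script
  for ds in "$@"; do
    script="$(finetune_script "$ds")" || return 1
    echo "==> fine-tuning on $ds with $script"
    bash "$script" || return 1
  done
}

# Example (from the scripts directory):
#   run_finetune tdc disgenet
```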
Check your results in your Weights & Biases (wandb) account.
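Because results are reported to wandb, it helps to confirm credentials before launching a long run. The sketch below assumes wandb finds its API key either in the `WANDB_API_KEY` environment variable or in `~/.netrc` under `api.wandb.ai`, which is where `wandb login` typically stores it. The function name `wandb_credentials_present` is hypothetical. Without credentials, setting `WANDB_MODE=offline` keeps runs local so they can be uploaded later with `wandb sync`.

```bash
#!/usr/bin/env bash
# Sketch: check for W&B credentials before starting a run.
# Assumes the key is in WANDB_API_KEY or in ~/.netrc (written by `wandb login`).
wandb_credentials_present() {
  if [ -n "${WANDB_API_KEY:-}" ]; then
    return 0
  fi
  [ -f "$HOME/.netrc" ] && grep -q "api.wandb.ai" "$HOME/.netrc"
}

if ! wandb_credentials_present; then
  echo "No W&B credentials found: run 'wandb login', or export WANDB_MODE=offline" >&2
  echo "and upload the runs later with 'wandb sync'." >&2
fi
```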
We store all required datasets on Google Drive: Here
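The datasets can also be fetched from the command line. This sketch uses `gdown`, a third-party tool (`pip install gdown`) that the authors do not mention. `DATASETS_URL` is a placeholder for the Google Drive link above. The hypothetical `fetch_datasets` function defaults to a destination directory named `data`, but the location the scripts actually expect is not stated here, so check the dataset paths in the scripts.

```bash
#!/usr/bin/env bash
# Sketch: download a shared Google Drive folder with gdown.
# Usage: fetch_datasets <google-drive-folder-url> [dest]
fetch_datasets() {
  local url="$1" dest="${2:-data}"
  if [ -z "$url" ]; then
    echo "usage: fetch_datasets <google-drive-folder-url> [dest]" >&2
    return 2
  fi
  if ! command -v gdown >/dev/null 2>&1; then
    echo "gdown not found; install it with: pip install gdown" >&2
    return 1
  fi
  mkdir -p "$dest"
  # --folder downloads every file in the shared folder into $dest.
  gdown --folder "$url" -O "$dest"
}

# Example (DATASETS_URL is a placeholder for the Google Drive link):
#   fetch_datasets "$DATASETS_URL" data
```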
Please cite our paper if you find our work useful in your own research.
@article{meng2024heterogeneous,
title={Heterogeneous biomedical entity representation learning for gene-disease association prediction},
author={Meng, Zhaohan and Liu, Siwei and Liang, Shangsong and Jani, Bhautesh and Meng, Zaiqiao},
journal={Briefings in Bioinformatics},
volume={25},
number={5},
pages={bbae380},
year={2024},
publisher={Oxford University Press}
}