Heterogeneous biomedical entity representation learning for gene-disease association prediction

The FusionGDA model utilises a fusion module to enrich the gene and disease semantic representations encoded by pre-trained language models for gene-disease association (GDA) prediction.

Framework and Fusion Module

Installation

# Download the latest Anaconda installer
wget https://repo.anaconda.com/archive/Anaconda3-latest-Linux-x86_64.sh

# Install Anaconda
bash Anaconda3-latest-Linux-x86_64.sh -b

# Clean up the installer to save space
rm Anaconda3-latest-Linux-x86_64.sh

# Add conda to PATH (the -b install above defaults to ~/anaconda3)
export PATH="$HOME/anaconda3/bin:$PATH"

# Update Anaconda packages
conda update --all

# Install PyTorch and related libraries with CUDA support
# Note: replace 'cudatoolkit=11.3' with a version compatible with your CUDA driver
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

# Install other Python packages
pip install pytdc
pip install wandb
pip install lightgbm
pip install -U adapter-transformers
pip install pytorch-metric-learning
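The cudatoolkit pin above has to be compatible with the NVIDIA driver on your machine. As a minimal sketch, assuming a Linux host with `nvidia-smi` and GNU `sort -V` available, the hypothetical helpers below (`driver_cuda_version` and `version_le`, neither of which is part of FusionGDA) read the highest CUDA version the driver supports from the `nvidia-smi` banner and compare it with the version you intend to install.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the FusionGDA repository.

# Print the highest CUDA version supported by the installed NVIDIA driver,
# parsed from the "CUDA Version: X.Y" field of `nvidia-smi` output on stdin.
# Prints nothing if the field is absent.
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed (exit status 0) if version $1 <= version $2, using GNU `sort -V`.
version_le() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example (run on the GPU machine):
#   driver="$(nvidia-smi | driver_cuda_version)"
#   if version_le 11.3 "$driver"; then
#     conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
#   else
#     echo "Driver supports CUDA ${driver:-unknown}; choose a lower cudatoolkit" >&2
#   fi
```

The check only compares version numbers; whether a `cudatoolkit` build for that version exists on the `pytorch` channel still has to be confirmed separately.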

Executing program

Make sure you are in the ~/dpa_pretrain/scripts directory, and adjust the required parameters directly in the shell scripts there.

Pre-training phase

bash run_pretrain_gda_ml_adapter_infoNCE.sh

Fine-tuning phase

TDC Dataset

bash run_finetune_gda_lightgbm_infoNCE_tdc.sh

DisGeNET Dataset

bash run_finetune_gda_lightgbm_infoNCE.sh
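Both phases are launched from the same scripts directory. As a sketch, assuming the script names listed above and that fine-tuning should run after pre-training, a hypothetical wrapper `run_fusiongda` (not shipped with the repository) chains the two phases and stops at the first failure:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of FusionGDA. Script names are taken from
# this README; the dataset argument selects the fine-tuning script.
run_fusiongda() {
  local scripts_dir="$1" dataset="${2:-tdc}" finetune
  case "$dataset" in
    tdc)      finetune="run_finetune_gda_lightgbm_infoNCE_tdc.sh" ;;
    disgenet) finetune="run_finetune_gda_lightgbm_infoNCE.sh" ;;
    *) echo "unknown dataset: $dataset (expected tdc or disgenet)" >&2; return 2 ;;
  esac
  # Subshell: the cd does not leak into the caller's shell.
  (
    cd "$scripts_dir" || exit 1
    # Pre-training first; skip fine-tuning if it fails.
    bash run_pretrain_gda_ml_adapter_infoNCE.sh || exit 1
    bash "$finetune"
  )
}

# Example:
#   run_fusiongda ~/dpa_pretrain/scripts disgenet
```

Running inside a subshell keeps your working directory unchanged, and the wrapper's exit status is that of the last phase that ran.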

Check your results in your wandb (Weights & Biases) account.
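If the training machine cannot reach the wandb servers, wandb's offline mode is one option. A minimal sketch, assuming the scripts call wandb without overriding its mode: the hypothetical `run_offline` wrapper (not part of FusionGDA) sets `WANDB_MODE=offline` for a single script run, so results are written locally (by default under a `wandb/` directory in the working directory) and can be uploaded afterwards with `wandb sync`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of FusionGDA.
# Run one script with wandb in offline mode. The variable is set only for
# that bash process, so the caller's environment is left unchanged.
run_offline() {
  WANDB_MODE=offline bash "$@"
}

# Example:
#   run_offline run_finetune_gda_lightgbm_infoNCE_tdc.sh
#   wandb sync wandb/offline-run-*    # upload once you are online again
```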

Datasets

We store all required datasets on Google Drive: Here

Citation

Please cite our paper if you find our work useful in your own research.

@article{meng2024heterogeneous,
  title={Heterogeneous biomedical entity representation learning for gene-disease association prediction},
  author={Meng, Zhaohan and Liu, Siwei and Liang, Shangsong and Jani, Bhautesh and Meng, Zaiqiao},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={5},
  pages={bbae380},
  year={2024},
  publisher={Oxford University Press}
}
