This is the official implementation of the MagBERT (Model Agnostic Graph based BERT for Biomedical Relation Extraction) model described in Multimodal Graph-based Transformer Framework for Biomedical Relation Extraction.
Recent advances in pre-trained Transformer models have propelled the development of effective text mining models across various biomedical tasks. In this study, we introduce a novel framework that enables the model to learn multi-omics biological information about entities (proteins) with the help of additional multi-modal cues such as molecular structure.
We use the HPRD and BioInfer datasets. Their text samples are already included in this repository (in the Molecular Structure folder). Additionally, you can download the PDB files of the protein entities for the HPRD dataset from here and for the BioInfer dataset from here.
- Pytorch
- Sklearn
- transformers
- Networkx
- nltk
- pypdb
- Bio
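The dependencies above can typically be installed in one step, assuming the standard PyPI package names (`torch` for Pytorch, `scikit-learn` for Sklearn, and `biopython` for the `Bio` module — these mappings are our assumption, not pinned by the repository):

```shell
# Install all listed dependencies from PyPI (package names assumed)
pip install torch scikit-learn transformers networkx nltk pypdb biopython
```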
To replicate the results reported in the paper, follow these steps.
- Download this zip folder and place the two files in ./Graph-Bert/data/ppi.
- Do the same for this zip folder, placing the two files in ./Graph-Bert/data/biomed.
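The placement of the downloaded archives can be sketched as follows (the archive file names below are placeholders — substitute the names of the files you actually downloaded):

```shell
# Create the expected data directories
mkdir -p Graph-Bert/data/ppi Graph-Bert/data/biomed

# Unpack each archive into its directory (archive names are placeholders)
unzip hprd_files.zip -d Graph-Bert/data/ppi
unzip bioinfer_files.zip -d Graph-Bert/data/biomed
```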
- Next, run the following commands:

```shell
cd Graph-Bert
python script_1_preprocess.py
python script_2_pre_train.py
python script_3_fine_tuning.py
```
Details of these Python scripts are provided in the Graph-Bert folder.
The above scripts train on the HPRD and BioInfer datasets (make sure to choose your dataset by uncommenting "ppi" for HPRD or "biomed" for BioInfer in the three Python scripts). However, you can train Graph-BERT on any dataset, provided you have extracted all the PDB files. Demo notebooks that extract the protein instances from text, download the corresponding PDB files, and extract molecular structure data from the PDB files are available in the Molecular Structure folder.
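As a rough illustration of that last step, here is a minimal, stdlib-only sketch of pulling atom coordinates out of a PDB file. The repository's own notebooks use pypdb/Biopython for this; the function and its name below are ours, and the fixed-column slicing follows the standard PDB `ATOM` record layout:

```python
def read_ca_coords(pdb_text):
    """Extract C-alpha coordinates from PDB-format text.

    Returns a list of (residue_name, (x, y, z)) tuples, one per CA
    atom record, using the fixed-column PDB ATOM record layout.
    """
    coords = []
    for line in pdb_text.splitlines():
        # ATOM records: atom name in cols 13-16, residue name in 18-20,
        # x/y/z in cols 31-38, 39-46, 47-54 (1-indexed PDB columns)
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20].strip()
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((res_name, (x, y, z)))
    return coords
```

The per-residue coordinates returned here are the kind of molecular-structure signal that can then be turned into graph input (e.g. nodes and distance-based edges) for Graph-BERT.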
```
@misc{pingali2021multimodal,
      title={Multimodal Graph-based Transformer Framework for Biomedical Relation Extraction},
      author={Sriram Pingali and Shweta Yadav and Pratik Dutta and Sriparna Saha},
      year={2021},
      eprint={2107.00596},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
Graph-BERT code for this project is based on