Code for the Paper "Multimodal Graph-based Transformer Framework for Biomedical Relation Extraction"

SriramPingali/MagBERT

MagBERT

This is the official implementation of MagBERT (Model Agnostic Graph-based BERT for Biomedical Relation Extraction), the model described in "Multimodal Graph-based Transformer Framework for Biomedical Relation Extraction".

Introduction

The recent advancement of pre-trained Transformer models has propelled the development of effective text mining models across various biomedical tasks. In this study, we introduce a novel framework that enables the model to learn multi-omics biological information about entities (proteins) with the help of additional multi-modal cues such as molecular structure.

MagBERT

Datasets

We use the HPRD and Bio-Infer datasets. Their text samples are already included in this repository (in the Molecular Structure folder). You can also download the PDB files of the protein entities for the HPRD dataset from here and for the Bio-Infer dataset from here.

Dependencies

  • PyTorch
  • scikit-learn
  • transformers
  • NetworkX
  • nltk
  • pypdb
  • Biopython (imported as Bio)
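
Before running the scripts, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the import names (`torch`, `sklearn`, `Bio`, and so on) are assumed to correspond to the dependency list.

```python
import importlib.util

# Import names assumed to correspond to the dependency list above.
REQUIRED_MODULES = ["torch", "sklearn", "transformers", "networkx", "nltk", "pypdb", "Bio"]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only looks up each module without importing it, so it runs quickly and does not initialise PyTorch.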

Train Graph-BERT on Bio-Infer / HPRD

To replicate the results reported in the paper, follow these steps.

  1. Download the zip folder and place the two files in ./Graph-Bert/data/ppi.
  2. Do the same for this zip folder, placing its two files in ./Graph-Bert/data/biomed.
  3. Then run the following commands:
    cd Graph-Bert
    python script_1_preprocess.py
    python script_2_pre_train.py
    python script_3_fine_tuning.py

Details of these Python scripts are given in the Graph-Bert folder.
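
The three commands above can also be driven from a small Python wrapper that stops at the first failing stage. This is a sketch under stated assumptions: the helpers `check_data_dir`, `build_commands`, and `run_pipeline` are hypothetical and not part of the repository; only the script names and the `data/ppi` and `data/biomed` locations come from the steps above.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the commands above; the helpers below are hypothetical.
PIPELINE = ["script_1_preprocess.py", "script_2_pre_train.py", "script_3_fine_tuning.py"]


def check_data_dir(workdir, dataset):
    """Return True if <workdir>/data/<dataset> exists and holds at least one file."""
    data_dir = Path(workdir) / "data" / dataset
    return data_dir.is_dir() and any(p.is_file() for p in data_dir.iterdir())


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Build one command per script, using the current Python interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(workdir="Graph-Bert", dataset="ppi", scripts=PIPELINE):
    """Run the scripts in order inside workdir, stopping at the first failure.

    The dataset argument is only used to check that the data files are in place;
    it does not select the dataset inside the scripts themselves.
    """
    if not check_data_dir(workdir, dataset):
        raise FileNotFoundError(f"No data files found in {Path(workdir) / 'data' / dataset}")
    for command in build_commands(scripts):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if a script exits with a non-zero status.
        subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_pipeline()` from the repository root would then be equivalent to `cd Graph-Bert` followed by the three `python` commands, with an early error if the data folder is empty.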

Train on Custom Dataset

The scripts above train on the HPRD and Bio-Infer datasets (choose your dataset by un-commenting "ppi" for HPRD or "biomed" for Bio-Infer in each of the three Python scripts). You can also train Graph-BERT on any other dataset, provided you have extracted all the PDB files. Demo notebooks for extracting protein instances from text, downloading the corresponding PDB files, and extracting molecular structure data from the PDB files are in the Molecular Structure folder.
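
To give a sense of what extracting structure data from a PDB file involves, here is a minimal standard-library sketch that reads atom names, residues, and coordinates from `ATOM` records using the fixed column layout of the PDB format. It is an illustration only: the `Atom` type and `parse_atoms` function are hypothetical, and the demo notebooks may extract different features or rely on Biopython or pypdb instead.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str            # atom name, e.g. "CA" for an alpha carbon
    residue: str         # three-letter residue name, e.g. "MET"
    chain: str           # chain identifier
    residue_number: int  # residue sequence number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str, ca_only: bool = False) -> List[Atom]:
    """Parse ATOM records from PDB-format text using the fixed-width columns."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, headers, and other record types
        atom = Atom(
            name=line[12:16].strip(),         # columns 13-16
            residue=line[17:20].strip(),      # columns 18-20
            chain=line[21],                   # column 22
            residue_number=int(line[22:26]),  # columns 23-26
            x=float(line[30:38]),             # columns 31-38
            y=float(line[38:46]),             # columns 39-46
            z=float(line[46:54]),             # columns 47-54
        )
        if ca_only and atom.name != "CA":
            continue
        atoms.append(atom)
    return atoms
```

Passing `ca_only=True` keeps one atom per residue (the alpha carbon), which is a common starting point when building a residue-level graph from coordinates.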

References

@misc{pingali2021multimodal,
      title={Multimodal Graph-based Transformer Framework for Biomedical Relation Extraction}, 
      author={Sriram Pingali and Shweta Yadav and Pratik Dutta and Sriparna Saha},
      year={2021},
      eprint={2107.00596},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgements

Graph-BERT code for this project is based on
