Implementation of the method from our paper: Extracting Protein-Protein Interactions Affected by Mutations via Auxiliary Task and Domain Pre-trained Model
Requirements are listed in requirements.txt.
The original dataset and evaluation scripts can be downloaded here.
Some gene annotations were modified, as shown at the end of this document.
The modified datasets and the results obtained by GNP are available here.
BioBERT-Base v1.1 (+ PubMed 1M) is available here.
The original BERT is also known as bert-base-uncased.
The gene normalization module (not part of our paper) follows the method proposed by Tung Tran in the paper titled Exploring a Deep Learning Pipeline for the BioCreative VI Precision Medicine Task.
This method requires the eutils package, which automatically throttles requests according to NCBI guidelines (3 requests/second without an API key, 10 requests/second with one).
If you need higher throughput, add your private API key and email to ./.env.
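As a rough sketch of the setup above, the snippet below parses a `.env`-style file and picks the request rate that applies under the NCBI guidelines. The variable names `NCBI_API_KEY` and `NCBI_EMAIL` and all helper names are assumptions for illustration only; check the repository code for the names it actually reads. The eutils package does its own throttling, so this only illustrates the rule and is not needed to run the module.

```python
from pathlib import Path

# Request rates (per second) as stated in the NCBI guidelines above.
RATE_WITHOUT_KEY = 3
RATE_WITH_KEY = 10


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def allowed_rate(env):
    """Return the allowed requests/second for this configuration."""
    # NCBI_API_KEY is a hypothetical variable name, not taken from the repo.
    return RATE_WITH_KEY if env.get("NCBI_API_KEY") else RATE_WITHOUT_KEY


def load_env(path="./.env"):
    """Read the .env file if it exists, otherwise return an empty config."""
    env_file = Path(path)
    return parse_env(env_file.read_text()) if env_file.exists() else {}
```

Calling `allowed_rate(load_env())` then tells you which of the two limits applies to your current configuration.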
- RC-Only with original BERT:
bash scripts/train_RC-BaseBERT.sh
- RC-Only:
bash scripts/train_RC.sh
- RC+NER:
bash scripts/train_RC+NER.sh
- RC+Triage:
bash scripts/train_RC+Triage.sh
- RC-Only:
bash scripts/test_RC.sh
- RC+NER:
bash scripts/test_RC+NER.sh
- RC+Triage:
bash scripts/test_RC+Triage.sh
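If you want to run a setting from Python instead of by hand, the commands above can be chained as in the sketch below. The script paths are copied from the list above; the helper names and the `dry_run` flag are illustrative assumptions. Note that no test script is listed for the original-BERT setting, so only its training command is built.

```python
import subprocess

# Script paths copied from the commands listed above.
TRAIN_SCRIPTS = {
    "RC-BaseBERT": "scripts/train_RC-BaseBERT.sh",
    "RC": "scripts/train_RC.sh",
    "RC+NER": "scripts/train_RC+NER.sh",
    "RC+Triage": "scripts/train_RC+Triage.sh",
}
TEST_SCRIPTS = {
    "RC": "scripts/test_RC.sh",
    "RC+NER": "scripts/test_RC+NER.sh",
    "RC+Triage": "scripts/test_RC+Triage.sh",
}


def build_commands(setting):
    """Return the train command, plus the test command if one is listed."""
    commands = [["bash", TRAIN_SCRIPTS[setting]]]
    if setting in TEST_SCRIPTS:
        commands.append(["bash", TEST_SCRIPTS[setting]])
    return commands


def run(setting, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for command in build_commands(setting):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the chain if a script exits with an error.
            subprocess.run(command, check=True)
```

For example, `run("RC+NER")` prints the two commands without executing them, and `run("RC+NER", dry_run=False)` trains and then tests from the repository root.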
The result of RC with confidence will be saved as ./outputjson/{loggercomment}_{epoch}.json.
- analysis.ipynb: Predicted relations need to be post-processed here before homolo eval and exact eval. Scripts for the case study can also be found here.
- head_view_bert.ipynb: Visualization of attention in BERT via BertViz.
- python cross_fold_metrics.py > metrics.tsv: prints the results of the cross-fold validation to metrics.tsv.
PMID: 9488135
{
"text": "receptor R2",
"infons": {
"type": "Gene",
"NCBI GENE": "7133"
},
"locations": [
{
"length": 11,
"offset": 339 -> 328
}
]
}
PMID: 21751375
{
"text": "CBP/b", -> CBP
"infons": {
"type": "Gene",
"NCBI GENE": "1387"
},
"locations": [
{
"length": 5, -> 3
"offset": 340
}
]
}
PMID: 22014570
{
"text": "FoxM1-b", -> FoxM1
"infons": {
"type": "Gene",
"NCBI GENE": "2305"
},
"locations": [
{
"length": 7, -> 5
"offset": 694
}
]
}
{
"text": "FoxM1-b", -> FoxM1
"infons": {
"type": "Gene",
"NCBI GENE": "2305"
},
"locations": [
{
"length": 7, -> 5
"offset": 801
}
]
}
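The modifications above follow one pattern: either the mention text is shortened and its length updated, or the offset is corrected. A minimal sketch of applying such a fix to one annotation is shown below. The function names are hypothetical and the code assumes a single entry in `locations`, as in the examples above; it is not the script used to produce the modified datasets.

```python
import copy


def apply_correction(annotation, text=None, offset=None):
    """Return a corrected copy of a gene annotation.

    When the text changes, the length is recomputed from the new text.
    """
    fixed = copy.deepcopy(annotation)
    location = fixed["locations"][0]  # assumes one location per annotation
    if text is not None:
        fixed["text"] = text
        location["length"] = len(text)
    if offset is not None:
        location["offset"] = offset
    return fixed


def is_consistent(annotation):
    """Check that every location length matches the mention text."""
    return all(
        location["length"] == len(annotation["text"])
        for location in annotation["locations"]
    )


# The PMID 21751375 annotation before modification, as listed above.
original = {
    "text": "CBP/b",
    "infons": {"type": "Gene", "NCBI GENE": "1387"},
    "locations": [{"length": 5, "offset": 340}],
}
fixed = apply_correction(original, text="CBP")
```

Here `fixed` has text `CBP` and length 3 while `original` is left unchanged. The offset fix for PMID 9488135 would be `apply_correction(annotation, offset=328)`.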