Overview

This repository contains the code for the paper "KFU NLP Team at SMM4H 2020 Tasks: Cross-lingual Transfer Learning with Pretrained Language Models for Drug Reactions" [1].

Data

This repository addresses the second task of the SMM4H 2020 Shared Task: binary classification of tweets according to whether they mention an adverse effect of a medication.

Our solution to the classification task consists of two steps:

1. Pretraining on the multilabel sentence classification task using the union of the RuDReC [2] and PsyTAR [3] corpora.
2. Fine-tuning on the target binary classification task using the Shared Task data.

The preprocessing and pretraining code for the multilabel classification is taken from the Colab example published in the RuDReC repository: https://github.com/cimm-kzn/RuDReC
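
The difference between the two training objectives can be illustrated with a minimal sketch. It is not the code from the "training" directory. It assumes the usual choices for each setting: an independent sigmoid per label with binary cross-entropy for the multilabel step, and softmax cross-entropy over two classes for the binary step. The label names in `LABELS` are hypothetical placeholders, not the actual RuDReC or PsyTAR label set.

```python
import math

# Hypothetical label set for the multilabel pretraining step; the real
# sentence labels are defined by the RuDReC and PsyTAR corpora.
LABELS = ["ADR", "EFFECTIVENESS", "INDICATION", "OTHER"]


def multi_hot(sentence_labels):
    """Encode a collection of labels as a 0/1 vector ordered like LABELS."""
    return [1.0 if label in sentence_labels else 0.0 for label in LABELS]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_loss(logits, targets):
    """Mean binary cross-entropy, one independent sigmoid per label."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)


def binary_loss(logits, target):
    """Softmax cross-entropy over two classes (0 = no ADR, 1 = ADR)."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target]


# Step 1: a sentence may carry several labels at once.
targets = multi_hot({"ADR", "INDICATION"})
print(multilabel_loss([2.0, -1.0, 0.5, -3.0], targets))

# Step 2: a tweet belongs to exactly one of two classes.
print(binary_loss([0.3, 1.7], target=1))
```

In a transfer-learning setup of this kind, the encoder weights learned in step 1 would typically be reused in step 2, while the output layer is replaced to match the new number of classes.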

Example

For an example of ADR sentence classification, see the "SMM4H_2020_ADR_classification.ipynb" notebook (also available via Colab).

The example covers both the pretraining and the fine-tuning steps. It uses the EnRuDR-BERT model, which is available at: https://github.com/cimm-kzn/RuDReC

Repository structure

The "training" directory contains the following scripts:

- A script for pretraining on the multilabel classification task.
- A script for the binary classification task.

Both scripts rely on Google's BERT implementation.

The "preprocessing" directory contains scripts for:

- Merging RuDReC and PsyTAR sentences into the combined training and validation sets.
- Preprocessing the tweets of the SMM4H binary classification task.
- Merging the Russian and English datasets of tweets.
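
The merging steps listed above can be sketched roughly as follows. This is an illustration only, not the repository's script: the function name `merge_and_split`, the validation fraction, the seed, and the `(text, labels)` example format are all assumptions made for the sketch.

```python
import random


def merge_and_split(corpus_a, corpus_b, valid_fraction=0.1, seed=42):
    """Concatenate two lists of examples, shuffle them, and split off a validation set.

    Returns a (train, valid) pair. The fraction and seed are illustrative defaults.
    """
    merged = list(corpus_a) + list(corpus_b)
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(merged)
    n_valid = int(len(merged) * valid_fraction)
    return merged[n_valid:], merged[:n_valid]


# Toy stand-ins for the two corpora; the real data use their own file formats.
rudrec = [
    ("ru sentence 1", ["ADR"]),
    ("ru sentence 2", ["EFFECTIVENESS"]),
]
psytar = [
    ("en sentence 1", ["ADR"]),
    ("en sentence 2", ["INDICATION"]),
    ("en sentence 3", ["ADR", "INDICATION"]),
]

train, valid = merge_and_split(rudrec, psytar, valid_fraction=0.2)
print(len(train), len(valid))
```

Shuffling before the split mixes the two sources in both sets, so the validation set is not dominated by whichever corpus happened to be appended last. The same pattern would apply to merging the Russian and English tweet datasets.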

The "evaluation" directory contains scripts for the evaluation of binary classification results and for the ensembling of multiple predictions.
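
As a rough illustration of what such evaluation and ensembling can look like, the sketch below computes precision, recall, and F1 for the positive class and combines several models by averaging their predicted probabilities. The averaging strategy, the 0.5 threshold, and the function names are assumptions; the repository's scripts may use a different ensembling rule.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Precision, recall, and F1 for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def ensemble_by_averaging(prob_lists, threshold=0.5):
    """Average positive-class probabilities across models, then threshold."""
    n_models = len(prob_lists)
    averaged = [sum(probs) / n_models for probs in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


# Toy data: gold labels and positive-class probabilities from three models.
gold = [1, 0, 1, 0]
model_probs = [
    [0.9, 0.2, 0.4, 0.6],
    [0.8, 0.1, 0.7, 0.3],
    [0.7, 0.4, 0.6, 0.3],
]

predicted = ensemble_by_averaging(model_probs)
print(predicted)
print(precision_recall_f1(gold, predicted))
```

Averaging lets a confident model outweigh an uncertain one, which a plain majority vote over hard labels would not do.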

References

[1] Miftahutdinov Z., Sakhovskiy A., Tutubalina E. KFU NLP Team at SMM4H 2020 Tasks: Cross-lingual Transfer Learning with Pretrained Language Models for Drug Reactions // Proceedings of the Fifth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task. 2020.

[2] Tutubalina E., Alimova I., Miftahutdinov Z., Sakhovskiy A., Malykh V., Nikolenko S. The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews // Bioinformatics. 2020. https://doi.org/10.1093/bioinformatics/btaa675

@article{10.1093/bioinformatics/btaa675,
    author = {Tutubalina, Elena and Alimova, Ilseyar and Miftahutdinov, Zulfat and Sakhovskiy, Andrey and Malykh, Valentin and Nikolenko, Sergey},
    title = {The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews},
    journal = {Bioinformatics},
    year = {2020},
    month = {07},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa675},
    url = {https://doi.org/10.1093/bioinformatics/btaa675},
    note = {btaa675},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/doi/10.1093/bioinformatics/btaa675/33539752/btaa675.pdf},
}

[3] Zolnoori M., et al. A systematic approach for developing a corpus of patient reported adverse drug events: a case study for SSRI and SNRI medications // Journal of Biomedical Informatics. 2019. Vol. 90. 103091.
