BioRelEx: Biological Relation Extraction Benchmark

BioRelEx is a dataset of 2000+ sentences from biological journals with complete annotations of proteins, genes, chemicals and other entities along with binding interactions between them.

A paper describing the dataset was accepted at the ACL BioNLP Workshop 2019.

We invite everyone to submit their relation extraction systems to our Codalab competition.

Dataset format

Training and development sets are provided as JSON files. Each version of the dataset is one release of this repository.

Each JSON file is a list of objects, one per sentence. More details will be added soon.
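Until the per-object schema is documented, a file can be inspected directly. The snippet below is a minimal sketch that relies only on what is stated above (each file is a JSON list with one object per sentence). The helper names `load_sentences` and `key_counts` and the file name `train.json` are hypothetical, and no field names are assumed.

```python
import json
from pathlib import Path


def load_sentences(path):
    """Load one dataset JSON file and return its list of sentence objects."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # The README states that each file is a list; fail loudly otherwise.
    if not isinstance(data, list):
        raise ValueError(
            f"expected a JSON list at the top level, got {type(data).__name__}"
        )
    return data


def key_counts(sentences):
    """Count how often each top-level key occurs across sentence objects."""
    counts = {}
    for obj in sentences:
        if isinstance(obj, dict):
            for key in obj:
                counts[key] = counts.get(key, 0) + 1
    return counts


# Example usage (the file name is an assumption, not taken from the repository):
# sentences = load_sentences("train.json")
# print(len(sentences), "sentences")
# print(key_counts(sentences))
```

Counting top-level keys is a quick way to see which fields are present in every sentence object and which are optional, without relying on undocumented names.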


Evaluation

We propose two main metrics for evaluation, one for entity recognition and another one for relation extraction. We provide a script for the main evaluation metrics and several additional metrics designed for error analysis.
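For quick local sanity checks, set-based precision, recall and F1 are a common way to score extracted items against gold annotations. The sketch below is a generic illustration and not the evaluation script from this repository. The exact definitions of the two main metrics are given in the paper and the script and may differ from it. The function name `precision_recall_f1` and the representation of a binding interaction as an unordered pair of entity names are assumptions.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 (generic sketch, not the official metric)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumption: an interaction is an unordered pair of entity names,
# so frozenset makes ("A", "B") and ("B", "A") compare equal.
gold = {frozenset(("A", "B")), frozenset(("C", "D"))}
predicted = {frozenset(("B", "A")), frozenset(("A", "C"))}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Scores reported for the benchmark should come from the provided script, since details such as entity matching may be handled differently there.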

The test set is not released. To be evaluated on it, please submit your solution to the Codalab competition.


Baselines

The paper describes two non-trivial baselines. One is an existing rule-based system called REACH, and the other one is based on a neural multitask architecture called SciIE. The baselines are implemented in another repository.


Citation

If you use the dataset, please cite:

    title = "{B}io{R}el{E}x 1.0: Biological Relation Extraction Benchmark",
    author = "Khachatrian, Hrant  and Nersisyan, Lilit  and Hambardzumyan, Karen  and Galstyan, Tigran  and Hakobyan, Anna  and Arakelyan, Arsen  and Rzhetsky, Andrey  and Galstyan, Aram",
    booktitle = "Proceedings of the 18th BioNLP Workshop and Shared Task",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "",
    pages = "176--190"

