GitHub - declare-lab/HyperRED: This repository implements our EMNLP 2022 research paper A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach.

A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach

This repository implements our EMNLP 2022 research paper.

HyperRED is a dataset for the new task of hyper-relational extraction, which extracts relation triplets together with qualifier information such as time, quantity or location. For example, the relation triplet (Leonard Parker, Educated At, Harvard University) can be factually enriched by including the qualifier (End Time, 1967). HyperRED contains 44k sentences with 62 relation types and 44 qualifier types. Inspired by table-filling approaches for relation extraction, we propose CubeRE, a cube-filling model which explicitly considers the interaction between relation triplets and qualifiers.
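To make the task concrete, the example fact above can be written as a relation triplet plus a list of qualifiers. The sketch below is illustrative only: the class names `Qualifier` and `HyperRelationalFact` are hypothetical and are not the data classes used in this repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Qualifier:
    label: str  # qualifier type, e.g. "End Time"
    value: str  # qualifier value, e.g. "1967"


@dataclass
class HyperRelationalFact:
    head: str
    relation: str
    tail: str
    # A triplet may carry zero or more qualifiers
    qualifiers: List[Qualifier] = field(default_factory=list)


fact = HyperRelationalFact(
    head="Leonard Parker",
    relation="Educated At",
    tail="Harvard University",
    qualifiers=[Qualifier(label="End Time", value="1967")],
)

# Print one flat tuple per qualifier attached to the triplet
for q in fact.qualifiers:
    print((fact.head, fact.relation, fact.tail, q.label, q.value))
```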

Setup

Install Python Environment

conda create -n cube python=3.7 -y
conda activate cube
pip install torch==1.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt

Download HyperRED Dataset (Available on Hugging Face Datasets)

python data_process.py download_data data/hyperred/
python data_process.py process_many data/hyperred/ data/processed/

Data Exploration

from data_process import Data

path = "data/hyperred/train.json"
data = Data.load(path)

for s in data.sents[:3]:
    print()
    print(s.tokens)
    for r in s.relations:
        print(r.head, r.label, r.tail)
        for q in r.qualifiers:
            print(q.label, q.span)

Data Fields

tokens: Sentence text tokens.
entities: List of entity spans. Span indices refer to positions in the space-separated token list, with an inclusive start index and an exclusive end index.
relations: List of relations, each with a label and the head and tail entity spans. Each relation also contains a list of qualifiers, where each qualifier has a label and the span of its value entity.
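Because spans use an inclusive start and an exclusive end, they map directly onto Python slices. A minimal sketch, where `span_to_text` is a hypothetical helper (not part of the repository) and the token list is a shortened, illustrative sentence:

```python
from typing import List, Sequence


def span_to_text(tokens: List[str], span: Sequence[int]) -> str:
    """Return the text covered by a (start, end) span; the end index is exclusive."""
    start, end = span
    if not 0 <= start < end <= len(tokens):
        raise ValueError(f"span {tuple(span)} is out of range for {len(tokens)} tokens")
    return " ".join(tokens[start:end])


tokens = ["Acadia", "University", "is", "located", "in", "Canada", "."]
print(span_to_text(tokens, (0, 2)))  # Acadia University
print(span_to_text(tokens, (5, 6)))  # Canada
```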

Data Example

An example instance of the dataset is shown below:

{
  "tokens": ["Acadia", "University", "is", "a", "predominantly", "undergraduate", "university", "located", "in", "Wolfville", ",", "Nova", "Scotia", ",", "Canada", "with", "some", "graduate", "programs", "at", "the", "master", "'", "s", "level", "and", "one", "at", "the", "doctoral", "level", "."],
  "entities": [
    {"span": [0, 2], "label": "Entity"},
    {"span": [9, 13], "label": "Entity"},
    {"span": [14, 15], "label": "Entity"}
  ],
  "relations": [
    {
      "head": [0, 2],
      "tail": [9, 13],
      "label": "headquarters location",
      "qualifiers": [
        {"span": [14, 15], "label": "country"}
      ]
    }
  ]
}
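An instance like the one above can be flattened into readable facts by resolving each span against the token list. The sketch below assumes the instance has already been parsed into a Python dict with the fields shown; `flatten_instance` is a hypothetical helper, not a function from the repository, and the token list is truncated after "Canada" for brevity.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str, str, str]


def flatten_instance(instance: Dict) -> List[Fact]:
    """Turn one instance into (head, relation, tail, qualifier, value) tuples."""
    tokens = instance["tokens"]

    def text(span) -> str:
        start, end = span  # end index is exclusive
        return " ".join(tokens[start:end])

    facts = []
    for rel in instance["relations"]:
        for q in rel["qualifiers"]:
            facts.append(
                (text(rel["head"]), rel["label"], text(rel["tail"]), q["label"], text(q["span"]))
            )
    return facts


instance = {
    "tokens": ["Acadia", "University", "is", "a", "predominantly", "undergraduate",
               "university", "located", "in", "Wolfville", ",", "Nova", "Scotia", ",",
               "Canada"],
    "relations": [
        {"head": [0, 2], "tail": [9, 13], "label": "headquarters location",
         "qualifiers": [{"span": [14, 15], "label": "country"}]}
    ],
}

for fact in flatten_instance(instance):
    print(fact)
```

Note that relations without qualifiers produce no tuples in this sketch; whether to keep them as plain triplets is a design choice left to the reader.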

Model Training

python training.py \
--save_dir ckpt/cube_prune_20_seed_0 \
--seed 0 \
--data_dir data/processed \
--prune_topk 20 \
--config_file config.yml

Model Prediction

You can download and extract the pre-trained weights here

from prediction import run_predict

texts = [
    "Leonard Parker received his PhD from Harvard University in 1967 .",
    "Szewczyk played 37 times for Poland, scoring 3 goals .",
]
preds = run_predict(texts, path_checkpoint="cube_model")

Evaluation Scoring

from data_process import Data
from prediction import run_predict, score_preds

path_gold = "data/hyperred/test.json"
path_pred = "preds.json"

data = Data.load(path_gold)
texts = [s.text for s in data.sents]
preds = run_predict(texts, path_checkpoint="cube_model")
preds.save(path_pred)
score_preds(path_pred, path_gold)
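Extraction outputs of this kind are commonly scored with precision, recall and F1 over exactly matching facts. The sketch below illustrates that idea on plain tuples with toy inputs. It is not the repository's `score_preds` implementation; the function name `micro_f1` and the tuple layout are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def micro_f1(pred: List[Set[Hashable]], gold: List[Set[Hashable]]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 over per-sentence fact sets."""
    if len(pred) != len(gold):
        raise ValueError("pred and gold must cover the same number of sentences")
    # A predicted fact counts as correct only if it matches a gold fact exactly
    num_correct = sum(len(p & g) for p, g in zip(pred, gold))
    num_pred = sum(len(p) for p in pred)
    num_gold = sum(len(g) for g in gold)
    precision = num_correct / num_pred if num_pred else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy inputs: one sentence, one gold fact, two predicted facts (one wrong qualifier)
gold = [{("Leonard Parker", "educated at", "Harvard University", "end time", "1967")}]
pred = [{
    ("Leonard Parker", "educated at", "Harvard University", "end time", "1967"),
    ("Leonard Parker", "educated at", "Harvard University", "start time", "1967"),
}]
print(micro_f1(pred, gold))
```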

Research Citation

If the code is useful for your research project, we would appreciate it if you cite the following paper:

@inproceedings{chia-etal-2022-hyperred,
    title = "A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach",
    author = "Chia, Yew Ken and Bing, Lidong and Aljunied, Sharifah Mahani and Si, Luo and Poria, Soujanya",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    year = "2022",
    url = "https://arxiv.org/abs/2211.10018",
}
