
ECUST-NLP-Lab/medicalHypernymy

An Attention-based Bi-GRU-CapsNet Model for Hypernymy Detection between Compound Entities

This repository contains the experiments for the paper "An Attention-based Bi-GRU-CapsNet Model for Hypernymy Detection between Compound Entities" by Qi Wang, Chenming Xu, Tong Ruan, Yangming Zhou, Daqi Gao, and Ping He.

Named entities composed of multiple consecutive words occur frequently in biomedical knowledge graphs. These entities are usually composable and extensible; typical examples are the names of symptoms and diseases. To distinguish them from general entities, we call them compound entities.

Hypernymy detection is useful for natural language processing (NLP) tasks such as taxonomy creation, ontology extension, textual entailment recognition, sentence similarity estimation, and text generation. However, existing hypernymy detection methods only handle the case where each entity consists of a single word. In this work, we present a novel attention-based Bi-GRU-CapsNet model to detect hypernymy relationships between compound entities.

Our model integrates several components. English words or Chinese characters in compound entities are fed into Bidirectional Gated Recurrent Units (Bi-GRUs) to avoid the Out-Of-Vocabulary (OOV) problem. An attention mechanism then focuses on the differences between the two compound entities. Because hypernymy between compound entities can take several different forms, a Capsule Network (CapsNet) is finally used to decide whether a hypernymy relationship exists. Experimental results on English and Chinese corpora of symptom and disease pairs show the advantages of the proposed model over state-of-the-art methods.
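This README does not give the paper's capsule configuration (number of capsules, dimensions, routing iterations). As a rough illustration of the final CapsNet stage only, the sketch below implements the standard routing-by-agreement procedure from Sabour et al. (2017) in plain Python with toy sizes. Treating the two output capsules as "hypernymy" and "no hypernymy" and deciding by the longer capsule is an assumption, and the prediction vectors `u_hat` are random placeholders for what the Bi-GRU and attention layers would produce.

```python
import math
import random


def squash(s):
    """CapsNet squashing non-linearity: keeps the direction of s and
    maps its length to ||s||^2 / (1 + ||s||^2), which lies in [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement between two capsule layers.

    u_hat[i][j] is the prediction vector that input capsule i makes for
    output capsule j. Returns (v, c): the output capsule vectors and the
    coupling coefficients c[i][j] (each row sums to 1).
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for it in range(iterations):
        c = [softmax(row) for row in b]
        # Each output capsule is the squashed, coupling-weighted sum of predictions.
        v = [
            squash([sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)])
            for j in range(n_out)
        ]
        if it < iterations - 1:
            # Raise the logit where a prediction agrees (dot product) with the output.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


def capsule_lengths(v):
    """Length of each output capsule, read as the presence of its class."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in v]


# Toy input (hypothetical): 6 input capsules, 2 output capsules, dimension 4.
rng = random.Random(0)
u_hat = [[[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(2)] for _ in range(6)]
v, c = dynamic_routing(u_hat)
lengths = capsule_lengths(v)
prediction = "hypernymy" if lengths[0] > lengths[1] else "no hypernymy"
print(lengths, prediction)
```

Because squashing bounds every capsule length below 1, the lengths can be compared directly as class scores; the coupling coefficients let each input capsule send more of its vote to the output it agrees with.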

This repository provides two corpora which contain hypernymy pairs of symptoms and diseases in English and Chinese. We will release the source code of this work after publication.

Datasets

This repository provides two corpora containing hypernymy pairs of clinical findings in English and Chinese. The corpora also contain negative instances and have been split into training, test, and validation sets.

Dataset   Split   Positive   Negative   All
English   Train     27,872     27,872   55,744
English   Test       9,954      9,954   19,908
English   Val        1,991      1,991    3,982
Chinese   Train      8,960      8,960   17,920
Chinese   Test       3,200      3,200    6,400
Chinese   Val          640        640    1,280
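The counts above imply one negative per positive and roughly a 70/25/5 train/test/validation split of the positives in both languages (for example, 27,872 / 39,817 and 8,960 / 12,800 are both about 0.70). The actual splitting procedure is not documented here; the sketch below is one hypothetical way to produce balanced splits of that shape, and `split_sizes` and `balanced_split` are illustrative names, not part of the repository.

```python
import random


def split_sizes(n, train_frac=0.70, test_frac=0.25):
    """Sizes (train, test, val) for n items; val takes the remainder so they sum to n."""
    n_train = round(n * train_frac)
    n_test = round(n * test_frac)
    return n_train, n_test, n - n_train - n_test


def balanced_split(positives, negatives, seed=0, train_frac=0.70, test_frac=0.25):
    """Shuffle positives and negatives separately, cut both at the same sizes,
    and return {"train", "test", "val"} -> list of (pair, label), each 50/50."""
    if len(positives) != len(negatives):
        raise ValueError("expected exactly one negative per positive")
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    n_train, n_test, _ = split_sizes(len(pos), train_frac, test_frac)
    bounds = {
        "train": (0, n_train),
        "test": (n_train, n_train + n_test),
        "val": (n_train + n_test, len(pos)),
    }
    splits = {}
    for name, (lo, hi) in bounds.items():
        rows = [(p, 1) for p in pos[lo:hi]] + [(q, 0) for q in neg[lo:hi]]
        rng.shuffle(rows)
        splits[name] = rows
    return splits


print(split_sizes(39_817))  # English positives: (27872, 9954, 1991)
print(split_sizes(12_800))  # Chinese positives: (8960, 3200, 640)
```

Splitting positives and negatives with the same cut points keeps every split exactly balanced, which is what the table shows.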

English Corpus

We extract terms labeled "clinical finding" in SNOMED CT, together with their children, to construct positive hypernymy instances. Hyponymy pairs and unrelated pairs are used as negative instances.
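The sketch below shows how positive and negative pairs could be derived from an is-a hierarchy in the spirit of the paragraph above. It is an illustration under stated assumptions, not the authors' extraction code: the hierarchy is a hypothetical toy dictionary rather than SNOMED CT data, each term has a single parent (real SNOMED CT concepts can have several), positives are direct child-to-parent edges only, and pairs linked through any ancestor chain are kept out of the "unrelated" negatives so that true hypernyms are never labeled negative.

```python
import itertools
import random

# Hypothetical toy is-a edges (child -> parent) standing in for part of a
# "clinical finding" subtree; not extracted from SNOMED CT.
IS_A = {
    "Chest pain": "Pain",
    "Headache": "Pain",
    "Migraine": "Headache",
    "Dry cough": "Cough",
    "Cough": "Respiratory finding",
}


def ancestors(term, is_a):
    """All ancestors of term (assumes one parent per term and no cycles)."""
    found = set()
    while term in is_a:
        term = is_a[term]
        found.add(term)
    return found


def build_pairs(is_a, seed=0):
    """Return (positives, negatives) as (candidate hyponym, candidate hypernym) pairs.

    Positives: direct child -> parent edges.
    Negatives: reversed edges (hyponymy) plus pairs with no ancestor link in
    either direction (unrelated), sampled down to one negative per positive.
    """
    positives = sorted(is_a.items())
    hyponymy = [(parent, child) for child, parent in positives]
    terms = sorted(set(is_a) | set(is_a.values()))
    unrelated = [
        (a, b)
        for a, b in itertools.permutations(terms, 2)
        if b not in ancestors(a, is_a) and a not in ancestors(b, is_a)
    ]
    rng = random.Random(seed)
    negatives = rng.sample(hyponymy + unrelated, len(positives))
    return positives, negatives


positives, negatives = build_pairs(IS_A)
print(positives)
print(negatives)
```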

Chinese Corpus

We select six Chinese healthcare websites and extract hypernymy and synonymy relations between symptoms from the semi-structured and unstructured data on their detail pages. Hypernymy symptom pairs are used as positive instances; hyponymy, synonymy, and unrelated symptom pairs are used as negative instances.
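The paragraph above names four categories of symptom pairs. Assuming the hypernymy and synonymy relations have already been extracted, the hypothetical `label_pair` function below shows how an ordered pair might be assigned to one of them. The relations are toy data, not taken from the corpus: 偏头痛 (migraine) is a kind of 头痛 (headache), 头痛 and 头疼 are two common ways of writing "headache", and 咳嗽 (cough) is unrelated.

```python
# Hypothetical toy relations for illustration only; not taken from the corpus.
HYPERNYMS = {("偏头痛", "头痛")}  # (hyponym, hypernym): migraine is a kind of headache
SYNONYM_GROUPS = [{"头痛", "头疼"}]  # two common ways of writing "headache"


def label_pair(a, b, hypernyms, synonym_groups):
    """Label the ordered pair (a, b): 1 only if b is a hypernym of a.

    Returns (label, kind), where kind names the positive or negative category.
    """
    if (a, b) in hypernyms:
        return 1, "hypernymy"
    if (b, a) in hypernyms:
        return 0, "hyponymy"
    if any(a in group and b in group for group in synonym_groups):
        return 0, "synonymy"
    return 0, "unrelated"


for a, b in [("偏头痛", "头痛"), ("头痛", "偏头痛"), ("头痛", "头疼"), ("咳嗽", "头痛")]:
    print(a, b, label_pair(a, b, HYPERNYMS, SYNONYM_GROUPS))
```

Only the hypernymy direction is positive; reversing a positive pair turns it into a hyponymy negative, which is why pair order matters.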

Six Chinese Healthcare Websites

Website URL
XYWY http://www.xywy.com/
120ask http://www.120ask.com/
39Health http://www.39.net/
99Health http://www.99.com.cn/
Familydoctor http://www.familydoctor.com.cn/
Fh21 http://www.fh21.com.cn/

Source Code

Our source code is in the example folder, together with example training, validation, and test sets.
