Hypernym-LIBre

A free Web-based Corpus for Hypernym Detection

This repository holds the code and other relevant details for the paper submitted to WAC, which is held alongside LREC 2020.

Authors: Rawat, S., Rico, M., Corcho, O.

Description

Detecting hierarchical relationships between terms in text is a key area of research in knowledge representation and NLP. In our paper, we describe a new web-based corpus for extracting hypernyms and compare it with the corpus used by the current state-of-the-art approach, which is not freely available.
The corpus is a combination of the UMBC Corpus and Wikipedia. We apply our own pre-processing and post-processing methodologies for extracting hypernym-hyponym pairs from the text using Hearst Patterns (Marti Hearst, 1992).
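
To illustrate what Hearst-pattern extraction looks like, here is a minimal sketch. It is not the repository's implementation: the pattern list is a small, hand-picked subset, noun phrases are approximated as single words, and the name `extract_pairs` is hypothetical. The actual scripts work on the full corpus and can rely on the PoS-tagged version for proper noun-phrase matching.

```python
import re

# Each entry: (compiled regex, True if the hypernym is the first group).
# Noun phrases are approximated by a single word (an assumption made
# only to keep this sketch short).
PATTERNS = [
    (re.compile(r"\b(\w+),? such as (\w+)"), True),     # "X such as Y"
    (re.compile(r"\b(\w+),? including (\w+)"), True),   # "X including Y"
    (re.compile(r"\b(\w+),? and other (\w+)"), False),  # "Y and other X"
    (re.compile(r"\b(\w+),? or other (\w+)"), False),   # "Y or other X"
]


def extract_pairs(sentence):
    """Return a list of (hyponym, hypernym) pairs found in one sentence."""
    pairs = []
    text = sentence.lower()
    for regex, hypernym_first in PATTERNS:
        for match in regex.finditer(text):
            first, second = match.group(1), match.group(2)
            if hypernym_first:
                pairs.append((second, first))
            else:
                pairs.append((first, second))
    return pairs


print(extract_pairs("Fruits such as apples are healthy."))
print(extract_pairs("Dogs, cats and other animals live here."))
```

Because only one word is captured on each side of a pattern, enumerations such as "dogs, cats and other animals" yield only the item adjacent to the pattern; handling full lists and multi-word noun phrases would need PoS tags or a chunker.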

Location

The corpus is provided in two formats. One is the raw text format and the other is its part-of-speech tagged and dependency parsed version.

Raw Text Format of Hypernym-LIBre:

Part-of-Speech tagged and Dependency annotated version of Hypernym-LIBre:

Hypernym pairs extracted from Hypernym-LIBre using Hearst patterns:

Resource Description

Raw Text format of Hypernym-LIBre:
Number of Files: 288 files of ~110MB each
Size: 11.3GB compressed, 32GB uncompressed

PoS-tagged and Dependency annotated format of Hypernym-LIBre:
Number of Files: 442 files of ~180MB each
Size: 15GB compressed, 80.6GB uncompressed

File Description

multiprocess_script.py: Extract Hearst patterns from multiple chunks of Hypernym-LIBre using multiprocessing
hearst_counts_alternate.py: Create a dictionary of extractions using Hearst Patterns and get the frequencies
raw_count_model.py: Create a compressed sparse row matrix from the above file and calculate the raw probability model
PPMI_model.py: Create a positive pointwise mutual information matrix from the raw count matrix
SVD_raw_count_model_ppmi.py: Apply matrix factorization using SVD on both the raw count matrix and the PPMI matrix to get low rank embeddings. This helps in creating similar representations for similar words.
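
The counting and PPMI steps described above can be sketched as follows. This is a simplified illustration using only the standard library, not the code in the listed scripts: it assumes the usual PPMI definition, max(0, log(p(x, y) / (p(x) p(y)))), with probabilities estimated from extraction counts, and the function name `ppmi_from_pairs` and the toy pairs are hypothetical. At corpus scale the same quantities would be held in a sparse matrix rather than dictionaries.

```python
import math
from collections import Counter


def ppmi_from_pairs(pairs):
    """Map each distinct (hyponym, hypernym) pair to its PPMI score."""
    pair_counts = Counter(pairs)          # raw extraction frequencies
    total = sum(pair_counts.values())     # total number of extractions
    hypo_counts = Counter()               # how often each term is a hyponym
    hyper_counts = Counter()              # how often each term is a hypernym
    for (hypo, hyper), n in pair_counts.items():
        hypo_counts[hypo] += n
        hyper_counts[hyper] += n

    scores = {}
    for (hypo, hyper), n in pair_counts.items():
        # PMI = log(p(x, y) / (p(x) * p(y))), written in terms of counts.
        pmi = math.log(n * total / (hypo_counts[hypo] * hyper_counts[hyper]))
        scores[(hypo, hyper)] = max(0.0, pmi)  # clip negative values to zero
    return scores


# Toy extractions (made up for illustration only).
pairs = [
    ("dog", "animal"),
    ("dog", "pet"), ("dog", "pet"),
    ("cat", "animal"), ("cat", "animal"),
    ("oak", "tree"),
]
scores = ppmi_from_pairs(pairs)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

In this toy data, ("dog", "animal") occurs less often than its marginal counts would predict, so its PMI is negative and is clipped to zero, which is the "positive" part of PPMI.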

The Wikipedia extractor used to convert the Wikipedia dump into text files is here.
