FUSE: Multi-Faceted Set Expansion by Coherent Clustering of Skip-grams (ECML-PKDD 2020)


Introduction

This project addresses multi-faceted set expansion. The accompanying paper was published at ECML-PKDD 2020.

What is set expansion?

Set expansion aims to expand a small set of seed entities into a complete set of relevant entities. For example, to explore all Universities in the U.S., one can feed a seed set (e.g., {"Stanford", "UCB", "Harvard"}) to a set expansion system and then expect outputs such as "Princeton", "MIT", "UW" and "UIUC".
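As a toy illustration of the task, a generic baseline ranks candidate entities by cosine similarity to the centroid of the seed embeddings. This is not the FUSE algorithm, and the vectors below are made up:

```python
# Toy set expansion: rank candidates by cosine similarity to the seed
# centroid. Illustrative baseline only; embeddings are invented.
import numpy as np

def expand(seeds, candidates, emb, k=1):
    centroid = np.mean([emb[s] for s in seeds], axis=0)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Highest-similarity candidates come first.
    return sorted(candidates, key=lambda c: cos(emb[c], centroid), reverse=True)[:k]

emb = {
    "stanford": np.array([1.0, 0.9]),
    "harvard": np.array([0.9, 1.0]),
    "mit": np.array([1.0, 1.0]),
    "banana": np.array([-1.0, 0.2]),
}
print(expand(["stanford", "harvard"], ["mit", "banana"], emb, k=1))
# "mit" ranks above "banana"
```

In practice the embeddings would come from a corpus (e.g., the GloVe vectors used below), and FUSE goes further by separating the facets of ambiguous seeds before expanding.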

Why multi-faceted set expansion?

When the seed set is ambiguous (or some individual seeds are ambiguous), our algorithm FUSE returns multiple sets of entities, one for each semantic facet. For example, the seed "Apple" may refer to either the fruit or the technology company.

Model Overview

Our model consists of three modules:

  1. Facet discovery: identifies all semantic facets of each seed entity by extracting and clustering its skip-grams;
  2. Facet fusion: discovers the semantic facets shared by the entire seed set via an optimization formulation;
  3. Entity expansion: expands each semantic facet using a masked language model (MLM) with pre-trained BERT models.
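To make the first module concrete, here is a minimal sketch of what skip-gram extraction could look like: for each occurrence of a seed entity, the surrounding context window is kept, with the entity itself replaced by a placeholder. This is illustrative only, not the authors' implementation:

```python
# Sketch of skip-gram extraction for facet discovery.
# (Illustrative only; not the authors' implementation.)

def extract_skipgrams(sentences, entity, window=2):
    """Collect context patterns around each occurrence of `entity`,
    with the entity replaced by a placeholder token."""
    skipgrams = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == entity:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                skipgrams.append(" ".join(left + ["__"] + right))
    return skipgrams

corpus = [
    "apple is a technology company in california",
    "i ate an apple and a banana for lunch",
]
print(extract_skipgrams(corpus, "apple", window=1))
# one skip-gram per occurrence: ['__ is', 'an __ and']
```

Clustering such skip-grams (e.g., embedding them and grouping coherent ones) is what separates the "company" contexts from the "fruit" contexts of an ambiguous seed.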

Requirements

The code is written in Python 3. Install the dependencies with:

pip install -r requirements.txt

Data

To reproduce our results, please request the dataset from the authors of EgoSet: Exploiting Word Ego-networks and User-generated Ontology for Multifaceted Set Expansion (WSDM 2016).

We provide a sample dataset, data/dataset_sample.txt, for readers to play with the code. Note that results on this sample are not representative.

To run the sample code, please 1) download the pre-trained GloVe embeddings and put them under data/, and 2) download a pre-trained BERT model and put it under data/ (you may want to fine-tune the BERT model for your own task).

Code:

python ./Main.py -p 20

Argument:

  • p - preference parameter for Affinity Propagation. Note that the preference is sensitive to the dimensionality of the word embeddings and to the dataset, so please tune it accordingly.
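The preference value controls how many exemplars (clusters, i.e., candidate facets) Affinity Propagation produces, which is why it needs tuning per embedding space and dataset. The sketch below uses scikit-learn's AffinityPropagation on made-up 5-dimensional vectors standing in for skip-gram embeddings:

```python
# Sketch: how the Affinity Propagation `preference` parameter affects
# the number of clusters. Data are toy stand-ins for skip-gram embeddings.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# Two well-separated groups of toy 5-d embeddings.
X = np.vstack([
    rng.normal(0.0, 0.1, (10, 5)),
    rng.normal(3.0, 0.1, (10, 5)),
])

n_clusters = {}
for pref in (-50.0, -1.0):
    ap = AffinityPropagation(preference=pref, random_state=0).fit(X)
    n_clusters[pref] = len(ap.cluster_centers_indices_)

print(n_clusters)
```

Lower (more negative) preference values discourage points from becoming exemplars and thus tend to yield fewer, coarser clusters; higher values yield more, finer ones.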

Citation

@inproceedings{zhu2020fuse,
    title = {FUSE: Multi-Faceted Set Expansion by Coherent Clustering of Skip-grams},
    author = {Zhu, Wanzheng and Gong, Hongyu and Shen, Jiaming and Zhang, Chao and Shang, Jingbo and Bhat, Suma and Han, Jiawei},
    booktitle = {The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
    year = {2020}
}
