EMNLP 2023 long paper "DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery"

Lackel/DNA

Denoised Neighborhood Aggregation (DNA)

Data and code for the paper "DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery" (EMNLP 2023 main conference paper).

Fine-grained category discovery aims to automatically discover novel fine-grained categories from coarse-grained labeled data, bridging the gap between the demand for fine-grained analysis and the high cost of annotation.

Contents

1. Data

2. Model

3. Requirements

4. Running

5. Results

6. Thanks

7. Citation

Data

We ran experiments on three public datasets: clinc, wos and hwu64. All three are included in this repository under the data folder './data'.

Model

Our method has three main steps. First, we maintain a dynamic queue and retrieve neighbors for each query based on semantic similarity. Second, we apply three principles to filter out false-positive neighbors, which improves representation learning. Third, we perform neighborhood aggregation to learn compact embeddings for fine-grained clusters.
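The three steps can be illustrated with a small pure-Python sketch. It is not the code in this repository: the names `NeighborQueue`, `filter_neighbors` and `aggregation_loss` are hypothetical, the filter (keep neighbors that share the query's coarse label) is only a stand-in for the three principles described in the paper, and the loss (mean cosine distance to the retained neighbors) is a simplified placeholder for the actual training objective.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class NeighborQueue:
    """Hypothetical fixed-size FIFO queue of (embedding, coarse_label) pairs."""

    def __init__(self, max_size):
        # deque with maxlen drops the oldest entry once the queue is full
        self.items = deque(maxlen=max_size)

    def enqueue(self, embedding, coarse_label):
        self.items.append((embedding, coarse_label))

    def retrieve(self, query, k):
        """Step 1: return the k queue entries most similar to the query."""
        ranked = sorted(self.items, key=lambda it: cosine(query, it[0]), reverse=True)
        return ranked[:k]


def filter_neighbors(neighbors, query_label):
    """Step 2 (stand-in): keep neighbors with the same coarse label as the query."""
    return [(emb, lab) for emb, lab in neighbors if lab == query_label]


def aggregation_loss(query, neighbors):
    """Step 3 (simplified): mean cosine distance from the query to its neighbors."""
    if not neighbors:
        return 0.0
    return sum(1.0 - cosine(query, emb) for emb, _ in neighbors) / len(neighbors)


# Toy usage with 2-d embeddings and made-up coarse labels
queue = NeighborQueue(max_size=4)
queue.enqueue([1.0, 0.0], "banking")
queue.enqueue([0.9, 0.1], "banking")
queue.enqueue([0.8, 0.2], "travel")
queue.enqueue([0.0, 1.0], "travel")

query, query_label = [1.0, 0.05], "banking"
retrieved = queue.retrieve(query, k=3)
kept = filter_neighbors(retrieved, query_label)
loss = aggregation_loss(query, kept)
```

In this toy run the "travel" entry that was retrieved as a near neighbor is dropped by the filter, so only same-label neighbors contribute to the loss. Minimizing such a loss pulls the query embedding toward its retained neighbors.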

Requirements

  • python==3.8
  • pytorch==1.11.0
  • transformers==4.15.0
  • scipy==1.9.3
  • numpy==1.23.5
  • scikit-learn==1.2.0
  • faiss-gpu==1.7.2
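
To confirm that an environment matches the pinned versions above, a short check along the following lines may help. This is a sketch, not part of the repository: `check_versions` is a hypothetical helper, and it assumes the packages were installed with pip under the distribution names shown (PyTorch is distributed on pip as `torch`). A local build suffix such as `+cu113` is ignored when comparing.

```python
from importlib import metadata

# Pinned versions from the list above, keyed by pip distribution name.
PINNED = {
    "torch": "1.11.0",  # listed as "pytorch" above; "torch" is the pip name
    "transformers": "4.15.0",
    "scipy": "1.9.3",
    "numpy": "1.23.5",
    "scikit-learn": "1.2.0",
    "faiss-gpu": "1.7.2",
}


def check_versions(pinned=PINNED, get_version=metadata.version):
    """Return {name: (wanted, found, matches)}; found is None if not installed."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Drop a local build suffix such as "+cu113" before comparing
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} [{status}]")
```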

Running

Train and test the model with the bash script:

sh run.sh

You can also add or change parameters in run.sh; more parameters are listed in init_parameter.py.

Results

Note that experimental results may differ slightly between runs because clustering at test time is randomized.
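
One common way to reduce run-to-run variation is to fix the random seeds before training and evaluation. The sketch below is a general pattern, not code from this repository: `set_seed` is a hypothetical helper, and whether the provided scripts already expose a seed parameter should be checked in init_parameter.py. Fixing these seeds does not guarantee identical numbers, since the clustering library may keep its own random state and need its own seed argument.

```python
import random


def set_seed(seed):
    """Seed Python, and NumPy / PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without CUDA
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draws
set_seed(0)
first = random.random()
set_seed(0)
second = random.random()
```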

Thanks

Some code references the following repositories:

Citation

If our paper or code is helpful to you, please consider citing our paper:

@inproceedings{an2023dna,
  title={DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery},
  author={An, Wenbin and Tian, Feng and Shi, Wenkai and Chen, Yan and Zheng, Qinghua and Wang, Qianying and Chen, Ping},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  pages={12292--12302},
  year={2023}
}
