Data and code for the paper "DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery" (EMNLP 2023, Main Conference).
Fine-grained Category Discovery aims to automatically discover novel fine-grained categories from coarse-grained labeled data. This can help bridge the gap between the demand for fine-grained analysis and the high cost of fine-grained annotation.
We conducted experiments on three public datasets, clinc, wos, and hwu64, all of which are included in the './data' folder of this repository.
Our model consists of three main steps. First, we maintain a dynamic queue from which neighbors are retrieved for each query based on semantic similarity. Second, we propose three principles to filter out false-positive neighbors for better representation learning. Third, we perform neighborhood aggregation to learn compact embeddings for fine-grained clusters.
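The three steps above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration, not this repository's implementation: the names `EmbeddingQueue`, `retrieve_neighbors`, `filter_neighbors`, and `aggregation_loss` are hypothetical; for simplicity, neighbors are retrieved among the queue entries themselves; the two filtering rules (shared coarse label, reciprocal neighbors) are illustrative stand-ins rather than a reproduction of the paper's three principles; and the loss is a generic InfoNCE-style objective.

```python
import numpy as np
from collections import deque


class EmbeddingQueue:
    """Hypothetical FIFO queue of recent embeddings and their coarse labels."""

    def __init__(self, size):
        # deque(maxlen=...) silently drops the oldest entries once full
        self.feats = deque(maxlen=size)
        self.labels = deque(maxlen=size)

    def enqueue(self, feats, labels):
        for f, y in zip(feats, labels):
            self.feats.append(f / np.linalg.norm(f))  # store L2-normalised vectors
            self.labels.append(int(y))

    def arrays(self):
        return np.stack(self.feats), np.array(self.labels)


def retrieve_neighbors(feats, k):
    """Step 1: indices of the k most cosine-similar entries per row, self excluded."""
    sims = feats @ feats.T  # cosine similarity, since rows are unit-norm
    np.fill_diagonal(sims, -np.inf)
    return np.argsort(-sims, axis=1)[:, :k]


def filter_neighbors(nbrs, labels):
    """Step 2 (illustrative rules only): keep neighbor j of i if both share a
    coarse label and i also appears among j's neighbors (reciprocity)."""
    return [
        [int(j) for j in row if labels[j] == labels[i] and i in nbrs[j]]
        for i, row in enumerate(nbrs)
    ]


def aggregation_loss(feats, kept, tau=0.1):
    """Step 3: InfoNCE-style loss pulling each entry toward its kept neighbors."""
    sims = feats @ feats.T / tau
    np.fill_diagonal(sims, -np.inf)  # an entry is never its own candidate
    log_prob = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    terms = [-log_prob[i, js].mean() for i, js in enumerate(kept) if js]
    return float(np.mean(terms)) if terms else 0.0


# Usage on two synthetic, well-separated groups of five embeddings each.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 16))
x = np.concatenate([c + 0.05 * rng.normal(size=(5, 16)) for c in centers])
y = [0] * 5 + [1] * 5

queue = EmbeddingQueue(size=64)
queue.enqueue(x, y)
feats, labels = queue.arrays()
kept = filter_neighbors(retrieve_neighbors(feats, k=3), labels)
loss = aggregation_loss(feats, kept)
```

A bounded `deque` keeps the queue "dynamic" at no extra cost: new embeddings push out the oldest ones. At realistic queue sizes, the brute-force similarity matrix would typically be replaced by a nearest-neighbor index (the requirements list faiss-gpu), but that is not shown here.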
- python==3.8
- pytorch==1.11.0
- transformers==4.15.0
- scipy==1.9.3
- numpy==1.23.5
- scikit-learn==1.2.0
- faiss-gpu==1.7.2
To train and test the model, run the bash script:
sh run.sh
You can add or change parameters in run.sh; the full list of parameters is in init_parameter.py.
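Testing clusters the learned embeddings, and results can vary slightly between runs because this clustering is random. The sketch below illustrates that evaluation step under stated assumptions: it uses scikit-learn k-means plus Hungarian-matched clustering accuracy (via SciPy), a common protocol in category discovery work, but the repository's own evaluation code and metrics may differ. `cluster_accuracy` and `evaluate` are hypothetical names.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans


def cluster_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = int(max(y_true.max(), y_pred.max())) + 1
    counts = np.zeros((n, n), dtype=np.int64)  # counts[cluster, label]
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1
    # Hungarian matching: pick the cluster-to-label assignment with most hits
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(y_true)


def evaluate(embeddings, y_true, n_clusters, seed=0):
    """Cluster embeddings with k-means; a fixed seed makes the result repeatable."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    y_pred = kmeans.fit_predict(embeddings)
    return cluster_accuracy(y_true, y_pred)
```

Fixing `random_state` makes a single run reproducible. Averaging over several seeds is a reasonable way to report results that do not depend on one particular clustering initialization.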
Note that experimental results may vary slightly between runs because the clustering performed during testing is random.
Parts of the code reference the following repositories:
If our paper or code is helpful to you, please consider citing our paper:
@inproceedings{an2023dna,
title={DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery},
author={An, Wenbin and Tian, Feng and Shi, Wenkai and Chen, Yan and Zheng, Qinghua and Wang, Qianying and Chen, Ping},
booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
pages={12292--12302},
year={2023}
}