Denoised Neighborhood Aggregation (DNA)

Data and code for the paper "DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery" (EMNLP 2023 main conference).

Fine-grained Category Discovery aims to automatically discover novel fine-grained categories from coarse-grained labeled data. This bridges the gap between the demand for fine-grained analysis and the high cost of fine-grained annotation.

Data

We ran experiments on three public datasets: clinc, wos and hwu64. All three are included in this repository under './data'.

Model

Our model has three main steps. First, we maintain a dynamic queue and retrieve neighbors for each query based on semantic similarity. Second, we apply three principles to filter out false-positive neighbors, which improves representation learning. Third, we perform neighborhood aggregation to learn compact embeddings for fine-grained clusters.
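
The three steps can be pictured with a small NumPy sketch. This is only an illustration under stated assumptions, not the code in dna.py: the function names are hypothetical, the filter shown (keep neighbors that share the query's coarse label) is a stand-in for the three principles described in the paper, and a plain mean replaces the learned aggregation objective.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def retrieve_neighbors(queries, queue, k):
    # Step 1: indices of the k most cosine-similar queue entries per query.
    sims = l2_normalize(queries) @ l2_normalize(queue).T
    return np.argsort(-sims, axis=1)[:, :k]

def filter_neighbors(neighbor_idx, query_labels, queue_labels):
    # Step 2 (assumed stand-in filter): keep neighbors whose coarse label
    # matches the coarse label of the query.
    return queue_labels[neighbor_idx] == query_labels[:, None]

def aggregate(queries, queue, neighbor_idx, keep_mask):
    # Step 3 (simplified): average the kept neighbors; if none survive the
    # filter, fall back to the query embedding itself.
    out = np.empty_like(queries)
    for i, (idx, keep) in enumerate(zip(neighbor_idx, keep_mask)):
        kept = queue[idx[keep]]
        out[i] = kept.mean(axis=0) if len(kept) else queries[i]
    return out

# Toy data: four queued embeddings with coarse labels, one query.
queue = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
queue_labels = np.array([0, 1, 0, 1])
queries = np.array([[1.0, 0.05]])
query_labels = np.array([0])

idx = retrieve_neighbors(queries, queue, k=2)
keep = filter_neighbors(idx, query_labels, queue_labels)
agg = aggregate(queries, queue, idx, keep)
```

In this toy case the query retrieves queue entries 0 and 1, the filter drops entry 1 because its coarse label differs, and the aggregated embedding is entry 0.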

Requirements

python==3.8
pytorch==1.11.0
transformers==4.15.0
scipy==1.9.3
numpy==1.23.5
scikit-learn==1.2.0
faiss-gpu==1.7.2
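
To confirm that an environment matches the pins above, a short check like the following may help. It is a sketch, not part of the repository: the helper name check_versions is hypothetical, and it assumes the pip distribution name "torch" for the pytorch entry.

```python
import sys
from importlib import metadata

# Pins copied from the list above. "torch" is the assumed pip name for pytorch.
PINS = {
    "torch": "1.11.0",
    "transformers": "4.15.0",
    "scipy": "1.9.3",
    "numpy": "1.23.5",
    "scikit-learn": "1.2.0",
    "faiss-gpu": "1.7.2",
}

def check_versions(pins):
    # Map each package to (wanted, found, matches); found is None if missing.
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # startswith tolerates local build suffixes such as "+cu113".
        matches = found is not None and found.startswith(wanted)
        report[name] = (wanted, found, matches)
    return report

if __name__ == "__main__":
    print("python 3.8:", sys.version_info[:2] == (3, 8))
    for name, (wanted, found, ok) in check_versions(PINS).items():
        print(f"{name}: wanted {wanted}, found {found}, ok={ok}")
```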

Running

Train and test the model with the bash script:

sh run.sh

You can add or change parameters in run.sh. The full list of parameters is in init_parameter.py.

Results

Results may differ slightly between runs because the clustering used at test time is randomized.
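
A common way to reduce this run-to-run variation is to fix the random seeds before training and testing. The helper below is a generic sketch and an assumption on our part: the name set_seed is hypothetical, and this section does not say whether run.sh or init_parameter.py already exposes a seed option.

```python
import random

import numpy as np

def set_seed(seed):
    # Seed the Python and NumPy generators.
    random.seed(seed)
    np.random.seed(seed)
    # Seed PyTorch too when it is installed (CPU and all CUDA devices).
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

set_seed(0)
first = np.random.rand(3)
set_seed(0)
second = np.random.rand(3)  # same values as `first`
```

Clustering libraries often take their own seed argument as well (for example, random_state in scikit-learn's KMeans), so seeding the global generators alone may not remove all variation.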

Thanks

Some code references the following repositories:

Citation

If our paper or code is helpful to you, please consider citing our paper:

@inproceedings{an2023dna,
  title={DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery},
  author={An, Wenbin and Tian, Feng and Shi, Wenkai and Chen, Yan and Zheng, Qinghua and Wang, Qianying and Chen, Ping},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  pages={12292--12302},
  year={2023}
}
