dongguanting/Demo-NSF

DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task

🎥 Overview

This repository contains the open-sourced official implementation of the paper:

DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task (Findings of EMNLP 2023, Short Paper).

If you find this repo helpful, please cite the following paper:

@misc{dong2023demonsf,
      title={DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task}, 
      author={Guanting Dong and Tingfeng Hui and Zhuoma GongQue and Jinxu Zhao and Daichi Guo and Gang Zhao and Keqing He and Weiran Xu},
      year={2023},
      eprint={2310.10169},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Introduction

We propose a Multi-task Demonstration-based Generative Framework for Noisy Slot Filling, named DemoNSF. Specifically, we introduce three noisy auxiliary tasks, namely Noisy Recovery (NR), Random Mask (RM), and Hybrid Discrimination (HD), to implicitly capture the semantic structural information of input perturbations at different granularities. In the downstream main task, we design a noisy demonstration construction strategy for the generative framework, which explicitly incorporates task-specific information and the perturbed distribution during training and inference. Experiments on two benchmarks demonstrate that DemoNSF outperforms all baseline methods and achieves strong generalization. Further analysis provides empirical guidance for the practical application of generative frameworks.

🍯 Overall Framework

[Figure: overall framework of DemoNSF]

🎯 Quick Start

Dependencies

conda create -n your_env_name python=3.9
conda activate your_env_name
pip3 install torch==2.0.1
pip3 install transformers==4.33.1
pip3 install sentence-transformers
pip3 install nlpaug
pip3 install datasets

You can copy and paste the above commands into your terminal to create the environment.

Datasets

The two benchmarks used in our paper, multiWOZ and multi-noise, are both under the data path.

process.py shows some of the data processing operations for the multi-noise dataset, including data augmentation, building demonstrations, and converting to NER data, mask data, and multi-classification data.
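As a rough illustration of two of the operations listed above (converting to mask data and building demonstrations), here is a minimal sketch. It is not taken from process.py: the function names, the `<mask>` token, the `=>` arrow, and the `[SEP]` separator are all assumptions made for this example, and the repository's actual formats may differ.

```python
import random

# Hypothetical placeholder; the repository may use a different mask token.
MASK_TOKEN = "<mask>"


def random_mask(tokens, rng, mask_token=MASK_TOKEN):
    """Replace one randomly chosen token with a mask.

    Returns (masked_input, target), where target is the token that was hidden.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    idx = rng.randrange(len(tokens))
    masked = tokens[:idx] + [mask_token] + tokens[idx + 1:]
    return " ".join(masked), tokens[idx]


def build_demonstration_input(query, demonstrations, sep=" [SEP] "):
    """Prepend (utterance, labelled output) pairs to the query utterance.

    The "utterance => output" layout and the separator are assumptions.
    """
    parts = [f"{utt} => {out}" for utt, out in demonstrations]
    parts.append(query)
    return sep.join(parts)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same mask position
    masked, target = random_mask("book a flight to boston".split(), rng)
    print(masked, "->", target)
    print(build_demonstration_input(
        "play sme jazz",
        [("book a flite to boston", "city: boston")],
    ))
```

Passing a seeded `random.Random` instance keeps the generated data reproducible across runs, which makes it easier to compare pre-training configurations.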

Pre-training

bash scripts/pretrain.sh

Use the above command to start the pre-training stage.

--noise_path ${char_file} \
--clean_path ${train_file} \
--mask_output_path ${mask_output} \
--mask_input_path ${mask_input} \
--classify_output_path ${classify_output} \
--classify_input_path ${classify_input} \

The above parameters specify the dataset paths for the three pre-training tasks used in our pre-training stage.

Training

bash scripts/train.sh

Use the above command to start the training stage.

--add_demonstration \
--demons_train_path ${mix_input} \
--demons_out_path ${mix_output} \
--demons_valid_path ${valid_demons_input} \
--demons_val_out_path ${valid_demons_output} \

The 'add_demonstration' flag controls whether demonstrations are added in the training stage. If this flag is provided, the four demonstration-related data paths that follow it must also be provided.
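The dependency between 'add_demonstration' and the four path arguments can be sketched with argparse as follows. This is only an illustration: the flag names come from the command above, but the `build_parser` and `parse_train_args` helpers, the `store_true` behaviour, and the error message are assumptions, not the repository's actual argument handling.

```python
import argparse

# Flag names as they appear in scripts/train.sh above.
DEMO_PATH_FLAGS = (
    "demons_train_path",
    "demons_out_path",
    "demons_valid_path",
    "demons_val_out_path",
)


def build_parser():
    """Build a parser containing only the demonstration-related options."""
    parser = argparse.ArgumentParser(description="Demonstration options (sketch)")
    # Assumed to be a boolean switch, since it is passed without a value.
    parser.add_argument("--add_demonstration", action="store_true")
    for name in DEMO_PATH_FLAGS:
        parser.add_argument(f"--{name}", default=None)
    return parser


def parse_train_args(argv=None):
    """Parse arguments and require all four paths when demonstrations are on."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.add_demonstration:
        missing = [n for n in DEMO_PATH_FLAGS if getattr(args, n) is None]
        if missing:
            # parser.error prints a usage message and exits with status 2.
            parser.error(
                "--add_demonstration requires: "
                + ", ".join(f"--{n}" for n in missing)
            )
    return args
```

Failing at parse time, rather than when the first missing file is opened, surfaces a misconfigured run before any model loading starts.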

Testing

bash scripts/test.sh

Use the above command to start the testing stage.

--test_file_path ${test_path} \

The 'test_path' variable is the root directory containing the test datasets.
