DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition
This repository contains the supplementary material for the paper "DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition", which has been accepted to the EMNLP 2025 Main Conference.
- Our paper has been accepted to EMNLP 2025!
- We have added more existing open-source datasets in our format, along with the format for fine-tuning and inference based on SWIFT, so CascadeNER is now easier to test.
- SWIFT has been updated and some of its parameters have changed, so please use the older version pinned in `requirements.txt`.
- We provide `demo.py` to make testing CascadeNER easier.
- This repository includes DynamicNER and CascadeNER, our NER dataset and framework.
- DynamicNER is the first dataset specially designed for NER with LLMs, built around a novel dynamic categorization system. It is multilingual and fine-grained.
- CascadeNER is the first universal and multilingual NER framework built on SLMs. It supports both few-shot and zero-shot scenarios and achieves SOTA performance on low-resource and fine-grained datasets.
- Create the environment with `conda create -n dynamicner python=3.10`, then install the dependencies with `pip install -r requirements.txt`.
- You may also use a standard environment for SWIFT.
- You may also download `./DynamicNER.7z` and unzip it to obtain the dataset for training.
- Dataset preparation: the `DynamicNER_process` directory contains the scripts for generating dynamic datasets, running format conversions, and validating labels. See `DynamicNER_process/readme.md` for the full checklist (including `check_dynamic_classify.py`, `prune_dynamic_classify.py`, and `sync_extract.py`).
- Training data: please use SWIFT for model training. We strongly recommend the Qwen series as base models. To understand the training layout, follow the examples in any `train.json` from the BASE-format datasets on Hugging Face. A concrete example is provided in `./DynamicNER_process/example.json`.
- CascadeNER inference: Stage-1 extraction, Stage-2 classification, and evaluation are documented in `./CascadeNER/README.md`. Configure paths via CLI arguments or environment variables rather than editing the scripts directly. Typical usage involves running `extract.sh`, then `python model/infer.py`, and finally `python evaluate.py`.
- Transformation utilities: to transform your own corpora into SWIFT/BIO formats, use `./DynamicNER_process/transformation/stage1_trans.py` and `stage2_trans.py`; BIO exports are handled by `BIO_trans.py` (and `BIO_trans_zh.py`).
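For readers unfamiliar with the BIO scheme mentioned above, the following is a minimal sketch of how entity spans are commonly turned into BIO tags. It is not the logic of `BIO_trans.py`; the function names `spans_to_bio` and `to_conll`, the token-index span format, and the `PER`/`LOC` labels are all assumptions made for illustration only.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans into one BIO tag per token.

    tokens: list of str
    spans:  list of (start, end, label) in token indices, `end` exclusive
            (a hypothetical input format, not the repository's own)
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


def to_conll(tokens, tags):
    """Render tokens and tags as `token tag` lines, one token per line."""
    return "\n".join(f"{tok} {tag}" for tok, tag in zip(tokens, tags))


if __name__ == "__main__":
    tokens = ["Barack", "Obama", "visited", "Paris", "."]
    spans = [(0, 2, "PER"), (3, 4, "LOC")]  # illustrative labels only
    print(to_conll(tokens, spans_to_bio(tokens, spans)))
```

The sketch rejects overlapping spans because plain BIO assigns a single tag to each token; check the scripts themselves for how nested or overlapping entities are actually handled.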
PS: Because SWIFT has been updated, you may need to use the older version to run our code directly, or you can adapt the code slightly by following the guidance from SWIFT. We will provide an updated version of the code for this problem later.
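Because the code depends on the older SWIFT release, it can help to confirm that the active environment matches `requirements.txt` before running anything. The sketch below is one possible check, assuming the file pins packages as `name==version`; the helper names `parse_pins` and `find_mismatches` are hypothetical and not part of this repository.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(requirements_text):
    """Return {package: version} for lines of the form `name==version`."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0]           # drop trailing comments
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" not in line:
            continue                          # skip unpinned or option lines
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip()  # drop extras such as pkg[all]
        pins[name] = version.strip()
    return pins


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed_or_None)} where versions differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None                  # package is not installed
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")            # assumed to sit in the working directory
    if req.exists():
        found = find_mismatches(parse_pins(req.read_text()))
        for name, (pinned, installed) in found.items():
            print(f"{name}: pinned {pinned}, installed {installed or 'not installed'}")
```

Run it from the repository root inside the environment you intend to use. No output means every pinned package is installed at the pinned version.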
