LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition

This repository provides the code and datasets for our ACL 2024 Findings paper and its extended version: LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition & Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation

News🔥

  • [Jun. 2024] A new study has been released. In this paper, we propose a new Segmented Multimodal Named Entity Recognition (SMNER) task and construct the corresponding Twitter-SMNER dataset. Code and the Twitter-SMNER dataset are coming soon~🔨
  • [May. 2024] RiVEG has been accepted to ACL 2024 Findings.
  • [Oct. 2023] PGIM has been accepted to EMNLP 2023 Findings.

Todo List

  • Release the datasets and the training and inference scripts for each stage.
  • Release new Twitter-SMNER dataset.
  • Release full-process automatic training and inference scripts.

Usage

The implementation of RiVEG mainly includes the following three parts:

MNER Module

The MNER stage of RiVEG is mainly built on AdaSeq, which requires Python >= 3.7 and PyTorch >= 1.8. For details on configuring the environment, see PGIM and AdaSeq.

After following the instructions of PGIM to complete the configuration, use the following command to run training and prediction for the MNER stage. Note that we provide RiVEG's yaml configuration file; please replace PGIM's original configuration with the configuration file provided here.

python -m scripts.train -c examples/PGIM/twitter-10000.yaml

After training completes, the inference results are saved to experiments/RiVEG-twitter-10000/.../pred.txt.

Visual Entailment Module

The VE stage of RiVEG is mainly based on OFA; please configure the corresponding environment according to its instructions.

RiVEG's VE data are available here. Download these data and place them in the OFA/dataset/snli_ve_data/ folder.

We provide the twitter10000_addent_test_pred.tsv file, which contains the ChatGPT answers. If you want to build the corresponding VE test set from your own MNER inference results, you can run the code we provide here: first put the MNER prediction file experiments/RiVEG-twitter-10000/.../pred.txt into the data_processing/MNER_processing folder, then run extract entities.py and GPT request.py in sequence. The GPT answers will be written to data_processing/MNER_processing/testAnswer.txt. Note that line 9 of GPT request.py needs to be replaced with your own API key, and the requests will incur a small cost.
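
At its core, this step just issues one chat-completion request per entity. The sketch below shows that kind of call with the current OpenAI Python SDK; the prompt wording, model name, and file handling are illustrative assumptions, not the actual contents of GPT request.py:

import os
from openai import OpenAI

# Read the key from an environment variable instead of hard-coding it on line 9.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def query_entity(sentence, entity):
    # Hypothetical prompt: ask the LLM to expand the named entity into a
    # referring expression that a visual grounding model can localize.
    prompt = (f'Sentence: "{sentence}"\n'
              f'Describe the entity "{entity}" as a noun phrase that could be located in the attached image.')
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

# Append one answer per entity to testAnswer.txt.
with open("data_processing/MNER_processing/testAnswer.txt", "a", encoding="utf-8") as f:
    f.write(query_entity("Lionel Messi lifts the trophy tonight", "Lionel Messi") + "\n")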

After this, use the Twitter10000_to_OFA_VE.py file to build the final tsv dataset; don't forget to replace lines 40 and 122 with the XML and image data paths of the original GMNER dataset.
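
For orientation, OFA's tsv datasets store each image as a base64 string alongside the text fields, with the LLM-expanded entity phrase serving as the hypothesis. The snippet below is only a sketch of writing one such row; the column order and example paths are assumptions, so check them against the provided twitter10000_addent_test_pred.tsv:

import base64
import csv

def image_to_base64(img_path):
    # OFA tsv datasets carry the raw image as a base64-encoded string.
    with open(img_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Hypothetical single row; verify the real column layout against the provided tsv.
row = [
    "0",                                      # unique id
    "O_1234.jpg",                             # image id (illustrative)
    image_to_base64("path/to/GMNER/images/O_1234.jpg"),
    "Lionel Messi, a famous soccer player",   # entity phrase used as the hypothesis
]
with open("OFA/dataset/snli_ve_data/twitter10000_test.tsv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter="\t").writerow(row)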

Usage is the same as OFA's Visual Entailment task; use the scripts we provide to perform training and inference, and remember to adjust the various file paths.

cd run_scripts/snli_ve
nohup sh train_twitter10000.sh > train_twitter10000.out &  # finetune for twitter10000_VE
sh evaluate_twitter10000.sh test  # inference and specify 'dev' or 'test'

The inferred VE results will be stored in OFA/results/snli_ve_twitter10000pred/_predict.json.

Visual Grounding Module

Like the VE module, the VG stage of RiVEG is also based on OFA.

RiVEG's VG data are available here. Download these data and place them in the OFA/dataset/refcoco_data/ folder.

Similarly, we provide the preprocessed twitter10000REC_addent_test_pred.tsv file. You can also build your own test set by running Twitter10000_to_OFA_REC.py to convert the VE predictions (_predict.json) into VG input. Don't forget to replace line 98 of it with your own GMNER image path.
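
Conceptually, this conversion keeps only the entities that the VE module judged groundable and forwards their expanded referring expressions to the VG module. A minimal sketch of that filtering step (the JSON field names and label value below are assumptions; check them against your own _predict.json):

import json

with open("OFA/results/snli_ve_twitter10000pred/_predict.json", "r", encoding="utf-8") as f:
    ve_predictions = json.load(f)

# Keep only the samples predicted as groundable; the key/label names are hypothetical.
groundable = [p for p in ve_predictions if p.get("label") == "yes"]
print(f"{len(groundable)} of {len(ve_predictions)} entities proceed to visual grounding")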

Usage is the same as OFA's Visual Grounding task; use the scripts we provide to perform training and inference, and remember to adjust the various file paths.

cd run_scripts/refcoco
nohup sh train_twitter10000REC.sh > train_twitter10000REC.out &  # finetune for twitter10000_VG
sh evaluate_twitter10000REC.sh test  # inference and specify 'dev' or 'test'

The Visual Grounding inference results are located at ../results/twitter10000REC/OFAlargeVE_OFAlargeREC_pred/refcoco_val_predict.json.

Result statistics

If you did not build the test set yourself, you can obtain the statistical results directly by running statistics.py. Note that you need to modify line 12 of it to point to your ../results/twitter10000REC/OFAlargeVE_OFAlargeREC_pred/refcoco_val_predict.json file.

If you build the test set yourself, first replace the original path on line 4 with your own data_precessing/VG_precessing/OFAVE_to_OFAREC.txt, and then set line 12 to your ../results/twitter10000REC/OFAlargeVE_OFAlargeREC_pred/refcoco_val_predict.json to get the statistical results.
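
For reference, GMNER evaluation counts a predicted region as correct when its IoU with a ground-truth box exceeds 0.5 (and ungroundable entities must be predicted as having no region). The helper below illustrates that criterion only; it is not a substitute for statistics.py:

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A region prediction counts as correct when IoU > 0.5 with a gold box.
print(iou((10, 10, 60, 60), (30, 30, 80, 80)))  # ~0.22, below the 0.5 threshold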

Citation

If you find RiVEG useful in your research, please consider citing:

@inproceedings{li2023prompting,
  title={Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge},
  author={Li, Jinyuan and Li, Han and Pan, Zhuo and Sun, Di and Wang, Jiahao and Zhang, Wenkun and Pan, Gang},
  booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
  year={2023}
}

@article{li2024llms,
  title={LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition},
  author={Li, Jinyuan and Li, Han and Sun, Di and Wang, Jiahao and Zhang, Wenkun and Wang, Zan and Pan, Gang},
  journal={arXiv preprint arXiv:2402.09989},
  year={2024}
}

@misc{li2024advancing,
  title={Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation},
  author={Jinyuan Li and Ziyan Li and Han Li and Jianfei Yu and Rui Xia and Di Sun and Gang Pan},
  year={2024},
  eprint={2406.07268},
  archivePrefix={arXiv},
  primaryClass={cs.MM}
}

Acknowledgement

The dataset we use is provided by H-Index, and our code is built upon the open-source AdaSeq, MoRe, and OFA. Thanks for their great work!
