
# Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation (VWSD)

This is the source code for the EMNLP 2023 paper *Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation* [paper].

## Install

```
git clone https://github.com/anastasiakrith/multimodal-retrieval-for-vwsd.git
cd multimodal-retrieval-for-vwsd
```

## Setting up (virtualenv)

In the project folder, run the following commands:

  1. `$ virtualenv venv` to create a virtual environment
  2. `$ source venv/bin/activate` to activate the environment
  3. `$ pip install -r requirements.txt` to install the required packages
  4. Create a `.env` file with the environment variables. The project needs an `OPENAI_API_KEY` holding the API key of your OpenAI account, and optionally a `DATASET_PATH` holding the absolute path of the VWSD dataset; see the example below.
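
A minimal `.env` could look like the following (both values are placeholders; substitute your own key and dataset path):

```
OPENAI_API_KEY=sk-...
DATASET_PATH=/absolute/path/to/vwsd_dataset
```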

## Running the project

### VL Retrieval

```
python vl_retrieval_eval.py -llm "gpt-3.5" -vl "clip" -baseline -penalty
```
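
For orientation, VL retrieval scores each candidate image against the target phrase with a vision-language model such as CLIP and picks the top-scoring image. A minimal sketch using hugging-face transformers (the checkpoint, phrase, and file names are illustrative assumptions, not taken from this repo):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Public CLIP checkpoint; the repo may use a different variant.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

phrase = "andromeda tree"  # hypothetical target phrase from a VWSD instance
images = [Image.open(f"candidate_{i}.jpg") for i in range(10)]  # hypothetical candidates

inputs = processor(text=[phrase], images=images, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds one image-text similarity score per candidate.
best = outputs.logits_per_image.squeeze(-1).argmax().item()
print(f"Predicted image: candidate_{best}.jpg")
```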

### QA Retrieval

```
python qa_retrieval_eval.py -llm "gpt-3.5" -captioner "git" -strategy "greedy" -prompt "no_CoT" -zero_shot
```
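
QA retrieval first turns each candidate image into a caption (here with the GIT captioner) and then queries the LLM over those captions. A sketch of the greedy captioning step (the `microsoft/git-base-coco` checkpoint and the file name are assumptions):

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# GIT captioner fine-tuned on COCO; the repo may configure a different checkpoint.
processor = AutoProcessor.from_pretrained("microsoft/git-base-coco")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-coco")

image = Image.open("candidate_0.jpg")  # hypothetical candidate image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Greedy decoding (sampling off by default), matching the -strategy "greedy" flag.
generated_ids = model.generate(pixel_values=pixel_values, max_length=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```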

### Image-to-Image Retrieval

```
python image_retrieval_eval.py -vl "clip" -wiki "wikipedia" -metric "cosine"
```
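
Image-to-image retrieval compares each candidate image against images retrieved from Wikipedia for the target word, using cosine similarity between their embeddings. A rough sketch with CLIP image features (checkpoint and file names are assumptions):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(paths):
    # L2-normalized CLIP image embeddings, so cosine similarity is a dot product.
    inputs = processor(images=[Image.open(p) for p in paths], return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

wiki = embed_images(["wikipedia_image.jpg"])  # hypothetical image fetched from Wikipedia
candidates = embed_images([f"candidate_{i}.jpg" for i in range(10)])

scores = candidates @ wiki.T  # cosine similarities, shape (10, 1)
print("Predicted image:", scores.squeeze(-1).argmax().item())
```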

### Text-to-Text Retrieval

```
python text_retrieval_eval.py -captioner "git" -strategy "greedy" -extractor "clip" -metric "cosine"
```
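
Text-to-text retrieval is the analogous comparison in text space: the generated captions are embedded and scored against the target phrase by cosine similarity. A sketch with CLIP text features (checkpoint, phrase, and captions are placeholders):

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def embed_texts(texts):
    # L2-normalized CLIP text embeddings for cosine scoring.
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

phrase = embed_texts(["andromeda tree"])  # hypothetical target phrase
captions = embed_texts([  # hypothetical captions generated for the candidates
    "a flowering shrub with white bell-shaped blossoms",
    "a spiral galaxy against a dark sky",
])

scores = captions @ phrase.T  # cosine similarities, shape (2, 1)
print("Predicted image:", scores.squeeze(-1).argmax().item())
```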

## Acknowledgement

The implementation relies on the OpenAI API and Hugging Face Transformers.