
Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation (VWSD)

This is the source code for the EMNLP 2023 paper Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation [paper]. In the VWSD task, a system is given a target word in minimal textual context and must select, from a set of candidate images, the one that depicts the intended sense of the word.

Install

git clone https://github.com/anastasiakrith/multimodal-retrieval-for-vwsd.git
cd multimodal-retrieval-for-vwsd

Setting up (virtualenv)

In the project folder, run the following commands:

  1. $ virtualenv venv to create a virtual environment
  2. $ source venv/bin/activate to activate the environment
  3. $ pip install -r requirements.txt to install packages
  4. Create a .env file with the environment variables. The project needs an OPENAI_API_KEY holding the API key of your OpenAI account, and optionally a DATASET_PATH set to the absolute path of the VWSD dataset (see the example below).
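A minimal .env might look like this (both values are placeholders):

OPENAI_API_KEY=sk-...
DATASET_PATH=/absolute/path/to/vwsd-dataset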

Running the project

VL Retrieval

python vl_retrieval_eval.py -llm "gpt-3.5" -vl "clip" -baseline -penalty 
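For intuition, the sketch below shows the kind of CLIP text-to-image scoring such a VL retrieval module performs with hugging-face transformers. The checkpoint name, file paths, and example phrase are illustrative assumptions, not the repo's exact configuration (which also supports the -baseline and -penalty scoring options).

# Minimal sketch of CLIP-based text-to-image scoring (not the repo's exact code).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

phrase = "andromeda tree"                                   # target word in its minimal context
images = [Image.open(p) for p in ("img0.jpg", "img1.jpg")]  # candidate images

inputs = processor(text=[phrase], images=images, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image                   # one similarity score per candidate
print("predicted image index:", logits.squeeze(1).argmax().item())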

QA Retrieval

python qa_retrieval_eval.py -llm "gpt-3.5" -captioner "git" -strategy "greedy" -prompt "no_CoT" -zero_shot
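The QA pipeline captions each candidate image and then queries the LLM over those captions. Below is a minimal sketch of the captioning step using the hugging-face image-to-text pipeline; the GIT checkpoint name is an assumption, and greedy decoding corresponds to the pipeline's default generation settings.

# Minimal sketch of GIT captioning with greedy decoding (not the repo's exact code).
from transformers import pipeline

captioner = pipeline("image-to-text", model="microsoft/git-base-coco")
caption = captioner("img0.jpg")[0]["generated_text"]  # greedy decoding by default
print(caption)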

Image-to-Image Retrieval

python image_retrieval_eval.py -vl "clip" -wiki "wikipedia" -metric "cosine"
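Conceptually, this setting embeds a reference image retrieved from Wikipedia and each candidate image with the same encoder, then ranks candidates by the chosen metric. A hedged sketch with CLIP image features and cosine similarity (checkpoint and paths are placeholders):

# Minimal sketch of CLIP image-to-image cosine scoring (not the repo's exact code).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path):
    inputs = processor(images=Image.open(path), return_tensors="pt")
    return model.get_image_features(**inputs)  # (1, 512) image embedding

reference = embed_image("wikipedia_image.jpg")  # image retrieved for the sense
candidates = [embed_image(p) for p in ("img0.jpg", "img1.jpg")]
scores = [torch.cosine_similarity(reference, c).item() for c in candidates]
print("predicted image index:", scores.index(max(scores)))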

Text-to-Text Retrieval

python text_retrieval_eval.py -captioner "git" -strategy "greedy" -extractor "clip" -metric "cosine"
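Here the comparison happens purely in text space: captions generated for the candidate images are ranked against the context phrase using the extractor's text embeddings. A minimal sketch with CLIP text features (the checkpoint, phrase, captions, and the choice of comparison target are illustrative assumptions):

# Minimal sketch of text-to-text cosine scoring with CLIP embeddings (not the repo's exact code).
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def embed_text(text):
    inputs = tokenizer([text], return_tensors="pt", padding=True)
    return model.get_text_features(**inputs)  # (1, 512) text embedding

phrase = embed_text("andromeda tree")  # the target word in context
captions = [embed_text(c) for c in ("a small evergreen shrub", "a spiral galaxy in space")]
scores = [torch.cosine_similarity(phrase, c).item() for c in captions]
print("predicted image index:", scores.index(max(scores)))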

Acknowledgement

The implementation relies on resources from openai-api and hugging-face transformers.
