RainYuGG/BLIP-Adapter

image captioning based on Screen2Words

Download repo

git clone --recursive https://github.com/RainYuGG/image-captioning-based-on-Screen2Words.git

Build Environment & Requirement

  • Use conda to build the environment
conda env create -f environment.yml

Config file

The configuration file exposes the following partial-tuning arguments, which are used to build the BLIP adapter model.

  • adapter_type: "vit" : which ViT adapter to use ("vit" or "vit_grayscale")

  • bert_adapter: "bottleneck_adapter" : which BERT adapter to use ("bottleneck_adapter", "lora_adapter", or another implementation from AdapterHub)

    • to use another AdapterHub implementation, modify the code in loader.py
  • tune_language: false : whether to tune the whole language model (true or false)
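
To illustrate how these three options could be checked before a model is built, here is a minimal sketch. The helper name `validate_adapter_config`, the dictionary layout, and the sets of allowed values are assumptions drawn only from the options listed above; they are not part of the repository, and the actual handling in loader.py may differ.

```python
# Hypothetical helper for illustration; not part of the repository.
ALLOWED_ADAPTER_TYPES = {"vit", "vit_grayscale"}
# Only the two BERT adapters named in this README. Other AdapterHub
# implementations would require changes in loader.py.
KNOWN_BERT_ADAPTERS = {"bottleneck_adapter", "lora_adapter"}


def validate_adapter_config(cfg):
    """Return a list of problems found in cfg (empty if none)."""
    problems = []

    adapter_type = cfg.get("adapter_type")
    if adapter_type not in ALLOWED_ADAPTER_TYPES:
        problems.append(
            f"adapter_type must be one of {sorted(ALLOWED_ADAPTER_TYPES)}, "
            f"got {adapter_type!r}"
        )

    bert_adapter = cfg.get("bert_adapter")
    if bert_adapter not in KNOWN_BERT_ADAPTERS:
        problems.append(
            f"bert_adapter {bert_adapter!r} is not one of "
            f"{sorted(KNOWN_BERT_ADAPTERS)}; loader.py may need changes"
        )

    # tune_language is a plain on/off switch, so only booleans are accepted.
    if not isinstance(cfg.get("tune_language"), bool):
        problems.append("tune_language must be a boolean")

    return problems


if __name__ == "__main__":
    example = {
        "adapter_type": "vit",
        "bert_adapter": "bottleneck_adapter",
        "tune_language": False,
    }
    print(validate_adapter_config(example))
```

Returning a list of problems instead of raising on the first one lets all misconfigured options be reported in a single run.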

Train

python train.py --img-dir /path/to/rico --s2w-dir /path/to/screen2words -e 30 -b 32 -p 15

Evaluation

python eval.py -ckpt /path/to/checkpoint

Generate Caption

python generater.py --img-dir /path/to/rico -ckpt /path/to/checkpoint --image_id 54137

Install coco-caption

To obtain the CIDEr score in eval.py, you need to install the coco_caption package. Follow the steps below:

1. Install the model

Navigate to the coco_caption directory (clone the coco_caption repository first if it is not already present), then download the Stanford models and install gensim:

cd coco_caption
bash get_stanford_models.sh
pip install gensim

2. Install java

  1. Download the latest release of Java 8 from the Oracle website.
  2. Copy the downloaded file to your system and extract it with the following commands:
sudo cp jdk-xxxxx_linux-x64_bin.tar.gz /opt
cd /opt
sudo mkdir java
sudo chown ${USER}:${USER} java
sudo tar -zxvf jdk-xxxxx_linux-x64_bin.tar.gz -C /opt/java
  3. Set the environment variables in ~/.bashrc by adding the following lines:
#set java environment
export JAVA_HOME=/opt/java/jdk1.8.xx
export PATH=${JAVA_HOME}/bin:${PATH}
  4. Source the .bashrc file to apply the changes:
source ~/.bashrc
  5. Verify that Java is installed by checking the version:
java -version
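
The same verification can be done from Python, for example as a preflight check before running eval.py. The sketch below is only an illustration: the function `check_java` is hypothetical and not part of the repository, and it assumes that a correctly set `JAVA_HOME` and a `java` executable on `PATH` are what the scorer needs.

```python
import os
import shutil
import subprocess


def check_java(env=None, which=shutil.which):
    """Return a list of problems with the Java setup (empty if it looks fine).

    Hypothetical helper; `env` and `which` are injectable so the check can be
    exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME points to a missing directory: {java_home}")

    if which("java") is None:
        problems.append("java executable not found on PATH")

    return problems


if __name__ == "__main__":
    issues = check_java()
    if issues:
        print("\n".join(issues))
    else:
        # `java -version` writes its report to stderr, not stdout.
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
        print(result.stderr.strip())
```

If the check reports a problem after editing ~/.bashrc, opening a new shell or re-running `source ~/.bashrc` is usually the first thing to try.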

3. Install WMD

To install WMD, run the following command:

bash get_google_word2vec_model.sh

4. Demo

To run the demo, execute the following command:

python scorer.py

Make sure to run the demo after the installation to confirm that everything works as expected.

Reference

Dataset

Rico / Rico UI Screenshots and View Hierarchies dataset

Screen2Words: paper / code / dataset