image captioning based on Screen2Words

Download repo

git clone --recursive https://github.com/RainYuGG/image-captioning-based-on-Screen2Words.git

Build Environment & Requirement

Use conda to build the environment

conda env create -f environment.yml

Install coco-caption for evaluation (BLEU, CIDEr); see the "Install coco-caption" section below.

Config file

The following configuration-file arguments control partial tuning and are used to build the BLIP adapter model.

adapter_type: "vit" : ViT adapter type ("vit" or "vit_grayscale")
bert_adapter: "bottleneck_adapter" : BERT adapter type ("bottleneck_adapter", "lora_adapter", or another implementation from AdapterHub)
- to use another AdapterHub implementation, modify the code in loader.py accordingly
tune_language: false : whether to tune the whole language model (true or false)
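A quick sanity check of these three options can catch typos before a long training run. The following is a minimal sketch, not code from this repository: the helper name `validate_adapter_config` is hypothetical, and it assumes the configuration has already been loaded into a plain Python dict with the keys listed above.

```python
from typing import List

# Values documented in this README; other AdapterHub implementations
# are possible for bert_adapter but require changes in loader.py.
ALLOWED_ADAPTER_TYPES = {"vit", "vit_grayscale"}
DOCUMENTED_BERT_ADAPTERS = {"bottleneck_adapter", "lora_adapter"}


def validate_adapter_config(cfg: dict) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = []

    if cfg.get("adapter_type") not in ALLOWED_ADAPTER_TYPES:
        problems.append('adapter_type must be "vit" or "vit_grayscale"')

    bert_adapter = cfg.get("bert_adapter")
    if not isinstance(bert_adapter, str) or not bert_adapter:
        problems.append("bert_adapter must be a non-empty string")
    elif bert_adapter not in DOCUMENTED_BERT_ADAPTERS:
        # Not necessarily an error, but loader.py has to support it.
        problems.append(
            f"bert_adapter {bert_adapter!r} is not a documented value; "
            "loader.py may need to be modified"
        )

    if not isinstance(cfg.get("tune_language"), bool):
        problems.append("tune_language must be a boolean")

    return problems


example = {
    "adapter_type": "vit",
    "bert_adapter": "bottleneck_adapter",
    "tune_language": False,
}
print(validate_adapter_config(example))
```

An empty list means the dict uses only the values documented here.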

Train

python train.py --img-dir /path/to/rico --s2w-dir /path/to/screen2words -e 30 -b 32 -p 15

Evaluation

python eval.py -ckpt /path/to/checkpoint

Generate Caption

python generator.py --img-dir /path/to/rico -ckpt /path/to/checkpoint --image_id 54137
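To caption several screenshots in one go, the command can be generated once per image ID. This is a minimal sketch under stated assumptions: the script is generator.py (as listed in the repository) and accepts the --img-dir, -ckpt, and --image_id flags shown above; the helper `caption_commands` is hypothetical and only builds the command lines without running them.

```python
import shlex


def caption_commands(img_dir, checkpoint, image_ids, script="generator.py"):
    """Hypothetical helper: build one command (as an argv list) per image ID."""
    return [
        ["python", script,
         "--img-dir", img_dir,
         "-ckpt", checkpoint,
         "--image_id", str(image_id)]
        for image_id in image_ids
    ]


# Print the commands so they can be reviewed before execution.
for cmd in caption_commands("/path/to/rico", "/path/to/checkpoint", [54137]):
    print(shlex.join(cmd))
```

Each argv list can be passed to `subprocess.run(cmd, check=True)` to actually run the generator.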

Install coco-caption

To properly obtain the CIDEr Score in eval.py, you need to install the coco_caption package. Follow the steps below to install it:

1. Install the model

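The coco_caption repository is included as a git submodule, so a clone made with --recursive already contains it. Navigate to that directory, download the Stanford models, and install gensim: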

cd coco_caption
bash get_stanford_models.sh
pip install gensim

2. Install java

Download the latest Java 8 JDK release for Linux x64 (a .tar.gz archive) from the Oracle website.
Copy the downloaded archive to /opt and extract it with the following commands:

sudo cp jdk-xxxxx_linux-x64_bin.tar.gz /opt
cd /opt
sudo mkdir java
sudo chown ${USER}:${USER} java
sudo tar -zxvf jdk-xxxxx_linux-x64_bin.tar.gz -C /opt/java

Set the environment variables in ~/.bashrc by adding the following lines (replace jdk1.8.xx with the name of the extracted directory):

#set java environment
export JAVA_HOME=/opt/java/jdk1.8.xx
export PATH=${JAVA_HOME}/bin:${PATH}

Source the .bashrc file to apply the changes:

source ~/.bashrc

Verify that Java is installed by checking the version:

java -version
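The same check can be done from Python before launching eval.py, so that a missing Java installation is reported up front. This is a minimal sketch, not part of the repository: the helper name `find_java` is hypothetical, and it only looks at JAVA_HOME and PATH as configured above.

```python
import os
import shutil


def find_java(env=None, which=shutil.which):
    """Hypothetical helper: return the path to `java`, or None if not found.

    Prefers $JAVA_HOME/bin/java when JAVA_HOME is set, then falls back
    to searching PATH.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        found = which(candidate)
        if found:
            return found
    return which("java")


java_path = find_java()
if java_path is None:
    print("java not found: check JAVA_HOME and PATH in ~/.bashrc")
else:
    print(f"using java at {java_path}")
```

The `env` and `which` parameters exist only to make the function easy to test without touching the real environment.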

3. Install WMD

To enable the WMD (Word Mover's Distance) metric, download the Google word2vec model with the following command:

bash get_google_word2vec_model.sh

4. Demo

To run the demo, execute the following command:

python scorer.py

Make sure to run the demo after the installation to confirm that everything works as expected.

reference

dataset

Rico / Rico UI Screenshots and View Hierarchies dataset

Screen2Words: paper / code / dataset
