RL-NLP

Code for the paper "Learning Natural Language Generation with Truncated Reinforcement Learning", accepted to the Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

This repo uses the CLEVR and VQAv2 datasets.

Get data

gdown --id 1AVZXRzmKBxVH6Ul9ZviSWPVdj_kwU3yX --output data/data.zip
unzip data/data.zip -d data/
rm data/data.zip
rm data/vqa-v2/cache/vocab.json

If you want the whole CLEVR dataset:

rm -r data/CLEVR_v1.0
wget https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip -O data/CLEVR_v1.0.zip
unzip data/CLEVR_v1.0.zip -d data
rm data/CLEVR_v1.0.zip

If you want the whole VQA-v2 dataset:

wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets/coco/features_100/COCO_trainval_resnext152_faster_rcnn_genome.lmdb/data.mdb
mv data.mdb data/vqa-v2/coco_trainval.lmdb/

To download the CLEVR dialog data on 20,000 images, used to train the external language model:

gdown --id 1BSqXY6KV4wOxo6tdjP7xej54gMvqk7k1 --output data/clevr_ext/clevr_dialog_train_raw.json

To get the necessary models:

chmod +x sh/files/get_models.sh
sh/files/get_models.sh

Requirements

You can create a conda environment called rl-nlp: conda create -n rl-nlp
Then activate it: conda activate rl-nlp
The required libraries can be installed from the file requirements.txt: pip install -r requirements.txt
The code relies on the CLOSURE GitHub repo, which you need to install with: python -m pip install git+https://github.com/gqkc/CLOSURE.git --upgrade
It also relies on the VILBERT multi-task GitHub repo: python -m pip install git+https://github.com/gqkc/vilbert-multi-task.git --upgrade

File architecture

RL-NLP
├── config         # stores the configuration files to create/train models
|
├── output         # stores the experiment outputs (pre-trained models, logs...)
|   ├── lm_model / model.pt: path for the pre-trained model .pt on the CLEVR dataset.
|   ├── SL_LSTM_32_64 / model.pt: path for the pre-trained policy .pt on the CLEVR dataset.
|   ├── SL_LSTM_32_64_VQA / model.pt: path for the pre-trained policy conditioned on the answer (for VQA reward) on the CLEVR dataset.
|   ├── vqa_model_film / model.pt: path for the pre-trained oracle model for the "vqa" reward of the CLEVR dataset.
|   ├── lm_model_vqa / model.pt: path for the pre-trained lm model.
|   ├── vqa_policy_512_1024_answer / model.pt
|   └── vilbert_vqav2
|        ├── model.bin: path for the vilbert oracle fine-tuned on the vqav2 task.
|        └── bert_base_6layer_6conect.json: config file for the vilbert oracle.
|
├── data
|   ├── CLEVR_v1.0  # root folder for the CLEVR dataset.
|   ├── vqa-v2  # root folder for the VQA-V2 dataset.
|   |    ├── coco_trainval.lmdb  # lmdb folder for the image features (reduced one on local machine, complete one on VM).
|   |    └── cache
|   |         └── vocab.json: path for vocab.
|   ├── closure_vocab.json: vocab path for the closure dataset (used for the "vqa" reward of CLEVR).
|   ├── vocab.json: vocab path for the CLEVR dataset.
|   ├── train_questions.h5: h5 file for the training question dataset.
|   ├── val_questions.h5: h5 file for the validation question dataset.
|   ├── test_questions.h5: h5 file for the test question dataset.
|   ├── train_features.h5: h5 file for the training image features.
|   ├── val_features.h5: h5 file for the validation image features.
|   └── test_features.h5: h5 file for the test image features.
|
└── src            # source files
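After downloading the data and models, it can help to check that the layout above is in place before launching a run. The sketch below is one way to do that. The names EXPECTED_PATHS and missing_paths are illustrative and not part of the repo, and the paths assume that the files listed in the tree sit directly inside the data and output folders.

```python
from pathlib import Path

# Paths read off the file architecture above, relative to the repository root.
# Assumption: the listed files sit directly under data/ and output/.
EXPECTED_PATHS = [
    "data/vocab.json",
    "data/closure_vocab.json",
    "data/train_questions.h5",
    "data/val_questions.h5",
    "data/test_questions.h5",
    "data/train_features.h5",
    "data/val_features.h5",
    "data/test_features.h5",
    "data/vqa-v2/cache/vocab.json",
    "output/lm_model/model.pt",
    "output/SL_LSTM_32_64/model.pt",
    "output/SL_LSTM_32_64_VQA/model.pt",
    "output/vqa_model_film/model.pt",
    "output/lm_model_vqa/model.pt",
    "output/vqa_policy_512_1024_answer/model.pt",
    "output/vilbert_vqav2/model.bin",
    "output/vilbert_vqav2/bert_base_6layer_6conect.json",
]


def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints one line per missing file.
    for path in missing_paths():
        print(f"missing: {path}")
```

Files that are only produced by the preprocessing steps (the .h5 files, for instance) will be reported as missing until those steps have been run.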

Data preprocessing

CLEVR

To run all the scripts from the repository root (RL-NLP), first run the following command: export PYTHONPATH=src:${PYTHONPATH}

Preprocessing the dataset questions

To preprocess the questions of the three datasets, run the scripts src/sh/preprocess_questions or the 3 following command lines (in this order):

python src/preprocessing/preprocess_questions.py -data_path "data/CLEVR_v1.0/questions/CLEVR_train_questions.json" \
  -out_vocab_path "data/vocab.json" -out_h5_path "data/train_questions.h5" -min_token_count 1
python src/preprocessing/preprocess_questions.py -data_path "data/CLEVR_v1.0/questions/CLEVR_val_questions.json" \
  -out_vocab_path "data/vocab.json" -out_h5_path "data/val_questions.h5" -min_token_count 1
python src/preprocessing/preprocess_questions.py -data_path "data/CLEVR_v1.0/questions/CLEVR_test_questions.json" \
  -out_vocab_path "data/vocab.json" -out_h5_path "data/test_questions.h5" -min_token_count 1
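The three commands differ only in the split name, so they can also be generated in a loop. The following is a minimal sketch, not a script shipped with the repo: the helper names question_command and run_all are hypothetical, and only the flags shown in the commands above are used.

```python
import subprocess

# Order matters: the commands above are meant to run train, then val, then test.
SPLITS = ["train", "val", "test"]


def question_command(split, data_dir="data"):
    """Build the preprocess_questions.py argument list for one CLEVR split."""
    return [
        "python", "src/preprocessing/preprocess_questions.py",
        "-data_path", f"{data_dir}/CLEVR_v1.0/questions/CLEVR_{split}_questions.json",
        "-out_vocab_path", f"{data_dir}/vocab.json",
        "-out_h5_path", f"{data_dir}/{split}_questions.h5",
        "-min_token_count", "1",
    ]


def run_all(dry_run=True):
    """Print the commands (dry run) or execute them one after the other."""
    for split in SPLITS:
        cmd = question_command(split)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing split.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Run it from the repository root with PYTHONPATH set as described above, and switch dry_run to False once the printed commands look right.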

Extracting the image features

To extract the image features, run the script src/sh/extract_features.py or the 3 following command lines (batch size arg must be tuned depending on memory availability):

python src/preprocessing/extract_features.py \
  --input_image_dir data/CLEVR_v1.0/images/train \
  --output_h5_file data/train_features.h5 --batch_size 128
python src/preprocessing/extract_features.py \
  --input_image_dir data/CLEVR_v1.0/images/val \
  --output_h5_file data/val_features.h5 --batch_size 128
python src/preprocessing/extract_features.py \
  --input_image_dir data/CLEVR_v1.0/images/test \
  --output_h5_file data/test_features.h5 --batch_size 128

VQA

First, extract the vocab:

Extracting full vocab

python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "none" -min_split 0

This creates a file "vocab.json" with the vocab.

Extract a reduced vocab (on smaller train and val datasets)

python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "none" -min_split 1

This creates a file "vocab_min.json" with the vocab.

Secondly, get the preprocessed pkl file for each dataset

Full datasets (with vocab.json)

python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab.json" -split "train" -min_split 0 -test 1

This will create a file "train_entries.pkl"

python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab.json" -split "val" -min_split 0 -test 1

This will create a file "val_entries.pkl"

python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab.json" -split "train" -min_split 1 -test 0

This will create a file "mintrain_entries.pkl"

python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab.json" -split "val" -min_split 1 -test 0

This will create a file "minval_entries.pkl"

Reduced datasets on reduced vocab (train_dataset = 20,000 questions & val_dataset = 5,000 questions)

python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab_min.json" -split "train" -min_split 1 -test 0

This will create a file "mintrain_minvocab_entries.pkl".

python src/preprocessing/preprocess_vqa_dataset.py -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -vocab_path "data/vqa-v2/cache/vocab_min.json" -split "val" -min_split 1 -test 0

This will create a file "minval_minvocab_entries.pkl".
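The six output names above appear to follow one pattern: a "min" prefix when -min_split is 1, and a "_minvocab" suffix when the reduced vocab is used. The sketch below encodes that pattern so you can tell which file a given command should produce. The rule is inferred from the file names listed in this README, not read from preprocess_vqa_dataset.py, and the function name expected_entries_file is hypothetical.

```python
def expected_entries_file(split, min_split, vocab_path):
    """Guess the .pkl name produced for a given split / min_split / vocab.

    Assumption: the naming rule is inferred from the file names listed in
    this README, not taken from the preprocessing script itself.
    """
    prefix = "min" if min_split else ""
    suffix = "_minvocab" if vocab_path.endswith("vocab_min.json") else ""
    return f"{prefix}{split}{suffix}_entries.pkl"


if __name__ == "__main__":
    full_vocab = "data/vqa-v2/cache/vocab.json"
    reduced_vocab = "data/vqa-v2/cache/vocab_min.json"
    for split in ("train", "val"):
        print(expected_entries_file(split, 0, full_vocab))
        print(expected_entries_file(split, 1, full_vocab))
        print(expected_entries_file(split, 1, reduced_vocab))
```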

If you want to use VILBERT:

cd ..
git clone https://github.com/gqkc/vilbert-multi-task.git
cd vilbert-multi-task/tools/
git clone -b python3 https://github.com/lichengunc/refer.git
cd refer
make
# if make fails, run this instead:
python setup.py install
cd ../../
python -m pip install -e .

# if there is a problem with Python, upgrade cython:
python -m pip install --upgrade cython

Training the models

Link to the pre-trained models

CLEVR

Language Model .pt file here.
Levenshtein Task:

Pretrained Policy .pt file (word_emb_size = 32, hidden_size = 64) here.

VQA task:

Pretrained VQA model (FiLM version) here.
Pretrained Policy here

VQAV2

VQA task:

Pretrained VQA VILBERT model here.
Pretrained VQA VILBERT config file here.

Training the Language Model on the Dataset of Questions

CLEVR

python src/train/launch_train.py -task "lm" -dataset "clevr" -model "lstm" -num_layers 1 -emb_size 512 -hidden_size 512 -p_drop 0.1 -lr 0.001 -data_path "data" -out_path "output" -bs 512 -ep 20 -num_workers 6

VQA

python src/train/launch_train.py -task "lm" -dataset "vqa" -model "lstm" -num_layers 1 -emb_size 512 -hidden_size 512 -p_drop 0.1 -lr 0.001 -data_path "data/vqa-v2" -features_path "data/vqa-v2/coco_trainval.lmdb" -out_path "output" -bs 512 -ep 50 -num_workers 6

Pre-training of the Policy with Supervised Learning

CLEVR

No answer conditioning

python src/train/launch_train.py -task "policy" -dataset "clevr" -data_path "data" -out_path "output/policy_pre_training" -emb_size 32 -hidden_size 64 -bs 512 -ep 50 -num_workers 0 -max_samples 21 -fusion "cat"
N.B.: When training on a CPU only, the max_samples argument is needed so that training runs on a subset of the dataset.

w/ answer conditioning

python src/train/launch_train.py -task "policy" -dataset "clevr" -data_path "data" -out_path "output/policy_pre_training" -emb_size 32 -hidden_size 64 -bs 512 -ep 50 -num_workers 0 -max_samples 21 -fusion "cat" -condition_answer "after_fusion"

VQA

No answer conditioning

python src/train/launch_train.py -task "policy" -dataset "vqa" -data_path "data" -out_path "output/policy_pre_training" -emb_size 32 -hidden_size 64 -bs 512 -ep 50 -num_workers 0 -fusion "average"

w/ answer conditioning

python src/train/launch_train.py -task "policy" -dataset "vqa" -data_path "data" -out_path "output/policy_pre_training" -emb_size 32 -hidden_size 64 -bs 512 -ep 50 -num_workers 0 -fusion "average" -condition_answer "after_fusion"
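The four pre-training commands share most of their arguments and differ only in the dataset, the fusion mode, the max_samples limit (CLEVR only) and the optional answer conditioning. The sketch below makes those differences explicit by rebuilding each command from a shared base. The names COMMON, PER_DATASET and policy_command are illustrative, and only flags and values that appear in the commands above are used.

```python
import shlex

# Arguments shared by all four commands above.
COMMON = {
    "-data_path": "data",
    "-out_path": "output/policy_pre_training",
    "-emb_size": "32",
    "-hidden_size": "64",
    "-bs": "512",
    "-ep": "50",
    "-num_workers": "0",
}

# Arguments that change with the dataset.
PER_DATASET = {
    "clevr": {"-max_samples": "21", "-fusion": "cat"},
    "vqa": {"-fusion": "average"},
}


def policy_command(dataset, condition_answer=False):
    """Return the launch_train.py command line for one pre-training variant."""
    args = {"-task": "policy", "-dataset": dataset, **COMMON, **PER_DATASET[dataset]}
    if condition_answer:
        args["-condition_answer"] = "after_fusion"
    parts = ["python", "src/train/launch_train.py"]
    for flag, value in args.items():
        parts += [flag, value]
    return shlex.join(parts)


if __name__ == "__main__":
    for dataset in ("clevr", "vqa"):
        for condition_answer in (False, True):
            print(policy_command(dataset, condition_answer))
```

The printed commands omit the quotes around string values, which the shell treats the same way for these arguments.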

Training the RL Agent

See examples in src/scripts/sh.
The "debug" folder lets you run small experiments with each of the algorithms for the 2 CLEVR tasks (Levenshtein & VQA rewards).

Logging on TensorBoard to display results:

cd output/2000_img_len_20
tensorboard --logdir=experiments/train

With a GCP VM (VM instance name here = alice_martindonati@pytorch-3-vm), run on the local machine:

gcloud compute ssh alice_martindonati@pytorch-3-vm -- -NfL 6006:localhost:6006
