Imagination-Augmented Natural Language Understanding

This is the official repository for our paper "Imagination-Augmented Natural Language Understanding", accepted to NAACL 2022.


Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations. Such abilities enable us to construct new abstract concepts or concrete objects, and are essential in involving practical knowledge to solve problems in low-resource scenarios. However, most existing methods for Natural Language Understanding (NLU) are mainly focused on textual signals. They do not simulate human visual imagination ability, which hinders models from inferring and learning efficiently from limited data samples. Therefore, we introduce an Imagination-Augmented Cross-modal Encoder (iACE) to solve natural language understanding tasks from a novel learning perspective---imagination-augmented cross-modal understanding. iACE enables visual imagination with external knowledge transferred from the powerful generative and pre-trained vision-and-language models. Extensive experiments on GLUE and SWAG show that iACE achieves consistent improvement over visually-supervised pre-trained models. More importantly, results in extreme and normal few-shot settings validate the effectiveness of iACE in low-resource natural language understanding circumstances.

Overview of iACE. The generator G visualizes imaginations close to the encoded texts by minimizing L_GAN. The cross-modal encoder Ec learns an imagination-augmented language representation. The two-step learning procedure consists of: 1) pre-training a Transformer with visual supervision from a large-scale language corpus and image set, and 2) fine-tuning the visually supervised pre-trained Transformer and the imagination-augmented cross-modal encoder on downstream tasks.
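To make the two-step design concrete, below is a minimal sketch of the imagination-augmented forward pass at fine-tuning time. All module and variable names are hypothetical placeholders rather than this repo's actual API, and the fusion head is deliberately simpler than the paper's cross-modal encoder:

# Sketch of the iACE forward pass (hypothetical names, not this repo's API).
import torch
import torch.nn as nn

class ImaginationAugmentedEncoder(nn.Module):
    def __init__(self, text_dim=768, img_dim=512, hidden=768, num_labels=2):
        super().__init__()
        # Stand-ins for the visually-supervised Transformer and the visual encoder.
        self.text_proj = nn.Linear(text_dim, hidden)
        self.img_proj = nn.Linear(img_dim, hidden)
        # Simplified fusion head; the paper's cross-modal encoder is richer.
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, text_feat, img_feat):
        # text_feat: sentence features from the language encoder
        # img_feat: features of the machine-imagined image for that sentence
        t = torch.tanh(self.text_proj(text_feat))
        v = torch.tanh(self.img_proj(img_feat))
        fused = torch.cat([t, v], dim=-1)  # imagination-augmented representation
        return self.classifier(fused)

# Toy usage with random features standing in for real encoder outputs.
model = ImaginationAugmentedEncoder()
logits = model(torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 2])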

Set up

This example uses Anaconda to manage virtual Python environments.

Clone our repository:

git clone --recursive ''

Create a new virtual Python environment for iACE-NLU:

conda create --name iace python=3.9
conda activate iace

Install PyTorch in the new environment:

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
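
A quick sanity check that the CUDA build is active:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"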

Install other required Python packages:

pip install -r requirements.txt

To install NVIDIA's apex:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
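
If the build succeeded, apex's mixed-precision module should import cleanly:

python -c "from apex import amp; print('apex OK')"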

Download Experiment Datasets

Download GLUE data

cd tools
python download_glue_data.py --data_dir ../data/glue  # script name assumed; the original link was not recoverable

Get the SWAG data from the official release at https://github.com/rowanz/swagaf.

Visual Imagination Generation

Based on CLIP-Guided GAN

You will need at least one pretrained VQGAN model, e.g.:

cd ImaGen
mkdir checkpoints

curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - ''
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - ''

The download script in the ImaGen directory offers an optional way to fetch a number of models; by default, it downloads just one.

See the VQGAN+CLIP repository for more information.

Based on CLIP-Guided Diffusion

You will need at least one pretrained diffusion model, e.g.:

cd ImaGen/checkpoints
curl -OL --http1.1 ''
curl -OL ''

See the CLIP-Guided-Diffusion repository for more information.

Imagination Construction

You can generate imagination data for GLUE and SWAG by specifying the dataset and output directory:

# Using GAN
CUDA_VISIBLE_DEVICES=0 python -rp glue_task_split
# For glue_mnli_train: CUDA_VISIBLE_DEVICES=0 python -rp glue_mnli_train
CUDA_VISIBLE_DEVICES=0 python -rp swag
# Using Diffusion
CUDA_VISIBLE_DEVICES=0 python -rp glue_task_split
CUDA_VISIBLE_DEVICES=0 python -rp swag

To generate a single example, you can specify your text prompt as shown below:

python -p "Down by the salley gardens my love and I did meet"  # using GAN
python -p "Down by the salley gardens my love and I did meet"  # using Diffusion
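
Both backends steer a generative model with the same CLIP signal: the output image is pushed so that its CLIP embedding matches the embedding of the text prompt. The repo's scripts wrap VQGAN and guided diffusion; the sketch below shows only that core guidance, using OpenAI's clip package and a directly optimized pixel tensor as a stand-in for a real generator (a real pipeline also applies CLIP's input normalization and augmentations):

# Sketch of CLIP guidance: optimize an image so its CLIP embedding matches
# the prompt's. Real pipelines optimize VQGAN latents or guide a diffusion
# sampler instead of raw pixels.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # fp32 keeps the pixel optimization numerically stable

text = clip.tokenize(["Down by the salley gardens my love and I did meet"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(text).float()
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Learnable "image"; CLIP's ViT-B/32 expects 224x224 inputs.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([image], lr=0.05)

for step in range(100):
    img_emb = model.encode_image(image).float()
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    loss = 1 - (img_emb * text_emb).sum()  # cosine distance to the prompt
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        image.clamp_(0, 1)  # keep pixels in a displayable range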

You can then extract features for the GLUE and SWAG datasets (use the 'eval' option for the dev split):

cd tools
python -nlu_dataset glue
python -nlu_dataset swag
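
The extracted visual features are what the cross-modal encoder consumes during fine-tuning. As an illustration of the idea only (the directory layout and the choice of CLIP as the feature extractor are assumptions, not this repo's exact pipeline):

# Sketch: encode generated "imagination" images into per-example feature
# vectors. Paths and the CLIP backbone are illustrative assumptions.
from pathlib import Path
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

features = {}
for img_path in sorted(Path("../data/glue_imagination").glob("*.png")):
    image = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        feat = model.encode_image(image).float().cpu()
    features[img_path.stem] = feat.squeeze(0)  # one 512-d vector per example

torch.save(features, "glue_image_features.pt")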

Visually-Supervised Transformer

More details on pre-training the visually-supervised Transformer can be found in the Vokenization repository.
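
For orientation, visual supervision in Vokenization pairs each token with a retrieved image id (a "voken") and trains the Transformer to predict it alongside the masked-token objective. A minimal sketch with illustrative sizes and module names:

# Sketch of voken-style visual supervision (after Tan & Bansal's Vokenization):
# each token's hidden state predicts a retrieved image id in addition to the
# masked-token target. Sizes and names are illustrative.
import torch
import torch.nn as nn

hidden_size, vocab_size, num_vokens = 768, 30522, 50000

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden_size, nhead=12, batch_first=True),
    num_layers=2,
)
mlm_head = nn.Linear(hidden_size, vocab_size)
voken_head = nn.Linear(hidden_size, num_vokens)

tokens = torch.randn(8, 128, hidden_size)           # embedded input tokens
mlm_labels = torch.randint(vocab_size, (8, 128))    # masked-token targets
voken_labels = torch.randint(num_vokens, (8, 128))  # retrieved image ids

h = backbone(tokens)
loss = (
    nn.functional.cross_entropy(mlm_head(h).flatten(0, 1), mlm_labels.flatten())
    + nn.functional.cross_entropy(voken_head(h).flatten(0, 1), voken_labels.flatten())
)
loss.backward()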

Imagination-Augmented NLU

To fine-tune on GLUE, run:

bash scripts/run_glue_at_epoch.bash 0,1,2,3 30 bert_base 30 langvis loading bert bert-base-uncased 5 32 100

Similarly, to fine-tune on SWAG, run:

bash scripts/run_swag_at_epoch.bash 0,1,2,3 30 bert_base 30 langvis loading bert bert-base-uncased 5 32 100


If you find our repo useful, please cite this paper:

@misc{https://doi.org/10.48550/arxiv.2204.08535,
  doi = {10.48550/ARXIV.2204.08535},
  url = {https://arxiv.org/abs/2204.08535},
  author = {Lu, Yujie and Zhu, Wanrong and Wang, Xin Eric and Eckstein, Miguel and Wang, William Yang},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Imagination-Augmented Natural Language Understanding},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

We thank the authors of vokenization, VQGAN-CLIP, and CLIP-Guided-Diffusion for releasing their code.

