Recently, there has been increased interest in evaluating whether neural models are able to generalize compositionally. Previous work has shown that seq2seq models such as LSTMs lack the inductive biases required for compositional generalization. We show that, by modifying the training data distribution, neural sequence models such as LSTMs and Transformers achieve near-perfect accuracy on compositional generalization benchmarks such as SCAN and Colors.
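The modification is conceptually simple: augment the training split with additional primitive examples so that a primitive such as "jump" is no longer a rare outlier. The snippet below is only an illustrative sketch of this idea; the example pairs and primitive names are hypothetical, and this is not the repo's actual data-generation code.

```python
# Illustrative sketch only (not the repo's data-generation code): augment the
# training distribution with additional primitive examples so that a primitive
# such as "jump" is no longer a rare outlier among the training commands.

# A few SCAN-style (command, action sequence) training pairs.
train = [
    ("walk twice", "I_WALK I_WALK"),
    ("jump", "I_JUMP"),              # "jump" only ever appears in isolation
    ("look and walk", "I_LOOK I_WALK"),
]

# Hypothetical extra primitives, each mapped to its own action token; the
# controlled splits used below add on the order of 100 such primitives.
extra_primitives = [(f"prim{i}", f"I_PRIM{i}") for i in range(100)]

train_augmented = train + extra_primitives
print(len(train), "->", len(train_augmented))  # 3 -> 103
```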
- Compatible with Python 3.6
- Dependencies can be installed using Compositional-Generalization-Seq2Seq/code/requirements.txt
Install VirtualEnv using the following (optional):
$ [sudo] pip install virtualenv
Create and activate your virtual environment (optional):
$ virtualenv -p python3 venv
$ source venv/bin/activate
Install all the required packages:
at Compositional-Generalization-Seq2Seq/code:
$ pip install -r requirements.txt
To create the relevant directories, run the following command in the directory of the corresponding model,
e.g., at Compositional-Generalization-Seq2Seq/code/transformer:
$ sh setup.sh
Then transfer all the data folders to the data subdirectory of that model, which in this case is Compositional-Generalization-Seq2Seq/code/transformer/data/
This repository includes implementations of 2 models:
- Transformer in
  Compositional-Generalization-Seq2Seq/code/transformer
  - Sequence-to-Sequence Transformer Model (see the sketch after this list)
- LSTM in
  Compositional-Generalization-Seq2Seq/code/lstm
  - Sequence-to-Sequence LSTM Model
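For orientation, here is a minimal PyTorch sketch of a sequence-to-sequence Transformer using the hyperparameters from the example training command further below (d_model 64, d_ff 512, 3 encoder and 3 decoder layers, 2 heads, dropout 0.1). This is not the repository's actual model code (see code/transformer for that), and positional encodings are omitted for brevity.

```python
# Minimal sketch, not the repo's implementation: a seq2seq Transformer with
# the hyperparameters from the example training command below.
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=64, heads=2,
                 enc_layers=3, dec_layers=3, d_ff=512, dropout=0.1):
        super().__init__()
        # Token embeddings (positional encodings omitted for brevity).
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=heads,
            num_encoder_layers=enc_layers, num_decoder_layers=dec_layers,
            dim_feedforward=d_ff, dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)  # project to target vocabulary

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each target position only attends to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        h = self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids),
                             tgt_mask=tgt_mask)
        return self.out(h)
```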
We work with the following datasets:
- SCAN add jump split (see the loading sketch after this list)
  - Train Data Size: 13,204 examples
  - Test Data Size: 7,706 examples
  - Available online at: https://github.com/brendenlake/SCAN
- Colors
  - Train Data Size: 14 examples
  - Test Data Size: 8 examples
- COGS
  - Train Data Size: 24,155 examples
  - Test Data Size: 21,000 examples
  - Available online at: https://github.com/najoungkim/COGS
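SCAN stores each example as a line of the form "IN: <command> OUT: <action sequence>" (per the public SCAN release). The sketch below shows one way to load such a file; the exact file names and layout inside this repo's data folders are an assumption here and may differ.

```python
# Sketch for loading a SCAN-style data file. The "IN: ... OUT: ..." line
# format follows the public SCAN release; file names and paths inside this
# repo's data folders may differ, so adjust as needed.

def load_scan(path):
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line.startswith("IN:"):
                continue
            command, actions = line[len("IN:"):].split("OUT:")
            pairs.append((command.strip(), actions.strip()))
    return pairs

# Hypothetical usage:
# train = load_scan("data/add_jump_100_prims_controlled/train.txt")
# print(len(train))  # should roughly match the train size listed above
```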
The full set of available command-line arguments can be seen in the args.py file of each model.
Here, we illustrate running a Transformer on the SCAN add_jump dataset in which the train set was modified to include 100 extra primitives. Follow the same procedure to run any experiment with any model.
To train, at Compositional-Generalization-Seq2Seq/code/transformer:
$ python -m src.main -mode train -project_name test_runs -model_selector_set val -pretrained_model_name none -finetune_data_voc none -dev_set -no-test_set -no-gen_set -dataset add_jump_100_prims_controlled -dev_always -no-test_always -no-gen_always -epochs 150 -save_model -no-show_train_acc -embedding random -no-freeze_emb -no-freeze_emb2 -no-freeze_transformer_encoder -no-freeze_transformer_decoder -no-freeze_fc -d_model 64 -d_ff 512 -decoder_layers 3 -encoder_layers 3 -heads 2 -batch_size 64 -lr 0.0005 -emb_lr 0.0005 -dropout 0.1 -run_name RUN-train_try -gpu 1
To test the trained model, at Compositional-Generalization-Seq2Seq/code/transformer:
$ python -m src.main -mode test -project_name test_runs -pretrained_model_name RUN-train_try -finetune_data_voc none -no-dev_set -no-test_set -gen_set -dataset add_jump_100_prims_controlled_10_prims_test -batch_size 1024 -run_name RUN-test_try -gpu 1
If you use our data or code, please cite our work:
@misc{https://doi.org/10.48550/arxiv.2203.07402,
  doi = {10.48550/ARXIV.2203.07402},
  url = {https://arxiv.org/abs/2203.07402},
  author = {Patel, Arkil and Bhattamishra, Satwik and Blunsom, Phil and Goyal, Navin},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Revisiting the Compositional Generalization Abilities of Neural Sequence Models},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
For any clarifications, comments, or suggestions, please contact Arkil or Satwik.