This repository contains the official implementation code of the ACL 2023 short paper: Analyzing Text Representations by Measuring Task Alignment.
Requires: python>=3.12
To set up the environment:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Or using conda:
conda create --name task_alignment python=3.12
conda activate task_alignment
pip install -r requirements.txt

The embedding command generates transformer-based embedding representations. See task_alignment/representation.py for detailed documentation.
To generate a BERT-base sentence embedding using the second-to-last layer with mean token pooling:
python -m task_alignment embedding --dataset=wnli@glue --model=bert-base-uncased \
--layer_pooling=2nd-to-last --token_pooling=mean
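
For reference, the pooling named by those flags corresponds to averaging the token vectors of the second-to-last hidden layer. A minimal sketch of that computation with Hugging Face transformers, meant only to illustrate the pooling scheme (it is not this repository's implementation):

# Minimal sketch (not this repository's code): mean pooling over the
# second-to-last hidden layer of bert-base-uncased for a small batch.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentences = ["The cat sat on the mat.", "A short example sentence."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**batch).hidden_states   # tuple: embedding layer + 12 encoder layers

second_to_last = hidden_states[-2]                 # (batch, seq_len, 768)
mask = batch["attention_mask"].unsqueeze(-1)       # zero out padding positions
embeddings = (second_to_last * mask).sum(1) / mask.sum(1)   # (batch, 768) sentence vectors
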
To generate sparse BoWs, GloVe static embeddings, and fastText static embeddings:
python -m task_alignment ngrams --dataset=wnli@glue --tokenizer=bert-base-uncased --vectorizer=tf --max_ngrams=1
python -m task_alignment glove --dataset=wnli@glue --model=840B.300d
python -m task_alignment fasttext --dataset=wnli@glue --model=crawl-300d-2M-subword
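
Conceptually, these three commands produce term-frequency unigram counts and mean-pooled pretrained word vectors. A rough, repository-independent sketch of both ideas (the whitespace tokenizer and the stand-in vector table are simplifying assumptions):

# Rough sketch (not this repository's code): tf unigram counts with scikit-learn
# and mean-pooled static word embeddings over a whitespace tokenization.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

texts = ["the cat sat on the mat", "a short example sentence"]

# Sparse BoW: raw term-frequency counts of unigrams.
X_tf = CountVectorizer(ngram_range=(1, 1)).fit_transform(texts)

# Static embeddings: average the word vectors found in a pretrained table.
# `vectors` would normally be loaded from a GloVe/fastText file; this is a stand-in.
vectors = {"cat": np.random.rand(300), "mat": np.random.rand(300)}

def mean_pool(text, dim=300):
    hits = [vectors[w] for w in text.split() if w in vectors]
    return np.mean(hits, axis=0) if hits else np.zeros(dim)

X_static = np.stack([mean_pool(t) for t in texts])   # (n_texts, 300)
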
The alignment command computes hierarchical clustering alignment between a representation and its corresponding labels.
Example using a dataset and generated representation:
python -m task_alignment alignment --dataset=wnli@glue --representation=bert-base-uncased_2nd-to-last_mean

Alternatively, specify paths to precomputed representations and labels:
python -m task_alignment alignment --representation=data/representation/X.npy --labels=data/labels/dataset/train.txt
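
The alignment score itself (THAS in the paper) is computed by this repository; purely as intuition for what hierarchical clustering alignment measures, the sketch below builds a cluster tree over a representation, cuts it at a few sizes, and scores majority-label purity. The label-file format (one label per line) and the purity scoring are assumptions for illustration, not the repository's definition:

# Loose illustration only (not the paper's THAS): majority-label purity of a
# hierarchical clustering cut at several numbers of clusters.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.load("data/representation/X.npy")                         # (n, d) representation
labels = np.loadtxt("data/labels/dataset/train.txt", dtype=str)  # assumed: one label per line

Z = linkage(X, method="ward")
for k in (2, 4, 8, 16, 32):
    assignment = fcluster(Z, t=k, criterion="maxclust")
    majority = sum(
        np.max(np.unique(labels[assignment == c], return_counts=True)[1])
        for c in np.unique(assignment)
    )
    print(k, "clusters -> purity", round(majority / len(labels), 3))
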
The intrinsic command computes hierarchical clustering intrinsic metrics of a representation.
You can use a dataset-representation pair or a direct representation path:
python -m task_alignment intrinsic --dataset=wnli@glue --representation=bert-base-uncased_2nd-to-last_mean
# Or using a direct file path:
python -m task_alignment intrinsic --representation=data/representation/X.npy
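
As intuition for a label-free intrinsic metric, the sketch below computes the Davies-Bouldin index of a single hierarchical-clustering cut. The repository's clust_dbi/ADBI values presumably aggregate over the hierarchy, so treat this only as an illustration of the underlying quantity:

# Minimal sketch (not this repository's exact metric): Davies-Bouldin index of
# one hierarchical-clustering cut, computed without any task labels.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import davies_bouldin_score

X = np.load("data/representation/X.npy")             # (n, d) representation matrix
Z = linkage(X, method="ward")
assignment = fcluster(Z, t=8, criterion="maxclust")  # cut into 8 clusters (arbitrary choice)
print("DBI:", davies_bouldin_score(X, assignment))   # lower means tighter, better-separated clusters
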
The probing command probes train and test representations with their respective labels.
The default strategy is low-annotation probing: max-entropy classifiers are trained on subsamples of 100 to 1000 examples (in steps of 100) to generate learning curves.
To probe using the pool and test partitions from a dataset with generated representations:
python -m task_alignment probing --dataset=wnli@glue --representation=bert-base-uncased_2nd-to-last_mean

Or specify train/test representations and labels explicitly:
python -m task_alignment probing --representation=data/representation/X.npy --labels=data/labels/dataset/train.txt \
--test_representation=data/representation/X_test.npy --test_labels=data/labels/dataset/test.txt
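
The learning-curve procedure described above can be pictured with a short sketch: logistic regression stands in for the max-entropy classifier, subsample sizes run from 100 to 1000 in steps of 100, and the area under the resulting curve is the ALC referred to in the note further below. The file paths reuse the example above; the label-file format and the ALC normalization are assumptions:

# Sketch of low-annotation probing (not this repository's code): train a probe
# on growing subsamples and compute the area under the learning curve (ALC).
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.load("data/representation/X.npy")
y_train = np.loadtxt("data/labels/dataset/train.txt", dtype=str)   # assumed: one label per line
X_test = np.load("data/representation/X_test.npy")
y_test = np.loadtxt("data/labels/dataset/test.txt", dtype=str)

sizes = range(100, 1001, 100)          # 100, 200, ..., 1000 labeled examples
rng = np.random.default_rng(0)         # assumes at least 1000 training examples
scores = []
for n in sizes:
    idx = rng.choice(len(y_train), size=n, replace=False)
    clf = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    scores.append(clf.score(X_test, y_test))

alc = np.trapz(scores, dx=100) / (100 * (len(scores) - 1))   # area normalized to [0, 1]
print("learning curve:", [round(s, 3) for s in scores], "ALC:", round(alc, 3))
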
To reproduce the experiments from the paper, run the following commands:

# Generate representations
python -m task_alignment embedding --dataset=imdb,s140,wiki_toxic_clean,civil_comments_clean \
--model=bert-base-uncased --layer_pooling=2nd-to-last --token_pooling=mean,cls # (GPU)
python -m task_alignment ngrams --dataset=imdb,s140,wiki_toxic_clean,civil_comments_clean \
--tokenizer=bert-base-uncased --vectorizer=tf --max_ngrams=1
python -m task_alignment glove --dataset=imdb,s140,wiki_toxic_clean,civil_comments_clean
python -m task_alignment fasttext --dataset=imdb,s140,wiki_toxic_clean,civil_comments_clean
# Compute metrics
python -m task_alignment alignment --dataset=imdb,s140,wiki_toxic_clean,civil_comments_clean \
--representation=tf_1-grams,glove-840B.300d_mean,fasttext-crawl-300d-2M-subword_mean,bert-base-uncased_2nd-to-last_mean,bert-base-uncased_2nd-to-last_cls
python -m task_alignment intrinsic --dataset=imdb,s140,wiki_toxic_clean,civil_comments_clean \
--representation=tf_1-grams,glove-840B.300d_mean,fasttext-crawl-300d-2M-subword_mean,bert-base-uncased_2nd-to-last_mean,bert-base-uncased_2nd-to-last_cls
python -m task_alignment probing --dataset=imdb,s140,wiki_toxic_clean,civil_comments_clean \
--representation=tf_1-grams,glove-840B.300d_mean,fasttext-crawl-300d-2M-subword_mean,bert-base-uncased_2nd-to-last_mean,bert-base-uncased_2nd-to-last_cls
# Plot results
python -m task_alignment.plot scatter --dataset=imdb,wiki_toxic_clean,s140,civil_comments_clean
python -m task_alignment.plot curve --metric=probing,clust_auc_1,clust_dbi --dataset=imdb,wiki_toxic_clean,s140,civil_comments_clean
⚠️ Note: Random seeds may differ from those used for the original paper, so results may show some statistical variation. However, the key qualitative trends should still be observable, such as the strong correlation between ALC and task alignment (THAS) and the weak correlation between ALC and ADBI.
Please cite our paper if you find our work useful for your research:
@inproceedings{gonzalez-gutierrez-etal-2023-analyzing,
title = "Analyzing Text Representations by Measuring Task Alignment",
author = "Gonzalez-Gutierrez, Cesar and
Primadhanty, Audi and
Cazzaro, Francesco and
Quattoni, Ariadna",
editor = "Rogers, Anna and
Boyd-Graber, Jordan and
Okazaki, Naoaki",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-short.7/",
doi = "10.18653/v1/2023.acl-short.7",
pages = "70--81"
}