Modular Training for Transformer Compression

This repository houses the source code for the paper "Modular Training for Transformer Compression" (currently under review). Modular training is an approach to transformer compression that trains low-rank submodules in isolation before integrating them into the full model. The resulting model achieves 31% compression and a 2.5x inference speedup (on CPU) over its DistilBERT baseline while retaining 99% of its task performance.
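To illustrate what a low-rank submodule looks like, here is a minimal PyTorch sketch that factors a dense projection into two thin matrices; the class name, rank value, and layer shapes are illustrative assumptions, not code from this repository.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Replace a dense (out_features x in_features) weight with two rank-r
    factors, cutting parameters from d_out*d_in to r*(d_in + d_out)."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.down = nn.Linear(in_features, rank, bias=False)  # A: rank x d_in
        self.up = nn.Linear(rank, out_features)                # B: d_out x rank (plus bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))  # y = B(A x) + b

# Example: a DistilBERT-sized FFN projection (768 -> 3072) at rank 128 goes
# from ~2.36M weights to ~0.49M.
layer = LowRankLinear(768, 3072, rank=128)
```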

Setup

All libraries required to run this project are specified in install.sh. The script can be run directly if a virtual environment is present in the parent directory and the code is running on a SLURM cluster. Note that the transformers library is installed from source (version 4.42.0.dev0), but it can be installed directly as well.

Reproducing results

The project pipeline consists of three stages:

  1. capture_data.py passes a dataset through the baseline model and captures intermediate activations, storing them on disk (see the first sketch after this list).
  2. mha_modular.py and ffn_modular.py use the captured activations to train low-rank versions of the MHA and FFN blocks (see the second sketch after this list).
  3. run_superglue.py integrates the trained submodules into the full model, fine-tunes it, and evaluates it on the specified dataset.
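
Below is a sketch of how step 1's activation capture can be done with the Hugging Face transformers API. The model call and hidden-state indexing are real API, but the batch, file name, and the decision to save block-level hidden states are assumptions for illustration, not the repository's actual implementation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Capture per-layer hidden states so each low-rank block can later be trained
# in isolation against the baseline's intermediate activations.
model = AutoModel.from_pretrained("distilbert-base-uncased").eval()
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

batch = tokenizer(["an example sentence"], return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch, output_hidden_states=True)

# hidden_states[0] is the embedding output; hidden_states[i + 1] is the output
# of transformer block i, i.e. the input to block i + 1.
hidden_states = [h.cpu() for h in outputs.hidden_states]
torch.save(hidden_states, "activations_batch0.pt")  # illustrative file name
```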
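And a sketch of step 2's isolated training under simplifying assumptions: a low-rank FFN-style student is fit by plain MSE regression to map the block-level hidden states saved above onto the next layer's hidden states. The actual scripts train MHA and FFN blocks separately against their own captured inputs and outputs; the rank, optimizer, loss, and file names here are illustrative.

```python
import torch
import torch.nn as nn

# Illustrative student: an FFN whose dense weights are each factored into two
# thin matrices of rank `rank`, trained in isolation on captured activations.
d_model, d_ff, rank = 768, 3072, 128
student = nn.Sequential(
    nn.Linear(d_model, rank, bias=False),
    nn.Linear(rank, d_ff),
    nn.GELU(),
    nn.Linear(d_ff, rank, bias=False),
    nn.Linear(rank, d_model),
)

hidden_states = torch.load("activations_batch0.pt")
layer = 0
x, y = hidden_states[layer], hidden_states[layer + 1]  # block input / output

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(student(x), y)
    loss.backward()
    optimizer.step()

torch.save(student.state_dict(), "ffn_layer0_lowrank.pt")  # illustrative name
```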

A full end-to-end run can be triggered with the script ./modular_pipeline.sh, which evaluates the model on the dataset for a single seed. The script ./rs_moded.sh can be used to evaluate the model on 5 seeds, and the Python script consolidate.py can be used to tabulate the median of the 5 results.
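As a guess at the kind of aggregation consolidate.py performs, here is a sketch that takes the per-task median over the seed runs; the result-file layout and names are assumptions.

```python
import json
import statistics
from pathlib import Path

# Illustrative only: take the median over 5 seed runs for each task, assuming
# each run wrote a JSON file like {"boolq": 72.75, "rte": 63.9, ...}.
runs = [json.loads(p.read_text()) for p in sorted(Path("results").glob("seed*.json"))]
medians = {task: statistics.median(run[task] for run in runs) for task in runs[0]}
print(medians)
```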

Results

Scores are the standard metrics for each task: accuracy for boolq, copa, rte, wic, and wsc; accuracy/F1 for cb and mrpc; Pearson/Spearman correlation for stsb. The Average Score column is the mean of the first metric reported for each task.

+---------------------------------+---------+-------------+--------+-------+-------+-------+-------------+-------------+-----------------+
| Model                           |   boolq | cb          |   copa |   rte |   wic |   wsc | stsb        | mrpc        |   Average Score |
+=================================+=========+=============+========+=======+=======+=======+=============+=============+=================+
| bert-base-uncased               |   74.46 | 73.21/51.09 |     63 | 70.04 | 66.61 | 63.46 | 89.42/88.91 | 85.29/89.76 |           73.19 |
+---------------------------------+---------+-------------+--------+-------+-------+-------+-------------+-------------+-----------------+
| distilbert-base-uncased         |   73.18 | 80.36/66.39 |     57 | 63.54 | 65.2  | 64.42 | 86.81/86.61 | 85.78/89.97 |           72.04 |
+---------------------------------+---------+-------------+--------+-------+-------+-------+-------------+-------------+-----------------+
| moddistilbert-base-uncased      |   72.75 | 80.36/66.31 |     58 | 63.9  | 63.48 | 63.46 | 86.05/85.73 | 86.52/90.79 |           71.81 |
+---------------------------------+---------+-------------+--------+-------+-------+-------+-------------+-------------+-----------------+
| t5-small                        |   67.25 | 64.29/44.82 |     49 | 58.84 | 63.79 | 64.42 | 84.49/84.56 | 83.82/88.74 |           66.99 |
+---------------------------------+---------+-------------+--------+-------+-------+-------+-------------+-------------+-----------------+
| squeezebert/squeezebert-uncased |   75.35 | 67.86/47.23 |     57 | 70.04 | 65.2  | 64.42 | 88.73/88.34 | 86.27/90.14 |           71.86 |
+---------------------------------+---------+-------------+--------+-------+-------+-------+-------------+-------------+-----------------+

