Embedding Recycling

Preprint on arXiv: Embedding Recycling for Language Models (https://arxiv.org/abs/2207.04993)

Datasets

The text classification datasets are in the text_classification folder.

The named-entity recognition (NER) datasets are in the ner folder.

For question answering (QA), we use the TriviaQA and SQuAD datasets, which are available on HuggingFace.

Setup Environment

Run the following commands to set up a conda environment:

conda create --name embedding_recycling --file requirements.txt
conda activate embedding_recycling
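
As a quick sanity check after activation, the sketch below reads the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated, and compares it with the environment name used in the commands above. The helper names active_conda_env and check_env are illustrative and not part of this repository.

```python
import os

# Environment name taken from the `conda create` command above.
EXPECTED_ENV = "embedding_recycling"


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    # conda exports CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV") or None


def check_env(environ=None):
    """Return True if the expected environment is the active one."""
    return active_conda_env(environ) == EXPECTED_ENV


if __name__ == "__main__":
    name = active_conda_env()
    if check_env():
        print(f"OK: running inside '{name}'")
    else:
        print(f"Expected '{EXPECTED_ENV}', but the active environment is {name!r}")
```

Running this file with the environment's Python interpreter before launching an experiment can catch the common mistake of starting a script from the base environment.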

Experiment Replication

Standard Embedding Recycling

To replicate our results for standard embedding recycling, use the conda environment listed above and run the following scripts for each dataset group:

For text classification, use the TextClassificationScripts/GeneralLinearClassifier_PaperResults.py script.
For NER, use the NER_Scripts/General_NER_Classifier_PaperResults.py script.
For QA on TriviaQA, first preprocess the dataset for the selected model with QA_Scripts/PrepareTriviaQADataset.py, then run QA_Scripts/GeneralQuestionAnswering_PaperResults.py.
For QA on SQuAD, use the run_squad.py script from the HuggingFace Transformers repository; a copy is included in the root directory of this repository.

The hyperparameters for replicating each experiment are included in the Hyperparameter_Selection folder.

Adapter-Based Embedding Recycling

To replicate our results for adapter-based embedding recycling, use the conda environment listed above and run the following scripts for each dataset group:

For text classification, use the TextClassificationScripts/Adapters_PaperResults.py script.
For NER, use the NER_Scripts/Adapters_NER_PaperResults.py script.
For QA on TriviaQA, first preprocess the dataset for the selected model with QA_Scripts/PrepareTriviaQADataset.py, then run QA_Scripts/Adapters_QA_PaperResults.py.
For QA on SQuAD, use the run_squad.py script from the HuggingFace Transformers repository; a copy is included in the root directory of this repository.
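
The script layout described in the two replication sections can be summarized as a small lookup table. The sketch below is illustrative only: the function names scripts_for and missing_scripts, the method and task keys, and the assumption that TriviaQA preprocessing runs before the training script are ours, not part of the repository. It deliberately does not build full commands, since the arguments each script expects are not documented here.

```python
from pathlib import Path

# Script paths copied from the replication instructions above.
# Keys are (method, task); values list the scripts in the assumed run order.
SCRIPTS = {
    ("standard", "text_classification"): [
        "TextClassificationScripts/GeneralLinearClassifier_PaperResults.py",
    ],
    ("standard", "ner"): ["NER_Scripts/General_NER_Classifier_PaperResults.py"],
    ("standard", "triviaqa"): [
        "QA_Scripts/PrepareTriviaQADataset.py",  # preprocessing (assumed to run first)
        "QA_Scripts/GeneralQuestionAnswering_PaperResults.py",
    ],
    ("standard", "squad"): ["run_squad.py"],
    ("adapter", "text_classification"): [
        "TextClassificationScripts/Adapters_PaperResults.py",
    ],
    ("adapter", "ner"): ["NER_Scripts/Adapters_NER_PaperResults.py"],
    ("adapter", "triviaqa"): [
        "QA_Scripts/PrepareTriviaQADataset.py",  # preprocessing (assumed to run first)
        "QA_Scripts/Adapters_QA_PaperResults.py",
    ],
    ("adapter", "squad"): ["run_squad.py"],
}


def scripts_for(method, task):
    """Return the scripts for a (method, task) pair, in run order."""
    key = (method.lower(), task.lower())
    if key not in SCRIPTS:
        raise KeyError(f"No scripts listed for method={method!r}, task={task!r}")
    return list(SCRIPTS[key])


def missing_scripts(method, task, repo_root="."):
    """Return the listed scripts that do not exist under repo_root."""
    root = Path(repo_root)
    return [s for s in scripts_for(method, task) if not (root / s).is_file()]


if __name__ == "__main__":
    for (method, task), scripts in SCRIPTS.items():
        print(f"{method:8s} {task:20s} -> {', '.join(scripts)}")
```

Calling missing_scripts("adapter", "triviaqa") from the repository root returns an empty list when both scripts are present, which is a cheap check to run before starting a long experiment.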

Citing

@misc{https://doi.org/10.48550/arxiv.2207.04993,
  doi = {10.48550/ARXIV.2207.04993},
  url = {https://arxiv.org/abs/2207.04993},
  author = {Saad-Falcon, Jon and Singh, Amanpreet and Soldaini, Luca and D'Arcy, Mike and Cohan, Arman and Downey, Doug},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Embedding Recycling for Language Models},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

EmbeddingRecycling is an open-source project developed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.

