Experimental Standards for Deep Learning in Natural Language Processing Research

This repository contains supplementary material from our paper of the same title. Since the paper can only capture the state of affairs at the time of publication, this repository keeps a more up-to-date version of the resources in its appendix and invites the community to collaborate in a transparent manner.

We maintain a version of Table 1 from the original paper, which gives an overview of useful resources for the different stages of the research process: Data, Codebase & Models, Experiments & Analysis, and Publication.

We distil the actionable points at the end of the core paper sections into a reusable and modifiable checklist to ensure replicability.

We transparently document changes to the repository and its versioning. The current version is v0.1.

🎓 Citing

If you find the resources helpful or are using the checklist for one of your academic projects, please cite us in the following way:

@inproceedings{ulmer2022experimental,
    title = "Experimental Standards for Deep Learning in Natural Language Processing Research",
    author = {Ulmer, Dennis  and
      Bassignana, Elisa  and
      M{\"u}ller-Eberstein, Max  and
      Varab, Daniel  and
      Zhang, Mike  and
      van der Goot, Rob  and
      Hardmeier, Christian  and
      Plank, Barbara},
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "",
    pages = "2673--2692",
}

In your paper, you could cite our work as follows, for instance:

For our experimental design, we follow many of the guidelines laid out by \citet{ulmer2022experimental}.

🧩 Contributing

Contributions can come in two forms: opening an issue to correct mistakes or improve the existing content, or opening a pull request to add new content.

When opening an issue, please label the issue accordingly:

  • enhancement-resources for issues improving or correcting entries in the resource tables,
  • enhancement-standards for issues improving or correcting entries in the checklist,
  • duplicate for indicating duplicate entries,
  • general for general questions about or issues with the repository.

To contribute and add new content, please first check the file and read the contributing guideline before opening a pull request. Use the label

  • enhancement-resources for pull requests adding new resources and
  • enhancement-standards for pull requests adding new points to the checklist.

The pull request template can be checked under


We split Table 1 from the paper into section-specific resources below.

📊 Data

| Name | Description | Link / Reference |
| --- | --- | --- |
| Data Version Control (DVC) | Command-line tool to version datasets and models. | Link / Paper |
| Hugging Face datasets | Hub to store and share (NLP) datasets. | Link / Paper |
| European Language Resources Association | Public institution for language and evaluation resources. | About / Link |
| LINDAT/CLARIN | Open access to language resources and other data and services for the support of research in digital humanities and social sciences. | Link / Paper |
| Zenodo | General-purpose open-access repository for research papers, datasets, research software, reports, and any other research-related digital artifacts. | Link |
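
Independently of the tools listed above, one lightweight way to pin the exact data used in an experiment is to record a content hash next to the results. The following is a minimal sketch using only the Python standard library; `file_sha256` is a hypothetical helper introduced here for illustration and is not part of any listed tool.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical stand-in for a real dataset file.
    with tempfile.TemporaryDirectory() as tmp_dir:
        data_path = Path(tmp_dir) / "train.tsv"
        data_path.write_text("text\tlabel\nhello world\t1\n", encoding="utf-8")
        print(f"{data_path.name}: sha256={file_sha256(data_path)}")
```

Reporting such a digest alongside experimental results lets others check that they are working with byte-identical data, even when the file is later renamed or moved.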

💻 Codebase & Model

| Name | Description | Link / Reference |
| --- | --- | --- |
| Anonymous GitHub | Website to double-anonymize a GitHub repository. | Link |
| BitBucket | Website and cloud-based service that helps developers store and manage their code, as well as track and control changes to it. | Link |
| Conda | Open-source package and environment management system. | Link |
| codecarbon | Python package for estimating and tracking the carbon emissions of various kinds of computer programs. | Link |
| ONNX | Open format built to represent machine learning models. | Link |
| Pipenv | Virtual environment for managing Python packages. | Link |
| Releasing Research Code | GitHub repository with many tips and templates for releasing research code. | Link |
| Virtualenv | Tool to create isolated Python environments. | Link |

🔬 Experiments & Analysis

| Name | Description | Link / Reference |
| --- | --- | --- |
| baycomp | Python implementation of Bayesian tests for the comparison of classifiers. | Link / Paper |
| BayesianTestML | As baycomp, but also including Julia and R implementations. | Link / Paper |
| confidenceinterval | Python package that computes confidence intervals for common evaluation metrics. | Link |
| deep-significance | Python package implementing the ASO test by Dror et al. (2019) and other utilities. | Link |
| HyBayes | Python package implementing a variety of frequentist and Bayesian significance tests. | Link |
| Hugging Face evaluate | Library that implements standardized versions of evaluation metrics and significance tests. | Link |
| pingouin | Python package implementing various parametric and non-parametric statistical tests. | Link / Paper |
| Protocol buffers | Data structure for model predictions. | Link |
| RankingNLPSystems | Python package to create a fair global ranking of models across multiple tasks (and evaluation metrics). | Link / Paper |
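
To give a feel for the kind of significance test these packages provide, here is a minimal sketch of a two-sided paired permutation (sign-flip) test written with the Python standard library only. It is an illustration under stated assumptions, not the API or the method of any package listed above: the function name `paired_permutation_test`, its parameters, and the example scores are all hypothetical.

```python
import random
from statistics import mean


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for the mean difference of paired scores.

    Null hypothesis: both systems perform equally well on average, so the
    sign of each paired difference is arbitrary.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("Both systems need one score per paired run or example.")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Randomly flip the sign of each paired difference.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical accuracy scores of two models over five random seeds.
    model_a = [0.81, 0.79, 0.83, 0.80, 0.82]
    model_b = [0.78, 0.77, 0.80, 0.79, 0.78]
    print(f"p-value: {paired_permutation_test(model_a, model_b):.4f}")
```

With only a handful of paired scores, the smallest p-value such a test can produce is limited by the number of possible sign patterns, which is one reason to report results over more than a few seeds. For real experiments, the maintained implementations in the table are preferable to a hand-rolled test.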

📄 Publication

| Name | Description | Link / Reference |
| --- | --- | --- |
| dblp | Computer science bibliography to find correct versions of papers. | Link |
| impact | Online calculator of carbon emissions based on GPU type. | Link / Paper |
| Google Scholar | Scientific publication search engine. | Link |
| Semantic Scholar | Scientific publication search engine. | Link |
| rebiber | Python tool to check and normalize bib entries to the official published versions of the cited papers. | Link |