
Mind the Gap: Assessing Temporal Generalization in Neural Language Models

This repository contains the dataset splits used in Mind the Gap: Assessing Temporal Generalization in Neural Language Models (Lazaridou, Kuncoro, Gribovskaya et al., 2021).

Datasets

We provide splits of two public datasets used in the paper: WMT News Crawl and arXiv abstracts.

Each subset is stored on Google Cloud Storage as a gzipped text file listing the publication date (in YYYYMMDD format) and the ID of every document in the subset.
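A minimal sketch of reading one of these split files in Python follows. It assumes that each line holds one document's date and ID separated by whitespace, with the date first. That layout is an assumption, so check it against the downloaded files. The function name load_split is hypothetical.

```python
import gzip
from typing import Dict


def load_split(path: str) -> Dict[str, str]:
    """Map each document ID in a split file to its YYYYMMDD publication date.

    Assumes (hypothetically) one document per line: the date, then the ID,
    separated by whitespace.
    """
    split: Dict[str, str] = {}
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            date, docid = fields[0], fields[1]
            split[docid] = date
    return split
```

Because the result maps IDs to dates, membership tests such as docid in split can then select the matching documents from the raw data.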

arXiv

arXiv abstracts and publication dates were obtained through arXiv's OAI-PMH service on January 2, 2021. We used the value in the created field as the article's publication date. The arXiv dataset can also be downloaded from Kaggle.
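For illustration, a sketch of pulling the created date out of a single record harvested in OAI-PMH's "arXiv" metadata format and normalizing it to the YYYYMMDD format used by the split files. The namespace URI, the record layout and the helper name parse_arxiv_metadata are assumptions to verify against real harvested records.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Namespace of the "arXiv" OAI-PMH metadata format (assumed; verify against
# the records you actually harvest).
ARXIV_NS = {'a': 'http://arxiv.org/OAI/arXiv/'}


def parse_arxiv_metadata(xml_text: str) -> Tuple[str, str, str]:
    """Return (arXiv ID, YYYYMMDD created date, whitespace-normalized abstract)."""
    root = ET.fromstring(xml_text)  # the <arXiv> metadata element
    arxiv_id = root.findtext('a:id', namespaces=ARXIV_NS)
    created = root.findtext('a:created', namespaces=ARXIV_NS)  # e.g. '2007-04-02'
    abstract = root.findtext('a:abstract', namespaces=ARXIV_NS)
    # Drop the dashes so the date matches the split files' YYYYMMDD format.
    return arxiv_id, created.replace('-', ''), ' '.join(abstract.split())
```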

WMT

We downloaded the document-split versions of the English and German WMT News Crawl datasets. Since the dataset does not provide document IDs, we used the SHA256 hash of each article's Base64-encoded unsplit text as its ID:

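import gzip
import hashlib

def read_wmt_docs(path):
  """Yield (docid, (date, sentence_split_text, unsplit_text)) for each article."""
  with gzip.open(path, 'rb') as gz_file:
    for line in gz_file:
      # Three tab-separated fields; the unsplit text field is Base64-encoded.
      date, sentence_split_text, unsplit_text = line.decode('utf-8').strip().split('\t')
      # The document ID is the SHA256 hex digest of the Base64-encoded unsplit text.
      docid = hashlib.sha256(unsplit_text.encode('utf-8')).hexdigest()
      yield docid, (date, sentence_split_text, unsplit_text)

docs = read_wmt_docs('news-docs.2007.en.filtered.gz')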

We trained models on the sentence-split article texts. Some articles appear multiple times in the dataset with different publication dates; for these, we used each article's earliest publication date.
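The deduplication rule above can be sketched as follows, together with a helper that decodes a Base64 text field. The sketch assumes the date column uses the same YYYYMMDD format as the split files, so plain string comparison orders dates chronologically. It also assumes that the sentence-split field is Base64-encoded like the unsplit one. Both function names are hypothetical.

```python
import base64
from typing import Dict, Iterable, Tuple

# (date, sentence_split_text, unsplit_text), paired with a docid; the date is
# assumed to be a YYYYMMDD string.
Record = Tuple[str, ...]


def keep_earliest(docs: Iterable[Tuple[str, Record]]) -> Dict[str, Record]:
    """Keep one record per docid: the one with the earliest publication date."""
    earliest: Dict[str, Record] = {}
    for docid, record in docs:
        # YYYYMMDD strings sort chronologically, so string comparison suffices.
        if docid not in earliest or record[0] < earliest[docid][0]:
            earliest[docid] = record
    return earliest


def decode_text(b64_text: str) -> str:
    """Decode a Base64-encoded text field into a UTF-8 string."""
    return base64.b64decode(b64_text).decode('utf-8')
```

keep_earliest accepts the (docid, (date, sentence_split_text, unsplit_text)) pairs yielded by the hashing snippet above. Chaining the generators for several yearly files (for example with itertools.chain) deduplicates across years.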

Splits used in experiments

Experiments	Dataset	Splits
Sections 3-5	WMT	control: train, validation; time-stratified: train, validation; test
	arXiv	control: train, validation; time-stratified: train, validation; test
Appendix B: The effect of outdated models persists beyond the 2018/2019 test period	WMT	test period 2017/2018: control: train, validation; time-stratified: train, validation; test
		test period 2016/2017: control: train, validation; time-stratified: train, validation; test
		test period 2015/2016: control: train, validation; time-stratified: train, validation; test
		test period 2014/2015: control: train, validation; time-stratified: train, validation; test
		test period 2013/2014: control: train, validation; time-stratified: train, validation; test
Appendix C: The effect of outdated models persists beyond the two-year gap	WMT	test: same as for Sections 3-5; validation: same as the time-stratified validation split for Sections 3-5; train until: 2017-09-30, 2017-03-31, 2016-09-30, 2016-03-31, 2015-09-30, 2015-03-31, 2014-09-30, 2014-03-31, 2013-09-30, 2013-03-31, 2012-09-30
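The sketch below illustrates what a "train until" cutoff from Appendix C means when it is applied to a split loaded as a mapping from document IDs to YYYYMMDD dates. It treats the cutoff as inclusive, which is an assumption. The helper name train_until is hypothetical. Where the provided split files for each cutoff are available, they should be preferred.

```python
from datetime import date
from typing import Dict


def train_until(split: Dict[str, str], cutoff: date) -> Dict[str, str]:
    """Keep documents published on or before `cutoff` (inclusive, assumed).

    `split` maps document IDs to YYYYMMDD date strings.
    """
    cutoff_str = cutoff.strftime('%Y%m%d')
    return {docid: d for docid, d in split.items() if d <= cutoff_str}


# The "train until" dates listed for Appendix C.
CUTOFFS = [date.fromisoformat(s) for s in (
    '2017-09-30', '2017-03-31', '2016-09-30', '2016-03-31', '2015-09-30',
    '2015-03-31', '2014-09-30', '2014-03-31', '2013-09-30', '2013-03-31',
    '2012-09-30',
)]
```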
