This repository contains the datasets for the paper "Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams" (arXiv), accepted to the ACL 2026 main conference. It provides two datasets:
- OAKS-BABI (OAKS-B): A synthetic dataset derived from the BABILong benchmark. Its questions cover tracking, counting, bridge, and comparison reasoning over evolving facts. Contains 1.2k questions.
- OAKS-Novel (OAKS-N): A human-curated dataset sourced from 19 public-domain novels with rich narratives and dynamically interacting characters. Contains 870 multiple-choice questions, averaging 5.5 options each.

The code for running the OAKS evaluation and a detailed description of both datasets are available on our project page.
