TwentyQuestions is an environment for evaluating agents on the classic Twenty Questions game, played against an LLM gamemaster. It wraps the TwentyQuestions implementation from TextArena, a framework for text-based game environments. The environment exercises:
- Strategic question formulation to narrow the search space
- Binary search and deductive reasoning
- Information-efficient questioning strategies
- Knowledge representation and category understanding
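To illustrate the information-efficient strategy listed above, here is a minimal sketch over a small, hypothetical candidate set. It picks the yes-or-no question whose answer splits the remaining candidates most evenly (maximum expected information under a uniform prior), then keeps only the candidates consistent with the answer. The attribute table and all helper names are invented for illustration. The real environment exposes no candidate list, and answers come from the LLM gamemaster.

```python
from math import log2

# Hypothetical toy candidate set: each word maps to yes/no attributes.
# The real game has no explicit candidate list; this is for illustration only.
CANDIDATES = {
    "dog": {"animal": True, "pet": True, "flies": False},
    "eagle": {"animal": True, "pet": False, "flies": True},
    "parrot": {"animal": True, "pet": True, "flies": True},
    "car": {"animal": False, "pet": False, "flies": False},
    "plane": {"animal": False, "pet": False, "flies": True},
    "chair": {"animal": False, "pet": False, "flies": False},
}


def split_entropy(candidates: dict, attribute: str) -> float:
    """Expected information (in bits) from asking about `attribute`,
    assuming every remaining candidate is equally likely."""
    n = len(candidates)
    yes = sum(1 for attrs in candidates.values() if attrs[attribute])
    entropy = 0.0
    for count in (yes, n - yes):
        if count:
            p = count / n
            entropy -= p * log2(p)
    return entropy


def best_question(candidates: dict) -> str:
    """Pick the attribute whose yes/no answer splits the candidates most evenly."""
    attributes = next(iter(candidates.values())).keys()
    return max(attributes, key=lambda a: split_entropy(candidates, a))


def filter_candidates(candidates: dict, attribute: str, answer: bool) -> dict:
    """Keep only the candidates consistent with the gamemaster's answer."""
    return {w: attrs for w, attrs in candidates.items() if attrs[attribute] == answer}


if __name__ == "__main__":
    remaining = CANDIDATES
    while len(remaining) > 1:
        question = best_question(remaining)
        if split_entropy(remaining, question) == 0:
            break  # no attribute separates the remaining candidates
        # Pretend the secret word is "parrot" and answer truthfully.
        answer = CANDIDATES["parrot"][question]
        remaining = filter_candidates(remaining, question, answer)
        print(f"{question}? -> {answer}; remaining: {sorted(remaining)}")
```

Each well-chosen question roughly halves the candidate set, which is the binary-search behavior the environment rewards.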
TwentyQuestions does not require a sandbox. It has minimal compute requirements.
Licensed under MIT.
There are two splits: train (300 tasks) and test (300 tasks). Each split contains 50 tasks for each of the following 6 variants:
- TwentyQuestions-v0
- TwentyQuestions-v0-train
- TwentyQuestions-v0-raw
- TwentyQuestions-v0-hardcore
- TwentyQuestions-v0-hardcore-train
- TwentyQuestions-v0-hardcore-raw
Each task is seeded for reproducibility.
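As a sketch of how a split decomposes into 6 variants × 50 seeded tasks = 300 tasks, the snippet below enumerates task specs. The seeding scheme (consecutive seeds starting at a per-split base) and the `build_split` helper are hypothetical; the environment's actual seed values are not documented here.

```python
from itertools import product

# The six variant IDs listed above.
VARIANTS = [
    "TwentyQuestions-v0",
    "TwentyQuestions-v0-train",
    "TwentyQuestions-v0-raw",
    "TwentyQuestions-v0-hardcore",
    "TwentyQuestions-v0-hardcore-train",
    "TwentyQuestions-v0-hardcore-raw",
]
TASKS_PER_VARIANT = 50


def build_split(split_name: str, base_seed: int) -> list:
    """Enumerate one split as (variant, seed) task specs.

    Hypothetical seeding scheme: each variant gets seeds
    base_seed .. base_seed + TASKS_PER_VARIANT - 1.
    """
    return [
        {"split": split_name, "env_id": env_id, "seed": base_seed + i}
        for env_id, i in product(VARIANTS, range(TASKS_PER_VARIANT))
    ]


if __name__ == "__main__":
    train = build_split("train", base_seed=0)
    test = build_split("test", base_seed=10_000)  # hypothetical offset
    print(len(train), len(test))
```

Fixing the seed per task is what makes a rerun of the same task reproduce the same game setup.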
This is a sparse-reward environment. Rewards are mapped from TextArena's native reward values {-1, 0, 1} to {0.0, 0.5, 1.0} via (raw + 1) / 2.
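The reward mapping above is a single affine transform. A minimal sketch (the function name is our own):

```python
def normalize_reward(raw: int) -> float:
    """Map TextArena's native reward {-1, 0, 1} to {0.0, 0.5, 1.0} via (raw + 1) / 2."""
    if raw not in (-1, 0, 1):
        raise ValueError(f"unexpected raw reward: {raw!r}")
    return (raw + 1) / 2


if __name__ == "__main__":
    for raw in (-1, 0, 1):
        print(raw, "->", normalize_reward(raw))
```

Rejecting out-of-range values surfaces any unexpected upstream reward instead of silently producing a value outside [0, 1].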
We do not use LLM graders for this environment; reward is determined programmatically.
Game state is generated procedurally by the TextArena engine using seeded randomness. No external data files are required.
Agents are given a single tool:
send_message(message): Ask a yes-or-no question, or guess the word by enclosing it in square brackets, as in [answer].
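A minimal sketch of agent-side helpers for building `send_message` payloads, assuming that any text inside square brackets is treated as a guess. The regex is our own approximation, not TextArena's actual parser, and the helper names are hypothetical.

```python
import re
from typing import Optional

# Assumption: text inside square brackets is read as a guess.
# Illustrative approximation only, not TextArena's parser.
GUESS_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def make_question(text: str) -> str:
    """Format a yes-or-no question to pass to send_message."""
    text = text.strip()
    return text if text.endswith("?") else text + "?"


def make_guess(word: str) -> str:
    """Wrap a guessed word in square brackets, e.g. 'apple' -> '[apple]'."""
    return f"[{word.strip()}]"


def extract_guess(message: str) -> Optional[str]:
    """Return the bracketed guess (lowercased), or None for a plain question."""
    match = GUESS_PATTERN.search(message)
    return match.group(1).strip().lower() if match else None


if __name__ == "__main__":
    print(make_question("Is it an animal"))
    print(make_guess("parrot"))
    print(extract_guess("Is it a [Parrot]?"))
```

Keeping questions bracket-free avoids a question accidentally being read as a guess.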
TwentyQuestions is a multi-turn environment.
Medium difficulty. Twenty Questions requires strategic questioning, category reasoning, and efficient reduction of the search space. Success depends on formulating informative yes-or-no questions.
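The search-space reduction above has a simple information-theoretic bound: each yes-or-no answer carries at most one bit, so 20 questions can distinguish at most 2^20 = 1,048,576 equally likely items, and guaranteeing identification among N items takes at least ceil(log2 N) questions. A quick sketch of that arithmetic (helper names are our own), using exact integer operations rather than floating-point logarithms:

```python
def max_distinguishable(num_questions: int) -> int:
    """Most items that num_questions yes/no answers can tell apart."""
    return 2 ** num_questions


def min_questions(num_candidates: int) -> int:
    """Worst-case yes/no questions needed to single out one of N items: ceil(log2 N)."""
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    # (N - 1).bit_length() equals ceil(log2 N) for N >= 1, with no float rounding.
    return (num_candidates - 1).bit_length()


if __name__ == "__main__":
    print(max_distinguishable(20))
    print(min_questions(1_000_000))
```

Perfectly balanced questions only reach this bound when the agent's candidate space is well organized, which is why category reasoning matters.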
This environment requires an OpenAI API key (passed via secrets) to power the LLM gamemaster.
Agents in TwentyQuestions interact only with a word-guessing game and have no access to external systems, the internet, or sensitive data. The environment does not present safety risks.
@software{textarena2024,
  author = {Guertler, Leon and Banting, Wilfried and Pignatelli, Eduardo},
  title = {TextArena},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/LeonGuertler/TextArena}
}