ChildAgentEval: Evaluating Cognitive Age Alignment in Interactive AI Agents

Yifan Shen^1,2,*, Jiawen Zhang^2,4,*, Jian Xu^2, Junho Kim^2, Ismini Lourentzou^2, Xu Cao^1,2,†, Meihuan Huang^1,3,5,†

Overview

ChildAgentEval is an interactive evaluation framework for studying whether multimodal large language model (MLLM) agents can align their reasoning, memory, language, and error patterns with human developmental stages.

The benchmark is inspired by the Wechsler Intelligence Scale for Children (WISC), but it does not reproduce protected clinical test items. Instead, it translates psychometrically grounded cognitive constructs into web-based interactive tasks that agents must complete through browser actions such as clicking, selecting, typing, and responding under time or memory constraints.

Why Cognitive Age Alignment?

Most agent benchmarks reward maximal task performance. Child-facing AI systems require a different evaluation target: developmental appropriateness. A tutor or assistant that gives technically correct but adult-level explanations may fail to meet the needs of a younger user.

ChildAgentEval studies cognitive age alignment: whether an agent can exhibit age-appropriate behavior across language abstraction, working memory, visual and fluid reasoning, processing speed, and social explanation style.

Benchmark Design

ChildAgentEval is designed as a model-agnostic evaluation infrastructure rather than a static public dataset. It administers controlled web tasks, records detailed interaction traces, and reports both task-level and factor-level results.
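As a rough illustration of what an interaction trace might contain, the sketch below logs each browser action (click, select, type, respond) with a task identifier, a target, an optional value, and a timestamp, then serializes the trace as JSON Lines. The schema, the `TraceStep` and `TraceRecorder` names, and the four-action vocabulary are hypothetical; the actual ChildAgentEval trace format is not public.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class TraceStep:
    """One browser action taken by the agent (hypothetical schema)."""
    task_id: str
    action: str                  # one of ALLOWED_ACTIONS below
    target: str                  # hypothetical element selector or description
    value: Optional[str] = None  # text typed or option chosen, if any
    timestamp: float = field(default_factory=time.time)


class TraceRecorder:
    """Collects steps in order and serializes them as JSON Lines."""

    # Hypothetical action vocabulary mirroring the actions named in the Overview.
    ALLOWED_ACTIONS = {"click", "select", "type", "respond"}

    def __init__(self) -> None:
        self.steps: List[TraceStep] = []

    def log(self, task_id: str, action: str, target: str,
            value: Optional[str] = None) -> TraceStep:
        if action not in self.ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        step = TraceStep(task_id, action, target, value)
        self.steps.append(step)
        return step

    def to_jsonl(self) -> str:
        # One JSON object per line, in the order the actions were taken.
        return "\n".join(json.dumps(asdict(s)) for s in self.steps)
```

Keeping the trace append-only and timestamped makes it possible to reconstruct both what the agent did and how long it took, which matters for time-constrained tasks.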

Cognitive factor	Public high-level description
Gc	Verbal abstraction, vocabulary, and comprehension
Gf/Gv	Fluid reasoning, visual reasoning, and spatial problem solving
WM	Information retention and manipulation across interaction steps
PSI	Time-constrained visual-symbolic execution and response efficiency
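Factor-level results summarize task-level results. A minimal sketch, assuming each task maps to exactly one of the four public factors and that task scores are already normalized to [0, 1], is an unweighted mean per factor. The `factor_scores` function and the task-to-factor mapping are hypothetical and do not reflect the restricted scoring rubric.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# The four public factors from the table above.
FACTORS: Tuple[str, ...] = ("Gc", "Gf/Gv", "WM", "PSI")


def factor_scores(
    task_results: Iterable[Tuple[str, float]],
    task_to_factor: Dict[str, str],
) -> Dict[str, float]:
    """Average normalized task scores within each factor (hypothetical aggregation)."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for task_id, score in task_results:
        factor = task_to_factor[task_id]  # raises KeyError if a task is unmapped
        if factor not in FACTORS:
            raise ValueError(f"unknown factor {factor!r} for task {task_id!r}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {task_id!r} is outside [0, 1]")
        buckets[factor].append(score)
    # Factors with no completed tasks are omitted rather than reported as zero.
    return {f: mean(buckets[f]) for f in FACTORS if buckets[f]}
```

Omitting empty factors, rather than scoring them as zero, keeps a missing task family from being mistaken for poor performance.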

The paper evaluates agents across representative developmental anchors: ages 7, 10, 13, and 16. The skill-guided setting uses age bands of 6-8, 9-11, 12-14, and 15-17.
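Each evaluation anchor sits at the midpoint of one skill-guided age band (7 in 6-8, 10 in 9-11, 13 in 12-14, 16 in 15-17). The helper below is a hypothetical illustration, not repository code; it maps a target age to its band, assuming the bands are inclusive at both ends.

```python
from typing import Tuple

# Skill-guided age bands (inclusive ranges) and the representative anchors.
AGE_BANDS: Tuple[Tuple[int, int], ...] = ((6, 8), (9, 11), (12, 14), (15, 17))
ANCHOR_AGES: Tuple[int, ...] = (7, 10, 13, 16)


def band_for_age(age: int) -> Tuple[int, int]:
    """Return the inclusive (low, high) band containing `age` (hypothetical helper)."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return (low, high)
    raise ValueError(f"age {age} is outside the 6-17 range covered by the bands")
```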

The overview figure in this repository is a public summary taken from the paper. It is not a release of the complete item set, protected administration protocol, answer keys, or scoring materials.

Age-Specific Cognitive Skill Distillation

To avoid relying on subjective role prompts such as "act like a child," ChildAgentEval introduces a data-grounded skill distillation pipeline. The method extracts developmental markers from age-stratified child and adolescent corpora, then distills them into structured cognitive skill cards.
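The distilled developmental markers themselves are not public. As a stand-in, the sketch below computes three common surface statistics for each age band of a text corpus: mean sentence length in words, mean word length, and type-token ratio. The choice of markers and the `text_markers` and `markers_by_band` names are assumptions for illustration; the paper's pipeline may use entirely different markers.

```python
import re
from statistics import mean
from typing import Dict, List


def text_markers(texts: List[str]) -> Dict[str, float]:
    """Simple surface markers for one age band's texts (illustrative stand-in)."""
    # Split on runs of sentence-ending punctuation and drop empty fragments.
    sentences = [s for t in texts for s in re.split(r"[.!?]+", t) if s.strip()]
    words = [w.lower() for t in texts for w in re.findall(r"[A-Za-z']+", t)]
    if not sentences or not words:
        raise ValueError("corpus has no sentences or words")
    return {
        "mean_sentence_len": len(words) / len(sentences),
        "mean_word_len": mean(len(w) for w in words),
        "type_token_ratio": len(set(words)) / len(words),
    }


def markers_by_band(corpus: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Compute markers separately for each age band, e.g. keys '6-8', '9-11'."""
    return {band: text_markers(texts) for band, texts in corpus.items()}
```

Comparing these values across bands gives a crude, data-derived picture of how language changes with age, which is the kind of signal a role prompt alone does not supply.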

These skill cards constrain the agent through modules for vocabulary abstraction, working memory, reasoning budget, visual reliance, and social perspective.
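One possible shape for a skill card is a record with one entry per module, rendered into constraint lines for an agent's instructions. The `SkillCard` class, its field names, and the prompt rendering below are hypothetical; the paper's card format and the way cards are injected into the agent are not described in this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillCard:
    """Hypothetical skill-card layout with one field per module named above."""
    age_band: str
    vocabulary_abstraction: str
    working_memory: str
    reasoning_budget: str
    visual_reliance: str
    social_perspective: str

    def to_system_prompt(self) -> str:
        """Render the card as one header line plus one constraint line per module."""
        lines = [f"Target age band: {self.age_band}"]
        for label, rule in (
            ("Vocabulary abstraction", self.vocabulary_abstraction),
            ("Working memory", self.working_memory),
            ("Reasoning budget", self.reasoning_budget),
            ("Visual reliance", self.visual_reliance),
            ("Social perspective", self.social_perspective),
        ):
            lines.append(f"- {label}: {rule}")
        return "\n".join(lines)
```

For example, a card for the 6-8 band might set `reasoning_budget` to an instruction such as "use at most two reasoning steps"; any such values here are placeholders, not distilled content. Keeping each module as a separate field makes it possible to ablate one constraint at a time.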

Main Findings

Standard age prompting does not reliably produce age-ordered behavior. General-purpose agents tend to default to their strongest available capabilities, even when assigned a younger target age.

With stronger base models, skill-guided agents show clearer developmental differentiation. In the reported experiments, targeted cognitive filters produce more monotonic score trajectories from younger to older age bands, especially in language-mediated dimensions.
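The README does not specify how monotonicity is measured. One simple option, shown here as an assumption rather than the paper's metric, is to check whether scores never decrease from the youngest to the oldest band, and to report the fraction of (younger, older) band pairs in which the older band scores strictly higher, a Kendall-style pairwise agreement. The function names are hypothetical; inputs are one score per age band, ordered from youngest to oldest.

```python
from itertools import combinations
from typing import Sequence


def is_monotonic_increasing(scores: Sequence[float], strict: bool = False) -> bool:
    """True if each band scores at least as high (or strictly higher) than the previous."""
    return all((b > a) if strict else (b >= a) for a, b in zip(scores, scores[1:]))


def pairwise_order_agreement(scores: Sequence[float]) -> float:
    """Fraction of (younger, older) band pairs where the older band scores higher.

    Ties count as disagreement, so only a strictly increasing trajectory gets 1.0.
    """
    pairs = list(combinations(range(len(scores)), 2))
    if not pairs:
        raise ValueError("need scores for at least two age bands")
    agree = sum(1 for i, j in pairs if scores[j] > scores[i])
    return agree / len(pairs)
```

A strictly increasing trajectory yields an agreement of 1.0 and a fully reversed one yields 0.0, so the pairwise measure grades partially ordered trajectories instead of returning a single pass or fail.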

Alignment remains uneven across cognitive domains. Language and crystallized knowledge are easier to calibrate, while working memory, perceptual reasoning, and processing speed remain harder to align with human developmental norms.

Evaluation Access

Because parts of the evaluation protocol are derived from or constrained by copyrighted Wechsler scale materials, we cannot publicly release the complete evaluation protocol, protected item administration details, answer keys, scoring rubrics, or materials that could be used to reconstruct the original clinical assessment.

If you would like to evaluate your model with ChildAgentEval, please contact:

yifan26@illinois.edu

Material	Public status
Paper summary and figures	Released in this repository
High-level benchmark design	Released in this repository
Complete evaluation protocol	Restricted
Protected item content and answer keys	Restricted
External model evaluation	Available by controlled request

We can coordinate a controlled evaluation of your model under the standardized ChildAgentEval environment. Depending on your setup, this may involve an API endpoint, hosted model access, or a secure checkpoint-sharing arrangement. Please do not post API keys, private model weights, or credentials in GitHub issues.

See docs/evaluation_access.md for the recommended request format.

Citation

If you use ChildAgentEval or discuss the benchmark in your work, please cite:

@misc{shen2026childagenteval,
  title        = {Evaluating Cognitive Age Alignment in Interactive AI Agents},
  author       = {Yifan Shen and Jiawen Zhang and Jian Xu and Junho Kim and Ismini Lourentzou and Xu Cao and Meihuan Huang},
  year         = {2026},
  note         = {Preprint}
}

Contact

For evaluation requests and collaboration inquiries, contact yifan26@illinois.edu.
