Agent Benchmark – Example Task

This repository contains an example task used in AIMultiple’s agentic benchmarks.

The full task specification is available in task-6-web.md.

Purpose

This task is designed to evaluate autonomous coding agents on:

Agents are expected to complete the task in a one-shot setting without human intervention.

Full benchmark results and methodologies:
