Dynamic CAPTCHA generation, CAPTCHA benchmarks, unified evaluation framework, and trace generation pipelines for reasoning-action supervision.
Modern GUI agents can navigate websites, apps, and interfaces, but CAPTCHAs still break many real workflows. ReCAP-Agent is a practical stack for studying that failure mode end to end: generate CAPTCHA tasks, benchmark agents against them, and convert runs into training traces that support better reasoning and recovery behavior.
This repository brings together:
- dynamic CAPTCHA environment and benchmark with diverse interaction patterns;
- static real-world CAPTCHA benchmarks (contributed by Teoh et al.);
- direct reasoning-action trace generation;
- self-correction trace generation from failed attempts;
- cross-provider evaluation for multiple model families.
| Module | Purpose |
|---|---|
| `dynamic_captchas/` | Dynamically generated CAPTCHA tasks used to probe transfer across layouts and interaction styles. |
| `halligan_captchas/` | Static benchmark set based on real-world CAPTCHAs, contributed by Teoh et al. Included here for convenient local evaluation. |
| `captcha_eval_framework/` | Unified benchmarking framework for running GUI agents across providers and model families. |
| `trace_generation/` | Pipelines for generating direct traces, self-correction traces, and model-specific training data formats. |
The dynamic CAPTCHA system covers seven representative interaction types:

`text`, `compact_text`, `icon_match`, `icon_selection`, `paged`, `slider`, `image_grid`
These tasks collectively target four broad capabilities:
- optical character recognition,
- continuous control,
- spatial localization, and
- visual-semantic comprehension.
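As a rough illustration, each task type primarily stresses one of these capabilities. The mapping below is an assumption for illustration only; the authoritative groupings live in `dynamic_captchas/`.

```python
# Hypothetical mapping from dynamic CAPTCHA task types to the broad
# capability each one primarily stresses. These groupings are an
# illustrative assumption, not the repository's canonical taxonomy.
TASK_CAPABILITIES = {
    "text": "optical character recognition",
    "compact_text": "optical character recognition",
    "slider": "continuous control",
    "icon_match": "spatial localization",
    "icon_selection": "visual-semantic comprehension",
    "paged": "visual-semantic comprehension",
    "image_grid": "visual-semantic comprehension",
}

def capability_for(task_type: str) -> str:
    """Look up the capability a task type primarily targets."""
    return TASK_CAPABILITIES[task_type]
```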
```bash
# Dynamic CAPTCHA environment: fetch assets and start the task server
cd dynamic_captchas
pip install -r requirements.txt
python download_datasets.py
python app.py
```

```bash
# Static real-world benchmark (Teoh et al.): create the env and serve tasks
cd halligan_captchas
conda env create --file environment.yml --name halligan-benchmark
conda activate halligan-benchmark
python server.py
```

```bash
# Evaluation framework: configure credentials and run one pass
cd captcha_eval_framework
pip install -r requirements.txt
cp .env.example .env
python3 ./main.py --provider dynamic --test-mode once --model-family qwen3
```
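Conceptually, a cross-provider run is a single loop over tasks with a provider-supplied solve callable. The sketch below is a minimal illustration of that idea; the names (`TaskResult`, `run_once`) are assumptions and not the framework's actual API.

```python
# Minimal sketch of a provider-agnostic evaluation loop in the spirit of
# captcha_eval_framework. All names here are hypothetical; the framework's
# real interfaces are documented in its own README.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TaskResult:
    task_id: str
    solved: bool

def run_once(tasks: list[dict], solve: Callable[[dict], bool]) -> list[TaskResult]:
    """Run one pass over the task list with a provider's solve() callable."""
    return [TaskResult(t["id"], solve(t)) for t in tasks]

# Usage with a dummy provider that "solves" only slider tasks:
results = run_once(
    [{"id": "t1", "type": "slider"}, {"id": "t2", "type": "text"}],
    solve=lambda t: t["type"] == "slider",
)
pass_rate = sum(r.solved for r in results) / len(results)
```

Because the loop only depends on the `solve` callable, swapping model families or providers means swapping that one function, which is the design idea behind a unified framework.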
```bash
# Trace generation: reuse the eval framework deps, then run the pipelines
cd ..
pip install -r captcha_eval_framework/requirements.txt
python -m playwright install chromium
python -m trace_generation direct
python -m trace_generation self-correction
python -m trace_generation convert
```

For setup details, environment variables, and advanced usage, refer to the component READMEs linked above.
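To make the two trace kinds concrete, here is an illustrative sketch of their shape: a direct trace pairs reasoning with actions on a successful run, while a self-correction trace appends a recovery step to a failed attempt. The field names and the `to_self_correction` helper are hypothetical, not the pipeline's actual schema.

```python
# Illustrative shape of the two trace kinds. Field names are assumptions
# for illustration; the real schema is defined in trace_generation/.
direct_trace = {
    "task": "slider",
    "steps": [
        {"reasoning": "Handle is at the left; target notch is near x=240.",
         "action": {"type": "drag", "from": [40, 120], "to": [240, 120]}},
    ],
    "outcome": "success",
}

def to_self_correction(failed_steps: list[dict], fix_step: dict) -> dict:
    """Build a self-correction trace: the failed attempt followed by a
    recovery step, so a model can learn to detect and repair its errors."""
    return {"steps": list(failed_steps) + [fix_step], "outcome": "success"}

trace = to_self_correction(
    [{"reasoning": "Overshot the target notch.",
      "action": {"type": "drag", "from": [40, 120], "to": [300, 120]}}],
    {"reasoning": "Handle is past the notch; drag back left.",
     "action": {"type": "drag", "from": [300, 120], "to": [240, 120]}},
)
```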
```
ReCAP-Agent/
├── dynamic_captchas/
├── halligan_captchas/
├── captcha_eval_framework/
├── trace_generation/
├── images/
└── README.md
```
- Dynamic CAPTCHA generation and verification server
- Static benchmark integration
- Unified cross-provider evaluation framework
- Trace generation module with direct and self-correction traces
Contributions are welcome.
- Fork the repository and create a branch for your change.
- Make the change with clear commits and any necessary documentation updates.
- Push your branch and open a pull request describing the motivation and behavior change.
This project is licensed under the MIT License. See the LICENSE file for details.

