This is the official implementation of Mango: Multi-Agent Web Navigation via Global-View Optimization (ACL 2026).
Mango is a web navigation framework that constructs a global view of a website's structure before navigation begins. It identifies query-relevant entry-point URLs using lightweight BFS crawling and Google Search, then models URL selection as a multi-armed bandit problem with Thompson Sampling to allocate the navigation budget efficiently. An episodic memory component prevents the agent from repeating unsuccessful actions across navigation attempts.
- Global Structure Analysis — Performs a lightweight BFS crawl of the target website and augments the candidate set with Google Search results. Scores all candidate URLs with BM25 against the user query.
- MAB URL Selection — Models URL selection as a multi-armed bandit. Initializes Beta distribution priors from BM25 relevance scores, then uses Thompson Sampling to adaptively allocate the navigation budget.
- Web Navigation Agent — Navigates from the selected URL using a browser tool, then hands off to a reflection agent.
- Reflection Agent — Evaluates the navigation trajectory. Updates the bandit posterior and stores the trajectory in episodic memory.
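The MAB selection and reflection-update loop described above can be sketched stand-alone. This is an illustrative sketch under stated assumptions — the class name, the BM25-to-prior scaling, and the `+1` posterior updates are hypothetical choices, not the repo's actual API:

```python
import random

class ThompsonURLSelector:
    """Hypothetical sketch: each candidate URL is a bandit arm with a
    Beta(alpha, beta) posterior over its probability of leading to a
    successful navigation."""

    def __init__(self, bm25_scores, scale=2.0):
        # Initialize priors from normalized BM25 relevance: higher-scoring
        # URLs start with a larger alpha, i.e. a more optimistic prior.
        max_s = max(bm25_scores.values()) or 1.0
        self.posteriors = {
            url: [1.0 + scale * s / max_s, 1.0]  # [alpha, beta]
            for url, s in bm25_scores.items()
        }

    def select(self):
        # Thompson Sampling: draw one sample per arm, navigate from the argmax.
        samples = {
            url: random.betavariate(a, b)
            for url, (a, b) in self.posteriors.items()
        }
        return max(samples, key=samples.get)

    def update(self, url, success):
        # Reflection feedback: a successful trajectory increments alpha,
        # a failed one increments beta, reallocating future budget.
        a, b = self.posteriors[url]
        self.posteriors[url] = [a + 1.0, b] if success else [a, b + 1.0]
```

A typical episode would call `select()`, run the navigation agent from the chosen URL, then feed the reflection agent's verdict back through `update()`.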
Install dependencies:

```bash
pip install -r requirements.txt
playwright install
```

Copy `.env.example` to `.env` and fill in your credentials:

```bash
cp .env.example .env
```

Key variables:
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Required for GPT models and agent tracing |
| `DASHSCOPE_API_KEY` | Required for Qwen3 models |
| `SEARCH_API_KEY` | Google Custom Search API key (for preprocessing) |
| `SEARCH_ENGINE_ID` | Google Custom Search Engine ID (for preprocessing) |
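A minimal startup check for these variables can help fail fast; this is a sketch, and which keys are strictly required depends on your choice of backbone model and preprocessing method:

```python
import os

def missing_keys(required):
    """Return the subset of required environment variables that are unset.
    (Illustrative helper, not part of the repo.)"""
    return [k for k in required if not os.environ.get(k)]

# OPENAI_API_KEY is always needed; the search keys only for Google
# Search preprocessing (see the table above).
missing = missing_keys(["OPENAI_API_KEY", "SEARCH_API_KEY", "SEARCH_ENGINE_ID"])
if missing:
    print(f"Set these in .env before running: {', '.join(missing)}")
```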
```
├── llm_web_scraper/
│   ├── navigator/           # Web navigation agent (Crawl4AI-based)
│   ├── prompts/             # LLM prompts (navigation, reflection, final answer)
│   ├── selector/            # Thompson Sampling and greedy URL selectors
│   └── url_preprocessing/   # URL candidate generation (crawl, Google search, random)
├── evaluation/
│   ├── datasets/            # WebVoyager dataset
│   ├── preprocess/          # Preprocessing scripts
│   └── scripts/             # Evaluation scripts (WebVoyager, WebWalkerQA)
└── requirements.txt
```
Evaluation runs in two steps: preprocessing (URL candidate generation) then navigation.
Generate the candidate URL sets for each task before running navigation:
```bash
# WebVoyager
bash evaluation/preprocess/webvoyager.sh

# WebWalkerQA
bash evaluation/preprocess/webwalker.sh
```

This runs both the Google Search and BFS crawl preprocessing for all backbone models. Results are saved under `evaluation_results/*/preprocess/`.
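Conceptually, this preprocessing stage scores the crawled and searched candidate URLs with BM25 against the query. A self-contained sketch of that scoring (tokenization and function names here are illustrative assumptions, not the repo's implementation):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """docs: {url: token list for that page's title/path text}.
    Returns {url: BM25 score against the query}."""
    N = len(docs)
    avgdl = sum(len(t) for t in docs.values()) / N
    df = Counter()  # document frequency per term
    for tokens in docs.values():
        df.update(set(tokens))
    q_terms = query.lower().split()
    scores = {}
    for url, tokens in docs.items():
        tf, dl, s = Counter(tokens), len(tokens), 0.0
        for term in q_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * dl / avgdl))
        scores[url] = s
    return scores
```

Candidates would then be ranked by score descending, and (in the MAB setting) the scores would seed each arm's prior.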
To run a single benchmark/method/model:
```bash
python evaluation/preprocess/run_preprocessing.py \
    --benchmark webvoyager \
    --method google \
    --model gpt-5-mini
```

`--benchmark` accepts `webvoyager` or `webwalker`; `--method` accepts `google`, `crawl`, or `random`.

Then run navigation:

```bash
# WebVoyager
bash evaluation/scripts/webvoyager/run.sh

# WebWalkerQA
bash evaluation/scripts/webwalker/run.sh
```

To run a specific model or method:
```bash
python evaluation/scripts/webvoyager/run.py \
    --model gpt-5-mini \
    --methods ours google random \
    --navigator simple
```

Key arguments:
| Argument | Default | Description |
|---|---|---|
| `--model` | `gpt-5-mini` | Backbone LLM |
| `--methods` | `ours google random` | URL selection strategies |
| `--navigator` | `simple` | Browser environment (`simple` = Crawl4AI, `mcp` = Playwright) |
Score the navigation results:

```bash
# WebVoyager
bash evaluation/scripts/webvoyager/evaluate.sh

# WebWalkerQA
bash evaluation/scripts/webwalker/evaluate.sh
```

To run the WebWalkerQA baseline:

```bash
bash evaluation/scripts/webwalker/run_webwalker_baseline.sh
```