llms.txt aggregate walker only descends one level, undercounts deeply-nested indexes


## Context

The aggregate `.txt` walker in `src/helpers/get-page-urls.ts` (`walkAggregateLinks`) only descends one level into nested `llms.txt` indexes. Sub-link `.txt` references at depth 2+ are explicitly filtered out (see “skip further `.txt` nesting” comment in the code).

This works for two-level patterns (Cloudflare per-product files, Supabase aggregate content files) but undercounts sites that use deeper progressive disclosure — which the spec encourages for large sites that would otherwise exceed `llms-txt-size`.

The visible symptom is `llms-txt-freshness` reporting very low coverage on sites whose `llms.txt` is actually exhaustive — the walker just isn’t reaching the leaves.

## Concrete example

**Site:** `alchemy.com/docs`

Three-level structure (correctly organized per spec — sections split because a unified file would exceed `llms-txt-size`):

```text
/docs/llms.txt                           # 6 section links, all .txt
  └─ /docs/get-started/llms.txt          # .md page links     [walked]
  └─ /docs/node/llms.txt                 # .md page links     [walked]
  └─ /docs/data/llms.txt                 # .md page links     [walked]
  └─ /docs/wallets/llms.txt              # .md page links     [walked]
  └─ /docs/rollups/llms.txt              # .md page links     [walked]
  └─ /docs/chains/llms.txt               # ~80 chain .txt links [walked, children dropped]
      └─ /docs/chains/ethereum/llms.txt  # eth_call, eth_chainId, …  [NOT REACHED]
      └─ /docs/chains/solana/llms.txt    # getBalance, getSlot, …    [NOT REACHED]
      └─ … 80 more chains                                            [NOT REACHED]
```

Five sections have a flat layout, so their pages are counted (~311 page URLs total). The **Chains** section has another nesting level for per-chain files, and all ~5,100 method pages live at depth 2. Those never make it into the URL pool.

Verbose `afdocs` output:

```text
✗ llms-txt-freshness: llms.txt covers 311/5452 sitemap doc pages (6%); 5141 missing
      Fix: Your llms.txt covers less than 80% of your site's pages. ...
```

The 5,141 “missing” pages are mostly chain-specific RPC method docs:

* `/docs/chains/ethereum/ethereum-api-endpoints/eth-call`
* `/docs/chains/solana/solana-api-endpoints/get-balance`
* …

These *are* in the `llms.txt` tree — just one level deeper than the walker currently explores.

This also biases sampling for any check that draws its URLs from `getUrlsFromCachedLlmsTxt`, including:

* `markdown-content-parity`
* `llms-txt-directive`

These checks sample only from the five flat sections, never from the ~5,100 chain method pages.

## Suggested behaviors (in priority order)

1. **Walk recursively, with bounded depth and file count**

   * Stop at e.g. depth = 5 and ≤200 total fetches
   * Deduplicate visited `.txt` URLs so cycles and shared sub-indexes don’t cause repeated fetches
   * Current behavior is effectively “depth = 1 hardcoded”

2. **Treat the aggregate walk uniformly**
   The seed URLs from the canonical `llms.txt` and discovered sub-links should go through the same classification logic (page URL vs aggregate-to-walk), instead of the current setup where the inner loop uses a different filter than the outer one.

3. **Surface the walked tree in `details`**
   When verbose:

   * List which aggregate files were fetched
   * Include depth information
   * Show whether safety caps were hit
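
The three behaviors above could be combined roughly as follows. This is a minimal sketch, not the existing `walkAggregateLinks` code: every name in it (`walkAggregates`, `FetchText`, `WalkOptions`, `isAggregate`, `extractLinks`) is hypothetical, it assumes links are written as markdown `[text](url)`, and it assumes any link whose path ends in `.txt` is an aggregate to walk. Depth 0 is the canonical `llms.txt`, so the per-chain files in the example sit at depth 2.

```typescript
// Hypothetical sketch: bounded, deduplicated, breadth-first aggregate walk.
type FetchText = (url: string) => Promise<string | null>;

interface WalkOptions {
  maxDepth: number;   // deepest aggregate level to fetch, e.g. 5
  maxFetches: number; // total aggregate files to fetch, e.g. 200
}

interface WalkResult {
  pageUrls: string[];
  walked: { url: string; depth: number }[];      // for verbose `details`
  capsHit: { depth: boolean; fetches: boolean }; // whether a safety cap fired
}

// Pull markdown-style link targets out of a file and resolve them against it.
function extractLinks(body: string, base: string): string[] {
  const links: string[] = [];
  const pattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    try {
      links.push(new URL(match[1], base).href);
    } catch {
      // Skip targets that do not parse as URLs.
    }
  }
  return links;
}

// One classifier for seed links and sub-links alike (assumption: `.txt` = aggregate).
function isAggregate(url: string): boolean {
  return new URL(url).pathname.endsWith(".txt");
}

async function walkAggregates(
  seed: string,
  fetchText: FetchText,
  opts: WalkOptions,
): Promise<WalkResult> {
  const visited = new Set<string>([seed]); // dedupes cycles and shared sub-indexes
  const pages = new Set<string>();
  const walked: { url: string; depth: number }[] = [];
  const capsHit = { depth: false, fetches: false };
  const queue: { url: string; depth: number }[] = [{ url: seed, depth: 0 }];

  while (queue.length > 0) {
    if (walked.length >= opts.maxFetches) {
      capsHit.fetches = true; // unfetched aggregates remain in the queue
      break;
    }
    const { url, depth } = queue.shift()!;
    const body = await fetchText(url);
    walked.push({ url, depth });
    if (body === null) continue; // fetch failed; nothing to classify

    for (const link of extractLinks(body, url)) {
      if (!isAggregate(link)) {
        pages.add(link);
      } else if (!visited.has(link)) {
        if (depth + 1 > opts.maxDepth) {
          capsHit.depth = true; // aggregate found but too deep to follow
        } else {
          visited.add(link);
          queue.push({ url: link, depth: depth + 1 });
        }
      }
    }
  }
  return { pageUrls: Array.from(pages), walked, capsHit };
}
```

Injecting `fetchText` keeps the sketch independent of how `afdocs` actually fetches and caches files. With `maxDepth: 1` it reproduces the behavior described in this issue (per-chain files are seen but not followed, and `capsHit.depth` is set), while `maxDepth: 5` reaches the leaves.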

## Workarounds tried

* None viable from the user side.
  The site’s `llms.txt` is structured exactly as the spec recommends; the bug is on the consumer (`afdocs`) side.

* Flattening `llms.txt` is technically possible, but defeats the purpose of the size-based recommendation to split files.

