Spec-conformant HTML pages with properly-encoded multi-parameter URLs hand the agent a broken URL #170

### Bug
`_extract_links` in `web/fetcher.py` reads `href` attribute values directly from raw HTML without decoding HTML entities. Valid HTML requires `&` inside attribute values to be written as `&amp;`, so a page containing something like:
```html
<a href="https://example.com/search?q=foo&amp;lang=en">Search</a>
```

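causes the fetcher to surface `https://example.com/search?q=foo&amp;lang=en` as a link. When the web agent later calls `fetch_url` with that string, the request goes out with a literal `&amp;` in the query string. A server that splits parameters on `&` then sees `q=foo` plus a parameter named `amp;lang` instead of `lang`. Depending on how it treats the unknown parameter, it either rejects the request (e.g. with a 400) or silently drops `lang`, and the agent ends up fetching the wrong page.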
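The misparse is easy to see with Python's standard library. This is a minimal illustration, assuming Python 3.9.2 or later, where `urllib.parse.parse_qs` splits only on `&` (older versions also split on `;`, which happens to mask the problem). It shows what a server with similar parsing rules would receive, with and without `html.unescape` applied to the `href` value:

```python
import html
from urllib.parse import parse_qs, urlsplit

# The href value exactly as it appears in the page source
raw_href = "https://example.com/search?q=foo&amp;lang=en"

# Undecoded: the second parameter's name becomes "amp;lang"
broken = parse_qs(urlsplit(raw_href).query)
print(broken)  # {'q': ['foo'], 'amp;lang': ['en']}

# Entity-decoded first: the parameters come out as the page author intended
fixed = parse_qs(urlsplit(html.unescape(raw_href)).query)
print(fixed)  # {'q': ['foo'], 'lang': ['en']}
```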

### Repro
Any real-world page whose query-string links are correctly HTML-encoded (which is the spec-required form) triggers this. For example, Google Search results, GitHub search pages, and most CMS-generated pages encode `&` as `&amp;` in `href` attributes.
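One possible fix, sketched under assumptions: the body of `_extract_links` isn't shown in this issue, so `extract_links` and `_LinkCollector` below are hypothetical stand-ins, not the project's code. The sketch relies on the standard library's `html.parser.HTMLParser`, which already unescapes character references in attribute values, so `&amp;` reaches `handle_starttag` as `&`. Relative links are resolved against the page URL with `urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags.

    HTMLParser decodes entities in attribute values before calling
    handle_starttag, so '&amp;' arrives here as '&'.
    """

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names are already lowercased by HTMLParser
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_html: str, base_url: str) -> list[str]:
    """Hypothetical stand-in for _extract_links: absolute, entity-decoded URLs."""
    collector = _LinkCollector()
    collector.feed(page_html)
    collector.close()
    return [urljoin(base_url, href) for href in collector.hrefs]


page = '<a href="https://example.com/search?q=foo&amp;lang=en">Search</a>'
print(extract_links(page, "https://example.com/"))
# ['https://example.com/search?q=foo&lang=en']
```

If the current implementation matches `href` values with a regular expression (an assumption; the issue doesn't say), a smaller change is to pass each captured value through `html.unescape` before returning it. The snippet above also works as a regression test using the example link from this issue.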
