PlaywrightCrawler extract_links doesn't account for base href. #1589

@phughesion-h3

As the title says: if the page contains a <base href="anything">, then that URL should be used as the base for resolving all relative URLs, not the URL of the current page.

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/base
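To illustrate the expected behaviour, here is a minimal sketch that uses only the Python standard library. It is not Crawlee's actual implementation: `resolve_links` and `_LinkParser` are hypothetical names made up for this example, and the URLs are placeholders. The assumption is that the first <base> element carrying an href wins, and that its value is itself resolved against the page URL, since a base href may be relative.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class _LinkParser(HTMLParser):
    """Hypothetical helper: records the first <base href> and every <a href>."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base" and self.base_href is None:
            # Only the first <base> with an href is taken into account.
            self.base_href = href
        elif tag == "a":
            self.hrefs.append(href)


def resolve_links(page_url: str, html: str) -> List[str]:
    """Resolve every <a href> against <base href> if present, else the page URL."""
    parser = _LinkParser()
    parser.feed(html)
    # The base href may be relative, so resolve it against the page URL first.
    base_url = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    return [urljoin(base_url, href) for href in parser.hrefs]
```

For a page at https://example.com/docs/index.html containing <base href="https://cdn.example.com/assets/">, a link to page2.html would resolve under https://cdn.example.com/assets/ rather than under https://example.com/docs/. Absolute links are unaffected either way.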

Labels

bug: Something isn't working.
t-tooling: Issues with this label are in the ownership of the tooling team.
