Skip to content

OpenBookPublishers/geturls

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 

Repository files navigation

geturls

Extact all URLs from anchor and image tags within a html/xhtml page and its children.

Relative paths are prefixed with the root of the URL provided, i.e. full URLs are provided in all cases.

The URL provided must point to a file, so that this script can recursively obtain all the linked URLs.

Usage

Simply provide the URL of the page you would like to get URLs from, e.g.:

geturls https://www.openbookpublishers.com/htmlreader/978-1-78374-388-9/main.html

The main purpose of this script at OBP is to obtain all URLs needed to display our html books properly so that these can be submitted to the Wayback Machine.

About

Extact all URLs from anchor and image tags within a html/xhtml page and its children.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages