Skip to content

Xwarli/wget2-sitemap-generator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

wget2-sitemap-generator

Simple script to generate a text file (.txt) of URLs from a given website using the wget2 command (with mirror and spider opitions enabled).

---- USAGE ----

sudo python3 wget2-sitemap-generator.py

You will then be prompted to enter a valid URL

The script will run silently, and initially create a single "-log.txt" file. Once the website has been crawled, it will use this file to generate a "-sitemap.txt" file containing a list of URLs.

These may require further sorting as desired as they occationally have non-target URLs creep in; but in general it only grabs desired URLs. If they do, then use:

sudo python3 url-list-sorter.py

And then enter the URL list .txt file, and the desired target URL (i.e. https://example.com), and it'll create a "-sorted.txt" file which will only include all the lines that start with your desired URL.

Once the list has been generated, I'd strongly recommend passing every URL into Monolith for archiving. A simple script to achive this is here: https://github.com/Xwarli/urls-to-monolith/tree/main

About

Generate a list of URLs with wget2

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages