FURLAT: Find URL Archiving Tool
Furlat is a tool and library that discovers and analyzes URL shortcodes generated by URL shorteners.
You will need:
- Python 3.2 or greater
- Selenium (Python 3 Package)
You can install the required Python packages using pip. For example, on Ubuntu:
pip3 install selenium
You can run the package as a script:
python3 -m furlat find bit.ly --verbose
To just search Twitter:
python3 -m furlat find bit.ly --verbose --source twitter
Use --help to see details about the arguments.
Results are currently stored in text files. For example, if you run the tool against bit.ly, a folder called bitly will be created, and the text files inside it contain the discovered URLs.
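As a rough illustration of working with that output, the sketch below gathers the discovered URLs from every file in a results folder. It assumes each file holds one URL per line, which is not confirmed here, and load_discovered_urls is a hypothetical helper rather than part of Furlat.

```python
import os


def load_discovered_urls(results_dir):
    """Collect unique, non-empty lines from every file in results_dir.

    Assumption: each result file holds one URL per line.
    """
    urls = set()
    for name in sorted(os.listdir(results_dir)):
        path = os.path.join(results_dir, name)
        if not os.path.isfile(path):
            # Skip sub-folders and anything that is not a regular file.
            continue
        with open(path, encoding='utf-8', errors='replace') as f:
            for line in f:
                line = line.strip()
                if line:
                    urls.add(line)
    return urls
```

Calling load_discovered_urls('bitly') after a bit.ly run would then return a set of the lines found in that folder.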
Commands that run indefinitely check for a sentinel file called STOP. If the file is created or modified after the command starts, the command will stop gracefully.
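The STOP sentinel check can be pictured as comparing the file's modification time with the time the command started. This is a minimal sketch of that idea, not Furlat's actual implementation; the StopSentinel class and its should_stop method are hypothetical names.

```python
import os
import time


class StopSentinel(object):
    """Report whether a sentinel file was created or modified after start-up."""

    def __init__(self, path='STOP'):
        self.path = path
        # Remember when the command started so older files are ignored.
        self.start_time = time.time()

    def should_stop(self):
        try:
            return os.path.getmtime(self.path) > self.start_time
        except OSError:
            # The file does not exist (or cannot be read): keep running.
            return False
```

A long-running loop would call should_stop() between jobs and finish its current work before exiting once it returns True. A STOP file left over from an earlier run is ignored because its modification time predates the start time.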
- Print statistics about the URL shortcodes
- Launch a find URL project
- Sort the URLs by length, then value
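Sorting by length and then by value, as in the last item, amounts to sorting on a two-part key. A minimal sketch, where sort_urls is a hypothetical helper rather than Furlat's own function, and "value" is assumed to mean plain string comparison:

```python
def sort_urls(urls):
    """Return URLs ordered by length, with ties broken by string comparison."""
    return sorted(urls, key=lambda url: (len(url), url))


example = ['http://bit.ly/b', 'http://bit.ly/aa', 'http://bit.ly/a']
print(sort_urls(example))
# ['http://bit.ly/a', 'http://bit.ly/b', 'http://bit.ly/aa']
```

Ordering by length first keeps shorter shortcodes together, which is convenient when shortcodes are allocated sequentially and grow in length over time.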
The library is not yet stable as an API, but you can read the __main__.py file to get an overview of how it works.
The goal of Furlat is to find as many valid shortcodes as possible without brute-force discovery, using third-party sources such as search engines and microblogs.
- Homepage: https://github.com/chfoo/furlat
- Chat: irc://irc.efnet.org/archiveteam-bs (I'll be on #archiveteam-bs on EFnet)
The unit tests can be run with
This software is currently in an experimental-but-could-be-useful state.
- Launching a real web browser.
- Searching through Google, Yahoo, Bing, and Twitter.
- Random keyword search term generation using word lists and MediaWiki page title dump files.
- Searching Identica
- Nicer result output options
- Configurable options such as fetch rate and number of jobs run concurrently
- Travis CI setup
- PyPI and other websites setup
- Inline documentation
- Launching a fake web browser.