FURLAT: Find URL Archiving Tool

Furlat is a tool and library that discovers and analyzes URL shortcodes generated by URL shorteners.

Quick Start

Installation

You will need:

  • Python 3.2 or greater
  • Firefox
  • Selenium (Python 3 package)

You can install the required Python packages using pip. For example, on Ubuntu:

pip3 install selenium

Running

You can run the package as a script:

python3 -m furlat find bit.ly --verbose

To search only Twitter:

python3 -m furlat find bit.ly --verbose --source twitter

Use the --help option to see details about the arguments.

Results are currently stored in text files. For example, if you run the find command on bit.ly, a folder called bitly will be created with text files inside it; the text files contain the discovered URLs.
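Given that layout, the discovered URLs can be collected with a few lines of standard-library Python. This is a sketch, not part of furlat itself; the folder name, the *.txt glob, and the one-URL-per-line format are assumptions based on the description above.

```python
import glob
import os


def load_discovered_urls(result_dir="bitly"):
    """Gather discovered URLs from the text files in a result folder.

    Assumes one URL per line; blank lines are skipped.
    """
    urls = []
    for path in sorted(glob.glob(os.path.join(result_dir, "*.txt"))):
        with open(path) as result_file:
            urls.extend(line.strip() for line in result_file if line.strip())
    return urls
```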

Infinitely running commands check for a sentinel file called STOP. If the file is created or modified after the command starts, the command will stop gracefully:

touch STOP
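The sentinel check described above can be sketched with a modification-time comparison. This is an illustrative reimplementation, not furlat's actual code; the function name and default path are assumptions.

```python
import os
import time


def should_stop(start_time, sentinel_path="STOP"):
    """Return True if the sentinel file was created or modified
    after the command started running."""
    try:
        return os.path.getmtime(sentinel_path) >= start_time
    except OSError:
        # The sentinel file does not exist yet, so keep running.
        return False


start = time.time()
# A long-running command would poll this inside its main loop:
# if should_stop(start):
#     ...shut down gracefully...
```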

Commands

analyze
Print statistics about the URL shortcodes
find
Launch a find URL project
sort
Sort the URLs by length, then value
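The sort order used by the sort command (length first, then value) corresponds to a standard two-part sort key. A minimal sketch; the function name is illustrative and not part of furlat's API:

```python
def sort_shortcodes(urls):
    """Sort URLs by length first, then lexicographically.

    Grouping by length is useful because most shorteners issue
    shortcodes of increasing length over time.
    """
    return sorted(urls, key=lambda url: (len(url), url))


# Example:
# sort_shortcodes(["bit.ly/zz", "bit.ly/a1b", "bit.ly/ab"])
# -> ["bit.ly/ab", "bit.ly/zz", "bit.ly/a1b"]
```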

Library

The library API is not yet stable, but you can read the __main__.py file to get an overview of how it works.

About

The goal of Furlat is to find as many valid shortcodes as possible without brute-force discovery, using third-party sources such as search engines and microblogs.

Links

  • Chat: irc://irc.efnet.org/archiveteam-bs (I'll be on #archiveteam-bs on EFnet)

Testing

The unit tests can be run with nosetests:

nosetests3

Roadmap

This software is currently in an experimental-but-could-be-useful state.

What's Available

  • Launching a real web browser.
  • Searching through Google, Yahoo, Bing, and Twitter.
  • Random keyword search term generation using word lists and MediaWiki page title dump files.
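The word-list-based keyword generation mentioned above can be sketched as a random draw from a list of words. This is a simplified illustration under assumed names; furlat's actual generator (which also handles MediaWiki title dumps) is more involved.

```python
import random


def random_keyword(word_list, num_words=2):
    """Build a random search phrase from a word list.

    Drawing a few words at random yields varied search terms,
    which helps surface different shortcodes on each query.
    """
    return " ".join(random.choice(word_list) for _ in range(num_words))
```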

What's To-Do

  • Searching Identica
  • Nicer result output options
  • Configurable options such as fetch rate and number of jobs run concurrently
  • Travis CI setup
  • PyPI and other websites setup
  • Inline documentation
  • Launching a fake web browser.

See also