Software behind tracker.tinyarchive.org - Warning: Very hacky code


Introduction

The tinyarchive repository is a loose collection of scripts to help with backing up URL shorteners. Most scripts are written in Python.

Concepts

Tinyarchive database

The very core of the whole thing. It consists of multiple Berkeley DB B-Tree databases that contain mappings from short URL codes to long URLs. There is one database per shortener. For example, the database bitly.db holds the code-to-URL mappings for the bitly shortener.
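
A minimal sketch of such a per-shortener mapping, not the repository's actual code. Python's standard-library `dbm.dumb` module stands in for Berkeley DB here, and the function names, codes and URLs are made up for illustration.

```python
import dbm.dumb
import os
import tempfile


def open_shortener_db(directory, shortener):
    """Open (or create) the key-value store for one shortener.

    Assumption: one file set per shortener, named after the shortener.
    dbm.dumb is only a stand-in for a Berkeley DB B-Tree database.
    """
    return dbm.dumb.open(os.path.join(directory, shortener), "c")


def demo():
    """Store two hypothetical mappings and read them back, sorted by code."""
    with tempfile.TemporaryDirectory() as tmp:
        db = open_shortener_db(tmp, "bitly")
        try:
            # Keys are short URL codes, values are the long URLs.
            db[b"abc123"] = b"http://example.com/some/long/path"
            db[b"xyz789"] = b"http://example.org/"
            return {key.decode(): db[key].decode() for key in sorted(db.keys())}
        finally:
            db.close()
```

Calling `demo()` returns a plain dictionary from code to long URL, which is all a lookup against the real database has to provide.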

Tracker

The tracker is a completely separate application that hands out tasks to tinyback instances.

trim-old

When tr.im shut down, part of its database was preserved. In 2013 tr.im was relaunched by Matthew Kelly, but all the old shortlinks were lost. With a little magic, it was possible to refill the new tr.im database with links from the old tr.im database. One such magic trick is trim-old.tinyarchive.org: since tr.im had trouble with some URLs (for whatever reason), the new shortlink was not created to point directly at the real URL. Instead, it redirects to trim-old.tinyarchive.org/$UUID, which in turn redirects to the real URL.

Scripts

Database scripts

create_release.py

Creates a new release from the database. Given the location of a previous release, create_release.py can check which files have not changed and avoid recompressing them, which would waste time and possibly change their checksums. The code_to_file.json file is used to map from a shortener name and code to a specific output file.

create_trim-old_db.py

Creates the sqlite3 database used by the trim-old website.
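
As a rough sketch of what such a database could look like, the following uses Python's `sqlite3` module. The table name `redirects`, its columns and both function names are assumptions made for illustration; the real schema is not documented here.

```python
import sqlite3


def create_db(path, mappings):
    """Create a UUID -> URL lookup table and fill it.

    mappings is an iterable of (uuid, url) pairs. The schema is hypothetical.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS redirects "
        "(uuid TEXT PRIMARY KEY, url TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO redirects (uuid, url) VALUES (?, ?)", mappings
    )
    conn.commit()
    return conn


def lookup(conn, uuid):
    """Return the real URL for a UUID, or None if the UUID is unknown."""
    row = conn.execute(
        "SELECT url FROM redirects WHERE uuid = ?", (uuid,)
    ).fetchone()
    return row[0] if row else None
```

A website handling trim-old.tinyarchive.org/$UUID would only need the `lookup` step, followed by an HTTP redirect to the returned URL.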

import.py

Imports finished tasks from the tracker into the database.

import_tnyim.py

One-off script to import CSV dumps from the URL shortener at tny.im.

release_import.py

Opposite of create_release.py: Takes a release and imports it into the database, using the code_to_file.json file to map from input file to URL shortener name.

stats.py

Outputs a JSON structure containing a mapping from URL shortener name to the number of short URLs in the database.
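
The shape of that output can be sketched as follows. This assumes each shortener's database behaves like a Python mapping from code to URL; the function names are hypothetical and plain dictionaries stand in for the real databases.

```python
import json


def shortener_stats(databases):
    """Count entries per shortener.

    databases maps a shortener name to any mapping of code -> long URL.
    """
    return {name: len(db) for name, db in databases.items()}


def stats_json(databases):
    """Serialize the per-shortener counts as a JSON object."""
    return json.dumps(shortener_stats(databases), indent=2, sort_keys=True)
```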

Tracker scripts

cleanup.py

Calls the tracker's cleanup admin function, which removes finished tasks and resets assignments for tasks assigned over 30 minutes ago.
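
The cleanup rule itself lives in the tracker, not in this script. As an illustration of the rule only, here is an in-memory sketch; the task fields `status` and `assigned_at` and the status values are assumptions, not the tracker's real data model.

```python
from datetime import datetime, timedelta

# The 30 minute limit comes from the description above.
ASSIGNMENT_TIMEOUT = timedelta(minutes=30)


def cleanup(tasks, now):
    """Drop finished tasks and release assignments older than the timeout.

    tasks is a list of dicts with hypothetical keys "status" and
    "assigned_at" (a datetime, or None if unassigned).
    """
    remaining = []
    for task in tasks:
        if task["status"] == "finished":
            continue  # finished tasks are removed
        assigned_at = task.get("assigned_at")
        if assigned_at is not None and now - assigned_at > ASSIGNMENT_TIMEOUT:
            # Stale assignment: make the task available again.
            task = dict(task, status="available", assigned_at=None)
        remaining.append(task)
    return remaining
```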

fetch_finished.py

Fetches a list of finished tasks from the tracker. For each task, it first downloads the payload and then tells the tracker to mark the task as deleted. For each task, a JSON file with the task metadata and a corresponding txt.gz file with the payload are stored in the output directory.
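
The order of operations can be sketched like this. The three callables stand in for the tracker communication, whose real API is not shown here; the `id` key, the file naming and the assumption that the payload arrives as uncompressed bytes are all illustrative.

```python
import gzip
import json
import os


def store_finished_tasks(list_finished, download_payload, mark_deleted, output_dir):
    """Save each finished task to disk, then mark it as deleted.

    list_finished, download_payload and mark_deleted are hypothetical
    callables wrapping the tracker. Each task is a dict with an "id" key.
    """
    os.makedirs(output_dir, exist_ok=True)
    stored = []
    for task in list_finished():
        task_id = str(task["id"])
        payload = download_payload(task)  # assumed to be raw bytes
        with open(os.path.join(output_dir, task_id + ".json"), "w") as f:
            json.dump(task, f)
        with gzip.open(os.path.join(output_dir, task_id + ".txt.gz"), "wb") as f:
            f.write(payload)
        # Only tell the tracker to delete once both files are written.
        mark_deleted(task)
        stored.append(task_id)
    return stored
```

Marking the task as deleted last means a crash in the middle leaves the task on the tracker, so it can be fetched again on the next run.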

redo.py

Takes a JSON file containing task metadata and registers a new task with the same parameters at the tracker.

task_create.py

File with some helper functions to create new tasks at the tracker.

twitter_spritzer_import.py

Untested and unfinished tool to import the unrolled URLs from the Twitter spritzer provided by swebb.
