Hydriz Scholz edited this page Aug 6, 2016 · 4 revisions

Balchivist is a Python library for archiving datasets to the Internet Archive. It automates gathering the list of items available for a given type of dataset and uploading those items to the Internet Archive, using an SQL database as the backend for storing that list.
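The gather-then-upload cycle described above can be illustrated with a small sketch. This is not Balchivist's actual code or schema: it assumes a hypothetical `items` table with an `archived` flag, uses the standard-library `sqlite3` module in place of MySQL so the example is self-contained, and takes the upload step as a caller-supplied function rather than a real Internet Archive call. The helper names `record_available` and `archive_pending` and the item names are likewise hypothetical.

```python
import sqlite3
from typing import Callable, Iterable, List

# Hypothetical schema (not Balchivist's): one row per dataset item, plus a
# flag recording whether it has already been uploaded to the Internet Archive.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    name     TEXT PRIMARY KEY,
    archived INTEGER NOT NULL DEFAULT 0
)
"""


def record_available(conn: sqlite3.Connection, names: Iterable[str]) -> None:
    """Store newly discovered items; items that are already listed are left unchanged."""
    conn.executemany(
        "INSERT OR IGNORE INTO items (name) VALUES (?)",
        ((name,) for name in names),
    )
    conn.commit()


def archive_pending(conn: sqlite3.Connection, upload: Callable[[str], bool]) -> List[str]:
    """Upload every item not yet archived and mark the successful uploads."""
    pending = [
        row[0]
        for row in conn.execute("SELECT name FROM items WHERE archived = 0 ORDER BY name")
    ]
    done = []
    for name in pending:
        # `upload` is a placeholder for the real Internet Archive upload step.
        if upload(name):
            conn.execute("UPDATE items SET archived = 1 WHERE name = ?", (name,))
            done.append(name)
    conn.commit()
    return done


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_available(conn, ["dump-2016-07", "dump-2016-08"])
    print(archive_pending(conn, lambda name: True))  # ['dump-2016-07', 'dump-2016-08']
    record_available(conn, ["dump-2016-08", "dump-2016-09"])
    print(archive_pending(conn, lambda name: True))  # ['dump-2016-09']
```

Keeping the list in a database means a later run only picks up items that are new or whose earlier upload failed, since anything already marked as archived is skipped.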

Usage

On its own, Balchivist provides only the infrastructure for accessing the backend SQL database and the tools for interacting with the Internet Archive, along with supporting modules such as a processing daemon. The actual data processing is implemented in individual modules (found in the modules directory), which makes it easy to extend the library to other datasets.
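One common way to structure this kind of per-dataset extension is a small plugin interface plus a registry. The sketch below is an assumption for illustration only: `DatasetModule`, `register`, `MODULES`, `ExampleModule` and `run` are hypothetical names, not the interface that Balchivist's modules actually implement.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type

# Hypothetical registry mapping a dataset type name to the module class handling it.
MODULES: Dict[str, Type["DatasetModule"]] = {}


def register(dataset_type: str):
    """Class decorator that makes a module available under a dataset type name."""
    def decorator(cls):
        MODULES[dataset_type] = cls
        return cls
    return decorator


class DatasetModule(ABC):
    """Interface that each (hypothetical) dataset module implements."""

    @abstractmethod
    def list_items(self) -> List[str]:
        """Return the names of the items currently available for this dataset."""

    @abstractmethod
    def process(self, item: str) -> str:
        """Prepare one item for upload and return a description of the result."""


@register("example")
class ExampleModule(DatasetModule):
    """Toy module standing in for a real dataset module."""

    def list_items(self) -> List[str]:
        return ["item-1", "item-2"]

    def process(self, item: str) -> str:
        return "processed " + item


def run(dataset_type: str) -> List[str]:
    """Look up the module for a dataset type and process every item it lists."""
    module = MODULES[dataset_type]()
    return [module.process(item) for item in module.list_items()]


print(run("example"))  # ['processed item-1', 'processed item-2']
```

With this layout the shared code (here `run`) never needs to change when support for a new dataset is added; a new module class only has to implement the two methods and register itself.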

Installation

Clone this repository using Git and run the following command in the Balchivist root directory:

pip install -r requirements.txt

You also need to install the MySQL server and the python-mysqldb library, which provide the SQL database functionality. Then create settings.conf, using settings.conf.example as a template, and adjust it to match your setup.

Issues

Encountered any issues? Feel free to report them on the issue tracker.
