  • A project to add site crawling, file normalization, natural language processing and increased scalability to the current Cinch project. Cinch is a project to develop a bulk download service to a central repository that will maintain original file timestamps, virus check, extract file level metadata, create file checksums and periodically validat…

    Ruby 2 3 Unlicense Updated Dec 6, 2012
  • A project to develop a bulk download service to a central repository that will maintain original file timestamps, virus check, extract file level metadata, create file checksums and periodically validate checksums for continued file integrity. Users merely need to upload a list of URLs to download and when the process completes they can download…

    PHP 12 5 Updated Dec 6, 2012
  • A project to merge and wrap content from the Internet Archive and CONTENTdm using OAI-PMH harvesting as well as the CONTENTdm API. The goal is to amalgamate disparate content and make it full-text searchable using Apache Solr (the project currently uses the Zend Search Lucene library).

    PHP 1 Updated Nov 20, 2012
  • Simple constraint analysis tool for Archive-It crawls.

    PHP 5 Unlicense Updated Aug 22, 2012