Awesome Digital Preservation

Carefully curated list of awesome digital preservation resources.

This Awesome List is one of a suite of community-owned resources for digital preservation. See digipres.org or the digipres.org discussion forum for more information.

Contributions are welcome. Please add links through pull requests, or create an issue to start a discussion. Please refer to CONTRIBUTING.md for detailed guidance. And if obsolescence claims something awesome, there's always the Archive.

The text of an annual reminder email about these resources is also held here, in reminder.md. This will be sent around various mailing lists once per year, ahead of World Digital Preservation Day.

Contents

Get Started

Save Digital Stuff Right Now

Spotted digital data at risk, but don't know who can save it?

Learn About Digital Preservation

Find Formats

We need to understand the file formats of the resources we care for, and the software they depend on.

If you have good examples of digital resources and their risks, please consider adding them to a test corpus.

Experiment with Tools

There are a lot of tools out there (see the tools section below), but some tools are particularly great for early experimentation. These tools can be used right in your web browser, so you can get started without installing software locally.

Remote Services

These tools are accessed using your browser, and work by sending a copy of your files to a remote server.

In-Browser Tools

These tools run entirely in your web browser, so no data is sent anywhere.

  • Siegfried JS - This runs the Siegfried format identification tool on your files in your browser.
  • CyberChef - The Cyber Swiss Army Knife. Capable of running lots of basic data operations on text or files, including computing things like MD5 or SHA hashes.
  • warc-analyser - Proof-of-concept that analyses WARC files in your browser. See https://github.com/edsu/warc-analyzer for more information.
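
As a local counterpart to the hashing that CyberChef offers in the browser, the same MD5 or SHA digests can be computed with Python's standard library. This is a minimal sketch rather than part of any tool listed here; the function name `file_checksums`, its parameters and the chunk size are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def file_checksums(path, algorithms=("md5", "sha256"), chunk_size=1024 * 1024):
    """Return a dict mapping each algorithm name to the file's hex digest.

    Hypothetical helper for illustration. The file is read in chunks so
    that large files do not have to fit in memory.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with Path(path).open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the same chunk to every hasher so the file is read once.
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Recording the digests next to the files, and recomputing them later, is one simple way to check that content has not changed.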

Engage Stakeholders

Become Part Of The Digital Preservation Community

Advance digital preservation by pooling our experience, sharing our stories and finding the answers to the big questions.

Store Digital Content

Create Preservation Metadata

Find Test Files

To improve our digital preservation tools, we need to be able to test them and evaluate their performance. Publicly available sample files make this much easier. Tool developers can use them to test their work, discover bugs, and hone their tools ready for others to use. A test corpus can contain real digital objects from a collection, or be created specifically to exhibit certain characteristics for testing purposes. Real data, especially examples of broken, badly formed or corrupted files, can be particularly useful.

Multi-format Corpora

Format-specific Corpora

PDF

ePub

TIFF

JPEG2000

Web Archives

Databases

Building Corpora

If the existing corpora aren't cutting it, perhaps you can contribute to the OPF Format Corpus hosted on GitHub. There's a guide here on how to contribute (archived version) or you can contact OPF for help on how to get involved.

Sourcing test files from web archives

Web archives can provide a useful source of files of particular formats. For example, search via the UKWA interface. Note that UKWA is offline at present.

Find More Tools

Software tools give us the means to interrogate, manipulate, understand and ultimately preserve our digital data. The Community Owned digital Preservation Tool Registry (COPTR) has unified five isolated tool registries. It provides an easy-to-edit wiki interface where we can share our knowledge about, and experiences with, tools used for digital preservation purposes.

Build Workflows

Resources to help build up preservation workflows, e.g. templates for how to use command-line tools, and how to chain things together.

Improve The Tools

Contributing to the development and improvement of tools is easy, even if you're not technical. Check out this guide to making small documentation edits, or raising issues on GitHub.

Improving Identification

Identifying file formats is the bread and butter of digital preservation characterisation and assessment. Identification tool coverage and accuracy could be much better, and this primarily comes down to the signatures, or file format "magic", used to identify each format. You can help make our identification tools more effective by contributing new or corrected signatures.
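
For readers new to the idea, the sketch below shows roughly what matching on "magic" bytes looks like. It is a deliberately tiny illustration, not how any particular identification tool is implemented: the `SIGNATURES` table holds only a handful of well-known leading byte sequences, and the names `identify` and `identify_file` are hypothetical.

```python
# Illustrative subset of well-known leading byte sequences. Real signature
# registries hold far more entries and also describe offsets, wildcards
# and priorities between overlapping signatures.
SIGNATURES = [
    ("PDF", b"%PDF-"),
    ("PNG", b"\x89PNG\r\n\x1a\n"),
    ("GIF", b"GIF87a"),
    ("GIF", b"GIF89a"),
    ("TIFF (little-endian)", b"II*\x00"),
    ("TIFF (big-endian)", b"MM\x00*"),
    ("ZIP container", b"PK\x03\x04"),
]


def identify(data):
    """Return the label of the first signature matching the start of `data`, or None."""
    for label, magic in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def identify_file(path):
    """Read just enough leading bytes from `path` to test every signature."""
    longest = max(len(magic) for _, magic in SIGNATURES)
    with open(path, "rb") as handle:
        return identify(handle.read(longest))
```

Even this toy version hints at why signature quality matters: a ZIP match alone cannot tell an ePub from any other ZIP-based format, so real signatures have to look deeper into the file.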

Improving Characterisation/Metadata Extraction

Deep file characterisation enables validation, identification of preservation risks and extraction of metadata. In developing a new characterisation capability, begin with thorough research to identify existing code to re-use or build on, develop a focused command line tool, then consider turning it into a JHOVE module.
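
A "focused command line tool" can start very small. The sketch below is one hedged example: it reads only the IHDR chunk of a PNG file and reports a few properties as JSON. PNG is chosen purely for illustration, the names `characterise_png` and `main` are hypothetical, and nothing here reflects the JHOVE module API.

```python
import argparse
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def characterise_png(data):
    """Extract basic properties from the IHDR chunk at the start of PNG bytes.

    Raises ValueError if the data does not begin with a PNG signature
    followed by a well-formed IHDR chunk.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG: signature mismatch")
    # After the 8-byte signature: 4-byte length, 4-byte type, 13-byte IHDR body.
    chunk = data[8:29]
    if len(chunk) < 21:
        raise ValueError("truncated PNG: incomplete IHDR chunk")
    (length, chunk_type, width, height, bit_depth,
     colour_type, _compression, _filter, interlace) = struct.unpack(">I4sIIBBBBB", chunk)
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG: first chunk is not a valid IHDR")
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "colour_type": colour_type,
        "interlaced": interlace == 1,
    }


def main(argv=None):
    """Parse a file path from the command line and print its properties as JSON."""
    parser = argparse.ArgumentParser(description="Report basic PNG properties as JSON.")
    parser.add_argument("path", help="path to a PNG file")
    args = parser.parse_args(argv)
    with open(args.path, "rb") as handle:
        print(json.dumps(characterise_png(handle.read(29))))
```

Calling `main()` from a script entry point turns this into a command. Keeping the parsing logic in a plain function such as `characterise_png` makes it easy to test against corpus files and to reuse if the work is later wrapped in a larger framework.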