title: DigiPres Commons
subtitle: Community-owned digital preservation resources
layout: frontpage

Digital Preservation Community Resources


Get Started

Save Digital Stuff Right Now

Spotted digital data at risk, but don't know who can save it?

Preserve Your Own Stuff

Become Part Of The Digital Preservation Community

Advance digital preservation by pooling our experience, sharing our war stories and finding the answers to the big questions.

Real Data and Requirements

Real data, real challenges and real requirements make your and others' digital preservation developments far more useful and effective.

Test Corpora

To improve our digital preservation tools, we need to be able to test them and evaluate their performance. Publicly available sample files make this much easier. Tool developers can use them to test their work, discover bugs, and hone their tools ready for others to use. A test corpus can contain real digital objects from a collection, or be created specifically to exhibit certain characteristics for testing purposes. Real data, especially examples of broken, badly formed or corrupted files, can be particularly useful.

Note that OPF also has its own corpora page.

Multi-format Corpora

Format-specific Corpora

Building Corpora

If the existing corpora aren't cutting it, perhaps you can contribute to the OPF Format Corpus (hosted on GitHub). There's a guide here on how to contribute or you can contact OPF for help on how to get involved.

Sourcing test files from web archives

Web archives can provide a useful source of files of particular formats. For example, search via the UKWA interface.

Tools

Software tools give us the means to interrogate, manipulate, understand and ultimately preserve our digital data.

Building Workflows

Resources to help build up preservation workflows, e.g. templates for how to use command-line tools, and how to chain things together.

Understanding Formats

We need to understand the file formats of the resources we care for, and the software they depend on.

Improving Identification

Identifying file formats is the bread and butter of digital preservation characterisation and assessment. Identification tool coverage and accuracy could be much better, and this primarily comes down to the signatures, or file format "magic", used to identify each format. You can contribute and help make our identification tools more effective here:

If you want to put this into practice, you can identify file formats right now (with no installation or setup) using FIDOO, or alternatively check out standalone file format identification tools.
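
To make the idea of file format "magic" concrete, here is a minimal Python sketch of signature-based identification. It is not how any particular identification tool is implemented. The `SIGNATURES` table and the `identify_bytes` and `identify_file` names are hypothetical, and the handful of well-known signatures shown are only for illustration; real tools draw on much larger, curated signature registries.

```python
from typing import Optional

# Illustrative signature table: (byte offset, magic bytes, label).
# Hypothetical and deliberately tiny; not taken from any real registry.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"\xff\xd8\xff", "JPEG image"),
    (0, b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(data: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


def identify_file(path: str, probe_size: int = 64) -> Optional[str]:
    """Read the first probe_size bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(probe_size))
```

A file that matches none of the signatures returns `None`, which is exactly the coverage gap that contributing new signatures helps to close. Note also that a container signature such as ZIP only identifies the wrapper, not the format stored inside it.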

Improving Characterisation/Metadata Extraction

Deep file characterisation enables validation, identification of preservation risks and extraction of metadata. In developing a new characterisation capability, begin with thorough research to identify existing code to re-use or build on, develop a focused command-line tool, then consider turning it into a JHOVE module.
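
As a rough illustration of what a focused characterisation tool might look like, the Python sketch below validates the start of a PNG file and extracts a few properties from its IHDR header chunk. The function names `characterise_png` and `main` are hypothetical, the check covers only the signature and first chunk rather than the whole file, and nothing here reflects how JHOVE or any existing tool works.

```python
import struct
import sys

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def characterise_png(data: bytes) -> dict:
    """Validate the PNG signature and IHDR chunk, returning basic metadata."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file, or file is truncated")
    # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a well-formed IHDR")
    # IHDR begins with width, height, bit depth and colour type.
    width, height, bit_depth, colour_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "colour_type": colour_type,
    }


def main(argv: list) -> int:
    """Command-line entry point: print metadata for one file."""
    if len(argv) != 2:
        print("usage: characterise_png FILE", file=sys.stderr)
        return 2
    with open(argv[1], "rb") as handle:
        header = handle.read(26)
    try:
        info = characterise_png(header)
    except ValueError as error:
        print(f"invalid: {error}", file=sys.stderr)
        return 1
    for key, value in info.items():
        print(f"{key}: {value}")
    return 0
```

To use it as a script, call `sys.exit(main(sys.argv))` at the end of the file. Keeping the parsing logic in a plain function, separate from the command-line wrapper, makes it easier to run against a test corpus and to port into another framework later.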