title	category	layout	docs_css
Digital Preservation	Research-Data-Management	default	markdown

Definition

Digital preservation means taking certain measures to ensure that digital material can be found and can be accessed in the long term ("long-term accessibility of data"). It aims to preserve information in a way that is understandable and reusable for a specific community and to prove its authenticity.

Digital preservation for researchers

The sustainable handling of data by researchers naturally facilitates the long-term accessibility of data. Best practice methods are:

Cleaning data / data structures - see also: Data Organisation
Validating data - see also: Data Quality Control
Documenting data with metadata and context information to ensure reusability: commenting, adding descriptive, administrative and technical metadata, asigning user license.
Using well-known open file formats during the project phase - see below - or transfering data into reusable file formats (needs documenting: original file or derivative)
Storing data following the 3-2-1 rule:
- Keeping 3 copies of any important file
- Storing files on 2 different media types
- Keeping at least 1 copy off site.

Data selection

To decide well-founded on data selection we recommend reading the how-to guide of the Edinburgh Digital Curation Centre {% cite dcc_five_2014 %}. The suggested steps are:

Step 1: Identify purposes that the data could fulfill: consider the purpose or ‘reuse case’ of your data, including reuse outside your research group.
Step 2: Identify data that must be kept: consider legal or policy compliance risks, as well as funder requirements.
Step 3: Identify data that should be kept: as it may have long-term value.
Step 4: Weigh up the costs and identify any need for external advice in case of shortfall in the budget.
Step 5: Complete the data appraisal, i.e. list what data must, should or could be kept to fulfill which potential reuse purposes. Summarize any actions needed to prepare the data for deposit - or justification for not keeping it.

Recommended file formats for preservation

Making your research available in recommended file formats additional to the original software format supports highly the reusability and long-term accessibility of your data. Attributes of those file formats are:

Open rather than proprietary (examples for open files formats)
Well-documented
In widespread use
Simple (e.g. csv rather than xlsx)
Text-based (i.e. any file you can open with a text editor and read) rather than binary (e.g. txt files rather than doc files)
Exportable to / unpackable into an open format (e.g. xlsx, docx, etc. can be unpacked into folders of xml files)
Machine-readable

For biomaterial data, recommended formats are CSV, TXT and XML.

Digital preservation for repository operators

Specific preservation measures depend on the digital objects, needs of the user community, and various other conditions. Repositories usually contain publications as files, making file format identification and validation relevant.

Bitstream preservation

Preservation on the bitstream level is the basis for digital preservation. It covers e. g.

Checking checksums of transferred files upon receiving them (or generating file checksums) and conducting regular fixity checks
Redundant storage of data
Generating backups (e. g. offline backups of the underlying repository database)
Strategies for updating storage media (according to e. g. server lifetime)

Preservation beyond bitstream

Preservation of file content, being able to open and render it correctly in a software is part of logical {% cite lindlar_2020_3672773 %} or technical preservation, also called digital curation. Semantic preservation is concerned with e. g. semantic drift impacting metadata.

Obtaining sufficient rights allowing e. g. format migrations, file repairs and re-use over the long-term like re-publication in other infrastructures
File format identification, based format-specific bit patterns, e. g. via DROID during publication process
File format validation, based on format specifications, e. g. using XML validators during publication process (for validators see also COPTR)
Automated extraction of technical metadata from files (see also COPTR)
Virus scans
Replacing files with problems (e. g. invalid files) as early as possible
Obtaining sufficient metadata
- storing unique file identifiers, machine-readable version information and relations between files
- indexing rights information in a machine usable way
- indexing of identification-, validation-, metadata extraction- and virus check-output
- preserving and updating of descriptive metadata, according to user community needs
Migrating at-risk files, e. g. files with obsolete formats (see also migration tools)
Providing versioning of files and publications and possibility to rollback to earlier versions

Many digital preservation criteria applying to repositories are also present in the certification criteria of the CoreTrustSeal and the nestor seal {% cite coretrustseal_standards_and_certificatio_2022_7051012 harmsen_henk_explanatory_2013 %}.

References

{% bibliography --cited_in_order %}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

25-digital-preservation.md

25-digital-preservation.md

Definition

Digital preservation for researchers

Data selection

Recommended file formats for preservation

Digital preservation for repository operators

Bitstream preservation

Preservation beyond bitstream

References

Files

25-digital-preservation.md

Latest commit

History

25-digital-preservation.md

File metadata and controls

Definition

Digital preservation for researchers

Data selection

Recommended file formats for preservation

Digital preservation for repository operators

Bitstream preservation

Preservation beyond bitstream

References