# Validating files with Siegfried

[Siegfried](https://www.itforarchivists.com/siegfried) is a signature-based file format identification tool, implementing:

- the National Archives UK's PRONOM file format signatures
- freedesktop.org's MIME-info file format signatures
- the Library of Congress's FDD file format signatures (beta).

It is used by digital preservation professionals to validate the precise format of all stored digital objects, and to link that identification to a central registry of technical information about that format and its dependencies.



### Be Kind


Ideally, don't use my Colab notebook (and hence Google resources) but make your own. To do this:

1. Copy this notebook to your Google Drive to keep it and save your changes. (File -> Save a Copy in Drive)
2. If there is a problem, try running the notebook in Google Chrome.

## Setup

The next three steps install Siegfried within your Colab Notebook.

In [None]:
!curl -sL "http://keyserver.ubuntu.com/pks/lookup?op=get&search=0x20F802FE798E6857" | gpg --dearmor | sudo tee /usr/share/keyrings/siegfried-archive-keyring.gpg

In [None]:
!echo "deb [signed-by=/usr/share/keyrings/siegfried-archive-keyring.gpg] https://www.itforarchivists.com/ buster main" | sudo tee -a /etc/apt/sources.list.d/siegfried.list

In [None]:
!sudo apt-get update && sudo apt-get install siegfried
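Before moving on, it can help to confirm that the install worked. The following is a minimal Python sketch, assuming the package placed the `sf` executable on the notebook's PATH. The helper name `siegfried_version` is hypothetical and not part of Siegfried.

```python
import shutil
import subprocess


def siegfried_version(executable="sf"):
    """Return the text printed by `sf -version`, or None if sf is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: we only want to read whatever the tool prints.
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    return (result.stdout or result.stderr).strip()


print(siegfried_version() or "Siegfried not found; try re-running the setup cells above.")
```

If the cell prints the "not found" message, the most likely cause is that one of the three setup cells did not finish.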

## Upload some data

Grab the following files from [Discmaster](http://discmaster.textfiles.com/) - a site that hosts vintage computer files:

- [clinton.gif](http://discmaster.textfiles.com/view/8183/nightowlsharewarenopv10.iso/023a/clinton.gif)
- [X-FILES.AU](http://discmaster.textfiles.com/view/7028/Current%20Shareware%20Volume%205%20(January%201996).ISO/sound/x_fileso.zip/X-FILES.AU)
- [BUSH.FLI](http://discmaster.textfiles.com/view/8197/no.zip/no/027A/BUSHFLIC.ZIP/BUSH.FLI)
- [shellnew.lwp](http://discmaster.textfiles.com/view/4931/Hobby%20PC%2005.iso/ViaVoice/WordPro_1/lotuspro/wordpro/shellnew.lwp)

*Note: you will need to right/CMD click on the file extensions (e.g. .mp4) to grab the files.*

Then in the Colaboratory Notebook sidebar on the left of the screen, select Files (the folder icon), hit the upload icon, and upload your files to your notebook. Once the files appear in the sidebar you are ready to go.
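To check that the upload worked, a short Python sketch like the one below can list any files that are still missing. It assumes that uploaded files land in the notebook's current working directory and keep the exact names shown in the list above (matching is case-sensitive). The names `EXPECTED` and `missing_files` are hypothetical.

```python
from pathlib import Path

# The four sample files listed above, with the casing used in this notebook.
EXPECTED = ["clinton.gif", "X-FILES.AU", "BUSH.FLI", "shellnew.lwp"]


def missing_files(names, folder="."):
    """Return the names that do not exist as regular files inside `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


missing = missing_files(EXPECTED)
if missing:
    print("Still to upload:", ", ".join(missing))
else:
    print("All four files are in place.")
```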

The next four steps run Siegfried (the `sf` command) on each of the uploaded files in turn and print what it can identify about each one.

In [None]:
!sf BUSH.FLI

In [None]:
!sf clinton.gif

In [None]:
!sf shellnew.lwp

In [None]:
!sf X-FILES.AU
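The four results can be easier to compare side by side. The sketch below is one possible way to do that from Python, not part of the original exercise. It asks Siegfried for JSON output with the `-json` flag and flattens each report to one row per match. The field names used (`files`, `matches`, `id`, `format`, `basis`, `warning`) are an assumption about the JSON layout and may differ between Siegfried versions. The helper names `summarise` and `identify` are hypothetical.

```python
import json
import shutil
import subprocess
from pathlib import Path


def summarise(report):
    """Flatten a parsed Siegfried JSON report into one dict per match."""
    rows = []
    for entry in report.get("files", []):
        for match in entry.get("matches", []):
            rows.append({
                "filename": entry.get("filename", ""),
                "id": match.get("id", ""),
                "format": match.get("format", ""),
                "basis": match.get("basis", ""),
                "warning": match.get("warning", ""),
            })
    return rows


def identify(path, executable="sf"):
    """Run `sf -json` on a single file and return the flattened matches."""
    result = subprocess.run(
        [executable, "-json", str(path)],
        capture_output=True, text=True, check=True,
    )
    return summarise(json.loads(result.stdout))


# Only run when sf is installed and the file has actually been uploaded.
if shutil.which("sf"):
    for name in ["BUSH.FLI", "clinton.gif", "shellnew.lwp", "X-FILES.AU"]:
        if Path(name).is_file():
            for row in identify(name):
                print(row)
```

Reading the `basis` and `warning` values next to each other shows which identifications rest on stronger evidence and which on weaker evidence.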

### Challenge Task

Having run the file validation and examined the output, rank the four files by how much at risk of loss you think they are (from most at risk to least at risk).

*Tip: think about how much data Siegfried can find about each file format, and how robust that data looks to you.*

### Rights

This notebook was produced by [James Baker](https://www.southampton.ac.uk/people/5yrbp5/doctor-james-baker) for the lecture 'Digital Heritage', given as part of the [HERI6002 'Global Cultural Heritage'](https://www.southampton.ac.uk/courses/modules/heri6002) module in November 2022.

This notebook is released under a [CC-BY](https://creativecommons.org/licenses/by/4.0/deed.en) license.