A repository to capture submissions and share samples during PRONOM Research Week 2019
DavidUnderdown Merge pull request #6 from preservica/tweet-json
Added Tweet JSON file format signature, descriptive notes and sample …
Latest commit c05f669 Nov 20, 2019

PRONOM Research Week 2019

After iPRES 2019, and with an eye on World Digital Preservation Day, we looked at ways we as a community could contribute to digital preservation efforts. We decided to focus on addressing gaps in PRONOM by organizing a PRONOM Research Week. During the week of 18-24 November, volunteers are encouraged to help with PRONOM’s research backlog. You can enhance documentation, supply sample files, or create a signature, among other things.

We’re posting the PRONOM research backlog in a GitHub repository, which will also be the central location to share sample files and submissions. We also kindly ask for people to sign up via this Google spreadsheet, so that people can coordinate. We’ve also posted some resources here, which include blog entries and webinars to help people learn more and get started.


  • PRONOM can be found here:
  • A list of blogs, presentations, and other resources to assist with PRONOM research and file format signature development can be found here
  • If you would like help or advice on conducting your research, crafting your submission, creating a signature, or if you’re having difficulty finding samples, please create a conversation thread on our Google Group


For each submission, please include as much of the following information as possible. It’s okay if you don’t have everything, but please include what you can:

  • Format name - Use the official name where known. Please capitalise each word unless the format name is stylised in some alternative way, e.g. Apple iBook.
  • Version number (where relevant)
  • PUID - if it exists already and you’re providing an enhanced description
  • Extensions - any extensions known to be associated with the format
  • MIME/Media Type - the MIME or Media Type associated with the format. This should be an official Media Type, either registered and listed via the IANA or listed in official format documentation produced by the vendor
  • Description - a concise, objective description of the file format.
  • Format type - What type of format is it? (see below)
  • Vendor (if known) - which vendor created the format? Which vendor currently supports it?
  • File format identification signatures (for the brave!)
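If you collect submission details in a script before writing them up, the fields above could be held in a simple record. The following is a minimal sketch only: the class name, field names, and the example values are hypothetical and are not a format required by PRONOM or this repository.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class FormatSubmission:
    """Hypothetical record mirroring the submission fields listed above."""

    format_name: str
    version: Optional[str] = None
    puid: Optional[str] = None  # only if the format already has one
    extensions: List[str] = field(default_factory=list)
    media_type: Optional[str] = None
    description: Optional[str] = None
    format_type: Optional[str] = None
    vendor: Optional[str] = None
    signature_notes: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [
            name
            for name, value in asdict(self).items()
            if value in (None, "", [])
        ]


# Illustrative values only; "Example Format" is not a real format.
submission = FormatSubmission(format_name="Example Format", extensions=["exm"])
print(submission.missing_fields())
```

Incomplete records are expected, so the sketch reports gaps rather than rejecting a submission.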


We like to credit all submissions on our Release Notes page. You can be credited as an individual or we can credit your institution. We also keep track of international contributions via the contributors map, so please let us know how you’d prefer to be credited.


PRONOM data is published under the Open Government License 2.0, so please ensure you are happy with the terms of this license before submitting any descriptive information.

All samples shared here are available under Creative Commons CC0 unless otherwise stated. Please ensure you have the right to share any samples you wish to submit, and that you are happy to share these under the CC0 license.

You may prefer to submit your samples to the larger OPF Format Corpus.

Alternatively, if you need your samples to remain private or are unhappy with the licensing terms of this repository, you can submit them directly to the PRONOM mailbox. Samples submitted via the mailbox will not be shared online, and we can provide a formal NDA if required. We will use these solely for the purpose of file format research and signature validation.


Format descriptions must be objective - avoid using phrases like “This is the best format for…” and avoid comparisons with other formats.

Format Types

The current format classifications within PRONOM are:

  • Audio
  • Database - the formats of database software, such as MS Access, MySQL
  • Email
  • GIS - Geographic Information System (geospatial data formats)
  • Image (Raster) - images based on pixel grids, such as JPG, GIF, PNG
  • Image (Vector) - images based on mathematical primitives, such as SVG, Adobe Illustrator, CorelDraw, WMF
  • Page Description - the language of printers. Examples include HP-GL, PDF, PostScript
  • Presentation - such as PowerPoint, Impress, Apple Keynote
  • Spreadsheet
  • Text (Unstructured) - plain text formats with no formal structure
  • Text (Structured) - plain text formats with defined, regular structure
  • Text (Mark-up) - such as XML, SGML, MD
  • Word Processor
  • Video
  • Aggregate - such as zip, WARC, 7z, rar, iso
  • Dataset - structured forms of data
  • Model - 3D formats such as CAD and 3D models
  • Font

Your format may not easily fit into any of the above categories, so feel free to reach out for advice!
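If you are preparing several submissions at once, a small check can catch misspelled format types before you send them. This is a sketch under assumptions: the function name is hypothetical, and the tuple simply copies the list above as it stood at the time of writing, so it may not match what PRONOM currently accepts.

```python
# Format types copied from the list above; treat this as a snapshot.
FORMAT_TYPES = (
    "Audio",
    "Database",
    "Email",
    "GIS",
    "Image (Raster)",
    "Image (Vector)",
    "Page Description",
    "Presentation",
    "Spreadsheet",
    "Text (Unstructured)",
    "Text (Structured)",
    "Text (Mark-up)",
    "Word Processor",
    "Video",
    "Aggregate",
    "Dataset",
    "Model",
    "Font",
)


def normalise_format_type(value: str) -> str:
    """Return the listed spelling of a format type, ignoring case and extra spaces.

    Raises ValueError when the value is not in the list, which is a cue to
    ask for advice rather than a sign that the submission is invalid.
    """
    lookup = {name.casefold(): name for name in FORMAT_TYPES}
    key = " ".join(value.split()).casefold()
    if key not in lookup:
        raise ValueError(f"Unlisted format type: {value!r}")
    return lookup[key]


print(normalise_format_type("  image   (raster) "))
```

An unlisted value raises an error on purpose, so that formats which do not fit a category get discussed instead of being forced into one.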

Anti-harassment policy

pronom-research-week-2019 is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, age, race, or religion. We do not tolerate harassment of participants in any form.

This code of conduct applies to all pronom-research-week-2019 spaces, including Google Docs, Google Groups, our GitHub repository, and e-mails, both online and off. Anyone who violates this code of conduct may be sanctioned or expelled from these spaces at the discretion of the RESPONSE TEAM.

Some pronom-research-week-2019 spaces may have additional rules in place, which will be made clearly available to participants. Participants are responsible for knowing and abiding by these rules.

This anti-harassment policy text has been taken and modified from
